Knowledge-infused AI
1. Knowledge-infused AI for Healthcare:
Role of Conceptual Medical Knowledge in
Improving Machine Understanding
Artificial Intelligence Institute
Manas Gaur
2. AI
Outline
Why do we need Knowledge Infusion ?
Let Me Tell You About Your Mental Health! :
Contextualized Classification of Reddit Post to DSM-5
Unsupervised Abstractive Summarization of Diagnostic
Mental Health Interviews
Semi-Deep Knowledge Infusion
Shallow Knowledge Infusion
5. Arachie, Chidubem, Manas Gaur, Sam Anzaroot, William Groves, Ke Zhang, and Alejandro Jaimes. "Unsupervised Detection of Sub-events in
Large Scale Disasters." arXiv preprint arXiv:1912.13332 (2019).
Unsupervised Detection of Sub-events in Large Scale
Disasters
9. Probably Approximately Correct Learning
How do you know that a training set has a
good domain coverage?
Robust Classifier → Low Generalization Error
Consistent Classifier → Low Training Error
Confidence: More certainty (lower δ) requires more samples.
Complexity: A more complicated hypothesis space (larger |H|) requires more samples.
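The confidence and complexity statements on this slide are consistent with the textbook PAC bound for a finite hypothesis class and a consistent learner, m ≥ (1/ε)(ln|H| + ln(1/δ)). The slide does not say which bound it has in mind, so the sketch below assumes this standard form; `pac_sample_bound` and its parameter names are illustrative, not taken from the talk.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    Assumes a finite hypothesis class of size `hypothesis_count` and a
    learner that is consistent with the training set.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


# Lower delta (more certainty) or larger |H| both increase the bound.
baseline = pac_sample_bound(1000, epsilon=0.1, delta=0.05)
more_certain = pac_sample_bound(1000, epsilon=0.1, delta=0.01)
more_complex = pac_sample_bound(10**6, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.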
10. PAC Learning to Knowledge Infusion
Challenge:
Existing ML
Models:
Infusion:
True Data
Distribution
Hypothesis Data
Distribution
14. Benefit of Infusing Knowledge
Interpretability: Rules or Axioms that are constructed
from patterns learned by a machine learning model.
Traceability: If we can validate the correctness of rules
or axioms using a ground truth, we achieve traceability
Explainability: Interpretability + Traceability
Interpretability
Explainability
15. Knowledge Infusion
Identification and Integration of Commonsense
knowledge for principled reasoning.
Identification: Finding relevant information at an
appropriate abstraction level in the Knowledge
Graph
Integration: Controlled content enrichment or
modification to reduce Impedance Mismatch in
learning
Benefit: Robustness is ensured
16. Patient is a known case of non-Hodgkin’s lymphoma and
undergone three cycles of chemotherapy.
17. Algorithmic possibilities and
limitations of AI System
Teaching Materials
● Ontology
● Knowledge Graph
● Knowledge Base
● Lexicons
Teaching Materials form a conceptual framework
of interconnecting sets of domain-focused
concepts and relationships
Remove ambiguity and sparsity.
Drug Abuse Ontology
● Concepts (315)
● Relations (31)
● Instances (814)
18. Teaching Materials
Commonsense Reasoning
● Web Mining (e.g. NELL, KnowItAll)
● Knowledge-based
○ Mathematical (e.g. Situation Calculus)
○ Informal (e.g. LIWC, Scripts)
○ Large-Scale (e.g. CYC, DBpedia)
● Crowdsourcing (e.g. ConceptNet, OpenMind)
19. Knowledge Infusion in Healthcare
3 Challenges
Abstraction
Contextualization
Personalization
Shallow Infusion
Shallow and Semi-Deep Infusion
Shallow, Semi-Deep, and Deep
Infusion
20. Abstraction : Medical Entity Normalization
P1: I am sick of loss, need a way out
P2: No way out, I am tired of my losses
P3: Losses, Losses, I want to die
Detected concepts (each post): Suicide, Depression
Normalized entities:
P1: depress, suicide ideation
P2: suicide ideation, depress
P3: depress, suicide attempt
21. Teaching Material: Suicide Severity Lexicon
Suicide Risk Class (Number of Entities): Sample Medical Phrases
● Suicide Indicator (1472): Severe mood disorder with psychotic feature; Severe major depression; Family history of suicide; Sedative
● Suicide Ideation (409): Bipolar affective disorder; Borderline Personality; Depressive conduct disorder; Sexual maturation disorder
● Suicide Behavior (145): Suicidal behavior; Intentional self-harm; Incomplete attempt; Threatening suicide
● Suicide Attempt (123): Attempt actual suicide; Attempt physical damage; Intensive care; Second-degree burns
Suicide by Hanging [SNOMED ID: 287190007]
<child of> Suicide [SNOMED ID: 44301001]
<sibling of> Drug Overdose [SNOMED ID: 274228002]
<sibling of> Personal history of self-harm [ICD-10 ID: Z91.5]
<sibling of> Severe depressive episode with psychotic symptoms [ICD-10 ID: F32.3]
22. Contextualization
I dont think Ive thought
about it every day of my
entire life. I have for a good
portion of it, however, my
boyfriend may be able to
determine whether I’m worth
his time
Outcome : Suicide Indication
Having a plan for my own
suicide has been a long time
relief for me as well. I more
often than not wish I were
dead.
I dont think Ive thought about
it every day of my entire life. I
have for a good portion of it,
however, my boyfriend may
be able to determine whether
I’m worth his time
Outcome : Suicidal Ideation
24. Personalization
Personalization refers to determining a future course of action by taking into account contextual factors such as the user's health history, physical characteristics, environmental factors, activity, and lifestyle.
Without Contextualized Personalization
With Contextualized Personalization
A chatbot with contextualized (asthma) knowledge is potentially more personalized and engaging.
25. Let Me Tell You About Your Mental Health! :
Contextualized Classification of Reddit Post to
DSM-5
Gaur, Manas, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "Let
me tell you about your mental health!: Contextualized classification of reddit posts to dsm-5 for web-based intervention." In Proceedings of the
27th ACM International Conference on Information and Knowledge Management, pp. 753-762. ACM, 2018.
27. Motivation
People (clinician and patient)
● Social anxiety in the patient's face-to-face conversations with a mental health professional
● Poor recall by the patient
● Poor understanding of the patient's behavior
Data
● Clinical data is time-limited.
● Twitter data is short and not categorized.
● Reddit data is long and categorized.
● Reddit categories do not align with the categories clinicians use.
29. Challenge
➢ How can we use Reddit for psychiatric diagnosis?
○ Is it possible to map subreddits to the Diagnostic and Statistical Manual of Mental Disorders (DSM-5)?
○ If yes, can we build a learning algorithm that classifies a social media user into the appropriate DSM-5 category for suitable diagnosis?
30. The Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5, 2013), is often described as the psychiatric bible. It is the reference used to diagnose mental illness, which is reported to affect 46.4% of the adult US population.
Redditors conversing on Alcohol Abuse or Caffeine Intoxication can be mapped to the DSM-5 category Substance-use and Addictive Disorder.
There are 21 diagnostic categories, of which 20 are specific to mental health.
Background on DSM-5
31. Examples
Post from Bipolar Subreddit:
I know you want me to say no and that it is a part of me blah blah blah. But I can't. Honestly, not having bipolar disorder would be a huge blessing. I would be so much happier and could control my life better. I wouldn't have frantic, scattered thoughts and depression. I would be normal, happy, and less dramatic.
DSM-5 Chapter: Depressive Disorders
Post from Suicidewatch Subreddit:
Upon additional research, zolpidem (ambien) has a half-life of 2-3 hours, and so if he’s still awake, he’s either got a massive tolerance for this stuff or he’s really trolling.
DSM-5 Chapter: Suicidal Behavior/Ideation Disorders
32. Dataset
● 2005-2016: 550K users, 8 million conversations, 15 mental health subreddits
● 2005-2016: 270K users (only authors of main posts), 3 million conversations (main posts only), 15 mental health subreddits
33. Reddit to DSM-5 Mapping
Figure components:
● Input: <Reddit Post> <Subreddit Label>
● Language models: N-grams (n=1, 2, 3); LDA; LDA over bi-grams
● Medical Knowledge Bases: DSM-5 Lexicon; DAO (Drug Abuse Ontology)
● Normalized Hit Score
● Output: <Reddit Post> <DSM-5 Label>
34. ● Topics describing each subreddit are identified through:
○ Skip-gram model to generate n-grams
○ LDA over individual subreddits
○ LDA over bigrams of individual subreddits
● Relevant topics are selected using a topic coherence measure.
● We use the UCI topic coherence measure, which is based on pointwise mutual information (PMI).
Language Modeling and Coherence
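To make the coherence step concrete, here is a minimal sketch of a PMI-based (UCI-style) coherence score for one topic. It assumes document-level co-occurrence counts over a tokenized corpus; the original UCI measure uses sliding windows over a reference corpus, and the slides do not say which variant was used. The function name, the `eps` smoothing term, and the toy data are all illustrative.

```python
import math
from itertools import combinations


def uci_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of the topic's words.

    PMI(w1, w2) = log((P(w1, w2) + eps) / (P(w1) * P(w2))), where the
    probabilities are fractions of documents containing the word(s).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2 = prob(w1), prob(w2)
        if p1 == 0.0 or p2 == 0.0:
            continue  # a word unseen in the corpus carries no PMI evidence
        scores.append(math.log((prob(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


docs = [
    ["sad", "tired", "sleep"],
    ["sad", "tired"],
    ["drink", "alcohol"],
    ["alcohol", "drink", "sober"],
]
coherent = uci_coherence(["sad", "tired"], docs)
incoherent = uci_coherence(["sad", "alcohol"], docs)
```

A topic whose words co-occur often scores higher, so low-scoring topics can be filtered out before mapping.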
36. We compute the Normalized Hit Score (nhs) between the LDA topics of each subreddit (S) and the DSM-5 lexicon (D) to infer the subreddit's corresponding DSM-5 category.
Normalized Hit Score
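The slide text does not include the Normalized Hit Score formula, so the following is only one plausible reading, not the authors' definition: count how many of a subreddit's topic terms appear in each DSM-5 category's lexicon and divide by the number of topic terms. The function names and the toy lexicon are hypothetical.

```python
def normalized_hit_scores(topic_terms, lexicon):
    """Assumed scoring: hits in a category's lexicon / number of topic terms."""
    terms = [t.lower() for t in topic_terms]
    scores = {}
    for category, entries in lexicon.items():
        vocab = {e.lower() for e in entries}
        hits = sum(1 for t in terms if t in vocab)
        scores[category] = hits / len(terms) if terms else 0.0
    return scores


def best_category(topic_terms, lexicon):
    """Category with the highest assumed normalized hit score."""
    scores = normalized_hit_scores(topic_terms, lexicon)
    return max(scores, key=scores.get)


# Toy lexicon for illustration only; not the DSM-5 lexicon from the talk.
toy_lexicon = {
    "Substance use & Addictive Disorder": ["alcohol", "relapse", "sober", "withdrawal"],
    "Depressive Disorders": ["sad", "hopeless"],
}
toy_topic = ["alcohol", "sober", "relapse", "sad"]
toy_scores = normalized_hit_scores(toy_topic, toy_lexicon)
```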
38. Subreddits → DSM-5 Chapter:
● BiPolar, BiPolarReddit, BiPolarSOS, Depression → Depression Disorder
● Addiction, Crippling Alcoholism, Opiates Recovery, Opiates → Substance use & Addictive Disorder
Self-Harm
Stop Self-Harm
Mapping Example
39. SEDO
SEDO stands for Semantic Encoding and Decoding Optimization. It is a procedure to modulate the embedding (vector) of a word.
Architecture (figure components):
● Reddit with DSM-5 labels
● Medical Knowledge Bases (DSM-5 Lexicon, DAO) and Domain Experts
● Word Embedding Model
● DSM-5 Vocabulary Matrix
● Correlation Matrix (Q) over word vectors
● Correlation Matrix (P) over DSM-5 Lexicon or DAO
● Cross Correlation Matrix (Z) between word vectors and DSM-5 Lexicon or DAO
● SEDO: Optimize P, Q & Z
● Word-modulated Word Embeddings
● Linguistic Features
● DSM-5 Classification
40. We infuse background knowledge from the DSM-5 lexicon and DAO into the classification process using SEDO.
We introduce SEDO as an approach for obtaining a discriminative weight matrix between the DSM-5 lexicon and the Reddit embedding space.
SEDO modulates the embedding of each word in the user's Reddit content based on the word's proximity to a DSM-5 category.
Figure: SEDO optimizes the Correlation Matrix (Q) over word vectors, the Correlation Matrix (P) over DSM-5 Lexicon or DAO, and the Cross Correlation Matrix (Z) between word vectors and DSM-5 Lexicon or DAO.
Semantic Encoding and Decoding Optimization
41. R: Reddit Word Embedding Model (12808 words, 300 dimension embedding)
D: DSM-5-DAO Lexicon (20 DSM-5 categories, 300 dimension embedding)
W: Solvable Sylvester Equation
Semantic Encoding and Decoding Optimization
42. Encoding DSM-5 to Reddit embedding space
Decoding Reddit to DSM-5 embedding
space
Semantic Encoding and Decoding Optimization
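The encoding and decoding equations did not survive in the slide text. As a hedged illustration only, the sketch below assumes a semantic-autoencoder-style objective, ||R - W D||² + λ||D - Wᵀ R||², whose stationarity condition is the Sylvester equation λ R Rᵀ W + W D Dᵀ = (1 + λ) R Dᵀ. The objective, the weight λ (`lam`), and the function name are assumptions, not the authors' stated formulation. The code uses toy sizes and a Kronecker-product solve with NumPy; for realistic sizes a dedicated solver such as `scipy.linalg.solve_sylvester` would be the usual choice.

```python
import numpy as np


def solve_encoding_decoding(R, D, lam=1.0):
    """Solve lam * R R^T W + W D D^T = (1 + lam) * R D^T for W.

    R: (n_words, dim) word embeddings; D: (n_categories, dim) lexicon
    embeddings. Returns W with shape (n_words, n_categories).
    """
    n, k = R.shape[0], D.shape[0]
    A = lam * R @ R.T          # (n, n)
    B = D @ D.T                # (k, k)
    C = (1.0 + lam) * R @ D.T  # (n, k)
    # With column-major vec: vec(A W + W B) = (I_k kron A + B^T kron I_n) vec(W)
    K = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))
    w = np.linalg.solve(K, C.flatten(order="F"))
    return w.reshape((n, k), order="F")


rng = np.random.default_rng(0)
R_toy = rng.normal(size=(6, 4))   # 6 words, 4-dimensional embeddings
D_toy = rng.normal(size=(3, 4))   # 3 categories, same embedding size
W_toy = solve_encoding_decoding(R_toy, D_toy, lam=1.0)
residual = R_toy @ R_toy.T @ W_toy + W_toy @ D_toy @ D_toy.T - 2.0 * R_toy @ D_toy.T
```

Under this assumption, `W @ D` encodes the lexicon into the Reddit space and `W.T @ R` decodes Reddit embeddings into the lexicon space.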
44. Unsupervised Abstractive Summarization of
Diagnostic Mental Health Interviews
Gaur, Manas, Vamsi Aribandi, Ugur Kursuncu, Amanuel Alambo, Krishnaprasad Thirunarayan, Jonathan Beich and Amit Sheth. "Unsupervised Abstractive Summarization of
Diagnostic Mental Health Interviews", under review in The Web Conference 2020
45. ● Mental health professionals must both interact with the patient and take notes, which negatively affects decision making by:
○ lowering empathy towards the patient,
○ introducing mistrust due to social stigma and therapeutic pessimism, and
○ distracting from capturing relevant information,
● thus thwarting a learned follow-up procedure.
● The proposed research infuses knowledge into an abstractive summarization framework (PHQxAS).
● The framework summarizes long conversations (58-60 sentences) in 7-8 sentences.
Motivation
46. Dataset
● The Distress Analysis Interview Corpus Wizard-of-Oz
(DAIC-WoZ) interviews database consists of clinical
interviews designed to support the diagnosis of psychological
conditions such as anxiety, depression, and post-traumatic
stress disorder.
● It contains data from 189 interviews, generally 7-33 minutes
long, with an average length of 16 minutes.
● The interviews were conducted by a virtual interviewer which
is controlled by a human in another room.
● 5 out of the 189 interviews have been excluded for this
study as they have imperfections in the data collection
or transcription process.
● We further filtered the interview scripts based on
subjectivity, polarity, and entropy analysis.
48. ● Identification of relevant utterances from interview transcripts using the PHQ-9 Lexicon.
● Generation of a semantic similarity score for each word to assess its relevance to mental health issues.
● We do this by retrofitting ConceptNet embeddings with the PHQ-9 Lexicon.
● Let c(wi) be the maximum cosine similarity score between a word wi in ConceptNet (vocabulary size V) and the PHQ-9 Lexicon. The Word Semantic Score (WSS) of any word wt is calculated as:
Our Approach
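The quantity c(wi) described on this slide, the maximum cosine similarity between a word and the lexicon, can be sketched as follows. The toy two-dimensional vectors stand in for retrofitted ConceptNet embeddings, the function names are illustrative, and returning 0.0 for out-of-vocabulary words is an assumption. The WSS formula itself is not reproduced in the slide text, so it is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_lexicon_similarity(word, embeddings, lexicon_terms):
    """c(w): highest cosine similarity between `word` and any lexicon term."""
    if word not in embeddings:
        return 0.0  # assumption: out-of-vocabulary words get no relevance
    sims = [
        cosine(embeddings[word], embeddings[term])
        for term in lexicon_terms
        if term in embeddings
    ]
    return max(sims) if sims else 0.0


# Toy vectors for illustration only.
toy_embeddings = {
    "exhausted": [1.0, 0.0],
    "tired": [0.9, 0.1],
    "appetite": [0.1, 0.9],
    "car": [-1.0, 0.0],
}
toy_phq9_terms = ["tired", "appetite"]
c_exhausted = max_lexicon_similarity("exhausted", toy_embeddings, toy_phq9_terms)
c_car = max_lexicon_similarity("car", toy_embeddings, toy_phq9_terms)
```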
49. ● Improvement of generated summaries using a linguistic quality measure (LQ).
● The LQ formulation uses WSS(wt), so that more domain-relevant terms appear in summaries.
● Unification of our modification into an Integer Linear Programming (ILP) framework, which optimizes Informativeness (I) and LQ.
● The ILP framework intrinsically constructs a word graph with k paths (Pk) and tries to maximize I(Pk) and LQ(Pk).
● TextRank is used to measure informativeness.
● A language model is used to evaluate linguistic quality.
● To select the best path, both measures are combined into an optimization problem.
● This optimization problem is solved through an ILP framework.
Our Approach
50. We compare our approach with state-of-the-art summarization techniques:
● Extractive Summarization (ES): Greedily identifies important utterances from interview scripts and produces a summary. It fails to capture the context of the conversation.
● Abstractive Summarization (AS): Examines and interprets the interview scripts to generate more contextualized summaries. It fails to capture domain knowledge.
● Abstractive over Extractive (AOES): ES is efficient at filtering out non-informative sentences, which can help AS generate more coherent summaries.
● Knowledge Infused AS (KIAS) (our approach): Existing approaches do not consider domain knowledge, which is important to the end user. AS and ES tend to lose important pieces of information, as the illustrated summaries show.
Since there are no ground truth summaries for clinical diagnostic interviews, we used the interview transcripts for evaluation.
Baselines
51. KL Divergence Based Evaluation
● Median KL divergence score for different summarization approaches and PHQxAS over 184 patient summaries.
● Median KL reflects the amount of information lost in summarization and is insensitive to outlier summaries.
● As the number of topics (NTopics) increases, LDA tends to identify topics that are specific and rare. As a result, median KL tends to increase and summaries start to diverge from the conversation.
● Our approach still sets the lower bound by generating summaries close to the pruned conversations.
● The number of topics was restricted to 7 because of the length of the interviews per patient.
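A minimal sketch of the evaluation idea: compute the KL divergence between the topic distribution of each conversation and that of its summary, then take the median across patients. The direction KL(conversation || summary), the `eps` smoothing, and the function names are assumptions; the slides do not specify them, and in practice the distributions would come from an LDA model.

```python
import math
from statistics import median


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions over the same topics.

    Both inputs are smoothed by `eps` and renormalized so that zero
    probabilities do not produce infinities.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    z_p, z_q = sum(p_s), sum(q_s)
    return sum((a / z_p) * math.log((a / z_p) / (b / z_q)) for a, b in zip(p_s, q_s))


def median_kl(pairs):
    """Median KL over (conversation_topics, summary_topics) pairs."""
    return median(kl_divergence(conv, summ) for conv, summ in pairs)


toy_pairs = [
    ([0.5, 0.5], [0.5, 0.5]),  # summary preserves the topic mix
    ([0.9, 0.1], [0.5, 0.5]),  # summary drifts from the conversation
    ([1.0, 0.0], [0.0, 1.0]),  # outlier: summary misses the topic entirely
]
toy_median = median_kl(toy_pairs)
```

Taking the median rather than the mean keeps the outlier pair from dominating the score, which matches the point made on the slide.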
52. The plot illustrates KL scores of those patient summaries where our approach marginally outperforms the state-of-the-art, with a median KL of 0.48.
The plot illustrates KL scores of those patient summaries where our approach significantly outperformed the state-of-the-art summarization approaches, with a median KL of 0.2.
53. Domain Expert Based Evaluation
● Questions with Unclear Context: The questions interpreted and phrased by the summarizer are essential to a mental health professional (MHP), but the MHP must make some inferences to understand them.
● For example: "Participant was asked, when was the last time that happened?", where the referent of "that" is unclear.
● Questions with Clear Context: These questions are useful to an MHP because they are complete and require no inference on the part of the MHP.
● For example: "Participant was asked, did they ever suffer from PTSD?"
● Meaningful Response: We consider a response meaningful if it helps an MHP understand patient behavior, or if it matches well with the question asked by the MHP.
55. ● Valiant, Leslie G. "Robust logics." Artificial Intelligence 117.2 (2000): 231-253.
● Banerjee, Siddhartha, Prasenjit Mitra, and Kazunari Sugiyama. "Multi-document abstractive summarization using ilp
based multi-sentence compression." In Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
● Nikhil Priyatam, Sangameshwar Patil, Girish Palshikar, and Vasudeva Varma, Medical Concept Normalization by
Encoding Target Knowledge, In NIPS ML4H Workshop, 2019
● Kapanipathi, Pavan, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan,
Maria Chang et al. "Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks." arXiv
preprint arXiv:1911.02060 (2019).
● Kursuncu, Ugur, Manas Gaur, and Amit Sheth. "Knowledge Infused Learning (K-IL): Towards Deep Incorporation of
Knowledge in Deep Learning." arXiv preprint arXiv:1912.00512 (2019).
● Kim, Jinkyu, and John Canny. "Interpretable learning for self-driving cars by visualizing causal attention." In Proceedings
of the IEEE international conference on computer vision, pp. 2942-2950. 2017.
● Yang, Bishan, and Tom Mitchell. "Leveraging knowledge bases in lstms for improving machine reading." arXiv preprint
arXiv:1902.09091 (2019).
References
57. Reddit conversations can be:
● Main Posts
● Comments
● Replies
Not all conversations are informative.
Pw is the probability of occurrence of a word w in a Reddit main post file, UWS is the set of unique words in S, and |UWS| is the total number of unique words in a subreddit S.
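The symbols defined on this slide (Pw, UWS, |UWS|) are the ingredients of a normalized entropy, and the speaker notes mention normalized-entropy-based filtering. The formula itself is missing from the slide text, so the sketch below assumes the usual definition, H = -Σ Pw log Pw / log|UWS|, computed over the tokens of a subreddit's main posts. The function name is illustrative.

```python
import math
from collections import Counter


def normalized_entropy(tokens):
    """Shannon entropy of the word distribution divided by log(|unique words|).

    Returns a value in [0, 1]; 0.0 when there is at most one unique word.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    unique = len(counts)
    if unique <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(unique)


uniform_score = normalized_entropy(["a", "b", "c", "d"])
skewed_score = normalized_entropy(["a", "a", "a", "b"])
repetitive_score = normalized_entropy(["a", "a", "a", "a"])
```

A threshold on this score, chosen by the analyst, could then separate informative content from repetitive content.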
58. Number of Definite Articles: Indicates the abstractness of the content. A higher value means personal communication.
Number of Words Per Post: Indicates the descriptiveness of the content.
First Person Pronouns: Higher use of first person pronouns indicates social anxiety, distress, interpersonal problems, etc.
Number of Pronouns: Depressed users use significantly more first person singular pronouns than second or third person pronouns.
Subordinate Conjunction: Indicates a rational thought process.
Horizontal Linguistic Features
59. Number of POS tags: Noun, Verb, and Adjective.
Similarity between the posts: Detects gradual or abrupt drifting of topics.
Intra-Subreddit Similarity: The similarity between the users within a subreddit.
Inter-Subreddit Similarity: The average similarity between a user in a subreddit A and all other users in other subreddits.
Vertical Linguistic Features
60. Sentiment Scores: We used the AFINN lexicon, a word list for sentiment analysis of informal text.
Emotion Scores: We used LabMT, a word list that scores the happiness of a corpus. It was developed over Twitter, Google Books, and the New York Times.
Readability Scores: We used the Flesch-Kincaid readability index to score the content of users suffering from mental illness.
Fine-Grained Features
61. ● Contextual Features: These features define the context of the user content.
○ Word Embedding Model: Trained over 3 million posts from 15 subreddits using varying window sizes (2, 5, 10), varying minimum frequency (2 and 5), and a skip-gram with softmax configuration.
○ Linguistic Inquiry and Word Count (LIWC): Psycholinguistic words that capture a person's mental state from written samples. E.g. worried, fearful, and nervous map to Anxiety.
○ TF-IDF: Defines the importance of a word in a document (subreddit).
● Contextual features with modulation: Since the word embedding model ignores the importance of words, TF-IDF scores can help classification by strongly distinguishing important words from the rest.
Contextual Features with/without Modulations
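One way to picture the modulation described on this slide is a TF-IDF-weighted average of word vectors, so that rarer, more distinctive words contribute more to the document representation. This is an assumed reading; the slides do not give the exact combination rule. The smoothed IDF variant, the function names, and the toy data are illustrative.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """TF-IDF weight per word, using a smoothed IDF: log((1+N)/(1+df)) + 1."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def modulated_doc_vector(doc_tokens, corpus, embeddings):
    """TF-IDF-weighted average of the embeddings of the document's words."""
    weights = tfidf_weights(doc_tokens, corpus)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, weight in weights.items():
        if word in embeddings:
            for i, x in enumerate(embeddings[word]):
                vec[i] += weight * x
            total += weight
    return [x / total for x in vec] if total else vec


toy_corpus = [["i", "feel", "anxious"], ["i", "feel", "fine"], ["i", "am", "here"]]
toy_vectors = {"i": [1.0, 0.0], "feel": [1.0, 0.0], "anxious": [0.0, 1.0]}
toy_doc_vec = modulated_doc_vector(toy_corpus[0], toy_corpus, toy_vectors)
```

In the toy example, the rare word "anxious" pulls the document vector further toward its own direction than a plain average would.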
62. Legend: Method
B1: RF (Baseline)
B2: Baseline + SMOTE
B3: BRF - TF-IDF
R1: BRF - Contextual Features (CF)
R2: BRF - CF with TF-IDF
R3: BRF - LIWC Features
R4: BRF - Twitter Word Embedding
O1: BRF - CF (SEDO weights generated from DSM-5 Lexicon without DAO)
O2: BRF - CF (SEDO weights generated from DSM-5 Lexicon with DAO, without Slang Terms)
O3: BRF - CF (SEDO weights generated from DSM-5 Lexicon without DAO, with Slang Terms)
O4: BRF - CF (SEDO weights generated from DSM-5 Lexicon with DAO and Slang Terms)
Model and Annotator Agreement: 84%
Editor's Notes
Right side of the slides: Make an image of medical graph -- connecting SNOMED-CT with ICD-10, UMLS, SIDER, MedDRA, Datamed
AAAI Logo and Title of the paper
Noise reduction through knowledge infusion
Knowledge reduces the number of samples needed
When decisions are binary, humans are in agreement
When decisions are more than binary, humans may not be in complete agreement
<<A KG (or Ontology) schema is designed by domain experts. It is populated from a representative DB (sets of instances). A KG has very large number of instances (mapping to # of training examples).>>
**** The complexity of annotation would directly map to error rate and complexity comes from how many decision points are there. Is it on a simple matter versus mental health versus particle physics versus human psychology
Using knowledge can help reduce annotation complexity and time, and reduce the true error
Example from Knowledge will propel machine understanding
CSCW ---- important for infusion of knowledge
Each new sample may not add new value to training the model, it may only be reinforcing the model.
How about using background knowledge to help the model identify what new information it needs to learn, rather than sampling randomly?
Explain the slides 5-6 with this equation. (important)
Needs slides on Interpretability and Explainability (text) ----- paper from gary marcus on interpretability and explainability
Influential papers people should look at the end of the slides deck
How do we ensure consistency in learning when labels are not binary?
Do the labels represent adequate semantics (domain knowledge) ?
We provide definition of knowledge infused learning
Provide application domain and examples where “knowledge infused learning would do good”
Robust ML/DL model that generalized well (minimum generalization error)
Intradomain tasks
Interdomain task (transfer learning)
Robust model requires:
Better logics and domain knowledge
Learn functions to these in existing architecture
ML/DL model learns contextual features with minimum training samples
Correctness of training data (examples of measures)
Completeness of training data (examples of measures)
Deep Knowledge Infusion
Define Principled Reasoning
Define CommonSense or Unaxiomatized knowledge
Define Impedance mismatch
Identification and Integration of commonsense or unaxiomatized knowledge for principled reasoning
Extracting relevant information at an appropriate abstraction level that would assist statistical AI methods in reducing impedance mismatch
Thus conclusions derived are justified through interpretability and traceability over the stored knowledge.
Robustness is ensured and brittleness avoided by means of a large-scale continuous process of learning
Providing information at varying level of abstraction which allow relevance-based contextualization.
Commonsense knowledge
Learning un-axiomatized information
Axiomatized information is partial
Taxonomic knowledge to AI
Taxonomic knowledge to virtual assistant
We won't be covering personalization
(P1) I am sick of loss and need a way out ; (P2) No way out, I am tired of my losses; (P3) Losses, losses, I want to die.
Contextualization handles data sparsity and allows creation of richer representation
Normalized Entropy based filtering of the dataset
Result image
How each of the features help? How we calculated ? Why Readability is important ? from content as well as social media platform What do you use to extract these features?
Those with symptoms of depression use significantly more first person singular pronouns – such as “me”, “myself” and “I” – and significantly fewer second and third person pronouns – such as “they”, “them” or “she”.