A colloquium talk given by Dr. Saeedeh Shekarpour of Kno.e.sis Center at IBM Almaden Research Center, 10 Feb 2017.
Abstract:
Cognitive computing has become the prevalent term for a class of computing that mimics the human brain. This emerging form of computing draws on various disciplines such as natural language processing, information retrieval, inference systems, and information extraction. Using knowledge graphs, which jointly enhance the semantics as well as the structure of data, is fundamental to moving computing toward cognitive computing. In this talk, I share my experiences in using knowledge graphs for various purposes such as question answering and information extraction.
2. About me
Education
• 2010-2013: PhD student, AKSW Research
Group, Leipzig University, Germany
• 2014-2015: PhD/Postdoc, EIS Research
Group, Bonn University, Germany
• 2016-present: Postdoc, Knoesis Center, USA
2/10/2017
2
13. How can KG facilitate exploiting answers from several sources?
• Using interlinked datasets enables exploiting information
that is spread across diverse datasets.
• Horizontal search is applicable; decomposing the question is not
necessary.
[Clipped column from the underlying paper: the Sider, Diseasome and DrugBank datasets are interlinked; drug resources are linked (e.g. via owl:sameAs), and diseases in Sider and Diseasome are connected through a possible-disease property. Three kinds of example queries are discussed: (1) information spread across datasets, e.g. drugs used for Tuberculosis are listed in Diseasome and described in Drugbank, while their side effects reside in Sider; (2) joined information, e.g. the query "side effect ... ASTHMA", answered by joining Sider and Drugbank (enzymes); (3) query expansion, e.g. the query "...aldecoxib", whose answer is found via Sider. The approach is presented as the first approach to answer questions over interlinked datasets.]
Figure 1: Schema interlinking for three datasets, i.e. DrugBank, Sider, Diseasome.
[Figure 2 residue: a query graph spanning the three datasets, with variables ?v0-?v3, the classes Drug (Diseasome), Drug (DrugBank), Side Effect (Sider) and Disease, the instance Asthma, and edges sameAs, side effect and enzyme (Enzymes).]
Figure 2: Resources from three different datasets
Query: What are the side effects of drugs used for Tuberculosis?
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer: Question answering on
interlinked data. WWW 2013: 1145-1156
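The horizontal search above can be sketched in plain Python. The mini triple store, URIs and sameAs chain below are invented stand-ins for Diseasome, DrugBank and Sider, not the actual datasets:

```python
# Toy illustration of answering a question over interlinked datasets, in the
# spirit of the WWW 2013 paper. All URIs and data here are hypothetical.
triples = {
    # Diseasome: diseases and the drugs that treat them
    ("diseasome:Tuberculosis", "diseasome:possibleDrug", "diseasome:drug42"),
    # owl:sameAs links bridge the datasets
    ("diseasome:drug42", "owl:sameAs", "drugbank:DB00951"),
    ("drugbank:DB00951", "owl:sameAs", "sider:isoniazid"),
    # Sider: side effects
    ("sider:isoniazid", "sider:sideEffect", "sider:hepatotoxicity"),
}

def objects(s, p):
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

def side_effects(disease):
    """Follow possibleDrug, then sameAs chains, then sideEffect."""
    effects = []
    for drug in objects(disease, "diseasome:possibleDrug"):
        # horizontal traversal across sameAs links, no question decomposition
        frontier, seen = [drug], {drug}
        while frontier:
            node = frontier.pop()
            effects += objects(node, "sider:sideEffect")
            for nxt in objects(node, "owl:sameAs"):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return effects

print(side_effects("diseasome:Tuberculosis"))  # ['sider:hepatotoxicity']
```

The point of the sketch is that the answer is assembled by traversing links between datasets rather than by decomposing the question into per-dataset sub-queries.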
16. Query Expansion Task
Linguistic vs. Semantic Features for Query Expansion Task
• Linguistic features from WordNet:
✓ Synonyms: words having a similar meaning.
✓ Hyponyms: words representing a specialization of the input.
✓ Hypernyms: words representing a generalization of the input.
• Semantic features from Linked Data:
✓ Using owl:sameAs and rdfs:seeAlso.
✓ Using owl:equivalentClass and owl:equivalentProperty.
✓ Following the rdfs:subClassOf or rdfs:subPropertyOf property.
✓ Using skos:broader and skos:broadMatch.
✓ Using skos:narrower and skos:narrowMatch.
✓ Using skos:closeMatch, skos:mappingRelation and skos:exactMatch.
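The two feature families above can be sketched as one expansion routine. The mini WordNet-style lexicon and mini knowledge graph below are invented for illustration:

```python
# A minimal sketch of combining linguistic and semantic expansion features.
WORDNET = {  # linguistic features (toy lexicon)
    "profession": {"synonyms": {"occupation"}, "hypernyms": {"activity"},
                   "hyponyms": {"medicine", "law"}},
}
KG = {  # semantic features: (resource, predicate) -> objects (toy graph)
    ("ex:Profession", "owl:equivalentClass"): {"ex:Occupation"},
    ("ex:Profession", "skos:broader"): {"ex:Activity"},
}

def expand(word, resource):
    """Union of WordNet-derived words and Linked-Data-derived resources."""
    derived = set()
    entry = WORDNET.get(word, {})
    for feature in ("synonyms", "hypernyms", "hyponyms"):
        derived |= entry.get(feature, set())
    for pred in ("owl:equivalentClass", "owl:sameAs", "rdfs:seeAlso",
                 "skos:broader", "skos:narrower", "skos:closeMatch"):
        derived |= KG.get((resource, pred), set())
    return derived

print(sorted(expand("profession", "ex:Profession")))
# ['activity', 'ex:Activity', 'ex:Occupation', 'law', 'medicine', 'occupation']
```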
27. RQUERY Overview
I. Segment Genera,on: (1) TokenizaUon and stop word removal. (2) We generate all possible
segments which can be derived from q.
II. Segment Expansion: This module expands segments derived from the previous module using a
linguisUc the thesaurus using linguisUc features of WordNet as (1) synonyms (2) hypernyms.
III. Derived Word Valida,on: Each derived word is validated against the background knowledge
base.
IV. Detec,ng and ranking possible query rewrites: We aim at disUnguishing and ranking possible
query rewrites. We address the problem of finding the appropriate query rewrite by employing
a Hidden Markov Model (HMM) in three steps:
i. The state space is populated.
ii. TransiUons between states are established.
iii. Parameters are bootstrapped.
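Step I (segment generation) can be sketched as follows; the convention that segments are contiguous subsequences of the keyword sequence is an assumption based on the slide:

```python
def segments(keywords):
    """All contiguous segments derivable from the keyword sequence q."""
    n = len(keywords)
    return [" ".join(keywords[i:j])
            for i in range(n) for j in range(i + 1, n + 1)]

print(segments(["profession", "bandleader"]))
# ['profession', 'profession bandleader', 'bandleader']
```

A sequence of n keywords yields n(n+1)/2 such segments, each of which is then expanded and validated by the later modules.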
[RQUERY architecture: the input textual query flows through four modules (segment generation, segment expansion, derived word validation, and detecting and ranking query rewrites via model construction), consulting WordNet as an external resource and the RDF knowledge base, and producing a ranked list of rewritten queries.]
28. Example – Part 1
• Input Query: ‘What is the profession of bandleader?’
• Steps:
1) RQUERY derives and validates 10 words for the two given input keywords.
2) The state space is populated with all of these 10 validated words.
3) Then, all the transitions between states are recognized and established.
[HMM state diagram: from the Start state, observation 1 ("profession") connects to the states occupation, profession, line, business, vocation and job, and observation 2 ("bandleader") to the states band, leader, director, music director and conductor.]
29. Example – Part 2
4) Finally, we run the Viterbi algorithm, a dynamic programming approach for
finding the optimal path through an HMM. This algorithm discovers the most likely
sequence of states through which the sequence of input keywords is observable.
5) Thus, after running the Viterbi algorithm for the running query "profession of
bandleader", the top-6 ranked outputs are generated.
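The Viterbi step can be sketched on a toy model for the running query; the states, probabilities and resulting path below are invented for illustration, not the paper's actual values:

```python
import math

# Toy HMM for the query "profession bandleader" (hypothetical parameters).
states = ["occupation", "job", "conductor"]
start = {"occupation": 0.5, "job": 0.4, "conductor": 0.1}
trans = {("occupation", "conductor"): 0.7, ("occupation", "job"): 0.3,
         ("job", "conductor"): 0.6, ("job", "occupation"): 0.4,
         ("conductor", "occupation"): 0.5, ("conductor", "job"): 0.5}
emit = {("occupation", "profession"): 0.8, ("job", "profession"): 0.6,
        ("conductor", "bandleader"): 0.9}

def viterbi(obs):
    # delta[s] = best log-probability of any state path ending in s
    delta = {s: math.log(start[s] * emit.get((s, obs[0]), 1e-12)) for s in states}
    back = [{}]
    for k in obs[1:]:
        new, ptr = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: delta[p] + math.log(trans.get((p, s), 1e-12)))
            new[s] = delta[prev] + math.log(trans.get((prev, s), 1e-12)
                                            * emit.get((s, k), 1e-12))
            ptr[s] = prev
        delta, back = new, back + [ptr]
    # backtrack from the best final state
    last = max(states, key=delta.get)
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["profession", "bandleader"]))  # ['occupation', 'conductor']
```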
30. Methodology: Modeling by HMM
Formally, a HMM is a quintuple λ = (X, Y, A, B, π) where:
• X is a finite set of states. In our case, X equals the set of the validated derived words W. In other words, each word w ∈ W forms a state.
• Y denotes the set of observations. Here, Y equals the set of all segments seg ∈ S derived from the input n-tuple of keywords q.
• A : X × X → [0, 1] is the transition matrix. Each entry a_ij is the transition probability P(S_j | S_i) from state S_i to state S_j.
• B : X × Y → [0, 1] represents the emission matrix. Each entry b_i(seg) = P(seg | S_i) is the probability of emitting the segment seg from the state S_i.
• π : X → [0, 1] denotes the initial probability of states.
We define the basic problem as follows: the sequence of input keywords q and the model λ are given, and the problem is to find the optimal sequence of states qr = (S_1, S_2, ..., S_m) which explains the given observation, i.e. the input query q(k_1, ..., k_n). Please note that there are possibly multiple distinct sequences of states through which the given input query q is observable; thus the aim is obtaining the optimal one, formally: qr* = arg max_qr {P(qr | q, λ)}, where P(qr | q, λ) is the probability of observing the given query q through the sequence of states qr. For computing the probability of any query rewrite qr, the model λ plays the role of a constant parameter, thus we assume P(qr | q, λ) ≈ P(qr | q), and hence qr* = arg max_qr {P(qr | q)}.
32. Triple-based Co-occurrence
Assuming that qr is a sequence of states (S_1 ... S_m) (please note that each state S_i corresponds to the word w_i), we expand P(qr | q) = P(S_1...S_m | k_1...k_n). The probability of observing the keyword k_i from the state S_j is P(k_i | S_j). As from a state S_i either one or multiple keywords may be observable, the number of states is at most the number of keywords, m ≤ n. By the Markov property, the probability of reaching state S_m and observing the keyword k_n is equal to P(S_m | S_{m-1}) · P(k_n | S_m). Thus, the probability can be rewritten as (P(S_m | S_{m-1}) · P(k_n | S_m)) · P(S_1...S_{m-1} | k_1...k_{n-1}) and extended further by recursion.
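Unrolling this recursion down to S_1 gives the following factorization (a reconstruction consistent with the definitions of A, B and π; the slide's final equation was not preserved, and k_{j_i} denotes the keyword observed from state S_i):

```latex
P(S_1 \dots S_m \mid k_1 \dots k_n)
  \;=\; \pi(S_1)\, P(k_1 \mid S_1)\,
        \prod_{i=2}^{m} P(S_i \mid S_{i-1})\, P(k_{j_i} \mid S_i)
```

Since a state may emit a multi-keyword segment, m ≤ n and several keywords may share one factor.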
For instance, the word job can emit the keyword profession, so the keyword profession is emitted from the state associated with the word job.
Transitions between States. We define transitions between states based on the concept of co-occurrence of words. We adopt the concept of co-occurrence of words from the traditional information retrieval context and move it to RDF knowledge bases. Triple-based co-occurrence means co-occurrence of words in literals found in the resource descriptions of the two resources of a given triple:
[Figure 3 residue: seven graph patterns over s, p, o with labels w1 and w2: (a) subject-predicate, (b) subject-object, (c) subject-literal, (d) predicate-object, (e) predicate-literal, (f) predicate-type of subject, (g) predicate-type of object.]
Figure 3: The graph patterns employed for recognising co-occurrence of the two given words w1 and w2. Please note that the letters s, p, o, c, l and a respectively stand for subject, predicate, object, class, rdfs:label and rdf:type.
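Triple-based co-occurrence can be sketched for one of the patterns, subject-object: w1 must occur in a literal describing the subject and w2 in a literal describing the object of the same triple. The mini graph below is invented for illustration:

```python
# Pattern (b) subject-object co-occurrence over a toy RDF graph.
labels = {  # rdfs:label literals of the resources
    "ex:Job": "job position",
    "ex:Profession": "profession career",
}
triples = [("ex:Job", "rdfs:subClassOf", "ex:Profession")]

def cooccur_subject_object(w1, w2):
    """True iff w1 labels the subject and w2 labels the object of one triple."""
    for s, p, o in triples:
        if (w1 in labels.get(s, "").split()
                and w2 in labels.get(o, "").split()):
            return True
    return False

print(cooccur_subject_object("job", "profession"))  # True
```

The other six patterns differ only in which resource (subject, predicate, object, or their rdf:type class) each word's literal is attached to.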
33. Evaluation
✓ Evaluation Criteria: The goal of our evaluation is to investigate the positive as well as
negative impacts of the proposed approach by raising the following two
questions:
① How effective is the approach for addressing the vocabulary mismatch problem when
employing queries that have a vocabulary mismatch problem?
② How effective is the approach for avoiding noise when employing queries
which do not have a vocabulary mismatch problem?
✓ Metric: we employ Mean Reciprocal Rank (MRR).
✓ Benchmark: we use an evaluation test collection for schema-agnostic query
mechanisms on RDF datasets (i.e. DBpedia) presented at ESWC 2015.
✓ https://sites.google.com/site/eswcsaq2015/documents
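Mean Reciprocal Rank itself is straightforward to compute; a minimal sketch, assuming a rank of 0 marks queries for which no correct rewrite was found:

```python
def mean_reciprocal_rank(ranks):
    """ranks[i] is the rank of the first correct rewrite for query i (0 = none)."""
    return sum((1.0 / r) if r else 0.0 for r in ranks) / len(ranks)

print(mean_reciprocal_rank([1, 2, 0, 1]))  # 0.625
```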
34. Evaluation
• Bootstrapping:
• Issue: The modeling is dynamic: the state space as well as the observation (i.e., the
sequence of input keywords) vary from query to query. Thus, the learned probability
values should be generic and not query-dependent, because learning model probabilities for
each individual query is not feasible.
• Solution: We therefore rely on bootstrapping, a technique used to estimate an unknown
probability distribution function. We apply three distributions (i.e., normal, uniform and
Zipfian) to find the most appropriate one.
[Bar chart: Mean Reciprocal Rank under the Uniform, Normal and Zipfian distributions, reported for All Queries, Q1-10 and Q11-20; the plotted values range from 0.44 to 0.85.]
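The three candidate distributions for bootstrapping can be sketched as follows; the exact parameterization (e.g. the normal curve's mean and variance over state indices) is an assumption for illustration, not the paper's choice:

```python
import math

def initial_probs(n, kind):
    """Assign the initial probabilities pi over n states from one distribution."""
    if kind == "uniform":
        weights = [1.0] * n
    elif kind == "normal":  # bell curve over state indices (hypothetical params)
        mu, sigma = (n - 1) / 2.0, n / 4.0
        weights = [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    elif kind == "zipfian":  # heavy head: weight proportional to 1/rank
        weights = [1.0 / (i + 1) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

for kind in ("uniform", "normal", "zipfian"):
    p = initial_probs(5, kind)
    print(kind, round(sum(p), 6))  # each distribution is normalized to 1.0
```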
35. Evaluation Results
[Two bar charts of per-query Reciprocal Rank, comparing HMM with Implicit Frequency, HMM with Explicit Frequency and an n-gram language model. One chart covers the queries which have a vocabulary mismatch problem and the other the queries which do not; the two query sets are Q12, Q15, Q18, Q20, Q21, Q24, Q29, Q31, Q40, Q51, Q54, Q65, Q70, Q76, Q78, Q84 and Q2, Q3, Q5, Q8, Q10, Q16, Q22, Q34, Q37, Q46, Q48, Q49, Q50, Q58, Q59, Q63, Q64, Q69, Q85, Q91, Q93.]
38. Stream of News Headlines
each individual headline tweet ti, so that the headline news knowledge base Khnews is
populated by the triples extracted from the stream of news headline tweets. Formally,
the extraction task can be captured as T → Khnews, where T = {t1, t2, ..., tl} is the
stream of news headline tweets and Khnews is a set of triples (as presented in the
following, a given tweet ti that is mappable to a relation with n arguments yields more
than n + 1 triples). We must address three main challenges: (1) creation of a background
data model, (2) relation recognition and entity extraction, and (3) publishing the triples
on Linked Open Data. We address the first two in this paper and discuss the third in a
manuscript in preparation.
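The extraction task T → Khnews can be sketched as follows; the mention lexicon, URIs and pre-extracted arguments are invented for illustration, and this sketch emits exactly n + 1 triples where the paper generates more:

```python
# Toy sketch: recognize the relation (event class) from a mention, then emit
# one triple per extracted argument plus an rdf:type triple. All names here
# are hypothetical stand-ins.
MENTIONS = {"announce": "Communication", "meets": "Meet", "kills": "Murder"}

def extract(tweet_id, tokens, arguments):
    """arguments: {property: value} already extracted for the recognized event."""
    event = next((MENTIONS[t] for t in tokens if t in MENTIONS), None)
    if event is None:
        return []
    node = f"hnews:event_{tweet_id}"
    triples = [(node, "rdf:type", f"cevo:{event}")]
    triples += [(node, prop, val) for prop, val in arguments.items()]
    return triples  # n arguments -> n + 1 triples in this sketch

for t in extract("no3",
                 "Chemical accident in Bangkok bank kills eight people".split(),
                 {"hnews:victim": "eight people",
                  "hnews:cause": "chemical accident"}):
    print(t)
```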
Publisher | Date      | News Headline Tweets
CNN       | 16/3/2016 | no1. Michelle Obama tells #SXSW crowd: I will not run for president
CNN       | 26/2/2016 | no2. Instagram CEO meets with @Pontifex to discuss "the power of images to unite people"
CNN       | 14/3/2016 | no3. Chemical accident in Bangkok bank kills eight people
BBC       | 14/3/2016 | no4. State elections were "difficult day," German Chancellor Angela Merkel says
BBC       | 10/3/2016 | no5. Pope Francis visits Cuba and Mexico
BBC       | 24/2/2016 | no6. Storms kill at least three in Virginia
NY Times  | 10/3/2016 | no7. Obama and Justin Trudeau announce efforts to fight climate change
NY Times  | 10/3/2016 | no8. Pope to meet leader of Russian Orthodox Church for first time in nearly
NY Times  | 10/3/2016 | no9. 2 air force pilots from United Arab Emirates killed when warplane crashed over Yemen
Challenge 1: Background Data Model. The key question is “What is the background
data model (serving as the pivot) for extracting triples?” Contemporary approaches
to extracting RDF triples that encompass entities and relations use binary relations
[10,6,3]. In this regard, we divide the current triple-based extraction approaches into two
41. Background Data Model
the meet event is associated with entities of type Participant and Topic
(i.e., the topic discussed in the meeting). Considering the sample tweets in the table above, the
tweets no1, no4, no7 are instances of the event Communication with the mentions
tell, say, announce. The tweets no2, no5, no8 are instances of the event Meet
with the mentions meet, visit. The tweets no3, no6, no9 are instances of the event
Murder with the mention kill.
[Fig. 1 residue: (a) Subclasses of Event: Generic Event with subclasses Communication, Meet and Murder, plus properties publishedBy (Publisher), published date (xs:date), occurredIn (Location) and occurredOn (Time). (b) Meet class: attendedIn (Participant), about (Topic). (c) Communication class: says (Giver), addressed (Addressee), expressed (Message). (d) Murder class: roles Victim, Killer, cause (xs:string) and quantity (xs:integer), with properties kills, killed, caused and expression.]
Fig. 1: Subclasses of the Generic Event.
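The class and role structure of Fig. 1 can be captured as a small lookup; the role lists follow the figure, while the mention lexicon simply repeats the examples from the text:

```python
# The event classes of Fig. 1 and their argument roles (paraphrased labels).
EVENT_ROLES = {
    "Communication": ["Giver", "Addressee", "Message"],
    "Meet": ["Participant", "Topic"],
    "Murder": ["Victim", "Killer", "cause", "quantity"],
}
# Mentions observed in the sample tweets, mapped to their event class.
MENTION_TO_EVENT = {"tell": "Communication", "say": "Communication",
                    "announce": "Communication", "meet": "Meet",
                    "visit": "Meet", "kill": "Murder"}

def classify(mention):
    """Return the event class for a mention and the roles to be filled."""
    event = MENTION_TO_EVENT.get(mention)
    return event, EVENT_ROLES.get(event, [])

print(classify("announce"))  # ('Communication', ['Giver', 'Addressee', 'Message'])
```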
44. Entity Extraction using Linguistic Analysis
[Fig. 2 residue: dependency tree for the running example "Instagram CEO meets with @Pontifex to discuss the power of images to unite people", with dependency relations such as nsubj, xcomp, compound, case, mark, det, nmod, dobj and acl.]
Fig. 2: Dependency tree for the running example.
Definition 3 (Dependent Chunk of ROOT). Dependent Chunk of ROOT (DCR) is the
longest sequence of tokens of a given tweet that satisfies the following conditions: (i)
There is one token that is (directly) dependent on the root, and (ii) any other token
included in a given chunk is dependent on a token already within the given chunk.
Moreover, ROOT is an individual chunk.
Example 2 (Chunking Tweet). We chunk the running example based on the concept
of Dependent Chunk of ROOT (DCR). Figure 3 shows the resulting chunks. Except for
the chunk of the root (because the root is an individual chunk), every other chunk has exactly
one token that is dependent on the root (only one outgoing arrow to the root), while its
remaining tokens depend on tokens interior to the chunk (interior arrows). According to this
definition, the example tweet contains four individual chunks. For the chunk 'Instagram
CEO', only the token 'CEO' is dependent on the root and the other token 'Instagram'
is dependent on the interior token 'CEO'.
[Fig. 3 residue: the running example "Instagram CEO meets with @Pontifex to discuss the power of images to unite people" split into chunks 1-4 around the ROOT "meets".]
Fig. 3: Chunking the running example based on the concept of Root Dependent Chunk.
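Definition 3 can be sketched as a chunking routine over head pointers; the token list below is truncated and the dependency heads are hypothetical, not the output of an actual parser:

```python
# Group tokens into Dependent Chunks of ROOT (DCR): each chunk hangs off the
# single token in it that is directly dependent on the root; heads[i] is the
# index of token i's head (-1 marks the ROOT itself).
def chunk(tokens, heads):
    root = heads.index(-1)

    def anchor(i):  # walk up until reaching a token directly under the root
        while heads[i] != root:
            i = heads[i]
        return i

    chunks = {root: [tokens[root]]}  # ROOT is an individual chunk
    for i, tok in enumerate(tokens):
        if i != root:
            chunks.setdefault(anchor(i), []).append(tok)
    return [" ".join(c) for c in chunks.values()]

tokens = ["Instagram", "CEO", "meets", "with", "@Pontifex"]
heads = [1, 2, -1, 4, 2]  # hypothetical dependency heads
print(chunk(tokens, heads))  # ['meets', 'Instagram CEO', 'with @Pontifex']
```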
45. The best observed accuracy for Entity Extraction Tasks

Entity         | F-measure | Precision | Recall
Communication  | 88.3      | 82.83     | 95.3
Giver          | 81.4      | 77        | 81.4
Addressee      | 73.9      | 72.1      | 73.9
Message        | 78        | 85.3      | 71.9
Meet           | 89.7      | 83.6      | 96.7
Participant    | 80.1      | 76.1      | 80.1
Topic          | 65.2      | 62.0      | 65.2
Murder         | 93.2      | 90.2      | 96.4
Victim         | 91.6      | 91.6      | 91.6
Killer         | 64.8      | 64.8      | 64.8
Cause          | 82.2      | 88.4      | 76.8

The best observed accuracy results for the entity extraction tasks.
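The F-measure column can be spot-checked from precision and recall with the standard harmonic mean; a minimal sketch:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Spot-check against the Murder row of the table (values in %):
print(round(f_measure(90.2, 96.4), 1))  # 93.2
```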
49. CEVO use case 1: Annotating Text
NYT Tweet#1 on 10/3/2016:
Obama and Justin Trudeau announce efforts to fight climate change. (CEVO:Communication)
BBC Tweet#2 on 14/3/2016:
State elections were "difficult day," German Chancellor Angela Merkel says. (CEVO:Communication)