A colloquium talk given by Dr. Saeedeh Shekarpour of Kno.e.sis Center at IBM Almaden Research Center, 10 Feb 2017.
Abstract:
Cognitive computing has become the prevalent term for a class of computing that mimics the human brain. This emerging form of computing draws on various disciplines such as natural language processing, information retrieval, inference systems, and information extraction. Using knowledge graphs, which jointly enhance the semantics as well as the structure of data, is fundamental to moving computing toward cognitive computing. In this talk, I share my experiences in using knowledge graphs for various purposes such as question answering and information extraction.
2. About me
Education
• 2010-2013: PhD student, AKSW Research
Group, Leipzig University, Germany
• 2014-2015: PhD/Postdoc, EIS Research
Group, Bonn University, Germany
• 2016-present: Postdoc, Knoesis Center, USA
2/10/2017
2
13. How can KG facilitate exploiting answers from several sources?
• Using interlinked datasets enables exploiting information
that is spread across diverse datasets.
• Horizontal search is applicable; decomposing the question is not
necessary.
[Clipped column from the underlying paper: the Sider, Diseasome and DrugBank datasets are interlinked; drug resources are linked (e.g. via owl:sameAs), and diseases in Sider and Diseasome are connected through a possible-disease property. Three kinds of example queries are discussed: (1) information spread across datasets, e.g. drugs used for Tuberculosis are listed in Diseasome and described in Drugbank, while their side effects reside in Sider; (2) joined information, e.g. the query "side effect ... ASTHMA", answered by joining Sider and Drugbank (enzymes); (3) query expansion, e.g. the query "...aldecoxib", whose answer is found via Sider. The approach is presented as the first approach to answer questions over interlinked datasets.]
Figure 1: Schema interlinking for three datasets, i.e. DrugBank, Sider, Diseasome.
[Figure 2 residue: a query graph spanning the three datasets, with variables ?v0-?v3, the classes Drug (Diseasome), Drug (DrugBank), Side Effect (Sider) and Disease, the instance Asthma, and edges sameAs, side effect and enzyme (Enzymes).]
Figure 2: Resources from three different datasets
Query: What are the side effects of drugs used for Tuberculosis?
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer: Question answering on
interlinked data. WWW 2013: 1145-1156
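The horizontal search above can be sketched in plain Python. The mini triple store, URIs and sameAs chain below are invented stand-ins for Diseasome, DrugBank and Sider, not the actual datasets:

```python
# Toy illustration of answering a question over interlinked datasets, in the
# spirit of the WWW 2013 paper. All URIs and data here are hypothetical.
triples = {
    # Diseasome: diseases and the drugs that treat them
    ("diseasome:Tuberculosis", "diseasome:possibleDrug", "diseasome:drug42"),
    # owl:sameAs links bridge the datasets
    ("diseasome:drug42", "owl:sameAs", "drugbank:DB00951"),
    ("drugbank:DB00951", "owl:sameAs", "sider:isoniazid"),
    # Sider: side effects
    ("sider:isoniazid", "sider:sideEffect", "sider:hepatotoxicity"),
}

def objects(s, p):
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

def side_effects(disease):
    """Follow possibleDrug, then sameAs chains, then sideEffect."""
    effects = []
    for drug in objects(disease, "diseasome:possibleDrug"):
        # horizontal traversal across sameAs links, no question decomposition
        frontier, seen = [drug], {drug}
        while frontier:
            node = frontier.pop()
            effects += objects(node, "sider:sideEffect")
            for nxt in objects(node, "owl:sameAs"):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return effects

print(side_effects("diseasome:Tuberculosis"))  # ['sider:hepatotoxicity']
```

The point of the sketch is that the answer is assembled by traversing links between datasets rather than by decomposing the question into per-dataset sub-queries.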
16. Query Expansion Task
Linguistic vs. Semantic Features for Query Expansion Task
• Linguistic features from WordNet:
✓ Synonyms: words having a similar meaning.
✓ Hyponyms: words representing a specialization of the input.
✓ Hypernyms: words representing a generalization of the input.
• Semantic features from Linked Data:
✓ Using owl:sameAs and rdfs:seeAlso.
✓ Using owl:equivalentClass and owl:equivalentProperty.
✓ Following the rdfs:subClassOf or rdfs:subPropertyOf property.
✓ Using skos:broader and skos:broadMatch.
✓ Using skos:narrower and skos:narrowMatch.
✓ Using skos:closeMatch, skos:mappingRelation and skos:exactMatch.
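The two feature families above can be sketched as one expansion routine. The mini WordNet-style lexicon and mini knowledge graph below are invented for illustration:

```python
# A minimal sketch of combining linguistic and semantic expansion features.
WORDNET = {  # linguistic features (toy lexicon)
    "profession": {"synonyms": {"occupation"}, "hypernyms": {"activity"},
                   "hyponyms": {"medicine", "law"}},
}
KG = {  # semantic features: (resource, predicate) -> objects (toy graph)
    ("ex:Profession", "owl:equivalentClass"): {"ex:Occupation"},
    ("ex:Profession", "skos:broader"): {"ex:Activity"},
}

def expand(word, resource):
    """Union of WordNet-derived words and Linked-Data-derived resources."""
    derived = set()
    entry = WORDNET.get(word, {})
    for feature in ("synonyms", "hypernyms", "hyponyms"):
        derived |= entry.get(feature, set())
    for pred in ("owl:equivalentClass", "owl:sameAs", "rdfs:seeAlso",
                 "skos:broader", "skos:narrower", "skos:closeMatch"):
        derived |= KG.get((resource, pred), set())
    return derived

print(sorted(expand("profession", "ex:Profession")))
# ['activity', 'ex:Activity', 'ex:Occupation', 'law', 'medicine', 'occupation']
```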
27. RQUERY Overview
I. Segment Genera,on: (1) TokenizaUon and stop word removal. (2) We generate all possible
segments which can be derived from q.
II. Segment Expansion: This module expands segments derived from the previous module using a
linguisUc the thesaurus using linguisUc features of WordNet as (1) synonyms (2) hypernyms.
III. Derived Word Valida,on: Each derived word is validated against the background knowledge
base.
IV. Detec,ng and ranking possible query rewrites: We aim at disUnguishing and ranking possible
query rewrites. We address the problem of finding the appropriate query rewrite by employing
a Hidden Markov Model (HMM) in three steps:
i. The state space is populated.
ii. TransiUons between states are established.
iii. Parameters are bootstrapped.
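Step I (segment generation) can be sketched as follows; the convention that segments are contiguous subsequences of the keyword sequence is an assumption based on the slide:

```python
def segments(keywords):
    """All contiguous segments derivable from the keyword sequence q."""
    n = len(keywords)
    return [" ".join(keywords[i:j])
            for i in range(n) for j in range(i + 1, n + 1)]

print(segments(["profession", "bandleader"]))
# ['profession', 'profession bandleader', 'bandleader']
```

A sequence of n keywords yields n(n+1)/2 such segments, each of which is then expanded and validated by the later modules.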
[RQUERY architecture: the input textual query flows through four modules (segment generation, segment expansion, derived word validation, and detecting and ranking query rewrites via model construction), consulting WordNet as an external resource and the RDF knowledge base, and producing a ranked list of rewritten queries.]
28. Example – Part 1
• Input Query: ‘What is the profession of bandleader?’
• Steps:
1) RQUERY derives and validates 10 words for the two given input keywords.
2) The state space is populated with all of these 10 validated words.
3) Then, all the transitions between states are recognized and established.
[HMM state diagram: from the Start state, observation 1 ("profession") connects to the states occupation, profession, line, business, vocation and job, and observation 2 ("bandleader") to the states band, leader, director, music director and conductor.]
29. Example – Part 2
4) Finally, we run the Viterbi algorithm, a dynamic programming approach for
finding the optimal path through an HMM. This algorithm discovers the most likely
sequence of states through which the sequence of input keywords is observable.
5) Thus, after running the Viterbi algorithm for the running query "profession of
bandleader", the top-6 ranked outputs are generated.
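The Viterbi step can be sketched on a toy model for the running query; the states, probabilities and resulting path below are invented for illustration, not the paper's actual values:

```python
import math

# Toy HMM for the query "profession bandleader" (hypothetical parameters).
states = ["occupation", "job", "conductor"]
start = {"occupation": 0.5, "job": 0.4, "conductor": 0.1}
trans = {("occupation", "conductor"): 0.7, ("occupation", "job"): 0.3,
         ("job", "conductor"): 0.6, ("job", "occupation"): 0.4,
         ("conductor", "occupation"): 0.5, ("conductor", "job"): 0.5}
emit = {("occupation", "profession"): 0.8, ("job", "profession"): 0.6,
        ("conductor", "bandleader"): 0.9}

def viterbi(obs):
    # delta[s] = best log-probability of any state path ending in s
    delta = {s: math.log(start[s] * emit.get((s, obs[0]), 1e-12)) for s in states}
    back = [{}]
    for k in obs[1:]:
        new, ptr = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: delta[p] + math.log(trans.get((p, s), 1e-12)))
            new[s] = delta[prev] + math.log(trans.get((prev, s), 1e-12)
                                            * emit.get((s, k), 1e-12))
            ptr[s] = prev
        delta, back = new, back + [ptr]
    # backtrack from the best final state
    last = max(states, key=delta.get)
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["profession", "bandleader"]))  # ['occupation', 'conductor']
```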
30. Methodology: Modeling by HMM
Formally, a HMM is a quintuple λ = (X, Y, A, B, π) where:
• X is a finite set of states. In our case, X equals the set of the validated derived words W. In other words, each word w ∈ W forms a state.
• Y denotes the set of observations. Here, Y equals the set of all segments seg ∈ S derived from the input n-tuple of keywords q.
• A : X × X → [0, 1] is the transition matrix. Each entry a_ij is the transition probability P(S_j | S_i) from state S_i to state S_j.
• B : X × Y → [0, 1] represents the emission matrix. Each entry b_i(seg) = P(seg | S_i) is the probability of emitting the segment seg from the state S_i.
• π : X → [0, 1] denotes the initial probability of states.
We define the basic problem as follows: the sequence of input keywords q and the model λ are given, and the problem is to find the optimal sequence of states qr = (S_1, S_2, ..., S_m) which explains the given observation, i.e. the input query q(k_1, ..., k_n). Please note that there are possibly multiple distinct sequences of states through which the given input query q is observable; thus the aim is obtaining the optimal one, formally: qr* = arg max_qr {P(qr | q, λ)}, where P(qr | q, λ) is the probability of observing the given query q through the sequence of states qr. For computing the probability of any query rewrite qr, the model λ plays the role of a constant parameter, thus we assume P(qr | q, λ) ≈ P(qr | q), and hence qr* = arg max_qr {P(qr | q)}.
32. Triple-based Co-occurrence
Assuming that qr is a sequence of states (S_1 ... S_m) (please note that each state S_i corresponds to the word w_i), we expand P(qr | q) = P(S_1...S_m | k_1...k_n). The probability of observing the keyword k_i from the state S_j is P(k_i | S_j). As from a state S_i either one or multiple keywords may be observable, the number of states is at most the number of keywords, m ≤ n. By the Markov property, the probability of reaching state S_m and observing the keyword k_n is equal to P(S_m | S_{m-1}) · P(k_n | S_m). Thus, the probability can be rewritten as (P(S_m | S_{m-1}) · P(k_n | S_m)) · P(S_1...S_{m-1} | k_1...k_{n-1}) and extended further by recursion.
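Unrolling this recursion down to S_1 gives the following factorization (a reconstruction consistent with the definitions of A, B and π; the slide's final equation was not preserved, and k_{j_i} denotes the keyword observed from state S_i):

```latex
P(S_1 \dots S_m \mid k_1 \dots k_n)
  \;=\; \pi(S_1)\, P(k_1 \mid S_1)\,
        \prod_{i=2}^{m} P(S_i \mid S_{i-1})\, P(k_{j_i} \mid S_i)
```

Since a state may emit a multi-keyword segment, m ≤ n and several keywords may share one factor.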
For instance, the word job can emit the keyword profession, so the keyword profession is emitted from the state associated with the word job.
Transitions between States. We define transitions between states based on the concept of co-occurrence of words. We adopt the concept of co-occurrence of words from the traditional information retrieval context and move it to RDF knowledge bases. Triple-based co-occurrence means co-occurrence of words in literals found in the resource descriptions of the two resources of a given triple:
[Figure 3 residue: seven graph patterns over s, p, o with labels w1 and w2: (a) subject-predicate, (b) subject-object, (c) subject-literal, (d) predicate-object, (e) predicate-literal, (f) predicate-type of subject, (g) predicate-type of object.]
Figure 3: The graph patterns employed for recognising co-occurrence of the two given words w1 and w2. Please note that the letters s, p, o, c, l and a respectively stand for subject, predicate, object, class, rdfs:label and rdf:type.
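Triple-based co-occurrence can be sketched for one of the patterns, subject-object: w1 must occur in a literal describing the subject and w2 in a literal describing the object of the same triple. The mini graph below is invented for illustration:

```python
# Pattern (b) subject-object co-occurrence over a toy RDF graph.
labels = {  # rdfs:label literals of the resources
    "ex:Job": "job position",
    "ex:Profession": "profession career",
}
triples = [("ex:Job", "rdfs:subClassOf", "ex:Profession")]

def cooccur_subject_object(w1, w2):
    """True iff w1 labels the subject and w2 labels the object of one triple."""
    for s, p, o in triples:
        if (w1 in labels.get(s, "").split()
                and w2 in labels.get(o, "").split()):
            return True
    return False

print(cooccur_subject_object("job", "profession"))  # True
```

The other six patterns differ only in which resource (subject, predicate, object, or their rdf:type class) each word's literal is attached to.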
33. Evaluation
✓ Evaluation Criteria: The goal of our evaluation is to investigate the positive as well as
negative impacts of the proposed approach by raising the following two
questions:
① How effective is the approach for addressing the vocabulary mismatch problem when
employing queries that have a vocabulary mismatch problem?
② How effective is the approach for avoiding noise when employing queries
which do not have a vocabulary mismatch problem?
✓ Metric: we employ Mean Reciprocal Rank (MRR).
✓ Benchmark: we use an evaluation test collection for schema-agnostic query
mechanisms on RDF datasets (i.e. DBpedia) presented at ESWC 2015.
✓ https://sites.google.com/site/eswcsaq2015/documents
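Mean Reciprocal Rank itself is straightforward to compute; a minimal sketch, assuming a rank of 0 marks queries for which no correct rewrite was found:

```python
def mean_reciprocal_rank(ranks):
    """ranks[i] is the rank of the first correct rewrite for query i (0 = none)."""
    return sum((1.0 / r) if r else 0.0 for r in ranks) / len(ranks)

print(mean_reciprocal_rank([1, 2, 0, 1]))  # 0.625
```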
34. Evaluation
• Bootstrapping:
• Issue: The modeling is dynamic: the state space as well as the observation (i.e., the
sequence of input keywords) vary from query to query. Thus, the learned probability
values should be generic and not query-dependent, because learning model probabilities for
each individual query is not feasible.
• Solution: We therefore rely on bootstrapping, a technique used to estimate an unknown
probability distribution function. We apply three distributions (i.e., normal, uniform and
Zipfian) to find the most appropriate one.
[Bar chart: Mean Reciprocal Rank under the Uniform, Normal and Zipfian distributions, reported for All Queries, Q1-10 and Q11-20; the plotted values range from 0.44 to 0.85.]
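The three candidate distributions for bootstrapping can be sketched as follows; the exact parameterization (e.g. the normal curve's mean and variance over state indices) is an assumption for illustration, not the paper's choice:

```python
import math

def initial_probs(n, kind):
    """Assign the initial probabilities pi over n states from one distribution."""
    if kind == "uniform":
        weights = [1.0] * n
    elif kind == "normal":  # bell curve over state indices (hypothetical params)
        mu, sigma = (n - 1) / 2.0, n / 4.0
        weights = [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    elif kind == "zipfian":  # heavy head: weight proportional to 1/rank
        weights = [1.0 / (i + 1) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

for kind in ("uniform", "normal", "zipfian"):
    p = initial_probs(5, kind)
    print(kind, round(sum(p), 6))  # each distribution is normalized to 1.0
```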
35. Evaluation Results
[Two bar charts of per-query Reciprocal Rank, comparing HMM with Implicit Frequency, HMM with Explicit Frequency and an n-gram language model. One chart covers the queries which have a vocabulary mismatch problem and the other the queries which do not; the two query sets are Q12, Q15, Q18, Q20, Q21, Q24, Q29, Q31, Q40, Q51, Q54, Q65, Q70, Q76, Q78, Q84 and Q2, Q3, Q5, Q8, Q10, Q16, Q22, Q34, Q37, Q46, Q48, Q49, Q50, Q58, Q59, Q63, Q64, Q69, Q85, Q91, Q93.]
38. Stream of News Headlines
each individual headline tweet ti, so that the headline news knowledge base Khnews is
populated by the triples extracted from the stream of news headline tweets. Formally,
the extraction task can be captured as T → Khnews, where T = {t1, t2, ..., tl} is the
stream of news headline tweets and Khnews is a set of triples (as presented in the
following, a given tweet ti that is mappable to a relation with n arguments yields more
than n + 1 triples). We must address three main challenges: (1) creation of a background
data model, (2) relation recognition and entity extraction, and (3) publishing the triples
on Linked Open Data. We address the first two in this paper and discuss the third in a
manuscript in preparation.
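The extraction task T → Khnews can be sketched as follows; the mention lexicon, URIs and pre-extracted arguments are invented for illustration, and this sketch emits exactly n + 1 triples where the paper generates more:

```python
# Toy sketch: recognize the relation (event class) from a mention, then emit
# one triple per extracted argument plus an rdf:type triple. All names here
# are hypothetical stand-ins.
MENTIONS = {"announce": "Communication", "meets": "Meet", "kills": "Murder"}

def extract(tweet_id, tokens, arguments):
    """arguments: {property: value} already extracted for the recognized event."""
    event = next((MENTIONS[t] for t in tokens if t in MENTIONS), None)
    if event is None:
        return []
    node = f"hnews:event_{tweet_id}"
    triples = [(node, "rdf:type", f"cevo:{event}")]
    triples += [(node, prop, val) for prop, val in arguments.items()]
    return triples  # n arguments -> n + 1 triples in this sketch

for t in extract("no3",
                 "Chemical accident in Bangkok bank kills eight people".split(),
                 {"hnews:victim": "eight people",
                  "hnews:cause": "chemical accident"}):
    print(t)
```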
Publisher | Date      | News Headline Tweets
CNN       | 16/3/2016 | no1. Michelle Obama tells #SXSW crowd: I will not run for president
CNN       | 26/2/2016 | no2. Instagram CEO meets with @Pontifex to discuss "the power of images to unite people"
CNN       | 14/3/2016 | no3. Chemical accident in Bangkok bank kills eight people
BBC       | 14/3/2016 | no4. State elections were "difficult day," German Chancellor Angela Merkel says
BBC       | 10/3/2016 | no5. Pope Francis visits Cuba and Mexico
BBC       | 24/2/2016 | no6. Storms kill at least three in Virginia
NY Times  | 10/3/2016 | no7. Obama and Justin Trudeau announce efforts to fight climate change
NY Times  | 10/3/2016 | no8. Pope to meet leader of Russian Orthodox Church for first time in nearly
NY Times  | 10/3/2016 | no9. 2 air force pilots from United Arab Emirates killed when warplane crashed over Yemen
Challenge 1: Background Data Model. The key question is “What is the background
data model (serving as the pivot) for extracting triples?” Contemporary approaches
to extracting RDF triples that encompass entities and relations use binary relations
[10,6,3]. In this regard, we divide the current triple-based extraction approaches into two
41. Background Data Model
the meet event is associated with entities of type Participant and Topic
(i.e., the topic discussed in the meeting). Considering the sample tweets in the table above, the
tweets no1, no4, no7 are instances of the event Communication with the mentions
tell, say, announce. The tweets no2, no5, no8 are instances of the event Meet
with the mentions meet, visit. The tweets no3, no6, no9 are instances of the event
Murder with the mention kill.
[Fig. 1 residue: (a) Subclasses of Event: Generic Event with subclasses Communication, Meet and Murder, plus properties publishedBy (Publisher), published date (xs:date), occurredIn (Location) and occurredOn (Time). (b) Meet class: attendedIn (Participant), about (Topic). (c) Communication class: says (Giver), addressed (Addressee), expressed (Message). (d) Murder class: roles Victim, Killer, cause (xs:string) and quantity (xs:integer), with properties kills, killed, caused and expression.]
Fig. 1: Subclasses of the Generic Event.
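The class and role structure of Fig. 1 can be captured as a small lookup; the role lists follow the figure, while the mention lexicon simply repeats the examples from the text:

```python
# The event classes of Fig. 1 and their argument roles (paraphrased labels).
EVENT_ROLES = {
    "Communication": ["Giver", "Addressee", "Message"],
    "Meet": ["Participant", "Topic"],
    "Murder": ["Victim", "Killer", "cause", "quantity"],
}
# Mentions observed in the sample tweets, mapped to their event class.
MENTION_TO_EVENT = {"tell": "Communication", "say": "Communication",
                    "announce": "Communication", "meet": "Meet",
                    "visit": "Meet", "kill": "Murder"}

def classify(mention):
    """Return the event class for a mention and the roles to be filled."""
    event = MENTION_TO_EVENT.get(mention)
    return event, EVENT_ROLES.get(event, [])

print(classify("announce"))  # ('Communication', ['Giver', 'Addressee', 'Message'])
```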
44. Entity Extraction using Linguistic Analysis
[Fig. 2 residue: dependency tree for the running example "Instagram CEO meets with @Pontifex to discuss the power of images to unite people", with dependency relations such as nsubj, xcomp, compound, case, mark, det, nmod, dobj and acl.]
Fig. 2: Dependency tree for the running example.
Definition 3 (Dependent Chunk of ROOT). Dependent Chunk of ROOT (DCR) is the
longest sequence of tokens of a given tweet that satisfies the following conditions: (i)
There is one token that is (directly) dependent on the root, and (ii) any other token
included in a given chunk is dependent on a token already within the given chunk.
Moreover, ROOT is an individual chunk.
Example 2 (Chunking Tweet). We chunk the running example based on the concept
of Dependent Chunk of ROOT (DCR). Figure 3 shows the resulting chunks. Except for
the chunk of the root (because the root is an individual chunk), every other chunk has exactly
one token that is dependent on the root (only one outgoing arrow to the root), while its
remaining tokens depend on tokens interior to the chunk (interior arrows). According to this
definition, the example tweet contains four individual chunks. For the chunk 'Instagram
CEO', only the token 'CEO' is dependent on the root and the other token 'Instagram'
is dependent on the interior token 'CEO'.
[Fig. 3 residue: the running example "Instagram CEO meets with @Pontifex to discuss the power of images to unite people" split into chunks 1-4 around the ROOT "meets".]
Fig. 3: Chunking the running example based on the concept of Root Dependent Chunk.
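Definition 3 can be sketched as a chunking routine over head pointers; the token list below is truncated and the dependency heads are hypothetical, not the output of an actual parser:

```python
# Group tokens into Dependent Chunks of ROOT (DCR): each chunk hangs off the
# single token in it that is directly dependent on the root; heads[i] is the
# index of token i's head (-1 marks the ROOT itself).
def chunk(tokens, heads):
    root = heads.index(-1)

    def anchor(i):  # walk up until reaching a token directly under the root
        while heads[i] != root:
            i = heads[i]
        return i

    chunks = {root: [tokens[root]]}  # ROOT is an individual chunk
    for i, tok in enumerate(tokens):
        if i != root:
            chunks.setdefault(anchor(i), []).append(tok)
    return [" ".join(c) for c in chunks.values()]

tokens = ["Instagram", "CEO", "meets", "with", "@Pontifex"]
heads = [1, 2, -1, 4, 2]  # hypothetical dependency heads
print(chunk(tokens, heads))  # ['meets', 'Instagram CEO', 'with @Pontifex']
```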
45. The best observed accuracy for Entity Extraction Tasks

Entity         | F-measure | Precision | Recall
Communication  | 88.3      | 82.83     | 95.3
Giver          | 81.4      | 77        | 81.4
Addressee      | 73.9      | 72.1      | 73.9
Message        | 78        | 85.3      | 71.9
Meet           | 89.7      | 83.6      | 96.7
Participant    | 80.1      | 76.1      | 80.1
Topic          | 65.2      | 62.0      | 65.2
Murder         | 93.2      | 90.2      | 96.4
Victim         | 91.6      | 91.6      | 91.6
Killer         | 64.8      | 64.8      | 64.8
Cause          | 82.2      | 88.4      | 76.8

The best observed accuracy results for the entity extraction tasks.
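The F-measure column can be spot-checked from precision and recall with the standard harmonic mean; a minimal sketch:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Spot-check against the Murder row of the table (values in %):
print(round(f_measure(90.2, 96.4), 1))  # 93.2
```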
49. CEVO use case 1: Annotating Text
NYT Tweet#1 on 10/3/2016:
Obama and Justin Trudeau announce efforts to fight climate change. (CEVO:Communication)
BBC Tweet#2 on 14/3/2016:
State elections were "difficult day," German Chancellor Angela Merkel says. (CEVO:Communication)