Originally presented at DataDay Texas in Austin, this presentation shows how a graph database such as Neo4j can be used for common natural language processing tasks, including building a word adjacency graph, mining word associations, summarization and keyword extraction, and content recommendation.
3. Agenda
• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph-based summarization and keyword extraction
• Content recommendation
4. Agenda
Survey of NLP methods with graphs
10. Relational Versus Graph Models
[Figure: the same friendship data in two models. Relational model: Person, Person-Friend and Friend tables. Graph model: ANDREAS, TOBIAS, MICA and DELIA nodes connected by KNOWS relationships.]
11. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
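[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LIVES WITH, DRIVES and OWNS relationships. One relationship carries the property since: Jan 10, 2011.]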
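The components above can be sketched in plain Python. This is only an illustration of the data model, not how Neo4j stores data. The class names `Node` and `Relationship` and the helper `outgoing` are hypothetical, and which person DRIVES the car (and carries the `since` property) is an assumption, since the slide text does not say.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    rel_type: str                                    # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # name-value pairs


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: the slide text does not say who drives the car.
    Relationship(ann, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```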
12. Cypher: Graph Query Language
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
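[Figure: the query annotated. Each parenthesized pattern is a NODE with a LABEL (Person) and a PROPERTY (name); the arrow is the LOVES relationship from Dan to Ann.]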
13. “So what does this have to do with NLP?”
“Am I in the wrong talk?”
“I thought this was going to be about text processing….”
34. Word Associations
• Paradigmatic
• words that can be substituted
• “Monday” <—> “Thursday”
• “cat” <—> “dog”
• Syntagmatic
• words that can be combined with each other
• “cold”, “weather”
• collocations
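One common way to surface syntagmatic associations is to count how often two words occur in the same sentence and compare that with chance, for example with pointwise mutual information (PMI). The sketch below is a minimal illustration under that assumption. The toy corpus and the function name `pmi` are made up for the example, and the deck itself may use a different measure.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical data for illustration only).
sentences = [
    "cold weather today",
    "cold weather again",
    "the cat sat",
    "the dog sat",
    "warm coffee today",
    "cold weather tomorrow",
]

word_counts = Counter()   # number of sentences containing each word
pair_counts = Counter()   # number of sentences containing both words
for sentence in sentences:
    words = set(sentence.split())
    word_counts.update(words)
    pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

n = len(sentences)


def pmi(a, b):
    """Pointwise mutual information of two words co-occurring in a sentence."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")   # the words never co-occur
    return math.log2(joint / ((word_counts[a] / n) * (word_counts[b] / n)))


print(pmi("cold", "weather"), pmi("cold", "today"))
```

A higher score means the pair appears together more often than its individual frequencies would predict, which is the signature of a collocation.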
35. Computing Paradigmatic Similarity
1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
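The three steps can be sketched as follows. The example assumes a word's context is the set of words immediately to its left and right (the incoming and outgoing neighbors in a word adjacency graph) and that context similarity is the average Jaccard overlap of those sets. Both choices, the toy sentences and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Toy corpus (hypothetical data for illustration only).
sentences = [
    "i ate lunch on monday",
    "i ate lunch on thursday",
    "we met on monday morning",
    "we met on thursday morning",
    "my cat ate lunch",
]

# Step 1: represent each word by its context (left and right neighbors).
left = defaultdict(set)
right = defaultdict(set)
for sentence in sentences:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        right[prev].add(nxt)
        left[nxt].add(prev)

vocabulary = set(left) | set(right)


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Step 2: compute context similarity between two words.
def context_similarity(w1, w2):
    left_sim = jaccard(left.get(w1, set()), left.get(w2, set()))
    right_sim = jaccard(right.get(w1, set()), right.get(w2, set()))
    return (left_sim + right_sim) / 2


# Step 3: words with high context similarity are paradigmatic candidates.
def most_similar(word, k=3):
    scored = [(context_similarity(word, other), other)
              for other in vocabulary if other != word]
    return sorted(scored, reverse=True)[:k]


print(most_similar("monday", k=1))
```

In this toy corpus "monday" and "thursday" share both neighbors, so they score highest against each other, matching the “Monday” / “Thursday” example on the previous slide.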
42. Paradigmatic Similarity
3. Find words with high context similarity
CEEAUS corpus: http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
53. Opinion Mining - Example
1. Graph-based representation of review corpus
2. Find and score candidate summaries
3. Select top-scoring candidates as summary
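A heavily simplified sketch of these three steps follows. It is not the Opinosis algorithm. It assumes a word adjacency graph whose edge weights count how often one word follows another, treats every cycle-free path from a sentence-initial word to a sentence-final word as a candidate, and scores a candidate by the sum of its edge weights. The reviews, function names and scoring rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy review corpus (hypothetical data for illustration only).
reviews = [
    "the battery life is very good",
    "battery life is very good",
    "the screen is too dim",
    "the battery life is good",
]

# Step 1: graph-based representation. Edge weight = how often b follows a.
edges = Counter()
starts, ends = set(), set()
for review in reviews:
    words = review.split()
    starts.add(words[0])
    ends.add(words[-1])
    edges.update(zip(words, words[1:]))

successors = defaultdict(list)
for a, b in edges:
    successors[a].append(b)


# Step 2: find candidate paths and score them by redundancy.
def candidates(max_len=6):
    found = []

    def walk(path):
        if path[-1] in ends and len(path) >= 3:
            found.append(path)
        if len(path) < max_len:
            for nxt in successors.get(path[-1], []):
                if nxt not in path:          # avoid cycles
                    walk(path + [nxt])

    for start in sorted(starts):
        walk([start])
    return found


def score(path):
    """Sum of edge weights: paths over frequently repeated word pairs win."""
    return sum(edges[pair] for pair in zip(path, path[1:]))


# Step 3: select the top-scoring candidates as the summary.
def summarize(k=1):
    ranked = sorted(candidates(), key=score, reverse=True)
    return [" ".join(path) for path in ranked[:k]]


print(summarize())
```

Summing edge weights favors longer paths, so a real scorer would likely normalize by length and check that the candidate is a well-formed sentence.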
59. Content recommendation
“Networks give structure to the conversation while content mining gives meaning.”
- Preriit Souda
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/
60. Using Data Relationships for Recommendations
Content-based filtering
Recommend items based on what users have liked in the past
Collaborative filtering
Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Figure: a Person node connected to a Movie node by RATED (rating: 7) and to another Person node by SIMILARITY (value: .92).]
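As a rough illustration of collaborative filtering over this kind of data, the sketch below derives a SIMILARITY value between two people from their RATED relationships and uses it to weight the other person's ratings. Cosine similarity over co-rated movies is an assumption here (the slide does not say how the .92 value is computed), and the ratings and function names are hypothetical.

```python
import math

# Hypothetical ratings, mirroring (:Person)-[:RATED {rating}]->(:Movie).
ratings = {
    "ann": {"Alien": 9, "Brazil": 7, "Casablanca": 3},
    "dan": {"Alien": 8, "Brazil": 7, "Dune": 9},
    "eve": {"Alien": 2, "Casablanca": 9, "Eraserhead": 8},
}


def similarity(p1, p2):
    """Cosine similarity over the movies both people have rated."""
    shared = ratings[p1].keys() & ratings[p2].keys()
    if not shared:
        return 0.0
    dot = sum(ratings[p1][m] * ratings[p2][m] for m in shared)
    norm1 = math.sqrt(sum(ratings[p1][m] ** 2 for m in shared))
    norm2 = math.sqrt(sum(ratings[p2][m] ** 2 for m in shared))
    return dot / (norm1 * norm2)


def recommend(person):
    """Rank unseen movies by similarity-weighted ratings from other people."""
    scores = {}
    for other in ratings:
        if other == person:
            continue
        sim = similarity(person, other)
        for movie, rating in ratings[other].items():
            if movie not in ratings[person]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))
```

In a graph database the same idea becomes a traversal: follow SIMILARITY to nearby people, then RATED to movies the user has not rated yet.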
63. Building the article graph
• Articles users have shared
• Extract keywords using the newspaper3k Python library
• Insert in the graph
• Scrape additional articles
https://github.com/johnymontana/nlp-graph-notebooks
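Once keywords are extracted, the article graph can be thought of as articles linked to the keywords they contain, with recommendations found by walking from a shared article to its keywords and on to other articles. The sketch below illustrates that traversal in plain Python. The article names and keywords are hypothetical stand-ins for what a keyword extractor would return, and the overlap-count ranking is an assumption rather than the approach used in the linked notebooks.

```python
from collections import Counter, defaultdict

# Hypothetical keywords per article; in practice these would come from
# a keyword extractor run over each scraped article.
article_keywords = {
    "graph-databases-101": ["graph", "neo4j", "cypher"],
    "nlp-with-graphs": ["graph", "nlp", "keywords"],
    "intro-to-baking": ["bread", "flour", "oven"],
    "keyword-extraction": ["nlp", "keywords", "textrank"],
}

# Insert in the graph: article -> keywords and keyword -> articles.
has_keyword = {article: set(kws) for article, kws in article_keywords.items()}
keyword_to_articles = defaultdict(set)
for article, kws in has_keyword.items():
    for kw in kws:
        keyword_to_articles[kw].add(article)


def recommend(shared_articles, k=2):
    """Rank unshared articles by how many keywords they share with the input."""
    overlap = Counter()
    for article in shared_articles:
        for kw in has_keyword[article]:
            for other in keyword_to_articles[kw]:
                if other not in shared_articles:
                    overlap[other] += 1
    return [article for article, _ in overlap.most_common(k)]


print(recommend({"nlp-with-graphs"}))
```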
71. Opinion Mining
• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
• - Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign
• “Multi-sentence compression: Finding shortest paths in word graphs”
• - Katja Filippova. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, Aug 23-27, 2010