Personalized and Adaptive Semantic Information
Filtering for Social Media
Pavan Kapanipathi, PhD Candidate
Kno.e.sis Center, Wright State University
Committee: Drs. Amit Sheth (Advisor), Krishnaprasad Thirunarayan,
Derek Doran, and Prateek Jain
Ohio Center of Excellence in Knowledge-Enabled Computing
Social Media
2
Introduction
Information Consumption on Social
Media
• Updates of Friends and
Acquaintances
• News [1]
– 86% of Twitter
users surveyed
• Medical Information [2]
– 1 in 3 use social media
• Disaster Management [3]
– 20 million tweets on Hurricane Sandy
– Most crisis management agencies
monitor social media
6
Introduction
Information Overload on Social
Media
• Users often complain of
getting overwhelmed with
the information on social
media
• 5 billion posts per day
– Real-time information
• 1000+ in my social network
7
“...a wealth of information creates a poverty of attention...”
Herbert A. Simon
Introduction
Need for Information Filtering
• Scenario
– Address information overload
– Enormous data stream has to be
filtered
• Information Filtering Systems
– Emails, News, and Blogs
– Functionality
• Understand user interests
• Deliver relevant information
8
Traditional Information
Filtering
10
User Interest
Identification/User
Modeling
Filtering Module
Streaming Data
User
Generated
Content
Filtered
Data
Hanani, Uri, Bracha Shapira, and Peretz Shoval. "Information filtering: Overview of issues, research and systems." User
Modeling and User-Adapted Interaction 11.3 (2001): 203-259.
NBA
Basketball
Sports
Relevance: 0.9
Introduction
Challenges
1. Lack of Context
• Lack of context for processing short-text
– Short-Text
• Average length of social media posts (Facebook, Twitter, Google+, etc.)
is 100-160 characters
• Identifying topics from short-text is important
– We can infer the author’s interest and deliver the tweet to users
interested in the topic
– Traditional techniques have been shown to perform poorly on social
media [Sriram 2010, Derczynski 2013]
11
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
Introduction
Challenges
2. Continuously Changing Vocabulary
• Social media is a real-time platform with information about
latest activities in the real-world
• Hurricane Sandy
– Mitigation, preparedness, recovery, and response phases
– #Frankenstorm and #Sandy, at the start, to #StaySafe and #RedCross during the
disaster and #ThanksSandy and #RestoreTheShore after the hurricane
• Indian Elections
– the announcement of prime ministerial candidates, issues
regarding corruption, and polls in different states
– #modikisarkar, #NaMo, #VoteForRG, and #CongBJPQuitIndia
12
Civil Unrest Election Natural Disaster
Challenges
3. Scalability
• Practical aspects of the filtering system
• Popularity of social media is increasing
– Facebook has more than 1 billion users
– Twitter has more than 500 million users
• Disseminate information to a huge set of users
– Centralized dissemination systems overload either the client or
the server (Pull or Push model, respectively)
13
Introduction
Introduction
Knowledge Bases
• A common theme across the methodologies developed is
the use of background knowledge and Semantic Web
technologies.
• Knowledge bases supply the background knowledge needed to
process short-text
14
“If a program is to perform a complex task well, it must
know a great deal about the world in which it operates.”
Lenat & Feigenbaum
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect
game.
Baseball
Jason Heyward
Kris Bryant
Chicago Cubs
Sports
Wikipedia as a Knowledge Base
• Requirements for a Knowledge base to be used for filtering
social data
– Diversity and Comprehensiveness: Large set of diverse users on social
media such as Twitter and Facebook
– Real-time updates: Social media is a real-time platform that discusses
dynamic topics
• Wikipedia as the Knowledge base
– Semi structured – Extract the structure
– Diverse: Collaborative effort of 80,000 users with 5 million articles
– Near real-time updates with unbiased views on topics [Ferron 2011]
15
Introduction
Thesis Statement
16
To build an effective information filtering system, background
knowledge and Semantic Web technologies can be used to
address lack of context, dynamically changing vocabulary and
scalability challenges introduced by social media’s short-text
and real-time nature.
Introduction
Outline
• Short-Text: Lack of context for processing
– Hierarchical Interest Graphs
– Built a hierarchical context for tweets leveraging Wikipedia category
structure. This hierarchical context is utilized for user modeling and
recommendations.
– Publications [ESWC 2014, WWWCOMP 2014, TR-JRNL 2016]
• Real-time and dynamic nature: Continuously changing
vocabulary
– A novel methodology that utilizes the evolving Wikipedia hyperlink
structure to detect topic-relevant hashtags for continuous filtering
– Publications [TR-CNF 2016, ESWC 2015]
• Popularity: Scalability
– Scalable distributed dissemination system that utilizes Semantic Web
technologies.
– Publications [ISWC 2011, SPIM 2011, ISWCDEM 2011]
17
Lack of context
Baseball
• User generated content is processed to understand user
interests and filtering
– Tweets are used for these experiments
• Wikipedia category structure comprises taxonomical information
that can be leveraged
– Build context for short text for user interest identification
Processing Short-text for User
Interest Identification
19
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
“You are what you share”
Charles W. Leadbeater
Lack of context
ESWC 2014
Content Based User Interests
Identification from Social Data
20
Semantics
Term Frequency
Based
Techniques
Lower Dim Space
as latent
semantics
Entity Based
Techniques
[Tao 2012][Ramage 2010]
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
Not sure who the Reds will look too
replace Dusty.some very interesting
jobs open (Cubs, Mariners, Reds, poss
Yanks) Girardi the domino sports
[Yan 2012]
Term Freq
great 1
day 1
sports 2
cubs 2
…
Dim Dist
1dim 0.3
2dim 0.2
3dim 0.2
4dim 0.1
5dim 0.4
Wiki-Entities Freq
Chicago Cubs 2
Cinci Reds 2
White Sox 1
NY Yankees 1
…
Knowledge
Enabled
Approaches
Lack of context
ESWC 2014
Implicit Information from Social Data
21
Broader Related
Interests
Major League
Baseball
Major League
Baseball Teams
Baseball
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect
game.
Not sure who the Reds will look too
replace Dusty.some very interesting
jobs open (Cubs, Mariners, Reds,
poss Yanks) Girardi the domino
San Francisco Giants
Oakland Athletics
Baseball Organizations
Lack of context
ESWC 2014
22
Broader Related Interests from
Wikipedia Category Structure
Major League
Baseball
Major League
Baseball Teams
Baseball
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
Not sure who the Reds will look too
replace Dusty.some very interesting
jobs open (Cubs, Mariners, Reds,
poss Yanks) Girardi the domino
Methodology: Structured
Hierarchical Knowledge
0.6 1.0 0.3 0.3
Seattle
Mariners
White Sox
Cincinnati
Reds
Chicago Cubs
Transformed
Wikipedia Category
Structure to a
Wikipedia Hierarchy
Lack of context
ESWC 2014
23
Spreading Activation
Major League
Baseball
Major League
Baseball Teams
Baseball
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
Not sure who the Reds will look too
replace Dusty.some very interesting
jobs open (Cubs, Mariners, Reds,
poss Yanks) Girardi the domino
Methodology: Scoring the Inferred
Hierarchical Knowledge
0.6 1.0 0.3 0.3
Seattle
Mariners
White Sox
Cincinnati
Reds
Chicago Cubs
0.5
0.4
0.1
Lack of context
ESWC 2014
Designing an Activation Function
• Design parameters to adapt to the structure of the Wikipedia
Hierarchy
– Uneven distribution of nodes in the hierarchy
• 16 hierarchical levels; most categories lie between levels 5 and 9
– Raw Normalization: $F_{n_i} = \frac{1}{nodes(i+1)}$
– Log Normalization: $FL_{n_i} = \frac{1}{\log_{10} nodes(i+1)}$
– Many-to-many category-subcategory relationships
• Boston Red Sox – Major League Baseball Teams, 1901 Establishments
in Massachusetts
– Preferential Path Constraint: $P_{ij} = \frac{1}{priority_{ji}}$
– Boosting common ancestors
• The more entities that activate a concept, the greater its importance
– Intersect Booster: $B_i = \frac{Ne_i}{Ne_{ic_{max}}}$
24
Lack of context
ESWC 2014
Activation Functions
• Bell (Raw Normalization)
$A_j = \sum_{i=0}^{n} A_i \times F_j$
• Bell Log (Log Normalization)
$A_j = \sum_{i=0}^{n} A_i \times FL_j$
• Priority Intersect (Log Normalization, Preferential Path, Intersect
Booster)
$A_j = \sum_{i=0}^{n} A_i \times FL_j \times P_{ji} \times B_j$
25
i is the child node
j is the category
$A_i$ is the activated value of i
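A minimal Python sketch of the Bell Log idea: each category sums the activations of its children and scales the sum by a log normalization factor. The toy hierarchy, the levels, the per-level node counts, the entity scores, and all function names are hypothetical and chosen only for illustration. The normalization here uses the node count at the category's own level, which is an assumption about how the factor is indexed.

```python
import math
from collections import defaultdict

# Hypothetical toy hierarchy: child -> parent categories.
PARENTS = {
    "Chicago Cubs": ["Major League Baseball Teams"],
    "Cincinnati Reds": ["Major League Baseball Teams"],
    "Major League Baseball Teams": ["Major League Baseball"],
    "Major League Baseball": ["Baseball"],
}

# Hypothetical hierarchical level of each category (1 = most abstract).
LEVEL = {
    "Baseball": 1,
    "Major League Baseball": 2,
    "Major League Baseball Teams": 3,
}

# Illustrative node counts per level (not real Wikipedia statistics).
NODES_AT_LEVEL = {1: 100, 2: 1000, 3: 10000}


def log_normalization(category):
    """FL = 1 / log10(number of nodes at the category's level)."""
    return 1.0 / math.log10(NODES_AT_LEVEL[LEVEL[category]])


def bell_log(entity_scores):
    """Propagate entity scores bottom-up through the hierarchy."""
    children = defaultdict(list)
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(child)

    activation = dict(entity_scores)
    # Visit the deepest categories first so that every child value is final
    # before its parent is computed.
    for category in sorted(LEVEL, key=LEVEL.get, reverse=True):
        total = sum(activation.get(child, 0.0) for child in children[category])
        activation[category] = total * log_normalization(category)
    return activation


scores = bell_log({"Chicago Cubs": 1.0, "Cincinnati Reds": 0.6})
```

The Priority Intersect variant would multiply each term by the preferential path and intersect booster factors in the same loop.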
Lack of context
ESWC 2014
26
Activation Functions
Major League
Baseball
Major League
Baseball Teams
Baseball
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect game.
Not sure who the Reds will look too
replace Dusty.some very interesting
jobs open (Cubs, Mariners, Reds,
poss Yanks) Girardi the domino
Hierarchical Interest Graph
0.6 1.0 0.3 0.3
Seattle
Mariners
White Sox
Cincinnati
Reds
Chicago Cubs
0.5
0.4
0.1
BELL
BELL LOG
PRIORITY
INTERSECT
Lack of context
ESWC 2014
Hierarchical Interest Graph
Evaluation – User Study
Users   Tweets   Entities   Distinct Entities   Categories in HIG
37      31,927   29,146     13,150              111,535
27
[Figure: Tweets Distribution]
Lack of context
ESWC 2014
Evaluation Results of Hierarchical
Interests
28
Graded Precision
     Relevant                  Irrelevant                Maybe
k    Bell  Bell Log  Priority  Bell  Bell Log  Priority  Bell  Bell Log  Priority
                     Intersect                 Intersect                 Intersect
10   0.53  0.67      0.76      0.34  0.23      0.16      0.13  0.10      0.08
20   0.54  0.66      0.72      0.34  0.22      0.19      0.12  0.12      0.09
30   0.53  0.64      0.69      0.34  0.24      0.21      0.13  0.12      0.10
40   0.52  0.61      0.68      0.35  0.26      0.22      0.13  0.13      0.10
50   0.52  0.61      0.67      0.36  0.28      0.24      0.12  0.11      0.09
Mean Average Precision
k    Bell  Bell Log  Priority Intersect
10   0.64  0.72      0.88
20   0.61  0.70      0.82
30   0.59  0.69      0.79
40   0.58  0.68      0.77
50   0.57  0.67      0.75
Numbers in Bold
portray better
performance
Lack of context
ESWC 2014
On this day in 1934, Major League Baseball
announced it would host its first night games
Great day for Chicago sports as well
as Cubs beat the Reds, Sox beat the
Mariners with Humber’s perfect
game, Bulls win and Hawks stay alive
Implicit Interests Evaluation
• Implicit interests are categories of interest that were not
explicitly mentioned in tweets but inferred from the knowledge-
base
29
Category: Major
League Baseball
Explicit
Implicit
Lack of context
ESWC 2014
Summary
Hierarchical Interest Graphs
• Addressed the “Lack of Context” challenge in tweets using a
hierarchical knowledge base.
– More than 70% of hierarchical interests are implicit.
• A new way to represent Twitter user interests
– Hierarchical Interest Graph with interest scores at each node
– Activation functions (models) to determine interest scores
What’s the use?
30
Lack of context
ESWC 2014
HIG-based
Tweet Recommendation Approach
31
Incoming Tweet
Semantic Web: 0.2
World Wide Web: 0.09
Ontology: 0.7
Technology: 0.01
Semantic Search: 0.3
World Wide Web: 0.9
Technology: 0.7
Sports: 0.6
Baseball: 0.4
India: 0.2
United States: 0.2
Semantic Web: 0.2
Pearson
Correlation
Recommend
Y/N?
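The matching step can be sketched in Python as a Pearson correlation between the incoming tweet's category scores and the user's Hierarchical Interest Graph scores. The two profiles reuse the numbers shown on the slide. Treating missing categories as 0, and the `threshold` value, are assumptions made for this sketch and are not stated in the slides.

```python
import math


def pearson_similarity(profile_a, profile_b):
    """Pearson correlation over the union of categories (missing = 0.0)."""
    keys = sorted(set(profile_a) | set(profile_b))
    if not keys:
        return 0.0
    xs = [profile_a.get(k, 0.0) for k in keys]
    ys = [profile_b.get(k, 0.0) for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def recommend(tweet_profile, user_profile, threshold=0.0):
    """Hypothetical decision rule: recommend if correlation exceeds threshold."""
    return pearson_similarity(tweet_profile, user_profile) > threshold


tweet = {"Semantic Web": 0.2, "World Wide Web": 0.09,
         "Ontology": 0.7, "Technology": 0.01}
user = {"Semantic Search": 0.3, "World Wide Web": 0.9, "Technology": 0.7,
        "Sports": 0.6, "Baseball": 0.4, "India": 0.2,
        "United States": 0.2, "Semantic Web": 0.2}

similarity = pearson_similarity(tweet, user)
```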
Lack of context
TR-JRNL 2016
Content-based Tweet
Recommendation Approaches
• Term Frequency based approaches
– User profiles: Built on scoring important terms
• TF, TF-IDF
• Entity Frequency [Tao 2012]
– User profiles: Built on scoring important entities
• Wikipedia Entities
• Extracted using Zemanta
• Support Vector Machines (SVMrank) [Duan 2010]
– User Models built using content and tweet based features
– Tweet content features: Similarity to users tweets, similarity of hashtags,
tweet length, mention of URLs, mention of hashtags.
• Latent Dirichlet Allocation [Ramage 2010]
– User profiles: Distribution of 5 latent topics.
32
Lack of context
TR-JRNL 2016
Experimental Setup
• Utilized the same dataset from the user study
• Training and testing datasets using two assumptions
– Tweets that users share are interesting to them and can be
recommended (UGC Assumption)
• 80% to create user profiles
• 20% (~6,000) to test recommendation
– Retweets of users are interesting to them and can be recommended
(Retweet Assumption, which is more popular in the literature)
• 30% (~9,000) were retweets, hence used to test recommendation
• 70% to create user profiles
33
Users Tweets Entities
37 31,927 29,146
Lack of context
TR-JRNL 2016
Evaluation Methodology
• Transformed to a top-N recommendation evaluation
– Popular top-N evaluation methodology by Cremonesi et al. [Cremonesi
2010] for Precision/Recall
• Methodology
– For every test tweet – pick random 1000 tweets not tweeted/retweeted
by the author of the test tweet
• Random tweets are considered to be irrelevant to the user
– Score and rank the test tweet with the 1000 random tweets using the
recommendation algorithm
• TF, TFIDF, Entity-based, SVMrank, LDA, and HIG
– If the test tweet is within the top-N, it is considered a hit; otherwise it is
not (T is the total number of test tweets)
$recall = \frac{hits}{T}$
34
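The evaluation protocol above can be sketched in Python as follows. The function name, the `score` callback, the data layout, and the toy word-overlap scorer are hypothetical; any of the compared recommenders (TF, TF-IDF, Entity-based, SVMrank, LDA, HIG) would take the place of `score`.

```python
import random


def top_n_recall(test_items, candidate_pool, score, n, num_random=1000, seed=0):
    """recall = hits / T for a top-N protocol.

    test_items: list of (user, test_tweet) pairs.
    candidate_pool: user -> tweets not tweeted/retweeted by that user.
    score: function (user, tweet) -> relevance score.
    """
    rng = random.Random(seed)
    hits = 0
    for user, test_tweet in test_items:
        pool = candidate_pool[user]
        negatives = rng.sample(pool, min(num_random, len(pool)))
        test_score = score(user, test_tweet)
        # Rank of the test tweet among the random (assumed irrelevant) tweets.
        rank = 1 + sum(score(user, tweet) > test_score for tweet in negatives)
        if rank <= n:
            hits += 1
    return hits / len(test_items)


# Toy scorer for illustration only: word overlap with a user's keywords.
INTERESTS = {"u1": {"cubs", "win"}}


def overlap_score(user, tweet):
    return len(set(tweet.split()) & INTERESTS[user])


tests = [("u1", "cubs win"), ("u1", "rain today")]
pool = {"u1": ["stock market", "cubs game", "election news"]}
recall_at_1 = top_n_recall(tests, pool, overlap_score, n=1)
recall_at_2 = top_n_recall(tests, pool, overlap_score, n=2)
```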
Lack of context
TR-JRNL 2016
Retweet Assumption Evaluation
Results
• Term frequency performs the best for recommending
retweets [Ramage et al 2010]
35
Lack of context
TR-JRNL 2016
UGC Assumption Evaluation Results
• HIG performed better for most top-N, but at Top-20 TF-based
approaches performed better.
36
Lack of context
TR-JRNL 2016
Lack of context
Content + Knowledge based
Approach
• TF performed the best in content based approaches
• Merged TF and HIG, which augments content with knowledge-base
information, and recommended using Pearson Correlation
37
HIG profile (raw scores)
World Wide Web: 0.4
Technology: 0.007
Sports: 0.06
Baseball: 0.34
India: 0.102
United States: 0.2
Semantic Web: 0.2
TF profile (raw term counts)
world: 3
great: 10
cricket: 24
slim: 13
good: 40
united: 34
states: 30
NORMALIZED (each profile divided by its maximum)
world: 0.075
great: 0.25
cricket: 0.6
slim: 0.325
good: 1
united: 0.85
states: 0.75
World Wide Web: 1
Technology: 0.017
Sports: 0.15
Baseball: 0.85
India: 0.25
United States: 0.5
Semantic Web: 0.5
MERGED (TF + HIG)
world: 0.075
great: 0.25
cricket: 0.6
slim: 0.325
good: 1
united: 0.85
states: 0.75
World Wide Web: 1
Technology: 0.017
Sports: 0.15
Baseball: 0.85
India: 0.25
United States: 0.5
Semantic Web: 0.5
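The normalize-then-merge step shown in the example can be sketched in Python. The numbers are the ones on the slide, and dividing each profile by its own maximum reproduces the normalized values shown. The function names are hypothetical, and the merge assumes that term keys and category keys never collide.

```python
def max_normalize(profile):
    """Scale a profile so that its largest score becomes 1."""
    peak = max(profile.values())
    return {key: value / peak for key, value in profile.items()}


def merge_profiles(tf_profile, hig_profile):
    """Normalize each profile separately, then take the union of entries.

    Assumes terms and categories use distinct keys; on a collision the
    HIG value would overwrite the TF value.
    """
    return {**max_normalize(tf_profile), **max_normalize(hig_profile)}


tf = {"world": 3, "great": 10, "cricket": 24, "slim": 13,
      "good": 40, "united": 34, "states": 30}
hig = {"World Wide Web": 0.4, "Technology": 0.007, "Sports": 0.06,
       "Baseball": 0.34, "India": 0.102, "United States": 0.2,
       "Semantic Web": 0.2}

merged = merge_profiles(tf, hig)
```

The merged profile can then be compared with an incoming tweet's profile using the same Pearson correlation as for HIG alone.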
TR-JRNL 2016
Retweet Assumption Evaluation
Results
• TF + HIG performs the best and provides an improvement
of more than 40% at top-20
38
Lack of context
TR-JRNL 2016
UGC Assumption Evaluation Results
• TF + HIG performs the best and provides an improvement
of more than 20% at top-20
39
Lack of context
TR-JRNL 2016
Summary
Hierarchical Interest Graphs
• A new way to represent Twitter user Interests
– Hierarchical Interest Graphs
• Addressed the “Lack of Context” challenge in tweets using
hierarchical knowledge base.
• HIG (knowledge base) augments content to provide
superior performance for tweet recommendation.
40
Lack of context
TR-JRNL 2016
Outline
• Short-Text: Lack of context for processing
– Augmented content with hierarchical knowledge from Wikipedia
• 70% of the top-50 interests were implicit (not mentioned in users’
tweets)
• Improved content based tweet recommendation by more than 40%.
• Real-time and Dynamic Nature: Continuously Changing
Vocabulary
– A novel methodology that utilizes the evolving Wikipedia hyperlink
structure to update filters for streaming topic-relevant information
• Popularity: Scalability
– Scalable distributed dissemination system that utilizes Semantic Web
technologies.
41
Dynamic vocabulary
• Dynamic topics of interest that continuously evolve over
time
– Indian Elections
• the announcement of prime ministerial candidates, issues
regarding corruption, and polls in different states
– Hurricane Sandy
• Mitigation, preparedness, recovery, and response phases
Social media: Real-time and Dynamic
Platform
43
Indian Election Hurricane Sandy
Dynamic vocabulary
TR-CNF 2016
• Keyword-based filtering
– Twitter streaming API
• Keywords are dynamically changing based on the
happenings in the real-world
– Necessary to track these keywords to be up-to-date regarding
the topic of interest
Filtering Dynamic Topics on Social
Media
44
#indianelection #sandy
#modikisarkar, #NaMo,
#VoteForRG, and
#CongBJPQuitIndia
#Frankenstorm ,#Sandy,
#RedCross,
#RestoreTheShore
Dynamic vocabulary
TR-CNF 2016
Topic-relevant hashtags that can be used
to crawl all the tweets co-occur with
each other
(1) Colorado Shooting (2) Occupy Wall Street
Analysis with over 6 million tweets
Hindsight Analysis of Topic-relevant
Hashtags
45
<1% of the topic-relevant hashtags can
crawl up to 85% of the tweets
Dynamic vocabulary
TR-CNF 2016
Approach for Detecting Topic-
Relevant Hashtags
46
Co-occurring:
Threshold δ
#indianelection2014
#modikisarkar
Manually started filter
Indian General
Election,_2014
Dynamically Updated
Background Knowledge
One hop from Topic
Page
Entity scoring based
on relevance to the Event
Indian General Elec: 1.0
India: 0.9
Elections: 0.7
UPA: 0.6
BJP: 0.3
NDA: 0.3
Narendra Modi: 0.3
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Entity Extraction
and Scoring
Normalized
Frequency
Scoring
Latest K (200,500)
Similarity
Check
Extract, Periodically
Update Hyperlink structure
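The similarity check between the two entity-score lists can be sketched in Python. This sketch assumes the first list is the Wikipedia-derived topic profile and the second is the profile built from the latest K tweets of a candidate hashtag; the slide does not label them explicitly. Cosine and Jaccard are two of the measures named in the evaluation. The asymmetric subsumption measure is not defined on the slides, so it is not sketched here. Function names are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse entity-score profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[e] * profile_b[e] for e in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity over entity sets (scores ignored)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


topic = {"Indian General Elec": 1.0, "India": 0.9, "Elections": 0.7,
         "UPA": 0.6, "BJP": 0.3, "NDA": 0.3, "Narendra Modi": 0.3}
hashtag = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
           "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}

cosine = cosine_similarity(topic, hashtag)
jaccard = jaccard_similarity(topic, hashtag)
```

A hashtag whose similarity to the topic profile is high enough would be added to the filter; the cutoff is a design parameter not given in the slides.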
Dynamic vocabulary
TR-CNF 2016
• Dataset – 2 Dynamic topics
– 2012 U S Presidential Elections
– Hurricane Sandy
• δ – Top 25 co-occurring hashtags
– Manual annotation for relevance
Evaluation
47
Event              Tags            Tweets   Co-occ Tags (Distinct)   Wiki Entities
US Elections 2012  #election2012   4,855    12,361 (1,460)           614
Hurricane Sandy    #sandy          4,818    6,592 (837)              419

Event              Tags   Tweets (Distinct)   Relevant   Irrelevant   Tweets Entities
US Elections 2012  25     11,504 (10,084)     7,086      2,998        27,558 (4,255)
Hurricane Sandy    25     4,905 (4,850)       2,691      2,159        10,719 (2,359)
Total              50     15,409 (14,934)     9,777                   38,219
Dynamic vocabulary
TR-CNF 2016
Evaluation Results
48
          Hurricane Sandy                                2012 U S Presidential Elections
          Subsumption  Cosine  Jaccard  Co-occurrence    Subsumption  Cosine  Jaccard  Co-occurrence
NDCG@10   0.93         0.86    0.85     0.65             0.91         0.85    0.89     0.83
NDCG@20   0.97         0.93    0.92     0.89             0.98         0.95    0.97     0.94
[Figures: NDCG and MAP plots]
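For readers unfamiliar with the metric, NDCG at rank k can be sketched in Python. This uses a common formulation (gain equal to the relevance label, logarithmic discount); the slides do not state which variant was used, so the exact form is an assumption.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of ranked hashtags (1 = relevant, 0 = irrelevant).
perfect = ndcg_at_k([1, 1, 0], 3)
swapped = ndcg_at_k([0, 1], 2)
```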
Dynamic vocabulary
TR-CNF 2016
• Hashtag analysis
– Co-occurrence technique can be used to detect event relevant hashtags
– More popular hashtags are easier to be detected via co-occurrence
• Continuously changing vocabulary for dynamic topics and coverage
– Wikipedia as a dynamic knowledge-base for events
– Determining relevant hashtags using asymmetric similarity measure
– More hashtags in turn increase the coverage of tweets for events
• Content-based location prediction of Twitter users (ESWC 2015)
– Similar framework of relevancy detection was used for location prediction
Dynamic Hashtag Filter
49
Dynamic vocabulary
TR-CNF 2016
Outline
• Short-Text: Lack of context for processing
– Augmented content with hierarchical knowledge from Wikipedia
• 70% of the top-50 interests were implicit (not mentioned in users’ tweets)
• Improved content based tweet recommendation by more than 40%.
• Real-time and Dynamic Nature: Continuously Changing Vocabulary
– Hindsight analysis insight: co-occurrence can be used as a starting point
– Utilized Wikipedia as an evolving knowledge base for dynamic topics
• top-5 detected, increased the coverage by more than 3,500 tweets instantly
with a mean average precision of 0.92
• Popularity: Scalability
– Scalable distributed dissemination system that utilizes Semantic Web
technologies.
50
Scalability
Content Dissemination
• Centralized content dissemination suffers from scalability
issues
– Either the Server (publisher) or the Client (subscriber) is overwhelmed
– Server for Push and Client for Pull
• Distributed dissemination protocol
– Pubsubhubbub
• Introduced by Google in 2009
• 117 million users and 5.5 billion posts broadcasted by 2011
52
Scalability
ISWC 2011
• PubSubHubbub
– Simple, Open, web-hook based pubsub protocol
– Extension to RSS, Atom.
53
Publisher, Hub, Subscriber
I have new
content for
feed X
Give me the
latest content for
feed X
Here it is
Subscriber
Subscriber
Subscriber
Subscriber
Here is the
latest content
for feed X
Scalability
ISWC 2011
54
PubSubHubbub Protocol Extension
Pub
Sub - A
Sub - B
Sub - C
Sub - D
Hey I have new
content for feed
topics/preference
Social Graph
and User
Profiles
Get the subscribers
of Pub whose profile
matches
topic/preference
Here is the
new content
of feed X
Give me
the new
content
Here it
is
Semantic Hub
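The message flow of the extended hub can be illustrated with a toy in-memory Python sketch. It is not the PubSubHubbub protocol itself: there are no HTTP callbacks or feeds, and the class name, method names, and set-overlap matching rule are all hypothetical. It only shows the idea that the hub, rather than the publisher or the subscriber, selects the recipients.

```python
from collections import defaultdict


class ToySemanticHub:
    """In-memory stand-in for a hub that filters before it pushes."""

    def __init__(self):
        # publisher -> {subscriber: set of interest topics}
        self.subscriptions = defaultdict(dict)
        # subscriber -> list of delivered posts
        self.inbox = defaultdict(list)

    def subscribe(self, publisher, subscriber, interests):
        self.subscriptions[publisher][subscriber] = set(interests)

    def publish(self, publisher, content, topics):
        """Deliver content only to subscribers whose interests match."""
        delivered = []
        for subscriber, interests in self.subscriptions[publisher].items():
            if interests & set(topics):
                self.inbox[subscriber].append(content)
                delivered.append(subscriber)
        return delivered


hub = ToySemanticHub()
hub.subscribe("Pub", "Sub-A", ["Chicago_Cubs"])
hub.subscribe("Pub", "Sub-B", ["Cricket"])
recipients = hub.publish("Pub", "Cubs beat the Reds",
                         ["Chicago", "Chicago_Cubs"])
```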
Scalability
ISWC 2011
Publisher – Social Data Annotation
• Preliminary processing of text for filtering
– Information extraction (entities, hashtags, urls, etc.)
• Representing as RDF using vocabulary used by SMOB
– Comprises
• SPARQL Queries representing the subset of subscribers from the Social
Graph in the hub
55
Scalability
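@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix sioct: <http://rdfs.org/sioc/types#> .
@prefix moat: <http://moat-project.org/ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
<http://twitter.com/rob/statuses/123456789>
rdf:type sioct:MicroblogPost ;
sioc:content "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game #chicago" ;
sioc:has_creator <http://example.com/rob> ;
moat:taggedWith dbpedia:Chicago ;
moat:taggedWith dbpedia:Chicago_Cubs ;
moat:taggedWith dbpedia:Cincinnati_Reds ;
sioc:topic <http://example.com/tags/chicago> .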
ISWC 2011
Semantic Hub
• Performs the matching of processed post to user profiles
– Flexible to different matching techniques
• Pearson correlation or other similarity measures
• Delivers information to relevant subscribers.
56
Scalability
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?user WHERE {
{ ?user foaf:interest dbpedia:Chicago } UNION
{ ?user foaf:interest dbpedia:Chicago_Cubs } UNION
{ ?user foaf:interest dbpedia:Cincinnati_Reds }
}
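A query of this shape could be generated from the entities tagged in a post. The Python sketch below is an illustration with a hypothetical function name; it assumes each entity is a DBpedia local name that is valid as a SPARQL prefixed name (names containing characters such as parentheses or commas would need full IRIs instead).

```python
def build_subscriber_query(entities):
    """Build a SPARQL query selecting users interested in any given entity."""
    if not entities:
        raise ValueError("at least one entity is required")
    patterns = " UNION\n".join(
        "  { ?user foaf:interest dbpedia:%s }" % entity for entity in entities
    )
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
        "SELECT ?user WHERE {\n" + patterns + "\n}"
    )


query = build_subscriber_query(["Chicago", "Chicago_Cubs", "Cincinnati_Reds"])
```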
ISWC 2011
Semantic Hub: Conclusion
• Framework for distributed dissemination of content using
PubSubHubbub
– Hub takes the load of the filtering module and dissemination of
content
• PubSubHubbub
– 117 million subscriptions by 2011
– 5.5 billion unique feeds by 2011
• Semantic Hub
– Privacy-aware dissemination for distributed social networks
– Real-time filtering
57
Scalability
ISWC 2011
• To build an effective information filtering system, background
knowledge and Semantic Web technologies can be used to
address lack of context, dynamically changing vocabulary and
scalability challenges introduced by social media’s short-text
and real-time nature.
– Augmented content with hierarchical knowledge from Wikipedia to
improve context of short-text
• 70% of the top-50 interests were implicit (not mentioned in users’ tweets)
• Improved content based tweet recommendation by more than 40%.
– Utilized Wikipedia as an evolving knowledge base for dynamic topics to
detect topic-descriptors for filtering
• Hindsight analysis insight: co-occurrence can be used as a starting point
• top-5 detected, increased the coverage by more than 3,500 tweets instantly
with a mean average precision of 0.92
– Extended PubSubHubbub, a distributed content dissemination protocol
with Semantic Web technologies for filtering and dissemination
58
Conclusion
Thesis Conclusion
Graduate Journey
• Hierarchical Interest Graphs
– Internship work – IBM TJ Watson Research Center 2013
• Location Prediction of Twitter users
– Alleviates the dependence on training data
• Determining Twitter User Hobbies
– Internship work – Samsung Research America 2014 (Patent
Pending)
• Tweet Filtering and Recommendation
– Addressing the problem of dynamic topic drift.
59
Conclusion
Conclusion
Graduate Journey
• Research Internships
– 2011 DERI, Ireland (ISWC 2011, SPIM 2011, WebSci 2011)
– 2013 IBM TJ Watson Research Center (WWWCOMP 2014,
ESWC2014)
– 2014 Samsung Research America (Patent Pending)
• Invited talks
– IBM TJ Watson Research Center, Frontiers of Cloud
Computing and Big Data Workshop
– EMC CTO Office, Bangalore, Invited Speaker Series
– WSU Advisory Board
• Proposals and Projects
– Twitris – NSF Commercialization
– Ohio State University – NSF Hazards SEES ($2M)
– CITAR (Epidemiology) – NIH EdrugTrends ($1.6M)
• Development of Research Systems
– Twarql – A semantic tweet filtering system.
• Winner of Triplification Challenge (ISem2010)
– Scalable content dissemination on distributed social
networks. (ISWC2011)
– Twitris – A social semantic web for analyzing events.
60
COLLABORATIONS
CITAR
Publications
• [NOISE 2015] Raghava Mutharaju, and Pavan Kapanipathi. Are We Really Standing on the
Shoulders of Giants? 1st Workshop on Negative or Inconclusive Results in Semantic Web
2015, ESWC, 2015.
• [KNOW 2015] Siva Kumar Chekula, Pavan Kapanipathi, Derek Doran, Amit Sheth. Entity
Recommendations Using Hierarchical Knowledge Bases. 4th International Workshop on
Knowledge Discovery and Data Mining Meets Linked Open Data, 2015.
• [ESWC 2015] Pavan Kapanipathi, Revathy Krishnamurthy (Joint first author), Amit Sheth,
Krishnaprasad Thirunarayan. Knowledge Enabled Approach to Predict the Location of Twitter
Users. In Extended Semantic Web Conference, 2015. (acceptance rate 23%).
• [ESWC 2014] Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth. User
Interests Identification on Twitter Using a Hierarchical Knowledge Base. In Extended Semantic
Web Conference 2014, Crete Greece. (acceptance rate 23%)
• [WWWComp 2014] Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth.
Hierarchical Interest Graph from Twitter. 23rd International conference on World Wide Web
companion 2014 (WWW companion 2014), Seoul, South Korea.
• [WI 2013] Fabrizio Orlandi, Pavan Kapanipathi, Alexandre Passant, Amit Sheth. Characterising
concepts of interest leveraging Linked Data and the Social Web. The 2013 IEEE/WIC/ACM
International Conference on Web Intelligence, Atlanta, USA, United States, 2013.
• [SPIM 2011] Pavan Kapanipathi, Fabrizio Orlandi, Amit Sheth, Alexandre Passant.
Personalized Filtering of the Twitter Stream. 2nd workshop on Semantic Personalized
Information Management at ISWC 2011, September 2011.
• [ISWC 2011] Pavan Kapanipathi, Julia Anaya, Amit Sheth, Brett Slatkin, Alexandre Passant.
Privacy-Aware and Scalable Content Dissemination in Distributed Social Network. 10th
International Semantic Web Conference 2011, Bonn, Germany, September 2011. (acceptance
rate 22%)
61
Conclusion
Conclusion
Publications
• [ISWCDEM 2011] Pavan Kapanipathi, Julia Anaya, Alexandre Passant. SemPuSH: Privacy-
Aware and Scalable Broadcasting for Semantic Microblogging. 10th International Semantic
Web Conference 2011.
• [FSWE 2011] Pavan Kapanipathi. SMOB: The Best of Both Worlds. Federated Social Web
Europe Conference, Berlin, June 3rd -5th 2011.
• [WEBSCI 2011] Alexandre Passant, Owen Sacco, Julia Anaya, Pavan Kapanipathi. Privacy-By-
Design in Federated Social Web Applications, Websci 2011, Koblenz, Germany June 14-17,
2011.
• [ISEM 2010] Pablo Mendes, Pavan Kapanipathi, Alexandre Passant. Twarql: Tapping into the
Wisdom of the Crowd. Triplification Challenge 2010 at 6th International Conference on
Semantic Systems (I-SEMANTICS).
• [WI 2010] Pablo Mendes, Alexandre Passant, Pavan Kapanipathi, Amit Sheth. Linked Open
Social Signals.WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10),
• [WEBSCI 2010] Pablo Mendes, Pavan Kapanipathi, Delroy Cameron, Amit Sheth. Dynamic
Associative Relationships on the Linked Open Data Web. In Proceedings of the WebSci10:
Extending the Frontiers of Society On-Line
• [TR-CNF 2016] Pavan Kapanipathi, Krishnaprasad Thirunarayan, Fabrizio Orlandi, Amit Sheth,
Pascal Hitzler. A Real-Time #approach for Continuous Crawling of Events on Twitter by
Leveraging Wikipedia. Technical Report.
• [TR-JRNL 2016] Pavan Kapanipathi, Siva Kumar, Derek Doran, Prateek Jain, Chitra
Venkataramani, Amit Sheth. Hierarchical Knowledge Base enabled Twitter User Modeling and
Recommendation. (Journal).
• [TR-CNFC 2016] Siva Kumar, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit Sheth.
Exploring Taxonomical Interests for Entity Recommendations. Technical report, 2015.
• [TR-CNFC 2016] Sarasi Sarangi, Pavan Kapanipathi, Amit Sheth. Domain-specific Sub graph
Generation. Technical report, 2015.
62
Conclusion
References
• [1] How Do People Use Social Media for Business/Finance News?
http://blog.marketwired.com/2013/11/12/how-do-people-use-social-media-for-businessfinance-news/
• [2] What is the role of social media in healthcare?
http://worldofdtcmarketing.com/role-social-media-healthcare/social-media-and-healthcare/
• [3] Social media use during disaster management http://www.emergency-management-degree.org/crisis/
• [Tao 2012] Tao, K., Abel, F., Gao, Q., and Houben, G.-J. (2012a). TUMS: Twitter-based user
modeling service.
• [Ramage 2010] Ramage, D., Dumais, S., and Liebling, D. (2010). Characterizing microblogs with
topic models. AAAI ’10.
• [Yan 2012] Yan, R., Lapata, M., and Li, X. (2012). Tweet recommendation with graph co-ranking. In
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.
• [Duan 2010] Duan, Y., Jiang, L., Qin, T., Zhou, M., and Shum, H.-Y. (2010). An empirical study on
learning to rank of tweets. COLING ’10.
• [Cremonesi 2010] Cremonesi, P., Koren, Y., and Turrin, R. (2010). Performance of recommender
algorithms on top-n recommendation tasks. RecSys 2010.
• [Sriram 2010] Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H., and Demirbas, M. (2010).
Short text classification in Twitter to improve information filtering. SIGIR ’10.
• [Derczynsk 2013] Derczynski, L., Maynard, D., Aswani, N., and Bontcheva, K. (2013). Microblog-genre
noise and impact on semantic annotation accuracy. HT ’13.
• [Ferron 2011] Ferron, M. and Massa, P. (2011). Collective memory building in Wikipedia: the case
of North African uprisings. WikiSym 2011.
63
Acknowledgements
64
Funding Agencies
Internships and Collaborations
CITAR
Conclusion
Acknowledgements
65
Conclusion

 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Personalized and Adaptive Semantic Information Filtering for Social Media - Pavan Kapanipathi's Defense

  • 1. Personalized and Adaptive Semantic Information Filtering for Social Media
      Pavan Kapanipathi, PhD Candidate, Kno.e.sis Center, Wright State University
      Committee: Drs. Amit Sheth (Advisor), Krishnaprasad Thirunarayan, Derek Doran, and Prateek Jain
      Ohio Center of Excellence in Knowledge-Enabled Computing
  • 6. Information Consumption on Social Media [Introduction]
      • Updates of friends and acquaintances
      • News [1]
          – 86% of Twitter users surveyed
      • Medical information [2]
          – 1 in 3 use social media
      • Disaster management [3]
          – 20 million tweets on Hurricane Sandy
          – Most crisis management agencies monitor social media
  • 7. Information Overload on Social Media
      • Users often complain of being overwhelmed by the information on social media
      • 5 billion posts per day
          – Real-time information
      • 1000+ in my social network
      “...a wealth of information creates a poverty of attention...” (Herbert A. Simon)
  • 8. Need for Information Filtering
      • Scenario
          – Address information overload
          – An enormous data stream has to be filtered
      • Information filtering systems
          – Emails, news, and blogs
          – Functionality
              • Understand user interests
              • Deliver relevant information
  • 10. Traditional Information Filtering
      Diagram components: User Generated Content, User Interest Identification/User Modeling, Streaming Data, Filtering Module, Filtered Data
      Example labels: NBA, Basketball, Sports; Relevance: 0.9
      Source: Hanani, Uri, Bracha Shapira, and Peretz Shoval. "Information filtering: Overview of issues, research and systems." User Modeling and User-Adapted Interaction 11.3 (2001): 203-259.
  • 11. Challenges 1. Lack of Context
      • Lack of context for processing short text
          – Short text
              • The average length of social media posts (Facebook, Twitter, Google+, etc.) is 100-160 characters
              • Identifying topics from short text is important
                  – We can infer the author's interest and deliver the tweet to users interested in the topic
          – Traditional techniques have been shown not to perform well on social media [Sriram 2010, Derczynski 2013]
      Example tweet: "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game."
  • 12. Challenges 2. Continuously Changing Vocabulary
      • Social media is a real-time platform carrying information about the latest real-world activities
      • Hurricane Sandy
          – Mitigation, preparedness, recovery, and response phases
          – #Frankenstorm and #Sandy at the start, #StaySafe and #RedCross during the disaster, and #ThanksSandy and #RestoreTheShore after the hurricane
      • Indian elections
          – The announcement of prime ministerial candidates, issues regarding corruption, and polls in different states
          – #modikisarkar, #NaMo, #VoteForRG, and #CongBJPQuitIndia
      Figure labels: Civil Unrest, Election, Natural Disaster
  • 13. Challenges 3. Scalability
      • Practical aspects of the filtering system
      • The popularity of social media is increasing
          – Facebook has more than 1 billion users
          – Twitter has more than 500 million users
      • Information must be disseminated to a huge set of users
          – Centralized dissemination systems overload either the client (pull model) or the server (push model)
  • 14. Knowledge Bases
      • A common theme across the methodologies developed is the use of background knowledge and Semantic Web technologies
      • Knowledge bases provide the background knowledge needed to process short text
      “If a program is to perform a complex task well, it must know a great deal about the world in which it operates.” (Lenat & Feigenbaum)
      Example tweet: "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game."
      Figure labels: Baseball, Jason Heyward, Kris Bryant, Chicago Cubs, Sports
  • 15. Wikipedia as a Knowledge Base
      • Requirements for a knowledge base used to filter social data
          – Diversity and comprehensiveness: social media such as Twitter and Facebook have a large and diverse set of users
          – Real-time updates: social media is a real-time platform that discusses dynamic topics
      • Wikipedia as the knowledge base
          – Semi-structured: the structure can be extracted
          – Diverse: collaborative effort of 80,000 users with 5 million articles
          – Near real-time updates with unbiased views on topics [Ferron 2011]
  • 16. Thesis Statement
      To build an effective information filtering system, background knowledge and Semantic Web technologies can be used to address the lack of context, dynamically changing vocabulary, and scalability challenges introduced by social media’s short-text and real-time nature.
  • 17. Outline
      • Short text: lack of context for processing
          – Hierarchical Interest Graphs
          – Built a hierarchical context for tweets leveraging the Wikipedia category structure. This hierarchical context is used for user modeling and recommendations.
          – Publications [ESWC 2014, WWWCOMP 2014, TR-JRNL 2016]
      • Real-time and dynamic nature: continuously changing vocabulary
          – A novel methodology that uses the evolving Wikipedia hyperlink structure to detect topic-relevant hashtags for continuous filtering
          – Publications [TR-CNF 2016, ESWC 2015]
      • Popularity: scalability
          – Scalable distributed dissemination system that uses Semantic Web technologies
          – Publications [ISWC 2011, SPIM 2011, ISWCDEM 2011]
  • 19. Processing Short-text for User Interest Identification [Lack of context, ESWC 2014]
      • User-generated content is processed to understand user interests and to filter
          – Tweets are used for these experiments
      • The Wikipedia category structure comprises taxonomical information that can be leveraged
          – Build context for short text for user interest identification
      Example tweet (figure label: Baseball): "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game."
      “You are what you share” (Charles W. Leadbeater)
  • 20. Content-Based User Interest Identification from Social Data
      Techniques, ordered by increasing semantics:
          – Term frequency based techniques
          – Lower-dimensional space as latent semantics
          – Entity-based techniques (knowledge-enabled approaches)
      Cited work: [Tao 2012], [Ramage 2010], [Yan 2012]
      Example tweets:
          – "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game."
          – "Not sure who the Reds will look too replace Dusty.some very interesting jobs open (Cubs, Mariners, Reds, poss Yanks) Girardi the domino"
      Resulting profiles:
          – Term | Freq: great 1, day 1, sports 2, cubs 2, …
          – Dim | Dist: dim 1: 0.3, dim 2: 0.2, dim 3: 0.2, dim 4: 0.1, dim 5: 0.4
          – Wiki-Entities | Freq: Chicago Cubs 2, Cincinnati Reds 2, White Sox 1, New York Yankees 1, …
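
The difference between a term-frequency profile and an entity-frequency profile can be sketched in a few lines. This is only an illustration: the `SURFACE_TO_ENTITY` lookup is a hypothetical stand-in for a real entity linker, and counting each entity once per tweet is an assumption made here, not something the slide states.

```python
import re
from collections import Counter

TWEETS = [
    "Great day for Chicago sports as well as Cubs beat the Reds, "
    "Sox beat the Mariners with Humber's perfect game.",
    "Not sure who the Reds will look too replace Dusty.some very interesting "
    "jobs open (Cubs, Mariners, Reds, poss Yanks) Girardi the domino",
]

# Hypothetical surface-form lookup standing in for a real entity linker.
SURFACE_TO_ENTITY = {
    "cubs": "Chicago Cubs",
    "reds": "Cincinnati Reds",
    "sox": "Chicago White Sox",
    "mariners": "Seattle Mariners",
    "yanks": "New York Yankees",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def term_profile(tweets):
    """Term-frequency profile: count every token occurrence."""
    return Counter(tok for tweet in tweets for tok in tokenize(tweet))


def entity_profile(tweets):
    """Entity-frequency profile: count each linked entity once per tweet."""
    profile = Counter()
    for tweet in tweets:
        entities = {
            SURFACE_TO_ENTITY[tok]
            for tok in tokenize(tweet)
            if tok in SURFACE_TO_ENTITY
        }
        profile.update(entities)
    return profile


if __name__ == "__main__":
    print(term_profile(TWEETS).most_common(5))
    print(entity_profile(TWEETS).most_common())
```

The term profile treats "Cubs" and "Yanks" as unrelated strings, while the entity profile resolves them to Wikipedia entities that can later be placed in the category hierarchy.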
  • 21. Implicit Information from Social Data
      Example tweets: the two tweets from slide 20
      Broader and related interests (figure labels): Major League Baseball, Major League Baseball Teams, Baseball, Baseball Organizations, San Francisco Giants, Oakland Athletics
  • 22. Methodology: Structured Hierarchical Knowledge
      • Broader and related interests are taken from the Wikipedia category structure
      • The Wikipedia category structure was transformed into a Wikipedia Hierarchy
      Figure: the example tweets from slide 20 with the entities Seattle Mariners, White Sox, Cincinnati Reds, and Chicago Cubs (scores shown: 0.6, 1.0, 0.3, 0.3) and the categories Major League Baseball, Major League Baseball Teams, and Baseball
  • 23. Methodology: Scoring the Inferred Hierarchical Knowledge
      • Spreading activation is used to score the categories
      Figure: same entities and categories as slide 22, with category scores added (0.5, 0.4, 0.1)
  • 24. Designing an Activation Function
      • Design parameters adapt the function to the structure of the Wikipedia Hierarchy
          – Uneven distribution of nodes in the hierarchy
              • 16 hierarchical levels; most categories lie between levels 5 and 9
              • Raw normalization: $F_{n_i} = \frac{1}{nodes(i+1)}$
              • Log normalization: $FL_{n_i} = \frac{1}{\log_{10} nodes(i+1)}$
          – Many-to-many category-subcategory relationships
              • Boston Red Sox: Major League Baseball Teams; 1901 Establishments in Massachusetts
              • Preferential path constraint: $P_{ij} = \frac{1}{priority_{ji}}$
          – Boosting common ancestors
              • The more entities activate a concept, the greater its importance
              • Intersect booster: $B_i = \frac{N_{e_i}}{N_{e_{ic_{max}}}}$
  • 25. Activation Functions
      • Bell (raw normalization): $A_j = \sum_{i=0}^{n} A_i \times F_j$
      • Bell Log (log normalization): $A_j = \sum_{i=0}^{n} A_i \times FL_j$
      • Priority Intersect (log normalization, preferential path, intersect booster): $A_j = \sum_{i=0}^{n} A_i \times FL_j \times P_{ji} \times B_j$
      where i is the child node, j is the category, and $A_i$ is the activated value of i
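
A minimal sketch of the three activation functions for a single category node follows. The readings of the symbols are assumptions: `nodes_next_level` stands for nodes(i+1), `priority` for the path priority of a child link, and the two entity counts for the numerator and denominator of the intersect booster. Function and parameter names are hypothetical.

```python
import math


def raw_norm(nodes_next_level):
    """Raw normalization F = 1 / nodes(i+1)."""
    return 1.0 / nodes_next_level


def log_norm(nodes_next_level):
    """Log normalization FL = 1 / log10(nodes(i+1)); needs more than 1 node."""
    return 1.0 / math.log10(nodes_next_level)


def bell(child_activations, nodes_next_level):
    """Bell: sum of child activations, each scaled by the raw normalization."""
    factor = raw_norm(nodes_next_level)
    return sum(a * factor for a in child_activations)


def bell_log(child_activations, nodes_next_level):
    """Bell Log: same sum, scaled by the log normalization instead."""
    factor = log_norm(nodes_next_level)
    return sum(a * factor for a in child_activations)


def priority_intersect(children, nodes_next_level, n_entities, n_entities_max):
    """Priority Intersect for one category.

    children: list of (activation, priority) pairs, one per child link.
    n_entities / n_entities_max: assumed reading of the intersect booster B.
    """
    factor = log_norm(nodes_next_level)
    booster = n_entities / n_entities_max
    return sum(a * factor * (1.0 / priority) * booster for a, priority in children)


if __name__ == "__main__":
    print(bell([0.6, 1.0], 10))
    print(bell_log([0.6, 1.0], 100))
    print(priority_intersect([(0.6, 1), (1.0, 2)], 100, 2, 4))
```

Compared with Bell, the log normalization penalizes crowded levels far less, and Priority Intersect additionally discounts low-priority paths and boosts categories reached by many entities.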
  • 26. Hierarchical Interest Graph
      Figure: the entities and categories from slide 22 with their scores, produced by one of the activation functions (Bell, Bell Log, Priority Intersect)
  • 27. Hierarchical Interest Graph Evaluation – User Study
      Users | Tweets | Entities | Distinct Entities | Categories in HIG
      37 | 31,927 | 29,146 | 13,150 | 111,535
      Figure: tweets distribution
  • 28. Evaluation Results of Hierarchical Interests
      Graded precision (columns per grade: Bell / Bell Log / Priority Intersect)
      k | Relevant | Irrelevant | Maybe
      10 | 0.53 / 0.67 / 0.76 | 0.34 / 0.23 / 0.16 | 0.13 / 0.10 / 0.08
      20 | 0.54 / 0.66 / 0.72 | 0.34 / 0.22 / 0.19 | 0.12 / 0.12 / 0.09
      30 | 0.53 / 0.64 / 0.69 | 0.34 / 0.24 / 0.21 | 0.13 / 0.12 / 0.10
      40 | 0.52 / 0.61 / 0.68 | 0.35 / 0.26 / 0.22 | 0.13 / 0.13 / 0.10
      50 | 0.52 / 0.61 / 0.67 | 0.36 / 0.28 / 0.24 | 0.12 / 0.11 / 0.09
      Mean average precision
      k | Bell | Bell Log | Priority Intersect
      10 | 0.64 | 0.72 | 0.88
      20 | 0.61 | 0.70 | 0.82
      30 | 0.59 | 0.69 | 0.79
      40 | 0.58 | 0.68 | 0.77
      50 | 0.57 | 0.67 | 0.75
      Priority Intersect gives the highest Relevant precision and the highest mean average precision at every k.
  • 29. Implicit Interests Evaluation
      • Implicit interests are categories of interest that are not explicitly mentioned in tweets but are inferred from the knowledge base
      Example for Category: Major League Baseball
          – Explicit: "On this day in 1934, Major League Baseball announced it would host its first night games"
          – Implicit: "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game, Bulls win and Hawks stay alive"
  • 30. Summary: Hierarchical Interest Graphs
      • Addressed the “Lack of Context” challenge in tweets using a hierarchical knowledge base
          – More than 70% of hierarchical interests are implicit
      • A new way to represent Twitter user interests
          – Hierarchical Interest Graph with an interest score at each node
          – Activation functions (models) to determine interest scores
      What’s the use?
  • 31. HIG-based Tweet Recommendation Approach [Lack of context, TR-JRNL 2016]
      • Incoming tweet: Semantic Web: 0.2, World Wide Web: 0.09, Ontology: 0.7, Technology: 0.01, Semantic Search: 0.3
      • User's Hierarchical Interest Graph: World Wide Web: 0.9, Technology: 0.7, Sports: 0.6, Baseball: 0.4, India: 0.2, United States: 0.2, Semantic Web: 0.2
      • Pearson correlation between the two decides: recommend, Y/N?
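
The comparison on this slide can be sketched as a Pearson correlation over the union of categories in the two profiles. Treating a missing category as 0.0 and the `threshold` value are assumptions made for illustration; the slide does not say how either is handled.

```python
import math

# Scores copied from the slide.
TWEET = {"Semantic Web": 0.2, "World Wide Web": 0.09, "Ontology": 0.7,
         "Technology": 0.01, "Semantic Search": 0.3}
USER_HIG = {"World Wide Web": 0.9, "Technology": 0.7, "Sports": 0.6,
            "Baseball": 0.4, "India": 0.2, "United States": 0.2,
            "Semantic Web": 0.2}


def pearson(profile_a, profile_b):
    """Pearson correlation over the union of keys; missing keys count as 0.0."""
    keys = sorted(set(profile_a) | set(profile_b))
    if not keys:
        return 0.0
    xs = [profile_a.get(k, 0.0) for k in keys]
    ys = [profile_b.get(k, 0.0) for k in keys]
    mean_x = sum(xs) / len(keys)
    mean_y = sum(ys) / len(keys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def recommend(tweet_profile, user_profile, threshold=0.0):
    """Recommend when the correlation exceeds a (hypothetical) threshold."""
    return pearson(tweet_profile, user_profile) > threshold


if __name__ == "__main__":
    print(round(pearson(TWEET, USER_HIG), 3), recommend(TWEET, USER_HIG))
```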
  • 32. Content-based Tweet Recommendation Approaches
      • Term frequency based approaches
          – User profiles: built by scoring important terms
              • TF, TF-IDF
      • Entity frequency [Tao 2012]
          – User profiles: built by scoring important entities
              • Wikipedia entities
              • Extracted using Zemanta
      • Support Vector Machines (SVMrank) [Duan 2010]
          – User models built using content and tweet based features
          – Tweet content features: similarity to the user's tweets, similarity of hashtags, tweet length, mention of URLs, mention of hashtags
      • Latent Dirichlet Allocation [Ramage 2010]
          – User profiles: distribution over 5 latent topics
  • 33. Experimental Setup
      • Used the same dataset as the user study: 37 users, 31,927 tweets, 29,146 entities
      • Training and testing datasets were built under two assumptions
          – UGC assumption: tweets that users share are interesting to them and can be recommended
              • 80% to create user profiles
              • 20% (~6,000) to test recommendation
          – Retweet assumption (the more popular one in the literature): retweets of users are interesting to them and can be recommended
              • 30% (~9,000) were retweets, hence used to test recommendation
              • 70% to create user profiles
  • 34. Evaluation Methodology
      • Transformed into a top-N recommendation evaluation
          – Popular top-N evaluation methodology for precision/recall by Cremonesi et al. [Cremonesi 2010]
      • Methodology
          – For every test tweet, pick 1000 random tweets not tweeted/retweeted by the author of the test tweet
              • Random tweets are considered irrelevant to the user
          – Score and rank the test tweet together with the 1000 random tweets using the recommendation algorithm
              • TF, TF-IDF, Entity-based, SVMrank, LDA, and HIG
          – If the test tweet is within the top-N, it is considered a hit, otherwise not: $recall = \frac{hits}{T}$, where T is the total number of test tweets
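
The hit-counting procedure above can be sketched as follows. The `score` callable stands in for any of the recommendation algorithms; the tie-breaking rule (ties do not push the test tweet down) and the helper names are assumptions made for this illustration.

```python
import random


def sample_random_tweets(all_tweets, author, k, rng):
    """Pick up to k tweets not posted by `author`.

    all_tweets: list of (author, text) pairs. Retweet bookkeeping is omitted.
    """
    pool = [text for who, text in all_tweets if who != author]
    return rng.sample(pool, min(k, len(pool)))


def recall_at_n(test_cases, score, n):
    """recall = hits / T for a top-N cut-off.

    test_cases: iterable of (user_profile, test_tweet, random_tweets).
    score: callable (user_profile, tweet) -> float, higher means more relevant.
    """
    hits = 0
    total = 0
    for user_profile, test_tweet, random_tweets in test_cases:
        target = score(user_profile, test_tweet)
        # Rank is 1 plus the number of random tweets scored strictly higher.
        rank = 1 + sum(1 for t in random_tweets if score(user_profile, t) > target)
        if rank <= n:
            hits += 1
        total += 1
    return hits / total if total else 0.0


def overlap_score(user_terms, tweet):
    """Toy scorer: number of profile terms that appear in the tweet."""
    return len(user_terms & set(tweet.split()))


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [("ann", "cubs win"), ("bob", "stocks fall"), ("cat", "rain today")]
    negatives = sample_random_tweets(corpus, "ann", 1000, rng)
    cases = [({"baseball", "cubs"}, "cubs win", negatives)]
    print(recall_at_n(cases, overlap_score, n=1))
```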
  • 35. Retweet Assumption Evaluation Results
      • Term frequency performs the best for recommending retweets [Ramage 2010]
  • 36. UGC Assumption Evaluation Results
      • HIG performed better for most top-N, but at top-20 the TF-based approaches performed better
  • 37. Content + Knowledge based Approach
      • TF performed the best among the content-based approaches
      • TF and HIG are merged, which augments content with knowledge bases; recommendation uses Pearson correlation
      TF profile (raw → normalized): world: 3 → 0.075; great: 10 → 0.25; cricket: 24 → 0.6; slim: 13 → 0.325; good: 40 → 1; united: 34 → 0.85; states: 30 → 0.75
      HIG profile (raw → normalized): World Wide Web: 0.4 → 1; Technology: 0.007 → 0.017; Sports: 0.06 → 0.15; Baseball: 0.34 → 0.85; India: 0.102 → 0.25; United States: 0.2 → 0.5; Semantic Web: 0.2 → 0.5
      Merged profile: the normalized TF entries together with the normalized HIG entries
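
The numbers on this slide are consistent with dividing each profile by its own maximum before concatenating the two. A short sketch under that reading; prefixing keys with their source ("term" or "category") is an assumption added here to keep the two key spaces from colliding.

```python
# Raw profiles copied from the slide.
TF = {"world": 3, "great": 10, "cricket": 24, "slim": 13,
      "good": 40, "united": 34, "states": 30}
HIG = {"World Wide Web": 0.4, "Technology": 0.007, "Sports": 0.06,
       "Baseball": 0.34, "India": 0.102, "United States": 0.2,
       "Semantic Web": 0.2}


def max_normalize(profile):
    """Scale a profile so that its largest score becomes 1."""
    peak = max(profile.values(), default=0)
    if peak == 0:
        return dict(profile)
    return {key: value / peak for key, value in profile.items()}


def merge_profiles(tf_profile, hig_profile):
    """Normalize each profile separately, then combine them into one vector."""
    merged = {("term", k): v for k, v in max_normalize(tf_profile).items()}
    merged.update(
        {("category", k): v for k, v in max_normalize(hig_profile).items()}
    )
    return merged


if __name__ == "__main__":
    for key, value in merge_profiles(TF, HIG).items():
        print(key, round(value, 3))
```

Normalizing first keeps raw term counts such as 40 from dominating HIG scores that never exceed 1.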
  • 38. Retweet Assumption Evaluation Results
      • TF + HIG performs the best and provides an improvement of more than 40% at top-20
  • 39. UGC Assumption Evaluation Results
      • TF + HIG performs the best and provides an improvement of more than 20% at top-20
  • 40. Summary: Hierarchical Interest Graphs
      • A new way to represent Twitter user interests: Hierarchical Interest Graphs
      • Addressed the “Lack of Context” challenge in tweets using a hierarchical knowledge base
      • HIG (knowledge base) augments content to provide superior performance for tweet recommendation
  • 41. Outline
      • Short text: lack of context for processing
          – Augmented content with hierarchical knowledge from Wikipedia
              • 70% of the top-50 interests were implicit (not mentioned in users’ tweets)
              • Improved content-based tweet recommendation by more than 40%
      • Real-time and dynamic nature: continuously changing vocabulary
          – A novel methodology that uses the evolving Wikipedia hyperlink structure to update filters for streaming topic-relevant information
      • Popularity: scalability
          – Scalable distributed dissemination system that uses Semantic Web technologies
  • 43. Social Media: Real-time and Dynamic Platform [Dynamic vocabulary, TR-CNF 2016]
      • Dynamic topics of interest evolve continuously over time
          – Indian elections
              • The announcement of prime ministerial candidates, issues regarding corruption, and polls in different states
          – Hurricane Sandy
              • Mitigation, preparedness, recovery, and response phases
      Figure labels: Indian Election, Hurricane Sandy
  • 44. Filtering Dynamic Topics on Social Media
      • Keyword-based filtering
          – Twitter streaming API
      • Keywords change dynamically with what happens in the real world
          – These keywords must be tracked to stay up to date on the topic of interest
      Examples:
          – #indianelection: #modikisarkar, #NaMo, #VoteForRG, and #CongBJPQuitIndia
          – #sandy: #Frankenstorm, #Sandy, #RedCross, #RestoreTheShore
  • 45. Hindsight Analysis of Topic-relevant Hashtags
      • Analysis of over 6 million tweets on two topics: (1) Colorado Shooting and (2) Occupy Wall Street
      • The topic-relevant hashtags that can be used to crawl all the tweets co-occur with each other
      • Fewer than 1% of the topic-relevant hashtags can crawl up to 85% of the tweets
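
The two quantities behind this analysis, hashtags that co-occur with a seed hashtag and the share of tweets a set of hashtags would crawl, can be sketched as below. Tweets are reduced to sets of hashtags, and the function names and toy data are hypothetical.

```python
from collections import Counter


def cooccurring_hashtags(tweets, seed, top_k):
    """Return the top_k hashtags that appear most often alongside `seed`.

    tweets: list of sets of hashtags, one set per tweet.
    """
    counts = Counter()
    for tags in tweets:
        if seed in tags:
            counts.update(tag for tag in tags if tag != seed)
    return [tag for tag, _ in counts.most_common(top_k)]


def coverage(tweets, filter_tags):
    """Fraction of tweets that contain at least one of the filter hashtags."""
    if not tweets:
        return 0.0
    matched = sum(1 for tags in tweets if tags & filter_tags)
    return matched / len(tweets)


if __name__ == "__main__":
    toy = [
        {"#sandy", "#frankenstorm"},
        {"#sandy", "#frankenstorm"},
        {"#sandy", "#redcross"},
        {"#frankenstorm"},
        {"#nyc"},
    ]
    extra = cooccurring_hashtags(toy, "#sandy", top_k=1)
    print(extra, coverage(toy, {"#sandy"}), coverage(toy, {"#sandy", *extra}))
```

Adding the strongest co-occurring hashtag to the filter raises coverage in the toy data, which is the effect the hindsight analysis measures at scale.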
  • 46. Approach for Detecting Topic-Relevant Hashtags
      Pipeline (figure labels):
          – Manually started filter: #indianelection2014
          – Co-occurring hashtags above a threshold δ, e.g. #modikisarkar
          – Dynamically updated background knowledge: the Wikipedia topic page Indian_General_Election,_2014 and the pages one hop from it; the hyperlink structure is extracted and periodically updated
          – Entity scoring based on relevance to the event: Indian General Elec: 1.0, India: 0.9, Elections: 0.7, UPA: 0.6, BJP: 0.3, NDA: 0.3, Narendra Modi: 0.3
          – Entity extraction and normalized frequency scoring over the latest K (200, 500) tweets: Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
          – Similarity check between the two scored entity sets
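
The similarity check can be illustrated with the two scored entity sets shown on the slide. Cosine and weighted Jaccard are standard symmetric measures. The `subsumption` function is a hypothetical asymmetric stand-in chosen for this sketch; the slides name a subsumption measure but do not give its formula.

```python
import math

# Scores copied from the slide ("Indian General Elec" spelled out).
TOPIC = {"Indian General Election": 1.0, "India": 0.9, "Elections": 0.7,
         "UPA": 0.6, "BJP": 0.3, "NDA": 0.3, "Narendra Modi": 0.3}
HASHTAG = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
           "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_jaccard(a, b):
    """Weighted Jaccard: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0


def subsumption(hashtag, topic):
    """Hypothetical asymmetric measure: how much of the hashtag's weight
    falls on topic entities, weighted by their relevance to the topic."""
    total = sum(hashtag.values())
    if total == 0:
        return 0.0
    return sum(w * topic.get(k, 0.0) for k, w in hashtag.items()) / total


if __name__ == "__main__":
    print(round(cosine(HASHTAG, TOPIC), 3))
    print(round(weighted_jaccard(HASHTAG, TOPIC), 3))
    print(round(subsumption(HASHTAG, TOPIC), 3), round(subsumption(TOPIC, HASHTAG), 3))
```

An asymmetric measure asks whether the hashtag is covered by the topic rather than whether the two look alike, so a narrow hashtag is not penalized for ignoring most of a broad topic.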
  • 47. Evaluation
      • Dataset: 2 dynamic topics
          – 2012 U S Presidential Elections
          – Hurricane Sandy
      • δ: top 25 co-occurring hashtags
      • Manual annotation for relevance
      Seed data
      Event | Tag | Tweets | Co-occ Tags (Distinct) | Wiki Entities
      US Elections 2012 | #election2012 | 4,855 | 12,361 (1,460) | 614
      Hurricane Sandy | #sandy | 4,818 | 6,592 (837) | 419
      Annotated data
      Event | Tags | Tweets (Distinct) | Relevant | Irrelevant | Tweet Entities (Distinct)
      US Elections 2012 | 25 | 11,504 (10,084) | 7,086 | 2,998 | 27,558 (4,255)
      Hurricane Sandy | 25 | 4,905 (4,850) | 2,691 | 2,159 | 10,719 (2,359)
      Total | 50 | 15,409 (14,934) | 9,777 | | 38,219
  • 48. Evaluation Results
      NDCG (columns per topic: Subsumption / Cosine / Jaccard / Co-occurrence)
      Metric | Hurricane Sandy | 2012 U S Presidential Elections
      NDCG@10 | 0.93 / 0.86 / 0.85 / 0.65 | 0.91 / 0.85 / 0.89 / 0.83
      NDCG@20 | 0.97 / 0.93 / 0.92 / 0.89 | 0.98 / 0.95 / 0.97 / 0.94
      Figure panels: NDCG, MAP
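
For reference, NDCG at a cut-off k can be computed as in the sketch below. It uses the common rel / log2(rank + 1) discount; which DCG variant the evaluation actually used is not stated on the slide, so treat this as an assumption.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items of a ranked list."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # 1 = hashtag judged relevant, 0 = irrelevant, in ranked order.
    ranked_judgements = [1, 1, 0, 1, 0]
    print(round(ndcg_at_k(ranked_judgements, 5), 3))
```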
  • 49. Dynamic Hashtag Filter
      • Hashtag analysis
          – The co-occurrence technique can be used to detect event-relevant hashtags
          – More popular hashtags are easier to detect via co-occurrence
      • Continuously changing vocabulary for dynamic topics and coverage
          – Wikipedia as a dynamic knowledge base for events
          – Relevant hashtags are determined using an asymmetric similarity measure
          – More hashtags in turn increase the coverage of tweets for events
      • Content-based location prediction of Twitter users (ESWC 2015)
          – A similar relevancy-detection framework was used for location prediction
  • 50. Outline
      • Short text: lack of context for processing
          – Augmented content with hierarchical knowledge from Wikipedia
              • 70% of the top-50 interests were implicit (not mentioned in users’ tweets)
              • Improved content-based tweet recommendation by more than 40%
      • Real-time and dynamic nature: continuously changing vocabulary
          – Hindsight analysis insight: co-occurrence can be used as a starting point
          – Used Wikipedia as an evolving knowledge base for dynamic topics
              • With the top-5 hashtags detected, coverage increased by more than 3,500 tweets instantly, with a mean average precision of 0.92
      • Popularity: scalability
          – Scalable distributed dissemination system that uses Semantic Web technologies
  • 52. Content Dissemination [Scalability, ISWC 2011]
      • Centralized content dissemination suffers from scalability issues
          – Either the server (publisher) or the client (subscriber) is overwhelmed
          – The server in the push model, the client in the pull model
      • Distributed dissemination protocol
          – PubSubHubbub
              • Introduced by Google in 2009
              • 117 million users and 5.5 billion posts broadcast by 2011
  • 53. PubSubHubbub
      • Simple, open, web-hook based pubsub protocol
      • Extension to RSS and Atom
      Message flow (figure):
          – Publisher to Hub: "I have new content for feed X"
          – Hub to Publisher: "Give me the latest content for feed X"
          – Publisher to Hub: "Here it is"
          – Hub to each Subscriber: "Here is the latest content for feed X"
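
The message flow on this slide can be mimicked with a small in-memory hub. This is a toy model for illustration only: the real protocol exchanges HTTP requests with callback URLs, and the class and method names here are hypothetical.

```python
from collections import defaultdict


class Hub:
    """Toy stand-in for a PubSubHubbub hub (no HTTP, callbacks are functions)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, feed, callback):
        """Register a subscriber callback for a feed."""
        self._subscribers[feed].append(callback)

    def notify(self, feed, fetch_latest):
        """Handle a publisher ping.

        The publisher only says "I have new content for feed X"; the hub then
        fetches the content once and pushes it to every subscriber.
        Returns the number of subscribers that were notified.
        """
        content = fetch_latest(feed)
        callbacks = self._subscribers.get(feed, [])
        for callback in callbacks:
            callback(feed, content)
        return len(callbacks)


if __name__ == "__main__":
    hub = Hub()
    inbox = []
    hub.subscribe("feed-x", lambda feed, content: inbox.append((feed, content)))
    delivered = hub.notify("feed-x", lambda feed: "latest post on " + feed)
    print(delivered, inbox)
```

The publisher serves its content once per update regardless of the number of subscribers, which is where the load shifts from the publisher to the hub.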
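  • 54. PubSubHubbub Protocol Extension: Semantic Hub
      Message flow (figure), with publisher Pub and subscribers Sub A, Sub B, Sub C, and Sub D:
          – Pub to Semantic Hub: "Hey, I have new content for feed topics/preference"
          – Semantic Hub to Pub: "Give me the new content"
          – Pub to Semantic Hub: "Here it is"
          – Semantic Hub, against the Social Graph and User Profiles: "Get the subscribers of Pub whose profile matches topic/preference"
          – Semantic Hub to the matching subscribers: "Here is the new content of feed X"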
  • 55. Publisher – Social Data Annotation
      • Preliminary processing of text for filtering
          – Information extraction (entities, hashtags, URLs, etc.)
      • Representation as RDF using the vocabulary used by SMOB
          – Comprises
              • SPARQL queries representing the subset of subscribers from the Social Graph in the hub
      Example annotation:
          <http://twitter.com/rob/statuses/123456789> rdf:type sioct:MicroblogPost ;
              sioc:content "Great day for Chicago sports as well as Cubs beat the Reds, Sox beat the Mariners with Humber’s perfect game #chicago" ;
              sioc:has_creator <http://example.com/rob> ;
              moat:taggedWith dbpedia:Chicago ;
              moat:taggedWith dbpedia:Chicago_Cubs ;
              moat:taggedWith dbpedia:Cincinnati_Reds ;
              sioc:topic <http://example.com/tags/chicago> .
  • 56. Semantic Hub
      • Matches the processed post against user profiles
          – Flexible with respect to matching techniques
              • Pearson correlation or other similarity measures
      • Delivers information to the relevant subscribers
      Example query:
          SELECT ?user WHERE {
              { ?user foaf:interest dbpedia:Chicago }
              UNION { ?user foaf:interest dbpedia:Chicago_Cubs }
              UNION { ?user foaf:interest dbpedia:Cincinnati_Reds }
          }
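
A query of this shape can be assembled from the entities a post is tagged with. The sketch below only builds the query string and, as a simplified stand-in for running it against the hub's social graph, matches the same entities against in-memory profiles. The function names and the profile layout are hypothetical.

```python
def build_subscriber_query(entities):
    """Build a SPARQL UNION query selecting users interested in any entity.

    entities: prefixed names such as "dbpedia:Chicago_Cubs"; the prefixes are
    assumed to be declared wherever the query is executed.
    """
    if not entities:
        raise ValueError("at least one entity is required")
    patterns = ["{ ?user foaf:interest %s }" % entity for entity in entities]
    return "SELECT ?user WHERE { " + " UNION ".join(patterns) + " }"


def matching_subscribers(profiles, entities):
    """In-memory stand-in for the query: users sharing at least one interest.

    profiles: dict mapping a user id to an iterable of interest names.
    """
    wanted = set(entities)
    return sorted(
        user for user, interests in profiles.items() if wanted & set(interests)
    )


if __name__ == "__main__":
    tagged = ["dbpedia:Chicago", "dbpedia:Chicago_Cubs", "dbpedia:Cincinnati_Reds"]
    print(build_subscriber_query(tagged))
    print(matching_subscribers(
        {"alice": ["dbpedia:Chicago_Cubs"], "bob": ["dbpedia:Seattle"]}, tagged
    ))
```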
  • 57. Semantic Hub: Conclusion
      • Framework for distributed dissemination of content using PubSubHubbub
          – The hub takes on the load of the filtering module and of content dissemination
      • PubSubHubbub
          – 117 million subscriptions by 2011
          – 5.5 billion unique feeds by 2011
      • Semantic Hub
          – Privacy-aware dissemination for distributed social networks
          – Real-time filtering
  • 58. Thesis Conclusion [Conclusion]
      • To build an effective information filtering system, background knowledge and Semantic Web technologies can be used to address the lack of context, dynamically changing vocabulary, and scalability challenges introduced by social media’s short-text and real-time nature.
          – Augmented content with hierarchical knowledge from Wikipedia to improve the context of short text
              • 70% of the top-50 interests were implicit (not mentioned in users’ tweets)
              • Improved content-based tweet recommendation by more than 40%
          – Used Wikipedia as an evolving knowledge base for dynamic topics to detect topic descriptors for filtering
              • Hindsight analysis insight: co-occurrence can be used as a starting point
              • With the top-5 hashtags detected, coverage increased by more than 3,500 tweets instantly, with a mean average precision of 0.92
          – Extended PubSubHubbub, a distributed content dissemination protocol, with Semantic Web technologies for filtering and dissemination
  • 59. Graduate Journey
      • Hierarchical Interest Graphs
          – Internship work: IBM TJ Watson Research Center, 2013
      • Location prediction of Twitter users
          – Alleviates the dependence on training data
      • Determining Twitter user hobbies
          – Internship work: Samsung Research America, 2014 (patent pending)
      • Tweet filtering and recommendation
          – Addressing the problem of dynamic topic drift
  • 60. Graduate Journey
      • Research internships
          – 2011: DERI, Ireland (ISWC 2011, SPIM 2011, WebSci 2011)
          – 2013: IBM TJ Watson Research Center (WWWCOMP 2014, ESWC 2014)
          – 2014: Samsung Research America (patent pending)
      • Invited talks
          – IBM TJ Watson Research Center, Frontiers of Cloud Computing and Big Data Workshop
          – EMC CTO Office, Bangalore, Invited Speaker Series
          – WSU Advisory Board
      • Proposals and projects
          – Twitris: NSF Commercialization
          – Ohio State University: NSF Hazards SEES ($2M)
          – CITAR (Epidemiology): NIH EdrugTrends ($1.6M)
      • Development of research systems
          – Twarql: a semantic tweet filtering system
              • Winner of the Triplification Challenge (ISem2010)
          – Scalable content dissemination on distributed social networks (ISWC 2011)
          – Twitris: a social semantic web for analyzing events
      Figure labels: Collaborations, CITAR
  • 61. Publications
      • [NOISE 2015] Raghava Mutharaju and Pavan Kapanipathi. Are We Really Standing on the Shoulders of Giants? 1st Workshop on Negative or Inconclusive Results in Semantic Web, ESWC 2015.
      • [KNOW 2015] Siva Kumar Chekula, Pavan Kapanipathi, Derek Doran, Amit Sheth. Entity Recommendations Using Hierarchical Knowledge Bases. 4th International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, 2015.
      • [ESWC 2015] Pavan Kapanipathi, Revathy Krishnamurthy (joint first author), Amit Sheth, Krishnaprasad Thirunarayan. Knowledge Enabled Approach to Predict the Location of Twitter Users. Extended Semantic Web Conference, 2015. (Acceptance rate 23%)
      • [ESWC 2014] Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth. User Interests Identification on Twitter Using a Hierarchical Knowledge Base. Extended Semantic Web Conference 2014, Crete, Greece. (Acceptance rate 23%)
      • [WWWComp 2014] Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth. Hierarchical Interest Graph from Twitter. 23rd International Conference on World Wide Web Companion (WWW Companion 2014), Seoul, South Korea.
      • [WI 2013] Fabrizio Orlandi, Pavan Kapanipathi, Alexandre Passant, Amit Sheth. Characterising concepts of interest leveraging Linked Data and the Social Web. The 2013 IEEE/WIC/ACM International Conference on Web Intelligence, Atlanta, USA, 2013.
      • [SPIM 2011] Pavan Kapanipathi, Fabrizio Orlandi, Amit Sheth, Alexandre Passant. Personalized Filtering of the Twitter Stream. 2nd Workshop on Semantic Personalized Information Management at ISWC 2011, September 2011.
      • [ISWC 2011] Pavan Kapanipathi, Julia Anaya, Amit Sheth, Brett Slatkin, Alexandre Passant. Privacy-Aware and Scalable Content Dissemination in Distributed Social Network. 10th International Semantic Web Conference 2011, Bonn, Germany, September 2011. (Acceptance rate 22%)
  • 62. Publications
      • [ISWCDEM 2011] Pavan Kapanipathi, Julia Anaya, Alexandre Passant. SemPuSH: Privacy-Aware and Scalable Broadcasting for Semantic Microblogging. 10th International Semantic Web Conference 2011.
      • [FSWE 2011] Pavan Kapanipathi. SMOB: The Best of Both Worlds. Federated Social Web Europe Conference, Berlin, June 3-5, 2011.
      • [WEBSCI 2011] Alexandre Passant, Owen Sacco, Julia Anaya, Pavan Kapanipathi. Privacy-By-Design in Federated Social Web Applications. WebSci 2011, Koblenz, Germany, June 14-17, 2011.
      • [ISEM 2010] Pablo Mendes, Pavan Kapanipathi, Alexandre Passant. Twarql: Tapping into the Wisdom of the Crowd. Triplification Challenge 2010 at the 6th International Conference on Semantic Systems (I-SEMANTICS).
      • [WI 2010] Pablo Mendes, Alexandre Passant, Pavan Kapanipathi, Amit Sheth. Linked Open Social Signals. IEEE/WIC/ACM International Conference on Web Intelligence (WI-10).
      • [WEBSCI 2010] Pablo Mendes, Pavan Kapanipathi, Delroy Cameron, Amit Sheth. Dynamic Associative Relationships on the Linked Open Data Web. In Proceedings of the WebSci10: Extending the Frontiers of Society On-Line.
      • [TR-CNF 2016] Pavan Kapanipathi, Krishnaprasad Thirunarayan, Fabrizio Orlandi, Amit Sheth, Pascal Hitzler. A Real-Time #approach for Continuous Crawling of Events on Twitter by Leveraging Wikipedia. Technical report.
      • [TR-JRNL 2016] Pavan Kapanipathi, Siva Kumar, Derek Doran, Prateek Jain, Chitra Venkataramani, Amit Sheth. Hierarchical Knowledge Base enabled Twitter User Modeling and Recommendation. (Journal)
      • [TR-CNFC 2016] Siva Kumar, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit Sheth. Exploring Taxonomical Interests for Entity Recommendations. Technical report, 2015.
      • [TR-CNFC 2016] Sarasi Sarangi, Pavan Kapanipathi, Amit Sheth. Domain-specific Sub graph Generation. Technical report, 2015.
  • 63. References
      • [1] How Do People Use Social Media for Business/Finance News? http://blog.marketwired.com/2013/11/12/how-do-people-use-social-media-for-businessfinance-news/
      • [2] What is the role of social media in healthcare? http://worldofdtcmarketing.com/role-social-media-healthcare/social-media-and-healthcare/
      • [3] Social media use during disaster management. http://www.emergency-management-degree.org/crisis/
      • [Tao 2012] Tao, K., Abel, F., Gao, Q., and Houben, G.-J. (2012a). TUMS: Twitter-based user modeling service.
      • [Ramage 2010] Ramage, D., Dumais, S., and Liebling, D. (2010). Characterizing microblogs with topic models. AAAI ’10.
      • [Yan 2012] Yan, R., Lapata, M., and Li, X. (2012). Tweet recommendation with graph co-ranking. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.
      • [Duan 2010] Duan, Y., Jiang, L., Qin, T., Zhou, M., and Shum, H.-Y. (2010). An empirical study on learning to rank of tweets. COLING ’10.
      • [Cremonesi 2010] Cremonesi, P., Koren, Y., and Turrin, R. (2010). Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010.
      • [Sriram 2010] Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H., and Demirbas, M. (2010). Short text classification in Twitter to improve information filtering. SIGIR ’10.
      • [Derczynski 2013] Derczynski, L., Maynard, D., Aswani, N., and Bontcheva, K. (2013). Microblog-genre noise and impact on semantic annotation accuracy. HT ’13.
      • [Ferron 2011] Ferron, M. and Massa, P. (2011). Collective memory building in Wikipedia: the case of North African uprisings. WikiSym 2011.
  • 64. Acknowledgements
      Figure labels: Funding Agencies; Internships and Collaborations; CITAR