Citizen sensor data mining,
social media analytics and applications
Singapore Symposium
on Sentiment Analysis (S3A), Feb 6, 2015
Amit Sheth
Kno.e.sis: Ohio Center of Excellence
in Knowledge-enabled Computing
@ Wright State University
Acknowledgements
Significant components of this talk are from the tutorial I gave at WWW2011:
“Citizen Sensor Data Mining, Social Media Analytics and Development
Centric Web Applications,” with Meena Nagarajan and Selvam Velmurugan.
Contributors to Twitris and/or Semantic Social Web Research @ Kno.e.sis:
L. Chen, H. Purohit, W. Wang,
with P. Anantharam, A. Jadhav, P. Kapanipathi, Dr. T.K. Prasad,
and alumni (K. Gomadam, M. Nagarajan, A. Ranabahu)
Funding: NSF, AFRL, NIH; Collaborations: IBM, Microsoft
3
Ohio Center of Excellence in Knowledge-
enabled Computing
• Among the top 10 universities in the world in World Wide Web research (cf.
10-yr impact, Microsoft Academic Search)
• Largest academic group in the US in Semantic Web + Social/Sensor
Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical &
Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM
Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research
universities, NLM, startups )
• 80+ researchers including 15 world-class faculty (>3K citations/faculty)
and 45+ PhD students, practically all funded
• $2M+/yr research for largely multidisciplinary projects; world class
resources; industry sponsorships/collaborations (Google, IBM, …)
4
5
Social Media Landscape
6
Data for mid-2012
http://www.mediabistro.com/alltwitter/social-media-stats-2014_b54243
Never before has humanity been so connected
• Mumbai Terror
Attack
• Iran Election
2009
• Haiti Earthquake
2010
• Occupy Wall
Street
• Kashmir Floods
2014
Citizen Sensors in Action
7
Image: http://huff.to/hp0OhA
• Ghonim, who has been a figurehead for the movement
against the Egyptian government, told Blitzer “If you
want to liberate a society, just give them the internet.”
• Egyptian anti-government
demonstrator sleeps on the pavement
under spray paint that reads 'Al-
Jazeera' and 'Facebook' at Cairo's
Tahrir square on February 7, 2011.
http://www.cbsnews.com/stories/2011/02
/15/eveningnews/main20032118.shtml
Revolution 2.0
Political/Social Activism
8
• When Blitzer asked “Tunisia, then Egypt, what’s next?,”
Ghonim replied succinctly “Ask Facebook.”
http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn
http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
Citizen Journalism
9
Twitter Journalism
Images: http://bit.ly/9GVfPQ,
http://bit.ly/hmrTYV
• Social News
• Social Media and
Global Media are
intertwined.
News is increasingly Social
10
11
Some of the significant human, social & economic
development applications we work on at Kno.e.sis
• Coordination during disasters (Qatar Computing Research
Institute, Microsoft Research NYC)
• Harassment on social media (WSU cognitive scientists)
• Prescription drug abuse, Cannabis & Synthetic
Cannabinoid epidemiology (Center for Interventions, Treatment
and Addictions Research, ….)
• Depressive disorders (Mayo Clinic)
• Gender-based violence (United Nations)
Highly multidisciplinary team efforts, often with significant
partners, with real world data, intended to achieve real-
world impact
12
Sample of Real-World Impact & Media Coverage
• Twitter Data Mining Reveals America's Religious Fault Lines,
MIT Technology Review, Oct 6, 2014
• Digital soldiers emerge heroes in Kashmir flood rescue,
HindustanTimes, September 25, 2014
• India's social media election battle, BBC News, Mar 30, 2014
• #Cursing Study: 10 Lessons About How We Use Swear Words on
Twitter, Time.com, Feb 19, 2014
• Twitris: Taking Crisis Mapping to the Next Level, Tech President,
June 24, 2013
• Picking the President: Twindex, Twitris Track Social Media
Electorate, Semanticweb.com, Aug 3, 2012
• Web App Analyzes Tweets in Real Time for a Record of Historic
Events, Mashable.com, Feb 17, 2012
13
TWITRIS’ Technical Approach to
Understand & Analyze Social Content
Social Data is
incredibly rich
14
Some of the topics on Online Social Media
we research at Kno.e.sis
1. Named Entity Recognition
2. Language usage in Social Media
3. Exploration of People, Content and Network dynamics
4. Sentiment, Emotion and Opinion mining
5. Trust
6. Integrated exploitation of Sensor (physical), Web (Cyber)
and Social data for PCS applications
7. TWITRIS: A System for Mining Collective Intelligence
from Citizen-Sensor Data
• "Who says what, to whom,
why, to what extent and with what effect?" [Lasswell]
• Network: Social structure emerges
from the aggregate of relationships (ties)
• People: poster identities, the active effort of
accomplishing interaction
• Content : studying the content of communication
Social Information
Processing
15
Why People-Content-Network +
Spatial-Temporal-Thematic metadata?
(Example of Understanding Crisis Data)
16
, Offer help, etc.
• Explicit information from user profiles
– User Names, Pictures, Videos, Links, Demographic Information,
Group memberships...
• Implicit information from user attention metadata
– Page views, Facebook 'Likes', Comments; Twitter 'Follows',
Retweets, Replies..
People Metadata:
Variety of Self-expression Modes
on Multiple Social Media Platforms
17
People Metadata: Various Types
Identification
Structural Network
Activity
Interests
18
People Metadata: Continued
User Identification Metadata
• User-id
• Screen/Display-name of user
• Real name of user
• Location
• Profile Creation Date
• User description
- Biodata of the user
- Link to webpage of the user
Interest Metadata
• Author type
- Trustee/donor, journalist, blogger,
scientist etc.
• Favorite tweets
• Types of lists subscribed
• Style of Writing (personality
indicator)
• No. of Followees
• Majority of author type of
Followees
19
People Metadata: Continued
Web Presence:
- User affiliations
- Influence Metric – e.g., KLOUT (www.klout.com)
Activity Metadata
• Age of the profile
• Frequency of posts
• Timestamp of last status
• No. of Posts
• No. of Lists/groups created
• No. of Lists/groups subscribed
Influence Metadata
(Inferring People Metadata from Network level Information)
• No. of Followers – normal, influential
• No. of Mentions
• No. of Retweets/Forwards
• No. of Replies
• No. of Lists/groups following
• No. of people following back
• Authority & Hub Scores
20
Content Metadata:
Content Dependent (Tweet)
23
Direct Content-based Metadata
Indirect content-based metadata (External metadata)
Direct Content-based Metadata
Content Metadata:
Content Dependent (SMS)
24
Connections/Relationships matter! (foundation for the network)
Network Metadata
25
Structure Metadata
• Community Size
• Community growth rate
• Largest Strongly Connected
Component size
• Weakly Connected Components &
Max(WCC) size
• Average Degree of Separation
• Clustering Coefficient
Relationship Metadata
• Type of Relationship
• Relationship strength
• User Homophily (based on certain
characteristic such as location,
interest etc.)
• Reciprocity: mutual relationship
• Active Community/ Ties
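Two of the metadata items listed above, reciprocity and the clustering coefficient, can be computed directly from a follower edge list. The sketch below is a minimal illustration; the `follows` edge list and the function names are made up for the example and are not taken from Twitris.

```python
from collections import defaultdict


def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def average_clustering(edges):
    """Average local clustering coefficient, treating edges as undirected."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    coefficients = []
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        nbr_list = sorted(nbrs)
        # Count links that exist among this node's neighbors.
        links = sum(
            1
            for i, a in enumerate(nbr_list)
            for b in nbr_list[i + 1:]
            if b in neighbors[a]
        )
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


# Hypothetical "follows" relationships: (follower, followee)
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")]
print(reciprocity(follows))
print(average_clustering(follows))
```

Here only the a/b pair follows each other, so half of the directed edges are reciprocated, while the three users form a closed triangle when direction is ignored.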
Metadata Creation & Extraction
Length: 109 characters
General topic: Egypt protest
This poor {sentiment_expression: {target: “Lara Logan”,
polarity: “negative”}} woman! RT @THR CBS News‘
{entity:{type=“News Agency”}} Lara Logan
{entity:{type=“Person”}} Released From Hospital
{entity:{type=“Hospital”}} After Egypt
{entity:{type=“Country”}} Assault {topic} http://bit.ly/dKWTY0
{external_URL}
26
Metadata Extraction from
Informal Text
Meena Nagarajan, ‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
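The annotated tweet above can be thought of as a metadata record. A rough sketch of producing part of such a record follows, assuming a tiny hand-written gazetteer in place of a real entity recognizer; `GAZETTEER` and `extract_metadata` are hypothetical names, and sentiment and topic annotation are left out.

```python
import re

tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

# Hypothetical gazetteer; a real system would use a trained recognizer
# or a domain model instead of a fixed dictionary.
GAZETTEER = {
    "CBS News": "News Agency",
    "Lara Logan": "Person",
    "Egypt": "Country",
}


def extract_metadata(text):
    """Collect direct content-based metadata from a single tweet."""
    entities = [
        {"text": name, "type": etype, "start": text.find(name)}
        for name, etype in GAZETTEER.items()
        if name in text
    ]
    return {
        "length": len(text),
        "is_retweet": bool(re.search(r"\bRT\b", text)),
        "mentions": re.findall(r"@(\w+)", text),
        "external_urls": re.findall(r"https?://\S+", text),
        "entities": entities,
    }


metadata = extract_metadata(tweet)
print(metadata)
```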
Content Analysis: Typical Sub-tasks
• Recognize key entities mentioned in content
– Information Extraction (entity recognition, anaphora resolution, entity
classification..)
– Discovery of Semantic Associations between entities
• Topic Classification, Aboutness of content
– What is the content about?
• Intention Analysis
– Why did they share this content?
28
• Sentiment Analysis
– What opinions are people conveying via the content?
• Author Profiling
– What can we infer about the author from the content he posts?
• Context (external to content) extraction
– URL extraction, analyzing external content
• Named Entity Recognition
– I loved <movie> the hangover </movie>!
• Key Phrase Extraction
29
NER, Key Phrase Extraction
Named Entity Recognition
“I loved your music Yesterday!”
Yesterday is an album
“It was THE HANGOVER of the year..lasted forever..”
The Hangover is not a movie
“So I went to the movies..badchoice picking “GI
Jane” worse now”
GI Jane is a movie
30
Task of NER : Identifying and classifying tokens
Analysing the Content can be Hard…
Using a domain model (E.g., MusicBrainz)
Using context cues from the content
• e.g. new Merry Christmas tune
Reduce potential entity spot size (with restrictions)
• e.g. new albums/songs
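The combination of a domain model and context cues can be sketched as follows. This is only an illustration: the three-entry `DOMAIN_MODEL` stands in for a resource such as MusicBrainz, and the cue words are invented for the example.

```python
import re

# Hypothetical mini domain model: title -> entity type
DOMAIN_MODEL = {
    "yesterday": "album",
    "the hangover": "movie",
    "gi jane": "movie",
}

# Hypothetical context cues that must co-occur with a spotted title
CONTEXT_CUES = {
    "album": {"music", "song", "songs", "album", "tune", "listen"},
    "movie": {"movie", "movies", "watch", "watched", "film"},
}


def spot_entities(text):
    """Return (title, type) pairs spotted in text and backed by a context cue."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    found = []
    for title, etype in DOMAIN_MODEL.items():
        title_present = re.search(r"\b" + re.escape(title) + r"\b", lowered)
        if title_present and tokens & CONTEXT_CUES[etype]:
            found.append((title, etype))
    return found


print(spot_entities("I loved your music Yesterday!"))
print(spot_entities("It was THE HANGOVER of the year..lasted forever.."))
print(spot_entities("So I went to the movies..badchoice picking GI Jane worse now"))
```

The second post contains the string "the hangover" but no movie cue, so it is rejected, which mirrors the slide's point that spotting alone is not enough.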
Multimodal Social Intelligence in a Real-Time Dashboard System
Analyzing the content can be hard
31
32
Music NER application : BBC SoundIndex
(IBM Almaden)
Pulse of the Online Music Populace
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’
special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
The Vision
http://www.almaden.ibm.com/cs/projects/iis/sound/
33
34
Several Insights
35
Only 4% negative sentiments; perhaps ignore the Sentiment
Annotator on this data source?
Ignoring Spam can change ordering
of popular artists
Trending popularity of artists
Trending topics in artist pages
Predictive Power of Data
• Billboard's Top 50 Singles chart
during the week of Sept 22-28,
’07 vs. MySpace popularity
charts.
• User study indicated 2:1 and
up to 7:1 (younger age groups)
preference for MySpace list.
• Challenging traditional polling
methods!
36
KEY PHRASE EXTRACTION
37
Key Phrase Extraction - Example
• Key phrases extracted from prominent discussions on
Twitter around the 2009 Health Care Reform debate and
2008 Mumbai Terror Attack on one day
38
39
M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web
Information Systems Engineering, Oct 5-7, 2009: 539-553
TF-IDF vs. Spatio-temporal-thematic scores rank phrases differently
Foreign relations
surfaces up
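To see why the two scores can rank phrases differently, here is a small sketch: plain TF-IDF, followed by a re-ranking step that multiplies each score by a contextual boost. The boost is a stand-in assumption for spatial, temporal and thematic evidence; it is not the scoring function from the cited paper, and the documents are toy data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: score} per doc."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def rerank(term_scores, boosts):
    """Order terms by TF-IDF score times a hypothetical contextual boost."""
    combined = {t: s * boosts.get(t, 1.0) for t, s in term_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)


docs = [
    ["mumbai", "attack", "taj", "taj"],
    ["mumbai", "foreign", "relations"],
    ["health", "care", "reform"],
]
scores = tf_idf(docs)
print(rerank(scores[0], {}))               # plain TF-IDF order
print(rerank(scores[0], {"attack": 3.0}))  # order after a contextual boost
```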
INTENTION MINING
40
Why do people share?
• Outside of the psychological incentives, broadly, people
share to Seek Information OR Share Information
• If we understand the intent behind a post, we can build
systems that respond to it better
• An application: Understand intent to deliver targeted
content
– Use case: Online Content-Targeted Advertisements on Social Media
Platforms
41
Circa 2009 - Content-based Ads
42
Today – Content-based Ads on Profiles
43
What is going on here..
• Ads are targeted on profile interests, demographic data
• But Interests on profiles do not translate to purchase
intents
– Interests are often outdated..
– Intents are rarely stated on a profile..
• Some profile data does seem to work
– Example: New store openings, sales targeted at location
information in a profile
44
But Monetizable Intents are Elsewhere,
away from their profiles..
45
Showing clear intents on MySpace
posts but no relevant ads..
46
–Non-trivial
–Non-policed content
•Brand image, Unfavorable sentiments
–People are there to network
•User attention to ads is not guaranteed
–Informal, casual nature of content
•People are sharing experiences and events
–Main message overloaded with off
topic content
I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a
video project due tomorrow for merrill lynch :(( all i need
to do is simple: Extract several scenes from a clip, insert
captions, transitions and thats it. really. omgg i cant figure
out anything!! help!! and i got food poisoning from eggs.
its not fun. Pleasssse, help? :(
Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
Targeted Content-based Advertising
47
Focus: Discuss Methodology,
Preliminary Results in…
• Identifying intents behind user posts on social networks
– Identify Content with monetization potential
• Identifying keywords for advertising in user-generated
content
– Considering interpersonal communication & off-topic chatter
48
M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web
Intelligence, Sep 15-18 2009: 92-99
Result - 8X more interest for non-profile
ads..
• Using profile ads
– Total of 56 ad impressions
– 7% of ads generated interest
• Using authored posts
– Total of 56 ad impressions
– 43% of ads generated interest
• Using topical keywords from authored posts
– Total of 59 ad impressions
– 59% of ads generated interest
49
SENTIMENT / OPINION MINING
50
Sentiment Analysis: Motivation
Which movie
should I see?
What
customers
complain
about?
Why do
people
oppose
health care
reform?
Image: http://bit.ly/eZtKBF
51
Content Analysis:
Sentiment Analysis/Opinion Mining
• Two main types of information we can learn from user-
generated content: fact vs. opinion
• Much of social media text (e.g., blogs, Twitter, Facebook)
is a mix of facts and opinions.
• Extracting structured sentiment information from
unstructured content
• Allowing computation to be done on “what people think”
and “how people feel”
52
• From coarse-grained to fine-grained
– Document level -> sentence level -> expression level
– General sentiment -> domain-dependent sentiment -> target-
dependent sentiment
• From static to dynamic
– Our attitude can be changed during social communication.
• Modeling, detecting, and tracking the change of attitude
• What leads to the change of attitude? E.g., persuasion
campaign
53
Sentiment Analysis: Challenges
Sentiment Analysis:
Target-specific Opinion Identification
Observations:
• The opinion clues may not be toward the given target
(1,2,3,6)
• The opinion clues are domain and context dependent
(5,7)
• Single words are not enough (4,7,8)
Simple lexicon-based method doesn't work well.
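A naive lexicon-based scorer makes the failure modes concrete. The lexicon and the example sentences below are invented for illustration; they are not the numbered examples from the slide.

```python
import re

# Hypothetical general-purpose polarity lexicon
LEXICON = {
    "terrific": 1, "loving": 1, "sexy": 1, "free": 1, "well": 1,
    "boring": -1, "awful": -1,
}


def lexicon_polarity(text):
    """Sum word polarities, ignoring target, domain and phrase context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


# No opinion about the movie is expressed, yet the score is positive:
# "free" is not opinionated in the movie domain, nor is "well" in "as well".
print(lexicon_polarity("Got free tickets for the movie as well"))

# A clear opinion about the movie is scored as expected.
print(lexicon_polarity("The movie was boring"))
```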
54
Target of “sexy” is “Helena”
Target of “terrific” is “reviews”
“free” is not opinionated in
movie domain.
Target of “loving” is “telling”
“well” in “as well” is not
opinionated
55
Extracting a diverse and richer
set of sentiment-bearing
expressions, including formal
and slang words/phrases
Assessing the
target-dependent polarity
of each sentiment
expression
A novel formulation of assigning
polarity to a sentiment expression
as a constrained optimization
problem over the tweet corpus
Extracting Diverse Sentiment Expressions
With Target-dependent Polarity from Twitter [Chen et al. ICWSM 2012]
The Usage of Background Knowledge
56
57
Sentiment Analysis:
Feature and Aspect Extraction
Motivation
• To understand a user’s opinions about a product at a fine-grained
level, support opinion summarization for products, and
automatically extract pros and cons from reviews it is essential to
identify product features and aspects.
Impact
• Existing methods tend to require seed terms and focus on
identifying explicit features or a few high-level aspects.
• Our approach is capable of identifying both explicit and implicit
aspects and does not require any labeling efforts.
Approach
• We use a combination of corpus-based association measures, and
semantic similarity measures to identify product aspects in an
efficient clustering based approach.
58
Clustering for Aspect Discovery in Opinion Mining [Chen et al.
in submission]
59
It is actually about tracking public opinion.
Polling or Social Media Analysis?
1. Sample size
2. Representative of the target population
3. Accurate measure of opinions
4. Timeliness
• We study different groups of social media users who
engage in the discussions of 2012 U.S. Republican
Presidential Primaries, and compare the predictive
power among these user groups.
• Existing studies on predicting election results assume
that all users should be treated
equally.
• How could different groups of users be different in
predicting election results?
60
Harnessing the Power of Social Data
to Predict Election Results [Chen et al., SocInfo 2012]
61
1. Engagement
Degree
2. Tweet Mode 3. Content Type 4. Political Preference
User Categorization
Predicting a User's Vote
• Basic idea: for which candidate the user shows the most
support
– Frequent mentions
– Positive sentiment
62
Nm(c): the number of tweets mentioning the candidate c
Npos(c): the number of positive tweets about candidate c
Nneg(c): the number of negative tweets about candidate c
 (0 <  < 1): smoothing parameter
 (0 <  < 1): discounting the score when the user does not
express any opinion towards c.
The user
posted opinion
about c
The user
mentioned c but
did not post
opinion about c
More mentions,
higher score
More positive/less
negative opinions,
higher score
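The scoring formula itself did not survive in this transcript, so the sketch below is an illustrative stand-in rather than the formula from the paper. It only follows the stated ideas: more mentions and a larger share of positive opinions raise the score, and a discount applies when the user mentions a candidate without expressing an opinion. The parameter names `smoothing` and `discount`, their default values, and the candidate names are assumptions.

```python
def support_score(n_mentions, n_pos, n_neg, total_mentions,
                  smoothing=0.5, discount=0.5):
    """Hypothetical per-candidate support score for one user."""
    if total_mentions == 0 or n_mentions == 0:
        return 0.0
    mention_share = n_mentions / total_mentions
    if n_pos + n_neg == 0:
        # The user mentioned the candidate but posted no opinion.
        return discount * mention_share
    # Smoothed share of positive opinions.
    positivity = (n_pos + smoothing) / (n_pos + n_neg + 2 * smoothing)
    return mention_share * positivity


def predict_vote(user_counts, smoothing=0.5, discount=0.5):
    """user_counts: {candidate: (n_mentions, n_pos, n_neg)}, non-empty."""
    total = sum(mentions for mentions, _, _ in user_counts.values())
    scores = {
        candidate: support_score(m, p, n, total, smoothing, discount)
        for candidate, (m, p, n) in user_counts.items()
    }
    return max(scores, key=scores.get), scores


counts = {"candidate_a": (6, 3, 1), "candidate_b": (4, 0, 0)}
winner, scores = predict_vote(counts)
print(winner, scores)
```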
63
Revealing the challenge of
identifying the vote intent of
“silent majority”
Retweets may not necessarily
reflect users' attitude.
Prediction of user’s vote based
on more opinion tweets is not
necessarily more accurate than
the prediction using more
information tweets
The right-leaning user group
provides the most accurate
prediction result. It correctly predicts
the winners in 8 out of 10 states
with an average prediction error of
0.1.
To some extent, it demonstrates
the importance of identifying likely
voters in electoral prediction.
Twitter users are not “equal”
in predicting elections!
EMOTION MINING
64
Emotion Mining: Motivation
65
• Emotion is essential to all aspects of our lives.
– Influences our decision-making
– Affects our social relationships
– Shapes our daily behavior
• Emotional mental health
– New mothers may suffer from post-partum depression
– Veterans may constantly suffer from negative emotions because
of post-traumatic stress disorder
Emotion Mining: what have we studied
66
• Can we automatically create a large emotion dataset
with high quality labels from Twitter? How?
• What features can effectively improve the performance
of supervised machine learning algorithms?
• Can the system developed on Twitter data be directly
applied to identify emotions from other datasets?
• What can we learn about emotion from social media
data?
• Collect self-annotated emotion tweets [Wang et al. SocialCom 2012]
– Seven emotions: joy, sadness, anger, love, fear, surprise, thankfulness
“When I see a cop, no matter where I am or what I’m doing, I
always feel like every law I’ve ever broken is stamped all over
my body #fear”
“I hate when my mom compares me to my friends. #anger”
“I hate when I get the hiccups in class. #embarrassing”
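Self-annotation via hashtags can be sketched as below: a tweet is kept only if it ends with a hashtag that maps to one of the seven emotions, and that hashtag is stripped from the text so it cannot leak into the features. The hashtag-to-emotion map is a hypothetical seed list, not the one used in the paper.

```python
import re

# Hypothetical seed hashtags mapped to the seven emotion labels
EMOTION_HASHTAGS = {
    "joy": "joy", "sadness": "sadness", "anger": "anger", "love": "love",
    "fear": "fear", "surprise": "surprise", "thankful": "thankfulness",
}

# A hashtag at the very end of the tweet, optionally followed by punctuation
TRAILING_TAG = re.compile(r"\s*#(\w+)\s*[.!]*\s*$")


def self_label(tweet):
    """Return (text_without_label, emotion), or None if no usable label."""
    match = TRAILING_TAG.search(tweet)
    if not match:
        return None
    emotion = EMOTION_HASHTAGS.get(match.group(1).lower())
    if emotion is None:
        return None
    return tweet[:match.start()].strip(), emotion


print(self_label("I hate when my mom compares me to my friends. #anger"))
print(self_label("I hate when I get the hiccups in class. #embarrassing"))
```

The second tweet is dropped because its trailing hashtag is not one of the seven target emotions.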
Harnessing Twitter "Big Data" for
Automatic Emotion Identification [Wang et al.
SocialCom12]
67
[Figure: accuracy (0.4 to 0.65) vs. number of tweets in training data (1,000 to 1,991,184) for LIBLINEAR and MNB]
The more data, the merrier
68
Results of performing seven emotion classifications
Discovering Fine-grained Emotion
in Suicide Notes [Wang et al. BII12]
69
• Automatically classify suicide notes into 15
categories at sentence level
• Emotion categories
– Positive
• Hopefulness, thankfulness, forgiveness, love, pride, happiness
– Negative
• Sorrow, abuse, anger, hopelessness, guilt, blame, fear
• Other categories
– Information, instructions
Discovering Fine-grained Emotion
in Suicide Notes [Wang et al. BII12]
70
Sentence: “Found out today that // I passed my math STAAR test.”
• N-gram features
• Unigram, e.g., found, today, passed, etc.
• Bigram, e.g., found_out, out_today, etc.
• N-gram position
– Unigram: found-1, out-1, today-1, …, I-2, passed-2, my-2, …
• Knowledge-based features:
– LIWC (Pennebaker et al., 2014a)
– WordNet-Affect (Strapparava and Valitutti, 2004)
– MPQA (Wilson et al., 2005)
• Syntactic features:
– Part-of-speech tags, e.g., Found/VBN out/RP today/NN that/IN I/PRP
passed/VBD…
– Dependency relations, e.g., root(ROOT-0, Found-1); ccomp(Found-1, passed-6);
dobj(passed-6, test-10) …
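The n-gram and n-gram position features for the example sentence can be generated with a few lines. This sketch assumes that "//" marks the segment boundary used for positions, that tokens are lowercased, and that bigrams may span the boundary; the slide does not settle those details. Knowledge-based and syntactic features are omitted because they need external resources.

```python
import re


def ngram_features(sentence, segment_sep="//"):
    """Unigram, bigram and positional unigram features for one sentence."""
    segments = [
        re.findall(r"[a-z0-9']+", segment.lower())
        for segment in sentence.split(segment_sep)
    ]
    tokens = [token for segment in segments for token in segment]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    # Positional unigrams carry the 1-based index of their segment.
    positional = [
        f"{token}-{index}"
        for index, segment in enumerate(segments, start=1)
        for token in segment
    ]
    return {"unigrams": tokens, "bigrams": bigrams, "positional": positional}


features = ngram_features("Found out today that // I passed my math STAAR test.")
print(features["unigrams"])
print(features["bigrams"])
print(features["positional"])
```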
Discovering Fine-grained Emotion
in Suicide Notes [Wang et al. BII12]
71
Winner: N-gram(1,2), knowledge-based and syntactic features
Cursing in English on Twitter [Wang et al. CSCW14]
72
• The main reason that people use curse words is to express some
strong emotions, especially anger and frustration. [Jay 1992, 2000;
McEnery 2006; Nasution and Rosa 2012]
Normalized Emotion Distributions over Time (EST)
Normalized Emotion Distributions over Days (EST)
“I am so thankful for my family && close friends. They hold me together
when everything else around me is falling apart. #SoBlessed #Thankful”
73
Normalized Emotion Distributions over Time (EST)
“I thank God everytime I see another day :*) #thankful .”
74
Rank  Mom                    Dad
1     Irritation (7,562)     Irritation (3,034)
2     Sadness (2,315)        Sadness (1,363)
3     Affection (2,225)      Embarrassment (1,158)
4     Zest (2,213)           Zest (1,035)
5     Embarrassment (1,849)  Affection (1,030)
6     Thankfulness (1,537)   Cheerfulness (911)
7     Cheerfulness (1,332)   Envy (902)
“I hate when my dad uses my laptop. Its mine. Not yours. You have your own computer.
I have shit to do, get off now please. #annoyed”
“ugh my mom gets so nervous when i drive #annoying”
“My mom just told me I can't open any presents early cause I'm too old for that #depressing”
What are the top Emotions Associated with Moms and Dads?
75
PEOPLE ANALYSIS
- Deriving People Metadata
- from Content Analysis
- from Network Analysis
- Merge of two approaches
- People-Content-Network Analysis to leverage the metadata
- Finding Influential Users
- Finding User Types & Affiliation
- Measuring Social Engagement
- Leverage communities to assist coordination
76
People Analysis:
Social Engagement & Coordination
77
Imagine a crisis scenario such as Haiti earthquake (2010) or
hurricane Sandy (2012)
- emergency teams are looking for ways to help the victims
• What are the best possible ways to communicate:
identify and engage people
• Between resource providers (supply) and people in
need of resources (demand)
• Topical community influencers
• How response teams can coordinate social media
communities well between volunteers, managers in
organizational structure, and resource seekers?
People Analysis: Who is asking for help, Who is offering to help?
Smart Data in the context of Disaster Management
ACTIONABLE: Timely delivery of
right resources and information
to the right people at right
location!
78
Because everyone wants to help, but DOESN'T KNOW HOW!
Really sparse Signal to Noise:
• 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013
- 1.3% as the precise resource donation requests to help
- 0.02% as the precise resource donation offers to help
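As a quick worked check of how sparse the signal is, the percentages above translate into absolute counts as follows. The 2M total is a rounded figure, so the results are approximate.

```python
total_tweets = 2_000_000   # approximate, first 48 hours
request_share = 0.013      # 1.3% precise donation requests
offer_share = 0.0002       # 0.02% precise donation offers

requests = round(total_tweets * request_share)
offers = round(total_tweets * offer_share)

print(requests)            # roughly 26 thousand request tweets
print(offers)              # roughly 4 hundred offer tweets
print(requests // offers)  # requests outnumber offers about 65 to 1
```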
79
• Anyone know how to get involved to
help the tornado victims in
Oklahoma??#tornado #oklahomacity
(OFFER)
• I want to donate to the Oklahoma cause
shoes clothes even food if I can (OFFER)
Disaster Response Coordination:
Finding Actionable Nuggets for Responders to act
• Text REDCROSS to 909-99 to donate to
those impacted by the Moore tornado!
http://t.co/oQMljkicPs (REQUEST)
• Please donate to Oklahoma disaster
relief efforts.: http://t.co/crRvLAaHtk
(REQUEST)
For responders, most important information to manage
coordination dependencies is
the scarcity and availability of resources
Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
People Analysis: Match demander-
suppliers for coordination during crisis
Purohit, H., Castillo, C., Diaz, F., Sheth, A., & Meier, P. (2013). Emergency-relief coordination on social media: Automatically
matching resource requests and offers. First Monday, 19(1).
80
Demand-Supply identification and
representation: core & facets
• Extract Core of the phrase- “what”
– Other facets include “who”, “where”, “when”, etc.
• Supervised Learning to classify items for demands, supplies, and
resource type facets
81
Rotary collecting clothing and other donations in New Jersey <URL>
{ source: “Twitter”, author: “@NN”, text: “Rotary collecting clothing and
other donations in New Jersey <URL>”, donation-info: { donation-type:
“Request”, donation-type-confidence: 0.8, donation-organization: “Rotary”,
donation-item: “clothing and other donations”, donation-location: “New
Jersey” }, … }
Corresponding data item in the semi-structured knowledge inventory:
• IR model approach to match demand (request) with supply (offer)
items in this semantically annotated knowledge inventory
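One simple way to picture the matching step is bag-of-words cosine similarity over the item and location facets of annotated records. This is a simplified stand-in for the IR model, not the method from the cited paper; the two offer records are invented, and only the field names follow the record shown above.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def facet_text(item):
    return item["donation-item"] + " " + item["donation-location"]


def match_offers(request, offers, top_k=1):
    """Rank offers by similarity of item and location facets to the request."""
    request_vec = vectorize(facet_text(request))
    ranked = sorted(
        offers,
        key=lambda offer: cosine(request_vec, vectorize(facet_text(offer))),
        reverse=True,
    )
    return ranked[:top_k]


request = {"author": "@NN", "donation-item": "clothing and other donations",
           "donation-location": "New Jersey"}
# Hypothetical offers in the knowledge inventory
offers = [
    {"author": "@A", "donation-item": "shoes clothes food",
     "donation-location": "Oklahoma"},
    {"author": "@B", "donation-item": "clothing",
     "donation-location": "New Jersey"},
]
print(match_offers(request, offers))
```

Note that "clothes" and "clothing" do not match under exact token overlap, which hints at why semantic annotation of the resource type helps.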
Leveraging Communities for Whom
to Engage With, Why and How
82
Purohit et al., User Taglines: Alternative Presentations of Expertise and Interest in Social Media . ASE Social Informatics, 2012
Network Analysis
Interesting questions to ask:
• How communities form around topics- growth & evolution
• What are the effects of influential participants in the communities
• What are the effects of content nature (or sentiment, opinions)
flowing in network on the community structures and growth
• What is the community structure: degree of separation and sub-
communities that contribute for macro-level effects, e.g.,
coordination, engagement
“To Discover How A, is in Touch with B and C,
Is Affected by the Relation Between B & C”
-John Barnes
83
Foundation of network:
•Nodes
•Connections/Relationships
Image: http://www.onasurveys.com/
Graphs showing sparse (A) and dense (B) RT networks and their
corresponding follower graphs for 'call for action' and
'information sharing' tweet content types
M. Nagarajan, H. Purohit, and A. Sheth, ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs
and Social Media, ICWSM 2010
84
Understanding Evolving Community
Structures for Coordination
85
User interaction networks of two topical communities– Occupy LA and Chicago,
of emerging influencers during Occupy Wall Street (OWS) event 2011
Application of evolving communities:
H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential Authors in Brand-Page Communities. 6th Int'l AAAI Conference on Weblogs and
Social Media (ICWSM), Dublin, Ireland, June 5-7, 2012
Evolution of influencer interaction networks for Romney vs. Obama topical
communities, during U.S. Presidential Election 2012 debates
Romney
Obama
Before 1st
debate
After 1st
debate
After
Hurricane Sandy
After 3rd
debate
Understanding Community Evolution for
Real-World Actions
86
Social Media analysis for US elections 2012, powered by Twitris: http://analysis.knoesis.org/uselection/insights/
On Understanding the Divergence of
Online Social Group Discussion
• Change of group discussion divergence over time, and different
phases of real world events
• Relation between discussion divergence and existing theories of
social cohesion and social identity in Psychology
• Prediction of future change in the group discussion divergence
Research Questions on Social Dynamics in Communities
Acknowledgement:
NSF SoCS grant for ‘Leveraging Social Media during Emergency
Response’
Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group
Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
• Prior work:
– Focus on structural metrics to understand group evolution
dynamics, but may not be sufficient to answer ‘WHY a group
diverges over time’
• Our approach:
– Content driven measure: collective divergence of group
members for topics of discussion
– Features assessing role of socio-psychological theories:
cohesion & identity
• Data:
– Tweets during evolving events of natural disasters, and social
activism
Contrasting Prior Work and Approach
Evolution of groups in online
social communities
surrounding events 
On Understanding the Divergence of
Online Social Group Discussion
Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group
Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
88
• During #sandy, predicted low
diverging (focused) groups to
engage with on the updates
of flights, first delays &
cancellation, then resuming
• Natural disaster (D) events
(Hurricane Irene and Sandy)
have stronger correlations
with identity-driven features
than with cohesion features
• We predicted group discussion
divergence
across phases, with 0.83 AUC
Time
On Understanding the Divergence of
Online Social Group Discussion
Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group
Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
89
Continuous Semantics for Evolving Events to Extract Smart Data
90
Dynamic Model Creation
Continuous Semantics
91
Live Demo of Powerful Social
Media Analysis: Twitris
92
Twitris - Motivation
1. Information Overload
• Multiple events around us
• WHAT to be aware of
• Multiple Storylines about same
event!!
93
Image: http://bit.ly/etFezl
Twitris - Motivation
2. Evolution of Citizen Observation
• with location and time
94
Twitris - Motivation
3. Semantics of Social perceptions
• What is being said about an event (theme)
• Where (spatial)
• When (temporal )
Twitris lets you browse citizen reports using social
perceptions as the fulcrum
95
Twitris: Semantic Social Web Mash-up
Facilitates understanding of multi-dimensional social perceptions over
SMS, Tweets, multimedia Web content, electronic news media
96
Twitris: Architecture
97
Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic
Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553,
Oct 5-7, 2009.
Twitris:
Functional
Overview
98
Twitris: Event Summarization
99
Incoming Tweets with need
types to give quick idea of what
is needed and where currently
#OKC
Legends for
Different
needs #OKC
100
Clicking on a tag brings contextual
information– relevant tweets,
news/blogs, and Wikipedia articles
Twitris: Real-time information
How People from Different
parts of the world talked
about US Election
Images and Videos
Related to US Election
101
Twitris: Analysis by location for contrast in
social perceptions
Twitris: Sentiment Analysis
• Sentiment Analysis
– using statistical and machine learning techniques
102
103
How was Obama doing in the first debate?
Twitris: Sentiment Analysis- Smart
Answers with reasoning!
The Dead People mentioned
in the event OWS
104
Twitris: Impact of Background
Knowledge
Twitris: Demo, Quick Show
http://twitris2.knoesis.org/
• Many other interesting efforts, e.g., Vivek K. Singh, Mingyan Gao, and Ramesh
Jain. 2010. From microblogs to social images: event analytics for situation
assessment. In Proceedings of the international conference on Multimedia
information retrieval (MIR '10). ACM, New York, NY, USA, 433-436.
105
• Do you have a sense of the immense opportunity in analyzing
citizen sensing for useful social signals?
• Do you appreciate the broad range of issues and challenges?
Did we present examples and a few insights into how to
address some unique challenges?
• Did the spatio-temporal-thematic, people-content-network, and
emotion-sentiment-intent dimensions present a reasonable way
to organize the vast number of relevant research challenges and
techniques?
106
Conclusions
107
http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
thank you, and please visit us at

Citizen Sensor Data Mining, Social Media Analytics and Applications

  • 1. 1
  • 2. Citizen sensor data mining, social media analytics and applications Singapore Symposium on Sentiment Analysis (S3A) ,Feb 6, 2015 Amit Sheth Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing @ Wright State University
  • 3. Acknowledgements Significant components of this talk is from the tutorial I gave at WWW2011: “Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications,” with Meena Nagarajan and Selvam Velmurugan. Contributors to Twitris and/or Semantic Social Web Research @ Kno.e.sis: L. Chen, H. Purohit, W. Wang with: P. Anantharam, A. Jadhav, P. Kapanipathi, Dr. T.K. Prasad, And alumni: K. Gomadam, M. Nagarajan, A. Ranabahu) Funding: NSF, AFRL, NIH; Collaborations: IBM, Microsoft 3
  • 4. Ohio Center of Excellence in Knowledge- enabled Computing • Among top 10 among all universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search) • Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications • Exceptional student success: internships and jobs at top salary (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups ) • 80+researchers including 15 World Class faculty (>3K citations/faculty) and 45+ PhD students- practically all funded • $2M+/yr research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …) 4
  • 7. • Mumbai Terror Attack • Iran Election 2009 • Haiti Earthquake 2010 • Occupy Wall Street • Kashmir Floods 2014 Citizen Sensors in Action 7Image: http://huff.to/hp0OhA
  • 8. • Ghonim, who has been a figurehead for the movement against the Egyptian government, told Blitzer “If you want to liberate a government, give them the internet.” • Egyptian anti-government demonstrator sleeps on the pavement under spray paint that reads 'Al- Jazeera' and 'Facebook' at Cairo's Tahrir square on February 7, 2011. http://www.cbsnews.com/stories/2011/02 /15/eveningnews/main20032118.shtml Revolution 2.0 Political/Social Activism 8 • When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.” http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
  • 9. Citizen Journalism 9 Twitter Journalism Images: http://bit.ly/9GVfPQ, http://bit.ly/hmrTYV
  • 10. • Social News • Social Media and Global Media are inter-twined. News is increasingly Social 10
  • 11. 11 Some of the significant human, social & economic development applications we work on at Kno.e.sis • Coordination during disasters (Qatar Computing Research Institute, Microsoft Research NYC) • Harassment on social media (WSU cognitive scientists) • Prescription drug abuse, Cannabis & Synthetic Cannabinoid epidemiology (Center for Interventions, Treatment and Addictions Research, ….) • Depressive disorders (Mayo Clinic) • Gender-based violence (United Nations) Highly multidisciplinary team efforts, often with significant partners, with real world data, intended to achieve real- world impact
  • 12. 12 Sample of Real-World Impact & Media Coverage • Twitter Data Mining Reveals America‘s Religious Fault Lines, MIT Technology Review, Oct 6, 2014 • Digital soldiers emerge heroes in Kashmir flood rescue, HindustanTimes, September 25, 2014 • India's social media election battle, BBC News, Mar 30, 2014 • #Cursing Study: 10 Lessons About How We Use Swear Words on Twitter, Time.com, Feb 19, 2014 • Twitris: Taking Crisis Mapping to the Next Level, Tech President, June 24, 2013 • Picking the President: Twindex, Twitris Track Social Media Electorate, Semanticweb.com, Aug 3, 2012 • Web App Analyzes Tweets in Real Time for a Record of Historic Events, Mashable.com, Feb 17, 2012
  • 13. 13 TWITRIS’ Technical Approach to Understand & Analyze Social Content Social Data is incredibly rich
  • 14. 14 Some of the topics on Online Social Media we research at Kno.e.sis 1. Named Entity Recognition 2. Language usage in Social Media 4. Exploration of People, Content and Network dynamics 6. Sentiment, Emotion and Opinion mining 5. Trust 6. Integrated exploitation of Sensor (physical), Web (Cyber) and Social data for PCS applications 7. TWITRIS: A System for Mining Collective Intelligence from Citizen-Sensor Data
  • 15. • "Who says what, to whom, why, to what extent and with what effect?" [Laswell] • Network: Social structure emerges from the aggregate of relationships (ties) • People: poster identities, the active effort of accomplishing interaction • Content : studying the content of communication Social Information Processing 15
  • 16. Why People-Content-Network + Spatial-Temporal-Thematic metadata? (Example of Understanding Crisis Data) 16 , Offer help, etc.
  • 17. ` • Explicit information from user profiles – User Names, Pictures, Videos, Links, Demographic Information, Group memberships... • Implicit information from user attention metadata – Page views, Facebook 'Likes', Comments; Twitter 'Follows', Retweets, Replies.. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms 17
  • 18. People Metadata: Various Types Identification Structural Network Activity Interests 18
  • 19. People Metadata: Continued User Identification Metadata • User-id • Screen/Display-name of user • Real name of user • Location • Profile Creation Date • User description - Biodata of the user - Link to webpage of the user Interest Metadata • Author type - Trustee/donor, journalist, blogger, scientist etc. • Favorite tweets • Types of lists subscribed • Style of Writing (personality indicator) • No. of Followees • Majority of author type of Followees 19
  • 20. People Metadata: Continued Web Presence: - User affiliations - Influence Metric – e.g., KLOUT (www.klout.com) Activity Metadata • Age of the profile • Frequency of posts • Timestamp of last status • No. of Posts • No. of Lists/groups created • No. of Lists/groups subscribed Influence Metadata (Inferring People Metadata from Network level Information) • No. of Followers – normal, influential • No. of Mentions • No. of Retweets/Forwards • No. of Replies • No. of Lists/groups following • No. of people following back • Authority & Hub Scores 20
  • 21. Content Metadata: Content Dependent (Tweet) 23 Direct Content-based Metadata Indirect content-based metadata (External metadata)
  • 22. Direct Content-based Metadata Content Metadata: Content Dependent (SMS) 24
  • 23. Connections/Relationships matter! (foundation for the network) Network Metadata 25 Structure Metadata • Community Size • Community growth rate • Largest Strongly Connected Component size • Weakly Connected Components & Max(WCC) size • Average Degree of Separation • Clustering Coefficient Relationship Metadata • Type of Relationship • Relationship strength • User Homophily (based on certain characteristic such as location, interest etc.) • Reciprocity: mutual relationship • Active Community/ Ties
  • 24. Metadata Creation & Extraction Length: 109 characters General topic: Egypt protest This poor {sentiment_expression: {target: “Lara Logan”, polarity: “negative”}} woman! RT @THR CBS News‘ {entity:{type=“News Agency”}} Lara Logan {entity:{type=“Person”}} Released From Hospital {entity:{type=“Hospital”}} After Egypt {entity:{type=“Country”} Assault {topic} http://bit.ly/dKWTY0 {external_URL} 26
  • 25. Metadata Extraction from Informal Text Meena Nagarajan, ‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
  • 26. Content Analysis: Typical Sub-tasks • Recognize key entities mentioned in content – Information Extraction (entity recognition, anaphora resolution, entity classification, ...) – Discovery of semantic associations between entities • Topic Classification, Aboutness of content – What is the content about? • Intention Analysis – Why did they share this content? • Sentiment Analysis – What opinions are people conveying via the content? • Author Profiling – What can we infer about the author from the content he or she posts? • Context (external to content) extraction – URL extraction, analyzing external content
  • 27. NER, Key Phrase Extraction • Named Entity Recognition – I loved <movie> the hangover </movie>! • Key Phrase Extraction
  • 28. Named Entity Recognition. Task of NER: identifying and classifying tokens • “I loved your music Yesterday!” – Yesterday is an album • “It was THE HANGOVER of the year..lasted forever..” – The Hangover is not a movie • “So I went to the movies..badchoice picking “GI Jane”worse now” – GI Jane is a movie
  • 29. Analyzing the Content Can Be Hard • Using a domain model (e.g., MusicBrainz) • Using context cues from the content – e.g., new Merry Christmas tune • Reducing potential entity spot size (with restrictions) – e.g., new albums/songs. Source: ‘Multimodal Social Intelligence in a Real-Time Dashboard System’
  • 30. Music NER application: BBC SoundIndex (IBM Almaden), Pulse of the Online Music Populace. Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010. Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 33. Several Insights • Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source? • Ignoring spam can change the ordering of popular artists • Trending popularity of artists • Trending topics in artist pages
  • 34. Predictive Power of Data • Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts. • User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list. • Challenging traditional polling methods!
  • 36. Key Phrase Extraction - Example • Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day 38
  • 37. TF-IDF vs. spatio-temporal-thematic scores rank phrases differently; “foreign relations” surfaces. Source: M. Nagarajan et al., ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553
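For reference, the TF-IDF baseline mentioned on this slide can be sketched in a few lines. This is the textbook formulation (term frequency times log inverse document frequency), not the spatio-temporal-thematic scoring from the cited paper, which is not reproduced here. The function name `tf_idf` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists. Score = (count / doc length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy, made-up token lists standing in for daily tweet collections.
docs = [
    ["mumbai", "attack", "taj"],
    ["mumbai", "attack", "hotel"],
    ["mumbai", "foreign", "relations"],
]
scores = tf_idf(docs)
print(sorted(scores[2].items(), key=lambda kv: -kv[1]))
```

A term that appears in every document ("mumbai" here) scores zero, which is why plain TF-IDF can rank phrases differently from a score that also weighs where and when a phrase was used.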
  • 39. Why do people share? • Outside of the psychological incentives, broadly, people share to Seek Information OR Share Information • If we understand the intent behind a post, we can build systems that respond to it better • An application: Understand intent to deliver targeted content – Use case: Online Content-Targeted Advertisements on Social Media Platforms 41
  • 41. Today – Content-based Ads on Profiles 43
  • 42. What is going on here.. • Ads are targeted on profile interests, demographic data • But Interests on profiles do not translate to purchase intents – Interests are often outdated.. – Intents are rarely stated on a profile.. • Some profile data does seem to work – Example: New store openings, sales targeted at location information in a profile 44
  • 43. But Monetizable Intents are Elsewhere, away from their profiles.. 45
  • 44. Showing clear intents on MySpace posts but no relevant ads.. 46
  • 45. Targeted Content-based Advertising – Non-trivial – Non-policed content • Brand image, unfavorable sentiments – People are there to network • User attention to ads is not guaranteed – Informal, casual nature of content • People are sharing experiences and events – Main message overloaded with off-topic content. Example post: “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(” Footnote 1: Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., ‘Learning from Multi-topic Web Documents for Contextual Advertisement,’ KDD 2008
  • 46. Focus: Discuss Methodology, Preliminary Results in… • Identifying intents behind user posts on social networks – Identify content with monetization potential • Identifying keywords for advertising in user-generated content – Considering interpersonal communication & off-topic chatter. Source: M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  • 47. Result: about 8X more interest for non-profile ads (59% vs. 7%) • Using profile ads – Total of 56 ad impressions – 7% of ads generated interest • Using authored posts – Total of 56 ad impressions – 43% of ads generated interest • Using topical keywords from authored posts – Total of 59 ad impressions – 59% of ads generated interest
  • 48. SENTIMENT / OPINION MINING 50
  • 49. Sentiment Analysis: Motivation Which movie should I see? What customers complain about? Why do people oppose health care reform? Image: http://bit.ly/eZtKBF 51
  • 50. Content Analysis: Sentiment Analysis/Opinion Mining • Two main types of information we can learn from user- generated content: fact vs. opinion • Much of social media text (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. • Extracting structured sentiment information from unstructured content • Allowing computation to be done on “what people think” and “how people feel” 52
  • 51. Sentiment Analysis: Challenges • From coarse-grained to fine-grained – Document level -> sentence level -> expression level – General sentiment -> domain-dependent sentiment -> target-dependent sentiment • From static to dynamic – Our attitudes can change during social communication. • Modeling, detecting, and tracking the change of attitude • What leads to the change of attitude? E.g., a persuasion campaign
  • 52. Sentiment Analysis: Target-specific Opinion Identification. Observations: • The opinion clues may not be directed at the given target (1,2,3,6) • The opinion clues are domain and context dependent (5,7) • Single words are not enough (4,7,8) A simple lexicon-based method doesn't work well. Examples: target of “sexy” is “Helena”; target of “terrific” is “reviews”; “free” is not opinionated in the movie domain; target of “loving” is “telling”; “well” in “as well” is not opinionated.
  • 53. Extracting Diverse Sentiment Expressions With Target-dependent Polarity from Twitter [Chen et al. ICWSM 2012] • Extracting a diverse and richer set of sentiment-bearing expressions, including formal and slang words/phrases • Assessing the target-dependent polarity of each sentiment expression • A novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus
  • 54. The Usage of Background Knowledge 56
  • 55. Sentiment Analysis: Feature and Aspect Extraction. Motivation • To understand a user’s opinions about a product at a fine-grained level, support opinion summarization for products, and automatically extract pros and cons from reviews, it is essential to identify product features and aspects. Impact • Existing methods tend to require seed terms and focus on identifying explicit features or a few high-level aspects. • Our approach can identify both explicit and implicit aspects and does not require any labeling effort. Approach • We combine corpus-based association measures and semantic similarity measures to identify product aspects in an efficient clustering-based approach.
  • 56. 58 Clustering for Aspect Discovery in Opinion Mining [Chen et al. in submission]
  • 57. Polling or Social Media Analysis? It is actually about tracking public opinion. 1. Sample size 2. Representative of the target population 3. Accurate measure of opinions 4. Timeliness
  • 58. Harnessing the Power of Social Data to Predict Election Results [Chen et al., SocInfo 2012] • We study different groups of social media users who engaged in discussions of the 2012 U.S. Republican Presidential Primaries, and compare the predictive power of these user groups. • Existing studies on predicting election results assume that all users should be treated equally. • How do different groups of users differ in predicting election results?
  • 59. User Categorization: 1. Engagement Degree 2. Tweet Mode 3. Content Type 4. Political Preference
  • 60. Predicting a User's Vote • Basic idea: find the candidate for whom the user shows the most support – Frequent mentions – Positive sentiment. Notation: Nm(c): the number of tweets mentioning candidate c; Npos(c): the number of positive tweets about candidate c; Nneg(c): the number of negative tweets about candidate c; a smoothing parameter (between 0 and 1); a discounting parameter (between 0 and 1) that lowers the score when the user does not express any opinion towards c. Two cases: the user posted an opinion about c; the user mentioned c but did not post an opinion about c. More mentions, higher score. More positive/fewer negative opinions, higher score.
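The slide states only the ingredients and the monotonic behavior of the score, so the sketch below is a hypothetical scoring function that satisfies those stated properties. It should not be read as the formula from Chen et al. The names `support_score`, `predict_vote`, `smoothing`, and `discount`, and the default values of 0.5, are all assumptions made for illustration.

```python
def support_score(n_mentions, n_pos, n_neg, smoothing=0.5, discount=0.5):
    """Hypothetical support score of one user for one candidate.

    n_mentions: tweets mentioning the candidate (Nm)
    n_pos, n_neg: positive / negative tweets about the candidate
    smoothing, discount: assumed parameters, both in (0, 1)
    """
    if n_pos + n_neg > 0:
        # Case 1: the user posted opinions; use a smoothed positive ratio.
        opinion = (n_pos + smoothing) / (n_pos + n_neg + 2 * smoothing)
    else:
        # Case 2: mentions only; fall back to a fixed discounted weight.
        opinion = discount
    return n_mentions * opinion


def predict_vote(counts, **params):
    """counts: {candidate: (n_mentions, n_pos, n_neg)}; returns the top candidate."""
    return max(counts, key=lambda c: support_score(*counts[c], **params))


# Made-up counts for two unnamed candidates.
user_counts = {"candidate_A": (10, 4, 1), "candidate_B": (10, 1, 4)}
print(predict_vote(user_counts))
```

With these toy counts, both candidates are mentioned equally often, so the prediction is driven entirely by the balance of positive and negative tweets.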
  • 61. Twitter users are not “equal” in predicting elections! • Revealing the challenge of identifying the vote intent of the “silent majority” • Retweets may not necessarily reflect users' attitudes. • Predicting a user’s vote from more opinion tweets is not necessarily more accurate than predicting it from more information tweets. • The right-leaning user group provides the most accurate prediction result: it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1. • To some extent, this demonstrates the importance of identifying likely voters in electoral prediction.
  • 63. Emotion Mining: Motivation 65 • Emotion is essential to all aspects of our lives. – Influences our decision-making – Affects our social relationships – Shapes our daily behavior • Emotional mental health – New mothers may suffer from post-partum depression – Veterans may constantly suffer from negative emotions because of post-traumatic stress disorder
  • 64. Emotion Mining: what have we studied 66 • Can we automatically create a large emotion dataset with high quality labels from Twitter? How? • What features can effectively improve the performance of supervised machine learning algorithms? • Can the system developed on Twitter data be directly applied to identify emotions from other datasets? • What can we learn about emotion from social media data?
  • 65. Harnessing Twitter “big data” for automatic emotion identification [Wang et al. SocialCom 2012] • Collect self-annotated emotion tweets – Seven emotions: joy, sadness, anger, love, fear, surprise, thankfulness • “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve ever broken is stamped all over my body #fear” • “I hate when my mom compares me to my friends. #anger” • “I hate when I get the hiccups in class. #embarrassing”
  • 66. The more data, the merrier. [Figure: results of seven-way emotion classification; accuracy (y-axis, 0.40 to 0.65) vs. number of tweets in training data (x-axis, 1,000 to 1,991,184) for the LIBLINEAR and MNB classifiers]
  • 67. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] • Automatically classify suicide notes into 15 categories at the sentence level • Emotion categories – Positive • Hopefulness, thankfulness, forgiveness, love, pride, happiness – Negative • Sorrow, abuse, anger, hopelessness, guilt, blame, fear • Other categories – Information, instructions
  • 68. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] Sentence: “Found out today that // I passed my math STAAR test.” • N-gram features – Unigram, e.g., found, today, passed, etc. – Bigram, e.g., found_out, out_today, etc. • N-gram position – Unigram: found-1, out-1, today-1, …, I-2, passed-2, my-2, … • Knowledge-based features: – LIWC (Pennebaker et al., 2014a) – WordNet-Affect (Strapparava and Valitutti, 2004) – MPQA (Wilson et al., 2005) • Syntactic features: – Part-of-speech tags, e.g., Found/VBN out/RP today/NN that/IN I/PRP passed/VBD… – Dependency relations, e.g., root(ROOT-0, Found-1); ccomp(Found-1, passed-6); dobj(passed-6, test-10) …
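The n-gram and n-gram position features listed on this slide can be illustrated with a short sketch. It assumes that "//" marks a segment boundary, that the position suffix is the 1-based segment index, that tokens are lowercased, and that bigrams run over the whole sentence. The slide shows only the example outputs, so these are guesses about the preprocessing. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and keep alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(sentence, marker="//"):
    """Build unigram, bigram, and segment-position features for one sentence."""
    segments = [tokenize(part) for part in sentence.split(marker)]
    tokens = [tok for seg in segments for tok in seg]
    return {
        "unigram": list(tokens),
        "bigram": [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])],
        # Suffix each token with the 1-based index of its segment.
        "position": [
            f"{tok}-{idx}"
            for idx, seg in enumerate(segments, start=1)
            for tok in seg
        ],
    }


feats = ngram_features("Found out today that // I passed my math STAAR test.")
print(feats["bigram"][:2])
print(feats["position"])
```

Knowledge-based and syntactic features would need external lexicons and a parser, so they are left out of this sketch.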
  • 69. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 71 Winner: N-gram(1,2), knowledge-based and syntactic features
  • 70. Cursing in English on Twitter [Wang et al. CSCW14] • The main reason people use curse words is to express strong emotions, especially anger and frustration. [Jay 1992, 2000; McEnery 2006; Nasution and Rosa 2012]
  • 71. Normalized Emotion Distributions over Time (Eastern Standard Time) and over Days (EST). Example tweet: “I am so thankful for my family && close friends. They hold me together when everything else around me is falling apart. #SoBlessed #Thankful”
  • 72. Normalized Emotion Distributions over Time (EST) “I thank God everytime I see another day :*) #thankful .” 74
  • 73. What are the top Emotions Associated with Moms and Dads?
    Rank  Mom                    Dad
    1     Irritation (7,562)     Irritation (3,034)
    2     Sadness (2,315)        Sadness (1,363)
    3     Affection (2,225)      Embarrassment (1,158)
    4     Zest (2,213)           Zest (1,035)
    5     Embarrassment (1,849)  Affection (1,030)
    6     Thankfulness (1,537)   Cheerfulness (911)
    7     Cheerfulness (1,332)   Envy (902)
    Example tweets: “I hate when my dad uses my laptop. Its mine. Not yours. You have your own computer. I have shit to do, get off now please. #annoyed” “ugh my mom gets so nervous when i drive #annoying” “My mom just told me I can't open any presents early cause I'm too old for that #depressing”
  • 74. PEOPLE ANALYSIS - Deriving People Metadata - from Content Analysis - from Network Analysis - Merge of two approaches - People-Content-Network Analysis to leverage the metadata - Finding Influential Users - Finding User Types & Affiliation - Measuring Social Engagement - Leverage communities to assist coordination 76
  • 75. People Analysis: Social Engagement & Coordination. Imagine a crisis scenario such as the Haiti earthquake (2010) or Hurricane Sandy (2012), where emergency teams are looking for ways to help the victims • What are the best ways to communicate: whom to identify and engage • Between resource providers (supply) and people in need of resources (demand) • Topical community influencers • How can response teams coordinate social media communities among volunteers, managers in the organizational structure, and resource seekers?
  • 76. People Analysis: Who is asking for help, Who is offering to help? Smart Data in the context of Disaster Management ACTIONABLE: Timely delivery of right resources and information to the right people at right location! 78 Because everyone wants to Help, but DON’T KNOW HOW!
  • 77. Disaster Response Coordination: Finding Actionable Nuggets for Responders to Act On. Really sparse signal to noise: • 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013 - 1.3% were precise resource donation requests to help - 0.02% were precise resource donation offers to help. Examples: • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) • Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) For responders, the most important information for managing coordination dependencies is the scarcity and availability of resources. Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
  • 78. People Analysis: Matching demanders and suppliers for coordination during crisis. Purohit, H., Castillo, C., Diaz, F., Sheth, A., & Meier, P. (2013). Emergency-relief coordination on social media: Automatically matching resource requests and offers. First Monday, 19(1).
  • 79. Demand-Supply identification and representation: core & facets • Extract the core of the phrase: “what” – Other facets include “who”, “where”, “when”, etc. • Supervised learning to classify items for demands, supplies, and resource type facets • Example tweet: Rotary collecting clothing and other donations in New Jersey <URL> • Corresponding data item in the semi-structured knowledge inventory: { source: “Twitter”, author: “@NN”, text: “Rotary collecting clothing and other donations in New Jersey <URL>”, donation-info: { donation-type: “Request”, donation-type-confidence: 0.8, donation-organization: “Rotary”, donation-item: “clothing and other donations”, donation-location: “New Jersey” }, … } • IR model approach to match demand (request) with supply (offer) items in this semantically annotated knowledge inventory
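To make the matching step concrete, here is a minimal sketch that pairs request items with offer items by token overlap on the `donation-item` facet. Jaccard similarity stands in for the IR model mentioned on the slide, whose details are not given here. The field names follow the slide's example record, while the offer records, the helper names, and the `min_score` threshold are made up for illustration.

```python
import re


def item_tokens(text):
    """Lowercased word set for a donation-item string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_requests_to_offers(requests, offers, min_score=0.1):
    """Pair each request with its best-scoring offer above min_score."""
    matches = []
    for req in requests:
        req_tokens = item_tokens(req["donation-item"])
        best_offer, best_score = None, 0.0
        for off in offers:
            score = jaccard(req_tokens, item_tokens(off["donation-item"]))
            if score > best_score:
                best_offer, best_score = off, score
        if best_offer is not None and best_score >= min_score:
            matches.append((req, best_offer, best_score))
    return matches


requests = [{"donation-type": "Request", "donation-organization": "Rotary",
             "donation-item": "clothing and other donations",
             "donation-location": "New Jersey"}]
# Hypothetical offers, not taken from the source data.
offers = [{"donation-type": "Offer", "donation-item": "bottled water"},
          {"donation-type": "Offer", "donation-item": "winter clothing"}]

matches = match_requests_to_offers(requests, offers)
for req, off, score in matches:
    print(req["donation-item"], "<->", off["donation-item"], round(score, 2))
```

A fuller matcher would also compare the location and time facets and would handle near-synonyms such as "clothes" and "clothing", which plain token overlap misses.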
  • 80. Leveraging Communities for Whom to Engage With, Why and How 82 Purohit et al., User Taglines: Alternative Presentations of Expertise and Interest in Social Media . ASE Social Informatics, 2012
  • 81. Network Analysis. Foundation of the network: • Nodes • Connections/Relationships. Interesting questions to ask: • How communities form around topics: growth & evolution • What are the effects of influential participants in the communities • What are the effects of the nature of content (or sentiment, opinions) flowing in the network on community structure and growth • What is the community structure: degree of separation and sub-communities that contribute to macro-level effects, e.g., coordination, engagement. “To discover how A, who is in touch with B and C, is affected by the relation between B and C” - John Barnes. Image: http://www.onasurveys.com/
  • 82. Graphs showing sparse (A) and dense (B) RT networks and their corresponding follower graphs for 'call for action' and 'information sharing' tweet content types M. Nagarajan, H. Purohit, and A. Sheth, ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010 84
  • 83. Understanding Evolving Community Structures for Coordination 85 User interaction networks of two topical communities– Occupy LA and Chicago, of emerging influencers during Occupy Wall Street (OWS) event 2011 Application of evolving communities: H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential Authors in Brand-Page Communities. 6th Int'l AAAI Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 5-7, 2012
  • 84. Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates Romney Obama Before 1st debate After 1st debate After Hurricane Sandy After 3rd debate Understanding Community Evolution for Real-World Actions 86 Social Media analysis for US elections 2012, powered by Twitris: http://analysis.knoesis.org/uselection/insights/
  • 85. On Understanding the Divergence of Online Social Group Discussion • Change of group discussion divergence over time, and different phases of real world events • Relation between discussion divergence and existing theories of social cohesion and social identity in Psychology • Prediction of future change in the group discussion divergence Research Questions on Social Dynamics in Communities Acknowledgement: NSF SoCS grant for ‘Leveraging Social Media during Emergency Response’ Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
  • 86. Contrasting Prior Work and Approach: evolution of groups in online social communities surrounding events • Prior work: – Focuses on structural metrics to understand group evolution dynamics, which may not be sufficient to answer ‘WHY a group diverges over time’ • Our approach: – Content-driven measure: collective divergence of group members for topics of discussion – Features assessing the role of socio-psychological theories: cohesion & identity • Data: – Tweets during evolving events of natural disasters and social activism. Source: Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
  • 87. On Understanding the Divergence of Online Social Group Discussion • During #sandy, predicted low-diverging (focused) groups to engage with on flight updates: first delays & cancellations, then resumption • Natural disaster (D) events (Hurricanes Irene and Sandy) have stronger correlations with identity-driven features than with cohesion features • We predicted group discussion divergence across phases with 0.83 AUC. Source: Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
  • 88. Continuous Semantics for Evolving Events to Extract Smart Data 90
  • 90. Live Demo of Powerful Social Media Analysis: Twitris 92
  • 91. Twitris - Motivation 1. Information Overload • Multiple events around us • WHAT to be aware of • Multiple Storylines about same event!! 93 Image: http://bit.ly/etFezl
  • 92. Twitris - Motivation 2. Evolution of Citizen Observation • with location and time 94
  • 93. Twitris - Motivation 3. Semantics of Social perceptions • What is being said about an event (theme) • Where (spatial) • When (temporal ) Twitris lets you browse citizen reports using social perceptions as the fulcrum 95
  • 94. Twitris: Semantic Social Web Mash-up. Facilitates understanding of multi-dimensional social perceptions over SMS, tweets, multimedia Web content, and electronic news media
  • 95. Twitris: Architecture 97 Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553, Oct 5-7, 2009.
  • 98. Twitris: Real-time information • Incoming tweets with need types give a quick idea of what is needed and where currently (#OKC) • Legends for different needs (#OKC) • Clicking on a tag brings contextual information: relevant tweets, news/blogs, and Wikipedia articles
  • 99. How People from Different parts of the world talked about US Election Images and Videos Related to US Election 101 Twitris: Analysis by location for contrast in social perceptions
  • 100. Twitris: Sentiment Analysis • Sentiment Analysis – using statistical and machine learning techniques 102
  • 101. 103 How was Obama doing in the first debate? Twitris: Sentiment Analysis- Smart Answers with reasoning!
  • 102. The Dead People mentioned in the event OWC 104 Twitris: Impact of Background Knowledge
  • 103. Twitris: Demo, Quick Show http://twitris2.knoesis.org/ • Many other interesting efforts – Eg: Vivek K. Singh, Mingyan Gao, and Ramesh Jain. 2010. From microblogs to social images: event analytics for situation assessment. In Proceedings of the international conference on Multimedia information retrieval (MIR '10). ACM, New York, NY, USA, 433-436. 105
  • 104. Conclusions • Do you have a sense of the immense opportunity in analyzing citizen sensing for useful social signals? • Do you appreciate the broad range of issues and challenges? Did we present examples and a few insights into how to address some unique challenges? • Did the spatio-temporal-thematic, people-content-network, and emotion-sentiment-intent dimensions present a reasonable way to organize the vast number of relevant research challenges and techniques?
  • 105. Thank you, and please visit us at http://knoesis.org. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA

Editor's Notes

  1. Got carried away with coverage and content – too much material for 3 hours – so the remaining content can be used as background
  2. Many media companies use Facebook and Twitter as a news-delivery platform. Many individuals rely on them as a news source. News is increasingly social.
  3. Interest level: (Based on Description info, lists and fav. tweets)
  4. Semantic metadata, relationships: Inferred?
  5. Structure-level metadata: Community size (shows scale: global vs. local); Community growth rate (popularity estimation for a topic); Largest Strongly Connected Component size (measures reachability in the directed graph); number of Weakly Connected Components and max. size (distribution of pre-existing follower-followee network connections; shows nature: loose vs. compact); Average degree of separation (how many hops between two authors); Clustering coefficient (shows the likelihood of association). Relationship-level metadata: Type of relationship (topic/content, based on retweets, entities, etc.; follower/followee, based on structure); Relationship strength (strong vs. weak ties based on activity/communication between users; % tie strength); User homophily, i.e., "love of the same", the tendency of individuals to associate and bond with similar others, based on a characteristic such as location or interest (% of users showing similar behavior); Reciprocity: mutual relationship (% of users following back their followers); Active community/ties (how active the communication between users or the relationship ties are; average tie strength based on activity).
  6. Building on foundations of: statistical natural language processing; information extraction; Semantic Web/knowledge representation. We will talk about key issues in extracting metadata from informal text and how this differs from what has been done on more well-structured text such as news articles.
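Two of the network measures named in this note can be computed directly from a list of follow edges, as the sketch below shows. It uses one possible reading of each definition: reciprocity as the fraction of follow edges that are returned, and the local clustering coefficient as the fraction of a user's neighbor pairs that are themselves connected, ignoring edge direction. The function names and the toy edge list are assumptions.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed (follower, followee) edges whose reverse also exists."""
    edges = set(edges)
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def local_clustering(edges, node):
    """Share of the node's neighbor pairs that are linked (direction ignored)."""
    undirected = {frozenset((u, v)) for u, v in edges if u != v}
    neighbors = set()
    for edge in undirected:
        if node in edge:
            neighbors |= edge - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2)
                 if frozenset((a, b)) in undirected)
    return linked / (k * (k - 1) / 2)


# Toy follower graph: a and b follow each other; a follows c; c follows b.
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")]
print(reciprocity(follows))
print(local_clustering(follows, "a"))
```

Component sizes and average degree of separation need graph traversal and are usually taken from a graph library rather than written by hand.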
  7. What the two tasks look like in terms of outputs they produce
  8. This is an application of the NER work
  9. We have come a long way, but there is still room for improvement
  10. Social media serves as a platform for people to speak their minds more freely, which leads to a growing volume of opinionated data that can be used by: (1) individuals, for suggestions and recommendations; (2) companies and organizations, for marketing strategies and other decision-making processes; (3) governments, for monitoring social phenomena, being aware of potentially dangerous situations, etc.
  11. A fact can be proven; an opinion cannot. An opinion is normally a subjective statement based on people's thoughts, feelings, and understanding.
  12. One of the most attractive advantages of unsupervised approaches is that they do not require training data. Many sentiment analysis applications for social media content use a simple lexicon-based method. However, for target-specific sentiment analysis, it doesn't work. A simple lexicon-based method uses a general sentiment lexicon containing words that are positive, negative, or neutral in the general sense. (1) For the task "find tweets containing positive opinions about a specific topic", such as a movie, the results will look like the table shown. However, tweets 2, 3, 5, 6, and 7 don't contain opinions about the movie. (2) For the task of extracting opinion clues/expressions, the right answers are those shown in the other picture. However, the simple lexicon-based method might return all the words colored orange in the table.
  13. We use background knowledge to help identify the entities mentioned in the text; e.g., knowledge from IMDB and Freebase is used to determine whether a noun phrase in the text is the name of a movie or a person. Lexical resources such as Urban Dictionary are used to help identify the sentiment clues in the text. Urban Dictionary is a popular online slang dictionary with word definitions written by users. Each word is associated with a list of related words that interpret it, and with many glossary definitions given by different users. Both the related-words list and the glossary definitions can help determine the sentiment of the spotted word. E.g., the word “wicked” has a list of related words, most of which carry positive sentiment, so we can infer that “wicked” is very likely a positive sentiment clue. In addition, one user-supplied definition of “wicked” says that it has different meanings in different countries. Given this knowledge, if we know the location of the author who wrote the tweet, we can infer whether “wicked” in the tweet is used as a sentiment clue, and whether it is positive, negative, or neutral.
  14. While sentiment analysis concerns people’s opinions about something, emotion analysis focuses on our own emotional state and mental health: Am I happy? Sad? Angry?
  15. As emotional creatures, we find that emotion plays an important role in all aspects of our lives: it (1) influences our decision-making, (2) affects our social relationships, and (3) shapes our daily behavior. More importantly, emotions affect our mental health; take new mothers and veterans, for example.
  16. It is difficult to annotate sentences with emotion labels for the following reasons: emotion is more fine-grained (joy, sadness, anger, etc.), while sentiment usually deals with only positive, neutral, and negative labels; and a reader may incorrectly interpret the emotion a writer embedded in a sentence.
  17. We leverage more than 100 emotion-related hashtags to filter Twitter streaming data and use the ending emotion hashtag to infer the emotion label of a tweet, e.g., “leaving for hospital #nervous” -> sadness emotion. (1) We kept only the tweets with the emotion hashtag at the end. (2) We discarded tweets with fewer than five words, since they may not provide sufficient context to infer emotions. (3) We removed tweets that contain URLs or quotations; a large number of tweets with URLs are information-oriented and do not convey emotions.
  18. This figure shows the benefit of leveraging Twitter ‘big data’: with 1,000 training tweets, classification accuracy is about 45%; with 10,000, it gets close to 55%; with about 2M, it reaches about 65%.
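The three filtering rules in this note translate almost directly into code. The sketch below is a simplified illustration: the hashtag-to-emotion map is a tiny hypothetical subset of the 100+ hashtags mentioned, the URL and quotation patterns are assumptions, and the five-word minimum is applied to the text before the final hashtag, which is one possible reading of rule (2).

```python
import re

# Hypothetical subset of the emotion hashtag map.
HASHTAG_TO_EMOTION = {
    "#fear": "fear",
    "#anger": "anger",
    "#thankful": "thankfulness",
}
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
QUOTE_RE = re.compile(r'["\u201c\u201d]')  # straight and curly double quotes


def label_tweet(text, min_words=5):
    """Return (text_without_hashtag, emotion), or None if the tweet is filtered out."""
    tokens = text.split()
    if not tokens:
        return None
    # Rule 1: the emotion hashtag must be the last token.
    emotion = HASHTAG_TO_EMOTION.get(tokens[-1].lower())
    if emotion is None:
        return None
    # Rule 2: require enough context words before the hashtag.
    body = tokens[:-1]
    if len(body) < min_words:
        return None
    # Rule 3: drop tweets containing URLs or quotations.
    if URL_RE.search(text) or QUOTE_RE.search(text):
        return None
    return " ".join(body), emotion


print(label_tweet("I hate when my mom compares me to my friends. #anger"))
print(label_tweet("so scared #fear"))
```

Removing the hashtag from the returned text keeps a classifier trained on this data from simply learning to read the label back out of the tweet.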
  23. User engagement levels: applications in coordination activities. Connecting the dots here with NGO initiatives (*presented by Selvam)
  24. Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  25. A supervised machine learning based system that supports high-level coordination operations by mining the demand for and supply of resources/services and matching them.
  26. 1.) Extract information nuggets for donations, requests, and offers, along with their context (geo, time, etc.). 2.) A semi-structured knowledge base is then used to match demand and supply to assist coordination.
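      The matching step in notes 25 and 26 can be illustrated with a small sketch. It assumes the extraction step has already produced structured nuggets; the `Nugget` fields, the `match_demand_supply` name, and the exact-match rule on resource and location are hypothetical simplifications, not the system described in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nugget:
    """Hypothetical record produced by the (assumed) extraction step."""
    kind: str      # "request" or "offer"
    resource: str  # e.g. "water", "blankets"
    location: str  # coarse geo context
    tweet_id: str


def match_demand_supply(nuggets):
    """Pair each request with every offer of the same resource and location."""
    offers = defaultdict(list)
    for n in nuggets:
        if n.kind == "offer":
            offers[(n.resource, n.location)].append(n)
    matches = []
    for n in nuggets:
        if n.kind == "request":
            for offer in offers.get((n.resource, n.location), []):
                matches.append((n.tweet_id, offer.tweet_id))
    return matches


nuggets = [
    Nugget("request", "water", "moore", "t1"),
    Nugget("offer", "water", "moore", "t2"),
    Nugget("offer", "blankets", "moore", "t3"),
    Nugget("request", "blankets", "norman", "t4"),
]
print(match_demand_supply(nuggets))
```

      A real system would likely relax the exact-match rule (nearby locations, time windows, related resources); the sketch only shows the shape of the demand-supply join.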
  27. Example of PCN analysis in action: clustering the influencers mined from the network by user demographics (People), with the ability to tune engagement by understanding the ‘why’ of the influencers (Content)
  28. Connections/Relationships - Implicit content features
  29. Neither the authoritative nature of the poster nor the volume of follower connections predicted the re-tweet behavior associated with the tweets! ‘Call of action’ type content creates sparse retweet networks and gives less weight to the attribution of users, because ‘action’ is more important than attribution in that context.
  30. Interaction networks can serve as a proxy for identifying influencers in evolving communities (by using network algorithms like PageRank), because traditional network analysis of community structures cannot work when user connection data, e.g., follower-followee networks, is sparse.
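      As a rough illustration of note 30, the sketch below runs a plain power-iteration PageRank over an interaction network. The edge direction (from the user who retweets or mentions to the user being retweeted or mentioned), the function name, and the toy user names are assumptions made for the example; the note only states that PageRank-like algorithms are used.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges.

    Assumed edge meaning: source retweeted or mentioned target, so rank
    flows toward users whom others interact with.
    """
    nodes = sorted({user for edge in edges for user in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing interactions is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy interaction network: three users retweet "carol", who retweets "alice".
edges = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("carol", "alice")]
ranks = pagerank(edges)
top_user = max(ranks, key=ranks.get)
print(top_user)
```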
  31. Slide #1: Introduce the project, participants and the main goal Slide #2: Substantive slide showing either key graphic/chart or claim from this work Slide #3 (optional): Provide additional context or teaser for what would be discovered on poster
  32. Groups with increasing divergence write more general, reporting-type content based on past incidents, while groups with decreasing divergence write more social and future-action-related content. Members of the least diverging groups retweet heavily, while the most divergent groups rely on hashtags. Group discussion divergence increases during the event but decreases in the post-event phase.
  33. Explain about continuous semantics
  34. (It is a real-time widget for monitoring needs, so it will not be active after the event has passed.) http://twitris.knoesis.org/oklahomatornado
  35. And http://knoesis.org/vision