Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015
Abstract
With the rapid rise in the popularity of social media, and near-ubiquitous mobile access, the sharing of observations and opinions has become commonplace. This has given us unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications, be it for brand tracking and management, crisis coordination, organizing revolutions or promoting social development in underdeveloped and developing countries.
I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.
I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).
2. Citizen sensor data mining,
social media analytics and applications
Singapore Symposium
on Sentiment Analysis (S3A), Feb 6, 2015
Amit Sheth
Kno.e.sis: Ohio Center of Excellence
in Knowledge-enabled Computing
@ Wright State University
3. Acknowledgements
Significant components of this talk are from the tutorial I gave at WWW2011:
“Citizen Sensor Data Mining, Social Media Analytics and Development
Centric Web Applications,” with Meena Nagarajan and Selvam Velmurugan.
Contributors to Twitris and/or Semantic Social Web Research @ Kno.e.sis:
L. Chen, H. Purohit, W. Wang
with: P. Anantharam, A. Jadhav, P. Kapanipathi, Dr. T.K. Prasad
and alumni: K. Gomadam, M. Nagarajan, A. Ranabahu
Funding: NSF, AFRL, NIH; Collaborations: IBM, Microsoft
4. Ohio Center of Excellence in Knowledge-
enabled Computing
• Among the top 10 universities in the world in the World Wide Web field (cf:
10-yr impact, Microsoft Academic Search)
• Largest academic group in the US in Semantic Web + Social/Sensor
Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical &
Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM
Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research
universities, NLM, startups )
• 80+ researchers including 15 world-class faculty (>3K citations/faculty)
and 45+ PhD students, practically all funded
• $2M+/yr research for largely multidisciplinary projects; world class
resources; industry sponsorships/collaborations (Google, IBM, …)
7. • Mumbai Terror
Attack
• Iran Election
2009
• Haiti Earthquake
2010
• Occupy Wall
Street
• Kashmir Floods
2014
Citizen Sensors in Action
Image: http://huff.to/hp0OhA
8. • Ghonim, who has been a figurehead for the movement
against the Egyptian government, told Blitzer “If you
want to liberate a government, give them the internet.”
• Egyptian anti-government
demonstrator sleeps on the pavement
under spray paint that reads 'Al-
Jazeera' and 'Facebook' at Cairo's
Tahrir square on February 7, 2011.
http://www.cbsnews.com/stories/2011/02/15/eveningnews/main20032118.shtml
Revolution 2.0
Political/Social Activism
• When Blitzer asked “Tunisia, then Egypt, what’s next?,”
Ghonim replied succinctly “Ask Facebook.”
http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn
http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
10. • Social News
• Social Media and
Global Media are
inter-twined.
News is increasingly Social
11.
Some of the significant human, social & economic
development applications we work on at Kno.e.sis
• Coordination during disasters (Qatar Computing Research
Institute, Microsoft Research NYC)
• Harassment on social media (WSU cognitive scientists)
• Prescription drug abuse, Cannabis & Synthetic
Cannabinoid epidemiology (Center for Interventions, Treatment
and Addictions Research, ….)
• Depressive disorders (Mayo Clinic)
• Gender-based violence (United Nations)
Highly multidisciplinary team efforts, often with significant
partners, with real world data, intended to achieve real-
world impact
12.
Sample of Real-World Impact & Media Coverage
• Twitter Data Mining Reveals America's Religious Fault Lines,
MIT Technology Review, Oct 6, 2014
• Digital soldiers emerge heroes in Kashmir flood rescue,
HindustanTimes, September 25, 2014
• India's social media election battle, BBC News, Mar 30, 2014
• #Cursing Study: 10 Lessons About How We Use Swear Words on
Twitter, Time.com, Feb 19, 2014
• Twitris: Taking Crisis Mapping to the Next Level, Tech President,
June 24, 2013
• Picking the President: Twindex, Twitris Track Social Media
Electorate, Semanticweb.com, Aug 3, 2012
• Web App Analyzes Tweets in Real Time for a Record of Historic
Events, Mashable.com, Feb 17, 2012
14.
Some of the topics on Online Social Media
we research at Kno.e.sis
1. Named Entity Recognition
2. Language usage in Social Media
3. Exploration of People, Content and Network dynamics
4. Sentiment, Emotion and Opinion mining
5. Trust
6. Integrated exploitation of Sensor (physical), Web (Cyber)
and Social data for PCS applications
7. TWITRIS: A System for Mining Collective Intelligence
from Citizen-Sensor Data
15. • "Who says what, to whom,
why, to what extent and with what effect?" [Lasswell]
• Network: Social structure emerges
from the aggregate of relationships (ties)
• People: poster identities, the active effort of
accomplishing interaction
• Content : studying the content of communication
Social Information
Processing
17.
• Explicit information from user profiles
– User Names, Pictures, Videos, Links, Demographic Information,
Group memberships...
• Implicit information from user attention metadata
– Page views, Facebook 'Likes', Comments; Twitter 'Follows',
Retweets, Replies..
People Metadata:
Variety of Self-expression Modes
on Multiple Social Media Platforms
19. People Metadata: Continued
User Identification Metadata
• User-id
• Screen/Display-name of user
• Real name of user
• Location
• Profile Creation Date
• User description
- Biodata of the user
- Link to webpage of the user
Interest Metadata
• Author type
- Trustee/donor, journalist, blogger,
scientist etc.
• Favorite tweets
• Types of lists subscribed
• Style of Writing (personality
indicator)
• No. of Followees
• Majority of author type of
Followees
20. People Metadata: Continued
Web Presence:
- User affiliations
- Influence Metric – e.g., KLOUT (www.klout.com)
Activity Metadata
• Age of the profile
• Frequency of posts
• Timestamp of last status
• No. of Posts
• No. of Lists/groups created
• No. of Lists/groups subscribed
Influence Metadata
(Inferring People Metadata from Network level Information)
• No. of Followers – normal, influential
• No. of Mentions
• No. of Retweets/Forwards
• No. of Replies
• No. of Lists/groups following
• No. of people following back
• Authority & Hub Scores
23. Connections/Relationships matter! (foundation for the network)
Network Metadata
Structure Metadata
• Community Size
• Community growth rate
• Largest Strongly Connected
Component size
• Weakly Connected Components &
Max(WCC) size
• Average Degree of Separation
• Clustering Coefficient
Relationship Metadata
• Type of Relationship
• Relationship strength
• User Homophily (based on certain
characteristic such as location,
interest etc.)
• Reciprocity: mutual relationship
• Active Community/ Ties
24. Metadata Creation & Extraction
Length: 109 characters
General topic: Egypt protest
This poor {sentiment_expression: {target: “Lara Logan”,
polarity: “negative”}} woman! RT @THR CBS News’
{entity: {type: “News Agency”}} Lara Logan
{entity: {type: “Person”}} Released From Hospital
{entity: {type: “Hospital”}} After Egypt
{entity: {type: “Country”}} Assault {topic} http://bit.ly/dKWTY0
{external_URL}
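The inline annotations above can also be kept separate from the raw text as stand-off annotations. The following is a minimal sketch of that idea, not the representation used in Twitris: the `Annotation` class, its field names, and the `annotate` helper are hypothetical, and spans are located by plain string search.

```python
from dataclasses import dataclass

# Raw tweet from the slide, with the inline annotations removed.
TWEET = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")


@dataclass
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    kind: str    # e.g. "entity", "sentiment_expression", "external_URL"
    attrs: dict  # extra attributes such as entity type or polarity


def annotate(text, surface, kind, **attrs):
    """Locate `surface` in `text` and wrap it as a stand-off annotation."""
    start = text.index(surface)  # raises ValueError if the span is absent
    return Annotation(start, start + len(surface), kind, attrs)


annotations = [
    annotate(TWEET, "This poor", "sentiment_expression",
             target="Lara Logan", polarity="negative"),
    annotate(TWEET, "CBS News", "entity", type="News Agency"),
    annotate(TWEET, "Lara Logan", "entity", type="Person"),
    annotate(TWEET, "Hospital", "entity", type="Hospital"),
    annotate(TWEET, "Egypt", "entity", type="Country"),
    annotate(TWEET, "http://bit.ly/dKWTY0", "external_URL"),
]

print(len(TWEET))  # character count of the raw tweet
for a in annotations:
    print(a.kind, repr(TWEET[a.start:a.end]), a.attrs)
```

Keeping offsets rather than inline markup leaves the original text untouched, so several extractors can annotate the same tweet independently.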
25. Metadata Extraction from
Informal Text
Meena Nagarajan, ‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
26. Content Analysis: Typical Sub-tasks
• Recognize key entities mentioned in content
– Information Extraction (entity recognition, anaphora resolution, entity
classification..)
– Discovery of Semantic Associations between entities
• Topic Classification, Aboutness of content
– What is the content about?
• Intention Analysis
– Why did they share this content?
• Sentiment Analysis
– What opinions are people conveying via the content?
• Author Profiling
– What can we infer about the author from the content he posts?
• Context (external to content) extraction
– URL extraction, analyzing external content
27. • Named Entity Recognition
– I loved <movie> the hangover </movie>!
• Key Phrase Extraction
NER, Key Phrase Extraction
28. Named Entity Recognition
“I loved your music Yesterday!”
Yesterday is an album
“It was THE HANGOVER of the year..lasted forever..”
The Hangover is not a movie
“So I went to the movies..bad choice picking “GI Jane” worse now”
GI Jane is a movie
Task of NER : Identifying and classifying tokens
29. Analysing the Content can be Hard…
Using a domain model (E.g., MusicBrainz)
Using context cues from the content
• e.g. new Merry Christmas tune
Reduce potential entity spot size (with restrictions)
• e.g. new albums/songs
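The combination of a domain model and context cues can be sketched as follows. This is an illustrative toy, not the BBC SoundIndex implementation: `DOMAIN_TITLES` is a tiny hand-written stand-in for a MusicBrainz lookup, and the `CONTEXT_CUES` word list is an assumption.

```python
import re

# Hypothetical stand-in for titles drawn from a domain model such as MusicBrainz.
DOMAIN_TITLES = {"yesterday", "merry christmas"}

# Assumed cue words suggesting that the post is about music.
CONTEXT_CUES = {"music", "album", "albums", "song", "songs", "tune", "track"}


def spot_music_entities(text):
    """Return domain titles found in `text`, but only if a music cue is present."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    if not tokens & CONTEXT_CUES:
        return []  # no context cue: treat title-like words as ordinary language
    return sorted(
        title for title in DOMAIN_TITLES
        if re.search(r"\b" + re.escape(title) + r"\b", lowered)
    )


print(spot_music_entities("I loved your music Yesterday!"))
print(spot_music_entities("Yesterday was a long day"))
print(spot_music_entities("new Merry Christmas tune"))
```

Requiring a cue word trades recall for precision: a title such as "Yesterday" is only accepted when the surrounding text signals a music context.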
30.
Music NER application : BBC SoundIndex
(IBM Almaden)
Pulse of the Online Music Populace
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’
special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
33. Several Insights
Only 4% negative sentiments; perhaps ignore the Sentiment
Annotator on this data source?
Ignoring Spam can change ordering
of popular artists
Trending popularity of artists; trending topics in artist pages
34. Predictive Power of Data
• Billboard's Top 50 Singles chart
during the week of Sept 22-28
’07 vs. MySpace popularity
charts.
• User study indicated 2:1 and
up to 7:1 (younger age groups)
preference for MySpace list.
• Challenging traditional polling
methods!
36. Key Phrase Extraction - Example
• Key phrases extracted from prominent discussions on
Twitter around the 2009 Health Care Reform debate and
2008 Mumbai Terror Attack on one day
37.
M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web
Information Systems Engineering, Oct 5-7, 2009: 539-553
TF-IDF vs. Spatio-temporal-thematic scores rank phrases differently
Foreign relations
surfaces up
39. Why do people share?
• Outside of the psychological incentives, broadly, people
share to Seek Information OR Share Information
• If we understand the intent behind a post, we can build
systems that respond to it better
• An application: Understand intent to deliver targeted
content
– Use case: Online Content-Targeted Advertisements on Social Media
Platforms
42. What is going on here..
• Ads are targeted on profile interests, demographic data
• But Interests on profiles do not translate to purchase
intents
– Interests are often outdated..
– Intents are rarely stated on a profile..
• Some profile data does seem to work
– Example: New store openings, sales targeted at location
information in a profile
45. –Non-trivial
–Non-policed content
•Brand image, Unfavorable sentiments
–People are there to network
•User attention to ads is not guaranteed
–Informal, casual nature of content
•People are sharing experiences and events
–Main message overloaded with off
topic content
I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a
video project due tomorrow for merrill lynch :(( all i need
to do is simple: Extract several scenes from a clip, insert
captions, transitions and thats it. really. omgg i cant figure
out anything!! help!! and i got food poisoning from eggs.
its not fun. Pleasssse, help? :(
1 Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
Targeted Content-based Advertizing
46. Focus: Discuss Methodology,
Preliminary Results in…
• Identifying intents behind user posts on social networks
– Identify Content with monetization potential
• Identifying keywords for advertizing in user-generated
content
– Considering interpersonal communication & off-topic chatter
M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web
Intelligence, Sep 15-18 2009: 92-99
47. Result - 8X more interest for non-profile
ads..
• Using profile ads
– Total of 56 ad impressions
– 7% of ads generated interest
• Using authored posts
– Total of 56 ad impressions
– 43% of ads generated interest
• Using topical keywords from authored posts
– Total of 59 ad impressions
– 59% of ads generated interest
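The "8X" in the slide title follows from comparing the interest rates reported above. A small worked calculation, using only the percentages on the slide:

```python
# Share of ad impressions that generated interest, as reported on the slide.
interest_rate = {
    "profile ads": 0.07,
    "authored posts": 0.43,
    "topical keywords from authored posts": 0.59,
}

# Express every rate as a multiple of the profile-ad baseline.
baseline = interest_rate["profile ads"]
lift = {source: rate / baseline for source, rate in interest_rate.items()}

for source, ratio in lift.items():
    print(f"{source}: {ratio:.1f}x the profile-ad interest rate")
```

Ads from topical keywords come out at roughly 8.4 times the profile-ad rate, and ads from raw authored posts at roughly 6.1 times. With fewer than 60 impressions per condition these ratios are indicative only.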
49. Sentiment Analysis: Motivation
Which movie
should I see?
What
customers
complain
about?
Why do
people
oppose
health care
reform?
Image: http://bit.ly/eZtKBF
50. Content Analysis:
Sentiment Analysis/Opinion Mining
• Two main types of information we can learn from user-
generated content: fact vs. opinion
• Much of social media text (e.g., blogs, Twitter, Facebook)
is a mix of facts and opinions.
• Extracting structured sentiment information from
unstructured content
• Allowing computation to be done on “what people think”
and “how people feel”
51. • From coarse-grained to fine-grained
– Document level -> sentence level -> expression level
– General sentiment -> domain-dependent sentiment -> target-
dependent sentiment
• From static to dynamic
– Our attitude can be changed during social communication.
• Modeling, detecting, and tracking the change of attitude
• What leads to the change of attitude? E.g., persuasion
campaign
Sentiment Analysis: Challenges
52. Sentiment Analysis:
Target-specific Opinion Identification
Observations:
• The opinion clues may not be toward the given target
(1,2,3,6)
• The opinion clues are domain and context dependent
(5,7)
• Single words are not enough (4,7,8)
Simple lexicon-based method doesn't work well.
Target of “sexy” is “Helena”
Target of “terrific” is “reviews”
“free” is not opinionated in
movie domain.
Target of “loving” is “telling”
“well” in “as well” is not
opinionated
53.
Extracting a diverse and richer
set of sentiment-bearing
expressions, including formal
and slang words/phrases
Assessing the
target-dependent polarity
of each sentiment
expression
A novel formulation of assigning
polarity to a sentiment expression
as a constrained optimization
problem over the tweet corpus
Extracting Diverse Sentiment Expressions
With Target-dependent Polarity from Twitter [Chen et al. ICWSM 2012]
55.
Sentiment Analysis:
Feature and Aspect Extraction
Motivation
• To understand a user’s opinions about a product at a fine-grained
level, support opinion summarization for products, and
automatically extract pros and cons from reviews it is essential to
identify product features and aspects.
Impact
• Existing methods tend to require seed terms and focus on
identifying explicit features or a few high-level aspects.
• Our approach is capable of identifying both explicit and implicit
aspects and does not require any labeling efforts.
Approach
• We use a combination of corpus-based association measures, and
semantic similarity measures to identify product aspects in an
efficient clustering based approach.
57.
It is actually about tracking public opinion.
Polling or Social Media Analysis?
1. Sample size
2. Representative of the target population
3. Accurate measure of opinions
4. Timeliness
58. • We study different groups of social media users who
engage in the discussions of the 2012 U.S. Republican
Presidential Primaries, and compare the predictive
power among these user groups.
• Existing studies on predicting election results assume
that all users should be treated
equally.
• How could different groups of users be different in
predicting election results?
Harnessing the Power of Social Data
to Predict Election Results [Chen et al., SocInfo 2012]
60. Predicting a User's Vote
• Basic idea: for which candidate the user shows the most
support
– Frequent mentions
– Positive sentiment
Nm(c): the number of tweets mentioning the candidate c
Npos(c): the number of positive tweets about candidate c
Nneg(c): the number of negative tweets about candidate c
Smoothing parameter, with a value strictly between 0 and 1
Discounting parameter, with a value strictly between 0 and 1: discounts the score when the user does not
express any opinion towards c.
The user
posted opinion
about c
The user
mentioned c but
did not post
opinion about c
More mentions,
higher score
More positive/less
negative opinions,
higher score
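The exact scoring formula is not recoverable from the slide text, so the following is only a sketch that respects the stated properties (more mentions give a higher score, more positive and fewer negative opinions give a higher score, and a mention without opinion is discounted). The functional form, the function names, and the default parameter values are assumptions, not the formula from Chen et al.

```python
def support_score(n_mentions, n_pos, n_neg, total_mentions,
                  smoothing=0.5, discount=0.5):
    """Hypothetical support score of one user for one candidate c.

    n_mentions     -- Nm(c): the user's tweets mentioning c
    n_pos, n_neg   -- Npos(c), Nneg(c): positive / negative tweets about c
    total_mentions -- the user's mentions summed over all candidates
    smoothing, discount -- both assumed to lie strictly between 0 and 1
    """
    if total_mentions == 0:
        return 0.0
    mention_share = n_mentions / total_mentions  # more mentions, higher score
    if n_pos + n_neg > 0:
        # The user posted opinions about c: smoothed share of positive ones.
        opinion = (n_pos + smoothing) / (n_pos + n_neg + 2 * smoothing)
    else:
        # The user mentioned c but posted no opinion: apply the discount.
        opinion = discount
    return mention_share * opinion


def predict_vote(user_counts, **params):
    """Pick the candidate with the highest support score for one user."""
    total = sum(c["mentions"] for c in user_counts.values())
    scores = {
        name: support_score(c["mentions"], c["pos"], c["neg"], total, **params)
        for name, c in user_counts.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical tweet counts for a single user and two candidates.
counts = {
    "A": {"mentions": 6, "pos": 3, "neg": 1},
    "B": {"mentions": 4, "pos": 0, "neg": 2},
}
winner, scores = predict_vote(counts)
print(winner, scores)
```

Aggregating the predicted votes over all users in a group, state by state, would then give that group's predicted result.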
61.
Revealing the challenge of
identifying the vote intent of
“silent majority”
Retweets may not necessarily
reflect users' attitude.
Prediction of user’s vote based
on more opinion tweets is not
necessarily more accurate than
the prediction using more
information tweets
The right-leaning user group
provides the most accurate
prediction result. It correctly predicts
the winners in 8 out of 10 states
with an average prediction error of
0.1.
To some extent, it demonstrates
the importance of identifying likely
voters in electoral prediction.
Twitter users are not “equal”
in predicting elections!
63. Emotion Mining: Motivation
• Emotion is essential to all aspects of our lives.
– Influences our decision-making
– Affects our social relationships
– Shapes our daily behavior
• Emotional mental health
– New mothers may suffer from post-partum depression
– Veterans may constantly suffer from negative emotions because
of post-traumatic stress disorder
64. Emotion Mining: what have we studied
• Can we automatically create a large emotion dataset
with high quality labels from Twitter? How?
• What features can effectively improve the performance
of supervised machine learning algorithms?
• Can the system developed on Twitter data be directly
applied to identify emotions from other datasets?
• What can we learn about emotion from social media
data?
65. • Collect self-annotated emotion tweets [Wang et al. SocialCom 2012]
– Seven emotions: joy, sadness, anger, love, fear, surprise, thankfulness
“When I see a cop, no matter where I am or what I’m doing, I
always feel like every law I’ve ever broken is stamped all over
my body #fear”
“I hate when my mom compares me to my friends. #anger”
“I hate when I get the hiccups in class. #embarrassing”
Harnessing Twitter "big data" for
automatic emotion identification [Wang et al.
SocialCom 2012]
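Collecting self-annotated tweets amounts to treating a trailing emotion hashtag as the label and removing it from the text. A minimal sketch of that idea follows; the hashtag list (one bare emotion word per category) and the rule that the hashtag must end the tweet are assumptions for illustration, not the filtering rules of the cited work.

```python
import re

# The seven emotions named on the slide; using the bare emotion word as the
# hashtag is an assumption made for this sketch.
EMOTIONS = {"joy", "sadness", "anger", "love", "fear", "surprise", "thankfulness"}

# A hashtag at the very end of the tweet, with optional surrounding whitespace.
TRAILING_HASHTAG = re.compile(r"\s*#(\w+)\s*$")


def self_label(tweet):
    """Return (text_without_hashtag, emotion), or None if no label is found."""
    match = TRAILING_HASHTAG.search(tweet)
    if not match:
        return None
    tag = match.group(1).lower()
    if tag not in EMOTIONS:
        return None
    return tweet[:match.start()].strip(), tag


print(self_label("I hate when my mom compares me to my friends. #anger"))
print(self_label("I hate when I get the hiccups in class. #embarrassing"))
```

Stripping the hashtag matters: if it stayed in the text, a classifier could learn to read the label directly instead of the emotional content.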
66. The more data, the merrier
[Figure: accuracy (0.4 to 0.65) vs. number of tweets in training data (1,000 to 1,991,184) for LIBLINEAR and MNB; results of performing seven emotion classifications]
67. Discovering Fine-grained Emotion
in Suicide Notes [Wang et al. BII12]
• Automatically classify suicide notes into 15
categories at sentence level
• Emotion categories
– Positive
• Hopefulness, thankfulness, forgiveness, love, pride, happiness
– Negative
• Sorrow, abuse, anger, hopelessness, guilt, blame, fear
• Other categories
– Information, instructions
68. Discovering Fine-grained Emotion
in Suicide Notes [Wang et al. BII12]
Sentence: “Found out today that // I passed my math STAAR test.”
• N-gram features
• Unigram, e.g., found, today, passed, etc.
• Bigram, e.g., found_out, out_today, etc.
• N-gram position
– Unigram: found-1, out-1, today-1, …, I-2, passed-2, my-2, …
• Knowledge-based features:
– LIWC (Pennebaker et al., 2014a)
– WordNet-Affect (Strapparava and Valitutti, 2004)
– MPQA (Wilson et al., 2005)
• Syntactic features:
– Part-of-speech tags, e.g., Found/VBN out/RP today/NN that/IN I/PRP
passed/VBD…
– Dependency relations, e.g., root(ROOT-0, Found-1); ccomp(Found-1, passed-6);
dobj(passed-6, test-10) …
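The n-gram and n-gram position features listed above can be sketched as follows. The tokenizer, the lowercasing, and the use of "//" as the segment separator are assumptions for this illustration; the knowledge-based and syntactic features would need external lexicons and a parser and are not shown.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(sentence, segment_sep="//"):
    """Build unigram, bigram and positional-unigram features for one sentence."""
    segments = [tokenize(part) for part in sentence.split(segment_sep)]
    tokens = [tok for seg in segments for tok in seg]
    unigrams = list(tokens)
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    # Positional unigrams record which segment (1-based) a token occurs in.
    positional = [
        f"{tok}-{i}"
        for i, seg in enumerate(segments, start=1)
        for tok in seg
    ]
    return {"unigram": unigrams, "bigram": bigrams, "position": positional}


features = ngram_features("Found out today that // I passed my math STAAR test.")
for name, values in features.items():
    print(name, values)
```

Unlike the slide, this sketch lowercases every token, so the pronoun appears as `i-2` rather than `I-2`.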
70. Cursing in English on Twitter [Wang et al. CSCW14]
• The main reason that people use curse words is to express
strong emotions, especially anger and frustration. [Jay 1992, 2000;
McEnery 2006; Nasution and Rosa 2012]
71. Normalized Emotion Distributions over Time (EST)
Normalized Emotion Distributions over Days (EST)
“I am so thankful for my family && close friends. They hold me together
when everything else around me is falling apart. #SoBlessed #Thankful”
73.
Rank | Mom | Dad
1 | Irritation (7,562) | Irritation (3,034)
2 | Sadness (2,315) | Sadness (1,363)
3 | Affection (2,225) | Embarrassment (1,158)
4 | Zest (2,213) | Zest (1,035)
5 | Embarrassment (1,849) | Affection (1,030)
6 | Thankfulness (1,537) | Cheerfulness (911)
7 | Cheerfulness (1,332) | Envy (902)
“I hate when my dad uses my laptop. Its mine. Not yours. You have your own computer.
I have shit to do, get off now please. #annoyed”
“ugh my mom gets so nervous when i drive #annoying”
“My mom just told me I can't open any presents early cause I'm too old for that #depressing”
What are the top Emotions Associated with Moms and Dads?
74. PEOPLE ANALYSIS
- Deriving People Metadata
- from Content Analysis
- from Network Analysis
- Merge of two approaches
- People-Content-Network Analysis to leverage the metadata
- Finding Influential Users
- Finding User Types & Affiliation
- Measuring Social Engagement
- Leverage communities to assist coordination
75. People Analysis:
Social Engagement & Coordination
Imagine a crisis scenario such as the Haiti earthquake (2010) or
Hurricane Sandy (2012)
- emergency teams are looking for ways to help the victims
• What are the best possible ways to communicate:
identify and engage people
• Between resource providers (supply) and people in
need of resources (demand)
• Topical community influencers
• How can response teams coordinate social media
communities well, between volunteers, managers in the
organizational structure, and resource seekers?
76. People Analysis: Who is asking for help, Who is offering to help?
Smart Data in the context of Disaster Management
ACTIONABLE: Timely delivery of
right resources and information
to the right people at right
location!
Because everyone wants to Help, but DOESN’T KNOW HOW!
77. Really sparse Signal to Noise:
• 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013
- 1.3% as the precise resource donation requests to help
- 0.02% as the precise resource donation offers to help
• Anyone know how to get involved to
help the tornado victims in
Oklahoma??#tornado #oklahomacity
(OFFER)
• I want to donate to the Oklahoma cause
shoes clothes even food if I can (OFFER)
Disaster Response Coordination:
Finding Actionable Nuggets for Responders to act
• Text REDCROSS to 909-99 to donate to
those impacted by the Moore tornado!
http://t.co/oQMljkicPs (REQUEST)
• Please donate to Oklahoma disaster
relief efforts.: http://t.co/crRvLAaHtk
(REQUEST)
For responders, most important information to manage
coordination dependencies is
the scarcity and availability of resources
Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
78. People Analysis: Match demander-
suppliers for coordination during crisis
Purohit, H., Castillo, C., Diaz, F., Sheth, A., & Meier, P. (2013). Emergency-relief coordination on social media: Automatically
matching resource requests and offers. First Monday, 19(1).
79. Demand-Supply identification and
representation: core & facets
• Extract the core of the phrase: “what”
– Other facets include “who”, “where”, “when”, etc.
• Supervised Learning to classify items for demands, supplies, and
resource type facets
Rotary collecting clothing and other donations in New Jersey <URL>
Corresponding data item in the semi-structured knowledge inventory:
{ source: “Twitter”, author: “@NN”, text: “Rotary collecting clothing and
other donations in New Jersey <URL>”, donation-info: { donation-type:
“Request”, donation-type-confidence: 0.8, donation-organization: “Rotary”,
donation-item: “clothing and other donations”, donation-location: “New
Jersey” }, … }
• IR model approach to match demand (request) with supply (offer)
items in this semantically annotated knowledge inventory
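One simple way to picture the matching step is to rank offers by text similarity to a request. The sketch below uses TF-IDF weighting with cosine similarity in plain Python; this is a generic IR baseline chosen for illustration, not necessarily the model used in the cited First Monday paper, and the second offer text is invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def best_offer(request, offers):
    """Return the offer most similar to the request, with its score."""
    vectors = tfidf_vectors([request] + offers)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    best = max(range(len(offers)), key=scores.__getitem__)
    return offers[best], scores[best]


request = "Rotary collecting clothing and other donations in New Jersey"
offers = [
    "I want to donate to the Oklahoma cause shoes clothes even food if I can",
    "I have clothing and blankets to donate in New Jersey",  # invented example
]
match, score = best_offer(request, offers)
print(match, round(score, 3))
```

Note that "clothing" and "clothes" do not match under plain term overlap, which is one reason the annotated facets (item, location, organization) are useful for matching.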
80. Leveraging Communities for Whom
to Engage With, Why and How
Purohit et al., User Taglines: Alternative Presentations of Expertise and Interest in Social Media . ASE Social Informatics, 2012
81. Network Analysis
Interesting questions to ask:
• How communities form around topics- growth & evolution
• What are the effects of influential participants in the communities
• What are the effects of content nature (or sentiment, opinions)
flowing in network on the community structures and growth
• What is the community structure: degree of separation and sub-
communities that contribute for macro-level effects, e.g.,
coordination, engagement
“To discover how A, who is in touch with B and C,
is affected by the relation between B & C”
- John Barnes
Foundation of network:
•Nodes
•Connections/Relationships
Image: http://www.onasurveys.com/
82. Graphs showing sparse (A) and dense (B) RT networks and their
corresponding follower graphs for 'call for action' and
'information sharing' tweet content types
M. Nagarajan, H. Purohit, and A. Sheth, ‘A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs
and Social Media, ICWSM 2010
83. Understanding Evolving Community
Structures for Coordination
User interaction networks of two topical communities– Occupy LA and Chicago,
of emerging influencers during Occupy Wall Street (OWS) event 2011
Application of evolving communities:
H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential Authors in Brand-Page Communities. 6th Int'l AAAI Conference on Weblogs and
Social Media (ICWSM), Dublin, Ireland, June 5-7, 2012
84. Evolution of influencer interaction networks for Romney vs. Obama topical
communities, during U.S. Presidential Election 2012 debates
Romney
Obama
Before 1st
debate
After 1st
debate
After
Hurricane Sandy
After 3rd
debate
Understanding Community Evolution for
Real-World Actions
Social Media analysis for US elections 2012, powered by Twitris: http://analysis.knoesis.org/uselection/insights/
85. On Understanding the Divergence of
Online Social Group Discussion
• Change of group discussion divergence over time, and different
phases of real world events
• Relation between discussion divergence and existing theories of
social cohesion and social identity in Psychology
• Prediction of future change in the group discussion divergence
Research Questions on Social Dynamics in Communities
Acknowledgement:
NSF SoCS grant for ‘Leveraging Social Media during Emergency
Response’
Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group
Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
86. • Prior work:
– Focus on structural metrics to understand group evolution
dynamics, but may not be sufficient to answer ‘WHY a group
diverges over time’
• Our approach:
– Content driven measure: collective divergence of group
members for topics of discussion
– Features assessing role of socio-psychological theories:
cohesion & identity
• Data:
– Tweets during evolving events of natural disasters, and social
activism
Contrasting Prior Work and Approach
Evolution of groups in online
social communities
surrounding events
On Understanding the Divergence of
Online Social Group Discussion
87. • During #sandy, predicted low
diverging (focused) groups to
engage with on the updates
of flights, first delays &
cancellation, then resuming
• Natural disaster (D) events
(Hurricane Irene and Sandy)
have stronger correlations
with identity-driven features
than with cohesion features
• We predicted group discussion
divergence
across phases, with an AUC of 0.83
Time
On Understanding the Divergence of
Online Social Group Discussion
90. Live Demo of Powerful Social
Media Analysis: Twitris
91. Twitris - Motivation
1. Information Overload
• Multiple events around us
• WHAT to be aware of
• Multiple Storylines about same
event!!
Image: http://bit.ly/etFezl
93. Twitris - Motivation
3. Semantics of Social perceptions
• What is being said about an event (theme)
• Where (spatial)
• When (temporal )
Twitris lets you browse citizen reports using social
perceptions as the fulcrum
94. Twitris: Semantic Social Web Mash-up
Facilitates understanding of multi-dimensional social perceptions over
SMS, Tweets, multimedia Web content, electronic news media
95. Twitris: Architecture
Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic
Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553,
Oct 5-7, 2009.
98. Incoming Tweets with need
types to give quick idea of what
is needed and where currently
#OKC
Legends for
Different
needs #OKC
Clicking on a tag brings contextual
information– relevant tweets,
news/blogs, and Wikipedia articles
Twitris: Real-time information
99. How People from Different
parts of the world talked
about US Election
Images and Videos
Related to US Election
Twitris: Analysis by location for contrast in
social perceptions
101.
How was Obama doing in the first debate?
Twitris: Sentiment Analysis- Smart
Answers with reasoning!
102. The Dead People mentioned
in the event OWS
Twitris: Impact of Background
Knowledge
103. Twitris: Demo, Quick Show
http://twitris2.knoesis.org/
• Many other interesting efforts – Eg: Vivek K. Singh, Mingyan Gao, and Ramesh
Jain. 2010. From microblogs to social images: event analytics for situation
assessment. In Proceedings of the international conference on Multimedia
information retrieval (MIR '10). ACM, New York, NY, USA, 433-436.
104. • Do you have a sense of the immense opportunity of analyzing
citizen sensing for useful social signals?
• Do you appreciate the broad range of issues and challenges?
Did we present examples and a few insights into how to
address some unique challenges?
• Did the spatio-temporal-thematic, people-content-network, and
emotion-sentiment-intent dimensions present a reasonable way
to organize the vast number of relevant research challenges and
techniques?
Conclusions
105.
http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
thank you, and please visit us at
Editor's Notes
Got carried away with coverage and content – too much material for 3 hours – so the remaining content can be used as background
Many media companies use Facebook and Twitter as a news-delivery platform. Many individuals rely on them as a news source. News is increasingly social.
Interest level:
(Based on Description info, lists and fav. tweets)
Semantic metadata, relationships: Inferred?
Structure Level Metadata
Community Size
- Showing scale: global vs. local
Community growth rate
- Popularity estimation for a topic
Largest Strongly Connected Component size
- Measuring Reachability in the directed graph
No. of Weakly Connected Components & Max. size
- distribution of pre-existing network connections (follower-followee)
- Showing Nature: loose vs. compact
Average Degree of Separation
- How many hops between two authors
Clustering Coefficient
- Showing the likelihood of association
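Two of the structure-level measures above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Twitris computes them: the follower-followee edge list and user names are hypothetical, and the clustering coefficient treats follow edges as undirected.

```python
from collections import defaultdict, deque


def undirected_neighbors(edges):
    """Build an undirected adjacency map from (follower, followee) pairs."""
    neighbors = defaultdict(set)
    for follower, followee in edges:
        if follower != followee:
            neighbors[follower].add(followee)
            neighbors[followee].add(follower)
    return neighbors


def weakly_connected_components(edges):
    """Group users into components, ignoring the direction of follow edges."""
    neighbors = undirected_neighbors(edges)
    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first search from the unvisited user
            user = queue.popleft()
            component.add(user)
            for other in neighbors[user]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


def clustering_coefficient(edges, user):
    """Fraction of pairs of a user's neighbors that are connected to each other."""
    neighbors = undirected_neighbors(edges)
    friends = sorted(neighbors.get(user, ()))
    k = len(friends)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if friends[j] in neighbors[friends[i]]
    )
    return links / (k * (k - 1) / 2)


# Hypothetical follower -> followee edges
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan"), ("eve", "fay")]
components = weakly_connected_components(edges)
print(len(components), max(len(c) for c in components))
print(clustering_coefficient(edges, "cat"))
```

The number of components and the size of the largest one give the loose vs. compact picture; the coefficient for a single user can be averaged over all users to describe the whole community.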
Relationship Level Metadata
Type of Relationship- topic/content (based on Retweet, Entity etc.)
- follower/followee (based on structure)
Relationship strength
- Strong vs. Weak ties based on activity/ communication between users
- % tie strength
User Homophily [Homophily (i.e., "love of the same") is the tendency of individuals to associate and bond with similar others]
based on certain characteristic (e.g., Location, interest etc.)
% of users showing similar behavior
Reciprocity: mutual relationship
- % of users following back their followers
Active Community/ Ties
- How active is the communication between users or how active are the relationship ties
- Average of tie strength based on activity
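As a rough sketch of the reciprocity measure above: the notes describe it as the percentage of users following back their followers, and the version below assumes an edge-based reading (the share of follow edges that are returned). The edge list is hypothetical.

```python
def reciprocity(follow_edges):
    """Share of (follower, followee) edges for which the reverse edge also exists."""
    edges = set(follow_edges)
    if not edges:
        return 0.0
    mutual = sum(1 for follower, followee in edges if (followee, follower) in edges)
    return mutual / len(edges)


# Hypothetical follow edges: only ann and bob follow each other.
follows = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("dan", "ann")]
print(f"{reciprocity(follows):.0%} of follow edges are reciprocated")
```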
Building on foundations of
Statistical Natural Language Processing
Information Extraction
Semantic Web/ Knowledge Representation
We will talk about key issues in extracting metadata from Informal Text and how it varies from what has been done in more well-structured text like news articles etc.
What the two tasks look like in terms of outputs they produce
This is an application of the NER work
We have come a long way but still room for improvement
Social media serves as a platform for people to speak their minds more freely, which leads to a growing volume of opinionated data that can be used by:
(1) individuals, for suggestions and recommendations
(2) companies and organizations, for marketing strategies and other decision-making processes
(3) governments, for monitoring social phenomena, being aware of potentially dangerous situations, etc.
A fact can be proven; an opinion cannot.
An opinion is normally a subjective statement based on people's thoughts, feelings, and understanding.
One of the most attractive advantages of unsupervised approaches is that they do not require training data.
Many sentiment analysis applications for social media content use a simple lexicon-based method. However, it does not work for the problem of target-specific sentiment analysis.
A simple lexicon-based method uses a general sentiment lexicon containing words that are positive, negative, or neutral in the general sense.
(1) For the task of "find tweets containing positive opinions about a specific topic", such as a movie, the results will look like those in the table. However, tweets 2, 3, 5, 6, and 7 do not contain opinions about the movie. (2) For the task of extracting the opinion clues/expressions, the right answers should be like those shown in the other picture. However, the simple lexicon-based method might return all the words shown in orange in the table.
We use background knowledge to help identify the entities mentioned in the text; e.g., knowledge from IMDB and Freebase is used to determine whether a noun phrase in the text is the name of a movie or a person. Lexical resources such as Urban Dictionary are used to help identify the sentiment clues in the text. Urban Dictionary is a popular online slang dictionary with word definitions written by users. Each word is associated with a list of related words that help interpret it, and with many glossary definitions given by different users. Both the related-words list and the glossary definitions can be used to help determine the sentiment of the spotted word. E.g., the word “wicked” has a list of related words, and most of those words carry positive sentiment, so we can infer that “wicked” is very likely a positive sentiment clue. In addition, one user-supplied definition of “wicked” says that it has different meanings in different countries. Given this knowledge, if we know the location of the author who wrote the tweet, we can infer whether “wicked” in the tweet is used as a sentiment clue, and whether it is positive, negative, or neutral.
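The related-words idea can be illustrated with a simple majority vote. This is a minimal sketch, not the method used in the work: the lexicon entries, the related-words list for “wicked”, and the `min_votes` threshold are all hypothetical stand-ins rather than actual Urban Dictionary data.

```python
def infer_polarity(related_words, lexicon, min_votes=2):
    """Guess a slang word's polarity from the known polarity of its related words."""
    votes = [lexicon[word] for word in related_words if word in lexicon]
    positive = votes.count("positive")
    negative = votes.count("negative")
    # Stay undecided on a tie or when too few related words are in the lexicon.
    if positive == negative or max(positive, negative) < min_votes:
        return "unknown"
    return "positive" if positive > negative else "negative"


# Hypothetical general-purpose sentiment lexicon
GENERAL_LEXICON = {
    "awesome": "positive",
    "cool": "positive",
    "great": "positive",
    "evil": "negative",
}

# Hypothetical related-words list for a slang term
RELATED_WORDS = {"wicked": ["awesome", "cool", "great", "evil", "sick"]}

print(infer_polarity(RELATED_WORDS["wicked"], GENERAL_LEXICON))
```

A location-aware variant would keep one related-words list per region and pick the list matching the author's location before voting.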
While sentiment analysis concerns people’s opinions about something, emotion analysis focuses on our own emotional state, our mental health!
Am I happy? Sad? Angry? Etc.
We are emotional creatures, and emotion plays an important role in all aspects of our lives!
(1) Influences our decision-making
(2) Affects our social relationships
(3) Shapes our daily behavior
More importantly, emotions affect our mental health:
Take new mothers and veterans, for example
It is difficult to annotate sentences with emotion labels for the following reasons:
Emotion is more fine-grained (joy, sadness, anger, etc.), while sentiment usually deals with only positive, neutral, and negative labels.
A reader may incorrectly interpret the emotion a writer embedded in a sentence.
We leverage more than 100 emotion-related hashtags to filter Twitter streaming data and use the emotion hashtag at the end of a tweet to infer its emotion label, e.g.,
“leaving for hospital #nervous” -> sadness emotion
(1) We kept only the tweets with the emotion hashtags at the end
(2) We discarded tweets which have fewer than five words, since they may not provide sufficient context to infer emotions
(3) We removed the tweets which contain URLs or quotations. A large number of tweets with URLs are information-oriented and do not convey emotions.
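The three filtering rules above could be sketched roughly as follows. This is an illustration under assumptions, not the original pipeline: the hashtag-to-emotion map is a tiny hypothetical stand-in for the 100+ hashtags, the word count is assumed to exclude the trailing label hashtag, and any double-quote character is treated as a sign of a quotation.

```python
import re

# Hypothetical stand-in for the full emotion-hashtag list
EMOTION_HASHTAGS = {"happy": "joy", "sad": "sadness", "angry": "anger"}
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
QUOTE_CHARS = ('"', "\u201c", "\u201d")
MIN_WORDS = 5


def label_tweet(text):
    """Return (text_without_label_hashtag, emotion), or None if the tweet is filtered out."""
    tokens = text.split()
    # Rule 1: the emotion hashtag must be the last token.
    if not tokens or not tokens[-1].startswith("#"):
        return None
    emotion = EMOTION_HASHTAGS.get(tokens[-1][1:].lower())
    if emotion is None:
        return None
    # Rule 3: drop tweets containing URLs or quotations.
    if URL_PATTERN.search(text) or any(q in text for q in QUOTE_CHARS):
        return None
    # Rule 2: drop tweets that are too short to give context.
    body = tokens[:-1]
    if len(body) < MIN_WORDS:
        return None
    return " ".join(body), emotion


print(label_tweet("so glad the exams are finally over #happy"))
print(label_tweet("great day #happy"))
```

Removing the label hashtag from the returned text keeps a classifier trained on this data from simply memorizing the hashtag.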
This figure shows the benefits of leveraging Twitter ‘big data’:
When the size of training data is 1,000, the classification accuracy is about 45%;
When we increase the size of training data to 10,000, the classification accuracy gets close to 55%;
When we further increase the size of training data to about 2M, the classification accuracy reaches about 65%.
User engagement levels: applications in coordination activities
Connecting the dots here with NGO initiatives (*presented by Selvam)
Categorization of severity based on weather conditions. Actionable information is contextually dependent.
A supervised machine-learning-based system supports high-level coordination operations
by mining the demand for and supply of resources/services, and matching them.
1.) Extract information nuggets for donations, requests, and offers, along with their context (geo, time), etc.
2.) A semi-structured knowledge base is then used to match demand and supply to assist coordination
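The matching step can be pictured with a toy example. This is only a sketch of the general idea: it assumes the extraction step has already produced structured records, and the `Message` fields and the exact-match rule on resource and location are hypothetical simplifications of a knowledge-based matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str      # "request" or "offer", as produced by the extraction step
    resource: str  # e.g. "water"
    location: str  # coarse geographic context


def match_requests_to_offers(messages):
    """Pair each request with the earliest unused offer for the same resource and location."""
    offers = {}
    for message in messages:
        if message.kind == "offer":
            offers.setdefault((message.resource, message.location), []).append(message)
    matches = []
    for message in messages:
        if message.kind == "request":
            candidates = offers.get((message.resource, message.location), [])
            if candidates:
                matches.append((message, candidates.pop(0)))
    return matches


# Hypothetical extracted records
messages = [
    Message("request", "water", "moore"),
    Message("offer", "water", "moore"),
    Message("offer", "blankets", "moore"),
    Message("request", "shelter", "norman"),
]
for request, offer in match_requests_to_offers(messages):
    print(f"{request.resource} needed in {request.location}: matched with an offer")
```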
Example of PCN analysis in action–
Clustering mined influencers (from network), by the user demographics (People) and ability to tune engagement by understanding ‘why’ of the influencers (Content)
Connections/Relationships
- Implicit content features
Neither the authoritative nature of the poster nor the volume of follower connections predicted the retweet behavior associated with the tweets!
‘Call for action’ type content creates sparse retweet networks while giving less weight to the attribution of users, because ‘action’ is more important than attribution in that context.
Interaction networks can serve as a proxy for identifying influencers in evolving communities (by using network algorithms like PageRank),
because traditional network analysis of community structures cannot work when user connection data, e.g., follower-followee networks, is sparse.
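To make the PageRank idea concrete, here is a minimal power-iteration sketch over a retweet interaction network. It is an assumption-laden illustration rather than the Twitris implementation: edges are taken to point from the retweeter to the original author, the damping factor and iteration count are conventional defaults, and the user names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread evenly over everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical retweet edges: (retweeter, original author)
retweets = [("bob", "ann"), ("cat", "ann"), ("dan", "ann"), ("dan", "bob"), ("eve", "bob")]
scores = pagerank(retweets)
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.3f}")
```

Because the edges come from interactions rather than follower lists, a user can rank highly here even when the follower-followee graph is too sparse to say anything about them.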
Slide #1: Introduce the project, participants and the main goal
Slide #2: Substantive slide showing either key graphic/chart or claim from this work
Slide #3 (optional): Provide additional context or teaser for what would be discovered on poster
Groups with increasing divergence write more general, reporting-type content based on past incidents, while groups with decreasing divergence write more social and future-action-related content
Members of the least divergent groups retweet heavily, while the most divergent groups rely on hashtags
Group discussion divergence increases during the event, but decreases in the post-event phase
Explain about continuous semantics
(It is a real-time widget for monitoring needs, so it will not be active after the event has passed)
http://twitris.knoesis.org/oklahomatornado