Automatic Emotion Identification from Text

Ohio Center of Excellence in Knowledge-Enabled Computing
Automatic Emotion Identification
from Text
Wenbo Wang
Kno.e.sis Center
Advisor:
Dr. Amit P. Sheth
Committee members:
Dr. Keke Chen
Kevin Haas
Dr. T.K. Prasad
Dr. Ramakanth Kavuluru
Ph.D. Dissertation Defense

Sadness
Anger
Fear
Joy
Your emotions are the slaves to your thoughts,
and you are the slave to your emotions.
--Elizabeth Gilbert

S&P 500 dropped 1% …
Jon C. Ogg, credit
Stock Market

Employee Productivity

Subjective Well-being
Happiness Index
ECG
Physical State, Emotional State

Emotion Identification
• Emotion
– “a strong feeling (such as love, anger, joy, hate, or fear)” --
Merriam-Webster Online Dictionary
• Emotion Identification
– the task of automatically identifying and extracting the
emotions expressed in a given text.
• Examples
– “I hate when my mom compares me to my friends” -> Anger
– “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve broken is stamped all over my body” -> Fear

Proposed Questions
• How to glean people’s emotions from their texts using machine
learning techniques?
• How to create large self-labeled emotion data from social media?
• How to improve emotion identification in target domains (e.g.,
blog, diary) by leveraging large self-labeled emotion data from
social media?

1. EMOTION CLASSIFICATION
Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-
grained Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012
Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing
Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International
Conference on Social Computing (SocialCom 2012)

Background - Classification
Credit: nltk

Dataset Description
• Suicide notes
– 15 fine-grained emotions
– Training: 4,633 sentences
– Testing: 2,086 sentences
• Twitter data
– 7 emotions
– Training: ~2 M tweets
– Testing: 250 K tweets

Suicide Notes Dataset
Sentence example: “I loved you and was proud of you.”
Unigrams: i, love, you, and, be, proud, of, you, .
Bigrams: i love, love you, you and, and be, be proud, proud of, of you, you .
The combination of unigrams and bigrams performs the best among n-gram features.
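A minimal sketch of how the unigram and bigram features listed on this slide could be extracted. The lemmatized tokens are copied from the slide; the helper name `ngrams` is an assumption made for illustration, not the author's code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Lemmatized tokens of "I loved you and was proud of you." as shown on the slide.
tokens = ["i", "love", "you", "and", "be", "proud", "of", "you", "."]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

# Combined unigram+bigram bag-of-features with counts.
features = Counter(unigrams + bigrams)
print(bigrams)
print(features["you"])
```

Counting the features in a `Counter` keeps repeated tokens such as "you" as a single feature with a frequency, which is the usual input shape for a bag-of-words classifier.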

Sentence example: “I loved you and was proud of you.”
LIWC Knowledge:
Posemo: 2 (love, proud)
Negemo: 0
Anger: 0
Sad: 0
Adding knowledge-based features further increases the performance.
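A rough sketch of how category counts like the ones on this slide could be turned into features. LIWC itself is a licensed dictionary, so `TOY_LEXICON` below is a hypothetical stand-in: only the two Posemo words (love, proud) come from the slide, and every other entry is made up for illustration.

```python
# Hypothetical mini-lexicon; NOT the real LIWC dictionary.
TOY_LEXICON = {
    "posemo": {"love", "proud"},   # these two words appear on the slide
    "negemo": {"hate", "awful"},   # illustrative entries only
    "anger": {"hate"},             # illustrative entries only
    "sad": {"cry"},                # illustrative entries only
}


def category_counts(tokens, lexicon):
    """Count how many tokens fall into each lexicon category."""
    return {
        category: sum(1 for token in tokens if token in words)
        for category, words in lexicon.items()
    }


tokens = ["i", "love", "you", "and", "be", "proud", "of", "you", "."]
counts = category_counts(tokens, TOY_LEXICON)
print(counts)
```

The resulting counts can be appended to the n-gram feature vector as additional knowledge-based features.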

Sentence example: “I loved you and was proud of you.”
POS count:
Adjective: 1 (proud)
Noun: 0
Pronoun: 3 (i, you)
…
Sentence tense:
Simple past tense: 2 (I loved, was proud)
Adding sentence tenses and POS counts further increases the performance.

Twitter Dataset – Supervised Classifier
Using only adjectives as features performs poorly because emotions can be implicitly expressed in text.

The combination of unigrams and bigrams performs the best among n-gram features.

Knowledge features and syntactic features become less important on Twitter data.

Challenge: The Lack of Training Data
• Emotion annotation is typically time-consuming,
expensive and error-prone.
– multiple emotion categories
– subtle and ambiguous emotion expressions
– Human judgement of emotion tends to be subjective and varied.
• Most existing datasets are small, e.g.,
– Blog: 1,890 sentences (Aman and Szpakowicz 2008)
– Experience: 1,000 sentences (Neviarouskaya et al. 2010)
– Diary: 700 sentences (Neviarouskaya et al. 2011)

Why do We Need More Training Data? (I)
[Figure 1 from (Banko and Brill 2001): Learning Curves for Confusion Set Disambiguation. x-axis: Millions of Words (0.1 to 1000); y-axis: Test Accuracy (0.70 to 1.00); curves: Memory-Based, Winnow, Perceptron, Naïve Bayes.]
“We may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development” -- (Banko and Brill 2001)

Why do We Need More Training Data? (II)
• Emotions arise in various situations, which leads to very
diverse expressions conveying the emotions.
“I hate when my mom compares me to my friends”
“When I see a cop, no matter where I am or what
I’m doing, I always feel like every law I’ve broken is
stamped all over my body”
“I hate when I get the hiccups in class”
“Omg I finally fit into one pair of my jeans from last
year!!”
“A dog barked at me!”

The Use of Hashtags on Twitter
“I hate when my mom compares me to my friends
#annoying”
“When I see a cop, no matter where I am or what
I’m doing, I always feel like every law I’ve broken is
stamped all over my body #nervous”
“I hate when I get the hiccups in class
#embarrassing”
“Omg I finally fit into one pair of my jeans from last
year!! #excited”
“A dog barked at me! #scared #weak”

2. SELF-LABELED DATA CREATION
Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing
Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International
Conference on Social Computing (SocialCom 2012)

Emotion Hashtags
• From existing psychology literature (Shaver et al. 1987), we collected 7 sets of emotion words for 7 different emotions: joy, sadness, anger, love, fear, thankfulness, and surprise.

Emotion       Hashtag Word Examples                 Hashtag Words   Number of Tweets
Joy           excited, happy, elated, proud         36              706,182
Sadness       sorrow, unhappy, depressing, lonely   36              616,471
Anger         irritating, annoyed, frustrate, fury  23              574,170
Love          affection, lovin, loving, fondness    7               301,759
Fear          fear, panic, fright, worry, scare     22              135,154
Thankfulness  thankfulness, thankful                2               131,340
Surprise      surprised, astonished, unexpected     5               23,906
Total                                               131             2,488,982

Removing Irrelevant Tweets
A tweet is removed if any of the following holds:
• Hashtag count > 2
• Emotion hashtag is not at the end
• Word count < 5
• Has URL or quotations
About 5 million tweets -> 2,488,982 tweets
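The removal criteria on this slide can be sketched as a single filter function. Several details are assumptions made for illustration and are not stated in the slides: tokens are split on whitespace, "word count" excludes hashtags, "at the end" means the final token, and the function name `keep_tweet` and the tiny `EMOTION_HASHTAGS` set are hypothetical.

```python
import re

# Assumption: a URL is anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Assumption: straight and curly double quotes mark a quotation.
QUOTE_CHARS = '"\u201c\u201d'

# Hypothetical subset of the emotion hashtag list.
EMOTION_HASHTAGS = {"#embarrassing", "#excited", "#scared"}


def keep_tweet(text, emotion_hashtags):
    """Return True if the tweet passes all four filters from the slide."""
    tokens = text.split()
    hashtags = [t for t in tokens if t.startswith("#")]
    words = [t for t in tokens if not t.startswith("#")]
    if len(hashtags) > 2:                       # hashtag count > 2
        return False
    if not tokens or tokens[-1].lower() not in emotion_hashtags:
        return False                            # emotion hashtag not at the end
    if len(words) < 5:                          # word count < 5
        return False
    if URL_PATTERN.search(text) or any(q in text for q in QUOTE_CHARS):
        return False                            # has URL or quotations
    return True


print(keep_tweet("I hate when I get the hiccups in class #embarrassing", EMOTION_HASHTAGS))
print(keep_tweet("So happy today #excited", EMOTION_HASHTAGS))
```

A stricter or looser reading of "at the end" (for example, allowing a trailing run of several hashtags) would change which tweets survive, so that choice should be checked against the original paper.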

Results with Increasing Training Data
[Figure: accuracy (y-axis, 0.4 to 0.65) vs. number of tweets in training data (x-axis, 1,000 to 1,991,184) for LIBLINEAR and MNB.]
Logistic Regression (LR)
Training instances: 1K -> 2M
Accuracy: 0.4341 -> 0.6557 (percentage gain = 51.05%)
Other labeled values in the figure: 0.5292, 0.6156

Results with Increasing Training Data
[Figure: accuracy (y-axis, 0.4 to 0.65) vs. number of tweets in training data (x-axis, 1,000 to 1,991,184) for LIBLINEAR and MNB.]
Multinomial Naive Bayes (MNB)
Training instances: 1K -> 2M
Accuracy: 0.4580 -> 0.6350 (percentage gain = 38.65%)
Other labeled values in the figure: 0.5426, 0.6113
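The percentage gains quoted for the two learning curves are relative gains over the accuracy at 1K training tweets. A small check of that arithmetic, using only the accuracy values shown on the slides (the function name is an illustrative choice):

```python
def percentage_gain(start_accuracy, end_accuracy):
    """Relative improvement of end over start, in percent."""
    return (end_accuracy - start_accuracy) / start_accuracy * 100.0


lr_gain = percentage_gain(0.4341, 0.6557)    # Logistic Regression, 1K -> 2M
mnb_gain = percentage_gain(0.4580, 0.6350)   # Multinomial Naive Bayes, 1K -> 2M
print(f"LR: {lr_gain:.2f}%  MNB: {mnb_gain:.2f}%")
```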

Detailed Results
For the three most popular emotions (joy, sadness, and anger; 76.2% of the tweets), the classifier achieves F-measures of over 64%.

Detailed Results
For three less popular emotions (love, fear, and thankfulness; 22.8% of the tweets), precision is higher than recall, and the F-measures are over 43%.

What Have We Learned?
• We can automatically create training datasets for
emotion identification by leveraging emotion hashtags
on Twitter.
– A large amount of labeled data is collected with little effort and cost
– The data cover a variety of situations that elicit emotions
– Performance improves as the size of the training data increases
• However, there is still a lack of labeled data in many other domains/data sources.

New Challenge
Lots of labeled tweets vs. far less labeled data in many other domains
Can we use emotion-labeled tweets to help emotion identification in other domains?

3. DOMAIN ADAPTATION FOR EMOTION IDENTIFICATION
Wenbo Wang, Lu Chen, Keke Chen, Krishnaprasad Thirunarayan, Amit P. Sheth.
Domain Adaptation for Emotion Identification via Data Selection. Technical paper
(under review) 2015

Problem Definition
• Input
– Large amount of emotion-labeled tweets
– Small amount of labeled sentences from target
domains (e.g., blogs, fairy tales)
• Objective
– Select informative tweets and add them to target
domain training data, and train an adaptive classifier
for the target domain

The Bootstrapping Framework
Inputs: self-labeled tweets (source) and target domain labeled data
• Train classifier c on the current target domain training data
• Apply c to the tweets, which splits them into correctly classified and misclassified tweets
• Identify informative tweets from the misclassified tweets
• Add them to the target domain training data
Why select from misclassified tweets?
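The loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: `train_word_vote` is a toy word-voting classifier standing in for logistic regression, `novelty_score` is a hypothetical placeholder for the informativeness score, and the number of rounds and tweets added per round are arbitrary.

```python
from collections import Counter, defaultdict


def train_word_vote(examples):
    """Toy classifier (assumption): each word votes for labels it co-occurred with."""
    votes = defaultdict(Counter)
    labels = Counter()
    for text, label in examples:
        labels[label] += 1
        for word in text.lower().split():
            votes[word][label] += 1
    default = labels.most_common(1)[0][0]

    def predict(text):
        tally = Counter()
        for word in text.lower().split():
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else default

    return predict


def novelty_score(example, train_set):
    """Placeholder informativeness (assumption): number of words unseen in training."""
    seen = {w for text, _ in train_set for w in text.lower().split()}
    return sum(1 for w in set(example[0].lower().split()) if w not in seen)


def bootstrap(target_train, source_tweets, train, score, rounds=3, per_round=1):
    """Iteratively move informative misclassified tweets into the training data."""
    train_set = list(target_train)
    pool = list(source_tweets)
    for _ in range(rounds):
        classify = train(train_set)                      # train classifier c
        misclassified = [ex for ex in pool if classify(ex[0]) != ex[1]]
        if not misclassified:
            break
        misclassified.sort(key=lambda ex: score(ex, train_set), reverse=True)
        chosen = misclassified[:per_round]               # most informative tweets
        train_set.extend(chosen)                         # add to target training data
        pool = [ex for ex in pool if ex not in chosen]
    return train_set


target = [("i am so happy today", "joy"), ("i feel sad and alone", "sadness")]
tweets = [("so happy and excited", "joy"), ("heartbroken crying tonight", "sadness")]
result = bootstrap(target, tweets, train_word_vote, novelty_score)
print(result)
```

In this toy run the first tweet is already classified correctly and is never added, while the second is misclassified and gets moved into the training data, which mirrors the motivation for selecting from misclassified tweets.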

Informativeness Overview
Three factors: Consistency, Diversity, Similarity

Consistency
• Fear: “Amazing night with my baby. Hope she liked our
anniversary present. Alil early but whatever. :) hopefully tmmrw
goes as planned.”
– Top supporting features for emotion fear
– Top supporting features for any emotion other than fear
– Use the margin to estimate consistency:
0.5094 – 0.5962 = -0.0868
Consistency measures how consistent a tweet’s label is with its content.
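The slide shows only the final subtraction, not how the two support scores are derived from the top supporting features. A minimal sketch that takes those two scores as given inputs (the function and parameter names are assumptions):

```python
def consistency_margin(label_support, other_support):
    """Margin between the support for the tweet's own label and the
    support for any other emotion; negative means the content argues
    against the label."""
    return label_support - other_support


# Values from the fear-labeled tweet on this slide.
fear_margin = consistency_margin(0.5094, 0.5962)
print(round(fear_margin, 4))
```

A negative margin, as for this tweet, suggests the hashtag label (fear) is not well supported by the tweet's content. The Label Similarity slide later reuses the same subtraction, with 0.5838 and 0.625 as inputs.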

Diversity
• Sadness: “Searching for vinyl proved to be quite disappointing”
– “disappoint” occurs 2 times
• Sadness: “I'm about to lose everything I've ever wanted, my
whole world, and it's all my fault..”
– “lose” occurs 15 times
[Figure: diversity (y-axis, 0.00 to 1.00) vs. term_freq (x-axis, 0 to 100), showing the exponential decay of a term's frequency in target domain training data. Labeled points: 0.9048 (disappoint), 0.4724 (lose).]
Diversity encourages the selection of source instances containing
discriminative features that are infrequent or underrepresented in
the target domain.
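The two labeled points on the diversity curve (term frequency 2 gives 0.9048, term frequency 15 gives 0.4724) are both consistent with exp(-0.05 * term_freq). The decay rate 0.05 is inferred from those two points and is not stated on the slide, so treat it as an assumption in this sketch:

```python
import math

# Assumption: decay rate inferred from the two labeled points in the figure.
DECAY_RATE = 0.05


def diversity(term_freq, decay_rate=DECAY_RATE):
    """Exponential decay of a term's frequency in target domain training data."""
    return math.exp(-decay_rate * term_freq)


print(round(diversity(2), 4))    # "disappoint" occurs 2 times
print(round(diversity(15), 4))   # "lose" occurs 15 times
```

Rare terms score close to 1 and frequent terms decay toward 0, so tweets carrying underrepresented features are favored.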

Similarity Intuition
• Inspired by domain adaptation for machine translation
studies that select source instances similar to test
instances (Eck et al., 2004; Lu et al., 2007)
• Given a target test sentence
– Disgust: “im sick of look at a comput screen.”
• Retrieve most similar tweets
– Anger: “im sick and tire of look like a fool”
– Joy: “i have get usb fairi light around my comput screen .”
Content Similarity is not sufficient!

Similarity Overview
Three components: content similarity, label similarity, uncertainty

Content Similarity
• Upweight important words
– Source instance:
– Target test instance: inverse document frequency (idf)
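The exact weighting formulas on this slide did not survive extraction. As one plausible illustration of upweighting important words, the sketch below uses an idf-weighted cosine similarity; the smoothed idf variant (log(N/df) + 1), the function names, and computing idf over just these three sentences are all assumptions, not the author's formula.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every word in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def idf_weighted_cosine(tokens_a, tokens_b, idf):
    """Cosine similarity between two token lists with tf * idf weights."""
    vec_a = {w: c * idf.get(w, 0.0) for w, c in Counter(tokens_a).items()}
    vec_b = {w: c * idf.get(w, 0.0) for w, c in Counter(tokens_b).items()}
    dot = sum(v * vec_b.get(w, 0.0) for w, v in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stemmed sentences from the Similarity Intuition slide.
target = "im sick of look at a comput screen .".split()
anger_tweet = "im sick and tire of look like a fool".split()
joy_tweet = "i have get usb fairi light around my comput screen .".split()

idf = idf_table([target, anger_tweet, joy_tweet])
sim_anger = idf_weighted_cosine(target, anger_tweet, idf)
sim_joy = idf_weighted_cosine(target, joy_tweet, idf)
print(sim_anger, sim_joy)
```

Both tweets share words with the test sentence, so both receive a nonzero score, which illustrates why content similarity alone cannot separate the anger tweet from the joy tweet.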

Label Similarity
• Target test sentence
• Disgust: “im sick of look at a comput screen.”
• Source tweet
• Anger: “im sick and tire of look like a fool”
• How likely will the test sentence express anger?
• Apply the same formula used for Consistency factor
• Top supporting features for emotion anger
• Top supporting features for any emotion other than anger
• Use the margin to estimate consistency: 0.5838 – 0.625 = -0.0412

Uncertainty
Sentence                                        Label   Predicted Label   Classifier confidence   Uncertainty
the second day i go in and i be so paranoid .   Fear    Sadness           0.2352                  0.7648
we are total awesome!                           Joy     Joy               0.8683                  0.1317
The more confident the classifier is, the more likely the prediction is correct, and the less focus we should give to this sentence.
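Both rows of the table are consistent with uncertainty being one minus the classifier's confidence in its predicted label. A minimal sketch under that reading (the function name is an assumption):

```python
def uncertainty(confidence):
    """Uncertainty as the complement of the classifier's confidence
    in its predicted label."""
    return 1.0 - confidence


paranoid_uncertainty = uncertainty(0.2352)   # predicted Sadness, true label Fear
awesome_uncertainty = uncertainty(0.8683)    # predicted Joy, true label Joy
print(round(paranoid_uncertainty, 4), round(awesome_uncertainty, 4))
```

Test sentences with high uncertainty are the ones where additional, similar source tweets are most likely to help.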

Similarity Revisit
• Encourage the selection of source instances that share high
content and label similarities with target domain test instances
that classifier c is most uncertain about.
Components: content similarity, label similarity, uncertainty

Informativeness Revisit
• A tweet is informative when
– 1) its label is consistent with its content
– AND 2) it contains a discriminative feature that is infrequent in
target training data
– AND 3) it is similar to a target domain test instance whose label cannot be predicted by the classifier c with high confidence.
Consistency, Diversity, Similarity
Our proposed approach: CDS

Baseline approaches
• Source Only (SO): train classifiers using only Twitter data
• Target Only (TO): train classifiers using only target domain training data
• Feature Injection (FI): first train a source classifier using only source data (Daume III, 2007)
• Feature Augmentation (FA) (Daume III, 2007)
– Source instances: X -> <X, X, 0> (common, source, target)
– Target instances: X -> <X, 0, X> (common, source, target)
• Balance Weight (BW): assign larger weights to the target instances so that the weight sum of target instances equals that of source instances (Jiang and Zhai, 2007)
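Two of these baselines are simple enough to sketch directly. Feature Augmentation copies each feature into a common block and a domain-specific block, leaving the other domain's block at zero; Balance Weight scales target instances so that their total weight matches the source's. The sparse-dictionary representation, the key prefixes, and the function names below are illustrative assumptions.

```python
def augment(features, domain):
    """Feature Augmentation: map X to (common, source, target) blocks.
    Absent keys in the sparse dict are implicit zeros, so a source
    instance has no 'target:' keys and vice versa."""
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    augmented = {}
    for name, value in features.items():
        augmented[f"common:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented


def balance_weights(n_source, n_target):
    """Balance Weight: per-instance weights such that the target weight
    sum equals the source weight sum (source instances keep weight 1)."""
    return 1.0, n_source / n_target


source_vec = augment({"sick": 1.0, "fool": 1.0}, "source")
target_vec = augment({"sick": 1.0, "screen": 1.0}, "target")
source_weight, target_weight = balance_weights(n_source=2000, n_target=100)
print(source_vec)
print(target_vec)
print(source_weight, target_weight)
```

With augmented features, a single classifier can learn shared behavior through the common block while still fitting domain-specific behavior through the other two.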


Experimental settings
• Features
– Experimented with unigrams, bigrams, and unigrams+bigrams
– Used unigrams in the end
• Logistic regression
– Fast, supports probability output (needed for uncertainty)
• Five-fold cross validation
– Four folds for training; one fold for testing
• Add-0.5 smoothing
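The five-fold protocol above (four folds for training, one for testing) can be sketched as an index splitter. How the original experiments assigned instances to folds is not stated; the round-robin assignment and the function name here are assumptions.

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold.
    Assumption: items are assigned to folds round-robin by index."""
    indices = list(range(n_items))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_splits(10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each instance appears in exactly one test fold, so every labeled target sentence is used for evaluation once.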

Results on four target datasets*
[Figure: results on the four target datasets. Labeled percentage gains: 8.01%, 24.07%, 36.53%, 3.62%, 16.45%.]
*: The numbers differ from those in the dissertation defense video because I fixed a bug afterwards; the results improved slightly as a result.

Different Instance Selection Strategies
• CDS: select tweets from misclassified tweets
• CD: removed similarity factor from CDS
• CDS-ALL: select tweets from all source tweets
• CDS-CORR: select tweets from source tweets that can be correctly classified by c

Comparing instance selection strategies
Among all the strategies, CDS improves F1 the fastest.

CDS-ALL achieves a performance similar to that of CDS but takes more iterations, because the input of CDS-ALL is a superset of the input of CDS.

CDS-CORR performs the worst because it selects from correctly classified tweets, whose knowledge might already exist in the target domain.

Summary
• People’s emotions can be gleaned from their texts using machine learning techniques.
– The combination of n-grams (n=1,2), knowledge-based and syntactic features achieves the best performance.
– Knowledge features and syntactic features become less important on large training data.
• We can automatically create a large training dataset for emotion identification by leveraging emotion hashtags on Twitter.
– A large amount of labeled data is collected with little effort and cost
– The data cover a variety of situations that elicit emotions
– Performance improves as the size of the training data increases
• This self-labeled emotion dataset can be used to improve emotion identification in text from other domains/data sources.
– Domain adaptation via selecting tweets that are informative to the target domain
– Selecting from source instances that the classifier cannot classify correctly works best.
– Informativeness of a source instance is measured by three factors: consistency, diversity and similarity.

Publications
• Wenbo Wang, Lei Duan, Anirudh Koul, Amit P. Sheth. YouRank: Let User Engagement Rank
Microblog Search Results. In the Eighth International AAAI Conference on Weblogs and Social
Media (ICWSM'14) 2014
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Cursing in English on Twitter.
In ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14)
2014
• Amit Sheth, Ashutosh Jadhav, Pavan Kapanipathi, Lu Chen, Hemant Purohit, Gary Alan Smith, and
Wenbo Wang. "Twitris: A system for collective social intelligence." In Encyclopedia of Social
Network Analysis and Mining, pp. 2240-2253. Springer New York, 2014.
• Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of
User Groups in Predicting 2012 U.S. Republican Presidential Primaries. In Proceedings of the
Fourth International Conference on Social Informatics (SocInfo'12) 2012
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’
for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing
(SocialCom 2012), 2012
• Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit P. Sheth. Extracting Diverse
Sentiment Expressions with Target-dependent Polarity from Twitter. In Proceedings of the 6th
International AAAI Conference on Weblogs and Social Media (ICWSM), 2012

Publications
• Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained
Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012
• Ramakanth Kavuluru, Christopher Thomas, Amit Sheth, Victor Chan, Wenbo Wang, Alan Smith, An
Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused
Bioscience Domains, IHI 2012 - 2nd ACM SIGHIT Intl Health Informatics Symposium, January 28-
30, 2012.
• Wenbo Wang, Christopher Thomas, Amit Sheth, Victor Chan. Pattern-Based Synonym and
Antonym Extraction. 48th ACM Southeast Conference, ACMSE2010, Oxford Mississippi, April 15-
17, 2010
• Christopher J. Thomas, Wenbo Wang, Pankaj Mehra, Delroy Cameron, Pablo N. Mendes, and Amit
P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand
Model Creation. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April
26-27th, 2010, Raleigh, NC: US.
• Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nguyen, Amit P.
Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu, Twitris: Socially Influenced
Browsing, Semantic Web Challenge 2009, demo at 8th International Semantic Web Conference,
Oct. 25-29 2009, Washington, DC, USA

Patents & Proposal
• Wenbo Wang, Lei Duan. "Temporal User Engagement Features", U.S. Patent
No. 20,150,120,753. 30 Apr. 2015.
• Lu Chen, Wenbo Wang, Amit Sheth. "Topic-specific Sentiment Extraction", U.S.
Patent No. 20,140,358,523. 4 Dec. 2014.
• Context-Aware Harassment Detection on Social Media. NSF proposal

Special thanks to AFRL and NSF
*Part of this material is based upon work supported by the National Science Foundation under Grant IIS-1111182, “SoCS: Collaborative Research: Social Media Enhanced Organizational Sensemaking in Emergency Response.”

Thank You! & Questions?
