Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short and irregular forms introduced into tweets because of the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparseness problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result, while using sentiment-topic features achieves 86.3% sentiment classification accuracy, which outperforms existing approaches.
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Alleviating Data Sparsity for Twitter Sentiment Analysis
1. Alleviating Data Sparsity for Twitter Sentiment Analysis
Hassan Saif, Yulan He & Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
Making Sense of Microblogs – WWW2012 Conference, Lyon, France
2. Outline
• Hello World
• Motivation
• Related Work
• Semantic Features
• Topic-Sentiment Features
• Evaluation
• Demos
• The Future
3. Sentiment Analysis
“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”
The main dish was delicious → Opinion
It is a Syrian dish → Fact
The main dish was salty and horrible → Opinion
4. Microblogging
• Service which allows subscribers to post short updates online and broadcast them
• Answers the question: What are you doing now?
• Twitter, Plurk, sfeed, Yammer, BlueTwi, etc.
7. Sense?
Agarwal et al.; Barbosa and Feng; Bifet and Frank; Diakopoulos and Shamma; Go et al.; He & Saif; Pak and Paroubek; and many others.
[Bar chart: UK General Elections Corpus – 1,143,562 objective tweets vs. 713,178 subjective tweets]
8. Why?
Because it is critical to keep in touch
• Public sectors
• Private sectors
10. Sentiment Analysis
• Lexical-based approach: word polarity, building a better dictionary
• Machine learning approach: a text classification problem, finding the right features
12. Twitter Sentiment Analysis
Related Work
• Distant Supervision
– Supervised classifiers trained from noisy labels
– Tweet messages are labeled using emoticons (see the sketch below)
– Data filtering process
Go et al. (2009) – Barbosa and Feng (2010) – Pak and Paroubek (2010)
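As a rough illustration of distant supervision (not the exact pipeline of Go et al. or the other cited papers), the sketch below assigns noisy labels from emoticons and strips them from the text before training; the emoticon sets and the filtering rule are illustrative assumptions.

```python
# Illustrative emoticon sets; Go et al. use a similar convention,
# but the exact lists here are assumptions.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

def distant_label(tweet: str):
    """Return (cleaned_tweet, noisy_label), or None if the tweet is ambiguous."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:            # neither or both polarities -> filter out
        return None
    label = "positive" if has_pos else "negative"
    cleaned = tweet
    for e in POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(e, "")   # remove the label source from the text
    return cleaned.strip(), label

print(distant_label("Loving the new album :)"))   # ('Loving the new album', 'positive')
print(distant_label("Missed my train :("))        # ('Missed my train', 'negative')
```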
13. Twitter Sentiment Analysis
Related Work
• Followers Graph & Label Propagation
– Twitter follower graph (users, tweets, unigrams and hashtags)
– Start with a small number of labeled tweets
– Apply a label propagation method throughout the graph (sketched below)
Speriosu et al. (2011)
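The sketch below shows generic label propagation over a small toy graph; the adjacency matrix, seed labels and iteration count are made-up values for illustration, not Speriosu et al.'s formulation over the actual follower graph.

```python
import numpy as np

# Toy graph: nodes 0-1 are seed-labeled tweets, nodes 2-4 are unlabeled
# nodes (tweets/unigrams/hashtags).  The structure is purely illustrative.
A = np.array([[0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
seed = {0: [1.0, 0.0],   # node 0: positive
        1: [0.0, 1.0]}   # node 1: negative

Y = np.full((5, 2), 0.5)              # start every node with a uniform distribution
for node, dist in seed.items():
    Y[node] = dist

for _ in range(50):                   # propagate labels through the graph
    Y = A @ Y
    Y /= Y.sum(axis=1, keepdims=True) # renormalise rows to distributions
    for node, dist in seed.items():   # clamp the seed nodes to their labels
        Y[node] = dist

print(np.round(Y, 2))   # rows = nodes, columns = P(positive), P(negative)
```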
14. Twitter Sentiment Analysis
Related Work
• Feature Engineering
– Unigrams, bigrams, POS
– Microblogging features (see the sketch below)
• Hashtags
• Emoticons
• Abbreviations & Intensifiers
Agarwal et al. (2011) – Kouloumpis et al. (2011)
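A minimal sketch of extracting microblogging features from a tweet; the regular expressions and the abbreviation/intensifier lists are illustrative assumptions rather than the feature definitions of the cited papers.

```python
import re

# Illustrative lists; the cited papers use much larger resources.
ABBREVIATIONS = {"lol", "omg", "btw"}
INTENSIFIERS = {"soooo", "veeery", "really"}

def microblog_features(tweet: str) -> dict:
    """Count a few typical microblogging features in one tweet."""
    tokens = tweet.lower().split()
    return {
        "num_hashtags": len(re.findall(r"#\w+", tweet)),
        "num_mentions": len(re.findall(r"@\w+", tweet)),
        "has_emoticon": bool(re.search(r"[:;=][-']?[)(DPp]", tweet)),
        "num_abbreviations": sum(t in ABBREVIATIONS for t in tokens),
        "num_intensifiers": sum(t in INTENSIFIERS for t in tokens),
    }

print(microblog_features("OMG this is soooo good :) #happy @friend"))
```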
18. How!
• Semantic features: extract semantically hidden concepts from the tweet data and then incorporate them into supervised classifier training by interpolation.
• Sentiment-topic features: extract latent topics and the associated topic sentiment from the tweet data, which are subsequently added into the original feature space for supervised classifier training.
(A short sketch of both strategies follows.)
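The sketch below contrasts the two strategies on toy data: semantic interpolation mixes a word's class-conditional probability with that of its semantic concept, while sentiment-topic augmentation simply extends the unigram feature space. ALPHA, the entity-to-concept map and the probability values are illustrative assumptions, not the settings used in this work.

```python
from collections import Counter

ALPHA = 0.3                                            # illustrative mixing weight
CONCEPT = {"rugby": "SPORT", "ebay": "COMPANY"}        # hypothetical entity->concept map

def interpolated_likelihood(word, word_probs, concept_probs):
    """Semantic interpolation: mix the class-conditional unigram probability
    with the probability of the word's semantic concept for the same class."""
    p_word = word_probs.get(word, 1e-6)
    concept = CONCEPT.get(word)
    if concept is None:
        return p_word
    return (1 - ALPHA) * p_word + ALPHA * concept_probs.get(concept, 1e-6)

def augmented_features(tokens, topic_sentiment_labels):
    """Sentiment-topic augmentation: extend the original unigram feature
    space with the labels inferred by the sentiment-topic model."""
    return Counter(tokens) + Counter(topic_sentiment_labels)

# Hypothetical class-conditional probabilities for the positive class.
word_probs_pos = {"rugby": 0.002, "great": 0.01}
concept_probs_pos = {"SPORT": 0.03}
print(interpolated_likelihood("rugby", word_probs_pos, concept_probs_pos))
print(augmented_features(["rugby", "is", "great"], ["pos_topic_3"]))
```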
19. Semantic Features
Shallow semantic method: map entities mentioned in tweets to their semantic concepts (a sketch follows).
@Stace_meister Ya, I have Rugby in an hour → Sport
Sushi time for fabulous Jesse's last day on dragons den → Person
Dear eBay, if I win I owe you a total 580.63 bye paycheck → Company
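A tiny sketch of the shallow semantic mapping: in practice the entities and their concepts come from a semantic extraction service, but a hand-written map (mirroring the examples above) stands in here.

```python
# Hand-written entity->concept map standing in for the output of an
# entity extraction service; the mappings mirror the slide examples.
ENTITY_CONCEPTS = {
    "Rugby": "Sport",
    "Jesse": "Person",
    "eBay": "Company",
}

def concepts_in(tweet: str):
    """Return the semantic concepts of the entities found in a tweet."""
    return [concept for entity, concept in ENTITY_CONCEPTS.items() if entity in tweet]

print(concepts_in("@Stace_meister Ya, I have Rugby in an hour"))                 # ['Sport']
print(concepts_in("Dear eBay, if I win I owe you a total 580.63 bye paycheck"))  # ['Company']
```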
21. Topic-Sentiment Features
Joint Sentiment-Topic Model
JST is a four-layer generative model which allows the detection of both sentiment and topic simultaneously from text.
The only supervision is word prior polarity information, which can be obtained from the MPQA subjectivity lexicon (a sketch of this supervision step follows).
Lin & He, 2009
22. Twitter Sentiment Corpus
Stanford University
Collected between the 6th of April and the 25th of June 2009
Training Set: 1.6 million tweets (balanced)
Testing Set: 177 negative tweets & 182 positive tweets
http://twittersentiment.appspot.com/
23. Our Sentiment Corpus
• Training Set: 60K tweets
• Testing Set: 1000 tweets
• Annotated an additional 640 tweets using Tweenator
24. Evaluation
Extended Test Set
Method Accuracy
Unigrams 80.7%
Semantic replacement 76.3%
Semantic interpolation 84.0%
Sentiment-topic features 82.3%
Sentiment classification results on the 1000-tweets test set
25. Evaluation
Stanford Dataset
Method Accuracy
Unigrams 81.0%
Semantic replacement 77.3%
Semantic augmentation 80.45%
Semantic interpolation 84.1%
Sentiment-topic features 86.3%
(Go et al., 2009) 83%
(Speriosu et al., 2011) 84.7%
Sentiment classification results on the original Stanford Twitter Sentiment test set
28. Conclusion
• Twitter sentiment analysis faces a data sparsity problem due to some special characteristics of Twitter
• Semantic & topic-sentiment features reduce the sparsity problem and improve performance significantly
• Sentiment-topic features should be preferred over semantic features for the sentiment classification task, since they give much better results with far fewer features.
30. Future Work
Sentiment-Topic Model
Enhance Entity Extraction Methods
Attaching weight to extracted features
Semantic Smoothing Model
Statistical replacement
31. References
[1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of Twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38.
[2] BARBOSA, L., AND FENG, J. Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of COLING (2010), pp. 36–44.
[3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in Twitter streaming data. In Discovery Science (2010), Springer, pp. 1–15.
[4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford (2009).
[5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the OMG! In Proceedings of ICWSM (2011).
[6] LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (2009), ACM, pp. 375–384.
[7] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC (2010).
[8] SAIF, H., HE, Y., AND ALANI, H. Semantic smoothing for Twitter sentiment analysis. In Proceedings of the 10th International Semantic Web Conference (ISWC) (2011).
Can we extract sentiment from microblogs? Several researchers have reported that users express more opinions on microblogs: customers are less controlled in their emotions when interacting in a social environment. So, yes, we can do sentiment analysis over microblogs.
Now, why do we need to do it? Millions of status updates and tweet messages are shared every day among users. For the public sector: social data can be used to predict public political opinion, which makes it easier for policy makers to shape political strategies accordingly. For the private sector: companies can track public opinion about their products and services.
Two main approaches. Lexical-based approach: building a better dictionary; focuses on words and phrases as bearers of semantic orientation. Machine learning approach: finding the right features; yet another form of text classification.
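A toy contrast of the two families of approaches; the lexicon and the labeled examples are made up for illustration.

```python
from collections import Counter

# Illustrative prior-polarity lexicon.
LEXICON = {"delicious": 1, "great": 1, "horrible": -1, "salty": -1}

def lexicon_score(text: str) -> str:
    """Lexical-based: sum the prior polarities of the words in the text."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def train_unigram_counts(labelled_tweets):
    """Machine-learning flavour: treat it as text classification and learn
    per-class unigram counts (the core of a Naive Bayes classifier)."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in labelled_tweets:
        counts[label].update(text.lower().split())
    return counts

print(lexicon_score("The main dish was delicious"))   # positive
print(train_unigram_counts([("love it", "positive"), ("hate it", "negative")]))
```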
To overcome the noisy label data
Previous work tried to overcome the sparsity problem partially and indirectly, either by applying some data filtering or by relying on different sets of microblogging features. However, none of them studied the effect of data sparsity on classification performance.
Why: many words and terms in the testing corpus do not appear in the training corpus.
This compares the word frequency statistics of the tweet data we used in our experiments with the movie review data. The x-axis shows the word frequency interval, e.g., words occurring up to 10 times (1-10), more than 10 times but up to 20 times (10-20), etc. The y-axis shows the percentage of words falling within each word frequency interval. It can be observed that the tweet data are sparser than the movie review data, since the former contain more infrequent words, with 93% of the words in the tweet data occurring fewer than 10 times (cf. 78% in the movie review data).
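A small sketch of how such word-frequency-interval percentages can be computed; the bucketing function and the toy corpus are illustrative, not the exact analysis script.

```python
from collections import Counter

def frequency_interval_percentages(tokens, interval=10, max_freq=100):
    """Percentage of vocabulary words whose corpus frequency falls in each
    interval (1-10, 11-20, ...), as in the sparsity comparison above."""
    word_freqs = Counter(tokens)
    vocab_size = len(word_freqs)
    buckets = Counter()
    for freq in word_freqs.values():
        upper = min(((freq - 1) // interval + 1) * interval, max_freq)
        buckets[upper] += 1
    return {f"1-{u}" if u == interval else f"{u - interval + 1}-{u}": 100 * n / vocab_size
            for u, n in sorted(buckets.items())}

# Hypothetical toy corpus; with real tweet data this reproduces the kind of
# statistics described above (e.g. ~93% of words occurring fewer than 10 times).
toy_tokens = "good good good bad awesome awesome meh".split()
print(frequency_interval_percentages(toy_tokens))
```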