Lu Chen
Kno.e.sis Center
Ph.D. Dissertation Defense
Advisor:
Prof. Amit P. Sheth
Committee members:
Prof. T.K. Prasad
Prof. Keke Chen
Dr. Ingmar Weber (QCRI)
Dr. Justin Martineau (SRA)
Ohio Center of Excellence in Knowledge-Enabled Computing
Mining and Analyzing Subjective
Experiences in User Generated
Content
Subjective Experience –
What We Experience in Our Mind
Hunger
Love
Happiness
Surprise
Embarrassment
Like
Dislike
Confused
Pain
Tired Stressed
Nervous
Relaxed
Warm
Proud
Confident
Taste of ice cream
Feeling about sky
Perception of time Appreciation of music
Opinion on climate change
Interest
Source: http://bit.ly/1DvofHX
2
Music preference
Purchase intent
Subjective Information – The Information
about People’s Subjective Experiences
Source: http://bit.ly/1GDD9Mb
Source: http://bit.ly/1KkJF2l
Source: http://bit.ly/1IjjBSX
Source: http://bit.ly/1KkK1Gc
The traditional way of collecting subjective information:
3
User Generated Content
• New opportunities arise as we now can obtain a wide variety of
subjective information from user generated content.
4
The Demand of Subjective Information
• Subjective information can be used to support better decision-
making.
5
Source: http://twitris2.knoesis.org/debate
Predicting election results
Source: http://bit.ly/1gQg5Fl
Monitoring social phenomena
Source: http://bit.ly/1niFkU7
Targeted advertising
Source: http://bit.ly/1l0ombo
Making purchase decision
Source: http://bit.ly/1VzYEZG
Different Types of Subjective Information
Intent “would like to watch”
Expectation “hope it’s good”
would like to watch The Secret Life Of
Pets. I hope it's good.
"The Secret Life of Pets" was clever,
adorable, funny and I already want to
see it again.
I don't think watching The Secret Life
of Pets makes me childish. I laughed I
cried and it was so touching for
someone who has a pet like me.
Finding Dory was much better than
The Secret Life of Pets. Still not as good
as Zootopia though.
6
The Secret Life of Pets soundtrack
should be nominated for an Oscar
Sentiment “clever, adorable, funny”
Intent “want to see it again”
Opinion
“don’t think watching …
makes me childish”
Emotion
“I laughed I cried and it
was so touching”
Preference “much better than”
Preference “not as good as”
Opinion “should be nominated
for an Oscar”
Defining Subjective Information
Formally, a subjective experience can be represented as a quadruple $(h, s, e, c)$, where:
𝒉 − a holder, an individual who holds the experiences
𝒔 − a stimulus (or target), an entity, event or situation that elicits
the experiences.
𝒆 − a set of expressions that are used to describe the experience, e.g.,
the sentiment words/phrases or the opinion claims.
𝒄 − a classification or assessment that categorizes or measures the
experience, e.g., sentiment orientation (positive vs. negative), emotion
type (joy, anger, sadness, surprise, etc.), a score indicating the strength
of sentiment.
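The quadruple can be sketched as a small record type. This is only an illustration: the class name and field names are assumptions chosen to mirror the four components, and the sample values come from the "The Secret Life of Pets" tweet used in this deck.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubjectiveExperience:
    """Illustrative container for the quadruple (h, s, e, c)."""
    holder: str                   # h: the individual who holds the experience
    stimulus: str                 # s: entity, event, or situation eliciting it
    expressions: Tuple[str, ...]  # e: words/phrases describing the experience
    classification: str           # c: category or assessment of the experience


# Sample instance built from the sentiment example in the slides.
exp = SubjectiveExperience(
    holder="tweet author",
    stimulus="The Secret Life of Pets movie",
    expressions=("clever", "adorable", "funny"),
    classification="positive",
)
print(exp)
```

Keeping the record immutable makes it safe to use as a dictionary key or set member when aggregating experiences per holder or per stimulus.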
7
Different Types of Subjective Information
8
Type | Holder h | Stimulus s | Expression e | Classification c
Sentiment | an individual who holds the sentiment | an entity | sentiment words/phrases | positive, negative, neutral
Opinion | an individual who holds the opinion | an entity | opinion claims (may not contain sentiment words) | positive, negative, neutral
Emotion | an individual who holds the emotion | an event or situation | emotion words/phrases, description of events/situations | anger, disgust, fear, happiness, sadness, surprise
Preference | an individual who holds the preference | a set of alternatives | words/phrases that indicate comparison or preference | depends on the specific task
Intent | an individual who holds the intent | an action | words/phrases that show the presence of will, description of the act | depends on the specific task
Expectation | an individual who holds the expectation | an entity | words/phrases that express beliefs about how someone or something will be | depends on the specific task
9
* The holders of these experiences are the authors of the messages.
Example | Type | Stimulus s | Expression e | Classification c
would like to watch The Secret Life Of Pets. I hope it's good. | Intent | watch the movie | “would like to” | transactional
 | Expectation | The Secret Life of Pets movie | “hope” | optimistic
"The Secret Life of Pets" was clever, adorable, funny and I already want to see it again. | Sentiment | The Secret Life of Pets movie | “clever”, “funny”, “adorable” | positive
 | Intent | see the movie | “want to” | transactional
I don't think watching The Secret Life of Pets makes me childish. I laughed I cried and it was so touching for someone who has a pet like me. | Opinion | The Secret Life of Pets movie | “don’t think … makes me childish” | positive
 | Emotion | The Secret Life of Pets movie | “laughed”, “cried”, “so touching” | funny, touching
Finding Dory was much better than The Secret Life of Pets. Still not as good as Zootopia though. | Preference | Finding Dory, The Secret Life of Pets | “much better than” | preferring Finding Dory
 | Preference | Finding Dory, Zootopia | “not as good as” | preferring Zootopia
The Secret Life of Pets soundtrack should be nominated for an Oscar | Opinion | The Secret Life of Pets soundtrack | “should be nominated for an Oscar” | positive
10
An overview of subjective
information extraction.
The boxes colored in orange
indicate the scope of this
dissertation.
Dissertation Focus
1. Extraction of Target-
Specific Sentiment
Expressions (ICWSM’12)
2. Discovery of Domain-
Specific Features and
Aspects (NAACL’16)
Emotion Identification
(SocialCom’12, BII’12,
CSCW’14, ACL’14)
3. Application: Predicting
Election Results (SocInfo’12)
• Identifying and extracting subjective information from
user generated content.
11
4. Application: Religiosity &
Happiness (SocInfo’14)
Sentiment
Opinion
Emotion
Subjective
Information
𝐒𝐭𝐢𝐦𝐮𝐥𝐢 𝐬
𝐄𝐱𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧 𝒆
𝐂𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐜
Holder 𝒉
Thesis Statement
• This dissertation presents a unified framework that characterizes a
subjective experience, such as sentiment, opinion, or emotion, in terms
of an individual holding it, a target eliciting it, a set of expressions
describing it, and a classification or assessment measuring it;
• it describes new algorithms that automatically identify and extract
sentiment expressions and opinion targets from user generated content
with minimal human supervision;
• it shows how to use social media data to predict election results and
investigate religion and subjective well-being, by classifying and
assessing subjective information in user generated content.
12
Sentiment in User Generated Content
Sources: Social media
Data: posts, messages
Targets: movies, persons,
brands, etc.
13
E1. Lights out definitely lived up to the hype! Great movie!
E2. I got my second Pikachu today this one was from 2k egg revitalised my
love for Pokemon go... Did not last long 😆 stoopid game
E3. Game of Thrones is a must watch.
E4. I find myself grateful that Hillary Clinton is predictable and steady. Like
her or don't, she's SAFE.
E5. Saw the avengers last night. Mad overrated. Cheesy lines and horrible
writing. Very predictable.
E6. I saw The Avengers yesterday evening. It was long but it was very good!
E7. Galaxy s7 edge battery life last so long it's almost unlimited battery life
xD
Target
Lights out 75% 20% 5%
Pokemon Go 69% 17% 14%
Game of Thrones 83% 10% 7%
Hillary Clinton 49% 35% 16%
The Avengers 70% 24% 6%
Galaxy S7 Edge 68% 16% 16%
Sentiment Analysis Predictive Models
business
analytics,
predicting
financial
performance,
predicting
election
results
…
1. Extraction of Target-
Specific Sentiment Expressions
14
Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit Sheth. Extracting Diverse Sentiment
Expressions with Target-dependent Polarity from Twitter. Proceedings of the 6th International AAAI
Conference on Weblogs and Social Media (ICWSM), 2012.
Given a set of unlabeled social media posts, how to
extract diverse forms of sentiment expressions with
respect to a specific target?
Example
E1. Lights out definitely lived up to the hype! Great movie!
E2. I got my second Pikachu today this one was from 2k egg revitalised my
love for Pokemon go... Did not last long 😆 stoopid game
E3. Game of Thrones is a must watch.
E4. I find myself grateful that Hillary Clinton is predictable and steady. Like
her or don't, she's SAFE.
E5. Saw the avengers last night. Mad overrated. Cheesy lines and horrible
writing. Very predictable.
E6. I saw The Avengers yesterday evening. It was long but it was very good!
E7. Galaxy s7 edge battery life last so long it's almost unlimited battery life
xD
Instances | Sentiment Expressions | Classification
E1 | lived up to the hype, great | positive
E2 | love, not last long, stoopid | positive, negative
E3 | must watch | positive
E4 | grateful, predictable, steady, safe | positive
E5 | mad overrated, cheesy, horrible, very predictable | negative
E6 | long, very good | negative, positive
E7 | last so long, unlimited | positive
Sources: Social media
Data: posts, messages
e.g., tweets
Targets: movies, persons,
brands, etc.
15
Challenges
• Sentiment expressions can be very diverse.
‒ Vary from single words (e.g., “good”, “predictable”) to multi-word phrases
of different lengths (“lived up to the hype”, “must see”)
‒ Can be formal or slang expressions, including abbreviations and spelling
variations (e.g., “gud”, “stoopid”).
• The polarity of a sentiment expression is sensitive to its target.
‒ E.g., “long” in “long river”, “long battery life”, or “long time for
downloading”.
‒ E.g., “predictable” regarding movies, or regarding stocks.
16
Contributions
We propose a novel optimization-based approach that:
• identifies a diverse and richer set of sentiment expressions,
including both formal and slang words/phrases;
• assesses the target-dependent polarity of each sentiment
expression; and
• does not require labeled data or hand-crafted patterns.
17
The Proposed Approach
Extracting
Candidate Expressions
Identifying
Inter-Expression Relations
Assessing
Target-dependent Polarity
18
Example:
“The Avengers movie was bloody amazing! A little cheesy at times, but I
liked it. Mmm looking good Robert Downey Jr and Captain America ;)”
“on-target” subjective words: “bloody”, “amazing”, “cheesy”, “liked”
Candidate expressions: “bloody”, “amazing”, “bloody amazing”, “cheesy”,
“little cheesy”, “cheesy at times”, “little cheesy at times”, “liked”
Method:
• For each message, select the “on-target” subjective words, and
extract all the n-grams that contain at least one selected subjective
word as candidates.
• A subjective word is selected as “on-target” if
(1) there is a dependency relation between the word and the target, or
(2) the word is close to the target (e.g., within a distance of four words).
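A rough sketch of the candidate extraction step, under several assumptions: it implements only the proximity rule (2), not the dependency rule (1); `SUBJECTIVE_WORDS` is a toy lexicon invented for the example; tokenization is a simple regular expression; and `max_n` is an arbitrary cap on n-gram length. The slide's own example lists fewer candidates than this sketch yields, so the original method presumably filters further.

```python
import re

# Toy subjectivity lexicon (an assumption for illustration only).
SUBJECTIVE_WORDS = {"bloody", "amazing", "cheesy", "liked", "good"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def on_target_words(tokens, target_tokens, window=4):
    """Indices of subjective words within `window` tokens of a target mention."""
    n = len(target_tokens)
    starts = [i for i in range(len(tokens) - n + 1)
              if tokens[i:i + n] == target_tokens]
    hits = []
    for idx, tok in enumerate(tokens):
        if tok not in SUBJECTIVE_WORDS:
            continue
        for s in starts:
            if idx < s:
                dist = s - idx
            elif idx >= s + n:
                dist = idx - (s + n - 1)
            else:
                dist = 0
            if dist <= window:
                hits.append(idx)
                break
    return hits


def candidate_ngrams(tokens, hits, max_n=3):
    """All n-grams (n <= max_n) containing at least one on-target word."""
    cands = set()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            if any(start <= h < start + n for h in hits):
                cands.add(" ".join(tokens[start:start + n]))
    return cands


tokens = tokenize("The Avengers movie was bloody amazing! "
                  "A little cheesy at times, but I liked it.")
hits = on_target_words(tokens, ["avengers"])
cands = candidate_ngrams(tokens, hits)
print(sorted(cands))
```

With the proximity rule alone, only "bloody" and "amazing" are selected here; "cheesy" and "liked" lie more than four tokens from the target and would need the dependency rule to be picked up.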
19
Extracting Candidate Expressions
Identifying Inter-Expression Relations
1. I saw The Avengers yesterday evening. It was long but it was very good!
2. I do enjoy The Avengers, but it's both overrated and problematic.
3. Saw the avengers last night. Mad overrated. Cheesy lines and horrible
writing. Very predictable.
4. The avengers was good but the plot was just simple minded and predictable.
5. The Avengers was good. I was not disappointed.
20
Assessing Target-dependent Polarity
21
An Optimization Model (1)
• For each candidate expression $c_i$,
‒ P-Probability $\Pr^{P}(c_i)$ – the probability that $c_i$ indicates positive sentiment
‒ N-Probability $\Pr^{N}(c_i)$ – the probability that $c_i$ indicates negative sentiment
‒ $\Pr^{P}(c_i) + \Pr^{N}(c_i) = 1$
• For each pair of candidate expressions $c_i$ and $c_j$,
‒ Consistency probability – the probability that $c_i$ and $c_j$ have the same polarity:
$\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$
‒ Inconsistency probability – the probability that $c_i$ and $c_j$ have different polarities:
$\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$
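Because the P-Probability and N-Probability of a candidate sum to 1, both pairwise probabilities can be computed from the two P-Probabilities alone. The sketch below shows this; the function names are illustrative, not taken from the dissertation.

```python
def cons_prob(p_i, p_j):
    """Probability that two candidates share a polarity.

    p_i and p_j are P-Probabilities; each N-Probability is 1 - p.
    """
    return p_i * p_j + (1.0 - p_i) * (1.0 - p_j)


def incons_prob(p_i, p_j):
    """Probability that two candidates have different polarities."""
    return p_i * (1.0 - p_j) + (1.0 - p_i) * p_j


# Two candidates that both lean positive are likely to be consistent.
print(cons_prob(0.9, 0.8), incons_prob(0.9, 0.8))
```

The two values always sum to 1, since together they cover every combination of polarities for the pair.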
22
An Optimization Model (2)
• We want the consistency and inconsistency probabilities derived from
the P-Probabilities and N-Probabilities of the candidates to be closest to
their expectations suggested by the relation networks.
• Objective Function:
$$\text{minimize} \sum_{i=1}^{n-1} \sum_{j>i}^{n} \left( w^{cons}_{ij} \bigl(1 - \Pr^{cons}(c_i, c_j)\bigr)^2 + w^{incons}_{ij} \bigl(1 - \Pr^{incons}(c_i, c_j)\bigr)^2 \right)$$
where $w^{cons}_{ij}$ and $w^{incons}_{ij}$ are the weights of the edges (strength of the
relations) between $c_i$ and $c_j$ in the consistency and inconsistency relation
networks, and n is the total number of candidate expressions.
$\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$
$\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$
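A minimal sketch of the objective on a toy problem. The objective follows the slide: each related pair is penalized by its edge weight times the squared gap between 1 and its consistency (or inconsistency) probability. Everything else is an assumption for illustration: the coordinate grid search is a simple stand-in for a real constrained solver, and holding candidate 0 fixed at a lexicon-given polarity is an illustrative choice (the slides only describe initialization).

```python
def objective(p, w_cons, w_incons):
    """p: list of P-Probabilities; w_*: dict {(i, j): weight} with i < j."""
    total = 0.0
    n = len(p)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cons = p[i] * p[j] + (1.0 - p[i]) * (1.0 - p[j])
            incons = 1.0 - cons
            total += w_cons.get((i, j), 0.0) * (1.0 - cons) ** 2
            total += w_incons.get((i, j), 0.0) * (1.0 - incons) ** 2
    return total


def coordinate_search(p, w_cons, w_incons, fixed=(), sweeps=20, grid=21):
    """Naive minimizer: try grid values in [0, 1] for one variable at a time."""
    p = list(p)
    values = [k / (grid - 1) for k in range(grid)]
    for _ in range(sweeps):
        for i in range(len(p)):
            if i in fixed:
                continue
            p[i] = min(values, key=lambda v: objective(
                p[:i] + [v] + p[i + 1:], w_cons, w_incons))
    return p


# Candidates: 0 = "good" (known positive), 1 = "very good", 2 = "overrated".
w_cons = {(0, 1): 1.0}    # "good" and "very good" tend to agree
w_incons = {(0, 2): 1.0}  # "good but ... overrated" suggests disagreement
result = coordinate_search([1.0, 0.5, 0.5], w_cons, w_incons, fixed={0})
print(result, objective(result, w_cons, w_incons))
```

On this toy network the search pushes "very good" toward positive and "overrated" toward negative, driving the objective to zero.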
23
Experiments: Datasets
Table: Description of four
target-specific datasets from
social media.
24
Tweet about movie New Star Trek movie is great! Highly recommend it!
Tweet about person Scarlett Johansson rocking a suit better than most men.
Forum post about
epilepsy treatment
I have an 11 month old who suffers from 0-8 seizures per day. We've tried 6
medications that have all failed and are now on The Ketogenic Diet. The diet has
been amazing at reducing the frequency and intensity of his seizures. However, I
want them GONE! I am wondering if infant chiropractic care or acupuncture is safe
and effective in eliminating seizures. Does anyone have any experience with either
of these?
Forum post about
cellular company
I click on Mobile Sync to move all my contacts from my phone to the Sprint website.
There are over 100 contacts in my phone, but it's only moving 59 of them? Help
Facebook post
about automobile
company
I have a 2006 Trailblazer that had a motor failure at 60,000 miles. GM refused to
help in any way. Poor customer service to say the least. I guess they don't care about
your car post warranty. With a driveway full of GM's its probably the last one I will
buy.
Experiments on Tweets
• Datasets:
‒ 168,005 tweets about movies
‒ 258,655 tweets about persons
• Gold standard: 1500 tweets were randomly sampled from each domain.
Human experts identified sentiment expressions and labeled each
expression and tweet with target-specific sentiment.
Table: Distributions of N-
grams and Part-of-speech of
the Sentiment Expressions in
the Gold Standard Data Set.
Table: Distribution of
Sentiment Categories of the
Tweets in the Gold Standard
Data Set.
25
Methods
COM -- Constrained Optimization Model
• COM-const: Assign 0.5 to all the candidates as their initial P-
Probabilities.
• COM-gelex: Initialize the candidates’ polarities according to the
subjectivity dictionary. (positive-1.0, negative-0.0, other-0.5)
• MPQA, GI, SWN: For each extracted subjective word regarding
the target, simply look up its polarity in MPQA, General Inquirer
and SentiWordNet, respectively.
• PROP: a propagation approach proposed by Qiu et al. (IJCAI’09)
26
Results
27
This demonstrates the advantage of our
optimization-based approach over
lexicon-based or rule-based
methods in polarity assessment: our
method extracts diverse sentiment
expressions and captures their
target-dependent polarity.
Results of Sentiment Expression Extraction with Various Corpora Sizes
Both precision and recall of our approach increase
when the corpus size grows from 12,000 to 48,000,
because our method benefits from the additional
relations extracted from larger corpora.
28
• Datasets:
‒ 100 forum posts about epilepsy treatment
‒ 162 forum posts about cellular company
‒ 200 Facebook posts about automobile company
• Gold standard: human experts identified sentiment expressions from
posts, and labeled each expression and post sentence with target-
specific sentiment.
29
Experiments on Other Social Media Posts
Table: Characteristics of
sentiment expressions
in the Gold Standard
Data Set.
Table: Distribution of
Sentiment Categories of
post sentences in the
Gold Standard Data Set.
Results
30
Table: Quality of the extracted sentiment
expressions.
Figure: Sentence-level sentiment
classification accuracy using different
lexicons.
The stable performance on all five datasets provides a strong indication that
the proposed approach is not limited to a specific domain or a specific social
media data source.
Sample Output (Movie Domain)
31
Aspect-based Opinion Mining
It would be helpful to have an aspect-based opinion summarization for products.
…
Size
picture quality
motion-smoothing
sound quality
big screen perfect size fits big bedroom …
full hd best picture blur reduction …
smooth motion sensor tracing effects …
loud white noise high pitched sound …
32
2. Discovery of Domain-
Specific Features and Aspects
33
Lu Chen, Justin Martineau, Doreen Cheng and Amit Sheth. Clustering for Simultaneous Extraction of
Aspects and Features from Reviews. Proceedings of the 15th Annual Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL),
2016.
Given a set of plain product reviews, how to efficiently
identify (both explicit and implicit) product features
and group them into aspects?
Example
Review Sentences
1. Phone is easy to use and has great features. Large
screen is great. Great speed makes smooth
viewing of tv programs or sports.
2. It has a big bright display, it's very fast and very
lightweight for its size.
3. Good features for an inexpensive android, light,
good signal, good sound, pretty quick for a
800MHz processor.
4. The phone runs extra fast and smooth, and has
great price.
Aspects
{screen, display, bright}
{size, large, big}
{lightweight, light}
{price, inexpensive}
{speed, processor, fast,
quick, smooth}
{easy, use}
{features}
{signal}
{sound}
Feature: components and attributes of a product.
• Explicit feature: mentioned as an opinion target
• Implicit feature: implied by opinion words
• Different feature expressions may be used to describe the same aspect
of a product.
Aspect: represented as a group of features 34
• Two-step approach: first identifying features, then clustering
them
• Feature Identification
‒ Only extract features but not group them.
‒ Implicit features have been largely ignored.
‒ Require seed terms, hand-crafted rules/patterns, or other annotation
efforts.
• Feature Clustering/Aspect Discovery
‒ Assume that features have been identified beforehand.
‒ Topic-model based approach
o not fine-grained aspects (Zhang and Liu, 2014), not directly
interpretable as aspects (Chen et al., 2013; Bancken et al., 2014),
not good at dealing with aspect sparsity (Xu et al., 2014), etc.
‒ Clustering-based approach (Su et al., 2008; Lu et al., 2009; Bancken et
al., 2014)
Related Work
35
Contributions
We propose a new clustering-based approach that:
• identifies both features and aspects simultaneously;
• extracts both explicit and implicit features and groups them into
aspects; and
• does not require seed terms, hand-crafted patterns, or any other
labeling efforts.
36
Notation
• $\{x_1, x_2, \ldots, x_n\}$ is a set of
candidate features, which are extracted
from reviews of a given product.
o Candidate explicit features: nouns and
noun phrases
o Candidate implicit features: adjectives
and verbs
• $k$ is the number of aspects.
• $s$ is the number of most
frequent candidates that will be
grouped first to generate the seed
clusters.
• $\delta$ is the upper bound of the distance
between two mergeable clusters.
(1) To generate high quality seed clusters:
Frequent terms are more likely the
actual features of customers' interests.
(2) Speed up the process by clustering only
the most frequent ones.
Domain-specific similarity measure:
determine how similar the members in two
clusters are regarding the particular
domain/product.
Merging constraints: further ensure that
the terms from different aspects would not
be merged
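The role of the distance upper bound can be illustrated with a generic agglomerative sketch. This is not the CAFE algorithm itself: average-linkage distance (1 minus mean pairwise similarity) is an assumption, the merging constraints and the handling of less frequent candidates are omitted, and the similarity values are made up for the example.

```python
def make_sim(pairs, terms, default=0.1):
    """Build a symmetric similarity lookup from a few specified pairs."""
    sim = {a: {b: (1.0 if a == b else default) for b in terms} for a in terms}
    for (a, b), v in pairs.items():
        sim[a][b] = sim[b][a] = v
    return sim


def cluster_distance(a, b, sim):
    """Average-linkage distance: 1 minus the mean pairwise similarity."""
    total = sum(sim[x][y] for x in a for y in b)
    return 1.0 - total / (len(a) * len(b))


def merge_within_bound(items, sim, delta):
    """Repeatedly merge the closest pair of clusters whose distance <= delta."""
    clusters = [[x] for x in items]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], sim)
                if d <= delta and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


terms = ["screen", "display", "price", "inexpensive"]
sim = make_sim({("screen", "display"): 0.8,
                ("price", "inexpensive"): 0.7}, terms)
clusters = merge_within_bound(terms, sim, delta=0.8)
print(clusters)
```

With a bound of 0.8, the two related pairs merge, but the resulting clusters stay apart because their mutual distance (0.9) exceeds the bound.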
The Clustering Algorithm
37
• General semantic similarities that are learned from thesaurus
dictionaries or web corpus.
‒ The similarities between words/phrases are domain dependent.
E.g., “ice cream sandwich” and “operating system” (cell-phone domain)
“smooth” and “speed” (cell-phone domain vs. hair dryer domain)
• Domain-dependent similarities that are learned from a domain-
specific corpus based on distributional information.
‒ Different aspects may share similar context.
E.g., “great display”, “great price”, “great speed”
‒ The words describing the same aspect may not share similar context or
co-occur.
E.g., people use “is inexpensive” or “has great price” instead of “has
inexpensive price”; “running fast” or “great speed” instead of “fast speed”
Similarity Measures
38
Domain-specific Similarity
• General similarity matrix G -- an n × n matrix, where Gij is the general semantic
similarity between xi and xj, Gij ∈ [0, 1], Gij = 1 when i=j, and Gij = Gji.
• Use UMBC Semantic Similarity Service to get G.
• Statistical association matrix T -- an n × n matrix, where Tij is the pairwise
statistical association between xi and xj in a domain-specific corpus, Tij ∈ [0,
1], Tij = 1 when i=j, and Tij = Tji.
• Use normalized pointwise mutual information (NPMI) to get T.
39
- f(xi) (or f(xj)) is the number of documents where xi (or xj) appears,
- f(xi, xj) is the number of documents where xi and xj co-occur in a sentence,
- N is the total number of documents in the corpus.
NPMI(xi, xj) ∈ [−1, 1], and we rescale the values of NPMI to the range of [0, 1].
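The NPMI computation can be sketched from the counts defined on this slide, using the standard definition: pointwise mutual information divided by the negative log of the joint probability, with probabilities estimated as f/N. The linear rescaling (v + 1) / 2 is an assumption; the slide only states that values are rescaled to [0, 1].

```python
import math


def npmi(f_i, f_j, f_ij, n_docs):
    """Normalized PMI from document counts; returns a value in [-1, 1]."""
    if f_ij == 0:
        return -1.0          # the two terms never co-occur
    p_i, p_j, p_ij = f_i / n_docs, f_j / n_docs, f_ij / n_docs
    if p_ij == 1.0:
        return 1.0           # both terms co-occur in every document
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def rescale(v):
    """Map [-1, 1] onto [0, 1] (assumed linear rescaling)."""
    return (v + 1.0) / 2.0


# Terms that always appear together score 1; independent terms score about 0.
print(npmi(10, 10, 10, 100), npmi(10, 10, 1, 100), npmi(10, 10, 0, 100))
```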
• A candidate xi can be represented by the i-th row in G or T.
40
• The domain-specific similarity between xi and xj is defined as the weighted
sum of the similarity metrics:
$sim(x_i, x_j) = w_g \, sim_g(x_i, x_j) + w_t \, sim_t(x_i, x_j) + w_{gt} \, sim_{gt}(x_i, x_j)$
simg captures semantically similar/relevant words,
e.g., “screen” and “display”, “speed” and “fast”.
simt captures words sharing similar context, e.g.,
“ice cream sandwich” and “operating system”.
simgt gets high value when the terms strongly associated with xi (or xj) are
semantically similar to xj (or xi), e.g., “smooth” and “speed”.
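A sketch of the weighted sum, with loudly stated assumptions: the slides do not show how simg, simt, and simgt are computed, so cosine similarity between the row vectors of G and T is assumed here (simg compares rows of G, simt rows of T, and simgt takes the larger of the two cross comparisons). The default weights are the CAFE defaults reported in the experimental setting.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def domain_similarity(i, j, G, T, wg=0.2, wt=0.2, wgt=0.6):
    """Weighted sum of three similarity components (component forms assumed)."""
    sim_g = cosine(G[i], G[j])                              # general semantics
    sim_t = cosine(T[i], T[j])                              # shared context
    sim_gt = max(cosine(G[i], T[j]), cosine(T[i], G[j]))    # cross comparison
    return wg * sim_g + wt * sim_t + wgt * sim_gt


# Tiny 2 x 2 matrices: each term is only similar to and associated with itself.
G = [[1.0, 0.0], [0.0, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0]]
print(domain_similarity(0, 0, G, T), domain_similarity(0, 1, G, T))
```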
Domain-specific Similarity
• We evaluate this approach on reviews from three different domains.
• The default setting of CAFE (Clustering for Aspect and Feature Extraction):
‒ The number of aspects k = 50
‒ Distance upper bound 𝛿 = 0.8
‒ The number of candidates that are grouped first to generate seed clusters s = 500
‒ The weights of three similarity measures wg = wt = 0.2, wgt = 0.6
41
Data and Experimental Setting
• PROP: A double propagation approach that extracts features using hand-
crafted rules based on dependency relations between features and opinion
words. (Qiu et al., IJCAI’09)
• LRTBOOT: A bootstrapping approach that extracts features by mining
pairwise feature-feature, feature-opinion, opinion-opinion associations
between terms in the corpus, where the association is measured by the
likelihood ratio tests (Hai et al., CIKM’12)
Evaluations on Feature Extraction – Methods
42
43
Evaluations on Feature Extraction – Results
• MuReinf: A clustering method that uses the mutual reinforcement
association between features and opinion words to iteratively group them
into feature clusters and opinion clusters. (Su et al., WWW’08)
• L-EM: A semi-supervised learning method that adapts the Naive Bayesian-based
EM algorithm to group synonym features into categories. (Zhai et al.,
WSDM’11)
• L-LDA: A baseline method used in (Zhai et al., WSDM’11), which is
based on LDA.
* Because MuReinf, L-EM and L-LDA need another algorithm to extract
features, both LRTBOOT and CAFE are applied.
Evaluations on Aspect Discovery – Methods
44
Evaluations on Aspect Discovery – Results
45
The results showed the advantage of combining feature and aspect discovery
over chaining them, and also implied the effectiveness of our domain-specific
similarity measure in identifying synonym features in a particular domain.
Influence of Parameters
46
Based on the experiments on three domains, the best results can be achieved when
distance upper bound 𝜹 is set to a value between 0.76 and 0.84.
CAFE generates better results by first clustering the top 10%-30% most frequent
candidates.
The best F-score and Rand Index can be achieved when we set wgt to 0.5 or 0.6 across all
three domains.
Sample Output
47
3. Harnessing Public Opinion
on Twitter to predict election
results
48
Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User
Groups in Predicting 2012 U.S. Republican Presidential Primaries. Proceedings of the 4th International
Conference on Social Informatics (SocInfo) 2012.
How to derive public opinion about election candidates?
Are opinion holders equal in predicting elections?
Overview
49
Tweet ID
candidate: XXX
opinion:
positive
User category:
right-leaning
high engagement
opinion prone
orig. tweet-prone
a user
tweets
network
1. Political Preference
2. Engagement Degree
3. Content Type
4. Tweet Mode
Predicting which
candidate this user
supports
Aggregating opinions
of each user group to
predict election results
Contributions
• We introduce a new method to predict the election results that:
‒ identifies which candidate is mentioned, and whether a positive or
negative opinion is expressed towards a candidate in a tweet;
‒ predicts which candidate a user supports based on the opinions extracted
from his/her tweets; and
‒ aggregates the opinions of all users from a group to predict which
candidate will win the election.
• We show that the opinion holders matter in predicting election
results.
‒ We group users based on their political preference, engagement degree,
tweet mode, and content type, and examine the predictive power of
different user groups in predicting Super Tuesday results in 10 states.
‒ We evaluate the results in terms of both the accuracy of predicting
winners and the error rate between the predicted votes and the actual
votes for each candidate.
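The pipeline in the list above (tweet-level opinions, then a per-user vote, then a group-level prediction and its error) can be sketched as follows. The scoring rule is an assumption: a user is taken to support the candidate with the highest net positive opinion count, ties and non-positive scores are skipped, and the error is a mean absolute difference between predicted and actual vote shares. Candidate names "A" and "B" and all numbers are hypothetical.

```python
from collections import Counter, defaultdict


def user_vote(opinions):
    """opinions: list of (candidate, polarity), polarity +1 or -1."""
    score = defaultdict(int)
    for cand, pol in opinions:
        score[cand] += pol
    if not score:
        return None
    best = max(score.values())
    winners = [c for c, s in score.items() if s == best]
    # Abstain on ties or when no candidate has a net positive score.
    return winners[0] if len(winners) == 1 and best > 0 else None


def predict_shares(users):
    """Aggregate per-user votes of a user group into predicted vote shares."""
    votes = Counter(v for v in map(user_vote, users.values()) if v is not None)
    total = sum(votes.values())
    return {c: n / total for c, n in votes.items()} if total else {}


def mean_abs_error(pred, actual):
    """Average absolute gap between predicted and actual vote shares."""
    return sum(abs(pred.get(c, 0.0) - share)
               for c, share in actual.items()) / len(actual)


users = {
    "u1": [("A", 1), ("B", -1)],
    "u2": [("A", 1)],
    "u3": [("B", 1), ("B", 1), ("A", -1)],
    "u4": [("A", -1)],  # only negative opinions: no vote is inferred
}
shares = predict_shares(users)
winner = max(shares, key=shares.get)
error = mean_abs_error(shares, {"A": 0.6, "B": 0.4})
print(shares, winner, error)
```

Running the same aggregation separately for each user group (for example, right-leaning or highly engaged users) is what allows their predictive power to be compared.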
50
Findings
51
Revealing the challenge of
identifying the opinion of “silent
majority”
Retweets may not necessarily
reflect users' attitude.
Prediction of user’s vote based on
more opinion tweets is not
necessarily more accurate than the
prediction using more information
tweets
The right-leaning user group provides
the most accurate prediction result. In
the best case (56-day time window), it
correctly predicts the winners in 8 out
of 10 states with an average
prediction error of 0.1.
4. Religion and Subjective Well-
being
52
Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. Proceedings of
the 6th International Conference on Social Informatics (SocInfo), 2014.
Lu Chen, Ingmar Weber, Adam Okulicz-Kozaryn, and Amit Sheth. Understanding the Effect of Religion
on Happiness by Examining the Topic Preferences and Word Usage on Twitter. (in submission to PLOS
ONE).
How to use Twitter data to measure subjective well-
being? How does the religious belief of users
(holders) affect their happiness expressed in tweets?
53
user’s religious
belief: Buddhism
a user
tweets
network
user ID
happiness_level: ℎ 𝑎𝑣𝑔 𝑢𝑠𝑒𝑟
topic_preference: 𝑝 𝑡𝑜𝑝𝑖𝑐 𝑢𝑠𝑒𝑟
word_preference: 𝑝(𝑤𝑜𝑟𝑑|𝑡𝑜𝑝𝑖𝑐, 𝑢𝑠𝑒𝑟)
Religion: Buddhism
happiness_level: ℎ 𝑎𝑣𝑔 𝑔𝑟𝑜𝑢𝑝
topic_preference: 𝑝 𝑡𝑜𝑝𝑖𝑐 𝑔𝑟𝑜𝑢𝑝
word_preference: 𝑝(𝑤𝑜𝑟𝑑|𝑡𝑜𝑝𝑖𝑐, 𝑔𝑟𝑜𝑢𝑝)
Overview
aggregating the measures of individual
users to obtain the group-level measures
1. What is the effect of religion on happiness?
2. How do topic preference and word usage
affect the happiness expressed by each group?
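The user-to-group aggregation in this overview can be sketched as below. The scoring rule is an assumption, since the slides do not specify it: a user's happiness level is taken as the average valence of the lexicon words in their tweets, and a group's level as the mean over its users. The `valence` lexicon and its scores are hypothetical.

```python
def user_happiness(tweets, valence):
    """Average valence of lexicon words across a user's tweets."""
    scores = [valence[w] for t in tweets for w in t.lower().split()
              if w in valence]
    return sum(scores) / len(scores) if scores else None


def group_happiness(users, valence):
    """Mean of the per-user happiness levels (users with no scored words skipped)."""
    per_user = [h for h in (user_happiness(t, valence) for t in users.values())
                if h is not None]
    return sum(per_user) / len(per_user) if per_user else None


# Hypothetical word valence scores on a 1 (unhappy) to 9 (happy) scale.
valence = {"happy": 8.0, "love": 9.0, "sad": 2.0}
users = {
    "u1": ["so happy today", "love this"],
    "u2": ["sad news"],
}
print(user_happiness(users["u1"], valence), group_happiness(users, valence))
```

Averaging per user first keeps highly active users from dominating the group-level measure.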
Contributions
• We provide a fresh perspective about happiness and religion,
complementing traditional survey-based studies, via analyzing the
topics and words naturally disclosed in people's social media messages.
• We introduce a framework and methodology that explore the effect of
social and demographic factors of a holder (e.g., a holder’s religious
belief) on subjective well-being.
• Our method also explores potential reasons for the variations in the
level of happiness from the holder’s topic preferences and word usage
on topics.
54
Findings
• There is a significant difference among the seven groups (atheist,
Buddhist, Christian, Hindu, Jew, Muslim, and random Twitter users) on
the level of happiness (pleasant/unpleasant emotions) expressed in
tweets.
• Each user group has different topic preferences and different word
usage on the same topic. However, differences on word usage are small
compared with the differences on topic distributions.
• The users' topic preferences strongly correlate with their happiness
expressed in tweets.
55
Conclusion
• This dissertation presents a unified framework that characterizes a
subjective experience, such as sentiment, opinion, or emotion, in terms
of an individual holding it, a target eliciting it, a set of expressions
describing it, and a classification or assessment measuring it;
• it describes new algorithms that automatically identify and extract
sentiment expressions and opinion targets from user generated content
with minimal human supervision;
• it shows how to use social media data to predict election results and
investigate religion and subjective well-being, by classifying and
assessing subjective information in user generated content.
56
Future Directions
57
Time
1. Detecting different types of
subjectivity in text
2. Beyond sentiment and opinion
3. Towards dynamic modeling of
subjective information.
A subjective experience is a
quintuple $(h, s, e, c, t)$, where
t is the time when the subjective
experience occurs.
Publications
• Lu Chen, Justin Martineau, Doreen Cheng and Amit Sheth. Clustering for Simultaneous Extraction of Aspects and Features from
Reviews. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies (NAACL), 2016. (Acceptance rate: 24%)
• Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. Proceedings of the 6th International
Conference on Social Informatics (SocInfo), 2014. (Acceptance rate: 23%)
• Justin Martineau, Lu Chen, Doreen Cheng and Amit Sheth. Active Learning with Efficient Feature Weighting Methods for Improving
Data Quality and Classification Accuracy. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics
(ACL), 2014. (Acceptance rate: 26%)
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Cursing in English on Twitter. Proceedings of the 17th ACM
Conference on Computer Supported Cooperative Work and Social Computing (CSCW) 2014. (Acceptance rate: 27%)
• Amit Sheth, Ashutosh Jadhav, Pavan Kapanipathi, Lu Chen, Hemant Purohit, Alan Smith, and Wenbo Wang. Chapter title: Twitris - A
System for Collective Social Intelligence. Encyclopedia of Social Network Analysis and Mining, 2014.
• D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A
Semantic Web Platform for Drug Abuse Epidemiology Using Social Media. Journal of Biomedical Informatics: Special Issue on
Biomedical Information through the Implementation of Social Media Environments. 2013. PMID: 23892295.
• Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012
U.S. Republican Presidential Primaries. Proceedings of the 4th International Conference on Social Informatics (SocInfo) 2012.
(Acceptance rate: 35%)
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter "Big Data" for Automatic Emotion
Identification. Proceedings of the 4th ASE/IEEE International Conference on Social Computing (SocialCom), 2012.
• Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit Sheth. Extracting Diverse Sentiment Expressions with Target-
dependent Polarity from Twitter. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM),
2012. (Acceptance rate: 20%)
• Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical
Informatics Insights (BII), 2012.
• R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, and A. Sheth. "I Just Wanted to Tell You That Loperamide WILL
WORK": A Web-Based Study of Extra-Medical Use of Loperamide. Journal of Drug and Alcohol Dependence, 2012.
• R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Udayanga, L. Chen, A. Sheth. A Web-based Study of Self-treatment of Opioid
Withdrawal Symptoms with Loperamide. The College on Problems of Drug Dependence (CPDD), 2012.
58
Media Coverage (1)
59
Washington Post Washington Times La Croix
MIT Technology Review Time
Media Coverage (2)
60
Fast Company RAPPLER BuzzFeed
The Times of India
Huffington Post
Media Coverage (3)
61
IN Gizmodo RNS
NDTV World Religion News
Acknowledgement
62
Prof. Amit Sheth
(Advisor)
Dr. Ingmar Weber
(QCRI)
Prof. T.K.Prasad Dr. Justin Martineau
(SRA)
Prof. Keke Chen
Dissertation Committee
Co-authors and Collaborators
Dr. Shaojun Wang
Computer Science
Dr. Meena Nagarajan
(IBM Watson)
Prof. Adam Okulicz-Kozaryn
(Rutgers-Camden)
Dr. Wenbo Wang
(GoDaddy)
Dr. Doreen Cheng
(SRA)
Prof. Raminta Daniulaityte Dr. Delroy Cameron
(Apple)
Dr. Ming Tan
(IBM Watson)
Prof. Valerie Shalin
63
Acknowledgement
This dissertation is based upon work supported by the National
Science Foundation under Grants:
• IIS-1111182 “SoCS: Collaborative Research: Social Media
Enhanced Organizational Sensemaking in Emergency Response”
and
• CNS-1513721 “Context-Aware Harassment Detection on Social
Media.”
64

Mining and Analyzing Subjective Experiences in User-generated Content

  • 1. Lu Chen Kno.e.sis Center Ph.D. Dissertation Defense Advisor: Prof. Amit P. Sheth Committee members: Prof. T.K. Prasad Prof. Keke Chen Dr. Ingmar Weber (QCRI) Dr. Justin Martineau (SRA) Ohio Center of Excellence in Knowledge-Enabled Computing Mining and Analyzing Subjective Experiences in User Generated Content
  • 2. Subjective Experience – What We Experience in Our Mind Hunger Love Happiness Surprise Embarrassment Like Dislike Confused Pain Tired Stressed Nervous Relaxed Warm Proud Confident Taste of ice cream Feeling about sky Perception of time Appreciation of music Opinion on climate change Interest Music preference Purchase intent Source: http://bit.ly/1DvofHX 2
  • 3. Subjective Information – The Information about People’s Subjective Experiences Source: http://bit.ly/1GDD9Mb Source: http://bit.ly/1KkJF2l Source: http://bit.ly/1IjjBSX Source: http://bit.ly/1KkK1Gc The traditional way of collecting subjective information: 3
  • 4. User Generated Content • New opportunities arise as we now can obtain a wide variety of subjective information from user generated content. 4
  • 5. The Demand of Subjective Information • Subjective information can be used to support better decision-making. 5 Source: http://twitris2.knoesis.org/debate Predicting election results Source: http://bit.ly/1gQg5Fl Monitoring social phenomena Source: http://bit.ly/1niFkU7 Targeted advertising Source: http://bit.ly/1l0ombo Making purchase decision Source: http://bit.ly/1VzYEZG
  • 6. Different Types of Subjective Information Intent “would like to watch” Expectation “hope it’s good” would like to watch The Secret Life Of Pets. I hope it's good. "The Secret Life of Pets" was clever, adorable, funny and I already want to see it again. I don't think watching The Secret Life of Pets makes me childish. I laughed I cried and it was so touching for someone who has a pet like me. Finding Dory was much better than The Secret Life of Pets. Still not as good as Zootopia though. 6 The Secret Life of Pets soundtrack should be nominated for an Oscar Sentiment “clever, adorable, funny” Intent “want to see it again” Opinion “don’t think watching … makes me childish” Emotion “I laughed I cried and it was so touching” Preference “much better than” Preference “not as good as” Opinion “should be nominated for an Oscar”
  • 7. Defining Subjective Information 7
    Formally, a subjective experience can be represented as a quadruple $(h, s, e, c)$:
    ‒ $h$ − a holder, an individual who holds the experience.
    ‒ $s$ − a stimulus (or target), an entity, event, or situation that elicits the experience.
    ‒ $e$ − a set of expressions that are used to describe the experience, e.g., the sentiment words/phrases or the opinion claims.
    ‒ $c$ − a classification or assessment that categorizes or measures the experience, e.g., sentiment orientation (positive vs. negative), emotion type (joy, anger, sadness, surprise, etc.), or a score indicating the strength of sentiment.
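The quadruple can be pictured as a small record type. The following is a minimal sketch, not code from the dissertation: the class name `SubjectiveExperience` and its field names are assumptions chosen for illustration, and the sample values are taken from the movie example used in this deck.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubjectiveExperience:
    """Hypothetical container for the quadruple (h, s, e, c)."""
    holder: str                   # h: the individual who holds the experience
    stimulus: str                 # s: the entity, event, or situation eliciting it
    expressions: Tuple[str, ...]  # e: words/phrases describing the experience
    classification: str           # c: category or assessment, e.g. "positive"


# Sentiment example from the deck; the holder is the author of the message.
example = SubjectiveExperience(
    holder="author of the message",
    stimulus="The Secret Life of Pets movie",
    expressions=("clever", "funny", "adorable"),
    classification="positive",
)
```

Keeping the four components as separate fields mirrors the way the later slides treat extraction of $s$ and $e$ and assessment of $c$ as distinct tasks.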
  • 8. Different Types of Subjective Information 8
    Type | Holder $h$ | Stimulus $s$ | Expression $e$ | Classification $c$
    Sentiment | an individual who holds the sentiment | an entity | sentiment words/phrases | positive, negative, neutral
    Opinion | an individual who holds the opinion | an entity | opinion claims (may not contain sentiment words) | positive, negative, neutral
    Emotion | an individual who holds the emotion | an event or situation | emotion words/phrases, description of events/situations | anger, disgust, fear, happiness, sadness, surprise
    Preference | an individual who holds the preference | a set of alternatives | words/phrases that indicate comparison or preference | depends on the specific task
    Intent | an individual who holds the intent | an action | words/phrases that show the presence of will, description of the act | depends on the specific task
    Expectation | an individual who holds the expectation | an entity | words/phrases that express beliefs about what someone or something will be | depends on the specific task
  • 9. 9
    * The holders of these experiences are the authors of the messages.
    Example | Type | Stimulus $s$ | Expression $e$ | Classification $c$
    would like to watch The Secret Life Of Pets. I hope it's good. | Intent | watch the movie | “would like to” | transactional
    (same message) | Expectation | The Secret Life of Pets movie | “hope” | optimistic
    "The Secret Life of Pets" was clever, adorable, funny and I already want to see it again. | Sentiment | The Secret Life of Pets movie | “clever”, “funny”, “adorable” | positive
    (same message) | Intent | see the movie | “want to” | transactional
    I don't think watching The Secret Life of Pets makes me childish. I laughed I cried and it was so touching for someone who has a pet like me. | Opinion | The Secret Life of Pets movie | “don’t think … makes me childish” | positive
    (same message) | Emotion | The Secret Life of Pets movie | “laughed”, “cried”, “so touching” | funny, touching
    Finding Dory was much better than The Secret Life of Pets. Still not as good as Zootopia though. | Preference | Finding Dory, The Secret Life of Pets | “much better than” | preferring Finding Dory
    (same message) | Preference | Finding Dory, Zootopia | “not as good as” | preferring Zootopia
    The Secret Life of Pets soundtrack should be nominated for an Oscar | Opinion | The Secret Life of Pets soundtrack | “should be nominated for an Oscar” | positive
  • 10. 10 An overview of subjective information extraction. The orange box indicates the scope of this dissertation.
  • 11. Dissertation Focus 11 • Identifying and extracting subjective information from user generated content. 1. Extraction of Target-Specific Sentiment Expressions (ICWSM’12) 2. Discovery of Domain-Specific Features and Aspects (NAACL’16) 3. Application: Predicting Election Results (SocInfo’12) 4. Application: Religiosity & Happiness (SocInfo’14) Emotion Identification (SocialCom’12, BII’12, CSCW’14, ACL’14) [Diagram labels: Subjective Information (Sentiment, Opinion, Emotion); Holder $h$, Stimuli $s$, Expression $e$, Classification $c$]
  • 12. Thesis Statement • This dissertation presents a unified framework that characterizes a subjective experience, such as sentiment, opinion, or emotion, in terms of an individual holding it, a target eliciting it, a set of expressions describing it, and a classification or assessment measuring it; • it describes new algorithms that automatically identify and extract sentiment expressions and opinion targets from user generated content with minimal human supervision; • it shows how to use social media data to predict election results and investigate religion and subjective well-being, by classifying and assessing subjective information in user generated content. 12
  • 13. Sentiment in User Generated Content 13
    Sources: Social media. Data: posts, messages. Targets: movies, persons, brands, etc.
    E1. Lights out definitely lived up to the hype! Great movie!
    E2. I got my second Pikachu today this one was from 2k egg revitalised my love for Pokemon go... Did not last long 😆 stoopid game
    E3. Game of Thrones is a must watch.
    E4. I find myself grateful that Hillary Clinton is predictable and steady. Like her or don't, she's SAFE.
    E5. Saw the avengers last night. Mad overrated. Cheesy lines and horrible writing. Very predictable.
    E6. I saw The Avengers yesterday evening. It was long but it was very good!
    E7. Galaxy s7 edge battery life last so long it's almost unlimited battery life xD
    Sentiment Analysis
    Target: Lights out 75% 20% 5%; Pokemon Go 69% 17% 14%; Game of Thrones 83% 10% 7%; Hillary Clinton 49% 35% 16%; The Avengers 70% 24% 6%; Galaxy S7 Edge 68% 16% 16%
    Predictive Models: business analytics, predicting financial performance, predicting election results …
  • 14. 1. Extraction of Target-Specific Sentiment Expressions 14 Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), 2012. Given a set of unlabeled social media posts, how can we extract diverse forms of sentiment expressions with respect to a specific target?
  • 15. Example 15
    Sources: Social media. Data: posts, messages (e.g., tweets). Targets: movies, persons, brands, etc.
    E1. Lights out definitely lived up to the hype! Great movie!
    E2. I got my second Pikachu today this one was from 2k egg revitalised my love for Pokemon go... Did not last long 😆 stoopid game
    E3. Game of Thrones is a must watch.
    E4. I find myself grateful that Hillary Clinton is predictable and steady. Like her or don't, she's SAFE.
    E5. Saw the avengers last night. Mad overrated. Cheesy lines and horrible writing. Very predictable.
    E6. I saw The Avengers yesterday evening. It was long but it was very good!
    E7. Galaxy s7 edge battery life last so long it's almost unlimited battery life xD
    Instance | Sentiment Expressions | Classification
    E1 | lived up to the hype, great | positive
    E2 | love, not last long, stoopid | positive, negative
    E3 | must watch | positive
    E4 | grateful, predictable, steady, safe | positive
    E5 | mad overrated, cheesy, horrible, very predictable | negative
    E6 | long, very good | negative, positive
    E7 | last so long, unlimited | positive
  • 16. Challenges • Sentiment expressions can be very diverse. ‒ Vary from single words (e.g., “good”, “predictable”) to multi-word phrases of different lengths (“lived up to the hype”, “must see”) ‒ Can be formal or slang expressions, including abbreviations and spelling variations (e.g., “gud”, “stoopid”). • The polarity of a sentiment expression is sensitive to its target. ‒ E.g., “long” in “long river”, “long battery life”, or “long time for downloading”. ‒ E.g., “predictable” regarding movies, or regarding stocks. 16
  • 17. Contributions We propose a novel optimization-based approach that: • identifies a richer and more diverse set of sentiment expressions, including both formal and slang words/phrases; • assesses the target-dependent polarity of each sentiment expression; and • does not require labeled data or hand-crafted patterns. 17
  • 18. The Proposed Approach Extracting Candidate Expressions Identifying Inter-Expression Relations Assessing Target-dependent Polarity 18
  • 19. Extracting Candidate Expressions 19
    Example: “The Avengers movie was bloody amazing! A little cheesy at times, but I liked it. Mmm looking good Robert Downey Jr and Captain America ;)”
    “On-target” subjective words: “bloody”, “amazing”, “cheesy”, “liked”
    Candidate expressions: “bloody”, “amazing”, “bloody amazing”, “cheesy”, “little cheesy”, “cheesy at times”, “little cheesy at times”, “liked”
    Method:
    • For each message, select the “on-target” subjective words, and extract all the n-grams that contain at least one selected subjective word as candidates.
    • A subjective word is selected as “on-target” if (1) there is a dependency relation between the word and the target, or (2) the word is proximate to the target (e.g., within a distance of four words).
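The candidate-extraction step can be sketched roughly as follows. This is an illustrative approximation only: it implements the proximity rule (2) and omits the dependency-relation rule (1), which would need a parser. The function names, the regex tokenizer, the single-token target, and the tiny lexicon are all assumptions, not the authors' implementation.

```python
import re


def tokenize(text):
    # Simplistic lowercase word tokenizer (assumption); a real system
    # would likely use a tweet-aware tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def on_target_indices(tokens, target, lexicon, window=4):
    """Indices of lexicon words within `window` tokens of a target mention."""
    target_pos = [i for i, tok in enumerate(tokens) if tok == target]
    return {
        i for i, tok in enumerate(tokens)
        if tok in lexicon and any(abs(i - p) <= window for p in target_pos)
    }


def candidate_ngrams(tokens, selected, max_n=3):
    """All n-grams (n <= max_n) that cover at least one selected index."""
    candidates = set()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            if any(i in selected for i in range(start, start + n)):
                candidates.add(" ".join(tokens[start:start + n]))
    return candidates


tokens = tokenize("The Avengers movie was bloody amazing!")
lexicon = {"bloody", "amazing", "cheesy", "liked"}  # toy subjectivity lexicon
selected = on_target_indices(tokens, "avengers", lexicon)
candidates = candidate_ngrams(tokens, selected, max_n=2)
```

On this input the sketch also yields the bigram "was bloody", which does not appear in the slide's candidate list, so the original method presumably applies further filtering that the slide text does not describe.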
  • 20. Identifying Inter-Expression Relations 1. I saw The Avengers yesterday evening. It was long but it was very good! 2. I do enjoy The Avengers, but it's both overrated and problematic. 3. Saw the avengers last night. Mad overrated. Cheesy lines and horrible writing. Very predictable. 4. The avengers was good but the plot was just simple minded and predictable. 5. The Avengers was good. I was not disappointed. 20
  • 22. An Optimization Model (1) 22
    • For each candidate expression $c_i$,
      ‒ P-Probability $\Pr^{P}(c_i)$ – the probability that $c_i$ indicates positive sentiment
      ‒ N-Probability $\Pr^{N}(c_i)$ – the probability that $c_i$ indicates negative sentiment
      ‒ $\Pr^{P}(c_i) + \Pr^{N}(c_i) = 1$
    • For each pair of candidate expressions $c_i$ and $c_j$,
      ‒ Consistency probability – the probability that $c_i$ and $c_j$ have the same polarity: $\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$
      ‒ Inconsistency probability – the probability that $c_i$ and $c_j$ have different polarities: $\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$
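A small worked example may make the two pairwise probabilities concrete. The numbers are invented for illustration: suppose one candidate is probably positive, with P-Probability 0.9, and another is probably negative, with P-Probability 0.2.

```latex
\begin{align*}
\Pr^{P}(c_i) &= 0.9, & \Pr^{N}(c_i) &= 0.1, \\
\Pr^{P}(c_j) &= 0.2, & \Pr^{N}(c_j) &= 0.8, \\
\Pr^{cons}(c_i, c_j) &= 0.9 \cdot 0.2 + 0.1 \cdot 0.8 = 0.26, \\
\Pr^{incons}(c_i, c_j) &= 0.9 \cdot 0.8 + 0.1 \cdot 0.2 = 0.74 .
\end{align*}
```

Because each candidate's two probabilities sum to 1, the consistency and inconsistency probabilities of a pair also sum to 1.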
  • 23. An Optimization Model (2) 23
    • We want the consistency and inconsistency probabilities derived from the P-Probabilities and N-Probabilities of the candidates to be closest to their expectations suggested by the relation networks.
    • Objective Function:
      $\text{minimize} \; \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ w^{cons}_{ij} \left(1 - \Pr^{cons}(c_i, c_j)\right)^2 + w^{incons}_{ij} \left(1 - \Pr^{incons}(c_i, c_j)\right)^2 \right]$
      where $w^{cons}_{ij}$ and $w^{incons}_{ij}$ are the weights of the edges (strength of the relations) between $c_i$ and $c_j$ in the consistency and inconsistency relation networks, and $n$ is the total number of candidate expressions.
      $\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$
      $\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$
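The objective can be written down in a few lines of NumPy. The sketch below is an assumption-laden illustration: it parameterizes each candidate by its P-Probability `p[i]` (so the N-Probability is `1 - p[i]`), and it uses a plain projected gradient descent with numerical gradients as a stand-in solver, since the slides do not say which solver the authors used. The function names `objective` and `minimize_polarity` are hypothetical.

```python
import numpy as np


def objective(p, w_cons, w_incons):
    """Sum over pairs i < j of the weighted squared gaps from the slide."""
    p = np.asarray(p, dtype=float)
    w_cons = np.asarray(w_cons, dtype=float)
    w_incons = np.asarray(w_incons, dtype=float)
    q = 1.0 - p                                  # N-Probabilities
    cons = np.outer(p, p) + np.outer(q, q)       # Pr^cons(c_i, c_j)
    incons = np.outer(p, q) + np.outer(q, p)     # Pr^incons(c_i, c_j)
    loss = w_cons * (1.0 - cons) ** 2 + w_incons * (1.0 - incons) ** 2
    return float(np.triu(loss, k=1).sum())       # keep only pairs with i < j


def minimize_polarity(p0, w_cons, w_incons, lr=0.1, steps=200, eps=1e-5):
    """Projected gradient descent with central-difference gradients
    (a stand-in solver, not necessarily the one used in the dissertation)."""
    p = np.clip(np.asarray(p0, dtype=float), 0.0, 1.0)
    for _ in range(steps):
        grad = np.zeros_like(p)
        for k in range(len(p)):
            step = np.zeros_like(p)
            step[k] = eps
            grad[k] = (objective(p + step, w_cons, w_incons)
                       - objective(p - step, w_cons, w_incons)) / (2 * eps)
        p = np.clip(p - lr * grad, 0.0, 1.0)     # keep probabilities in [0, 1]
    return p


# Toy relation networks: c0 is consistent with c1 and inconsistent with c2.
w_cons = np.zeros((3, 3))
w_cons[0, 1] = 1.0
w_incons = np.zeros((3, 3))
w_incons[0, 2] = 1.0
p = minimize_polarity([1.0, 0.5, 0.5], w_cons, w_incons)
```

In this toy setup the first candidate starts as known-positive, and the relations pull the second candidate toward positive and the third toward negative.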
  • 24. Experiments: Datasets 24
    Table: Description of four target-specific datasets from social media.
    Tweet about movie | New Star Trek movie is great! Highly recommend it!
    Tweet about person | Scarlett Johansson rocking a suit better than most men.
    Forum post about epilepsy treatment | I have an 11 month old who suffers from 0-8 seizures per day. We've tried 6 medications that have all failed and are now on The Ketogenic Diet. The diet has been amazing at reducing the frequency and intensity of his seizures. However, I want them GONE! I am wondering if infant chiropractic care or acupuncture is safe and effective in eliminating seizures. Does anyone have any experience with either of these?
    Forum post about cellular company | I click on Mobile Sync to move all my contacts from my phone to the Sprint website. There are over 100 contacts in my phone, but it's only moving 59 of them? Help
    Facebook post about automobile company | I have a 2006 Trailblazer that had a motor failure at 60,000 miles. GM refused to help in any way. Poor customer service to say the least. I guess they don't care about your car post warranty. With a driveway full of GM's its probably the last one I will buy.
  • 25. Experiments on Tweets • Datasets: ‒ 168,005 tweets about movies ‒ 258,655 tweets about persons • Gold standard: 1500 tweets were randomly sampled from each domain. Human experts identified sentiment expressions and labeled each expression and tweet with target-specific sentiment. Table: Distributions of N-grams and Part-of-speech of the Sentiment Expressions in the Gold Standard Data Set. Table: Distribution of Sentiment Categories of the Tweets in the Gold Standard Data Set.
  • 26. Methods COM -- Constrained Optimization Model • COM-const: Assign 0.5 to all the candidates as their initial P-Probabilities. • COM-gelex: Initialize the candidates’ polarities according to the subjectivity dictionary (positive: 1.0, negative: 0.0, other: 0.5). • MPQA, GI, SWN: For each extracted subjective word regarding the target, simply look up its polarity in MPQA, General Inquirer and SentiWordNet, respectively. • PROP: a propagation approach proposed by Qiu et al. (IJCAI’09)
  • 27. Results The results demonstrate the advantage of our optimization-based approach over lexicon-based and rule-based approaches in polarity assessment: our method extracts diverse sentiment expressions and captures their target-dependent polarity.
  • 28. Results of Sentiment Expression Extraction with Various Corpora Sizes Both the precision and the recall of our approach increase when the size of the corpora grows from 12,000 to 48,000, because our method benefits from the additional relations extracted from larger corpora.
  • 29. Experiments on Other Social Media Posts • Datasets: ‒ 100 forum posts about epilepsy treatment ‒ 162 forum posts about cellular company ‒ 200 Facebook posts about automobile company • Gold standard: human experts identified sentiment expressions from posts, and labeled each expression and post sentence with target-specific sentiment. Table: Characteristics of sentiment expressions in the Gold Standard Data Set. Table: Distribution of Sentiment Categories of post sentences in the Gold Standard Data Set.
  • 30. Results Table: Quality of the extracted sentiment expressions. Figure: Sentence-level sentiment classification accuracy using different lexicons. The stable performance on all five datasets provides a strong indication that the proposed approach is not limited to a specific domain or a specific social media data source.
  • 31. Sample Output (Movie Domain)
  • 32. Aspect-based Opinion Mining It would be helpful to have an aspect-based opinion summarization for products. Example aspects and the expressions grouped under them: ‒ Size: big screen perfect size fits big bedroom … ‒ Picture quality: full hd best picture blur reduction … ‒ Motion-smoothing: smooth motion sensor tracing effects … ‒ Sound quality: loud white noise high pitched sound …
  • 33. 2. Discovery of Domain-Specific Features and Aspects Lu Chen, Justin Martineau, Doreen Cheng and Amit Sheth. Clustering for Simultaneous Extraction of Aspects and Features from Reviews. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2016. Given a set of plain product reviews, how can we efficiently identify product features (both explicit and implicit) and group them into aspects?
  • 34. Example Review Sentences 1. Phone is easy to use and has great features. Large screen is great. Great speed makes smooth viewing of tv programs or sports. 2. It has a big bright display, it's very fast and very lightweight for its size. 3. Good features for an inexpensive android, light, good signal, good sound, pretty quick for a 800MHz processor. 4. The phone runs extra fast and smooth, and has great price. Aspects: {screen, display, bright} {size, large, big} {lightweight, light} {price, inexpensive} {speed, processor, fast, quick, smooth} {easy, use} {features} {signal} {sound} Feature: components and attributes of a product. • Explicit feature: mentioned as an opinion target • Implicit feature: implied by opinion words • Different feature expressions may be used to describe the same aspect of a product. Aspect: represented as a group of features
  • 35. Related Work • Two-step approach: first identifying features, then clustering them • Feature Identification ‒ Only extracts features; does not group them. ‒ Implicit features have been largely ignored. ‒ Requires seed terms, hand-crafted rules/patterns, or other annotation efforts. • Feature Clustering/Aspect Discovery ‒ Assumes that features have been identified beforehand. ‒ Topic-model based approaches: aspects are not fine-grained (Zhang and Liu, 2014), not directly interpretable as aspects (Chen et al., 2013; Bancken et al., 2014), not good at dealing with aspect sparsity (Xu et al., 2014), etc. ‒ Clustering-based approaches (Su et al., 2008; Lu et al., 2009; Bancken et al., 2014)
  • 36. Contributions We propose a new clustering-based approach that: • identifies both features and aspects simultaneously; • extracts both explicit and implicit features and groups them into aspects; and • does not require seed terms, hand-crafted patterns, or any other labeling efforts.
  • 37. The Clustering Algorithm Notation • X is a set of candidate features, which are extracted from reviews of a given product. o Candidates for explicit features: nouns and noun phrases o Candidates for implicit features: adjectives and verbs • k is the number of aspects. • s is the number of most frequent candidates that will be grouped first to generate the seed clusters. • δ is the upper bound of the distance between two mergeable clusters. Clustering the s most frequent candidates first serves two purposes: (1) it generates high-quality seed clusters, because frequent terms are more likely to be the actual features that customers care about; (2) it speeds up the process. Domain-specific similarity measure: determines how similar the members of two clusters are with respect to the particular domain/product. Merging constraints: further ensure that terms from different aspects are not merged.
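The role of the parameters k, s, and δ can be illustrated with a simplified sketch. The algorithm itself is not spelled out in this transcript, so everything below is an assumption made for illustration: average-linkage distance, greedy merging of the closest pair, stopping once k clusters remain, and attaching each remaining candidate to its nearest seed cluster. The merging constraints mentioned on the slide are omitted, and the function names are hypothetical.

```python
from itertools import combinations


def average_distance(a, b, dist):
    """Average pairwise distance between the members of clusters a and b."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster_candidates(candidates, freq, dist, k, s, delta):
    """Simplified two-stage clustering (illustrative, not the CAFE algorithm).

    Stage 1: greedily merge the s most frequent candidates into seed clusters.
    Stage 2: attach each remaining candidate to its closest seed cluster.
    Two clusters are mergeable only if their distance is below delta.
    """
    ranked = sorted(candidates, key=lambda x: freq[x], reverse=True)
    clusters = [[x] for x in ranked[:s]]
    if not clusters:
        return []

    # Stage 1: merge the closest pair until k clusters remain
    # or no pair is closer than delta.
    while len(clusters) > max(k, 1):
        d, i, j = min(
            (average_distance(clusters[i], clusters[j], dist), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d >= delta:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid

    # Stage 2: less frequent candidates join the nearest seed cluster.
    for x in ranked[s:]:
        d, idx = min(
            (average_distance([x], c, dist), n) for n, c in enumerate(clusters)
        )
        if d < delta:
            clusters[idx].append(x)
    return clusters
```

Restricting the quadratic merging loop to the top s candidates is what makes the first stage cheap; the second stage only compares each remaining candidate against the existing seed clusters.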
  • 38. Similarity Measures • General semantic similarities that are learned from thesaurus dictionaries or web corpora. ‒ The similarities between words/phrases are domain dependent. E.g., “ice cream sandwich” and “operating system” (cell-phone domain); “smooth” and “speed” (cell-phone domain vs. hair dryer domain) • Domain-dependent similarities that are learned from a domain-specific corpus based on distributional information. ‒ Different aspects may share similar context. E.g., “great display”, “great price”, “great speed” ‒ The words describing the same aspect may not share similar context or co-occur. E.g., people use “is inexpensive” or “has great price” instead of “has inexpensive price”; “running fast” or “great speed” instead of “fast speed”
  • 39. Domain-specific Similarity • General similarity matrix G -- an n × n matrix, where Gij is the general semantic similarity between xi and xj, Gij ∈ [0, 1], Gij = 1 when i = j, and Gij = Gji. • Use UMBC Semantic Similarity Service to get G. • Statistical association matrix T -- an n × n matrix, where Tij is the pairwise statistical association between xi and xj in a domain-specific corpus, Tij ∈ [0, 1], Tij = 1 when i = j, and Tij = Tji. • Use normalized pointwise mutual information (NPMI) to get T, where ‒ f(xi) (or f(xj)) is the number of documents where xi (or xj) appears, ‒ f(xi, xj) is the number of documents where xi and xj co-occur in a sentence, ‒ N is the total number of documents in the corpus. NPMI(xi, xj) ∈ [−1, 1], and we rescale the values of NPMI to the range of [0, 1].
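The NPMI formula itself did not survive in this transcript, so the sketch below uses the standard definition, NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (−ln p(x, y)), with probabilities estimated as document counts divided by N. The linear rescaling (NPMI + 1) / 2 is an assumption; the slide only says the values are rescaled to [0, 1].

```python
import math


def npmi(f_i, f_j, f_ij, n_docs):
    """Standard normalized PMI from document counts, in [-1, 1]."""
    if f_ij == 0:
        return -1.0  # limit value when the two terms never co-occur
    p_i, p_j, p_ij = f_i / n_docs, f_j / n_docs, f_ij / n_docs
    if p_ij == 1.0:
        return 1.0  # both terms occur together in every document
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def rescaled_npmi(f_i, f_j, f_ij, n_docs):
    """Map NPMI from [-1, 1] to [0, 1] (assumed linear rescaling)."""
    return (npmi(f_i, f_j, f_ij, n_docs) + 1.0) / 2.0
```

Under this rescaling, statistically independent terms get 0.5, terms that never co-occur get 0, and terms that always co-occur get 1, which matches the stated constraint Tij = 1 when i = j.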
  • 40. Domain-specific Similarity • A candidate xi can be represented by the i-th row in G or T. • The domain-specific similarity between xi and xj is defined as the weighted sum of three similarity metrics, sim(xi, xj) = wg · simg(xi, xj) + wt · simt(xi, xj) + wgt · simgt(xi, xj), where ‒ simg captures semantically similar/relevant words, e.g., “screen” and “display”, “speed” and “fast”; ‒ simt captures words sharing similar context, e.g., “ice cream sandwich” and “operating system”; ‒ simgt gets a high value when the terms strongly associated with xi (or xj) are semantically similar to xj (or xi), e.g., “smooth” and “speed”.
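A minimal sketch of the weighted sum follows. The exact definitions of simg, simt, and simgt are not preserved in this transcript, so the choices below are assumptions: cosine similarity between rows of G, cosine similarity between rows of T, and, for the cross term, the larger of the two cross-matrix cosines. The default weights are the ones listed in the experimental setting (wg = wt = 0.2, wgt = 0.6).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def domain_similarity(i, j, G, T, w_g=0.2, w_t=0.2, w_gt=0.6):
    """Weighted sum of three similarity metrics between candidates i and j.

    G and T are n x n lists of lists; row i represents candidate x_i.
    The cosine-based definitions are assumptions made for this sketch.
    """
    sim_g = cosine(G[i], G[j])   # semantic neighbourhoods overlap
    sim_t = cosine(T[i], T[j])   # domain contexts overlap
    sim_gt = max(cosine(G[i], T[j]), cosine(T[i], G[j]))  # cross term
    return w_g * sim_g + w_t * sim_t + w_gt * sim_gt
```

The cross term is what lets a pair such as “smooth” and “speed” score highly even when neither the general similarity nor the co-occurrence statistics alone would link them.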
  • 41. Data and Experimental Setting • We evaluate this approach on reviews from three different domains. • The default setting of CAFE (Clustering for Aspect and Feature Extraction): ‒ The number of aspects k = 50 ‒ Distance upper bound δ = 0.8 ‒ The number of candidates that are grouped first to generate seed clusters s = 500 ‒ The weights of three similarity measures wg = wt = 0.2, wgt = 0.6
  • 42. Evaluations on Feature Extraction – Methods • PROP: A double propagation approach that extracts features using hand-crafted rules based on dependency relations between features and opinion words. (Qiu et al., IJCAI’09) • LRTBOOT: A bootstrapping approach that extracts features by mining pairwise feature-feature, feature-opinion, opinion-opinion associations between terms in the corpus, where the association is measured by the likelihood ratio tests. (Hai et al., CIKM’12)
  • 43. Evaluations on Feature Extraction – Results
  • 44. Evaluations on Aspect Discovery – Methods • MuReinf: A clustering method that utilizes the mutual reinforcement association between features and opinion words to iteratively group them into feature clusters and opinion clusters. (Su et al., WWW’08) • L-EM: A semi-supervised learning method that adapts a Naive Bayesian-based EM algorithm to group synonym features into categories. (Zhai et al., WSDM’11) • L-LDA: A baseline method used in (Zhai et al., WSDM’11), which is based on LDA. * Because MuReinf, L-EM and L-LDA need another algorithm to extract features, both LRTBOOT and CAFE are applied.
  • 45. Evaluations on Aspect Discovery – Results The results showed the advantage of combining feature and aspect discovery over chaining them, and also implied the effectiveness of our domain-specific similarity measure in identifying synonym features in a particular domain.
  • 46. Influence of Parameters • Based on the experiments on three domains, the best results can be achieved when distance upper bound δ is set to a value between 0.76 and 0.84. • CAFE generates better results by first clustering the top 10%-30% most frequent candidates. • The best F-score and Rand Index can be achieved when we set wgt to 0.5 or 0.6 across all three domains.
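For readers unfamiliar with the Rand Index mentioned above, the sketch below computes the standard (unadjusted) version: the fraction of item pairs on which the predicted clustering and the gold clustering agree about whether the two items belong together. Whether the evaluation in the dissertation used exactly this variant is not stated in the slides; the function name and the dictionary-based input are assumptions.

```python
from itertools import combinations


def rand_index(pred, gold):
    """Unadjusted Rand Index between two clusterings of the same items.

    pred and gold map each item to a cluster label; label values need not match.
    """
    agree = total = 0
    for a, b in combinations(sorted(pred), 2):
        same_pred = pred[a] == pred[b]
        same_gold = gold[a] == gold[b]
        agree += same_pred == same_gold
        total += 1
    return agree / total if total else 1.0
```

For example, if the predicted aspects are {screen, display} and {price, fast} while the gold aspects are {screen, display}, {price}, and {fast}, five of the six pairs agree and the index is 5/6.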
  • 48. 3. Harnessing Public Opinion on Twitter to Predict Election Results Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries. Proceedings of the 4th International Conference on Social Informatics (SocInfo) 2012. How to derive public opinion about election candidates? Are opinion holders equal in predicting elections?
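  • 49. Overview Figure: from a user's tweets and network to election prediction. Each tweet (tweet ID) is annotated with the candidate mentioned (candidate: XXX) and the opinion expressed (opinion: positive). Each user is categorized along four dimensions: 1. Political Preference (e.g., right-leaning), 2. Engagement Degree (e.g., high engagement), 3. Content Type (e.g., opinion-prone), 4. Tweet Mode (e.g., original tweet-prone). The method predicts which candidate each user supports, then aggregates the opinions of each user group to predict election results.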
  • 50. Contributions • We introduce a new method to predict the election results that: ‒ identifies which candidate is mentioned, and whether a positive or negative opinion is expressed towards a candidate in a tweet; ‒ predicts which candidate a user supports based on the opinions extracted from his/her tweets; and ‒ aggregates the opinions of all users from a group to predict which candidate will win the election. • We show that the opinion holders matter in predicting election results. ‒ We group users based on their political preference, engagement degree, tweet mode, and content type, and examine the predictive power of different user groups in predicting Super Tuesday results in 10 states. ‒ We evaluate the results in terms of both the accuracy of predicting winners and the error rate between the predicted votes and the actual votes for each candidate.
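The three steps listed above (tweet-level opinions, user-level vote, group-level aggregation) can be sketched as follows. The slides do not give the scoring rule, so this example assumes a simple one: a user's vote goes to the candidate with the highest net count of positive minus negative tweets, and users without a unique, positively scored favourite are skipped. The error measure here is the mean absolute difference between predicted and actual vote shares, also an assumption. All function names are hypothetical.

```python
from collections import Counter


def predict_user_vote(opinions):
    """opinions: list of (candidate, polarity), polarity 'positive' or 'negative'.

    Returns the candidate with the highest net positive count, or None when
    there is no unique candidate with a positive score (assumed rule).
    """
    score = Counter()
    for candidate, polarity in opinions:
        score[candidate] += 1 if polarity == "positive" else -1
    if not score:
        return None
    best = max(score.values())
    leaders = [c for c, s in score.items() if s == best]
    return leaders[0] if len(leaders) == 1 and best > 0 else None


def predict_vote_shares(user_opinions):
    """user_opinions: dict user -> list of (candidate, polarity) for one user group."""
    votes = Counter()
    for opinions in user_opinions.values():
        vote = predict_user_vote(opinions)
        if vote is not None:
            votes[vote] += 1
    total = sum(votes.values())
    return {c: n / total for c, n in votes.items()} if total else {}


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual vote shares."""
    return sum(abs(predicted.get(c, 0.0) - share) for c, share in actual.items()) / len(actual)
```

Running `predict_vote_shares` separately on each user group (for example right-leaning versus left-leaning users) and comparing the resulting errors mirrors the group-by-group comparison described on this slide.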
  • 51. Findings • The results reveal the challenge of identifying the opinion of the “silent majority”. • Retweets may not necessarily reflect users' attitudes. • Predicting a user’s vote from more opinion tweets is not necessarily more accurate than predicting it from more information tweets. • The right-leaning user group provides the most accurate prediction result. In the best case (56-day time window), it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1.
  • 52. 4. Religion and Subjective Well-being Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. Proceedings of the 6th International Conference on Social Informatics (SocInfo), 2014. Lu Chen, Ingmar Weber, Adam Okulicz-Kozaryn, and Amit Sheth. Understanding the Effect of Religion on Happiness by Examining the Topic Preferences and Word Usage on Twitter. (in submission to PLOS ONE). How to use Twitter data to measure subjective well-being? How does the religious belief of users (holders) affect their happiness expressed in tweets?
  • 53. Overview Figure: from a user's tweets and network, each user (user ID) is described by a religious belief (e.g., Buddhism), a happiness_level $h_{avg}(user)$, a topic_preference $p(topic \mid user)$, and a word_preference $p(word \mid topic, user)$. Aggregating the measures of individual users yields the group-level measures for each religion (e.g., Buddhism): happiness_level $h_{avg}(group)$, topic_preference $p(topic \mid group)$, and word_preference $p(word \mid topic, group)$. 1. What is the effect of religion on happiness? 2. How do topic preference and word usage affect the happiness expressed by each group?
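The user-to-group aggregation can be illustrated with a short sketch. How the user-level happiness score is actually computed is not stated on this slide, so the example assumes a word-level happiness lexicon (a hypothetical `lexicon` dictionary mapping words to scores): a user's level is the mean score of the lexicon words in their tweets, and a group's level is the mean over its users.

```python
def user_happiness(tweets, lexicon):
    """Mean lexicon score over all scored words in a user's tweets, or None."""
    scores = [
        lexicon[word]
        for tweet in tweets
        for word in tweet.lower().split()
        if word in lexicon
    ]
    return sum(scores) / len(scores) if scores else None


def group_happiness(user_tweets, lexicon):
    """Mean of user-level happiness over users who have at least one scored word.

    user_tweets: dict user -> list of tweet strings for one group.
    """
    levels = [
        h
        for h in (user_happiness(tweets, lexicon) for tweets in user_tweets.values())
        if h is not None
    ]
    return sum(levels) / len(levels) if levels else None
```

Averaging per user first, rather than pooling all words of a group, keeps a handful of very active accounts from dominating the group-level measure.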
  • 54. Contributions • We provide a fresh perspective about happiness and religion, complementing traditional survey-based studies, via analyzing the topics and words naturally disclosed in people's social media messages. • We introduce a framework and methodology that explore the effect of social and demographic factors of a holder (e.g., a holder’s religious belief) on subjective well-being. • Our method also explores potential reasons for the variations in the level of happiness from the holder’s topic preferences and word usage on topics.
  • 55. Findings • There is a significant difference among the seven groups (atheist, Buddhist, Christian, Hindu, Jew, Muslim, and random Twitter users) in the level of happiness (pleasant/unpleasant emotions) expressed in tweets. • Each user group has different topic preferences and different word usage on the same topic. However, differences in word usage are small compared with the differences in topic distributions. • The users' topic preferences strongly correlate with their happiness expressed in tweets.
  • 56. Conclusion • This dissertation presents a unified framework that characterizes a subjective experience, such as sentiment, opinion, or emotion, in terms of an individual holding it, a target eliciting it, a set of expressions describing it, and a classification or assessment measuring it; • it describes new algorithms that automatically identify and extract sentiment expressions and opinion targets from user generated content with minimal human supervision; • it shows how to use social media data to predict election results and investigate religion and subjective well-being, by classifying and assessing subjective information in user generated content.
  • 57. Future Directions 1. Detecting different types of subjectivity in text 2. Beyond sentiment and opinion 3. Towards dynamic modeling of subjective information. A subjective experience is a quintuple $(h, s, e, c, t)$, where t is the time when the subjective experience occurs.
  • 58. Publications • Lu Chen, Justin Martineau, Doreen Cheng and Amit Sheth. Clustering for Simultaneous Extraction of Aspects and Features from Reviews. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2016. (Acceptance rate: 24%) • Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. Proceedings of the 6th International Conference on Social Informatics (SocInfo), 2014. (Acceptance rate: 23%) • Justin Martineau, Lu Chen, Doreen Cheng and Amit Sheth. Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014. (Acceptance rate: 26%) • Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Cursing in English on Twitter. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW) 2014. (Acceptance rate: 27%) • Amit Sheth, Ashutosh Jadhav, Pavan Kapanipathi, Lu Chen, Hemant Purohit, Alan Smith, and Wenbo Wang. Chapter title: Twitris - A System for Collective Social Intelligence. Encyclopedia of Social Network Analysis and Mining, 2014. • D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology Using Social Media. Journal of Biomedical Informatics: Special Issue on Biomedical Information through the Implementation of Social Media Environments. 2013. PMID: 23892295. • Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries. Proceedings of the 4th International Conference on Social Informatics (SocInfo) 2012. (Acceptance rate: 35%) • Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. 
Harnessing Twitter "Big Data" for Automatic Emotion Identification. Proceedings of the 4th ASE/IEEE International Conference on Social Computing (SocialCom), 2012. • Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), 2012. (Acceptance rate: 20%) • Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights (BII), 2012. • R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, and A. Sheth. "I Just Wanted to Tell You That Loperamide WILL WORK": A Web-Based Study of Extra-Medical Use of Loperamide. Journal of Drug and Alcohol Dependence, 2012. • R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Udayanga, L. Chen, A. Sheth. A Web-based Study of Self-treatment of Opioid Withdrawal Symptoms with Loperamide. The College on Problems of Drug Dependence (CPDD), 2012.
  • 59. Media Coverage (1): Washington Post, Washington Times, La Croix, MIT Technology Review, Time
  • 60. Media Coverage (2): Fast Company, RAPPLER, BuzzFeed, The Times of India, Huffington Post
  • 61. Media Coverage (3): IN, Gizmodo, RNS, NDTV, World Religion News
  • 62. Acknowledgement Dissertation Committee: Prof. Amit Sheth (Advisor), Dr. Ingmar Weber (QCRI), Prof. T.K. Prasad, Dr. Justin Martineau (SRA), Prof. Keke Chen. Co-authors and Collaborators: Dr. Shaojun Wang (Computer Science), Dr. Meena Nagarajan (IBM Watson), Prof. Adam Okulicz-Kozaryn (Rutgers-Camden), Dr. Wenbo Wang (GoDaddy), Dr. Doreen Cheng (SRA), Prof. Raminta Daniulaityte, Dr. Delroy Cameron (Apple), Dr. Ming Tan (IBM Watson), Prof. Valerie Shalin
  • 64. Acknowledgement This dissertation is based upon work supported by the National Science Foundation under Grant: • IIS-1111182 “SoCS: Collaborative Research: Social Media Enhanced Organizational Sensemaking in Emergency Response” and • CNS-1513721 “Context-Aware Harassment Detection on Social Media.”

Editor's Notes

  1. Note that the precision may be worse than the true quality obtainable using a larger corpus, since the gold standards are generated from a subset of tweets.