Invited talk given at the Observational Studies Through Social Media workshop (OSSM, https://www.microsoft.com/en-us/research/event/ossm17/) at ICWSM'17. Includes some of my own work but mostly other people's.
Studies covered:
"Detecting Emotional Contagion in Massive Social Networks", http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090315
"Exercise contagion in a global social network", https://www.nature.com/articles/ncomms14753
"How Community Feedback Shapes User Behavior", https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8066
"A Warm Welcome Matters!: The Link Between Social Feedback and Weight Loss in /r/loseit", http://dl.acm.org/citation.cfm?id=3055131
See https://ingmarweber.de/publications/ for my own work.
Matching Methods and Natural Experiments - Examples of Causal Inference from Social Media
1. Matching Methods and Natural Experiments
Examples of Causal Inference from Social Media
Ingmar Weber
@ingmarweber
Given at OSSM Workshop at ICWSM’17
https://www.microsoft.com/en-us/research/event/ossm17/
Featuring joint work with Tiago Cunha, Gisele Pappa, Yelena Mejova and Sofiane Abbar
Slides partly based on joint WWW’16 tutorial with M. Strohmaier, C. Wagner and L. Aiello
4. Ideal Solution: Run an Experiment
Gold Standard Method for causal inference
Randomly assign subjects to either treatment or control group
Assume both groups large enough to wash out differences in covariates
[More on best assignment strategies later]
If done correctly, no need for fancy analysis.
Average(treatment group) - Average(control group)
5. Limitations of Experiments
• Expensive
• Not all treatments are possible and ethical
• Internal validity is high, but external validity (generalization) is often limited
• Non-interference assumption is often violated in social science field experiments, i.e. i's treatment affects j's outcome
6. Alternative Solution:
Causal Inference from Observational Data
Approach 1: Natural Experiments
Has nature already done the work for us?
Is there a (partly) random observed assignment?
Approach 2: Matching Methods
Try to find pairs of as-similar-as-possible participants.
One happened to get treated, the other not
(Approach 1 = Part 1 of the talk, Approach 2 = Part 2)
8. 1854 Cholera Outbreak in London
Assumption at time: cholera is air-borne disease
John Snow’s hypothesis: cholera is water-borne disease
Famous use of mapping
Water Company            # houses   Deaths/10k
Southwark and Vauxhall   40,046     315
Lambeth                  26,107     37
Assignment to water company as-if random
• Could differ from one house to next
• Tenants don’t know their company
Lambeth’s inlet upstream = clean water
S&V’s inlet downstream = contaminated water
First (?) use of natural experiment!
There was one significant anomaly –
none of the workers in the nearby
Broad Street brewery contracted
cholera. They were given a daily
allowance of beer, and did not
consume water from the nearby well.
The water used in the brewing process
is boiled during mashing which kills
cholera bacteria. [Wikipedia]
9. Natural Experiments with Instrumental Variables
Study by Angrist 1990:
What is the effect of military service (M) on lifetime
earnings (E)?
+: Improve self discipline? Get external recognition?
Join network of “alumni”? – Could increase earnings.
-: Lose actual job experience? Become traumatized?
Lose touch with society? – Could decrease earnings.
Why not just compute:
Exp. earnings for military joiners – exp. earnings for others
E [ E | M=1] - E [ E | M=0 ]
10. Limits of Linear Regression
Assume the structural equation:
E = α + β * M + ε
Error term ε stands for all exogenous factors that affect E when M is held constant
Crucial assumption: Cov(M,ε) = 0 (not correlated)
Actual model:
[Diagram: P = earning potential (unobserved), M = military service, E = lifetime earnings; P influences both M and E, and M influences E]
11. Limits of Linear Regression
(P)otential   (M)ilitary service   (E)arnings   count
High = 1      Yes = 1              40,000       20
High = 1      No = 0               30,000       80
Low = 0       Yes = 1              20,000       80
Low = 0       No = 0               10,000       20
E [ E | M=1] = (20*40k + 80*20k)/100 = 24k
E [ E | M=0 ] = (80*30k + 20*10k)/100 = 26k
Fitted linear regression: E = 26k - 2k * M + ε
At this stage you have enforced Cov(M,ε) = 0
True structural equation:
E = 10k + 10k*M + 20k*P
Example of
Simpson’s Paradox
Wrong assumption => biased estimator
Cov(M,P) = -0.15, r = -0.6
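The numbers on this slide can be re-derived from the four-row table. The following is a minimal sketch in plain Python; the function and variable names are illustrative and not part of the original talk. It computes the naive difference in means, the difference within each level of P, and Cov(M,P).

```python
# Toy population from the table above: (P, M, E, count).
ROWS = [
    (1, 1, 40_000, 20),
    (1, 0, 30_000, 80),
    (0, 1, 20_000, 80),
    (0, 0, 10_000, 20),
]


def mean_earnings(rows, m, p=None):
    """Count-weighted mean of E for service status m (optionally within potential p)."""
    selected = [(e, n) for (pp, mm, e, n) in rows
                if mm == m and (p is None or pp == p)]
    total = sum(n for _, n in selected)
    return sum(e * n for e, n in selected) / total


def covariance_m_p(rows):
    """Count-weighted covariance between M and P."""
    total = sum(n for *_, n in rows)
    mean_m = sum(m * n for _, m, _, n in rows) / total
    mean_p = sum(p * n for p, _, _, n in rows) / total
    return sum((m - mean_m) * (p - mean_p) * n for p, m, _, n in rows) / total


# Naive comparison E[E | M=1] - E[E | M=0], ignoring P.
naive_effect = mean_earnings(ROWS, 1) - mean_earnings(ROWS, 0)
# Same comparison, but separately for low (0) and high (1) potential.
effect_within_p = {p: mean_earnings(ROWS, 1, p) - mean_earnings(ROWS, 0, p)
                   for p in (0, 1)}
cov_mp = covariance_m_p(ROWS)

print(naive_effect)
print(effect_within_p)
print(cov_mp)
```

The naive difference has the opposite sign of the within-group differences, which is the Simpson's Paradox pattern named on the slide.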
12. Instrumental Variable
The instrumental variable should be strongly correlated with the included endogenous regressor (L <-> M), but should affect the outcome variable only through that regressor, not directly (no direct L <-> E link)
[Diagram: L -> M -> E; P influences both M and E]
L is the draft-lottery
(can be 0 or 1)
L is an instrument for
the causal effect of M
on E
Angrist, Joshua D. (1990). "Lifetime Earnings and the Vietnam Draft Lottery: Evidence from Social
Security Administrative Records". American Economic Review 80 (3)
[Diagram labels: E = lifetime earnings, P = earning potential, M = military service, L = draft lottery]
13. Instrumental Variable
Observe:
E[ E | L=1 ] - E[ E | L=0 ] = -$2,000/year
Done?
Winning the lottery != Doing Military Service
Let’s call the military service “treatment”
Winning the lottery = assigned-to-treat
a% of population = Always-Treats, E_A is unaffected
n% of population = Never-Treats, E_N is unaffected
c% of population = Compliers, E_C^T vs. E_C^C
a + n + c = 100
Goal: Estimate E_C^T vs. E_C^C
14. Instrumental Variable
E[ E | L=1 ] = a*E_A + n*E_N + c*E_C^T
E[ E | L=0 ] = a*E_A + n*E_N + c*E_C^C
E_C^T - E_C^C = (E[ E | L=1 ] - E[ E | L=0 ]) / c
How to compute c?
# (M = 1 & L =1 ) = (c + a) * # (L=1)
# (M = 1 & L =0 ) = a * # (L = 0)
c = P( M = 1 | L = 1) – P(M = 1 | L = 0)
15. Instrumental Variable
“Wald Estimator” if L is binary: (E[E|L=1] - E[E|L=0]) / (E[M|L=1] - E[M|L=0])
Non-Binary: Cov(E,L)/Cov(M,L)
Plausibility check: what if L is perfect instrument?
Example: winning the lottery => 90% chance of joining military
& 10% joining without invitation
& $2,200 per year difference
$2,200 / 0.8 = $2,750 (what if 0.00001?)
Angrist found that military service decreases earnings by about $2,741 per year
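The Wald estimator for a binary instrument can be sketched in a few lines of plain Python. The toy records below are hypothetical and built only to mirror the plausibility check on this slide (90% vs. 10% service rates, a $2,200 gap between lottery groups); they are not Angrist's data, and the function name is illustrative.

```python
def wald_estimate(records):
    """Wald estimator for binary instrument L.

    records: list of (L, M, E) tuples with L, M in {0, 1}.
    Returns (E[E|L=1] - E[E|L=0]) / (E[M|L=1] - E[M|L=0]).
    """
    def mean(values):
        return sum(values) / len(values)

    e1 = mean([e for l, m, e in records if l == 1])
    e0 = mean([e for l, m, e in records if l == 0])
    m1 = mean([m for l, m, e in records if l == 1])
    m0 = mean([m for l, m, e in records if l == 0])
    return (e1 - e0) / (m1 - m0)


# Hypothetical toy population: baseline earnings 30,000 per year,
# and serving lowers earnings by 2,750 per year for everyone.
BASELINE, EFFECT = 30_000, -2_750
records = (
    [(1, 1, BASELINE + EFFECT)] * 9 + [(1, 0, BASELINE)] * 1    # drafted: 90% serve
    + [(0, 1, BASELINE + EFFECT)] * 1 + [(0, 0, BASELINE)] * 9  # not drafted: 10% serve
)

estimate = wald_estimate(records)
print(estimate)
```

The denominator is the share of compliers c. When it approaches zero (a weak instrument), small noise in the numerator is blown up, which is the "what if 0.00001?" question on the slide.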
16. Instrumental Variable – Multi-Variate
δ is a consistent (= asymptotically unbiased) estimator of the causal effect of M on E
β and L are vectors
17. Example 1: Emotional Contagion
• Homophily?
• Friends are friends because their “emotions are in sync”
• Both friends hate Mondays, like Friday evenings, …
• Common exposure?
• Friends read the same news, watch the same shows
• Happy/sad because of common external factors
• Social influence?
• Seeing your friend happy makes you happy
• Your friends force you to smile back
[Diagram: Friends’ expression -> User’s expression]
18. Example 1: Emotional Contagion
No manipulation of user experience!
[Diagram: External variable -> Friends’ expression -> User’s expression]
• Social influence? Homophily? Common exposure?
• Use meteorological data as an instrument
19. Emotion on Facebook
• Classify semantic content of status updates using LIWC
• Emotion: fraction of posts with positive/negative words
Coviello et al., PLoS ONE 2014, “Detecting Emotional Contagion in Massive Social Networks”
Slides provided by Lorenzo Coviello. Thanks! Later partially modified.
20. Individual-Level Model
y_jt = user j’s happiness at time t, fraction of posts with positive/negative words
j = user whose emotion we’re predicting
i = a friend of user j
t = time window of interest
Θ_t = time-related fixed effect (there are “happy times”)
f_j = user-related fixed effect (there are “happy users”)
δ_jt = degree of user j at time t (friends come and go)
a_ijt = strength of relationship at time t between i and j
[Equation annotation: cumulative effect of a user on their friends]
Computationally demanding. One observation per (user, time) pair
21. City-Level Model
g = city whose aggregate emotion we’re predicting
Θ_t = time-related fixed effect (there are “happy times”)
f_g = city-related fixed effect (there are “happy cities”)
n_g = number of users in city g
Y_gt = average emotional influence on an individual in city g
[Equation annotations: average emotion in city g at time t; average strength of relationship between i and an individual in city g]
23. Exclusion Restriction
Instrument z (friends’ rain) should not affect users’ emotion when friends’ emotion is held constant
[Diagram: Friends’ rain z -> Friends’ emotion -> Users’ emotion]
But weather in g and in friends’ cities could be correlated.
=> Weather directly influences emotions in g!
Break correlation between friends’ rain and user’s rain
– Restrict data to (city,day) WITHOUT rain
– Restrict data to (city,day) WITH rain
28. Open Issues
What are we measuring?
– People complain about rain on Twitter, ok. Does that mean they are
unhappy?
Just conversational dynamics? Topical contagion?
– User A: All the rain is making me depressed.
– User B: Poor guys who have to suffer in the rain.
Hidden weather variables
“it is overcast or not” might correlate between friends
even when fixing rainfall
29. Example 2: Physical Activity Social Networks
Aral & Nicolaides, Nature Comm. 2017, “Exercise Contagion in a Global Social Network”
32. Data
Some undisclosed physical activity social network
5 years of data
1.1M users
359M km run
2.1M geographically located ties with weather
-> very sparse network!
33. How to define “good weather”?
No rain? Less rain than usual?
Not cold? Not too hot? “Nicer” than usual?
Compute percentiles for city-specific
precipitation and temperature
Use LASSO to select predictive features
i.e. feature-select the instruments to use
Predictive of friends’ activity
[Equation annotations: average activity of i’s friends at time t; weather influence; remaining terms to be explained later]
34. Exclusion restriction
Problem: friends and user have similar weather
– Then we’re including “common exposure”
Solution: only keep uncorrelated city pairs
cutoff ρ < .025
35. How to define your (friends’) activity?
Distance run? Time run? Pace? Cals burned?
– Try them all. Separately.
How to aggregate your friends’ activity?
– They use average of shared runs
– Could try lots of other alternatives (but very sparse)
[Equation annotations: user i’s friends’ physical activity at time t; degree of user i at time t; link matrix at time t; sum over all users j]
36. The Details
A_it = activity of individual i on day t
A^p_it = a…
ν_t = time fixed effects (holidays, marathon days, …)
η_i = user fixed effects (personal habits, motivation, …)
ω_it = exogenous factors, i.e. weather
X_it = time varying characteristics, e.g. degree
X^p_it = time varying/independent factors of peers, e.g. age, country
Baseline: estimate beta using ordinary least squares regression on this model
Better: two stage least squares regression
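To illustrate why two stage least squares is preferred over the OLS baseline, here is a minimal sketch on simulated data. It is a heavily simplified stand-in for the model on this slide: one instrument, one regressor, no fixed effects and no covariates. All names, coefficients and the data-generating process are assumptions made for illustration and are not taken from the paper.

```python
import random


def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, regressor, outcome):
    """Point estimate of the effect of regressor on outcome via 2SLS."""
    # Stage 1: predict the endogenous regressor from the instrument.
    a0, a1 = ols(instrument, regressor)
    fitted = [a0 + a1 * z for z in instrument]
    # Stage 2: regress the outcome on the stage-1 fitted values.
    _, beta = ols(fitted, outcome)
    return beta


rng = random.Random(0)
TRUE_BETA = 0.3  # assumed true peer effect in this simulation
peer_weather, peer_activity, own_activity = [], [], []
for _ in range(20_000):
    shock = rng.gauss(0, 1)    # unobserved common cause (homophily, shared events)
    weather = rng.gauss(0, 1)  # instrument: affects peers only
    peers = 0.8 * weather + shock + rng.gauss(0, 1)
    own = TRUE_BETA * peers + 2.0 * shock + rng.gauss(0, 1)
    peer_weather.append(weather)
    peer_activity.append(peers)
    own_activity.append(own)

beta_ols = ols(peer_activity, own_activity)[1]
beta_2sls = two_stage_least_squares(peer_weather, peer_activity, own_activity)
print("OLS: ", round(beta_ols, 3))
print("2SLS:", round(beta_2sls, 3))
```

In this simulation OLS absorbs the unobserved shock and overstates the peer effect, while 2SLS lands close to the assumed value. Note that the sketch only returns a point estimate; standard errors taken from a hand-rolled second stage would be wrong and need the usual 2SLS correction.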
39. Robustness Tests Performed
Ensure that the instrument is “strong”
– Cragg-Donald Wald F statistic (Stock & Yogo, 2005)
Exogeneity
– Check (remaining) friends’ weather not predictive
Alternative instrument
– Use a good/bad weather binary setting
Falsification tests
– Friends’ future activity and weather have no influence
– Shuffle network to create “false friends”
40. Example 3: Weather and Icecream Contagion
Y. Mejova, S. Abbar, I. Weber, under construction …
#icecream on Instagram
42. Compute 7-day running average
Binarize: weather is good if > running average
Binarize friends’ activity: true if at least one friend posts on that day
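The two binarization steps can be sketched as follows. The slide does not say which weather variable is averaged or how the window is aligned, so this sketch assumes a trailing window over the previous 7 days (excluding the current day) applied to a daily series such as TMAX; the function names and toy values are hypothetical.

```python
def good_weather_flags(daily_values, window=7):
    """Flag a day as 'good' if its value exceeds the trailing running average.

    Assumption: the running average covers up to `window` previous days and
    excludes the current day. The first day has no history and gets None.
    """
    flags = []
    for day, value in enumerate(daily_values):
        history = daily_values[max(0, day - window):day]
        if not history:
            flags.append(None)
            continue
        flags.append(value > sum(history) / len(history))
    return flags


def any_friend_posted(post_days_by_user, friends, day):
    """True if at least one of the given friends posted on that day."""
    return any(day in post_days_by_user.get(friend, set()) for friend in friends)


# Toy daily maximum temperatures for one city.
temperatures = [20, 20, 20, 25, 18]
flags = good_weather_flags(temperatures)

# Toy posting days (day indices) per user.
post_days = {"alice": {0, 3}, "bob": {3}}
friend_active_day3 = any_friend_posted(post_days, ["alice", "bob"], 3)
friend_active_day1 = any_friend_posted(post_days, ["alice", "bob"], 1)
print(flags, friend_active_day3, friend_active_day1)
```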
WT03 - Thunder
WESF - Water equivalent of snowfall (tenths of mm)
WT04 - Ice pellets, sleet, snow pellets, or small hail
PRCP - Precipitation (tenths of mm)
WT05 - Hail (may include small hail)
WT06 - Glaze or rime
WT07 - Dust, volcanic ash, blowing dust, blowing sand, or blowing obstruction
WT08 - Smoke or haze
SNWD - Snow depth (mm)
WT09 - Blowing or drifting snow
WT10 - Tornado, waterspout, or funnel cloud
WT11 - High or damaging winds
TMAX - Maximum temperature (tenths of degrees C)
WT13 - Mist
SNOW - Snowfall (mm)
WT14 - Drizzle
WT15 - Freezing drizzle
WT16 - Rain (may include freezing rain, drizzle, and freezing drizzle)
TOBS - Temperature at the time of observation (tenths of degrees C)
WT17 - Freezing rain
WT18 - Snow, snow pellets, snow grains, or ice crystals
WT19 - Unknown source of precipitation
AWND - Average daily wind speed (tenths of meters per second)
WT21 - Ground fog
WT22 - Ice fog or freezing fog
WT01 - Fog, ice fog, or freezing fog (may include heavy fog)
WESD - Water equivalent of snow on the ground (tenths of mm)
WT02 - Heavy fog or heavy freezing fog (not always distinguished from fog)
PSUN - Daily percent of possible sunshine (percent)
TMIN - Minimum temperature (tenths of degrees C)
TSUN - Daily total sunshine (minutes)
45. Natural Experiments Summary
Natural experiments can be powerful alternatives to experiments
Find randomized variables that are highly correlated with your regressor but affect your outcome only through that regressor
Carefully think of violations of the exclusion restriction
Perform robustness checks and falsification tests
48. Matching Methods
Among given “organic” data (e.g. human trace data), can we find a subset that looks as if it were generated by an experiment?
matching == pruning
49. Does Special Training Help Job Promotion?
[Figure: position (outcome) plotted against education in years (1-dimensional covariate); treated units received special training]
Ho, Daniel, Kosuke Imai, Gary King, and Elizabeth Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15: 199–236. Copy at http://j.mp/jPupwz
Gary King, “Why Propensity Scores Should Not Be Used for Matching”, Methods Colloquium, 2015, https://www.youtube.com/watch?v=rBv39pK1iEs
51. Quadratic Regression
[Figure axes: position (p) against education (e)]
Model Dependence
Too much freedom given to analyst.
Reason: Imbalance of covariates
Correcting for education, the treated group has lower positions.
p = c + β_1*e + β_2*e^2 + γ*is_treated
53. Matching Approximates Randomized Experiment
Completely randomized:
Flip a coin for each patient. Heads -> “T”, tails -> “C”.
Could get unlucky: all men assigned “T”
Fully blocked experiment:
First pair up similar patients, same gender, age, …
Then flip a coin for each pair. One gets “T”, one “C”.
Balances the known covariates.
Both balance unknown covariates.
Fully blocked experiment dominates complete randomization!
54. Distance Matching
Approximates fully blocked experiment
Many Variations:
Optimal match, greedy match,
match 1:1 or 1:many, and so on
Prune bad matches with distance > threshold (“caliper”)
[Figure axes: age, education]
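One of the many variations listed above, greedy 1:1 matching without replacement and with a caliper, can be sketched as follows. The units are hypothetical (age, years of education) pairs, and plain Euclidean distance is used only to keep the example short; it is not meant as a recommendation for covariates on different scales.

```python
import math


def greedy_match(treated, control, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: lists of covariate tuples.
    Returns (treated_index, control_index) pairs; treated units whose
    nearest remaining control is farther away than `caliper` are pruned.
    """
    available = dict(enumerate(control))
    pairs = []
    for t_idx, t in enumerate(treated):
        if not available:
            break
        # Nearest control unit that has not been used yet.
        c_idx = min(available, key=lambda i: math.dist(t, available[i]))
        if math.dist(t, available[c_idx]) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]
    return pairs


# Hypothetical units: (age, years of education).
treated_units = [(30, 12), (45, 16), (60, 10)]
control_units = [(31, 12), (44, 16), (25, 20)]

pairs = greedy_match(treated_units, control_units, caliper=2.0)
print(pairs)
```

The third treated unit has no control within the caliper and is dropped, which is the pruning step. Greedy matching depends on the order of the treated units; an optimal match would minimize the total distance instead.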
55. Mahalanobis Distance
Euclidean distance doesn’t make sense when different
dimensions are on different scales.
(yearly income, age, gender, body weight, …)
Distance dominated by largest values
Conceptual fix: first rescale each dimension to N(0,1)
Ok, but maybe want to correct for collinearity
In practice: could use “expert scaling” and Euclidean distance.
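A minimal sketch of the Mahalanobis distance for two covariates, written in plain Python with the 2x2 covariance matrix inverted by hand. The toy points (yearly income, age) are hypothetical and chosen so that the two covariates are uncorrelated, which makes the effect of rescaling easy to see.

```python
def covariance_2d(points):
    """Sample covariance entries (s_xx, s_xy, s_yy) of 2-d points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return s_xx, s_xy, s_yy


def mahalanobis_2d(a, b, cov):
    """sqrt((a-b)^T S^-1 (a-b)) for a 2x2 covariance matrix S."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Inverse of [[s_xx, s_xy], [s_xy, s_yy]] is [[s_yy, -s_xy], [-s_xy, s_xx]] / det.
    squared = (s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
    return squared ** 0.5


# Hypothetical units: (yearly income, age).
units = [(20_000, 30), (40_000, 30), (20_000, 50), (40_000, 50)]
cov = covariance_2d(units)

d_income = mahalanobis_2d(units[0], units[1], cov)  # differ only in income
d_age = mahalanobis_2d(units[0], units[2], cov)     # differ only in age
print(d_income, d_age)
```

In Euclidean terms the first pair is 20,000 apart and the second only 20 apart; under the Mahalanobis distance both steps count the same. The s_xy term is what additionally corrects for collinearity when the covariates are correlated.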
56. Example 1: Comment Quality on the Internet
Some sites no longer have comment sections
Some sites still have them
But many no longer show downvotes
57. What made “comments go sour”?
Is there an effect of the votes received on a comment?
Re “operant conditioning” (punishment & reward)
“I believe that restricting immigration of highly
qualified people could hurt our economy.”
“Trump is a sh*thead.”
“According to a 2015 scientific study [reference]
However, [user1] makes a valid point that …”
79 up-votes, 1 down-vote
2 up-votes, 100 down-votes
Cheng, Danescu-Niculescu-Mizil, Leskovec. ICWSM’14.
“How Community Feedback Shapes User Behavior”
58. Approach: Match Similar Posts
1. Automatically quantify a post’s quality
2. Match pairs of posts of similar quality
One receives positive feedback
One negative feedback
Q: What happens next?
60. Quantifying a Post’s Quality
1st: Measure community feedback
– P? -N? P-N? P/(P+N)?
– Ask AMT workers to rate feedback received
2nd: Build a text-only model to predict q
– R^2 = .22 (using a separate test set)
– Compared to AMT labels q’: q R^2 = .25, p R^2 = .12
p=
61. Match on Predicted Quality q
|q(a_0) - q(b_0)| ≤ 10^-4
Also: # words, # past posts, % positive votes
for CNN
63. Changes on Activity
Negative feedback accelerates commenting rate
Negative feedback keeps users for longer
Negative feedback leads to retaliation
But … matching seems to be imperfect.
Rate of giving positive feedback not balanced!
Hints at unbalanced latent factor
A perfect storm of a downward spiral!
64. Example 2: Social Feedback & Weight Loss
Cunha, Weber, Pappa. WWW’17 WebSci Track. “A Warm Welcome Matters!
The Link Between Social Feedback and Weight Loss in /r/loseit”
65. Support for Newcomers
So I've been working on losing weight since December, but since June I've been in a rut :(
3 points 0 comments submitted 4 years ago by moonyDP to r/loseit
Okay, so I was diagnosed back in December with GERD, and my doctor told me it would help to lose weight. I'm 5'8" and, at the time, was around 175-180. …
I'm 23 and weigh 550lbs. Please help
455 points 204 comments submitted 2 months ago by Ecurtis93 (6'5" 550Lbs Male) to r/loseit
Starting weight: 550lbs Goal Weight: 250lbs
Just to tell you a little about myself; I'm 23 years old 6'5" and sadly weigh 560lbs. I work at a call center, sitting in a desk for 10 hours a day. …
66. Data Collection
5 years of data (August 2010 to October 2014)
107,886 unique users
70,949 posts and 922,245 comments
Metadata (timestamp, user name, voting score and history of badges)
67. Define Treatment and Control
Look at first post of a user in the community
Treatment = received comments
Sparsity: 96% of posts received a comment
Re-Define:
–Treatment = received at least 4 comments
–4,657 treatment and 1,468 control
68. Covariates Choice
Matching only balances matched variables
– Important choice of what to match on
Build LASSO regularized model to predict receiving “treatment”
Use LDA topics, LIWC, question words, post size, sentiment
– 98 variables in total
Final model: 20 variables (selected by LASSO)
Use coefficient values as covariate weights
69. Prune by Matching
Use cosine similarity for matching
– Weighted by LASSO coefficients
Use 1-to-Many matching
– To avoid throwing out data
Use a caliper to only keep “similar enough” matches
– Extreme case: exact match
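A weighted cosine similarity can be sketched as below. How exactly the LASSO coefficients enter the similarity is not stated on the slide; this sketch assumes each covariate's contribution is scaled by a non-negative weight (for example the absolute coefficient value). The function name and toy vectors are hypothetical.

```python
import math


def weighted_cosine(u, v, weights):
    """Cosine similarity under a weighted inner product.

    Assumption: weights are non-negative, e.g. |LASSO coefficient| per covariate.
    Vectors with zero weighted norm are not handled in this sketch.
    """
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    return dot / (norm_u * norm_v)


# Toy covariate vectors for two posts.
post_a = (1.0, 5.0)
post_b = (2.0, -3.0)

# With weight 0 on the second covariate, only the first one matters.
sim_first_only = weighted_cosine(post_a, post_b, weights=(1.0, 0.0))
# Orthogonal vectors under equal weights.
sim_orthogonal = weighted_cosine((1.0, 0.0), (0.0, 1.0), weights=(1.0, 1.0))
print(sim_first_only, sim_orthogonal)
```

A caliper then becomes a minimum similarity: pairs below the threshold are pruned, and a threshold of 1 corresponds to the exact-match extreme case.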
70. Balance Check
Compute standardized mean difference d_c for each covariate c
Small d_c = similar values of c in treatment and control group
Remaining bias for variable c is considered to be insignificant if d_c is smaller than 0.1
Note: don’t use a significance test! Else “too little data => no significant difference”
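The balance check can be sketched with the commonly used definition of the standardized mean difference: the difference in group means divided by the pooled standard deviation. Whether the study used exactly this variant is an assumption; the toy values are hypothetical.

```python
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


def is_balanced(treated, control, threshold=0.1):
    """Balance rule from the slide: |d_c| below 0.1 counts as balanced."""
    return abs(standardized_mean_difference(treated, control)) < threshold


# Toy values of one covariate c in the two groups.
d_unbalanced = standardized_mean_difference([1, 2, 3], [2, 3, 4])
balanced = is_balanced([10, 12, 14], [10.1, 12.1, 14.1])
print(d_unbalanced, balanced)
```

Unlike a p-value, d_c does not shrink toward "no difference" just because the sample is small, which is the point of the note above.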
71. Estimate Effect Size
Effect on return rate
25,647 users present in Group 1. 18,000 treatment and 7,647 control.
[Figure panels: balance check; effect size]
72. Estimate Effect Size
Effect on weight loss
6,143 users present in Group 2. 4,657 treatment and 1,468 control.
26%, or an absolute mean difference of 9 lbs.
[Figure panels: balance check; effect size]
73. Mediation Analysis
Used a Sobel Test to check for mediation
No statistically significant mediation effect found
[Diagram: Social Feedback -> Engagement in Community -> Weight Loss, with a direct path Social Feedback -> Weight Loss]
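The Sobel test statistic is computed from two regression coefficients and their standard errors: a (treatment -> mediator) and b (mediator -> outcome, controlling for treatment). A minimal sketch follows; the coefficient values are hypothetical placeholders, not results from the study.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided p-value for the indirect effect a*b.

    a, se_a: coefficient and standard error for treatment -> mediator.
    b, se_b: coefficient and standard error for mediator -> outcome.
    """
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical coefficients with a clear indirect effect ...
z_strong, p_strong = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
# ... and with a weak first path (feedback barely moves engagement).
z_weak, p_weak = sobel_test(a=0.05, se_a=0.1, b=0.4, se_b=0.1)
print(round(z_strong, 3), p_strong < 0.05)
print(round(z_weak, 3), p_weak < 0.05)
```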
76. Limitations
Using badges to track weight loss
–What if they don’t update badges?
Determining the start of weight loss journey
–What if lost weight before first post?
Our choice of covariates
–Can only correct for known covariates
Observability of returning users
–No return does not equal no weight loss
77. Matching Methods Summary
• Matching methods help to approximate causality
• Problems
– Researchers have lots of freedom on how to match
– Most matching methods have been developed for a low number of covariates
– Worst case: random pruning increases imbalance, which increases bias and model dependence
• Test for balance of observed covariates
• Compare results from different matching methods, different dimensionality reduction methods, different models
– Avoid model dependence and method dependence!
78. There is No Magic Bullet
https://twitter.com/johnmyleswhite/status/854419974995050496