In order to study the effects of Online Social Network (OSN) activity on real-world offline events, researchers need access to OSN data, the reliability of which has particular implications for social network analysis. This relates not only to the completeness of any collected dataset, but also to constructing meaningful social and information networks from them. In this multidisciplinary study, we consider the question of constructing traditional social networks from OSN data and then present a measurement case study showing how the reliability of OSN data affects social network analyses. To this end we developed a systematic comparison methodology, which we applied to two parallel datasets we collected from Twitter. We found considerable differences in datasets collected with different tools and that these variations significantly alter the results of subsequent analyses. Our results lead to a set of guidelines for researchers planning to collect online data streams to infer social networks.
Presented at ASONAM'20 (2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining)
Co-authors: Mehwish Nasim (Data61 / CSIRO), Lewis Mitchell (University of Adelaide), Lucia Falzon (University of Melbourne / DST Group)
A method to evaluate the reliability of social media data for social network analysis
1. ASONAM 2020 OFFICIAL
Derek Weber1,2, Mehwish Nasim3-6, Lewis Mitchell5,6 and Lucia Falzon2,7
Contact: derek.weber@adelaide.edu.au
1 School of Computer Science, The University of Adelaide, Australia.
2 Defence Science and Technology Group, Department of Defence, Australia.
3 Data61, CSIRO, Adelaide, Australia.
4 Cyber Security Cooperative Research Centre, Adelaide, Australia.
5 ARC Centre for Excellence for Mathematical and Statistical Frontiers (ACEMS), Adelaide, Australia.
6 School of Mathematical Sciences, The University of Adelaide, Australia.
7 School of Psychological Sciences, The University of Melbourne, Australia.
A METHOD TO EVALUATE THE RELIABILITY OF SOCIAL MEDIA DATA FOR SOCIAL NETWORK ANALYSIS
2. ASONAM 2020 OFFICIAL
Moreno’s sociogram of a year 2 class
Building Social Networks
• To build a social network…
• What’s the research question?
• What’s a node?
• What’s an edge?
• Where’s the boundary?
• How will the data be collected?
• Traditional example
• Interviewing kids in a school regarding their friends
• Social media?
http://www.martingrandjean.ch/social-network-analysis-visualization-morenos-sociograms-revisited/
3. ASONAM 2020 OFFICIAL
Using Social Media Data
What’s a node? An account
What’s an edge? Friends & followers?
• Cheap, often stale, platform-specific meaning, dense networks, boundary questions, bot followers
Open research questions
• What makes a relationship?
• Does SNA make sense?
• What to use as evidence? Where’s the boundary?
Our approach: interaction networks
• Active connections
• Degree of activity
• Direction of flow
Follower network in Gephi, coloured by modularity
https://www.flickr.com/photos/psychemedia/5821218782/in/photostream/
4. ASONAM 2020 OFFICIAL
Social Media Interactions
Social Media Behaviour → Potential Social Networks
• Mention
• Retweet
• Reply
5. ASONAM 2020 OFFICIAL
Network Varieties
• Mentions (e.g., @username; cf. Instagram tags)
• Replies (cf. Facebook or Reddit comments)
• Retweets (cf. Facebook shares, Tumblr reposts)
Five minutes of #qanda (ABC’s Q&A) Twitter in 2018
Diagrams were produced in Visone, available at https://visone.info
6. ASONAM 2020 OFFICIAL
Collection Methods
Input: Keywords, Usernames, Timeframes
Method: Filter live stream, Retrieval via search, Snowball strategy
Output: Post corpus, Egonet
7. ASONAM 2020 OFFICIAL
Conducting a Collection
Boundary
• Filter terms:
• Hashtags
• Search terms
• Usernames
• Timing:
• Start time / date
• End time / date
• Geo:
• Bounding box
[Timeline diagram: posts collected within the boundary criteria over time]
8. ASONAM 2020 OFFICIAL
Potential Pitfalls
• What have we got? What are we missing?
• Repeatability?
• Null hypothesis
• Plan:
• Conduct simultaneous collections…
• with different tools…
• but same boundary criteria.
• Compare the results.
Collecting social media data from the same platform, at the same time, using the same boundary criteria, regardless of the tool used, should result in identical datasets.
10. ASONAM 2020 OFFICIAL
Experiment Collection
Datasets
• ABC’s Q&A
• 2018 episode
• Terms: “qanda”, …*
• AFL
• March & April 2019
• Term: “afl”
• Ethics
• University of Adelaide HREC
Tools
• RAPID
• University of Melbourne / DST
• Collection & data analytics
• Twitter & Reddit
• Topic tracking
• Twarc
• Thin open-source Python wrapper over the Twitter API
RAPID: K. Lim, S. Jayasekara, S. Karunasekera, A. Harwood, L. Falzon, J. Dunn, & G. Burgess, RAPID: Real-time Analytics
Platform for Interactive Data Mining, in: ECML/PKDD (3), volume 11053 of LNCS, Springer, 2018, pp. 649–653.
Twarc: https://github.com/DocNow/twarc
* Two terms hidden to adhere to ethics protocols.
Image sources: www.abc.net.au, Wikipedia.org
16. ASONAM 2020 OFFICIAL
A Weekend of AFL
Collection  Tool   Duration  When        Posts   Users
AFL1        RAPID  72 hrs    March 2019  21,799  11,573
            Twarc  72 hrs    March 2019  44,470  16,821

(English only: Twarc 22,962; RAPID 20,431)
18. ASONAM 2020 OFFICIAL
What’s happening?
• Twarc
• Thin layer – matches anything with “afl”
• RAPID
• Grab any match, then…
• Check that the text-based fields contain the desired string
• E.g., ‘text’, ‘user.screen_name’, ‘user.description’
19. ASONAM 2020 OFFICIAL
Lessons
• Null hypothesis rejected
• Collection variations affect analyses
• Know what your tool does
• Be aware of noise
• Be aware of language clashes
• Choose filter terms carefully
• Have a guiding research question
WARNING: Your tools may affect results & business decisions
20. ASONAM 2020 OFFICIAL
QUESTIONS?
Derek Weber1,2, Mehwish Nasim3-6, Lewis Mitchell5,6 and Lucia Falzon2,7
21. ASONAM 2020 OFFICIAL
Q&A Content
[Hashtag co-mention network diagrams: Part 1: RAPID and Part 1: Twarc]
Noisy for SNA; OK for trend analysis.
* Some hashtags hidden to adhere to ethics protocols.
22. ASONAM 2020 OFFICIAL
AFL1 Content
[Hashtag co-mention network diagrams: Part 1: RAPID and Part 1: Twarc]
Noisy for SNA; OK for trend analysis.
* Some hashtags hidden to adhere to ethics protocols.
23. ASONAM 2020 OFFICIAL
Dataset Statistics
Collection          Tool   Duration (hrs)  When        Posts   Reposts  Users
Q&A Part 1          RAPID  4               2018        15,930  8,744    4,970
                    Twarc  4               2018        27,389  14,191   7,057
Q&A Part 2          RAPID  15              2018        11,719  8,051    4,708
                    Twarc  15              2018        15,490  10,988   5,799
AFL1                RAPID  72              March 2019  21,799  7,047    11,573
                    Twarc  72              March 2019  44,470  11,482   16,821
AFL1-en (en & und)  RAPID  72              March 2019  21,235  6,849    11,238
                    Twarc  72              March 2019  25,231  8,531    12,399
AFL2                RAPID  144             April 2019  30,103  9,215    14,231
                    RAPID  144             April 2019  30,115  9,215    14,232
Good morning everyone,
The promise of social media is easy access to plentiful social data, but there are questions to consider, particularly concerning how to define network boundaries while working with the vagaries of the platforms themselves.
(Then there are sampling issues, and the dynamic nature of the data.)
Today I will present work that my colleagues and I have done considering the challenges of using social media data as a source for social network analysis.
To conduct SNA, we first need to construct a social network.
How do we do this? Of course, in the general case, “It depends”. [CLICK] What’s the research question we’re trying to answer?
[CLICK] In turn, that will help answer the follow up questions:
What will a node represent?
How will nodes be linked?
How far do we want the network to extend? This could cover a family or a community, or be based on something categorical about the actors and their attributes, such as their gender or income bracket.
These questions will help answer what data needs to be collected, and then we can address the challenges of how to collect the data.
(The information may be hard to come by, or may come with a degree of uncertainty, or may not be timely.)
[CLICK] A traditional example is Moreno’s work on student social networks and how they evolve as the students progress.
(The data was collected through interviews and the boundary surrounded the class.)
[CLICK] But what can we do with social media data?
Diagram source: https://en.wikipedia.org/wiki/Sociometry, via http://www.martingrandjean.ch/social-network-analysis-visualization-morenos-sociograms-revisited/. Description: node size = indegree; colour = dark blue if indegree is 0, white if indegree >= 3.
Well, it’s clear that an obvious candidate for a node is an account, but what constitutes an edge?
[CLICK] A social network edge is typically a long-lasting connection, like parenthood or employment, but the obvious counterpart in social media, friend and follower relations, aren’t quite the same:
They even differ between platforms. A Twitter follow is cheaper to create and easier to ignore than a friendship in Facebook;
Because they are so cheap to create, and everything is effectively permanent, many such connections may in fact be stale or, at best, still valid but inactive, or may be otherwise questionable, such as links to bots.
[CLICK] In any case, they can form very dense networks, such as these.
[CLICK] So the questions remain:
What constitutes a relationship in social media?
Is it meaningful to use it for SNA (social network analysis)?
What evidence can be gathered, i.e., what data is available, to support these relationships?
[CLICK] Our approach is to focus on interaction networks, because by studying actual posting behaviour, we can see not only how accounts are connecting, but also how often, which provides a richer sense of the underlying relationships than follower networks.
Furthermore, because the interactions are often directed, it gives us a chance to consider the flow of information and influence.
https://www.flickr.com/photos/psychemedia/5821218782/in/photostream/
To build social networks from interactions, we consider ones that are common across platforms. We focus on [CLICK] accounts and the posts they make, whether they’re Facebook or Instagram posts or tweets on Twitter.
[CLICK] On many platforms, a post may mention another account;
[CLICK] A post may be duplicated by sharing or retweeting it; OR
[CLICK] A post may be a reply to or a comment on an existing post.
These are just examples, of course, as there are a myriad of other ways to connect accounts, especially when you consider their content
(like their hashtags, links to web pages, or other media, or even the terms they use).
From a five minute window looking at the Twitter discussion surrounding an episode of ABC’s Q&A, [CLICK] we created these three different networks.
As you can see, the retweet network in green appears to be a subset of the mention network in red - [CLICK] here are notable matches. This is because retweets necessarily include a mention of the account being retweeted. Depending on your research question, this may be important to keep, or it may be a confound.
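To make this concrete, here is a minimal sketch of how the three interaction networks might be built with networkx, assuming each tweet is a Twitter API v1.1 JSON object; this illustrates the construction rather than reproducing our actual pipeline.

```python
# Minimal sketch: build directed mention, retweet and reply networks
# from Twitter API v1.1 JSON objects (one dict per tweet).
import networkx as nx

def _add_edge(g, u, v):
    # Accumulate a weight so the degree of activity is retained.
    w = g.get_edge_data(u, v, default={}).get("weight", 0)
    g.add_edge(u, v, weight=w + 1)

def build_interaction_networks(tweets):
    mentions, retweets, replies = nx.DiGraph(), nx.DiGraph(), nx.DiGraph()
    for t in tweets:
        src = t["user"]["screen_name"]
        # Mentions: every account named in the tweet's entities. Retweets
        # also carry a mention of the retweeted account, which is why the
        # retweet network can appear as a subset of the mention network.
        for m in t.get("entities", {}).get("user_mentions", []):
            _add_edge(mentions, src, m["screen_name"])
        # Retweets: edge from the retweeter to the original author.
        if "retweeted_status" in t:
            _add_edge(retweets, src, t["retweeted_status"]["user"]["screen_name"])
        # Replies: edge from the replier to the account replied to.
        if t.get("in_reply_to_screen_name"):
            _add_edge(replies, src, t["in_reply_to_screen_name"])
    return mentions, retweets, replies
```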
Now that we know how to build social networks, how can we get the data?
Social media platforms offer data via their APIs, but the data models vary, as does the amount of information that is accessible, whether through rate limits or simply through it not being made available; paying will usually get you more data. [CLICK] You can tap into live feeds of posts, search for current or historical data, or start with seed accounts and work your way out using a snowball strategy.
[CLICK] As input, we can use search terms, usernames and timestamps, though many platforms offer sophisticated query languages.
(Geographic bounding boxes can often also be used.)
[CLICK] The output we get may be a corpus of posts or an egonet.
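As an illustration, the two main Twitter collection modes take only a few lines with twarc (v1); the credential variables and the store() helper below are placeholders, not part of twarc itself.

```python
# Sketch: the two main Twitter collection modes via twarc (v1).
from twarc import Twarc

t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)

# Filter the live stream: collect tweets matching the boundary criteria
# as they are posted (this loop runs until interrupted).
for tweet in t.filter(track="afl"):
    store(tweet)  # store() stands in for whatever persistence is used

# Retrieval via search: collect recent historical tweets (the standard
# Search API only reaches back about a week).
for tweet in t.search("qanda"):
    store(tweet)
```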
The network boundary we want may not be the one we get, depending on what the API will let us do.
Typically, we want to find the community of accounts discussing a particular topic.
[CLICK] Say these are the posts we want to collect for our network.
[CLICK] Constrained by the API, our filter can collect only these, so we are bound to miss some posts we wanted.
[CLICK] Also, we collect posts that match our criteria but that we don’t want. These may be irrelevant spam and advertising, posts from others who’ve used our filter terms, or matches on terms that are meaningful in other languages.
So what can we do with all this?
[CLICK] Importantly, what are we missing and how does it affect our analyses?
[CLICK] Could someone else do the same study? Bearing in mind, of course, that they’d need to do it simultaneously?
[CLICK] Our null hypothesis was therefore that collecting social media data from a particular platform, at a particular time, using particular criteria, should produce the same dataset, give or take minor timing issues.
[CLICK] Our plan was to test this out. We would conduct simultaneous Twitter collection activities with different tools during the same time periods, using the same boundary criteria, and then analyse the datasets we collected to see if they differed.
(, and then see how any variations affected social network analyses based on them.)
We established this analysis process to compare corresponding datasets.
[CLICK] First we consider the dataset statistics,
(such as number of posts, number of accounts that appeared, number of retweets, hashtags, etc.)
[CLICK] Then we construct different networks from the datasets and compare network statistics
(such as number of nodes, edges, network diameter, largest component, etc.)
[CLICK] To look more closely, we then examine the nodes themselves, considering a variety of standard centrality measures.
[CLICK] Finally, we compare the groupings in the networks based on Louvain clustering.
In this way, we consider not just variations in the data collected, but also the effect of those variations on analyses.
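A minimal sketch of the first two comparison steps, reusing the network-building sketch from earlier; the field names again assume v1.1 tweet JSON, and these are illustrative counts rather than our exact measures.

```python
# Sketch: corpus-level counts and basic network statistics for one
# dataset; run over each corresponding pair and compare the results.
import networkx as nx

def dataset_stats(tweets):
    return {
        "posts": len(tweets),
        "accounts": len({t["user"]["screen_name"] for t in tweets}),
        "reposts": sum(1 for t in tweets if "retweeted_status" in t),
        "hashtags": len({h["text"].lower()
                         for t in tweets
                         for h in t.get("entities", {}).get("hashtags", [])}),
    }

def network_stats(g):
    # Largest (weakly) connected component of a directed network.
    largest = max(nx.weakly_connected_components(g), key=len)
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "largest_component": len(largest),
    }
```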
[CLICK] The first tool we used was RAPID, a collection and data mining application created by the University of Melbourne with DST Group. It lets you set up filtered collections from Twitter and Reddit. Notably, RAPID can dynamically modify the filter criteria autonomously, periodically adding terms it finds are popular and dropping ones that haven’t been seen. This allows it to track the topic based on how it’s actually being discussed in real time.
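As a toy sketch of that dynamic tracking behaviour (our illustration only, not RAPID's actual code; the update_filter_terms function and its parameters are hypothetical):

```python
# Toy sketch of dynamic topic tracking: periodically drop filter terms
# that no longer appear and add the hashtags that are currently most
# popular in the recently collected stream.
from collections import Counter

def update_filter_terms(terms, recent_tweets, add_top=5):
    counts = Counter(h["text"].lower()
                     for t in recent_tweets
                     for h in t.get("entities", {}).get("hashtags", []))
    # Keep only terms still being seen in the recent window...
    kept = {term for term in terms if counts[term.lower()] > 0}
    # ...and add the hashtags that have become popular.
    kept.update(tag for tag, _ in counts.most_common(add_top))
    return kept
```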
[CLICK] As a contrast, we used Twarc, which just provides a very thin layer around the Twitter API with no extra smarts.
[CLICK] We then used these to create two collections, each with two simultaneous phases, generating four datasets for each collection.
The first collection was over the episode of ABC’s Q&A mentioned earlier, including 4 hours during and shortly after the episode, and then 15 hours the following day.
The second was conducted over two separate weekends in 2019 aiming to follow discussions about Australian Rules Football.
[CLICK] All collection, storage and analysis of the data was conducted under two ethics protocols approved by the University of Adelaide HREC (#170316 and H-2018-045).
Q&A image: https://www.abc.net.au/cm/rimage/10760138-1x1-large.jpg?v=2
AFL image: https://en.wikipedia.org/wiki/Australian_Football_League#/media/File:Australian_Football_League.svg
These tables show just a raw count of the number of posts collected and the number of accounts that appeared in them.
Our hypothesis said the collections should be nearly identical, which is clearly not the case.
(Remembering that our null hypothesis was that collecting at the same time, with the same criteria against the same social media platform should result in similar if not identical datasets, we can see that our collections were unexpectedly different.)
We can see that these variations do cause differences in the networks we build from them. [CLICK]
70% more tweets and 40% more accounts in the Twarc dataset result in [CLICK] 25-50% more nodes and edges in the corresponding mention [CLICK] and reply networks.
(These diagrams show just the largest component from each of those networks, but those components contain 95% and 70% of the mention and reply network nodes, respectively.)
Largest components: Twarc mentions 5,819 of 6,119; RAPID mentions 4,326 of 4,535; Twarc replies 1,081 of 1,490; RAPID replies 829 of 1,184.
To examine the centrality values of nodes in corresponding networks, we can’t just compare the values directly. Instead, given two networks G1 and G2, we first of all only consider the common nodes [CLICK], and then rank them by their centrality values [CLICK] and see how closely the rankings match. To compare the rankings, [CLICK] we used Kendall’s tau and Spearman’s rho (rank correlation coefficients).
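This comparison is straightforward to sketch with networkx and scipy.stats (our choice of tooling for illustration); degree centrality stands in here for whichever centrality measure is being compared.

```python
# Sketch: restrict two networks to the common nodes of their top-k
# centrality rankings, then compare the rankings with Kendall's tau and
# Spearman's rho (both rank the raw scores internally).
import networkx as nx
from scipy.stats import kendalltau, spearmanr

def compare_rankings(g1, g2, centrality=nx.degree_centrality, k=1000):
    c1, c2 = centrality(g1), centrality(g2)
    top1 = sorted(c1, key=c1.get, reverse=True)[:k]
    top2 = set(sorted(c2, key=c2.get, reverse=True)[:k])
    common = [n for n in top1 if n in top2]
    x = [c1[n] for n in common]
    y = [c2[n] for n in common]
    return kendalltau(x, y), spearmanr(x, y)
```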
Ideally, our corresponding networks should have mostly the same nodes, and they should appear in similar rankings, according to our null hypothesis.
We compared the common nodes from the top 1,000 of each corresponding network, hoping for tau and rho values around 0.9. Moderate similarity values would be 0.4 to 0.6, [CLICK] but as you can see, we often didn’t even get that.
Reply networks were more similar than mention networks, but that could be due to them being much smaller.
Common nodes in the top 1,000: Part 1 mentions 521–585, replies 988–993; Part 2 mentions 893–906, replies 998 (all).
By this point, it’s no longer surprising that there are differences in the clusters we discovered using the Louvain method. Here, we considered just the largest 20 clusters in each pair of corresponding networks for reply, mention and retweet networks. Although they follow similar trends of size, the differences are clear.
We spent some time examining the membership of corresponding clusters [CLICK], but came to no firm conclusions other than that the reply networks were the most similar.
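A sketch of the clustering step, assuming the python-louvain package (imported as community); Louvain operates on undirected graphs, so the directed interaction networks are converted first.

```python
# Sketch: sizes of the n largest Louvain clusters in a network, for
# comparing the cluster-size profiles of corresponding networks.
import community as community_louvain

def top_cluster_sizes(g, n=20):
    # best_partition() maps each node to a community id.
    partition = community_louvain.best_partition(g.to_undirected())
    sizes = {}
    for cluster in partition.values():
        sizes[cluster] = sizes.get(cluster, 0) + 1
    return sorted(sizes.values(), reverse=True)[:n]
```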
Like the Q&A datasets, the first AFL datasets were dramatically different in size, though [CLICK] it appeared as though there was significant overlap.
A manual scan of the tweets showed many non-English tweets in the Twarc dataset, [CLICK] so we looked at the language distributions and found a significant amount of Japanese content.
Looking at just the English tweets [CLICK], we find the datasets are much more similar in size.
So what are those Japanese tweets about? Is AFL really big in Japan?
English only: Twarc 22,962; RAPID 20,431. English & und: Twarc 25,236; RAPID 21,235.
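Extracting that English-only view is a one-line filter over the tweet JSON, since Twitter attaches a detected language code to each tweet, with "und" for undetermined; keeping "und" alongside "en" gives the "en & und" figures above.

```python
# Sketch: keep tweets detected as English or of undetermined language.
def english_subset(tweets):
    return [t for t in tweets if t.get("lang") in ("en", "und")]
```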
Picking a random Japanese tweet, it looks like it’s promoting some kind of boxing match. Others referred to a Japanese marketplace, like eBay or Amazon.
[CLICK] Looking at the raw JSON (don’t read it), the string “afl” appears in this block here [CLICK], and if we look closely [CLICK], it’s part of a URL, and has nothing to do with Aussie Rules.
So what’s going on?
[CLICK] Twarc is a very thin layer over Twitter’s APIs, so it should be doing the least above and beyond what Twitter is offering. If the string “afl” appears in a tweet, it will return it, pending rate limits.
[CLICK] We spoke to the RAPID team at Melbourne Uni and found that it does some useful post-collection processing. It retrieves the same tweets as Twarc, but then checks that the filter terms appear in the text-based fields in the tweet, such as in the text of the tweet or the account’s screen name or description, both of which are included in tweets’ metadata.
So RAPID was matching these Japanese tweets but then dumping them.
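A sketch of that post-collection check (our reconstruction of the described behaviour, not RAPID's code); the dotted field paths match those on the earlier slide.

```python
# Sketch: keep a tweet only if a filter term appears in one of its
# text-based fields.
TEXT_FIELDS = ("text", "user.screen_name", "user.description")

def _field(tweet, dotted_path):
    # Walk a dotted path such as "user.description" through the JSON.
    value = tweet
    for key in dotted_path.split("."):
        value = (value or {}).get(key)
    return value or ""

def matches_terms(tweet, terms):
    return any(term.lower() in _field(tweet, f).lower()
               for f in TEXT_FIELDS
               for term in terms)

# A tweet where "afl" appears only inside a URL fails this check and is
# discarded, which is what removed the Japanese noise from RAPID's data.
```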
[CLICK] That’s important to know.
So what have we learned?
[CLICK] We’ve found that attempting to conduct the same collection simultaneously using different tools can produce very different datasets.
Furthermore, we’ve shown that analyses of social networks based on those datasets produce very different results.
[CLICK] We’ve learned tools can value-add, but we should know what they do.
We’ve learned that noise can also affect results, and that it may be necessary to trawl through content manually to see what types of noise are present.
Some may be incidental or from spammers or advertising, but some may be from language clashes. Be wary of very short filter terms.
This activity focussed on the collection of data and the building of networks in the absence of [CLICK] a clear guiding research question, which would really have helped with deciding on network construction parameters and subsequent data requirements.
We chose not to constrain ourselves to particular research questions, so we possibly encountered more pitfalls than we might have done otherwise.
We invite you to learn from the consequences of our decisions.
The broader implication for those who analyse social media data is that they should be aware that their tools may affect their analyses and the decisions they base on their results. [CLICK] More focused research is required.
Thank you for your time and attention.
I’m happy to answer any questions now.
If we consider the content of the datasets and map out hashtags that are mentioned by the same accounts, we can see a lot more alignment between corresponding datasets, not only in the most mentioned hashtags but also in their linkages.
From this we learn two things:
From a high level, this kind of content is subjectively similar, despite the differences in the datasets; and
We can start to identify noise in the datasets. We wanted to see discussion about Q&A but there’s content here that clearly doesn’t match that. In fact, there appear to be clusters of Spanish content that was picked up because it included the term ‘qanda’.
[CLICK] In short, these variations might be fine for trend analysis, say if you’re an advertiser wanting to track an ad campaign, but it raises questions about using SNA techniques.
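For reference, a minimal sketch of one way to build such a hashtag co-mention network, under the reading that two hashtags are linked when the same account has used both; the function name is illustrative.

```python
# Sketch: link two hashtags whenever the same account has used both,
# weighting edges by how many accounts co-mention the pair.
import itertools
from collections import defaultdict
import networkx as nx

def hashtag_comention_network(tweets):
    tags_by_account = defaultdict(set)
    for t in tweets:
        tags = {h["text"].lower()
                for h in t.get("entities", {}).get("hashtags", [])}
        tags_by_account[t["user"]["screen_name"]] |= tags
    g = nx.Graph()
    for tags in tags_by_account.values():
        for a, b in itertools.combinations(sorted(tags), 2):
            w = g.get_edge_data(a, b, default={}).get("weight", 0)
            g.add_edge(a, b, weight=w + 1)
    return g
```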
frasedeldia – “phrase of the day” (Catalan)
felizjueves – “happy Thursday” (Spanish)
Arabic hashtags – “five”, “spraying”
8kasimdunyadelilergunu – “8 November World Madness Day” (Turkish), celebrated by Turks sharing cartoons with each other on social media