1. The Python NLP Project
- An analysis of Twitter Data
NAIQING LIN & XIAOYE LI
2. Background & Introduction
Prevalence of food-safety-related crises in recent years
Growth of social media and social networking sites
Access to Twitter data through the Twitter API
3. Research Questions
What are the key features of the food safety information communicated by Twitter users (e.g., volume, variety)?
What are the most popular words or phrases in tweets related to food safety crises?
What are the key vertices and edges in the food safety network with respect to information dissemination?
What are the main clusters, and how do they differ across centrality metrics?
4. Method
Data Collection
Data extracted with the Python Tweepy package
Keywords: foodsafety, globalfoodsupply, Allergen, Foodcontamination, Yes2safe, and Foodillness
Data collection period: 18 hours
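The slides do not show the Tweepy collection code, so the sketch below covers only the keyword side of it. The keyword list is copied from the slide; combining the terms into a single OR query and filtering tweets with a case-insensitive substring match are assumptions, and `build_query` and `matches_keywords` are hypothetical helpers. The actual Tweepy calls are deliberately left out rather than guessed.

```python
# Keywords copied from the slide.
KEYWORDS = ["foodsafety", "globalfoodsupply", "Allergen",
            "Foodcontamination", "Yes2safe", "Foodillness"]


def build_query(keywords: list[str]) -> str:
    """Hypothetical helper: join keywords into one OR query string."""
    return " OR ".join(keywords)


def matches_keywords(text: str, keywords: list[str]) -> bool:
    """Hypothetical helper: case-insensitive substring match,
    so '#FoodSafety' matches the keyword 'foodsafety'."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


print(build_query(KEYWORDS))
print(matches_keywords("New #FoodSafety rules announced", KEYWORDS))  # True
print(matches_keywords("Election results tonight", KEYWORDS))         # False
```

A substring match is deliberately loose: it also catches hashtags and compound words, at the cost of occasional false positives.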
5. Method
Data Description
Pilot test conducted, yielding 117 tweets
Dataset size: 2,286 tweets extracted
Small dataset: food safety is a less popular topic on Twitter than others (e.g., politics, entertainment)
6. Data Analysis
Descriptive Analytics
User Analysis
◦ Unique users: 723
◦ Most active users: StarzNsky4u (207 tweets) and NimsJane (141 tweets)
◦ The active users frequently post tweets about environmental conservation and animal rights advocacy
◦ Visible users: users mentioned more than 200 times by other users
◦ Influential users in discussions of legal regulations for the American Food Exports Act
◦ Popular languages: English, Spanish
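The user statistics above can be sketched with counters over the collected tweets. This is a minimal illustration, assuming each tweet is a dict with hypothetical `user` and `text` keys and that mentions are recognized by the `@handle` pattern in the text; `user_stats` is a hypothetical helper, and its default threshold of 200 mirrors the visible-user definition on the slide.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")


def user_stats(tweets: list[dict], mention_threshold: int = 200):
    """Hypothetical helper returning (unique authors, authors ranked by
    tweet count, users mentioned more than `mention_threshold` times)."""
    authors = Counter(t["user"] for t in tweets)
    # Count @mentions, skipping self-mentions so only "other users" count.
    mentions = Counter(
        m.lower()
        for t in tweets
        for m in MENTION_RE.findall(t["text"])
        if m.lower() != t["user"].lower()
    )
    visible = [u for u, n in mentions.most_common() if n > mention_threshold]
    return len(authors), authors.most_common(), visible


# Hypothetical sample; a threshold of 2 stands in for 200 on toy data.
sample = [
    {"user": "alice", "text": "Check the recall @health_agency"},
    {"user": "alice", "text": "@health_agency thanks for the update"},
    {"user": "bob", "text": "RT @health_agency: allergen alert"},
]
n_users, ranking, visible = user_stats(sample, mention_threshold=2)
print(n_users, ranking, visible)  # 2 [('alice', 2), ('bob', 1)] ['health_agency']
```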
7. Descriptive Analytics
User Relationship Analysis
◦ Follower and following relationship analysis
◦ User with the most followers (567,510): a Canadian restaurant owner
◦ Another popular user: the ASPCA, a non-profit organization focused on fighting animal cruelty
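Follower rankings and follow relationships can be sketched as below. This assumes follower counts come from user profile metadata (for example the `followers_count` field of a Twitter API v1.1 user object) and that follow relationships are stored as hypothetical (follower, followee) pairs; the helper names and sample values are illustrative, not project data.

```python
from collections import Counter


def top_by_followers(profiles: dict[str, int], n: int = 3) -> list[tuple[str, int]]:
    """Rank users by the follower count reported in their profiles."""
    return sorted(profiles.items(), key=lambda kv: kv[1], reverse=True)[:n]


def in_sample_followers(edges: list[tuple[str, str]]) -> Counter:
    """Count followers within the collected sample from directed
    (follower, followee) edges, i.e. each user's in-degree."""
    return Counter(followee for _, followee in edges)


# Hypothetical users and counts, for illustration only.
profiles = {"user_a": 1200, "user_b": 35, "user_c": 560}
edges = [("user_b", "user_a"), ("user_c", "user_a"), ("user_a", "user_c")]

print(top_by_followers(profiles, n=2))             # [('user_a', 1200), ('user_c', 560)]
print(in_sample_followers(edges).most_common(1))  # [('user_a', 2)]
```

The in-degree only counts relationships inside the sample, so it can differ greatly from a profile total such as the 567,510 followers of the top account.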
8. Descriptive Analytics
Tweet Features
◦ Original tweets: 1288
◦ Retweets: 998
◦ Most popular retweet topics: food allergy, foodborne illness, cross-contamination, horsemeat, and foodborne illness outbreaks
◦ Popular URLs: food allergen, food recall, and food safety news websites (992 URLs in total)
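The original/retweet split and the URL count can be sketched as follows, assuming tweets are available as plain strings and that retweets carry the conventional `RT @user:` prefix. Both are simplifications: API tweet objects also carry retweet metadata, which would be more reliable than a text prefix. `tweet_features` is a hypothetical helper.

```python
import re

# Matches http(s) URLs up to the next whitespace; trailing punctuation
# would be captured too, which a production version should trim.
URL_RE = re.compile(r"https?://\S+")


def tweet_features(texts: list[str]) -> dict:
    """Hypothetical helper: count originals vs. retweets (by the 'RT @'
    prefix) and collect every URL found in the texts."""
    retweets = sum(1 for t in texts if t.startswith("RT @"))
    urls = [u for t in texts for u in URL_RE.findall(t)]
    return {"original": len(texts) - retweets, "retweets": retweets, "urls": urls}


# Hypothetical sample tweets.
texts = [
    "Recall issued for peanut butter https://example.com/recall",
    "RT @news: Horsemeat found in burgers https://example.org/a",
    "Wash your hands before cooking",
]
features = tweet_features(texts)
print(features["original"], features["retweets"], len(features["urls"]))  # 2 1 2
```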
9. Content Analytics
Data Preprocessing
◦ Remove URLs and usernames
◦ Remove non-alphanumeric characters
◦ Text preprocessing (tokenization, lowercase conversion, stopword removal, etc.)
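The preprocessing steps above can be sketched with regular expressions. The stopword set is a small illustrative stand-in (a fuller list such as NLTK's English stopwords would normally be used), treating `rt` as a stopword and tokenizing on whitespace are assumptions, and `preprocess` is a hypothetical helper. URLs are removed before punctuation so that their fragments do not survive as tokens.

```python
import re

# Small illustrative stopword set; "rt" (retweet marker) is an assumed addition.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to",
             "for", "and", "rt"}


def preprocess(text: str) -> list[str]:
    """Hypothetical helper applying the preprocessing steps in order."""
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove usernames
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remove non-alphanumeric characters
    tokens = text.lower().split()                # lowercase, whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords


print(preprocess("RT @news: Horsemeat found in #burgers! https://example.com/x"))
# ['horsemeat', 'found', 'burgers']
```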
13. Unsupervised Content Analytics
Clustering Analysis
Number of clusters: K = 6
Topic of each cluster
Cluster 0: legal regulations
Cluster 1: government actions
Cluster 2: international issues
Cluster 3: Oklahoma incidents
Cluster 4: negative effects
Cluster 5: horsemeat
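The slide reports K = 6 but not the features or algorithm used. Assuming TF-IDF features and k-means, a dependency-free sketch looks like this; it uses the spherical variant (cosine similarity on unit-length vectors) with a deterministic farthest-first initialization. In practice, scikit-learn's `TfidfVectorizer` and `KMeans` would be the usual choice. The helper names, toy corpus, and `k=2` are illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """L2-normalized TF-IDF vectors (smoothed IDF) stored as sparse dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product, which equals cosine similarity for unit-length vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def kmeans(vectors: list[dict[str, float]], k: int, iterations: int = 20) -> list[int]:
    """Spherical k-means: assign by cosine similarity, recompute centroids."""
    # Farthest-first init: start from the first vector, then repeatedly add
    # the vector least similar to its closest existing centroid.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
                continue
            summed = Counter()
            for v in members:
                summed.update(v)  # sum member vectors term by term
            norm = math.sqrt(sum(x * x for x in summed.values())) or 1.0
            new_centroids.append({t: x / norm for t, x in summed.items()})
        centroids = new_centroids
    return labels


docs = [
    ["horsemeat", "burgers", "recall"],
    ["horsemeat", "beef", "scandal"],
    ["allergen", "peanut", "label"],
    ["allergen", "label", "warning"],
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Cluster topics would then be read off from the highest-weighted terms in each centroid, which is one way labels such as "horsemeat" could be assigned.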
14. Sentiment Analysis
Python analysis with the Bing Liu sentiment lexicon
◦ Results: most tweets are positive (615) and subjective
◦ Most frequent words in positive tweets: safe, yes, foodsafety, americans, vote
◦ Negative: 325
◦ Most frequent words in negative tweets: stop, exporting, toxins, fdalabeled, warnings
◦ Neutral: 277
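Lexicon-based sentiment in the style of Bing Liu's opinion lexicon counts positive and negative words per tweet. The word sets below are tiny illustrative stand-ins, not the lexicon itself (NLTK distributes the full lexicon as the `opinion_lexicon` corpus); the positive-minus-negative scoring rule and the `classify` helper are assumptions, and the subjectivity measure mentioned above is not covered.

```python
from collections import Counter

# Tiny illustrative stand-ins for the Bing Liu positive/negative word lists.
POSITIVE = {"safe", "good", "healthy", "protect"}
NEGATIVE = {"toxins", "contaminated", "sick", "outbreak"}


def classify(tokens: list[str]) -> str:
    """Assumed scoring rule: positive-word hits minus negative-word hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical preprocessed tweets.
tweets = [["food", "is", "safe"], ["stop", "exporting", "toxins"], ["new", "label", "rules"]]
counts = Counter(classify(t) for t in tweets)
print(counts["positive"], counts["negative"], counts["neutral"])  # 1 1 1

# Word frequencies per sentiment class, as in the "most frequent words" bullets.
words_by_label: dict[str, Counter] = {}
for t in tweets:
    words_by_label.setdefault(classify(t), Counter()).update(t)
print(words_by_label["negative"].most_common(1))  # [('stop', 1)]
```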
16. Practical Implications
For governmental agencies:
◦ Promote public Twitter accounts more widely and post useful information regularly
◦ Use popular hashtags to communicate important food safety information to the public
For commercial business owners:
◦ Share food safety information and disclose information transparently
◦ Include website URLs in tweets and expand influence by growing their networks
17. Limitations and Future Research
Limitations
Time constraints
Data collection obstacles (limited data sources)
Twitter as the single data source
18. Limitations and Future Research
Future research
Inclusion of more keywords
Combination of other data sources (e.g., government website)
More in-depth analysis of important tweets in individual accounts (e.g., visible users)
Utilization of more data-visualization tools