eDrugTrends: Social Media Analysis to Monitor Cannabis Trends

Presentation (Webinar) given by Raminta Daniulaityte, Ph.D. at the Public Health Seminar at Columbia University, February 23, 2017.

  1. 1. Social Media Analysis to Monitor Cannabis Trends Presenter: Raminta Daniulaityte, Ph.D. CITAR & Kno.e.sis, Wright State University Boonshoft School of Medicine T32 Substance Abuse Seminar (Public Health Seminar at Columbia University) February 23, 2017 © Wright State University Center for Interventions, Treatment, and Addiction Research (CITAR) Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
  2. 2. Research Team NIH/NIDA R01 DA039454 Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use Principal Investigators: Raminta Daniulaityte, Ph.D. Amit Sheth, Ph.D. Center for Interventions, Treatment, and Addiction Research (CITAR), Wright State University Boonshoft School of Medicine Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University Co-Investigators: Robert Carlson, Ph.D. (CITAR) Silvia Martins, M.D., Ph.D. (Columbia U) Ramzi Nahhas, Ph.D. (Comm. Health, WSU) Edward Boyer, M.D., Ph.D. (U Mass) Krishnaprasad Thirunarayan, Ph.D. (Kno.e.sis) Research Staff: Francois R. Lamy, Ph.D. (CITAR, Postdoc); G. Alan Smith (Kno.e.sis, Software Engineer); Sanjaya Wijeratne (Kno.e.sis, Ph.D. student); Farahnaz Golroo (Kno.e.sis, Ph.D. student) No Conflicts of Interest to declare
  3. 3. Project Aims • Aim 1: Develop a comprehensive software platform, eDrugTrends, for semi-automated processing and visualization of spatio-temporal, and social network dimensions of social media data (Twitter and Web forums) on cannabis and synthetic cannabinoid use. • Aim 2: Deploy eDrugTrends to identify and compare trends in knowledge, attitudes, and behaviors related to cannabis and synthetic cannabinoid use across U.S. regions with different cannabis legalization policies using Twitter and Web forum data. • Types of data sources: o Twitter (brief content, but over 500 million tweets/day, geo-info) o Web forums such as Bluelight, drugs-forum, Reddit (detailed discussions of drug use practices) o Web survey on Bluelight
  4. 4. Presentation Objectives • Overview of the technical capabilities of eDrugTrends platform to process Twitter data • How data is collected • Geo-location identification • Keyword selection and monitoring • Tweet content processing • Exploration of recently collected and processed data on marijuana concentrates • Integrating geographic and content analysis features to explore cannabis-related tweeting activity
  5. 5. Twitter Data Collection • Tweets are collected using Twitter’s streaming Application Programming Interface (API), which provides free access to up to 1% of all tweets. • Publicly available tweets only. • The system automatically filters out non-English tweets. • The current system started data collection in March 2015; close to 90 million tweets have been collected. [Figure: eDrugTrends dashboard showing incoming tweets and trending topics]
  6. 6. What does “up to 1%” mean? • Free access to 1% of all tweets o It can be thought of as a “bucket” that can fit up to 1% of all tweets. o Assuming 400 million tweets are generated per day, 1% would be about 4 million tweets per day. o Still, it is possible to miss some tweets during sudden volume spikes. • With a reasonably limited number of keywords, all or most relevant tweets can be collected. • Our system collects an average of about 150,000 tweets per day, which is below the allowable limit.
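The arithmetic on this slide can be checked with a short sketch. All figures are the slide's own (an assumed 400 million tweets per day, a 1% cap, and about 150,000 tweets collected per day); the variable names are illustrative only.

```python
# Figures taken from the slide; variable names are illustrative.
ASSUMED_DAILY_TWEETS = 400_000_000  # slide's assumption for total daily tweets
STREAM_CAP_PERCENT = 1              # free streaming access covers up to 1%
COLLECTED_PER_DAY = 150_000         # average daily volume collected by the system

# Integer arithmetic avoids floating-point rounding in the cap itself.
daily_cap = ASSUMED_DAILY_TWEETS * STREAM_CAP_PERCENT // 100
share_of_cap_used = COLLECTED_PER_DAY / daily_cap

print(f"Daily cap: {daily_cap:,} tweets")
print(f"Share of cap used: {share_of_cap_used:.2%}")
```

Under these assumptions the keyword stream uses only a small fraction of the cap, which is why most relevant tweets can be captured outside of sudden volume spikes.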
  7. 7. Extraction of Geo-Location Information • Tweets may contain GPS coordinates (via a mobile phone that supports the feature). • Users may indicate their location in their user profiles, e.g.: “WHERE THE WEED AT”, “DAYTON, OH”, “SAN DIEGO”, “Pittsburgh, PA”, “wonderland”, “Earth”. • eDrugTrends geo-locates close to 30% of tweets at the state and county level. • Some earlier studies reported geo-location identification for only 1-3% of tweets.
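A minimal sketch of how free-text profile locations like those on the slide might be mapped to a state. The slides do not describe the eDrugTrends geo-location method, so the lookup tables, the `guess_state` function, and the "City, ST" pattern below are assumptions made for illustration.

```python
import re

# Tiny illustrative lookups (hypothetical); a real gazetteer would be far larger.
STATE_ABBREVIATIONS = {"OH": "Ohio", "PA": "Pennsylvania", "CA": "California"}
KNOWN_CITIES = {"san diego": "California", "dayton": "Ohio"}

def guess_state(profile_location):
    """Return a U.S. state name for a free-text profile location, or None."""
    text = profile_location.strip()
    # Pattern "City, ST": use the two-letter code after the final comma.
    match = re.search(r",\s*([A-Za-z]{2})$", text)
    if match:
        state = STATE_ABBREVIATIONS.get(match.group(1).upper())
        if state:
            return state
    # Otherwise fall back to a bare city name.
    return KNOWN_CITIES.get(text.lower())

# Profile strings quoted on the slide.
examples = ["WHERE THE WEED AT", "DAYTON, OH", "SAN DIEGO",
            "Pittsburgh, PA", "wonderland", "Earth"]
for location in examples:
    print(f"{location!r} -> {guess_state(location)}")
```

Entries such as "wonderland" or "Earth" resolve to nothing, which illustrates why only a portion of tweets can be geo-located from profile text.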
  8. 8. Adjusted Measures of Tweeting Activity • To compare regional trends, we can’t work with raw numbers. • eDrugTrends started running a parallel data collection system to obtain a general sample of tweets (denominator data). • General sample data are collected using another API stream; no keywords are used; data are processed to identify geographic information. • The “general sample” is then used to calculate the state-tweet-volume-adjusted state proportion of tweets o (or the county-tweet-volume-adjusted county proportion of tweets)
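The adjustment can be sketched as follows. The slides do not give the exact formula, so this is one plausible reading: the number of topic tweets from a state divided by the number of general-sample tweets from the same state, expressed as a percentage. The function name, state labels, and counts are hypothetical.

```python
def adjusted_proportions(topic_counts, general_counts):
    """Percent of topic tweets relative to general-sample tweets, per state."""
    adjusted = {}
    for state, topic in topic_counts.items():
        general = general_counts.get(state, 0)
        if general > 0:  # skip states with no denominator data
            adjusted[state] = 100.0 * topic / general
    return adjusted

# Hypothetical counts, for illustration only.
topic = {"State A": 3000, "State B": 3000}
general = {"State A": 100_000, "State B": 300_000}

result = adjusted_proportions(topic, general)
for state, value in sorted(result.items()):
    print(f"{state}: {value:.2f}%")
```

Both hypothetical states have the same raw count, yet their adjusted values differ threefold, which is the reason raw numbers cannot be used to compare regions.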
  9. 9. RAW Numbers and ADJUSTED State Proportions of Cannabis-Related Tweets (March-September, 2016) Raw numbers Adjusted proportions
  10. 10. Twitter Data Collection: Keywords • Keywords/slang terms are used to collect relevant tweets: o Cannabis: weed, marijuana, spliff, ganja, kush, sativa, indica, chronic, blunt, hydro, skunk, reefer, joint, etc. o Marijuana concentrates: dabs, shatter, budder, wax, BHO, butane honey oil, hash oil, etc. o Edibles: weed cookies, space cake, pot cookie, pot brownie, mj brownie, medibles, etc. o Synthetic cannabinoids: spice, K2, CHMINACA, AB-FUBINACA, synthetic weed, smoking blend, noid, black mamba, etc. • Inclusion of slang terms improves sensitivity (recall) in data collection
  11. 11. Keyword challenges • Issues with “precision”: risk of getting “noisy” or “irrelevant” data. • Ways to improve precision of collected data: o Ambiguous terms are combined with additional keywords indicating usage (e.g., smoke blunt, smoke budder) o “Black list” words are used to exclude irrelevant tweets (e.g., pumpkin spice latte, Emily Blunt). o Machine learning and other advanced information processing techniques are needed • Ongoing monitoring is needed: o New types of products or slang terms emerge. For example, “rosin”: a new type of marijuana concentrate produced using a solventless method. o New uses/meanings of words may affect the accuracy of collected data (e.g., “dabs”).
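The two precision rules above (pairing ambiguous terms with a usage word, and excluding black-listed phrases) can be sketched as a simple filter. The term lists are small samples drawn from the slides, and the `is_relevant` function and its co-occurrence logic are assumptions for illustration, not the platform's actual rules.

```python
import re

# Sample terms from the slides; the grouping into sets is an assumption.
UNAMBIGUOUS_TERMS = {"marijuana", "ganja", "kush"}
AMBIGUOUS_TERMS = {"blunt", "budder", "spice"}
USAGE_WORDS = {"smoke", "smoking", "smoked"}
BLACKLIST_PHRASES = ("pumpkin spice latte", "emily blunt")

def is_relevant(tweet):
    """Keep a tweet if it passes the black list and the keyword rules."""
    text = tweet.lower()
    # Rule 1: drop tweets containing any black-listed phrase.
    if any(phrase in text for phrase in BLACKLIST_PHRASES):
        return False
    words = set(re.findall(r"[a-z0-9]+", text))
    # Rule 2: unambiguous terms are sufficient on their own.
    if words & UNAMBIGUOUS_TERMS:
        return True
    # Rule 3: ambiguous terms count only alongside a usage word.
    return bool(words & AMBIGUOUS_TERMS) and bool(words & USAGE_WORDS)

for sample in ["Smoke a blunt tonight", "Emily Blunt was great in that movie",
               "pumpkin spice latte season", "that was a blunt remark"]:
    print(f"{sample!r}: {is_relevant(sample)}")
```

Rules like these are brittle when a word takes on a new meaning, which is the motivation the slide gives for ongoing monitoring and for machine learning classifiers.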
  12. 12. Data Processing: Automated Tweet Classification • Using manually annotated training data sets, machine learning classifiers were developed to classify tweets automatically. • Classification by the source/type of communication (personal, media, retail) o Machine learning classifier (SVM) achieved F score = 0.81. • Classification by sentiment (positive, negative, neutral) o Sentiment classification is applied to personal communications only o Machine learning classifier (SVM) achieved F score = 0.71. Kickin back wit my spliff Late night dabs Medical marvel: the uses of cannabis continue to grow http://t.co/djtKPunxW9 $10 #Cannabis #Edibles 12 Varieties 1 Package 10MG #THC total http://t.co/9w3xrFUnAe Positive: Marijuana works wonders on the soul Strongest shatter I've ever smoked Negative: I’m not much of a fan when it comes to edibles hate when people think i smoke weed
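For readers unfamiliar with the F score reported above, here is a minimal sketch of how it is computed for a single class as the harmonic mean of precision and recall. The counts are hypothetical and are not the study's data; how the slides averaged across classes is not stated.

```python
def f_score(true_positives, false_positives, false_negatives):
    """F1: harmonic mean of precision and recall for one class."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one class (e.g., "personal" tweets).
f1 = f_score(true_positives=90, false_positives=30, false_negatives=10)
print(f"F score: {f1:.2f}")
```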
  13. 13. Exploring Twitter Data on Marijuana Concentrates
  14. 14. Initial report about marijuana concentrate related tweeting: “Time for dabs” 2014 data • Data collected over a 2-month period at the end of 2014. • 27,018 tweets with identifiable state-level geo-location • Although over 10 keywords were used (shatter, concentrates, butane hash oil, etc.), the keyword “dabs” produced over 99% of the total sample. Dabs on Dabs on Dabs Time for dabs I just need a cute girl to take dabs with me and get stoned together “Time for dabs”: Analyzing Twitter data on marijuana concentrates across the U.S. Daniulaityte R., Nahhas R.W., Wijeratne S., Carlson R.G., Lamy F.R., Martins S.S., Boyer E.W., (...), Sheth A. (2015) Drug and Alcohol Dependence, 155, pp. 307-311.
  15. 15. 2015: Increases in Marijuana Concentrate-Related Tweeting Activity? Oops! Not So Fast… [Chart: Marijuana Concentrates US Tweeting Activity, Jun-Nov 2015; weekly counts of tweets and unique users from Jun 8 to Nov 23; y-axis 0 to 14,000]
  16. 16. Issues with Collected Data Drug vs. Dance Cam Newton cheers on Kevin Hart in a bench press challenge…then Dabs Tell me why my mom DABS so well? https://t.co/7LZjdqBkQr Cam celebrates, Cam dabs, Cam does Cam thing
  17. 17. Development of Machine Learning Classifier to Extract Relevant Tweets • A machine learning (ML) classifier was developed using 1,000 manually labeled tweets. • Excellent results: the ML classifier (NB) achieved F-measure = 0.9; Kappa statistic = 0.8. • The “dabs” ML classifier was plugged into the system.
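The Kappa statistic reported here measures agreement between the classifier and the manual labels beyond what chance alone would produce. A minimal sketch for a two-class (drug vs. not drug) confusion matrix follows; the counts are hypothetical and chosen only to show the calculation, not taken from the study.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix of predicted vs. true labels."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and true labels.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical counts: 100 labeled tweets, balanced between the two classes.
kappa = cohen_kappa(tp=45, fp=5, fn=5, tn=45)
print(f"Kappa: {kappa:.2f}")
```

With balanced classes, 90% observed agreement against 50% chance agreement leaves a kappa well below the raw accuracy, which is why kappa is a stricter measure than accuracy alone.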
  18. 18. Marijuana Concentrate Related Tweeting Over Time [Map panels: End of 2014; Start of 2017] • Similar geographic patterns remained • 96% were personal communication tweets (2017 data) • Decrease in variability across states
  19. 19. Emerging Product: Rosin Tech • Rosin technique is a solventless method to produce marijuana concentrates • Involves use of pressure and heat (e.g., hair straightener or rosin tech press) to produce concentrates • Occurrences of “rosin” mentions in the eDrugTrends stream (03/2015-09/2016), before the “rosin” keyword was added
  20. 20. Rosin dabs: Preliminary data • Keyword “rosin” (excluding tweets that mention violin, brass, bow); Time period: December 6, 2016-February 22, 2017; 3,471 tweets collected (with identifiable state-level geo-location) YOOOO JUST PRESSED FOR THE FIRST TIME AND IT WAS LIFE CHANGING 🙏🙏🙏🙏🔥😩 flower rosin is the new fav The future is bright for #Rosin. #Marijuana #Cannabis Nice chunk of rosin to start this morning off 2017 goal....buy a house & rosin press. Marijuana rosin, and increasingly common extract: https://t.co/tXZNErOPta Rosin Tech Hash Is perfect for the people in non medical marijuana states where it's hard to come across quality BHO to dab.
  21. 21. Adjusted Proportions of Rosin-Related Tweets (Preliminary data, Dec. 6, 2016-Feb. 22, 2017) 84% - personal communication tweets 8% - media related 8% - retail related Great Variability: Mean: 1.96; Variance: 2.5
  22. 22. Exploring Cannabis-Related Tweeting Activity: Combining Content and Geographic Analysis Features
  23. 23. Cannabis Data, March–May, 2016 • Between March and May of 2016, the eDrugTrends platform collected 13,233,837 cannabis-related tweets. • About 30% (N=3,948,402) of those tweets had identifiable state-level geo-location information. • These U.S.-based tweets were posted by 965,610 unique users.
  24. 24. Content Classification and Analysis • Tweet content was automatically classified by: A. source (personal communication, media, retail) B. sentiment (positive, negative, neutral). • States were grouped by cannabis legalization policies into “recreational,” “medical, less restrictive,” “medical, more restrictive,” and “illegal.” • Permutation tests were performed to analyze differences among the four groups in: A. adjusted state proportions of all tweets, B. personal communications only, C. positive to negative sentiment ratios.
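A permutation test of the kind named above can be sketched as follows. The slides do not say which test statistic was used, so the between-group sum of squares here is an assumption, and the four groups of adjusted proportions are hypothetical values for illustration.

```python
import random

def between_group_ss(groups):
    """Between-group sum of squares: spread of group means around the grand mean."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

def permutation_test(groups, n_permutations=2000, seed=0):
    """Estimate a p-value by reshuffling group labels at random."""
    rng = random.Random(seed)
    observed = between_group_ss(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if between_group_ss(shuffled) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical adjusted proportions for the four policy groups.
policy_groups = [
    [3.0, 3.2, 3.1],  # recreational
    [2.0, 2.1, 2.2],  # medical, less restrictive
    [1.5, 1.4, 1.6],  # medical, more restrictive
    [1.0, 0.9, 1.1],  # illegal
]
p_value = permutation_test(policy_groups)
print(f"p = {p_value:.4f}")
```

A permutation test suits this setting because it makes no distributional assumptions about the state-level values and works with the small group sizes that result from splitting about 50 states four ways.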
  25. 25. Classification of States by Legal Status
  26. 26. Adjusted state proportions of cannabis-related tweets [Map legend: adjusted tweet rate per state in bands >3.0%, 2.5%-3.0%, 2.0%-2.49%, 1.5%-1.9%, 1.0%-1.49%; overlays mark states where medical marijuana is legal and where recreational marijuana is legal]
  27. 27. Tweet Content Classification Results Source/Type of communication • 76.2% were personal communications, • 21.1% media • 2.7% retail-related Sentiment • About 71% of personal communication tweets expressed positive sentiment towards cannabis, • 16% negative sentiment, • 13% were neutral.
  28. 28. Results of Permutation Test
  29. 29. Mapping Positive to Negative Sentiment Tweet Ratios
  30. 30. Conclusion • Social media data present exciting new opportunities for timely, sensitive and flexible approaches to epidemiological surveillance of drug use practices and trends. • Continued research is needed to establish methodological standards and practices to reduce the “noise” and increase reliability and validity of social media data. • Social media monitoring can be of particular value for tracking cannabis-related trends in the context of rapid policy changes.
  31. 31. Keep up with our research/publications: @ project page: http://wiki.knoesis.org/index.php/EDrugTrends or Google: eDrugTrends or Twitter: @eDrugTrends Thank you! Center for Interventions, Treatment, and Addiction Research (CITAR) https://medicine.wright.edu/citar Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) http://knoesis.org Sponsored by: Grant No. 5R01DA039454-02 Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use. Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the National Institutes of Health.
  32. 32. System architecture: eDrugTrends is an extension of the Twitris™ system developed at Kno.e.sis: http://twitris.knoesis.org © Wright State University
