Blasting 10 Big Data Myths with 10 Panel Data Examples
Please tweet! #ESOMAR @LoveStats
Big Data Myths
Presented by Annie Pettit
Chief Research Officer at Peanut Labs, a Research Now Group Company.
What is Big Data?
Volume, Velocity, Variety
• Research panel data
• Shopper/loyalty/transactional data
• Web tracking data
• Text, video, audio, date, time, $, ¢, coupons, loyalty card, SKU
• URL, click, save, download, lat/long
• Eye motion, brain wave, electrical pulse
• Every picosecond
http://giphy.com/search/gotta-go-fast
Strike Down the Myths!
Myth: Big Data is New
2015 supercomputer
• “Take on the biggest jobs, tasks other computer systems simply can’t handle”
• Performance: 173 petaflops
http://www.forbes.com/sites/sungardas/2015/04/14/the-amazing-super-powers-of-a-supercomputer/
Is 1985 New?
Supercomputer 30 years ago
• “Large memory and performance allows users to solve problems that cannot be solved with any other computer”
• Clock cycle: 4.1 nanoseconds
http://archive.computerhistory.org/resources/text/Cray/Cray.Cray2.1985.102646185.pdf
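The two slides quote different metrics: petaflops measure throughput (10^15 floating-point operations per second), while the 1985 figure is a clock cycle time. As a small illustration, a cycle time T converts to a clock frequency f = 1/T; the sketch below applies that to the 4.1 ns cycle quoted above.

```python
# Convert a clock cycle time to a clock frequency: f = 1 / T.
CRAY2_CYCLE_SECONDS = 4.1e-9  # 4.1 nanoseconds, as quoted on the slide


def cycle_time_to_mhz(cycle_seconds: float) -> float:
    """Return the clock frequency in MHz for a cycle time given in seconds."""
    return 1.0 / cycle_seconds / 1e6


cray2_mhz = cycle_time_to_mhz(CRAY2_CYCLE_SECONDS)
print(f"Cray-2 clock: {cray2_mhz:.1f} MHz")  # → Cray-2 clock: 243.9 MHz
```

A clock frequency is not a flop rate, so the two slides' numbers are not directly comparable; what they share is the claim to solve problems no other computer can.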
Big Data is only new to some
Just me:
• 2002: drugstore transactional database
• 2005: research panel database
• 2010: social media database
• 2015: research panel database
MRX:
• 1979: Texas International Airlines loyalty program
• 2002: Target advertising, Andrew Pole
• 2004: Walmart stocks stores for hurricanes
Myth: Big Data Is Better
Emotions, Attitudes, Beliefs
Myth: Volume trumps knowledge
PL’s average survey minutes:
Total: 4.7
Only surveys: 5.1
Only completes: 15.0
Only USA: 15.8
Only recent: 15.6
No test links: 16.5
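The point of these numbers is that an "average survey length" depends on which records you keep. A minimal sketch, assuming each filter is applied on top of the previous ones and using a handful of hypothetical records (the field names `kind`, `status`, `country`, `recent`, `is_test` and all values are invented for illustration, not Peanut Labs data):

```python
from statistics import mean

# Hypothetical session records; field names and values are illustrative only.
records = [
    {"kind": "survey",   "status": "complete",  "country": "USA", "recent": False, "is_test": False, "minutes": 14.0},
    {"kind": "survey",   "status": "complete",  "country": "USA", "recent": True,  "is_test": True,  "minutes": 2.0},
    {"kind": "survey",   "status": "screenout", "country": "USA", "recent": True,  "is_test": False, "minutes": 1.5},
    {"kind": "survey",   "status": "complete",  "country": "CAN", "recent": True,  "is_test": False, "minutes": 12.0},
    {"kind": "profiler", "status": "complete",  "country": "USA", "recent": True,  "is_test": False, "minutes": 0.5},
    {"kind": "survey",   "status": "complete",  "country": "USA", "recent": True,  "is_test": False, "minutes": 17.0},
]

# Filters in the order listed above, each applied to what the previous one kept.
filters = [
    ("Only surveys",   lambda r: r["kind"] == "survey"),
    ("Only completes", lambda r: r["status"] == "complete"),
    ("Only USA",       lambda r: r["country"] == "USA"),
    ("Only recent",    lambda r: r["recent"]),
    ("No test links",  lambda r: not r["is_test"]),
]


def cumulative_means(rows, steps):
    """Return (label, mean minutes) for all rows, then after each filter step."""
    results = [("Total", mean(r["minutes"] for r in rows))]
    for label, keep in steps:
        rows = [r for r in rows if keep(r)]
        results.append((label, mean(r["minutes"] for r in rows)))
    return results


for label, avg in cumulative_means(records, filters):
    print(f"{label:<15} {avg:5.2f}")
```

Each step changes the answer, even though the underlying table never changes; knowing which rows belong in the question matters more than how many rows there are.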
Myth: Big data is clean
One SQL table:
• N = 75 million
• Variables = 1,012
• Missing values = 53 million
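Counting missing values per variable is a cheap first check before trusting a large table. A sketch in plain Python over a few hypothetical rows (the column names are invented), where `None` marks a missing value:

```python
# Hypothetical rows; None marks a missing value. Column names are invented.
rows = [
    {"user_id": 1, "age": 34,   "income": None,  "region": "ON"},
    {"user_id": 2, "age": None, "income": 52000, "region": "BC"},
    {"user_id": 3, "age": 41,   "income": None,  "region": None},
]


def missing_profile(table):
    """Return missing counts per column, total missing cells and total cells."""
    columns = list(table[0])  # assumes every row has the same keys
    per_column = {col: sum(row.get(col) is None for row in table) for col in columns}
    total_missing = sum(per_column.values())
    total_cells = len(table) * len(columns)
    return per_column, total_missing, total_cells


per_column, total_missing, total_cells = missing_profile(rows)
print(per_column)
print(f"{total_missing} of {total_cells} cells missing ({total_missing / total_cells:.1%})")
```

On a real 75-million-row table the same idea would normally run inside the database (for example with `COUNT(*) - COUNT(column)` per column) rather than in Python.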
Big data has clean parts
Myth: Big data is the population
On March 25 at 12:10:16…
One second later, it was missing 3 records.
One minute later, it was missing 180 records.
One day later, it was missing 260,000 records.
Today, it is missing 15 million records.
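These figures are consistent with records arriving at a roughly constant 3 per second: 3 × 60 = 180 per minute, and 180 × 1,440 = 259,200 (about 260,000) per day. A sketch of that arithmetic, assuming a constant arrival rate inferred from the one-second figure (real traffic will not be perfectly constant):

```python
RATE_PER_SECOND = 3  # assumed constant arrival rate, inferred from the 1-second figure
SECONDS_PER_DAY = 86_400


def records_missed(elapsed_seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Records added after a snapshot, i.e. records the snapshot is missing."""
    return rate * elapsed_seconds


for label, seconds in [("1 second", 1), ("1 minute", 60), ("1 day", SECONDS_PER_DAY)]:
    print(f"{label:>8}: {records_missed(seconds):,.0f} records missing")

# If the rate stayed constant, how long until 15 million records are missing?
days_to_15m = 15_000_000 / RATE_PER_SECOND / SECONDS_PER_DAY
print(f"15 million records missing after about {days_to_15m:.0f} days")
```

Whatever the exact rate, any extract of a live table is a snapshot, not the population.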
Big Data is Never Complete
Myth: Data speed is everything
Completion Rate (Per Second)
Speed is awesome!
If you don’t care about…
• Coding errors
• Outliers
• Accuracy
• Exceptions
• Interactions
• Validity
• Generalizability
• Reliability
• Comprehension
Myth: Big Data Renders Science Obsolete
Remember non-random…
• Incomplete data
• Miscoded data
• Misplaced data
Remember random…
• +/- 5, 19 times out of 20
• p-values
• Type 1 and Type 2 errors
Total Research Error
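“+/- 5, 19 times out of 20” is the usual way of reporting a margin of error of ±5 percentage points at 95% confidence. For a proportion p estimated from a simple random sample of size n, the margin is z·√(p(1−p)/n) with z ≈ 1.96. The sketch below assumes simple random sampling and the worst case p = 0.5; it shows that a huge n shrinks only the random part of the error.

```python
import math

Z_95 = 1.96  # two-sided 95% ("19 times out of 20") normal critical value


def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Sampling margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(moe: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose margin of error is at most `moe`."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))


print(sample_size_for(0.05))                       # n needed for +/- 5 points
print(f"{margin_of_error(75_000_000) * 100:.3f}")  # +/- points at n = 75 million
```

With 75 million records the sampling margin is roughly ±0.01 percentage points, so it is negligible; incomplete, miscoded, or misplaced records are untouched by sample size, which is why the non-random errors dominate.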
Name That Software!
* Recode vote intention: 999 = no rule matched, 69 = unsure.
Compute VoteForHiggins=999.
If (Q2votepastYes=1 and Q4votetodayYes=1 and Q6voteHigginsLikely=1) VoteForHiggins=1.
If (Q2votepastYes=1 and Q4votetodayYes=1 and Q6voteHigginsUnlikely=1) VoteForHiggins=0.
If (Q2votepastYes=1 and Q4votetodayYes=1 and Q6voteHigginsUnsure=1) VoteForHiggins=69.
MISSING VALUES VoteForHiggins (69, 999).
Execute.
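The same recode can be written in any analysis language. A rough Python equivalent, assuming each answer is coded 1 when selected; the dictionary keys mirror the variable names above, and 69 and 999 are kept as codes and then excluded from analysis, as the MISSING VALUES line does. The helper names are hypothetical.

```python
# Codes mirroring the recode above: 69 = unsure, 999 = no rule matched.
UNSURE = 69
NOT_ASSIGNED = 999
MISSING_CODES = {UNSURE, NOT_ASSIGNED}


def vote_for_higgins(resp: dict) -> int:
    """Recode one respondent; later rules overwrite earlier ones, like sequential IFs."""
    code = NOT_ASSIGNED
    # A missing answer (None or absent key) never equals 1, so no rule fires.
    voter = resp.get("Q2votepastYes") == 1 and resp.get("Q4votetodayYes") == 1
    if voter and resp.get("Q6voteHigginsLikely") == 1:
        code = 1
    if voter and resp.get("Q6voteHigginsUnlikely") == 1:
        code = 0
    if voter and resp.get("Q6voteHigginsUnsure") == 1:
        code = UNSURE
    return code


def share_likely(codes):
    """Share of 1s among valid codes, excluding the missing-value codes."""
    valid = [c for c in codes if c not in MISSING_CODES]
    return sum(valid) / len(valid) if valid else None
```

Excluding the missing-value codes matters: averaging 69 and 999 together with 0/1 codes would produce a meaningless "share".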
Name That Software!
/* NOMISS: drop any observation with a missing value in a VAR variable */
PROC corr data = ResearchData.Client243 OUTP=ClientOutput nomiss;
VAR PurchaseIntent Recommend Different New Value;
TITLE2 'Correlations of Key Indicators';
RUN;
Name That Software!
-- Average completes per panelist, by recruitment date
Select RecruitDate, avg(CompletesPerPerson)
From
(select RecruitDate, count(*) as CompletesPerPerson
from CompleteDataBase
group by UserID, RecruitDate) RecruitData
Group by RecruitDate
Order by RecruitDate;
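A sketch that runs this query against a tiny in-memory SQLite table through Python's built-in `sqlite3` module, with RecruitDate included in the inner GROUP BY (many databases reject selecting a column that is neither grouped nor aggregated). The table contents are hypothetical and assume one row per completed survey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CompleteDataBase (UserID INTEGER, RecruitDate TEXT)")

# One row per completed survey (hypothetical data).
conn.executemany(
    "INSERT INTO CompleteDataBase VALUES (?, ?)",
    [(1, "2015-01-05")] * 3     # user 1: 3 completes
    + [(2, "2015-01-05")]       # user 2: 1 complete
    + [(3, "2015-02-10")] * 2   # user 3: 2 completes
    + [(4, "2015-02-10")] * 3,  # user 4: 3 completes
)

query = """
SELECT RecruitDate, AVG(CompletesPerPerson)
FROM (
    SELECT RecruitDate, COUNT(*) AS CompletesPerPerson
    FROM CompleteDataBase
    GROUP BY UserID, RecruitDate
) AS RecruitData
GROUP BY RecruitDate
ORDER BY RecruitDate
"""

results = conn.execute(query).fetchall()
conn.close()
for recruit_date, avg_completes in results:
    print(recruit_date, avg_completes)
```

SQLite's `AVG` always returns a floating-point value, so both dates come back as floats.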
Myth: Big Data is for the IT Department
1. RapidMiner
2. R
3. Excel
4. SQL
5. Python
6. Weka
7. KNIME
8. Hadoop
9. SAS base
10. SQL Server
http://www.kdnuggets.com/polls/2014/analytics-data-mining-data-science-software-used.html
Venn diagram: Math and Statistics; Subject Matter Expertise; Computer Science
http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html
BIG DATA is YOU and ME!
Myth: Big data requires a big budget
Software Costs
People Costs
• Marketing Manager: $60,000
• IT Product Manager: $80,000
• Research Scientist: $61,000
• Software Engineer: $60,000
• Statistician: $57,000
• Data Scientist: $70,000
http://www.payscale.com/research/CA/Job=Data_Scientist,_IT/Salary
Big Data…
• Is not new
• Is neither clean nor complete
• Does not trump knowledge
• Does not render science obsolete
• Is not just for IT
• Doesn’t win because of speed
• Does not require a huge budget
• Is not by definition better
What is Big Data Really?
Fast:
• Already fielded
• Already awesome sample sizes
• Already in a dataset
Actionable:
• Definable
• Measurable
• Changeable
• Awesomeable
Relevant:
• Your products
• Your clients
• Your key metrics
Thank you!
Annie Pettit
Chief Research Officer
annie@peanutlabs.com
ca.linkedin.com/in/AnniePettit/
facebook.com/AnniePettit
twitter.com/LoveStats
Jonathan Cheriff
Director of Sales & Marketing
jonathan.cheriff@peanutlabs.com
Find PeanutLabs on LinkedIn, Facebook, Twitter and YouTube
