Validating Data at Scale 
Spenser Skates 
CEO at Amplitude
Doing things at scale is noisy
• Code is supposed to run the same way every time, but if you run the same loop a million times on a million different machines, how confident are you that it will always behave the same?
Data from phones is noisier
• Apps run on tens of thousands of different platforms, with hundreds of thousands of different software configurations, on hundreds of millions of phones
• Some platforms have truly bizarre settings
How data can get messed up
• HTTP requests get mangled in transit
• The phone might not get the acknowledgement from the server
• People’s clocks are off
• People are running weird versions of Android
• Memory/disk corruption
• Gamma ray events
You can’t trust data from the client
Problem: Data gets mangled in transit
• Parameters from POST requests get dropped
• Within a single parameter, part of the data may never reach the server
Solution: Checksumming
• Send a checksum computed from all the fields
• If the checksum is wrong or missing, you know you haven’t received all the data, so tell the phone the upload wasn’t successful
• The phone will then attempt to reupload the data
Problem: Client sends the same data twice
• How does the phone know that the server has received the data, so it doesn’t reupload the same piece of data? It gets an acknowledgement back
• How does the server know that the phone has received the acknowledgement? It doesn’t!
• This is equivalent to the Two Generals’ Problem
• About 5% of the time, a request that the server receives successfully never gets its acknowledgement back to the phone
• That means all counts are inflated by about 5%!
Solution: Deduplication
• Your system must be idempotent at the event level: it must be able to receive an event it has seen before without changing its state
• Create a unique key for every event that is sent
• When you see an event, check whether its key is already in your list of keys; if it is, discard the event
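A minimal sketch of event-level idempotency, assuming the client mints a random UUID per event when the event is created and the server keeps an in-memory set of keys it has seen. The field name `event_key` and the names `new_event` and `DedupingStore` are hypothetical.

```python
import uuid


def new_event(event_type: str) -> dict:
    """Client side: assign the unique key once, when the event is created.

    The key is cached with the event, so every retry of the upload
    resends the same key instead of minting a new one.
    """
    # "event_key" is a hypothetical field name for this sketch.
    return {"event_key": str(uuid.uuid4()), "event_type": event_type}


class DedupingStore:
    """Server side: ingests events idempotently by their unique key."""

    def __init__(self) -> None:
        self.seen_keys: set[str] = set()
        self.events: list[dict] = []

    def ingest(self, event: dict) -> bool:
        """Store the event unless its key was seen before; True if stored."""
        key = event["event_key"]
        if key in self.seen_keys:
            return False  # duplicate from a retried upload: state unchanged
        self.seen_keys.add(key)
        self.events.append(event)
        return True


store = DedupingStore()
event = new_event("open_app")
store.ingest(event)  # first upload: stored
store.ingest(event)  # retry after a lost acknowledgement: discarded
print(len(store.events))  # prints: 1
```

The key must be generated when the event is logged, not when it is uploaded; otherwise each retry would carry a fresh key and slip past the check. An in-memory set is only the simplest stand-in; a real pipeline would likely need a persistent, time-bounded key store.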
Problem: Clocks are off
• Phones are often offline, so an analytics SDK needs to cache data locally before uploading, including the time each event occurred
• But people’s clocks are often off, occasionally by years!
• We can’t just use the upload time as the event time: 5% of data is uploaded more than 24 hours after the event happened
Solution: Get an estimate of the actual time an event was logged
• Have the phone timestamp each upload with its own clock
• For each event, compare:
    • The difference between the phone’s event timestamp and the server’s upload time
    • The difference between the phone’s upload timestamp and the server’s upload time
u For each event timestamp, subtract the difference between the 
phone’s upload time and the server’s upload time
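The correction on these slides amounts to corrected = event_phone - (upload_phone - upload_server). A minimal sketch with timezone-aware datetimes; the function and parameter names are hypothetical, and the example timestamps are made up for illustration.

```python
from datetime import datetime, timezone


def corrected_event_time(
    event_time_phone: datetime,
    upload_time_phone: datetime,
    upload_time_server: datetime,
) -> datetime:
    """Map a phone-clock event timestamp onto the server's clock.

    The phone's clock offset is estimated at upload time as
    (phone upload timestamp - server upload timestamp) and then
    subtracted from the event's phone timestamp.
    """
    clock_offset = upload_time_phone - upload_time_server
    return event_time_phone - clock_offset


utc = timezone.utc
# A phone whose clock runs 3 hours fast logs an event, then uploads 21 hours later.
fixed = corrected_event_time(
    event_time_phone=datetime(2015, 6, 1, 15, 0, tzinfo=utc),
    upload_time_phone=datetime(2015, 6, 2, 12, 0, tzinfo=utc),
    upload_time_server=datetime(2015, 6, 2, 9, 0, tzinfo=utc),
)
print(fixed.isoformat())  # prints: 2015-06-01T12:00:00+00:00
```

This is an estimate: the measured offset also absorbs network latency, and it assumes the phone's clock was not changed between the event and the upload.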
Other Problems
• People are running weird versions of Android
    • MD5 library
• Memory/disk corruption
• Gamma ray events
Clean Data
Questions? 
Always happy to talk about analytics problems! 
spenser@amplitude.com 
blog.amplitude.com 
twitter: @amplitudemobile 
MOBILE ANALYTICS FOR DECISION MAKERS

 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 

Validating big data at scale

  • 1. Validating Data at Scale: Spenser Skates, CEO at Amplitude
  • 2. Doing things at scale is noisy
    - Code is supposed to run the same way every time, but if you run the same loop a million times on a million different machines, how confident are you that it will always run the same?
  • 3. Data from phones is noisier
    - Running on tens of thousands of different platforms, with hundreds of thousands of different software configurations, on hundreds of millions of phones
    - Platforms have the craziest settings
  • 4. How data can get messed up
    - HTTP requests get mangled in transit
    - The phone might not get the acknowledgement from the server
    - People’s clocks are off
    - People are running weird versions of Android
    - Memory/disk corruption
    - Gamma ray events
  • 5. You can’t trust data from the client
  • 6. Problem: Data gets mangled in transit
    - Parameters from POST requests get dropped
    - Within a parameter, a chunk of the data may never actually reach the server
  • 7. Solution: Checksumming
    - Send a checksum that’s a function of all the fields
    - If the checksum is wrong or missing, you know you haven’t received all the data, so tell the phone the upload wasn’t successful
    - The phone will then attempt to re-upload the data
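A minimal sketch of the checksum round trip, in Python. It assumes MD5 over a canonical JSON encoding of the fields (the deck mentions an MD5 library on slide 15 but does not describe the actual scheme), and the field names `device_id` and `events` are hypothetical. MD5 is adequate for catching accidental corruption like this, but it is not a defense against deliberate tampering.

```python
import hashlib
import json
from typing import Dict, Optional


def compute_checksum(fields: Dict[str, str]) -> str:
    """Hash every field so that a dropped or truncated one changes the result."""
    # Canonical encoding (sorted keys, fixed separators) so client and server
    # hash exactly the same bytes for the same logical payload.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def accept_upload(fields: Dict[str, str], checksum: Optional[str]) -> bool:
    """Server side: accept only if the checksum is present and matches.

    Returning False tells the phone the upload failed, so it will retry.
    """
    if checksum is None:
        return False
    return compute_checksum(fields) == checksum


# Client computes the checksum over the fields it is about to send.
payload = {"device_id": "abc123", "events": '[{"type": "app_open"}]'}
sent_checksum = compute_checksum(payload)

print(accept_upload(payload, sent_checksum))                   # True: intact upload
print(accept_upload({"device_id": "abc123"}, sent_checksum))   # False: parameter dropped
print(accept_upload({**payload, "events": '[{"type": "app_'}, sent_checksum))  # False: truncated
print(accept_upload(payload, None))                            # False: checksum missing
```

Hashing a canonical encoding rather than the raw request body means the check does not depend on how the HTTP layer happens to order parameters.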
  • 8. Problem: Client sends the same data twice
    - How does the phone know that the server has received the data, so it doesn’t re-upload the same piece of data? It gets an acknowledgement back
    - How does the server know that the phone has received the acknowledgement? It doesn’t!
    - This is equivalent to the Two Generals’ Problem
    - 5% of the time, a request that the server successfully receives fails to get its acknowledgement back to the phone
    - That means all counts are inflated by about 5%!
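A rough back-of-the-envelope check on that figure, under assumptions the deck does not state: every upload reaches the server, each acknowledgement is lost independently with probability p, and the phone retries until one acknowledgement gets through. The number of times the server receives a given event is then geometric:

```latex
\mathbb{E}[\text{receipts per event}]
  = \sum_{k=1}^{\infty} k\,(1-p)\,p^{k-1}
  = \frac{1}{1-p},
\qquad
p = 0.05 \;\Rightarrow\; \frac{1}{0.95} \approx 1.053
```

So without deduplication, counts come out roughly 5% high, in line with the slide.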
  • 9. Solution: Deduplication
    - Your system must be idempotent at the event level: it must be able to receive an event it has received before without changing its state
    - Create a unique key for every event that is sent
    - When you see an event, check your list of keys; if the key is already present, discard the event
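A minimal in-memory sketch of event-level deduplication in Python. The field name `event_key`, generating the key as a random UUID on the phone when the event is created, and keeping seen keys in a Python set are all assumptions for illustration; at real scale the seen-key list would need to live in a persistent store shared by every ingestion server.

```python
import uuid
from typing import Dict, Set


def new_event(event_type: str) -> Dict[str, str]:
    """Client side: attach a unique key when the event is created, before caching.

    Every re-upload of this event then carries the same key.
    """
    return {"event_key": str(uuid.uuid4()), "type": event_type}


class EventIngester:
    """Server side: receiving the same event twice leaves state unchanged."""

    def __init__(self) -> None:
        self.seen_keys: Set[str] = set()  # hypothetical; persistent store in practice
        self.counts: Dict[str, int] = {}

    def ingest(self, event: Dict[str, str]) -> bool:
        key = event["event_key"]
        if key in self.seen_keys:
            return False  # duplicate re-upload: discard the event
        self.seen_keys.add(key)
        self.counts[event["type"]] = self.counts.get(event["type"], 0) + 1
        return True


ingester = EventIngester()
open_event = new_event("app_open")
ingester.ingest(open_event)  # first upload is counted
ingester.ingest(open_event)  # ack was lost and the phone re-uploaded: discarded
print(ingester.counts)       # {'app_open': 1}
```

The key has to be assigned once, on the phone, when the event is logged; a key generated per upload attempt would differ between retries and defeat the check.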
  • 10. Problem: Clocks are off
    - Phones are often offline, so an analytics SDK needs to cache data locally before uploading, including the time each event occurred
    - But people’s clocks are often off, occasionally by years!
    - We can’t just use the upload time as the event time: 5% of data is uploaded more than 24 hours after the event happened
  • 11. Solution: Get an estimate of the actual time an event was logged
    - Timestamp the upload on the phone
    - For each event, compare:
      - the difference between the phone’s event timestamp and the server upload time
      - the difference between the phone’s upload timestamp and the server upload time
  • 14. Solution: Get an estimate of the actual time an event was logged
    - For each event timestamp, subtract the difference between the phone’s upload time and the server’s upload time (that is, the phone’s clock offset)
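The correction on slides 11 and 14 can be sketched in Python as below. The function and parameter names are hypothetical, and the sketch assumes the phone’s clock offset stays constant between logging the event and uploading it, and that network latency is small relative to the offset; the slides don’t address either point.

```python
from datetime import datetime


def corrected_event_time(
    event_time_phone: datetime,
    upload_time_phone: datetime,
    upload_time_server: datetime,
) -> datetime:
    """Shift a phone-clock event timestamp onto the server's clock."""
    # How far the phone's clock is off, measured at upload time.
    # Assumes the offset is unchanged since the event and ignores latency.
    clock_offset = upload_time_phone - upload_time_server
    return event_time_phone - clock_offset


# Phone clock runs 2 hours fast; the event is uploaded a day after it happened.
event_ts = datetime(2015, 3, 1, 12, 0)          # phone clock
upload_ts_phone = datetime(2015, 3, 2, 14, 0)   # phone clock
upload_ts_server = datetime(2015, 3, 2, 12, 0)  # server clock
print(corrected_event_time(event_ts, upload_ts_phone, upload_ts_server))
# 2015-03-01 10:00:00
```

Because only the difference between the two upload timestamps matters, the same correction works whether the phone’s clock is off by minutes or by years.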
  • 15. Other problems
    - People are running weird versions of Android
    - MD5 library
    - Memory/disk corruption
    - Gamma ray events
  • 17. Questions? Always happy to talk about analytics problems!
    - spenser@amplitude.com
    - blog.amplitude.com
    - Twitter: @amplitudemobile
    - Mobile analytics for decision makers