SlideShare a Scribd company logo
1 of 35
Firefox
Crash
Reporting

laura@ mozilla.com
@lxt
Overview


• What is Socorro?
• Architecture
• Some numbers
• Work process and tools
• Future, questions
What is Socorro?
Socorro




          Very Large Array at Socorro, New Mexico, USA. Photo taken by Hajor, 08.Aug.2004. Released under cc.by.sa
          and/or GFDL. Source: http://en.wikipedia.org/wiki/File:USA.NM.VeryLargeArray.02.jpg
Typical use cases


 • What are the most common crashes for a product/version/
    channel?

 • What new crashes / regressions do we see emerging? What’s
    the cause of an emergent crash?

 • How crashy is one build compared to another?
 • What correlations do we see with a particular crash?
What else can we do?
• Does one build have more (null signature) crashes than other
   builds?
• Analyze differences between Flash versions x and y crashes
• Detect duplicate crashes
• Detect explosive crashes
• Find “frankeninstalls”
• Email victims of a particular crash
• Ad hoc reporting for tracking down chemspill bugs
Architecture
“Socorro has a lot of moving parts”
...
“I prefer to think of them as dancing parts”
Collector   Crashmover




 cronjobs     HBase        Monitor




Middleware   PostgreSQL   Processor




 Webapp
Scale
A different type of scaling:
• Typical webapp: scale to millions of users without
   degradation of response time
• Socorro: less than a hundred users, terabytes of data.
Basic law of scale still applies:
The bigger you get, the more spectacularly you fail
Firehose engineering

 • At peak we receive 2300 crashes per minute
 • 2.5 million per day
 • Median crash size 150k, max size 20MB (reject bigger)
   • Android crashes a bit bigger (~200k median)
 • ~500GB stored in PostgreSQL - metadata + generated reports
 • ~110TB stored in HDFS (3x replication, ~40TB of HBase data)
    - raw reports + processed reports
Implementation scale


• > 120 physical boxes (not cloud)
• ~8 developers + DBAs + sysadmin team +   QA + Hadoop ops/
  analysts

• Deploy weekly but could do continuous if needed - moving to
  continuous front end by end Q3
Lifetime of a crash


 • Breakpad submits raw crash via POST (metadata json +
    minidump)

 • Collected to disk by collector (web.py WSGI app)
 • Moved to HBase by crashmover
 • Noticed in queue by monitor and assigned for processing
Processing


• Processor spins off minidump stackwalk (MDSW)
• MDSW re-unites raw crash with symbols to generate a stack
• Processor generates a signature and pulls out other data
• Processor writes processed crash back to HBase and
  metadata to PostgreSQL
Back end processing

 Large number of cron jobs, e.g.:

  • Calculate aggregates: Top crashers by signature, crashes/
     100ADU/build

  • Process incoming builds from ftp server
  • Match known crashes to bugzilla bugs
  • Duplicate detection
  • Generate extracts (CSV) for further analysis (in CouchDB,
     f.e.)
Middleware


• Moving all data access to be through REST API
• (Still some queries in webapp)
• Enable other apps against the data platform and allow the
   core team to rewrite webapp more easily

• In upcoming version each component will have its own API
   for status and health checks
Webapp


• Hardest part is sometimes how to visualize the data
• Example: nightly builds, moving to reporting in build time
   rather than clock time

• Code a bit crufty: rewrite in progress
• Currently KohanaPHP, will be Django (playdoh)
Implementation details

 • Python 2.6
 • PHP 5.3
 • PostgreSQL 9.0.6, some stored procs in pgpl/sql
 • memcache for the webapp
 • Thrift for HBase access
 • HBase (CDH3)
 • Some bits of C++, Java, perl
Managing complexity
Development process


• Fork
• Hard to install (you can use a VM)
• Pull request with bugfix/feature
• Code review
• Lands on master
Development process -2

• Jenkins polls github master, picks up changes
• Jenkins runs tests, builds a package
• Package automatically picked up and pushed to dev
• Wanted changes merged to release branch
• Jenkins builds release branch, pushes to stage
• QA runs acceptance on stage (Selenium/Jenkins + manual)
• Push same build to production
Deployment =

• Run a single script with the build as
   parameter

• Pushes it out to all machines and restarts
   where needed
ABSOLUTELY CRITICAL

All the machinery for continuous deployment
even if you don’t want to deploy continuously
Configuration management


• Some releases involve a configuration change
• These are controlled and managed through Puppet
• Again, a single line to change config the same way every
   time

• Config controlled the same way on dev and stage; tested the
   same way; deployed the same way
Virtualization


 • You don’t want to install HBase
 • Use Vagrant to set up a virtual machine
 • Use Jenkins to build a new Vagrant VM with each code build
 • Use Puppet to configure the VM the same way as production
 • Tricky part is still getting the right (and enough) data
finally:
Upcoming

•   Lots more reports and visualizations: crash trends (done), Flash version reporting,
    better signature summaries, better correlation reports

•   ElasticSearch better search including faceting (code complete)

•   Dragnet: using crash data to populate a database of DLLs (code complete)

•   More analytics: automatic detection of explosive crashes (in progress), malware, etc

•   More ways to query data: API, Hive (done), Pig (done)

•   Better queueing: looking at Sagrada queue (from Firefox Notifications project)

•   Grand Unified Configuration System (done, implementing piecewise)
Everything is open (source)

  Site: https://crash-stats.mozilla.com
  Fork: https://github.com/mozilla/socorro
  Read/file/fix bugs: https://bugzilla.mozilla.org/
  Docs: http://www.readthedocs.org/docs/socorro
  Mailing list: https://lists.mozilla.org/listinfo/tools-socorro
  Join us in IRC: irc.mozilla.org #breakpad
Questions?




• Ask me, now or later
• laura@mozilla.com

More Related Content

What's hot

The ELK Stack - Get to Know Logs
The ELK Stack - Get to Know LogsThe ELK Stack - Get to Know Logs
The ELK Stack - Get to Know LogsGlobalLogic Ukraine
 
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarApache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarEnrico Olivelli
 
RSYSLOG v8 improvements and how to write plugins in any language.
RSYSLOG v8 improvements and how to write plugins in any language.RSYSLOG v8 improvements and how to write plugins in any language.
RSYSLOG v8 improvements and how to write plugins in any language.Rainer Gerhards
 
High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014Derek Collison
 
RESTEasy Reactive: Why should you care? | DevNation Tech Talk
RESTEasy Reactive: Why should you care? | DevNation Tech TalkRESTEasy Reactive: Why should you care? | DevNation Tech Talk
RESTEasy Reactive: Why should you care? | DevNation Tech TalkRed Hat Developers
 
Presto changes
Presto changesPresto changes
Presto changesN Masahiro
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLBjoern Rost
 
Reporting Large Environment Zabbix Database
Reporting Large Environment Zabbix DatabaseReporting Large Environment Zabbix Database
Reporting Large Environment Zabbix DatabaseAlain Ganuchaud
 
Grafana and MySQL - Benefits and Challenges
Grafana and MySQL - Benefits and ChallengesGrafana and MySQL - Benefits and Challenges
Grafana and MySQL - Benefits and ChallengesPhilip Wernersbach
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaJoe Stein
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Hacking on WildFly 9
Hacking on WildFly 9Hacking on WildFly 9
Hacking on WildFly 9JBUG London
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Codemotion
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPiyush Kumar
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Patternconfluent
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...HostedbyConfluent
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Ying Xu
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CIJukka Zitting
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggleconfluent
 

What's hot (20)

The ELK Stack - Get to Know Logs
The ELK Stack - Get to Know LogsThe ELK Stack - Get to Know Logs
The ELK Stack - Get to Know Logs
 
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarApache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
 
RSYSLOG v8 improvements and how to write plugins in any language.
RSYSLOG v8 improvements and how to write plugins in any language.RSYSLOG v8 improvements and how to write plugins in any language.
RSYSLOG v8 improvements and how to write plugins in any language.
 
High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014
 
RESTEasy Reactive: Why should you care? | DevNation Tech Talk
RESTEasy Reactive: Why should you care? | DevNation Tech TalkRESTEasy Reactive: Why should you care? | DevNation Tech Talk
RESTEasy Reactive: Why should you care? | DevNation Tech Talk
 
Presto changes
Presto changesPresto changes
Presto changes
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
 
Reporting Large Environment Zabbix Database
Reporting Large Environment Zabbix DatabaseReporting Large Environment Zabbix Database
Reporting Large Environment Zabbix Database
 
Grafana and MySQL - Benefits and Challenges
Grafana and MySQL - Benefits and ChallengesGrafana and MySQL - Benefits and Challenges
Grafana and MySQL - Benefits and Challenges
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Hacking on WildFly 9
Hacking on WildFly 9Hacking on WildFly 9
Hacking on WildFly 9
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery Talk
 
Evolution of deploy.sh
Evolution of deploy.shEvolution of deploy.sh
Evolution of deploy.sh
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CI
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggle
 

Similar to Firefox Crash Reporting (@ Open Source Bridge)

Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemJohn Efstathiades
 
Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Docker, Inc.
 
Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & howdotCloud
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.jsorkaplan
 
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDocker, Inc.
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h basehdhappy001
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Fluentd at HKOScon
Fluentd at HKOSconFluentd at HKOScon
Fluentd at HKOSconN Masahiro
 

Similar to Firefox Crash Reporting (@ Open Source Bridge) (20)

Migrating big data
Migrating big dataMigrating big data
Migrating big data
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
JavaScript Event Loop
JavaScript Event LoopJavaScript Event Loop
JavaScript Event Loop
 
Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded System
 
Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?
 
Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & how
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.js
 
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fluentd at HKOScon
Fluentd at HKOSconFluentd at HKOScon
Fluentd at HKOScon
 

Recently uploaded

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Recently uploaded (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Firefox Crash Reporting (@ Open Source Bridge)

  • 2. Overview • What is Socorro? • Architecture • Some numbers • Work process and tools • Future, questions
  • 4. Socorro Very Large Array at Socorro, New Mexico, USA. Photo taken by Hajor, 08.Aug.2004. Released under cc.by.sa and/or GFDL. Source: http://en.wikipedia.org/wiki/File:USA.NM.VeryLargeArray.02.jpg
  • 5.
  • 6.
  • 7.
  • 8. Typical use cases • What are the most common crashes for a product/version/ channel? • What new crashes / regressions do we see emerging? What’s the cause of an emergent crash? • How crashy is one build compared to another? • What correlations do we see with a particular crash?
  • 9. What else can we do? • Does one build have more (null signature) crashes than other builds? • Analyze differences between Flash versions x and y crashes • Detect duplicate crashes • Detect explosive crashes • Find “frankeninstalls” • Email victims of a particular crash • Ad hoc reporting for tracking down chemspill bugs
  • 11.
  • 12. “Socorro has a lot of moving parts” ... “I prefer to think of them as dancing parts”
  • 13. Collector Crashmover cronjobs HBase Monitor Middleware PostgreSQL Processor Webapp
  • 14. Scale
  • 15. A different type of scaling: • Typical webapp: scale to millions of users without degradation of response time • Socorro: less than a hundred users, terabytes of data.
  • 16. Basic law of scale still applies: The bigger you get, the more spectacularly you fail
  • 17. Firehose engineering • At peak we receive 2300 crashes per minute • 2.5 million per day • Median crash size 150k, max size 20MB (reject bigger) • Android crashes a bit bigger (~200k median) • ~500GB stored in PostgreSQL - metadata + generated reports • ~110TB stored in HDFS (3x replication, ~40TB of HBase data) - raw reports + processed reports
  • 18. Implementation scale • > 120 physical boxes (not cloud) • ~8 developers + DBAs + sysadmin team + QA + Hadoop ops/ analysts • Deploy weekly but could do continuous if needed - moving to continuous front end by end Q3
  • 19. Lifetime of a crash • Breakpad submits raw crash via POST (metadata json + minidump) • Collected to disk by collector (web.py WSGI app) • Moved to HBase by crashmover • Noticed in queue by monitor and assigned for processing
  • 20. Processing • Processor spins off minidump stackwalk (MDSW) • MDSW re-unites raw crash with symbols to generate a stack • Processor generates a signature and pulls out other data • Processor writes processed crash back to HBase and metadata to PostgreSQL
  • 21. Back end processing Large number of cron jobs, e.g.: • Calculate aggregates: Top crashers by signature, crashes/ 100ADU/build • Process incoming builds from ftp server • Match known crashes to bugzilla bugs • Duplicate detection • Generate extracts (CSV) for further analysis (in CouchDB, f.e.)
  • 22. Middleware • Moving all data access to be through REST API • (Still some queries in webapp) • Enable other apps against the data platform and allow the core team to rewrite webapp more easily • In upcoming version each component will have its own API for status and health checks
  • 23. Webapp • Hardest part is sometimes how to visualize the data • Example: nightly builds, moving to reporting in build time rather than clock time • Code a bit crufty: rewrite in progress • Currently KohanaPHP, will be Django (playdoh)
  • 24. Implementation details • Python 2.6 • PHP 5.3 • PostgreSQL 9.0.6, some stored procs in pgpl/sql • memcache for the webapp • Thrift for HBase access • HBase (CDH3) • Some bits of C++, Java, perl
  • 26. Development process • Fork • Hard to install (you can use a VM) • Pull request with bugfix/feature • Code review • Lands on master
  • 27. Development process -2 • Jenkins polls github master, picks up changes • Jenkins runs tests, builds a package • Package automatically picked up and pushed to dev • Wanted changes merged to release branch • Jenkins builds release branch, pushes to stage • QA runs acceptance on stage (Selenium/Jenkins + manual) • Push same build to production
  • 28. Deployment = • Run a single script with the build as parameter • Pushes it out to all machines and restarts where needed
  • 29. ABSOLUTELY CRITICAL All the machinery for continuous deployment even if you don’t want to deploy continuously
  • 30. Configuration management • Some releases involve a configuration change • These are controlled and managed through Puppet • Again, a single line to change config the same way every time • Config controlled the same way on dev and stage; tested the same way; deployed the same way
  • 31. Virtualization • You don’t want to install HBase • Use Vagrant to set up a virtual machine • Use Jenkins to build a new Vagrant VM with each code build • Use Puppet to configure the VM the same way as production • Tricky part is still getting the right (and enough) data
  • 33. Upcoming • Lots more reports and visualizations: crash trends (done), Flash version reporting, better signature summaries, better correlation reports • ElasticSearch better search including faceting (code complete) • Dragnet: using crash data to populate a database of DLLs (code complete) • More analytics: automatic detection of explosive crashes (in progress), malware, etc • More ways to query data: API, Hive (done), Pig (done) • Better queueing: looking at Sagrada queue (from Firefox Notifications project) • Grand Unified Configuration System (done, implementing piecewise)
  • 34. Everything is open (source) Site: https://crash-stats.mozilla.com Fork: https://github.com/mozilla/socorro Read/file/fix bugs: https://bugzilla.mozilla.org/ Docs: http://www.readthedocs.org/docs/socorro Mailing list: https://lists.mozilla.org/listinfo/tools-socorro Join us in IRC: irc.mozilla.org #breakpad
  • 35. Questions? • Ask me, now or later • laura@mozilla.com

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n