METRICS-DRIVEN ENGINEERING at Etsy
Kellan Elliott-McCrea, VP of Eng.
kellan@etsy.com @kellan




Tuesday, June 5, 12
What is Etsy?



8.5+ million items in the marketplace

400,000+ active

$300+ million in sales in 2010
~$41 million/month

> $1000 / minute

> 1 billion page views / month

business in over 150 countries

deploy the site every ~20 minutes

engineering team grew ~4x in 2010


Metrics?

Logs, Graphs, Trends, and Correlations

Metrics Driven?

Making Decisions

How many visitors are using this thing?

Can we deploy that to 100% of our visitors?

Did we make it faster?

Did I just break something?


Q. WHO MAKES THESE GRAPHS?
A. Well, the Ops team manages the racks, the network, the servers, installed monitoring tools, wears the pagers, blah, blah, blah...




but... Engineers build the application.

Dev + Ops

ACCESS

Yes!   No.

“Engineers are too busy!”

Here’s the BIG SECRET...

... MAKE IT EASY!

Simple, open source tools


Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)



Ganglia
★cluster oriented
★huge community contributed recipes
★2.0 released today (including several Flickr and Etsy patches!)
★gmetad makes it easy to track custom metrics


Graphite
★super flexible collection and display
★per metrics buckets
★single instance
★super easy to write and use custom display functions



Logging

Logger::log_error("User login failed. Reason: $msg for $username", "login");




web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ...








web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ...








Counting and Timing
http://code.flickr.com/blog/2008/10/27/counting-timing/




Logster
https://github.com/etsy/logster




Forked from ganglia-logtailer:
- Daemon mode (only cron mode)
+ Support for Graphite
+ Simplified parsing scripts




web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0201        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0034        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web1101        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0201        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0055        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0002        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
       web0089        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0020        [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
       web1101        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0055        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0034        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0087        [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
       web0002        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0201        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0077        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0355        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0052        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0003        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0066        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling
Fatals   Errors   Warnings




★runs out of cron
★maintains a cursor into log files
★supports ganglia and graphite
★custom parsers much easier to write than gmetad
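
The idea behind the Fatals / Errors / Warnings graph can be sketched as a small parser that tallies severities between cron runs. This is a standalone illustration loosely modeled on the parse-a-line, report-state shape of a Logster parser; it does not import Logster, and the class name, regex, and return format are assumptions made for the example.

```python
import re
from collections import Counter

# Matches lines shaped like the sample log above, e.g.
#   web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
LINE_RE = re.compile(r"^\S+\s+\[[^\]]*\]\s+\[(?P<level>warning|error|fatal)\]")


class SeverityCounter:
    """Hypothetical parser: tallies log lines by severity between reports."""

    def __init__(self):
        self.counts = Counter()

    def parse_line(self, line):
        """Called once per new log line; unrecognized lines are ignored."""
        match = LINE_RE.match(line)
        if match:
            self.counts[match.group("level")] += 1

    def get_state(self, duration_seconds):
        """Return per-second rates for the interval, then reset the tallies."""
        rates = {
            level: self.counts[level] / duration_seconds
            for level in ("fatal", "error", "warning")
        }
        self.counts.clear()
        return rates


sample = [
    "web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!",
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.",
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
]
parser = SeverityCounter()
for line in sample:
    parser.parse_line(line)
rates = parser.get_state(60)  # rates over a 60-second reporting interval
```

Each rate would then be shipped to Ganglia or Graphite as its own metric, one series per severity.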




Apache access logs

LogFormat "%h %l %u %t \"%r\" %>s %b" common




LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined

%{etsy_ab_selections}n

%{etsy_uaid}n




Graphs

“If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/




StatsD
https://github.com/etsy/statsd/




StatsD::increment("logins.success");
StatsD::timing("gearman.time", $msec);
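
What makes these calls cheap is the wire format: each one becomes a single small UDP datagram. A minimal sketch of that format, assuming StatsD's default port 8125 and a daemon on localhost; the helper names are made up for the example and are not the Etsy PHP client.

```python
import socket


def format_counter(name, delta=1):
    """Counter packet: '<name>:<delta>|c'."""
    return f"{name}:{delta}|c"


def format_timer(name, milliseconds):
    """Timer packet: '<name>:<ms>|ms'."""
    return f"{name}:{milliseconds}|ms"


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP; nothing waits for a reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet.encode("ascii"), (host, port))
    finally:
        sock.close()


counter_packet = format_counter("logins.success")
timer_packet = format_timer("gearman.time", 320)

if __name__ == "__main__":
    # Assumes a StatsD daemon is listening locally; with UDP the send
    # succeeds either way and the packet is simply dropped if nobody listens.
    send(counter_packet)
    send(timer_packet)
```

Because the transport is UDP, instrumenting a code path adds almost no latency and cannot take the application down if the metrics host is unreachable.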




90th pct   average   lower
StatsD::timing("gearman.time", $msec);
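
The three series in that graph come from summarizing all timings received during one flush interval. A rough sketch of that summary follows; the function name is hypothetical, the timings are made-up sample data, and the exact rounding rule StatsD applies at the percentile cutoff may differ from the one used here.

```python
def summarize(timings_ms, pct=90):
    """Summarize one flush interval's worth of timer samples."""
    values = sorted(timings_ms)
    if not values:
        return None
    # Keep the fastest pct% of samples; the slowest of those is the threshold.
    keep = max(1, round(len(values) * pct / 100))
    return {
        "lower": values[0],
        "mean": sum(values) / len(values),
        f"upper_{pct}": values[keep - 1],
        "upper": values[-1],
    }


# Made-up sample timings in milliseconds.
stats = summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
```

Plotting the 90th percentile next to the average shows whether a slowdown affects everyone or only the slowest requests.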




Ad hoc
name value timestamp




echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
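
The same "name value timestamp" line can be sent from application code. A minimal sketch, assuming a Carbon listener on the plaintext port 2003 as in the one-liner; the function names and the example host are placeholders, not part of any Etsy library.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """One plaintext-protocol line: 'name value timestamp', newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {timestamp}\n"


def send_metric(line, host, port=2003, timeout=2.0):
    """Open a TCP connection, write one line, and close (what `nc` does)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


# A fixed timestamp keeps the example deterministic.
line = format_metric("events.deploy.site", 1, timestamp=1299274068)

# Hypothetical host; uncomment only where a Carbon listener is reachable.
# send_metric(line, "graphite.example.com")
```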




Correlations




Trends + Events
target=drawAsInfinite(events.deploy.site)




What Happened?

Holt-Winters

"Forecasting Sales by Exponentially Weighted Moving Averages". Peter

"Aberrant Behavior Detection in Time Series for Network Monitoring".

"Holt-Winters Forecasting Applied to Poisson Processes in Real-Time".



holtWintersConfidence(Upper|Lower)

holtWintersAberration

business metrics with confidence bands == alertable business metrics
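
To make "confidence bands == alertable" concrete, here is a deliberately simplified sketch: an exponentially smoothed level with a smoothed absolute-deviation band around it. It omits the trend and seasonal terms, so it is not Holt-Winters proper and not Graphite's implementation; the parameter values and the sample series are made up for illustration.

```python
def band_alerts(series, alpha=0.3, gamma=0.3, width=3.0, warmup=5):
    """Return (index, value, lower, upper) for points outside the band.

    alpha  smoothing factor for the level (the running forecast)
    gamma  smoothing factor for the absolute deviation
    width  band half-width, in units of the smoothed deviation
    warmup number of initial points to observe before alerting
    """
    level = float(series[0])
    deviation = 0.0
    alerts = []
    for i, value in enumerate(series[1:], start=1):
        lower = level - width * deviation
        upper = level + width * deviation
        if i >= warmup and not (lower <= value <= upper):
            alerts.append((i, value, lower, upper))
        # Update the deviation and level only after checking the point.
        deviation = gamma * abs(value - level) + (1 - gamma) * deviation
        level = alpha * value + (1 - alpha) * level
    return alerts


# Made-up series: steady noise, then a sudden drop at index 9.
sales_per_minute = [100, 103, 97, 103, 97, 103, 97, 103, 97, 40, 100]
alerts = band_alerts(sales_per_minute)
```

Only the drop at index 9 falls outside the band; ordinary noise stays inside it, which is what lets a business metric page someone without a hand-tuned static threshold.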


16,000 metrics in GRAPHITE
(plus 32,000 metrics in GANGLIA)




Dashboards



Hard
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
<img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
</a>




Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
$g->showDeploys(true);
echo $g->getDashboardHTML(280, 220);
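
A wrapper like this mostly just assembles the render URL so nobody has to hand-encode it. A minimal sketch of the idea, assuming the same query parameters as the hand-written URL; the function name, its arguments, and the example host are hypothetical and are not Etsy's Graphite class.

```python
from urllib.parse import urlencode


def render_url(base, title, metric, deploy_series=(), width=280, height=220,
               hours=1):
    """Build a Graphite /render URL with deploy events as vertical lines."""
    params = [
        ("from", f"-{hours}hours"),
        ("width", width),
        ("height", height),
        ("title", title),
        ("yMin", 0),
        ("target", metric),
    ]
    # Each deploy series becomes its own drawAsInfinite(...) target.
    params += [("target", f"drawAsInfinite({name})") for name in deploy_series]
    return f"{base}/render?{urlencode(params)}"


url = render_url(
    "http://graphite.example.com",  # hypothetical host
    "File Not Found",
    "webs.errorLog.notExist",
    deploy_series=["deploys.web.production"],
)
```

Hiding the URL encoding behind a few calls is what lowers the cost of a new dashboard enough that engineers build their own.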




48 dashboards by 32 engineers

Application health

High-level visibility

Low MTTD

Confidence

Make metrics

Not that much


codeascraft.etsy.com
github.com/etsy/statsd
github.com/etsy/logster
bitbucket.org/maplebed/ganglia-logtailer




Questions?




Metrics driven engineering (velocity 2011)
