SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
"GameDay"
Achieving resilience through Chaos Engineering
Matt Fellows
@matthewfellows
#AAGameDay
#ChaosTesting
Pete Cohen
@petecohen
What is the common thread for these catastrophes?
#1 They all combined Technology with People + Process
#2 They all had multiple causes
Overview
■ Why GameDay exercises?
■ Case Studies
■ How you can run one for yourself
■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process
Classes of issues
User Interface Mobile
API Gateway
Mainframe / DB
Middleware /
APIs
VerticalSlice
Pace layered architecture
User Interface Mobile
API Gateway
Mainframe / DB
Middleware /
APIs
VerticalSlice
Bug
bug
User Interface Mobile
API Gateway
Mainframe / DB
Middleware /
APIs
VerticalSlice
Integration Issues
integration
User Interface Mobile
API Gateway
Mainframe / DB
Middleware /
APIs
VerticalSlice
Distributed Failures
distributed
User Interface Mobile
API Gateway
Mainframe / DB
Middleware /
APIs
VerticalSlice
Catastrophes
Customers
Engineers
Call Centre
bug
distributed
...
integration
Public Relations
Classes of issues
■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process
So how do we avoid becoming front page news?
(the bad kind)
Fragility vs Resilience
Resilience vs Antifragility
Embracing Failure
■ We need to practice failure
■ Software Engineering needs its Fire Drill
An exercise where we place our systems
- technology, people + processes -
under stress in order to
learn and improve resilience.
GameDay
A GameDay manifesto?
DR GameDays
Driver Process Continuous Improvement
Approach Run sheet + requirements Loose plan + a little chaos
Focus Infrastructure Customer
Who Operations Cross functional,
multi-disciplinary team
Assumption System is built to a
robust design
System is hazardous
Once you finally start succeeding at agile…
Iterative software development
Independent feature teams
Nimble architectures
Distributed, scalable infrastructure
We want to inspire you
to give GameDays a go
Case Studies
Case Study: SEEK & nib
Logistics - how to plan a GameDay
dius.com.au/resources/game-day
■ People and roles to get involved
■ Preparation workshops and planning
■ Templates and checklist
■ Physical space set up
Get
buy in
Find
the
right
people
Run
workshops
Logistical
preparation
Run
the
GameDay
Communicate
and act on
outcomes
Get
buy in
Find
the
right
people
Run
workshops
Logistical
preparation
Run
the
GameDay
Communicate
and act on
outcomes
Decide
which
broad
areas to
test
Identify
scenarios
Capture
hypotheses
Formulate an
action plan to
set up
scenarios
Scenario and hypothesis generation workshop
Get a
common
view of
the stack
Load
Balancer
API API API API
Load
Balancer
Load Balancer
Load
Balancer
Load
Balancer
Post Mortem
Post Mortem
Load
Balancer
API API API API
Load Balancer Load Balancer Load Balancer Load Balancer
X X X XNo visibility!
✅ ✅ ✅ ✅
X
Release Dashboard
Ingredients for catastrophe
✓Introduction of a change to the system
✓Human error
✓Missing local controls (tests) to prevent syntax issue
✓Lack of salient information for operator (monitoring and alerting)
✓Opportunity to misinterpret data
✓Distance between expert and operator (process)
What did we learn?
■ Just getting teams together to discuss resilience
was worthwhile
■ We always found something
■ Our experiments reduced the impact of
hindsight bias
What matters:
■ Cross-functional team
■ Planning
■ Open to exposing failure
■ Customer focus
■ Bake it in - do GameDays frequently
What doesn’t matter:
■ Size of team/company
■ Waterfall/Agile
■ Language, technology...
Are GameDays the new hack days?
■ Collaboration
■ Problem solving
■ Creates business value
The journey towards automated resilience testing
Pre-Production:
■Create local experiments in Docker
■Manual chaos in integrated
environments
Production:
■Start small!
■Metrics-driven approach
Chaos Kong
pumba
Matt Fellows @matthewfellows mfellows@dius.com.au
Pete Cohen @petecohen pcohen@dius.com.au
For links, references, templates and your GameDay toolkit, head to:
dius.com.au/resources/game-day
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 

Was ist angesagt? (20)

Chaos engineering
Chaos engineering Chaos engineering
Chaos engineering
 
Demystifying DevSecOps
Demystifying DevSecOpsDemystifying DevSecOps
Demystifying DevSecOps
 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft Azure
 
GameDay: Creating Resiliency Through Destruction - LISA11
GameDay: Creating Resiliency Through Destruction - LISA11GameDay: Creating Resiliency Through Destruction - LISA11
GameDay: Creating Resiliency Through Destruction - LISA11
 
(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive
 
Amazon ECS
Amazon ECSAmazon ECS
Amazon ECS
 
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos Engineering
 
Introducing DevOps, IT Sharing Session 20 Nov 2017
Introducing DevOps, IT Sharing Session 20 Nov 2017Introducing DevOps, IT Sharing Session 20 Nov 2017
Introducing DevOps, IT Sharing Session 20 Nov 2017
 
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
 
Introduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code ServicesIntroduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code Services
 
Terraform
TerraformTerraform
Terraform
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 
Creating AWS infrastructure using Terraform
Creating AWS infrastructure using TerraformCreating AWS infrastructure using Terraform
Creating AWS infrastructure using Terraform
 
Chaos Engineering with Kubernetes
Chaos Engineering with KubernetesChaos Engineering with Kubernetes
Chaos Engineering with Kubernetes
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as Code
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
 
Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSC
 
AWS를 위한 도커, 컨테이너 (이미지) 환경 보안 방안 - 양희선 부장, TrendMicro :: AWS Summit Seoul 2019
AWS를 위한 도커, 컨테이너 (이미지) 환경 보안 방안 - 양희선 부장, TrendMicro :: AWS Summit Seoul 2019AWS를 위한 도커, 컨테이너 (이미지) 환경 보안 방안 - 양희선 부장, TrendMicro :: AWS Summit Seoul 2019
AWS를 위한 도커, 컨테이너 (이미지) 환경 보안 방안 - 양희선 부장, TrendMicro :: AWS Summit Seoul 2019
 

Ähnlich wie GameDay - Achieving resilience through Chaos Engineering

From NASA to Startups to Big Commerce
From NASA to Startups to Big CommerceFrom NASA to Startups to Big Commerce
From NASA to Startups to Big Commerce
Daniel Greenfeld
 
Sustainable Agile Development
Sustainable Agile DevelopmentSustainable Agile Development
Sustainable Agile Development
Gabriele Lana
 
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
Jesse Robbins
 
Stream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentationStream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentation
streambase
 
Skillshare - From Noob to Tech CEO - nov 7th, 2011
Skillshare - From Noob to Tech CEO - nov 7th, 2011Skillshare - From Noob to Tech CEO - nov 7th, 2011
Skillshare - From Noob to Tech CEO - nov 7th, 2011
Kareem Amin
 

Ähnlich wie GameDay - Achieving resilience through Chaos Engineering (20)

From NASA to Startups to Big Commerce
From NASA to Startups to Big CommerceFrom NASA to Startups to Big Commerce
From NASA to Startups to Big Commerce
 
Sustainable Agile Development
Sustainable Agile DevelopmentSustainable Agile Development
Sustainable Agile Development
 
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
Jesse Robbins Keynote - Hacking Culture @ Cloud Expo Europe 2013
 
Starting a Software Developer Career
Starting a Software Developer CareerStarting a Software Developer Career
Starting a Software Developer Career
 
Stream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentationStream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentation
 
How Product Managers Thrive in a DevOps World
How Product Managers Thrive in a DevOps WorldHow Product Managers Thrive in a DevOps World
How Product Managers Thrive in a DevOps World
 
Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous Deployment
 
Brand Commerce - We all know the shiny stuff at the front. But what magic is ...
Brand Commerce - We all know the shiny stuff at the front. But what magic is ...Brand Commerce - We all know the shiny stuff at the front. But what magic is ...
Brand Commerce - We all know the shiny stuff at the front. But what magic is ...
 
Dojo 1.0: Great Experiences For Everyone
Dojo 1.0: Great Experiences For EveryoneDojo 1.0: Great Experiences For Everyone
Dojo 1.0: Great Experiences For Everyone
 
Paging, Alerting, Chaos Eng Overview
Paging, Alerting, Chaos Eng OverviewPaging, Alerting, Chaos Eng Overview
Paging, Alerting, Chaos Eng Overview
 
How to Manage the Risk of your Polyglot Environments
How to Manage the Risk of your Polyglot EnvironmentsHow to Manage the Risk of your Polyglot Environments
How to Manage the Risk of your Polyglot Environments
 
Software Architecture and Predictive Models in R
Software Architecture and Predictive Models in RSoftware Architecture and Predictive Models in R
Software Architecture and Predictive Models in R
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 
Engineering Software and Software Lifecycle
Engineering Software and Software LifecycleEngineering Software and Software Lifecycle
Engineering Software and Software Lifecycle
 
Computational Patterns of the Cloud - QCon NYC 2014
Computational Patterns of the Cloud - QCon NYC 2014Computational Patterns of the Cloud - QCon NYC 2014
Computational Patterns of the Cloud - QCon NYC 2014
 
Skillshare - From Noob to Tech CEO - nov 7th, 2011
Skillshare - From Noob to Tech CEO - nov 7th, 2011Skillshare - From Noob to Tech CEO - nov 7th, 2011
Skillshare - From Noob to Tech CEO - nov 7th, 2011
 
Teaching Elephants to Dance (and Fly!): A Developer's Journey to Digital Tran...
Teaching Elephants to Dance (and Fly!): A Developer's Journey to Digital Tran...Teaching Elephants to Dance (and Fly!): A Developer's Journey to Digital Tran...
Teaching Elephants to Dance (and Fly!): A Developer's Journey to Digital Tran...
 
Building products - A Nifty Approach
Building products - A Nifty ApproachBuilding products - A Nifty Approach
Building products - A Nifty Approach
 
20160913 cookpad ios_en
20160913 cookpad ios_en20160913 cookpad ios_en
20160913 cookpad ios_en
 
So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016
 

Mehr von DiUS

Mehr von DiUS (20)

Lunch and Learn: You have the data, now what?
Lunch and Learn: You have the data, now what?Lunch and Learn: You have the data, now what?
Lunch and Learn: You have the data, now what?
 
How to build confidence in your release cycle
How to build confidence in your release cycleHow to build confidence in your release cycle
How to build confidence in your release cycle
 
Serverless microservices: Test smarter, not harder
Serverless microservices: Test smarter, not harderServerless microservices: Test smarter, not harder
Serverless microservices: Test smarter, not harder
 
Test Smart, not hard
Test Smart, not hardTest Smart, not hard
Test Smart, not hard
 
10 things-to-inspire-in-10-mins
10 things-to-inspire-in-10-mins10 things-to-inspire-in-10-mins
10 things-to-inspire-in-10-mins
 
Trends and development practices in Serverless architectures
Trends and development practices in Serverless architecturesTrends and development practices in Serverless architectures
Trends and development practices in Serverless architectures
 
Deploying large-scale, serverless and asynchronous systems - without integrat...
Deploying large-scale, serverless and asynchronous systems - without integrat...Deploying large-scale, serverless and asynchronous systems - without integrat...
Deploying large-scale, serverless and asynchronous systems - without integrat...
 
The Diversity Dilemma - Supporting our Sisters in STEM
The Diversity Dilemma - Supporting our Sisters in STEMThe Diversity Dilemma - Supporting our Sisters in STEM
The Diversity Dilemma - Supporting our Sisters in STEM
 
The case for consumer-driven contracts
The case for consumer-driven contractsThe case for consumer-driven contracts
The case for consumer-driven contracts
 
Deploy with Confidence using Pact Go!
Deploy with Confidence using Pact Go!Deploy with Confidence using Pact Go!
Deploy with Confidence using Pact Go!
 
Crafting Quality Software
Crafting Quality SoftwareCrafting Quality Software
Crafting Quality Software
 
Metrics on the front, data in the back
Metrics on the front, data in the backMetrics on the front, data in the back
Metrics on the front, data in the back
 
Antifragility and testing for distributed systems failure
Antifragility and testing for distributed systems failureAntifragility and testing for distributed systems failure
Antifragility and testing for distributed systems failure
 
DIY IoT Backend
DIY IoT BackendDIY IoT Backend
DIY IoT Backend
 
How to Build Hardware Lean
How to Build Hardware LeanHow to Build Hardware Lean
How to Build Hardware Lean
 
Behaviour Change and Coaching: What we can learn from BJ Fogg
Behaviour Change and Coaching: What we can learn from BJ FoggBehaviour Change and Coaching: What we can learn from BJ Fogg
Behaviour Change and Coaching: What we can learn from BJ Fogg
 
Power in Agile Teams
Power in Agile Teams Power in Agile Teams
Power in Agile Teams
 
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
 
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary SlidesRise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
 
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

GameDay - Achieving resilience through Chaos Engineering