This document summarizes a presentation given by Gene Kim on infosec and DevOps. It discusses research that found high performing IT organizations have fewer security issues and implement changes more successfully. The presentation introduces the concepts of Rugged software development and DevOps. It provides an overview of how to implement DevOps through systems thinking, amplifying feedback loops, and developing a culture of experimentation. Key aspects include integrating operations, security and development teams and processes. The goal is to reduce issues and improve flow to help the business.
3. Agenda
Background of research
The big unsolved problem
What is Rugged?
What is DevOps?
How do you do Rugged DevOps?
Things you can do right away
4. High Performing IT Organizations
High performers maintain a posture of compliance
Fewest number of repeat audit findings
One-third amount of audit preparation effort
High performers find and fix security breaches faster
5 times more likely to detect breaches by automated control
5 times less likely to have breaches result in a loss event
When high performers implement changes…
14 times more changes
One-half the change failure rate
One-quarter the first fix failure rate
10x faster MTTR for Sev 1 outages
When high performers manage IT resources…
One-third the amount of unplanned work
8 times more projects and IT services
6 times more applications
Source: IT Process Institute, 2008
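The change and outage measures on this slide can be tracked from ordinary change and incident records. The following is a minimal sketch in Python, not the ITPI study's method: the record types and field names (`failed`, `opened`, `restored`) and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str
    failed: bool  # hypothetical flag: the change caused an incident or was rolled back


@dataclass
class Outage:
    opened: datetime    # when service was lost
    restored: datetime  # when service was restored


def change_failure_rate(changes: list[Change]) -> float:
    """Fraction of changes that failed (0.0 when there are no changes)."""
    if not changes:
        return 0.0
    return sum(c.failed for c in changes) / len(changes)


def mean_time_to_restore(outages: list[Outage]) -> timedelta:
    """Average time from outage start to service restoration (MTTR)."""
    if not outages:
        return timedelta(0)
    total = sum((o.restored - o.opened for o in outages), timedelta(0))
    return total / len(outages)


# Made-up sample records, for illustration only.
changes = [Change("c1", False), Change("c2", True), Change("c3", False), Change("c4", False)]
outages = [
    Outage(datetime(2012, 1, 5, 9, 0), datetime(2012, 1, 5, 10, 0)),
    Outage(datetime(2012, 2, 1, 14, 0), datetime(2012, 2, 1, 14, 30)),
]
print(change_failure_rate(changes))   # 1 failed change out of 4
print(mean_time_to_restore(outages))  # average of a 60-minute and a 30-minute outage
```

Tracking these two numbers over time gives a simple baseline against which the improvements described later can be measured.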
5. Visible Ops: Playbook of High Performers
The IT Process Institute has
been studying high-performing
organizations since 1999
What is common to all the high
performers?
What is different between them
and average and low
performers?
How did they become great?
Answers have been codified in
the Visible Ops Methodology
www.ITPI.org
6. 2007: Three Controls Predict 60% Of
Performance
To what extent does an organization
define, monitor and enforce the following?
Standardized configuration strategy
Process discipline
Controlled access to production systems
Source: IT Process Institute, 2008
7. The Downward Spiral
Operations Sees…
Fragile applications are prone to failure
Long time required to figure out “which bit got flipped”
Detective control is a salesperson
Too much time required to restore service
Too much firefighting and unplanned work
Urgent security rework and remediation
Planned project work cannot complete
Frustrated customers leave
Market share goes down
Business misses Wall Street commitments
Business makes even larger promises to Wall Street
Dev Sees…
More urgent, date-driven projects put into the queue
Even more fragile code (less secure) put into production
More releases have increasingly “turbulent installs”
Release cycles lengthen to amortize “cost of deployments”
Failing bigger deployments more difficult to diagnose
Most senior and constrained IT ops resources have less time to fix underlying process problems
Ever increasing backlog of work that could help the business win
Ever increasing amount of tension between IT Ops, Development, Design…
These aren’t IT or Infosec problems…
These are business problems!
8. My Mission: Figure Out How To Break The IT Core
Chronic Conflict
Every IT organization is pressured to
simultaneously:
Respond more quickly to urgent business needs
Provide stable, secure and predictable IT service
Words often used to describe process improvement:
“hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not
aligned with the business, immature, shrill, perpetually focused on irrelevant
technical minutiae…”
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and
author of The Goal, has written extensively on the theory and practice of identifying and resolving
10 core, chronic conflicts.
9. Good News: It Can Be Done
Bad News: You Can’t Do It Alone
50. DevOps: It’s A Real Movement
I would never do another startup that didn’t
employ DevOps-like principles
It’s not just startups – it’s happening in the
enterprise and in public sector, too
I believe working in DevOps environments will
be a necessary skillset 5 years from now
Just as Agile helped Dev regain trust with the
business, DevOps will help all of IT
52. The Prescriptive DevOps Cookbook
“DevOps Cookbook” Authors
Patrick Debois, Mike
Orzen, John Willis
Goals
Codify how to start and finish
DevOps transformations
How do Development, IT
Operations and Infosec
become dependable partners?
Describe in detail how to
replicate the transformations
described in “When IT Fails: The
Novel”
58. The First Way:
Systems Thinking (Left To Right)
Never pass defects to downstream work centers
Never allow local optimization to create global
degradation
Increase flow: elevate bottlenecks, reduce WIP,
throttle release of work, reduce batch sizes
Understanding where reliance is placed
59. Phase 1: Extend the Agile CI/CR Processes
Assign Ops person into Dev team
Create one-step Dev, Test and Production
environment creation procedure in Sprint 0
Create the one-step automated code
deployment procedure
Define roles of Dev, QA, Prod Mgmt and Infosec
60. The First Way:
Systems Thinking: Infosec Insurgency
Have infosec attend the daily Agile standups
Gain awareness of what the team is working on
Find the automated infrastructure project team
(e.g., puppet, chef)
Provide hardening guidance
Integrate and extend their production configuration
monitoring
Find where code packaging is performed
Integrate security testing pre- and post-deployment
Integrate into continuous integration and release
process
Add security test scripts to automated test library
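A security test script added to the automated test library might look like the sketch below. It is an illustration under stated assumptions, not something taken from the presentation: `find_user`, `make_test_db` and the `users` table are hypothetical stand-ins for application code, and the checks are plain assert-style test functions that a pytest-style runner could collect.

```python
import sqlite3


def make_test_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two known users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Hypothetical application function: look up a user by name.

    The query is parameterized, so user input is never spliced into the SQL text.
    """
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def test_normal_lookup_returns_one_row() -> None:
    conn = make_test_db()
    assert find_user(conn, "alice") == [(1, "alice")]
    conn.close()


def test_injection_payload_matches_nothing() -> None:
    # A classic payload that would match every row if input were concatenated into the SQL.
    conn = make_test_db()
    assert find_user(conn, "' OR '1'='1") == []
    conn.close()
```

Because the tests run against an in-memory database, they can execute on every build, before deployment, without touching shared environments.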
61. The First Way:
Outcomes
Determinism in the release process
Continuation of the Agile and CI/CR processes
Creating single repository for code and environments
Packaging responsibility moves to development
Consistent Dev, QA, Int, and Staging environments, all
properly built before deployment begins
Decrease cycle time
Reduce deployment times from 6 hours to 45 minutes
Refactor deployment process that had 1300+ steps
spanning 4 weeks
Faster release cadence
63. The Second Way:
Amplify Feedback Loops (Right to Left)
Protect the integrity of the entire system of
work, versus completion of tasks
Expose visual data so everyone can see how
their decisions affect the entire system
64. Phase 2: Extend Release Process And Create
Right -> Left Feedback Loops
Embed Dev into Ops escalation process
Invite Dev to post-mortems/root cause analysis
meeting
Create necessary rollback procedures (instead
of fixing forward)
Create application monitoring/metrics to aid in
Ops work (e.g., incident/problem management)
Actively manage flow of work across org
boundaries
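The rollback step above can be sketched as a deployment wrapper that restores the previous version when a post-deploy check fails. This is a minimal illustration in Python: `deploy_with_rollback`, `activate` and the health-check callables are hypothetical names, and the in-memory `live` dictionary stands in for a real deployment tool.

```python
from typing import Callable


def deploy_with_rollback(
    previous_version: str,
    new_version: str,
    activate: Callable[[str], None],
    health_check: Callable[[], bool],
) -> str:
    """Activate new_version; if the health check fails, re-activate previous_version.

    Returns the version left running.
    """
    activate(new_version)
    if health_check():
        return new_version
    activate(previous_version)  # roll back instead of fixing forward
    return previous_version


# Usage with in-memory stand-ins for a real deployment tool.
live = {"version": "1.4.0"}


def activate(version: str) -> None:
    live["version"] = version


def unhealthy() -> bool:
    return False  # simulate a failed post-deploy check


result = deploy_with_rollback("1.4.0", "1.5.0", activate, unhealthy)
print(result, live["version"])  # the previous version is left running
```

Passing the activation and health-check steps in as functions keeps the rollback decision in one place, so Dev and Ops can agree on it once and reuse it for every release.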
65. The Second Way:
Amplify Feedback Loops: Infosec Insurgency
Extend criteria of what changes/deploys cannot be
made without triggering full retest
Create reusable Infosec use and abuse stories that
can be added to every project
“Handle peak traffic of 4MM users and constant 4-6
Gb/sec Anonymous DDoS attacks”
Integrate Infosec and IR into the Ops/Dev escalation
processes (e.g., RACI)
Pre-enable, shield and streamline successful audits
Document separation of duty and compensating controls
Don’t let them disrupt the work
66. The Second Way:
Outcomes
Andon cords that stop the production line
Kanban to control work
Project freeze to reduce work in process
Eradicating “quick fixes” that circumvent the process
Ops user stories are part of the Agile planning
process
Better build and deployment systems
More stable environment
Happier and more productive staff
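Using Kanban to control work comes down to refusing new work once a work-in-process limit is reached. A small sketch of that rule in Python, assuming a hypothetical `KanbanColumn` class and made-up card names:

```python
class WipLimitExceeded(Exception):
    """Raised when pulling a card would exceed the column's WIP limit."""


class KanbanColumn:
    def __init__(self, name: str, wip_limit: int) -> None:
        self.name = name
        self.wip_limit = wip_limit
        self.cards: list[str] = []

    def pull(self, card: str) -> None:
        """Start work on a card only if the column is below its WIP limit."""
        if len(self.cards) >= self.wip_limit:
            raise WipLimitExceeded(
                f"{self.name} is at its limit of {self.wip_limit}; finish something first"
            )
        self.cards.append(card)

    def finish(self, card: str) -> None:
        """Complete a card, freeing capacity in the column."""
        self.cards.remove(card)


in_progress = KanbanColumn("In Progress", wip_limit=2)
in_progress.pull("patch web tier")
in_progress.pull("automate deploy")
try:
    in_progress.pull("urgent feature")  # third card is refused
except WipLimitExceeded as err:
    print(err)
```

The refusal is the point of the design: new work waits in the queue until something in progress is finished.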
69. The Third Way:
Culture Of Continual Experimentation And
Learning
Foster a culture that rewards:
Experimentation (taking risks) and learning from
failure
Repetition is the prerequisite to mastery
Why?
You need a culture that keeps pushing into the danger
zone
And have the habits that enable you to survive in the
danger zone
71. Phase 3: Organize Dev and Ops To Achieve
Organizational Goals
Allocate 20% of Dev cycles to non-functional
requirements
Build Ops user stories and environments in Dev
that can be reused across all projects (e.g.,
deployment, capacity, security)
Integrate fault injection and resilience into
design, development and production (e.g.,
Chaos Monkey)
Prioritize backlog to manage technical debt
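Fault injection can start small, well before terminating production instances the way Chaos Monkey does. The sketch below is not Chaos Monkey; it is an in-process illustration in Python, with hypothetical names throughout (`with_fault_injection`, `call_with_retry`, `fetch_price`), of injecting failures into a dependency call and exercising a retry countermeasure.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_fault_injection(func, failure_rate: float, rng: random.Random):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault."""

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_with_retry(func, attempts: int = 3):
    """Resilience countermeasure: retry a failing call a bounded number of times."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def fetch_price() -> int:
    """Hypothetical dependency call."""
    return 42


# A seeded generator keeps test runs repeatable.
flaky_fetch = with_fault_injection(fetch_price, failure_rate=0.3, rng=random.Random(7))
```

Calling `call_with_retry(flaky_fetch)` in a test exercises the retry path; setting `failure_rate=1.0` verifies that the caller gives up cleanly after the configured number of attempts.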
73. The Third Way:
Culture Of Continual Experimentation And
Learning: Infosec
Add Infosec fixes to the Agile backlog
Make technical debt visible
Help prioritize work against features and other non-functional requirements
Weaponize the Security Monkey
Evil/Fuzzy/Chaotic Monkey
Eradicate SQLi and XSS defects in our lifetime
Let loose the Security Monkeys and the Simian Army
Eliminate needless complexity
Become the standard bearer: 20% of Dev cycles spent on
non-functional requirements
Take work out of the system
Keep decreasing cycle time: it increases work that the system
can achieve
74. The Third Way:
Outcomes
Dedicated time spent on improving daily work (best practice:
20% of Dev dedicated to non-functional requirements)
Continual reduction of unplanned work
More cycles for planned work
Projects completed to pay down technical debt and increase
flow
Elimination of needless complexity
More resilient code and environments
Balancing nimbleness and practiced repetition
Enabling wider range of risk/reward balance
93. This Is An Important Problem
Operations Sees…
Fragile applications are prone to failure
Long time required to figure out “which bit got flipped”
Detective control is a salesperson
Too much time required to restore service
Too much firefighting and unplanned work
Urgent security rework and remediation
Planned project work cannot complete
Frustrated customers leave
Market share goes down
Business misses Wall Street commitments
Business makes even larger promises to Wall Street
Dev Sees…
More urgent, date-driven projects put into the queue
Even more fragile code (less secure) put into production
More releases have increasingly “turbulent installs”
Release cycles lengthen to amortize “cost of deployments”
Failing bigger deployments more difficult to diagnose
Most senior and constrained IT ops resources have less time to fix underlying process problems
Ever increasing backlog of work that could help the business win
Ever increasing amount of tension between IT Ops, Development, Design…
94. When IT Fails: The Novel and The DevOps
Cookbook
Coming in July 2012
“In the tradition of the best MBA case studies, this
book should be mandatory reading for business
and IT graduates alike.”
Paul Muller, VP Software Marketing, Hewlett-Packard
“The greatest IT management book of our
generation.”
Branden Williams, CTO Marketing, RSA
Gene Kim, Tripwire founder, Visible Ops co-author
95. When IT Fails: The Novel and The DevOps
Cookbook
Our mission is to positively affect the
lives of 1 million IT workers by 2017
If you would like the “Top 10 Things You
Need To Know About DevOps,” sample
chapters and updates on the book:
Sign up at http://itrevolution.com
Email genek@realgenekim.me
Hand me a business card
Gene Kim, Tripwire founder, Visible Ops co-author
98. Resources
From the IT Process Institute
www.itpi.org
Both Visible Ops Handbooks
ITPI IT Controls Performance Study
Rugged Software by Corman, et al:
http://ruggedsoftware.org
“Continuous Delivery: Reliable Software
Releases through Build, Test, and
Deployment Automation” by
Humble, Farley
Follow us…
@JoshCorman, @RealGeneKim
mailto:genek@realgenekim.me
http://realgenekim.me/blog
99. Common Traits of High Performers
Culture of…
Change management
Integration of IT operations/security via problem/change management
Processes that serve both organizational needs and business objectives
Highest rate of effective change
Causality
Highest service levels (MTTR, MTBF)
Highest first fix rate (unneeded rework)
Compliance and continual reduction of
operational variance
Production configurations
Highest level of pre-production staffing
Effective pre-production controls
Effective pairing of preventive and detective controls
Source: IT Process Institute
100. Visible Ops: Playbook of High Performers
The IT Process Institute has been
studying high-performing
organizations since 1999
What is common to all the high
performers?
What is different between them and
average and low performers?
How did they become great?
Answers have been codified in the
Visible Ops Methodology
The “Visible Ops Handbook” is
available from the ITPI
www.ITPI.org
101. IT Operations Increases Process Rigor
Standardize deployment
Standardize unplanned work: make it repeatable
Modify first response: ensure constrained
resources have all data at hand to diagnose
Elevate preventive activities to reduce incidents
102. Help Development…
Help them see downstream effects
Unplanned work comes at the expense of planned
work
Technical debt retards feature throughput
Environment matters as much as the code
Allocate time for fault modeling, asking “what
could go wrong?” and implementing
countermeasures
103. Help QA…
Ensure test plans cover not only code
functionality, but also:
Suitability of the environment the code runs in
The end-to-end deployment process
Help find variance…
Functionality, performance, configuration
Duration, wait time and handoff errors, rework, …
104. John Pesche, CISO
CISO for 12 years
39 years old
Aggressive career
climber
Ex-Big Four auditor
How each side actively impedes the achievement of the other’s goals.
Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out. Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentations for auditors
Or an outage, and the developer is adamant that they didn’t make the change; they’re saying, “It must be the security guys, they’re always causing outages”
Or there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different?
Or every server is like a snowflake, each having its own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, and find differences
Create and enforce a culture of change management and causality
Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, I’ll show you one who is on vacation.
Primary measurement is deploying features quickly: get to market.
I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator to great time to market is great operations.
Bad day:
We do 6 weeks of testing, but deployment still fails. Why? QA environment doesn’t match production
Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure
Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production
Believe it or not, we as Tripwire practitioners can even help them: make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, and replicate them downstream
So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions?
How many people here are in infosec?
Goal: protect critical systems and data
Safeguard organizational commitments
Prevent security breaches, help quickly detect and recover from them
Bad day: no security standards
No one is complying
Yes, we’re 3 years behind. “Whaddyagonna do about it?”
Vs. we (Tripwire owners) can become more relevant and add value by helping infosec leverage all the configuration guidance out there
Measure variance between production and those known good states
Trust and verify that when management says, “We’ve trued up the configurations,” they’ve actually done it
Why? Now, more than ever, there is an ever increasing number of regulatory and contractual requirements to protect systems and data
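Measuring variance between production and a known good state can be illustrated with a simple fingerprint comparison. The sketch below is an assumption-laden illustration in Python and does not represent how Tripwire works: `fingerprint`, `find_drift`, the file paths and the configuration contents are all hypothetical.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def find_drift(baseline: dict[str, str], observed: dict[str, str]) -> dict[str, str]:
    """Compare observed fingerprints to a known-good baseline.

    Both arguments map file path -> fingerprint. The result maps each
    differing path to "modified", "missing" or "unexpected".
    """
    drift: dict[str, str] = {}
    for path, good_hash in baseline.items():
        if path not in observed:
            drift[path] = "missing"
        elif observed[path] != good_hash:
            drift[path] = "modified"
    for path in observed:
        if path not in baseline:
            drift[path] = "unexpected"
    return drift


# Hypothetical known-good state versus what one production server reports.
baseline = {
    "/etc/ssh/sshd_config": fingerprint("PermitRootLogin no\n"),
    "/etc/ntp.conf": fingerprint("server time.example.com\n"),
}
observed = {
    "/etc/ssh/sshd_config": fingerprint("PermitRootLogin yes\n"),
    "/etc/cron.d/backdoor": fingerprint("* * * * * root /tmp/x\n"),
}
print(find_drift(baseline, observed))
```

Running the same comparison across every server behind a load balancer is one way to answer “what’s different, and who made them different?”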
Tell story of Amazon, Netflix: they care about availability, security
It’s not a push, it’s a pull: they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)
At RSA 2009, Josh Corman, Jeff Williams, and David Rice were chatting at the Greylock cocktail party.
So software not only needs to be…
…fast, and…
…agile, but it also needs to be…
…rugged. Capable of withstanding…
…the harshest conditions…
…and most unfriendly environments…
[ text ] My personal goal is to prescriptively define 1) what Dev needs to do to become a reliable partner, 2) what IT Operations needs to do to become a reliable partner, and then 3) how they work together to deliver unbelievable value to the business.
Of course, the goal is more than happy coexistence. It’s to replicate the Etsy and LinkedIn stories:
Increase the rate of features that we can put into production, while simultaneously maintaining the reliability, stability, security and survivability of the production environment.
[ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out.
Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
But it’s not just about effectiveness and efficiency. Or just about being efficiently effective, or effectively efficient. Which brings us to the second theme of this conference, which is relevance. The work has to mean something to someone. In my journey of studying high-performing IT organizations, I’ve run into many non-high performers. And in those organizations, control functions and information security are often viewed as the shrill, hysterical people who are trying to create bureaucratic processes, which suck the will to live out of everyone they touch.
These are the functions that tend to get marginalized, or worse, totally avoided. “We have an urgent project that needs to get done. Make sure you don’t invite Gene, because he’ll guarantee that it won’t get done.” Our job is to make money for the business, and I’m not sure what Gene’s job is…