10. The Downward Spiral
Operations Sees…
Fragile applications are prone to failure
Long time required to figure out “which bit got flipped”
Detective control is a salesperson
Too much time required to restore service
Too much firefighting and unplanned work
Urgent security rework and remediation
Planned project work cannot be completed
Frustrated customers leave
Market share goes down
Business misses Wall Street commitments
Business makes even larger promises to Wall Street
Dev Sees…
More urgent, date-driven projects put into the queue
Even more fragile code (less secure) put into production
More releases have increasingly “turbulent installs”
Release cycles lengthen to amortize “cost of deployments”
Failing bigger deployments are more difficult to diagnose
Most senior and constrained IT ops resources have less time to fix underlying process problems
Ever-increasing backlog of work that could help the business win
Ever-increasing amount of tension between IT Ops, Development, Design…
11. The IT Core Chronic Conflict
Every IT organization is pressured to
simultaneously:
Respond more quickly to urgent business needs
Provide stable, secure and predictable IT service
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and
author of The Goal, who has written extensively on the theory and practice of identifying and resolving
core, chronic conflicts.
12. Every Company Is An IT Company…
95% of all capital projects have an IT component…
50% of all capital spending is technology-related
Where we need to be…
IT is always in the way (again…)
We are here…
20. The First Way:
Systems Thinking (Left To Right)
Understand the flow of work
Always seek to increase flow
Never unconsciously pass defects downstream
Never allow local optimization to cause global
degradation
Achieve profound understanding of the system
21. “Annual business planning sessions can be
maddening. They think IT Operations is an ‘all you
can eat buffet.’”
-Ben Rockwood,
Director Systems Engineering,
Joyent
22. Practice #1: Define The Work and Make It
Visible
Business projects (e.g., new order system)
Internal IT projects (e.g., Puppet automation)
Changes (e.g., deploys, improve database
performance)
Unplanned work (e.g., site down, site impaired)
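Making the work visible starts with tagging every item with one of the four categories above, so unplanned work can no longer hide among projects. The sketch below is hypothetical: the `WorkType`, `WorkItem`, `work_board`, and `unplanned_share` names are illustrative assumptions, not a tool from the talk, and it only shows the idea of counting open work per category, including categories that are currently empty.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class WorkType(Enum):
    """The four categories of work listed on the slide."""
    BUSINESS_PROJECT = "business project"
    INTERNAL_IT_PROJECT = "internal IT project"
    CHANGE = "change"
    UNPLANNED = "unplanned work"


@dataclass(frozen=True)
class WorkItem:
    # Hypothetical minimal work item: a title plus its category.
    title: str
    work_type: WorkType


def work_board(items):
    """Count open items per category, listing every category even when it is zero."""
    counts = Counter(item.work_type for item in items)
    return {wt: counts.get(wt, 0) for wt in WorkType}


def unplanned_share(items):
    """Fraction of open items that are unplanned work (0.0 for an empty board)."""
    if not items:
        return 0.0
    unplanned = sum(1 for item in items if item.work_type is WorkType.UNPLANNED)
    return unplanned / len(items)


if __name__ == "__main__":
    # Example items taken from the slide's own examples.
    backlog = [
        WorkItem("New order system", WorkType.BUSINESS_PROJECT),
        WorkItem("Puppet automation", WorkType.INTERNAL_IT_PROJECT),
        WorkItem("Improve database performance", WorkType.CHANGE),
        WorkItem("Site down", WorkType.UNPLANNED),
    ]
    for work_type, count in work_board(backlog).items():
        print(f"{work_type.value}: {count}")
    print(f"unplanned share: {unplanned_share(backlog):.0%}")
```

Tracking the unplanned share over time is one simple way to see whether firefighting is crowding out planned work.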
24. Practice #2: Create One-Step Environment
Creation Process
Make environments available early in the
Development process
Make sure Dev builds the code and environment
at the same time
Create a common Dev, QA and Production
environment creation process
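One way to picture a common creation process is a single function that builds every environment from the same base definition and rejects any divergence except explicitly allowed parameters. This is a minimal sketch under stated assumptions: `BASE_SPEC`, `ALLOWED_OVERRIDES`, `build_environment`, and `differences` are hypothetical names, and a real team would typically express this in its configuration-management tooling rather than a Python dict.

```python
from copy import deepcopy

# Hypothetical shared base spec: every environment starts from this same definition.
BASE_SPEC = {
    "os_image": "example-linux-1.0",
    "app_server": "example-appserver-2.3",
    "packages": ("openssl", "libxml2"),
    "tls_required": True,
    "instance_count": 1,
}

# Hypothetical rule: only sizing may differ between Dev, QA, and Production.
ALLOWED_OVERRIDES = {"instance_count"}


def build_environment(name, **overrides):
    """Create an environment spec through the one shared process.

    Raises ValueError if an override touches anything outside ALLOWED_OVERRIDES,
    so Dev, QA, and Production cannot silently diverge.
    """
    illegal = set(overrides) - ALLOWED_OVERRIDES
    if illegal:
        raise ValueError(f"{name}: overrides not allowed: {sorted(illegal)}")
    spec = deepcopy(BASE_SPEC)
    spec.update(overrides)
    spec["name"] = name
    return spec


def differences(a, b, ignore=("name",)):
    """Return the set of keys whose values differ between two environment specs."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    dev = build_environment("dev")
    qa = build_environment("qa")
    prod = build_environment("prod", instance_count=8)
    print(differences(dev, qa))   # set()
    print(differences(qa, prod))  # {'instance_count'}
```

The design choice here is that divergence must be declared up front; anything not on the allow-list fails at creation time instead of surfacing as a failed deployment.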
25. Change the Agile sprint policy:
“At the end of each sprint, we must have working
code and the environment it runs in!”
26. The First Way:
Outcomes
Creating a single repository for code and environments
Determinism in the release process
Consistent Dev, QA, Int, and Staging environments, all
properly built before deployment begins
Decreased cycle time
Reduced deployment times from 6 hours to 45 minutes
Refactored a deployment process that had 1,300+ steps
spanning 4 weeks
Faster release cadence
28. The Second Way:
Amplify Feedback Loops (Right to Left)
Understand and respond to the needs of all
customers, internal and external
Shorten and amplify all feedback loops: stop the
line when necessary
Create quality at the source
Create and embed knowledge where we need it
29. “We found that when we woke up developers at
2am, defects got fixed faster than ever”
-Patrick Lightbody,
CEO, BrowserMob
30. Pattern #3: Embed Dev Into IT Ops
Embed Dev into IT Ops incident escalation
process
Invite Dev to post-mortem/root cause analysis
meetings
Have Dev and Infosec cross-train IT Operations
Ensure application monitoring and metrics support
Ops and Infosec work (e.g., incident and problem
management)
31. The Second Way:
Outcomes
Defects and security issues getting fixed faster
than ever
Reusable Ops and Infosec user stories now part
of the Agile process
All groups communicating and coordinating
better
Everybody is getting more work done
33. The Third Way:
Culture Of Continual Experimentation And
Learning
Foster a culture that rewards:
Experimentation (taking risks) and learning from
failure
Repetition is the prerequisite to mastery
Why?
You need a culture that keeps pushing into the danger
zone
And have the habits that enable you to survive in the
danger zone
34. Break Things Early And Often
“Do painful things more frequently, so you can
make it less painful… We don’t get pushback
from Dev, because they know it makes rollouts
smoother.”
-- Adrian Cockcroft, Architect, Netflix
37. Pattern #6: Break Things Before Production
Enforce consistency of code, environments and
configurations across all environments
Add ASSERTs to find
misconfigurations, enforce HTTPS, etc.
Add static code analysis to automated
continuous integration and testing process
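The ASSERT idea can be sketched as a small configuration check that a CI job runs before anything reaches production. The rules below (every `*_url` setting must use https, `debug` must be off) and the names `check_config` and `assert_valid` are hypothetical examples, not rules from the talk; the point is that a misconfiguration fails the build rather than the deployment.

```python
import sys
from urllib.parse import urlparse


def check_config(config):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical rules, for illustration only:
      * any setting whose key ends in "_url" must use the https scheme
      * "debug" must not be enabled
    """
    problems = []
    for key, value in config.items():
        if key.endswith("_url") and urlparse(value).scheme != "https":
            problems.append(f"{key} must use https, got {value!r}")
    if config.get("debug", False):
        problems.append("debug must be disabled")
    return problems


def assert_valid(config):
    """Raise AssertionError listing every problem, so the CI step fails loudly.

    An explicit raise is used instead of the assert statement, because
    Python strips assert statements when run with the -O flag.
    """
    problems = check_config(config)
    if problems:
        raise AssertionError("; ".join(problems))


if __name__ == "__main__":
    # Hypothetical configuration, as it might be loaded for a staging deploy.
    staging = {"api_url": "http://api.example.com", "debug": True}
    try:
        assert_valid(staging)
    except AssertionError as err:
        print(f"config check failed: {err}")
        sys.exit(1)  # a non-zero exit status fails the CI job
    print("config check passed")
```

Collecting all problems before failing, rather than stopping at the first one, lets a developer fix every misconfiguration in one pass.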
42. An Innovation Culture
“By installing a rampant innovation culture, they
now do 165 experiments in the three months of tax
season.
Our business result? Conversion rate of the
website is up 50 percent. Employee result?
Everyone loves it, because now their ideas can
make it to market.”
--Scott Cook, Intuit Founder
46. The Three Ways: Some Patterns
First Way: Define The Work And Make It Visible; Make Environments Available Early
Second Way: Wake Up Developers; Embed Dev Into IT Operations
Third Way: Break Things Early And Often; Reserve 20% Of Cycles For Technical Debt Reduction
52. When IT Fails: A Business Novel and
The DevOps Cookbook
Coming January 15, 2013 and Q1 2013
“The lessons in When IT Fails might just save your business if IT fails
for you. Every IT executive should share this book with their business
peers.” -James Turnbull, VP Operations, Puppet Labs and author
of “Pro Puppet”
“The greatest IT management book of our generation.” -Branden
Williams, CTO Marketing, RSA
“This book will have a profound effect on IT, just as The Goal did for
manufacturing.” -Jez Humble, co-author of the Jolt award-winning
book Continuous Delivery, and Principal at ThoughtWorks
Studios.
53. Our Mission: Positively Impact The Lives Of
One Million IT Workers By 2017
For these slides, the “Top 10 Things You
Need To Know About DevOps,” Rugged
DevOps resources, and updates on the
book:
Sign up at http://itrevolution.com
Email genek@realgenekim.me
Or text “[email] 74730” to
+1 (858) 598-3980
Visit:
http://www.instantcustomer.com/go/74730
Speaker Notes
Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out.
Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentations for auditors
Or an outage, and the developer is adamant that they didn’t make the change; they’re saying, “It must be the security guys. They’re always causing outages.”
Or there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different?
Or every server is like a snowflake, each with its own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, and deployed completely and accurately, and find differences
Create and enforce a culture of change management and causality
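The “six systems acting funny” question can be made concrete: compare each server’s settings against the fleet’s majority value and report the outliers. This is a generic majority-vote drift check written as a hypothetical sketch; `find_drift` and the sample settings are assumptions, it is not Tripwire’s product or method, and it assumes setting values are hashable.

```python
from collections import Counter


def find_drift(configs):
    """Compare every server against the fleet's majority value for each setting.

    configs maps server name -> {setting: value} (values must be hashable).
    Returns {server: {setting: (actual, majority)}} for servers that deviate;
    a setting missing on a server is reported with actual value None.
    """
    all_keys = set().union(*configs.values()) if configs else set()
    baseline = {}
    for key in all_keys:
        values = Counter(cfg.get(key) for cfg in configs.values())
        baseline[key] = values.most_common(1)[0][0]  # majority value wins
    drift = {}
    for server, cfg in configs.items():
        diffs = {
            key: (cfg.get(key), baseline[key])
            for key in all_keys
            if cfg.get(key) != baseline[key]
        }
        if diffs:
            drift[server] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical fleet: 50 web servers behind a load balancer, two of them drifted.
    fleet = {f"web{i:02d}": {"jvm_heap": "4g", "tls": "1.2"} for i in range(1, 51)}
    for name in ("web07", "web13"):
        fleet[name] = {"jvm_heap": "2g", "tls": "1.2"}
    print(find_drift(fleet))
    # {'web07': {'jvm_heap': ('2g', '4g')}, 'web13': {'jvm_heap': ('2g', '4g')}}
```

Majority voting only works when most of the fleet is correct; a known-good baseline captured at deployment time is the safer reference when drift may be widespread.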
Source: Flickr: birdsandanchors
Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, and I’ll show you one who is on vacation.
Their primary measurement is deploying features quickly: getting to market.
I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator for great time to market is great operations.
Bad day:
We do 6 weeks of testing, but deployment still fails. Why? The QA environment doesn’t match production
Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure
Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production
Believe it or not, we as Tripwire practitioners can even help them: make sure environments are available when we need them and configured correctly the first time, document all the changes, and replicate them downstream
[ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out.Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
How each side actively impedes the achievement of the other’s goals.
Tell the story of Amazon and Netflix: they care about availability and security. It’s not a push, it’s a pull: they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)