Establishing an appropriate cloud operating model is critical to your organization’s successful adoption of cloud. It delivers greater business agility, increases the return on investment of cloud migration, and produces a more secure, performant, reliable, and cost-effective cloud computing environment. The impact of the cloud will be felt across your entire organization, including processes and people, not just information technology. It will significantly affect, and be affected by, your organizational culture and IT delivery structures. This session provides prescriptive guidance on the best approaches to evolving an operating model: from projects to products; from manual, process-intensive governance to a ‘trust but verify’ model; from long development cycles to continuous integration and deployment; and from silos between business and IT to a collaborative organizational structure with self-service processes and continuous improvement. The recommendations in the presentation are based on lessons learned, best practices, and anti-patterns from thousands of customers’ cloud transformation journeys.
3. What is a Cloud Operating Model (COM)?
• Describes the relationship between the consumers and suppliers of Cloud Services within an Enterprise
• The value a customer obtains from AWS is directly linked to the effectiveness of the COM
• The primary purpose of the COM is to meet the needs of the consumers
• Remember, very few customers designed their existing operating model to be the way it is…
5. Business and Technology Operations are not typically aligned with respect to their goals. Moving to a Cloud Operations model can align the Business and Operations.
Business Benefits of moving to the Cloud Operating Model
• Business Issue: It takes too long to get new services
  Traditional Operations Operating Model: Enable self-service creation and management of application environments (infrastructure) while ensuring security and operational requirements are met, with a 3+ week lead time.
  Approach for the Future State Operating Model: Fully automated self-service for launching a new application within minutes / hours. POCs can be spun up / down quickly to address business needs. Small incremental changes can be released anytime* for speed of delivery and competitive business advantage.
• Business Issue: Technology updates disrupt the business
  Traditional Operations Operating Model: Apply routine and critical updates to operating systems, databases, and middleware.
  Approach for the Future State Operating Model: Applications utilize immutable infrastructure, which eliminates patching and reduces risk.
• Business Issue: Too much downtime
  Traditional Operations Operating Model: Technology-focused, technology-centric processes that do not include the business. Large applications use large pieces of infrastructure; failovers / disaster recovery are time consuming.
  Approach for the Future State Operating Model: Business-centric processes, near-zero downtime while leveraging microservices.
• Business Issue: Business-focused teams do not control the infrastructure
  Traditional Operations Operating Model: Centralized management of infrastructure resources. Central technology teams making technology-focused decisions.
  Approach for the Future State Operating Model: App DevOps teams manage their own backup and restore activities, employ HA architectures and services (multi-AZ/multi-region), manage storage/archiving based on standard offerings, and provision, tune, and manage their own databases (EC2, RDS, and cloud-native databases).
8. Operational Excellence journey
[Diagram: Journey from Traditional Ops to Bimodal Operations. Each state is drawn on a grid labeled Operations / Engineering and Platform / Applications.]
• Sustain – Current State “Traditional Operations”: Application Operations, Application Engineering, Cloud Platform Engineering, Cloud Platform Operations, ITSM, Customer
• Transitional State “Bimodal Operations”: App Ops, Product Management & Engineering, Cloud Platform Engineering, Cloud Platform Engineering & Operations, Cloud Platform Ops
9. Operational Maturity Model
Journey from Traditional Ops to "Cloud Ops"
Level 1: Traditional Operations Model
§ Automation for Ops is being explored
§ Most Operational processes documented
§ ITIL based processes
§ Infra, App, Eng & Ops teams are in separate organizations
§ Manual procurement of resources
Level 2: Early Cloud Operating Model
§ Automation established for manual processes
§ ITIL processes are being optimized & automated
§ Infra & App teams begin to work towards the same goals (Service based model)
§ Self Service Provisioning of AWS Resources
§ FinOps Reporting
Level 3: Rationalized Cloud Operating Model
§ Runbook automation established as Operational best practice
§ Consistent measurement of the organization to drive improvement
§ Inefficiency is removed from systems and culture
§ Self Service Provisioning of AWS Accounts
§ Automated FinOps
Level 4: Optimized Cloud Operating Model
§ AI Ops is established as the primary Ops function
§ Release, Change & Security management governed by DevSecOps pipeline
§ A single team owns end-to-end management of the service
§ Self Service optimized across AWS platform
10. Tenets of Cloud Transformation – Changing the Mental Model
Journey from Traditional Operations to Bimodal Operations
Cloud Platform Engineering
• Implement Gradual Change: “Insist on High Standards”
• Reducing the Organizational Silos: “Think Big, Learn & Be Curious”
• Accepting Change as Normal: “Ownership, Earn Trust”
• Measure Everything: “Deliver Results”
• Leverage Tooling & Automation: “Invent & Simplify, Frugality”
• Delivering for the Customer: “Customer Obsession, Working Backwards”
• “Hire and Develop the Best”
12. What caused the move to COM to accelerate
2 CCoE
§ Is an enabler and not a blocker
§ Experiment More, Learn More and deliver practical tools and results
§ Communities of practice participation
§ Certification rewarded
§ Professional Level certification
§ Bringing in AWS SMEs and partners that have participated in successful transformations
1 Skills
3 Organizational Change
§ Placing change agents and cloud advocates in key roles
§ Changing KPIs of CTO, Head of Infrastructure, CISO, Head of Operations – data driven decisions
§ Removing people blockers from the core team
§ Change in hiring practices (culture)
15. Guiding Principles
• Don’t boil the ocean
• Everything fails all the time
• Centralized monitoring & management is fundamental
• Automation is not a phase
• Continually look for horizontal concepts
• Implement “Elasticity”
• Stay true to SOA principles
• Leverage different storage options
• Leverage different database options
• Autonomy is fundamental to long-term success
• Innovation and Agility are fundamental goals
16. Application Resiliency ‘white paper’
• Failure detection - Canary testing and synthetic transactions
• Operations Automation - Central repository
• Continuous Improvement
• Root Cause Analysis
• Validation – Game Days and Chaos Engineering
• Platform – Well-architected
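The failure-detection bullet can be illustrated with a minimal sketch of a synthetic-transaction check. It assumes a hypothetical `probe` callable that performs one scripted end-to-end transaction and raises an exception on failure. The names `run_synthetic_checks`, `CheckResult`, and `make_scripted_probe`, and the 20% failure threshold, are illustrative assumptions and are not taken from the white paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    attempts: int
    failures: int
    healthy: bool


def run_synthetic_checks(probe: Callable[[], None], attempts: int = 5,
                         max_failure_ratio: float = 0.2) -> CheckResult:
    """Run a synthetic transaction several times and mark the service
    unhealthy when the share of failed attempts exceeds the threshold."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for _ in range(attempts):
        try:
            probe()  # hypothetical: e.g. sign in, place a test order, sign out
        except Exception:
            failures += 1
    healthy = (failures / attempts) <= max_failure_ratio
    return CheckResult(attempts, failures, healthy)


def make_scripted_probe(outcomes: List[bool]) -> Callable[[], None]:
    """Test double: each call succeeds or fails per a scripted sequence."""
    remaining = iter(outcomes)

    def probe() -> None:
        if not next(remaining):
            raise RuntimeError("synthetic transaction failed")

    return probe
```

In practice the probe would call a real endpoint on a schedule and the result would feed an alarm; the scripted probe here only stands in for that call so the logic can be exercised offline.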
18. Lessons Learned
• The CCoE is the governance body
• Select a north star and execute design to deployment phases – uncover and remediate blockers and unknowns
• Establish Cloud Platform Engineering to centralize Cloud security, engineering and operations
• Form cross-functional DevSecOps teams early; remove external dependencies by consolidating full-phase deliveries into single full-stack teams
• Implement automated controls early; push continuous improvements as cloud adoption scales
• Push financials down to account owners for financial ownership and accountability
• Encourage controlled risk taking – fail fast and learn from failures. Provide a safety net for failures.
• Require cloud training – build knowledge and hands-on experience with cloud
• Leverage collective experience from AWS, internal SMEs, and other customers
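As one way to picture pushing financials down to account owners, the sketch below totals cost line items per owner (a simple showback report). The input shape, the `showback_by_owner` name, and the `UNASSIGNED` bucket are assumptions made for illustration; a real report would read from a billing export rather than an in-memory list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def showback_by_owner(line_items: Iterable[Tuple[str, float]],
                      account_owners: Dict[str, str]) -> Dict[str, float]:
    """Sum cost per account owner.

    line_items: (account_id, cost) pairs, a hypothetical billing extract.
    account_owners: account_id -> owning team; accounts without an owner
    are grouped under "UNASSIGNED" so the gap is visible, not hidden.
    """
    totals: Dict[str, float] = defaultdict(float)
    for account_id, cost in line_items:
        owner = account_owners.get(account_id, "UNASSIGNED")
        totals[owner] += cost
    return {owner: round(total, 2) for owner, total in totals.items()}
```

Surfacing unowned accounts as their own line is a deliberate choice in this sketch: accountability only works if every account maps to someone.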
19. Best Practices
• Implement FinOps early
• Implement clear cross-enterprise service prioritization (service catalog) and security review process – communicate
• Automate cloud security governance through detective controls and proactive monitoring
• Assess key on-premises tools for cloud visibility – replace with cloud native tools as needed
• Focus on integrating development and operations – empower teams through decision making with clear accountability for these decisions
• Highlight cloud struggles/failures demonstrating anti-patterns
• Reward sharing/collaboration
• Get CI/CD and infrastructure as code right on one cloud first
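The detective-controls practice can be sketched as a rule scan over a resource inventory. Everything here is an assumption for illustration: the resource fields (`id`, `encrypted`, `public`, `tags`), the three rule names, and the `detect_violations` function are hypothetical and do not correspond to any specific AWS service API.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Finding(NamedTuple):
    resource_id: str
    rule: str


Resource = Dict[str, Any]

# Hypothetical rules: each returns True when the resource is compliant.
RULES: Dict[str, Callable[[Resource], bool]] = {
    "encryption-at-rest": lambda r: r.get("encrypted") is True,
    "no-public-access": lambda r: r.get("public") is not True,
    "owner-tag-present": lambda r: bool((r.get("tags") or {}).get("owner")),
}


def detect_violations(resources: List[Resource]) -> List[Finding]:
    """Detective control: report non-compliant resources after the fact.
    It does not block or change anything; remediation is a separate step."""
    findings: List[Finding] = []
    for resource in resources:
        for rule_name, is_compliant in RULES.items():
            if not is_compliant(resource):
                findings.append(Finding(resource["id"], rule_name))
    return findings
```

Keeping the rules in a plain mapping means new guardrails can be added without touching the scan loop, and the findings list can be routed to whatever monitoring the organization already uses.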
21. Positive Patterns – Actions to take
• Cloud Steering Committee active participation
• Product (aka program) Management Office (PMO)
• Pragmatic, practical governance (aka guardrails)
• Mandate Well Architected Reviews (WARs)
• Practice Operations (aka GameDays)
• Community of practice (aka guilds)
22. Anti-Patterns – Actions to avoid
• Underestimating the power of middle management
• Organizational and culture change
• Mismatched guardrails
• Lack of or incomplete upskilling strategy
• Staffing the CCoE with Enterprise Architects
• Keeping the same processes and tools
24. One year roadmap
First Month: Alignment
• Identify the executive sponsors
• Document the need for a Cloud Operating Model
• Determine if cloud first or cloud optional organization
• Communicate the plan
• Leadership embraces iterative change, expeditious decision making, blameless culture, failure
Months 2-3: Operations
• Discovery and review of operations processes
• Understand and define roles and responsibilities for cloud and on-premises
• Define Operational Readiness processes
• Automate where possible, make self-service when automation is not possible
• Create automation pipeline (aka CI/CD toolchain and process)
Months 4-5: Product Thinking
• Identify rebels that will define, execute, and continuously improve org and product change
• Define and document product thinking
• Identify pilot applications to transform into products – including AWS
• All new applications should utilize product ‘mindset’ and processes
6+ months: People Transformation
• Codify what the organization should look like in 2 years – PR/FAQ
• Assess and improve upskilling initiatives
• Ensure rewards, compensation, promotions and hiring are based upon the new culture
• Continue to measure, refine, and improve across products, people, processes, and organizational change
• Utilize the expertise of AWS and partners