5. Back in 2011 we started simple
System overview:
• Started on AWS
• PHP frontend and Java backend applications
• Built and supported by a small team: 3-4 backend engineers
Diagram: MySQL
6. And then we started expanding rapidly
• City-specific environments
• Branching the code base
• Manually building infrastructure and configuration
Diagram: three copies of the same stack, each with PHP, Java and MySQL
7. As we grew, getting features out became a challenge
We quickly found out that extending monoliths is hard:
• Hard to maintain the codebase
• Any new feature took weeks to deliver
• Hard to scale the dev teams
Failure to deliver business value
Performance
8. Operating a monolith in the cloud got even harder
A lot of development and ops time wasted in firefighting:
• Lack of automation
• Multiple SPOFs
• No proper monitoring
• Unclear responsibilities
• No well-defined escalation or ownership process – many non-actionable alerts turned into “all hands on deck” actions
Diagram: MySQL, Monolith, Monolith
10. We wanted to build a global platform
We looked at what we did wrong and redesigned it:
• Everything had to be automated – any workflow, any action
• Everything had to be resilient and self-healing – regardless of the failure source: infrastructure, network or code
• Each service had to be responsible for one thing and one thing only
Key challenges:
• We decided to move from PHP & Java to Go
• We had to build everything from scratch but move all our production traffic without any downtime
• We had to change our culture as we went
11. Diagram: two AWS regions, eu-west-1 and us-east-1, each running an API Gateway, a Message Bus, C* and a set of Go services
12. We started with the building blocks
A service under the hood:
Diagram: Handler, Logic, Storage, service-layer, platform-layer; callouts: library for abstracting service-to-service comms, self-configuring external service adapters
Any service gets for free:
• Service-to-service communication libs
• Discovery
• Configuration
• A/B testing capabilities
• Monitoring & Instrumentation
• … and much more
13. In preparation for the migration
Introducing a smart API Gateway made our life easy:
• Let us do a transparent, seamless migration from the user’s perspective
• Gave us a lot of flexibility in how we route our traffic
• Enabled us to build a lot of failover capabilities
Diagram: API Gateway, Monolith
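The routing idea above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the gateway described in the talk: the `Gateway` type, the backend labels, the path prefixes and the rule "fail over to the monolith when the new platform is unhealthy" are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Backend labels are hypothetical, chosen for this sketch only.
const (
	backendMonolith = "monolith"
	backendPlatform = "platform"
)

// Gateway holds the path prefixes that have already been migrated to the
// new platform. Every other path is still served by the monolith.
type Gateway struct {
	migrated []string
}

// Route picks a backend for a request path. Assumption: while an endpoint
// is being migrated the monolith can still serve it, so an unhealthy
// platform fails over to the monolith.
func (g *Gateway) Route(path string, platformHealthy bool) string {
	if !platformHealthy {
		return backendMonolith
	}
	for _, prefix := range g.migrated {
		if strings.HasPrefix(path, prefix) {
			return backendPlatform
		}
	}
	return backendMonolith
}

func main() {
	gw := &Gateway{migrated: []string{"/v1/login"}}
	fmt.Println(gw.Route("/v1/login/token", true))  // migrated prefix
	fmt.Println(gw.Route("/v1/orders", true))       // not migrated yet
	fmt.Println(gw.Route("/v1/login/token", false)) // platform down
}
```

Because callers only ever talk to the gateway, moving an endpoint is a change to the `migrated` list rather than to any client.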
14. And then we started breaking down our monoliths
• We aimed to get production traffic on the new platform as quickly as possible
• We identified the low-hanging fruit first and rewrote it
• We kept iterating on our platform and building more tools as we needed them
Diagram: API Gateway, Decouple
15. At present we have
• Microservices ecosystem (99.9% written in Go)
• Designed specifically for the cloud – different building blocks and components will constantly be in flux, broken or unavailable
• 1000+ AWS instances spanning multiple regions
• 200+ services in production
16. The Platform
Image credit: Troll a platform by Swinsto101 / CC BY-SA 3.0 / Desaturated from original
18. Cloud Provider
• Lowest level building blocks
• We mostly use basic PaaS components and services as they cover most of our needs
• We expect every underlying component to fail and we designed for this
Diagram: VPC, Auto Scaling, S3, CF, EC2, Route 53, Redshift
20. Core principles
• We use auto scaling groups for everything
  • Guarantees each component can be rebuilt automatically
  • Including our database clusters that run on ephemeral storage (we do keep 6 copies of each piece of data in 2 regions)
• Minimum of 3 AZs in every region
• Every workflow is automated
• Every component has to be self-healing and scalable
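The "6 copies in 2 regions" and "3 AZs in every region" figures can be worked through with a small Go sketch. It assumes, purely for illustration, that the copies are split evenly as one per AZ; the talk does not say how the copies are actually placed. The AZ names and the `placeReplicas` and `surviving` helpers are hypothetical.

```go
package main

import "fmt"

// Replica records where one copy of a piece of data lives.
type Replica struct {
	Region string
	AZ     string
}

// placeReplicas puts one copy in every listed AZ of every region.
// Assumption: an even split, one copy per AZ.
func placeReplicas(azsByRegion map[string][]string) []Replica {
	var out []Replica
	for region, azs := range azsByRegion {
		for _, az := range azs {
			out = append(out, Replica{Region: region, AZ: az})
		}
	}
	return out
}

// surviving counts the copies left after losing one whole region.
func surviving(replicas []Replica, lostRegion string) int {
	n := 0
	for _, r := range replicas {
		if r.Region != lostRegion {
			n++
		}
	}
	return n
}

func main() {
	// AZ names are illustrative.
	replicas := placeReplicas(map[string][]string{
		"eu-west-1": {"eu-west-1a", "eu-west-1b", "eu-west-1c"},
		"us-east-1": {"us-east-1a", "us-east-1b", "us-east-1c"},
	})
	fmt.Println("total copies:", len(replicas))
	fmt.Println("copies after losing eu-west-1:", surviving(replicas, "eu-west-1"))
}
```

Under that assumption, 2 regions × 3 AZs gives the 6 copies, and losing an entire region still leaves 3, which is what makes ephemeral storage a tolerable choice.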
21. Whisper
• Our “cloud provider abstraction” layer
• Main purpose is infrastructure and workflow automation and discovery
• Has a global view of everything happening across our infrastructure
• Provides additional capabilities on top of AWS
• The only services directly aware of our cloud provider specifics – gives us a lot of flexibility and lets us introduce changes quickly
Diagram: Orchestration, Env, DNS, Release, AutoScaling, Compute, EIP
22. Everything in our platform emits events
So naturally we want to capture all external events as well!
23. Whisper Service
It’s all about event-driven compute – think Lambda but within our platform
Diagram: External sources, Events, NSQ Topics, Hundreds of publishers & subscribers, Actions
To subscribe to any new event source we only have to change a single service
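A minimal Go sketch of the event-to-action idea, assuming an in-memory dispatcher in place of the real message bus. It does not use the NSQ client library and does not reflect the Whisper Service's actual code: the `Event`, `Action` and `Dispatcher` types and the topic name are hypothetical.

```go
package main

import "fmt"

// Event is a hypothetical envelope for anything emitted on the platform
// or captured from an external source.
type Event struct {
	Topic   string
	Payload string
}

// Action is a function run in response to an event.
type Action func(Event) string

// Dispatcher maps topics to the actions subscribed to them.
type Dispatcher struct {
	actions map[string][]Action
}

// NewDispatcher returns a dispatcher with no subscriptions.
func NewDispatcher() *Dispatcher {
	return &Dispatcher{actions: make(map[string][]Action)}
}

// Subscribe registers an action for a topic.
func (d *Dispatcher) Subscribe(topic string, a Action) {
	d.actions[topic] = append(d.actions[topic], a)
}

// Publish runs every action subscribed to the event's topic, in
// subscription order, and returns their results.
func (d *Dispatcher) Publish(e Event) []string {
	var results []string
	for _, a := range d.actions[e.Topic] {
		results = append(results, a(e))
	}
	return results
}

func main() {
	d := NewDispatcher()
	// Hypothetical topic and action: react to an instance going away.
	d.Subscribe("instance.terminated", func(e Event) string {
		return "deregister " + e.Payload
	})
	fmt.Println(d.Publish(Event{Topic: "instance.terminated", Payload: "i-123"}))
	fmt.Println(len(d.Publish(Event{Topic: "unknown"})))
}
```

Adding a new event source in this sketch means adding one more `Subscribe` call in one place, which mirrors the "change a single service" point.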
24. Core Platform
Provides the most essential platform functions for every service:
• Service Discovery
• Service Provisioning
• Routing & Load Balancing
• Authentication/Authorization
• Monitoring
• Configuration
Diagram: Discovery, Monitoring, Routing, Provisioning, Login, Config
25. Services
• Self-contained units of execution
• Built around business capabilities or domain objects
• Small enough to be rewritten in a few days
• Independently scalable
• They are all about adding business value
26. All good, but did we make our development any faster?
27. Up and running in seconds
Vetted & Tested & Built ~ 100 – 140 sec
Setup ~ 1 sec
Trigger a build ~ 1 sec
39. Key Learnings
• Automate everything – it enabled us to do more with less
• Identify your KPIs and track them
• Invest in tooling: the complexity in a microservices architecture is no longer in your application code, it is in the thousands of service interactions!
• Empowering your engineers increases your velocity tremendously!
• Moving to microservices is a journey – make sure you take everyone on board!