3. Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
4. How Netflix Streaming Works
[Diagram] Three tiers: Consumer Electronics, AWS Cloud Services, CDN Edge Locations.
• The customer device (PC, PS3, TV…) calls the Web Site or Discovery API (user data, personalization) and the Streaming API (DRM, QoS logging, CDN management and steering, content encoding) in the AWS cloud.
• Video is streamed from the CDN edge locations (OpenConnect CDN boxes).
7. Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
[Diagram] Start at the Web service icon; each icon is three to a few hundred instances across three AWS zones. Dependencies include Cassandra, memcached, an S3 bucket, and three Personalization movie group choosers (for US, Canada and Latam).
8. Cloud Native Architecture
[Diagram] Clients and things call autoscaled micro services (JVMs, with Memcached), backed by distributed quorum NoSQL datastores (Cassandra) replicated across Zone A, Zone B and Zone C.
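The "distributed quorum" layer above can be made concrete with a little arithmetic. This is a hedged sketch (not Netflix code): with replication factor 3, one replica per zone, a quorum of 2 keeps reads and writes available when any single zone is lost.

```python
# Quorum sizing for a replicated datastore such as Cassandra.
# With replication factor 3 (one replica per zone), quorum is 2,
# so losing any one zone still leaves a working majority.

def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas that must acknowledge an operation."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unreachable while quorum is still reachable."""
    return replication_factor - quorum(replication_factor)

print(quorum(3))              # 2
print(tolerated_failures(3))  # 1
```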
10. New Anti-Fragile Patterns
Micro-services
Chaos engines
Highly available systems composed from ephemeral components
11. Stateless Micro-Service Architecture
Linux Base AMI (CentOS or Ubuntu)
• Optional: Apache frontend, memcached, non-Java apps
• Java (JDK 6 or 7)
• AppDynamics appagent monitoring
• Tomcat, running the application war file: base servlet, platform, client interface jars, Astyanax
• Healthcheck and status servlets, JMX interface, Servo autoscale
• Monitoring: log rotation to S3, GC and thread dump logging, AppDynamics machineagent, Epic/Atlas
12. Cassandra Instance Architecture
Linux Base AMI (CentOS or Ubuntu)
• Java (JDK 7)
• Tomcat and Priam on JDK: healthcheck, status
• AppDynamics appagent monitoring
• Cassandra Server
• Local ephemeral disk space – 2TB of SSD or 1.6TB disk holding commit log and SSTables
• Monitoring: AppDynamics machineagent, GC and thread dump logging, Epic/Atlas
14. Edda – Configuration History
http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
[Diagram] Edda polls AWS (instances, ASGs, etc.), Eureka (services metadata) and AppDynamics (request flow), recording configuration history that the Monkeys and other tools can query.
15. Edda Query Examples
Find any instances that have ever had a specific public IP address
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
["i-0123456789","i-012345678a","i-012345678b"]
Show the most recent change to a security group
$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
@@ -1,33 +1,33 @@
{
…
"ipRanges" : [
"10.10.1.1/32",
"10.10.1.2/32",
+ "10.10.1.3/32",
- "10.10.1.4/32"
…
}
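The curl examples above follow a regular pattern: matrix-style arguments separated by ';' select records, and flags like `_since=0` or `_diff` control history handling. A small sketch of composing such URLs (the `edda_url` helper is hypothetical, not part of Edda):

```python
# Build Edda v2 REST query URLs like the curl examples above.
# Matrix arguments (key=value pairs joined by ';') filter records;
# underscore-prefixed flags such as _since control history behaviour.

def edda_url(base: str, path: str, **matrix) -> str:
    """Compose an Edda query URL with matrix-style arguments."""
    args = "".join(f";{k}={v}" for k, v in sorted(matrix.items()))
    return f"{base}/api/v2/{path}{args}"

url = edda_url("http://edda", "view/instances",
               publicIpAddress="1.2.3.4", _since=0)
print(url)
# http://edda/api/v2/view/instances;_since=0;publicIpAddress=1.2.3.4
```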
16. Cloud Native
Master copies of data are cloud resident
Everything is dynamically provisioned
All services are ephemeral
19. Cloud Deployment Scalability
New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
Min. 1st Qu. Median Mean 3rd Qu. Max.
41.0 104.2 149.0 171.8 215.8 562.0
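The row above is an R-style five-number summary (apparently launch times in seconds) over the 2176 launches. The same summary can be sketched in a few lines; the sample data here is made up for illustration, not the slide's real measurements:

```python
# Reproduce an R-style summary (Min, quartiles, Mean, Max) of launch times.
# The sample values below are hypothetical; the slide's figures came from
# 2176 real m2.2xlarge instance launches.
import statistics

def summary(xs):
    xs = sorted(xs)
    q1, med, q3 = statistics.quantiles(xs, n=4)  # quartile cut points
    return {"Min": xs[0], "1st Qu.": q1, "Median": med,
            "Mean": statistics.mean(xs), "3rd Qu.": q3, "Max": xs[-1]}

launch_seconds = [41, 95, 120, 149, 180, 230, 562]  # hypothetical sample
print(summary(launch_seconds))
```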
20. Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
[Chart of instance count over time: push, autoscale up, autoscale down]
21. Leveraging Public Scale
[Diagram] Scale axis from 1,000 to 100,000 instances; public vs. private deployments with a grey area between, with Startups, Netflix and Google placed along it.
22. How big is Public?
AWS Maximum Possible Instance Count 3.7 Million
Growth >10x in Three Years, >2x Per Annum
AWS upper bound estimate based on the number of public IP Addresses
Every provisioned instance gets a public IP by default
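The estimate above follows from simple arithmetic: since every provisioned instance got a public IP by default, summing the sizes of AWS's published public CIDR blocks bounds the possible instance count. A sketch of the method (the blocks below are illustrative, not AWS's actual 2013 ranges):

```python
# Upper-bound an instance count from public IP address space.
# Each /n CIDR block contains 2**(32-n) addresses; summing the
# provider's published blocks bounds the instances it could host.

def addresses_in(cidr: str) -> int:
    """Number of addresses in a CIDR block, e.g. /16 -> 65536."""
    prefix = int(cidr.split("/")[1])
    return 2 ** (32 - prefix)

aws_blocks = ["23.20.0.0/14", "50.16.0.0/15", "54.224.0.0/12"]  # hypothetical
upper_bound = sum(addresses_in(b) for b in aws_blocks)
print(upper_bound)  # 1441792
```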
23. Availability
Is it running yet?
How many places is it running in?
How far apart are those places?
26. Outages
• Running very fast with scissors
– Mostly self-inflicted – bugs, mistakes
– Some caused by AWS bugs and mistakes
• Next step is multi-region
– Investigating and building in stages during 2013
– Could have prevented some of our 2012 outages
27. Managing Multi-Region Availability
[Diagram] DNS layer: AWS Route53, DynECT, UltraDNS. Each region has its own regional load balancers fronting Zones A, B and C, with Cassandra replicas in every zone of both regions.
What we need is a portable way to manage multiple DNS providers…
28. Denominator
Software Defined DNS for Java
[Diagram] Use cases (Edda, multi-region failover) sit on a common model: Denominator. Below it, DNS vendor plug-ins map to each vendor's API model (varied and mostly broken): AWS Route53 (IAM key auth, REST), DynECT (user/pwd, REST), UltraDNS (user/pwd, SOAP), etc.
Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
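The "common model + vendor plug-in" idea can be sketched as an abstract interface that each vendor implements, so failover logic codes against one model. Denominator itself is Java; this Python sketch with a hypothetical `DnsProvider` interface and an in-memory stand-in for Route53 is purely illustrative:

```python
# Sketch of a vendor-neutral DNS model with pluggable providers.
# Real plug-ins would wrap each vendor's API (REST, SOAP, IAM key auth);
# this stub keeps records in memory so the shape of the idea is testable.
from abc import ABC, abstractmethod

class DnsProvider(ABC):
    """Common model: every vendor plug-in exposes the same operations."""
    @abstractmethod
    def create_record(self, zone: str, name: str, rtype: str, value: str): ...
    @abstractmethod
    def list_records(self, zone: str) -> list: ...

class Route53Provider(DnsProvider):
    def __init__(self):
        self.records = []  # stand-in for REST calls with IAM key auth
    def create_record(self, zone, name, rtype, value):
        self.records.append((zone, name, rtype, value))
    def list_records(self, zone):
        return [r for r in self.records if r[0] == zone]

# Callers depend only on DnsProvider, so failover can switch vendors.
provider: DnsProvider = Route53Provider()
provider.create_record("example.com", "www", "CNAME", "elb-1.amazonaws.com")
print(provider.list_records("example.com"))
```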
31. Three Questions
Why is Netflix doing this?
How does it all fit together?
What is coming next?
32. Beware of Geeks Bearing Gifts: Strategies for an
Increasingly Open Economy
Simon Wardley - Researcher at the Leading Edge Forum
33. How did Netflix get ahead?
Netflix Business + Developer Org:
• Doing it right now
• SaaS Applications
• PaaS for agility
• Public IaaS for AWS features
• Big data in the cloud
• Integrating many APIs
• FOSS from github
• Renting hardware for 1hr
• Coding in Java/Groovy/Scala
Traditional IT Operations:
• Taking their time
• Pilot private cloud projects
• Beta quality installations
• Small scale
• Integrating several vendors
• Paying big $ for software
• Paying big $ for consulting
• Buying hardware for 3yrs
• Hacking at scripts
34. Netflix Platform Evolution
2009-2010: Bleeding Edge Innovation
2011-2012: Common Pattern
2013-2014: Shared Pattern
Netflix ended up several years ahead of the industry, but it’s not a sustainable position
35. Making it easy to follow
Exploring the wild west each time vs. laying down a shared route
36. Goals
• Establish our solutions as Best Practices / Standards
• Hire, Retain and Engage Top Engineers
• Build up Netflix Technology Brand
• Benefit from a shared ecosystem
43. What’s Coming Next?
More Features:
• Better portability
• Higher availability
• Easier to deploy
More Use Cases:
• Contributions from end users
• Contributions from vendors
44. Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard”
[Vendor comparison diagram] Status points across the candidate private clouds: functionally complete; demonstrated March; release 3.3 in 2Q13; some vendor interest; many missing features; needs an AWS compatible Autoscaler; bait and switch AWS API strategy
53. Judges
• Aino Corry – Program Chair for Qcon/GOTO
• Martin Fowler – Chief Scientist, Thoughtworks
• Simon Wardley – Strategist
• Werner Vogels – CTO, Amazon
• Joe Weinman – SVP Telx, Author of “Cloudonomics”
• Yury Izrailevsky – VP Cloud, Netflix
54. What are Judges Looking For?
Eligible, Apache 2.0 licensed
Original and useful contribution to NetflixOSS
Code that successfully builds and passes a test suite
A large number of watchers, stars and forks on github
NetflixOSS project pull requests
Good code quality and structure
Documentation on how to build and run it
Evidence that code is in use by other projects, or is running in production
55. What do you win?
One winner in each of the 10 categories
Ticket and expenses to attend AWS
Re:Invent 2013 in Las Vegas
A Trophy
56. How do you enter?
Get a (free) github account
Fork github.com/netflix/cloud-prize
Send us your email address
Describe and build your entry
Twitter #cloudprize
57. Award
[Timeline diagram]
• Registration opens today on Github
• Entries close September 15: Apache licensed Github contributions
• Judges select winners from nominations in the ten prize categories
• Award ceremony dinner at AWS Re:Invent in November
• Prizes: $10K cash, $5K AWS, a Netflix Engineering trophy, Re:Invent tickets
• Entrants must conform to the rules: working code, community traction
58. Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering an ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
59. Takeaway
Netflix is making it easy for everyone to adopt Cloud Native patterns.
Open Source is not just the default, it’s a strategic weapon.
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud @NetflixOSS
Editor’s notes
When Netflix first moved to the cloud it was bleeding edge innovation; we figured stuff out and made stuff up from first principles. Over the last two years more large companies have moved to the cloud, and the principles, practices and patterns have become better understood and adopted. At this point there is intense interest in how Netflix runs in the cloud, and several forward-looking organizations are adopting our architectures and starting to use some of the code we have shared. Over the coming years, we want to make it easier for people to share the patterns we use.
The railroad made it possible for California to be developed quickly; by creating an easy-to-follow path, we can create a much bigger ecosystem around the Netflix platform.