DraftKings uses New Relic and AWS to enable a DevOps culture of continuous delivery. New Relic provides DraftKings with observability across their stack from customers to code to infrastructure. This allows DraftKings to rapidly deploy new features, understand performance issues, and ensure engineering teams are accountable. DraftKings leverages AWS services for infrastructure as code and microservices. The collaboration between New Relic and AWS provides DraftKings insights and dashboards to monitor applications and services in real-time, empowering faster innovation.
3. Today’s presenters
Kevin Cochran, Partner Solutions Architect, Amazon Web Services, Inc.
Abner Germanow, Sr. Solutions Marketing, New Relic
Mark DiAntonio, Head of Product, DraftKings
4. Today’s agenda
• An overview of DevOps and AWS
• Build, deploy, and scale your DevOps practice
• DevOps at DraftKings
• Q&A/Discussion
5. Learning objectives
• Innovate and iterate new features and products faster using real-time application
monitoring and AWS monitoring via the New Relic platform.
• Establish a deep instrumentation strategy across your systems to help you ship
software that you’re confident will work in production.
• Employ a DevOps model across your organization to enable development teams
to own responsibility for infrastructure as code – from build to deploy to scale –
using AWS services.
• Create rich digital experiences and drive increased value for customers through
continuous delivery, feedback loops, and product development cycles.
7. Traditional development models are obsolete
• Business is increasingly software-driven.
• End-users expect both continuous improvement and stability from applications.
• IT needs to be able to provision infrastructure as rapidly as developers demand it.
• An organization’s pace of innovation is largely constrained by their ability to develop applications.
8. DevOps can help
DevOps practices enable companies to innovate at a higher velocity for customers
Decrease
• Length of development cycles
• Time to market
• Deployment failures and rollbacks
• Time to recover upon failure
• Operational overhead
Increase
• Business agility
• Application stability
• Ability to meet customer demand
• Time spent on innovation
• Security
9. DevOps on AWS
AWS provides on-demand infrastructure resources and tooling built to enable common DevOps practices
• Infrastructure as code
• Microservices
• Logging and monitoring
• Continuous integration/continuous delivery
10. Infrastructure as code
Replace traditional infrastructure provisioning and management with code-based techniques
• Provision the server, storage, and networking capacity you need on demand.
• Deploy independently, as a single service, or a group of services.
• Make configuration changes repeatable and standardized.
• Build custom templates to provision resources in a controlled and predictable way.
• Use version control to keep track of all changes made to your infrastructure and application stack.
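A minimal sketch of the "custom templates" and "version control" points above, assuming AWS CloudFormation as the templating service (the slide does not name one). The logical ID ArtifactBucket and the choice of a single versioned S3 bucket are hypothetical, picked only to keep the example small.

```python
import json


def build_template(bucket_logical_id: str = "ArtifactBucket") -> dict:
    """Return a minimal CloudFormation template as a plain dict.

    The resource (one versioned S3 bucket) is illustrative only.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template: one versioned S3 bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize deterministically so diffs in version control stay small."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Commit the rendered file alongside application code and review it
    # like any other change.
    print(render(build_template()))
```

Generating the template from code keeps the infrastructure definition reviewable and repeatable: the same input always renders the same file.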
11. Microservices
Build applications as a set of small services that communicate with other services through APIs
• Build services around the business capabilities you require.
• Scale up and down as required with virtually no notice.
• Make configuration code changes repeatable and standardized.
• API-driven model enables management of infrastructure with language typically used in application code.
• Free developers from manually configuring operating systems, system applications, and server software.
12. Logging and monitoring
Capture, categorize, and analyze data and logs generated by applications and infrastructure
• Maintain visibility and auditability of activity in your application infrastructure.
• Assess how application and infrastructure performance impact end-user experience.
• Gain insight into the root causes of problems or unexpected changes.
• Support services that must be available 24/7 as a result of continuous integration/continuous delivery.
• Create alerts based on thresholds you define.
13. Continuous integration and continuous delivery
Rapidly and reliably build, test, and deploy your applications, while improving quality and reducing time to market.
• Model and visualize your own custom release workflow.
• Automate deployments of new code.
• Improve developer productivity and deliver updates faster.
• Find and address bugs quicker with more frequent and comprehensive testing.
• Store anything from source code to binaries using existing Git tools.
14. Benefits of DevOps on AWS
• Get started quickly and pay as you go
• Automate systems operations
• Scale without infrastructure constraints
• Improve visibility and security
• Leverage fully managed services
15. AWS Shared Responsibility Model
APN Partners help customers meet their part of their shared responsibility model
17. New Relic is the catalyst to adopt AWS faster
• NEWR (NYSE, 2014)
• 16k+ customers
• Scale: 1.5B events & metrics per minute
• 14,000+ disruptors, 2,000+ global enterprises
• ~45% of app instances reporting from AWS
18. AWS and New Relic
Integrated
• Fast and easy deployment
• Service integrations with context, inventory, and controls
• Customer to Code to the Coolest AWS Innovations
Partnership
• Entire New Relic platform available in AWS Marketplace, including private contracts & SaaS contracts
• Engaged with APN Consulting Partners
• 7,000+ shared customers
20. What you are measuring is getting more complex…
Web App: Then
• 1 application
• 1 database
• 1 data center
• 1 deploy/quarter
• 3 large servers
Web App: Now
• 4 services
• 2 managed services
• 2 databases
• 2 AWS regions
• 3 deploys/day
• 30 to 300 containers
• 10 to 100 small instances
22. Lots of data sources
• Wire Data
• Application Agents
• Browser Agents
• Synthetic Infrastructure
• Mobile SDK
• Synthetic Users
• Infrastructure Agents
• Custom & Amazon CloudWatch Metrics
• StatsD
• Logging / Machine Data
• Social & Phone Calls
23. From any source, monitoring data can be organized into three main categories
• Log Data: human-readable events. Examples: system startup output, process output. New Relic provides this via Splunk & Sumo Logic integrations.
• Metrics: measurement of an event. Examples: throughput, error rate, request rate, request duration.
• Traces: relationships between events. Examples: application components involved during a request with an error.
24. Requirements for moving fast with confidence
• Observability: See ALL your stuff
• Adaptability: See what changed and why it changed
• Alignment: Make sure the team agrees
25. Observability: Can you see (and alert on) what you have?
One of the reasons we use New Relic is the visibility it gives us across our digital stack.
That has helped us pinpoint where the errors are happening or what service is down,
and reliability has improved significantly.
Todd Wilson, Director of Platform Engineering
At every layer and between entities: From your customer, to code, to containers
Infrastructure
Application services
Customer experience
26. You need to INSTRUMENT EVERYTHING fast
Fast, easy, complete
• 7 Programming Languages
• AWS, On-Host, SDK Integrations
27. Adaptability: understand the context of change
Deliver new value continuously for every organizational and technology shift
[Diagram: organizational and technology shifts spanning Monolith, Microservice, Waterfall, Continuous Delivery, On-Premises, Cloud, Server, Virtual Machines, Containers, and Serverless]
We reduced delivery times from four to six weeks to 15 to 20 minutes.
Mark Kelly, Director of Cloud and Infrastructure Services Architecture
28. What can New Relic do with this data?
• Service maps: “I need to understand my application architecture dependencies”
• Health map: “I need to understand how infrastructure impacts application performance”
• Transaction traces: “I need to understand the relationships between code, performance, and errors”
29. Alignment: empower teams
Take action with insight across your organization
New Relic data enabled our agile teams to move faster and more confidently
with measurable results.
Kevin Evans, VP of DevOps and Cloud Services
30. Amazon CloudWatch
Monitors
• Amazon EC2 instance
• Virtualization
• Hardware
• [CPU / Disk / Networking]
Doesn’t know about:
• Server OS
• Memory / Filesystem
• Processes
• Configuration
• Application
- Latency
- Error rates
[Diagram: stack layers Hardware, Server (Virtual), Amazon EC2 Instance, Server OS, Application & Application Microservices, Browser, and Mobile, alongside Amazon CloudWatch and the AWS Management Console]
31. The New Relic platform
Customer Experience:
• Synthetic tests
Application Analytics:
• App health
• App performance
• Microservice Dependencies
Instance Analytics:
• How O.S. is performing
• Configuration Changes
• Files & Packages
• Processes
Doesn’t know
• Virtualization
[Diagram: stack layers Hardware, Server (Virtual), Amazon EC2 Instance, Server OS, Application & Application Microservices, Browser, and Mobile, with New Relic Application Monitoring, New Relic Infrastructure Monitoring, and dashboards, alongside Amazon CloudWatch and the AWS Management Console]
32. New Relic + Amazon CloudWatch
Amazon CloudWatch integrations
• Visibility into virtualization
• CPU / Disk / Networking
• Popular AWS Services
New Relic
• CPU / Disk / Networking
• Memory / Filesystem
• Processes
- Infrastructure components
- Configuration inventory
• Application / Microservices:
- Latency
- Error rates
- App insights
[Diagram: stack layers Hardware, Server (Virtual), Amazon EC2 Instance, Server OS, Application & Application Microservices, Browser, and Mobile; New Relic Application Monitoring and New Relic Infrastructure Monitoring feed dashboards together with Amazon CloudWatch integrations, alongside the AWS Management Console]
33. Amazon CloudWatch: metrics and logging vs. New Relic
Logging & metrics platforms
• Hey, a bunch of T2 Micros are running close to 100% CPU utilization.
• Why did that happen? No idea.
New Relic today
• Hey, a bunch of T2 Micros are running close to 99.99% CPU utilization.
• Why did that happen? It looks like a new Node.js app is scaling new instances.
New Relic TOMORROW
• Based on the historical behaviors, your Node.js app is deployed on an inappropriate instance size for its resource needs.
• Recommended instance size: M4
What is the right frequency & cost for us to poll AWS APIs?
34. DevOps measurement basics
• Observability: See ALL your stuff
• Adaptability: See what changed and why it changed
• Alignment: Make sure the team agrees
35. Summary: New Relic’s cloud platform enables:
• Observability: Instrumentation of its front-end customer experience
enables DraftKings to conduct informed root-cause analysis, prevent
incidents, and maintain a low mean-time-to-resolution (MTTR) for any
issues discovered in production.
• Adaptability: Powerful intelligence enables DraftKings to rapidly deploy new features and get real-time performance feedback.
• Alignment: Actionable insights delivered via sophisticated dashboards enable DraftKings to drive a DevOps culture of common understanding, accountability, and continuous feature delivery.
36. My Story
What a long, strange trip it’s been
• I have been at DraftKings for ~3 of the 5 years. I joined to help lead the conversion from monolith to service-oriented architecture, tasked originally with starting the various app teams: Web, iOS, Android
• Spent ~5 years prior at WB Games, Turbine working on MMOs and then as part of the Digital Platform
- Lord of the Rings Online & Dungeons and Dragons
- Batman Arkham City
- Mortal Kombat
- Injustice
• Got a chance to live through 2015, when Daily Fantasy commercials dominated the airwaves
• More recently, heading up new initiatives in Media for DraftKings Live
37. DraftKings is an innovative sports-tech entertainment company forever changing the way consumers engage with their favorite sports, teams, leagues, and athletes.
38. DevOps @ DraftKings
What we’re made of
• Amazon Web Services
• Our teams own infrastructure as code
- Build
- Deploy
- Scaling
• Our teams monitor and own monitoring and alerting
- Insights dashboards
- New Relic alerts
- Other tools, including Amazon CloudWatch
39. New Relic is helping us to drive our performance culture
How we started using New Relic
• Brought over from my experience using it at WB Games
• First use cases for New Relic Browser were really around solving for client visibility and JS error alerting
• Brought in New Relic Mobile for the Android application
• Synthetics for scripted checks (the difference between knowing the platform works and knowing the platform is accessible)
• Performance profiling is part of teams’ objectives and key results (OKRs), and New Relic is a key tool for us there
• Insights dashboards provide a way to get out of the aggregate and drill down into specifics
40. High Level Product Development Lifecycle
Concept written
• Spec is usually added into the roadmap with some WAG-level estimates
Design Phase
• UX & Design works through key items
• Tech works through high level planning
Consensus
• All relevant stakeholders (product, design, tech) give the okay to proceed (green light)
Active Development
• Work begins, with demos and check-ins every ~2 weeks
Rollout
• Experiment system gradually brings new feature to market
Follow-up
• Iterate in small releases; clean up experiment code
• Report on key performance indicators (KPIs)
43. Release Toggles
• Use a base feature toggle for longer development cycles.
• Unshipped code as inventory is a risk.
• Maintains build cadence: allows us to keep quality and standards around feature branches & PRs.
• Continue to “ship” without the risk of merge conflicts and without anyone ever seeing the unfinished feature.
• Transitions cleanly to internal beta group via permission toggle.
44. Experiment Toggles
• Used for A/B testing
• Sends user down one code path vs. the other based
on segment/cohort assigned
• Toggle can last hours, days, or even a week based on
statistical significance
• Need to maintain the code for each path being
experimented
• Could be simple – button placement or color
• Could be complex – updated UI on user generated
contest screen
45. Beta Group vs Canary Cohort
Permissions Toggles, Point of 1st Contact
Beta Group
• “Champagne Brunch” – internal users
drinking their own champagne
• External VIPs get early access
• Very targeted audience
• Capable of qualitative feedback -
solicit reviews from individuals!
• Manually intensive
Canary Cohort
• Randomly assigned segment/cohort
• Roughly 5% of traffic, sized by how quickly the cohort reaches statistical significance
• Monitor and report back on key KPIs
• Automated
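A minimal sketch of automated canary cohort assignment, assuming a deterministic hash of the user ID stands in for the random assignment described above. This is not DraftKings' experiment system; the experiment name new-login-page, the user ID format, and the 10,000-bucket resolution are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: 0.01% of traffic per bucket


def bucket_for(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_canary(user_id: str, experiment: str, percent: float = 5.0) -> bool:
    """True if the user falls inside the canary cohort for this experiment."""
    threshold = int(BUCKETS * percent / 100)
    return bucket_for(user_id, experiment) < threshold


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20_000)]
    share = sum(in_canary(u, "new-login-page") for u in users) / len(users)
    print(f"canary share: {share:.3f}")
```

Hashing the experiment name together with the user ID keeps a user in the same cohort across requests while giving each experiment an independent split.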
46. Ops Toggles
• Client circuit breaker
• The good old fashioned kill switch
• Make sure your client is resilient! What
happens if you shut a feature off?
• Take down intensive, less critical
features while keeping things online
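A minimal sketch of the kill switch idea, assuming toggle state arrives as a simple name-to-boolean mapping. The feature name live_leaderboard and both callables are hypothetical. This covers only the on/off switch with a fallback; a full circuit breaker would also track failures over time and reopen automatically.

```python
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def with_kill_switch(
    toggles: Mapping[str, bool],
    name: str,
    feature: Callable[[], T],
    fallback: Callable[[], T],
) -> T:
    """Run feature() only while its ops toggle is on; otherwise degrade."""
    # A missing toggle is treated as "off" so an unknown feature fails closed.
    if not toggles.get(name, False):
        return fallback()
    try:
        return feature()
    except Exception:
        # A failing non-critical feature should not take the screen down.
        return fallback()


if __name__ == "__main__":
    def live_leaderboard() -> list:
        return ["alice", "bob"]  # stands in for an expensive call

    def empty_leaderboard() -> list:
        return []

    toggles = {"live_leaderboard": False}  # ops has switched the feature off
    print(with_kill_switch(toggles, "live_leaderboard",
                           live_leaderboard, empty_leaderboard))
```

Requiring a fallback at every call site is one way to answer the question above: the client always has a defined behavior when a feature is shut off.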
47. Per-Request Overrides
Allow a toggle's on/off state to be overridden on a per-request basis
Priority  Source                      Description
1         URL Param                   Check query string for override
2         Local Storage               Checks here if a toggle version has been set
3         Higgs DK Experiment System  Lets toggle know what experiment to factor in
4         Release Flag System         A structured config file of application-specific feature toggles
5         Default Value
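The priority order above can be sketched as a chain of lookups where the first source with an opinion wins. This is an illustration, not the DraftKings implementation: the query parameter format toggle.<name>=on, the toggle name new_login, and the use of plain dicts to stand in for local storage, the experiment system, and the release flag config are all assumptions.

```python
from typing import Callable, Mapping, Optional, Sequence
from urllib.parse import parse_qs

# A source returns True/False if it has an opinion, or None to defer.
Source = Callable[[str], Optional[bool]]


def from_query_string(query: str) -> Source:
    """Priority 1: per-request override such as '?toggle.new_login=off'."""
    params = parse_qs(query)

    def lookup(name: str) -> Optional[bool]:
        raw = params.get(f"toggle.{name}")
        if not raw:
            return None
        return raw[-1].lower() in ("1", "true", "on")

    return lookup


def from_mapping(values: Mapping[str, bool]) -> Source:
    """Stand-in for local storage, the experiment system, or a config file."""
    return lambda name: values.get(name)


def resolve(name: str, sources: Sequence[Source], default: bool = False) -> bool:
    """Walk sources in priority order; fall back to the default value."""
    for source in sources:
        value = source(name)
        if value is not None:
            return value
    return default


if __name__ == "__main__":
    sources = [
        from_query_string("toggle.new_login=off"),  # 1. URL param
        from_mapping({}),                           # 2. local storage
        from_mapping({"new_login": True}),          # 3. experiment system
        from_mapping({"new_login": True}),          # 4. release flag system
    ]
    print(resolve("new_login", sources))  # URL override wins over lower tiers
```

Returning None for "no opinion" keeps an explicit off at a higher tier distinct from a tier that simply has nothing set.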
48. Carrying Cost & Clean Up
• Keeping inventory low means that you need to actively manage it
• Add a story to the backlog to clean up toggles
• Add dates to toggles so you don’t need to hunt down how old each one is
• Add soft warnings for old toggles
• Fail builds when toggles exceed an agreed-on age
• Apply a limit to the number of toggles in place
• Don’t allow a team to exceed that number without an exception, to ensure cleanup
• Adding a toggle means removing an older one
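The clean-up rules above can be sketched as a build-time check, assuming each toggle carries its creation date. The 30-day warning, 90-day failure, and 20-toggle limit are hypothetical values; the slide only says these should be agreed on by the team. The toggle names are also made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Toggle:
    name: str
    created: date  # the date stamped on the toggle when it was added


def audit_toggles(
    toggles: Iterable[Toggle],
    today: date,
    warn_after: timedelta = timedelta(days=30),  # hypothetical threshold
    fail_after: timedelta = timedelta(days=90),  # hypothetical threshold
    max_toggles: int = 20,                       # hypothetical limit
) -> Tuple[List[str], List[str]]:
    """Return (warnings, errors); any error should fail the build."""
    items = list(toggles)
    warnings: List[str] = []
    errors: List[str] = []
    for toggle in items:
        age = today - toggle.created
        if age > fail_after:
            errors.append(f"{toggle.name} is {age.days} days old; remove it")
        elif age > warn_after:
            warnings.append(f"{toggle.name} is {age.days} days old")
    if len(items) > max_toggles:
        errors.append(f"{len(items)} toggles exceed the limit of {max_toggles}")
    return warnings, errors


if __name__ == "__main__":
    import sys

    sample = [
        Toggle("new_login_page", date(2017, 2, 8)),
        Toggle("contest_ui", date(2017, 4, 25)),
        Toggle("header_copy", date(2017, 5, 25)),
    ]
    warnings, errors = audit_toggles(sample, today=date(2017, 6, 1))
    for line in warnings:
        print("WARN:", line)
    for line in errors:
        print("FAIL:", line)
    sys.exit(1 if errors else 0)  # a non-zero exit code fails the build
```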
49. Product Launch - Example
We launched this new page and monitored KPIs: attempt rate, success rate, and error rate
[Screenshots: Old Page and New Page]
Note: In both flows we default all users to the registration screen and they need to switch
50. Product Launch Strategy
Use Experiment System
Experiment system designed to make testing easier
Can test things from small UX/copy changes to full product changes
Allows us to have multiple cohorts, control group, holdout group
General Launch Strategy
Determine KPIs and goals of the product (increase metric, do no harm, etc.)
If controlled launch required, develop launch timeline for A/B test
Monitor KPIs and identify anomalies
If necessary, work with UX and stakeholders to make product decision to fix issue
Re-release, re-test as needed
51. Product Launch - Example
[Chart: Cumulative Login Attempt Rate (hit the screen and attempt to log in within 5 minutes) by hour, 2/8/2017 to 2/9/2017, New Page series shown; readings of 24.5% and 26.8%; statistically significant difference after 6 hours]
• A/B testing showed us something was off
• By mid-day we had found the source of the issue
• Only affected attempt rate
• Success rate, re-try rate, etc. in line
• Must be something causing users to bounce from page
• About 500 users over 2 day stretch
52. Product Launch - Example
[Screenshots: Old Page and New Page]
Hypothesis:
Removing the login method from header was causing users to become confused and bounce
Note: In both flows we default all users to the registration screen and they need to switch
53. Product Launch - Example
[Screenshots: Old Page and New Page]
So we added that option
Note: In both flows we default all users to the registration screen and they need to switch
54. Product Launch - Example: Re-Launch
[Chart: Cumulative Login Attempt Rate (hit the screen and attempt to log in within 5 minutes) by hour, 2/15/2017 to 2/16/2017, New Page series shown; readings of 28.3% and 27.7%; never a statistically significant difference]
• Received a quick turnaround from UX and engineering
• Resulting change narrowed the attempt gap: the new page went from 92% of the old page’s attempt rate to ~98%
• Same principles applied to all feature changes
- Set and monitor KPIs
- Identify differences in product from KPIs
- Iterate until success
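A minimal sketch of how a "statistically significant difference" between two attempt rates can be checked, using a pooled two-proportion z-test. The slides do not say which test DraftKings used, and they do not give per-cohort sample sizes, so the 10,000 users per cohort below is a hypothetical figure; only the rates (24.5% vs 26.8% at launch, 27.7% vs 28.3% at re-launch) come from the charts, assuming the lower figure in each pair is the new page.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference of two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    n = 10_000  # hypothetical users per cohort; not given in the slides

    # Launch: new page 24.5% vs old page 26.8% attempt rate.
    z, p = two_proportion_z_test(2_450, n, 2_680, n)
    print(f"launch:    ratio={24.5 / 26.8:.3f} z={z:.2f} p={p:.4f}")

    # Re-launch: new page 27.7% vs old page 28.3% attempt rate.
    z, p = two_proportion_z_test(2_770, n, 2_830, n)
    print(f"re-launch: ratio={27.7 / 28.3:.3f} z={z:.2f} p={p:.4f}")
```

The printed ratios land close to the 92% and ~98% quoted on the slide, allowing for rounding of the chart values. With real cohort sizes the p-values would differ, so treat the output as an illustration of the method only.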
55. Summary
• Ensure your life cycle empowers teams and reinforces autonomy
• Do it safely – spend time understanding how you intend to release
• Layer toggles in with experiments
• Drive experiments from a central system out to clients
• Performance culture demands a COMMON UNDERSTANDING of the metrics
needed to:
• Drive consensus
• Execute
• Improve via iteration
• Don’t Fly Blind!
• Need for Metrics = Great Tools
• New Relic provides these
• If you didn’t try out DraftKings last week (like millions of people did) – try it!
56. Summary: New Relic on AWS
Features
• See across digital application stack, infrastructure, and
customer experience.
• Capture real-time insight and detailed performance metrics
from deployments to configuration changes.
• Share metrics with teams – as the same point of reference.
• Tight integration with 31 AWS services.
Results
• Innovate and iterate new features and products faster.
• Establish a deep instrumentation strategy across your systems.
• Employ a DevOps model across your organization.
• Create rich digital experiences.
58. Next steps and further information
Try New Relic on AWS
• https://aws.amazon.com/marketplace/pp/B07774G9RB
Learn more about New Relic
• https://newrelic.com/partner/aws-monitoring
Learn more about DevOps on AWS
• https://aws.amazon.com/devops/
Try AWS:
• https://aws.amazon.com/