This document discusses designing cloud services that degrade gracefully under heavy load.
It proposes asynchronous, event-driven architectures for implementing scalable cloud services: requests can be serviced concurrently without blocking workers, and frameworks like gevent make asynchronous programming easy using greenlets.
It presents an architecture in which load-balancer, authentication, throttling, and concurrency-management layers queue requests when backend resources are overloaded, so that requests are delayed rather than failed.
8. Incoming Requests
[Diagram: incoming requests flow through a load balancer to AAA, throttling, and app-server layers, each app server backed by a worker pool (W).]
10. Problem Summary
• Cloud services often use worker pools to handle incoming requests
• When load exceeds the size of the worker pool, requests fail
11. What next?
A few observations based on work implementing and scaling the Twilio API over the past 4 years...
• Twilio Voice/SMS Cloud APIs
• 100,000 Twilio Developers
• 100+ employees
12. Observation 1
For many APIs, taking more time to service a request is better than failing that request.
Implication: in many cases, it is better to service a request with some delay than to fail it.
13. Observation 2
Precisely matching the amount of available resources (worker-pool size) to the incoming request load is challenging.
Implication: under load, it may be possible to delay or drop only those requests that truly impact constrained resources.
14. What are we going to do?
Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...
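The idea above can be sketched with only the standard library: a semaphore caps in-flight work, and requests beyond the cap simply wait instead of failing. This is an illustrative sketch (the names and limits are placeholders, not Twilio's implementation); gevent makes the same pattern cheap because each waiter is a greenlet rather than an OS thread.

```python
import threading
import time

# Illustrative: cap concurrent work; excess requests are delayed, not failed.
MAX_CONCURRENCY = 2
slots = threading.BoundedSemaphore(MAX_CONCURRENCY)
results = []

def handle_request(request_id):
    # Blocks (delays) when all slots are taken, instead of rejecting.
    with slots:
        time.sleep(0.05)   # simulate backend work
        results.append(request_id)

threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All six requests complete; some were delayed, none were rejected.
print(len(results))  # 6
```

With a worker pool of size 2 and six requests, a pool-per-request design would fail four of them; here all six succeed, four with added latency.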
25. Enter gevent
“gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”
Natively async, yet written sequentially:
    socket.write()
    resp = socket.read()
    print resp
26. Enter gevent
Simple Echo Server

    from gevent.server import StreamServer

    def echo(socket, address):
        print('New connection from %s:%s' % address)
        socket.sendall('Welcome to the echo server!\r\n')
        fileobj = socket.makefile()
        line = fileobj.readline()
        fileobj.write(line)
        fileobj.flush()
        print("echoed %r" % line)

    if __name__ == '__main__':
        server = StreamServer(('0.0.0.0', 6000), echo)
        server.serve_forever()

Easy sequential model, fully async
27. Async Services with Ginkgo
Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.
https://github.com/progrium/ginkgo
Let's look at a simple example that implements a TCP and HTTP server...
28. Async Services with Ginkgo

    import gevent
    from gevent.pywsgi import WSGIServer
    from gevent.server import StreamServer
    from ginkgo.core import Service

    def handle_http(env, start_response):
        start_response('200 OK', [('Content-Type', 'text/html')])
        print 'new http request!'
        return ["hello world"]

    def handle_tcp(socket, address):
        print 'new tcp connection!'
        while True:
            socket.send('hello\n')
            gevent.sleep(1)

    app = Service()
    app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
    app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
    app.serve_forever()
29.-32. Async Services with Ginkgo
(Slides 29-32 repeat the code above, each highlighting one part in turn: the WSGI/TCP server imports, the HTTP handler, the TCP handler, and the service composition via app.add_service.)
33. Incoming Requests
[Diagram: incoming requests flow through a load balancer to a pool of async servers.]
Using our async reactor-based approach, let's redesign our serving infrastructure.
34. Incoming Requests
[Diagram: load balancer feeding AAA layers in front of the async servers.]
Step 1: define an authentication and authorization layer that will identify the user and the resource being requested.
35. Incoming Requests
[Diagram: load balancer, then AAA, then throttling layers in front of the async servers, with a concurrency manager alongside.]
Step 2: add a throttling layer and concurrency manager.
36. Concurrency Admission Control
• Goal: limit concurrency by delaying or selectively failing requests
• Common metrics
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources
• What we've found useful
  - By (Account, Resource Type)
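Keyed admission control of this kind can be sketched with only the standard library. The class below tracks in-flight requests per (account, resource type) and refuses admission past a fixed limit; the names and the limit are illustrative assumptions, not the Twilio implementation.

```python
import threading
from collections import defaultdict

class AdmissionController:
    """Illustrative sketch: limit in-flight requests per
    (account, resource_type) key."""

    def __init__(self, limit_per_key=3):
        self.limit = limit_per_key
        self.in_flight = defaultdict(int)
        self.lock = threading.Lock()

    def try_admit(self, account, resource_type):
        key = (account, resource_type)
        with self.lock:
            if self.in_flight[key] >= self.limit:
                return False   # caller may delay and retry, or fail fast
            self.in_flight[key] += 1
            return True

    def release(self, account, resource_type):
        with self.lock:
            self.in_flight[(account, resource_type)] -= 1

ctl = AdmissionController(limit_per_key=2)
admitted = [ctl.try_admit("acct-1", "sms") for _ in range(3)]
print(admitted)                        # [True, True, False]
ctl.release("acct-1", "sms")           # one request finishes
print(ctl.try_admit("acct-1", "sms"))  # True again
```

Because the limit is keyed on (account, resource type), one noisy account exhausting its SMS quota does not affect other accounts or its own voice traffic.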
37. Delay - delay responses without failing requests
[Chart: latency vs. load; latency rises under load, but requests still succeed.]
38. Deny - deny requests based on resource usage
[Chart: latency vs. load; past a threshold, some requests fail rather than adding further latency.]
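The two responses from slides 37 and 38 combine naturally into one policy: serve immediately below a soft limit, delay between the soft and hard limits, and deny above the hard limit. A minimal sketch, with placeholder thresholds that are not values from the talk:

```python
def admission_decision(in_flight, soft_limit=10, hard_limit=20):
    """Illustrative policy: serve, delay, or deny based on current load.
    Thresholds are placeholders, not values from the talk."""
    if in_flight < soft_limit:
        return "serve"   # normal latency
    if in_flight < hard_limit:
        return "delay"   # added latency, no failure (slide 37)
    return "deny"        # selective failure under extreme load (slide 38)

print([admission_decision(n) for n in (5, 15, 25)])
# ['serve', 'delay', 'deny']
```

The cheap per-request concurrency of the async design is what makes "delay" practical: a delayed request holds a greenlet, not a pooled worker.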
39. Incoming Requests
[Diagram: load balancer, AAA, throttling, and app-server layers, with a concurrency manager alongside and a second throttling layer between the app servers and dependent services.]
Step 3: allow backend resources to throttle requests.
40. Summary
Async frameworks like gevent allow you to easily decouple a request from access to constrained resources.
[Chart: request latency over time; latency rises gracefully under load instead of hitting service-wide failure.]