This document discusses designing cloud services that degrade gracefully under heavy load.
It proposes asynchronous, event-driven architectures for implementing scalable cloud services: requests can be serviced concurrently without blocking workers, and frameworks like gevent make asynchronous programming easy using greenlets.
It presents an architecture in which load-balancer, authentication, throttling, and concurrency-management layers queue requests when backend resources are overloaded, so that requests are delayed rather than failed.
8. Incoming Requests
[Diagram: incoming requests flow through a load balancer to AAA, throttling, and app-server layers, each app server backed by a worker pool (W).]
10. Problem Summary
• Cloud services often use worker pools to handle incoming requests
• When load exceeds the size of the worker pool, requests fail
11. What next?
A few observations based on work implementing and scaling the Twilio API over the past 4 years...
• Twilio Voice/SMS Cloud APIs
• 100,000 Twilio Developers
• 100+ employees
12. Observation 1
For many APIs, taking more time to service a request is better than failing that request.
Implication: in many cases, it is better to service a request with some delay than to fail it.
13. Observation 2
Precisely matching the amount of available resources (worker-pool size) to the incoming request load is challenging.
Implication: under load, it may be possible to delay or drop only those requests that truly impact constrained resources.
14. What are we going to do?
Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...
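The idea above can be sketched with only the standard library: a semaphore caps in-flight work, and requests beyond the cap simply wait instead of failing. This is an illustrative sketch (the names and limits are placeholders, not Twilio's implementation); gevent makes the same pattern cheap because each waiter is a greenlet rather than an OS thread.

```python
import threading
import time

# Illustrative: cap concurrent work; excess requests are delayed, not failed.
MAX_CONCURRENCY = 2
slots = threading.BoundedSemaphore(MAX_CONCURRENCY)
results = []

def handle_request(request_id):
    # Blocks (delays) when all slots are taken, instead of rejecting.
    with slots:
        time.sleep(0.05)   # simulate backend work
        results.append(request_id)

threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All six requests complete; some were delayed, none were rejected.
print(len(results))  # 6
```

With a worker pool of size 2 and six requests, a pool-per-request design would fail four of them; here all six succeed, four with added latency.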
25. Enter gevent
“gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”
Natively async, yet written sequentially:
    socket.write()
    resp = socket.read()
    print resp
26. Enter gevent
Simple Echo Server

    from gevent.server import StreamServer

    def echo(socket, address):
        print('New connection from %s:%s' % address)
        socket.sendall('Welcome to the echo server!\r\n')
        fileobj = socket.makefile()
        line = fileobj.readline()
        fileobj.write(line)
        fileobj.flush()
        print("echoed %r" % line)

    if __name__ == '__main__':
        server = StreamServer(('0.0.0.0', 6000), echo)
        server.serve_forever()

Easy sequential model, fully async
27. Async Services with Ginkgo
Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.
https://github.com/progrium/ginkgo
Let's look at a simple example that implements a TCP and HTTP server...
28. Async Services with Ginkgo

    import gevent
    from gevent.pywsgi import WSGIServer
    from gevent.server import StreamServer
    from ginkgo.core import Service

    def handle_http(env, start_response):
        start_response('200 OK', [('Content-Type', 'text/html')])
        print 'new http request!'
        return ["hello world"]

    def handle_tcp(socket, address):
        print 'new tcp connection!'
        while True:
            socket.send('hello\n')
            gevent.sleep(1)

    app = Service()
    app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
    app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
    app.serve_forever()
29.-32. Async Services with Ginkgo
(Slides 29-32 repeat the code above, each highlighting one part in turn: the WSGI/TCP server imports, the HTTP handler, the TCP handler, and the service composition via app.add_service.)
33. Incoming Requests
[Diagram: incoming requests flow through a load balancer to a pool of async servers.]
Using our async reactor-based approach, let's redesign our serving infrastructure.
34. Incoming Requests
[Diagram: load balancer feeding AAA layers in front of the async servers.]
Step 1: define an authentication and authorization layer that will identify the user and the resource being requested.
35. Incoming Requests
[Diagram: load balancer, then AAA, then throttling layers in front of the async servers, with a concurrency manager alongside.]
Step 2: add a throttling layer and concurrency manager.
36. Concurrency Admission Control
• Goal: limit concurrency by delaying or selectively failing requests
• Common metrics
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources
• What we've found useful
  - By (Account, Resource Type)
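Keyed admission control of this kind can be sketched with only the standard library. The class below tracks in-flight requests per (account, resource type) and refuses admission past a fixed limit; the names and the limit are illustrative assumptions, not the Twilio implementation.

```python
import threading
from collections import defaultdict

class AdmissionController:
    """Illustrative sketch: limit in-flight requests per
    (account, resource_type) key."""

    def __init__(self, limit_per_key=3):
        self.limit = limit_per_key
        self.in_flight = defaultdict(int)
        self.lock = threading.Lock()

    def try_admit(self, account, resource_type):
        key = (account, resource_type)
        with self.lock:
            if self.in_flight[key] >= self.limit:
                return False   # caller may delay and retry, or fail fast
            self.in_flight[key] += 1
            return True

    def release(self, account, resource_type):
        with self.lock:
            self.in_flight[(account, resource_type)] -= 1

ctl = AdmissionController(limit_per_key=2)
admitted = [ctl.try_admit("acct-1", "sms") for _ in range(3)]
print(admitted)                        # [True, True, False]
ctl.release("acct-1", "sms")           # one request finishes
print(ctl.try_admit("acct-1", "sms"))  # True again
```

Because the limit is keyed on (account, resource type), one noisy account exhausting its SMS quota does not affect other accounts or its own voice traffic.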
37. Delay - delay responses without failing requests
[Chart: latency vs. load; latency rises under load, but requests still succeed.]
38. Deny - deny requests based on resource usage
[Chart: latency vs. load; past a threshold, some requests fail rather than adding further latency.]
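The two responses from slides 37 and 38 combine naturally into one policy: serve immediately below a soft limit, delay between the soft and hard limits, and deny above the hard limit. A minimal sketch, with placeholder thresholds that are not values from the talk:

```python
def admission_decision(in_flight, soft_limit=10, hard_limit=20):
    """Illustrative policy: serve, delay, or deny based on current load.
    Thresholds are placeholders, not values from the talk."""
    if in_flight < soft_limit:
        return "serve"   # normal latency
    if in_flight < hard_limit:
        return "delay"   # added latency, no failure (slide 37)
    return "deny"        # selective failure under extreme load (slide 38)

print([admission_decision(n) for n in (5, 15, 25)])
# ['serve', 'delay', 'deny']
```

The cheap per-request concurrency of the async design is what makes "delay" practical: a delayed request holds a greenlet, not a pooled worker.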
39. Incoming Requests
[Diagram: load balancer, AAA, throttling, and app-server layers, with a concurrency manager alongside and a second throttling layer between the app servers and dependent services.]
Step 3: allow backend resources to throttle requests.
40. Summary
Async frameworks like gevent allow you to easily decouple a request from access to constrained resources.
[Chart: request latency over time; latency rises gracefully under load instead of hitting service-wide failure.]