DynamoDB Gluecon 2012
1. ¡Ay, caramba!
Wrestle Your NoSQL
Data with DynamoDB
Jeff Douglas @jeffdonthemic
CloudSpokes Community Architect
2. Rambling Talk Roadmap
Short NoSQL overview (thanks Max @ 10gen!)
Why NoSQL databases are like Mexican Wrestlers
Amazon DynamoDB in depth
Amazon DynamoDB demo and code
CloudSpokes challenge submissions for “Build an
#Awesome Demo with Amazon DynamoDB”
3. Times they are a-changin’
Cloud applications and
APIs need to be fast,
flexible and scalable.
RDBMSs typically do not
scale well for certain data-intensive applications.
NoSQL is cloud friendly.
“NoSQL is a rebellion against the DBAs who prevent us from
doing shit.”
- James Governor, Gluecon 2012
4. Why is NoSQL #awesome?
Developed to manage large volumes of data that
do not necessarily follow a fixed schema
Great for heavy read/write workloads
Simple to set up, configure and administer
Distributed, fault tolerant architecture
Scale out not up
Specialized database for the right task
5. Key NoSQL differences
Do not use SQL as a query language
Dynamic & schema-less
Non-relational, no JOIN operations
No complex transactions
May not give full ACID guarantees; eventually
consistent instead. Performance and real-time
responsiveness are more important than strict consistency.
7. NoSQL database types
Document store (MongoDB, CouchDB)
A document-oriented database that stores, retrieves, and manages
semi-structured data including XML, YAML, JSON and binary (PDF, DOC)
Key-value store (Cassandra, Redis)
Stores schema-less data referenced by a simple key
Graph database (Neo4j, FlockDB)
Stores the relationship of data as a graph (social relations, network
topologies)
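As a rough illustration of the three models above, the same small data set can be sketched with plain Python structures. No real database is involved, and the user names, keys and the followers_of helper are made up for this example.

```python
# Document store: one self-contained, JSON-like document per record.
document = {
    "_id": "alice",
    "name": "Alice",
    "follows": ["bob"],
    "profile": {"city": "Denver"},
}

# Key-value store: opaque values looked up by a simple key.
key_value = {
    "user:alice:name": "Alice",
    "user:alice:city": "Denver",
}

# Graph database: nodes plus explicit, typed relationships (edges).
nodes = {"alice", "bob"}
edges = [("alice", "follows", "bob")]


def followers_of(user):
    """Return the sorted users that have a 'follows' edge pointing at `user`."""
    return sorted(src for src, rel, dst in edges if rel == "follows" and dst == user)


print(followers_of("bob"))
```

The document keeps related data nested together, the key-value layout flattens it behind string keys, and the graph layout makes the relationship itself the thing you query.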
8. How to choose?
With all of the different NoSQL database types, how
do you choose the “best” one?
9. El Toro Más Macho
MongoDB
Stores structured data as JSON-like
documents.
Ad hoc queries, indexing, master-slave
replication, sharding, server-side JavaScript
execution
All the “cool kids” are using it.
Node.js + MongoDB = WINNING!
10. Muy Guapo
Couchbase
JSON Document store
Embedded CouchDB with caching,
clustering and high-performance storage
management components.
JavaScript as its query language and
HTTP for an API
Serve HTML and JavaScript-based
“CouchApps”
11. El Matador Misterio
Redis
What exactly is Redis? MAGIC!
By definition, it’s an in-memory, key-value
data store with optional durability.
Data model includes lists of strings, sets of
strings, sorted sets of strings & hashes.
Awesome at doing set comparisons.
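To give a feel for what "set comparisons" means here, the sketch below mirrors the Redis commands SINTER, SUNION and SDIFF using Python's built-in sets. It does not talk to a Redis server, and the follower names are invented for illustration.

```python
# Stand-ins for two Redis sets. With a live server these would be
# filled with SADD and compared with SINTER / SUNION / SDIFF.
followers_jeff = {"max", "james", "amy"}
followers_max = {"james", "amy", "zoe"}

common = followers_jeff & followers_max      # analogous to SINTER
everyone = followers_jeff | followers_max    # analogous to SUNION
only_jeff = followers_jeff - followers_max   # analogous to SDIFF

print(sorted(common))
print(sorted(everyone))
print(sorted(only_jeff))
```

In Redis the same operations run server-side on the stored sets, so the client never has to pull both sets across the network to compare them.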
12. Comando Loco
Apache Hadoop
Fast, reliable analysis of both structured data
and complex data.
Derived from Google's MapReduce and File
System (GFS) papers. Yahoo is one of the
main contributors.
Reliable data storage using the Hadoop
Distributed File System (HDFS) and high-
performance parallel data processing using
MapReduce.
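The MapReduce idea can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle and reduce phases with a word count; it is not Hadoop's API, and all function names are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["NoSQL is fast", "NoSQL is flexible"]))
```

In a real cluster the map and reduce calls run in parallel on many machines, with the input lines read from HDFS blocks stored close to each worker.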
13. El Jefe Supremo
Apache Cassandra
Massively scalable key-value store initially
developed by Facebook.
BigTable data model (nested hashes) running
on an Amazon Dynamo-like infrastructure.
Has some RDBMS “feel” with column families
that make it a hybrid column/row store.
No single point of failure, fault-tolerant
multi-data-center replication, MapReduce support.
CQL (Cassandra Query Language)
16. ¡Hola DynamoDB!
Amazon DynamoDB is a fast, fully managed key-value
database service that scales seamlessly with extremely
low latency and predictable performance.
Store and retrieve any amount of data
Serve any level of request traffic
Hands off administration
Pay for throughput and not storage
17. ¡No! administración
No hardware or software provisioning, setup and
configuration, software patching, or partitioning data over
multiple instances and regions.
Specify the request throughput for your table and in the
background, Amazon handles the provisioning of resources to
meet the requested throughput rate.
Automatically partitions/re-partitions data and provisions
additional server capacity based upon table size & throughput.
Synchronously replicates data across multiple facilities in an
AWS Region giving you high availability and data durability.
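Specifying request throughput comes down to simple arithmetic on item size and request rate. The sketch below assumes the unit sizes given in the current AWS documentation (one read unit covers a strongly consistent read of up to 4 KB per second, one write unit covers a write of up to 1 KB per second, and eventually consistent reads cost half). Those sizes may differ from the ones in force when this talk was given, and the function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumption: item size covered by one read unit
WRITE_UNIT_BYTES = 1024      # assumption: item size covered by one write unit


def read_units(item_bytes, reads_per_second, eventually_consistent=False):
    """Estimate read capacity units for a steady read rate."""
    units = math.ceil(item_bytes / READ_UNIT_BYTES) * reads_per_second
    if eventually_consistent:
        units = math.ceil(units / 2)  # eventually consistent reads cost half
    return units


def write_units(item_bytes, writes_per_second):
    """Estimate write capacity units for a steady write rate."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


print(read_units(3000, 100))
print(read_units(3000, 100, eventually_consistent=True))
print(write_units(3000, 20))
```

Item sizes are rounded up to the next whole unit, so a 3,000-byte item costs one read unit per read but three write units per write under these assumptions.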
18. Muy rápido
Consistent, predictable performance
Runs on a new solid state disk (SSD) architecture
for low-latency response times.
Read latencies average less than 5 milliseconds,
and write latencies average less than 10
milliseconds.
19. Muy Escalable
No table size limits (adiós SimpleDB?)
No downtime when scaling up or down
Unlimited storage
Automatically scale machine resources in
response to increases in database traffic without
the need of client-side partitioning.
20. Modelo de datos flexible
Flexible data model with familiar tables, items
and key-value pairs.
Schema-less document storage. Each item can
have different attributes.
Easy to create and modify documents. Simple
API.
No cross-table joins. Use composite keys to
model relationships.
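One way to picture "composite keys instead of joins" is a toy in-memory table where the hash key picks a partition and the range key keeps that partition sorted. This is a sketch of the idea only, not DynamoDB's implementation; the class, the author names and the timestamps are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class CompositeKeyTable:
    """Toy table: hash key -> items kept sorted by range key."""

    def __init__(self):
        self._partitions = {}  # hash key -> sorted list of (range key, item)

    def put(self, hash_key, range_key, item):
        rows = self._partitions.setdefault(hash_key, [])
        keys = [r for r, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # same full key: overwrite
        else:
            rows.insert(i, (range_key, item))  # keep the partition sorted

    def query(self, hash_key, low, high):
        """Return items for one hash key with low <= range key <= high."""
        rows = self._partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        start, end = bisect_left(keys, low), bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = CompositeKeyTable()
table.put("jeff", "2012-05-01T10:00", {"text": "Hello Gluecon"})
table.put("jeff", "2012-05-23T09:30", {"text": "DynamoDB demo"})
table.put("max", "2012-05-23T11:00", {"text": "NoSQL overview"})

late_may = table.query("jeff", "2012-05-20", "2012-05-31")
print(late_may)
```

The one-to-many relationship (an author has many posts) lives entirely in the key: all of an author's posts share a hash key, and the range key orders them so a date-range lookup needs no join.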
21. Duradero
Consistent, disk-only writes
Atomic increment/decrement (w/single API call)
Optimistic concurrency control (aka conditional
writes & updates)
Item-level transactions (even in bulk)
Automatic and synchronous replication across
data centers and availability zones.
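Optimistic concurrency control with conditional writes can be sketched without any AWS calls. The toy table below accepts a write only if the stored version matches the version the writer last read. The class, the exception and the "version" attribute are assumptions made for this illustration, not part of DynamoDB's API.

```python
class ConditionFailed(Exception):
    """Raised when the stored item no longer matches the expected version."""


class ToyTable:
    def __init__(self):
        self._items = {}

    def get(self, key):
        return dict(self._items[key])  # copy, so callers cannot mutate storage

    def put(self, key, item, expected_version=None):
        """Write only if the stored version equals expected_version."""
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._items[key] = dict(item)


def rename_task(table, key, new_title):
    """Read-modify-write that retries if another writer got there first."""
    while True:
        item = table.get(key)
        seen = item["version"]
        item["title"] = new_title
        item["version"] = seen + 1
        try:
            table.put(key, item, expected_version=seen)
            return item
        except ConditionFailed:
            continue  # someone else updated the item; re-read and retry


table = ToyTable()
table.put("task-1", {"title": "Draft", "version": 1})
updated = rename_task(table, "task-1", "Final")
print(updated)
```

No lock is held between the read and the write. A writer holding a stale version simply gets its write rejected and has to re-read, which is the "optimistic" part.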
22. ¿Costos?
Pay for throughput and not storage.
Priced per hour of provisioned read/write
throughput
Scales up and down well with a free tier
25. Other features
Integrates with Amazon Elastic MapReduce and
Hadoop.
Libraries, mappers and mocks for Django,
Erlang, Java, .NET, Node.js, Perl, PHP, Python &
Ruby.
Session based authentication using Amazon
Security Token Service
Monitoring via CloudWatch
26. DynamoDB Semantics
Tables, items & attributes
Items are indexed by primary key (a single hash
key, or a composite hash and range key)
Items are a collection of attributes, and each
attribute has a name and a value.
Unlimited number of attributes, up to 64 KB total per item.
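To make the key terminology concrete, here is a sketch that builds a table definition with either a hash key alone or a hash and range key. It assumes the parameter names used by boto3's create_table call (a library that postdates this talk), only builds the dictionary, and never contacts AWS. The helper, the table name and the attribute names are hypothetical.

```python
def table_definition(name, hash_attr, range_attr=None, reads=5, writes=5):
    """Build keyword arguments in the shape boto3's create_table expects."""
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    attributes = [{"AttributeName": hash_attr, "AttributeType": "S"}]
    if range_attr is not None:
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
        attributes.append({"AttributeName": range_attr, "AttributeType": "S"})
    return {
        "TableName": name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }


photos = table_definition("Photos", "author", "published")
print(photos["KeySchema"])
# With credentials configured, this could be passed along as:
#   boto3.client("dynamodb").create_table(**photos)
```

Only the key attributes are declared up front. Every other attribute is schema-less and can vary from item to item.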
35. Flickr on DynamoDB
Wcheung (Canada) submitted a Grails application that caches Flickr photos in
Amazon DynamoDB. You can then search for cached feed entries by primary key
(author + published date/time range) or by table scan. You can also “like” a
photo, resulting in the atomic “like” counter for the item in DynamoDB getting
incremented.
http://screencast.com/t/MAVgm7xeqDpr
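The atomic "like" counter described above might look roughly like the request below. This is not the submission's code: it is a sketch that builds arguments in the shape boto3's low-level update_item call accepts, without contacting AWS. The table name, key attributes and helper function are assumptions made for illustration.

```python
def like_photo_request(author, published):
    """Build update_item arguments that atomically add 1 to a 'likes' counter."""
    return {
        "TableName": "Photos",  # hypothetical table name
        "Key": {
            "author": {"S": author},        # hash key
            "published": {"S": published},  # range key
        },
        # ADD increments the number in place on the server, so two
        # concurrent likes cannot overwrite each other.
        "UpdateExpression": "ADD #likes :one",
        "ExpressionAttributeNames": {"#likes": "likes"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
        "ReturnValues": "UPDATED_NEW",
    }


request = like_photo_request("wcheung", "2012-05-23T09:30:00Z")
print(request["UpdateExpression"])
# With credentials configured, this could be sent as:
#   boto3.client("dynamodb").update_item(**request)
```

Because the increment happens in a single API call, the client never reads the current count, which avoids the read-modify-write race entirely.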
36. Posterity
Mbleigh (US) submitted a simple, barebones Twitter-esque service created in
Ruby using Sinatra. It is far from complete but uses a number of DynamoDB's
key features including Hash/Range Keys and Atomic Set Push Operations.
http://www.screencast.com/t/me8hW27MYs3x
37. DynamoDB Task Manager
Darthdeus (Czech Republic) wrote his app in Ruby using Sinatra. It uses a custom
ORM he wrote called DynamoRecord to access DynamoDB. His main idea was to
get at least some of the ActiveRecord-ish API to DynamoDB using some basic
metaprogramming.
http://www.youtube.com/watch?v=9tOzaDPP39I
38. Simple Survey
Peakpado (US) created an application using Ruby on Rails. For each table he
created a sophisticated hash/range key model class which resulted in an API very
similar to ActiveRecord for DynamoDB.
http://screencast.com/t/ri1XkMxGcpnS
39. Data Sets for Mumbai
Romin (India) developed an API that exposes data sets of Mumbai city in JSON
format. The solution uses Amazon DynamoDB for storing the data and a Node.js
application that exposes the REST interface and talks to Amazon DynamoDB via
a backend Java application.