Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including DynamoDB Accelerator (DAX), DynamoDB Time-to-Live, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
5. SQL (Relational)
Price Desc.
$11.50
$8.99
Chaplin’s
first …
Columns
Rows
Primary Key Index
$14.95
One of 2
major …
The
Partitas
Product
ID
Type
1
2
3
Title Date
Odyssey 1871
Book ID Author
1 Homer
Books
Albums
Title
6 Partitas
Album
ID
Artist
2 Bach
Genre Director
Drama,
Comedy
Chaplin
Movie ID Title
3 The Kid
Movies
Products
Book
Album
Movie
6. SQL (Relational) vs. NoSQL (Non-relational)
Product
ID
Type
Odyssey Homer1 Book ID
2 Album ID 6 Partitas
2
Album ID:
Track ID
Partita
No. 1
Bach
Attributes
Schema is defined per item
Items
Partition Key Sort Key
Price Desc.
$11.50
$8.99
Chaplin’s
first …
Columns
Rows
Primary Key Index
$14.95
One of 2
major …
The
Partitas
3 Movie ID The Kid
Drama,
Comedy
1871
Chaplin
Primary Key
Product
ID
Type
1
2
3
Title Date
Odyssey 1871
Book ID Author
1 Homer
Books
Albums
Title
6 Partitas
Album
ID
Artist
2 Bach
Genre Director
Drama,
Comedy
Chaplin
Movie ID Title
3 The Kid
Movies
Products Products
Book
Album
Movie
7. Why NoSQL?
Optimized for storage Optimized for compute
Normalized/relational Denormalized/hierarchical
Ad hoc queries Instantiated views
Scale vertically Scale horizontally
Good for OLAP Built for OLTP at scale
SQL NoSQL
8. Agenda
Tables, API, data types, indexes
Replication
Data modeling [1:1, 1:M]
DynamoDB Streams
Use cases
Reference architecture
9. Amazon DynamoDB
Managed NoSQL database service
Simple and powerful API
Supports both document and key-value data models
Highly scalable
Consistent, single-digit millisecond latency at any scale
Highly available—3x replication
13. Data types
String (S)
Number (N)
Binary (B)
String Set (SS)
Number Set (NS)
Binary Set (BS)
Map (M)
List (L)
Boolean (BOOL)
Null (NULL)
Used for storing nested JSON documents
14. 00 55 A954 AA FF
Partition table
Partition key uniquely identifies an item
Partition key is used for building an unordered hash index
Table can be partitioned for scale
00 FF
Id = 1
Name = Jim
Hash (1) = 7B
Id = 2
Name = Andy
Dept = Engg
Hash (2) = 48
Id = 3
Name = Kim
Dept = Ops
Hash (3) = CD
Key Space
15. Partitions are three-way replicated
Id = 2
Name = Andy
Dept = Engg
Id = 3
Name = Kim
Dept = Ops
Id = 1
Name = Jim
Id = 2
Name = Andy
Dept = Engg
Id = 3
Name = Kim
Dept = Ops
Id = 1
Name = Jim
Id = 2
Name = Andy
Dept = Engg
Id = 3
Name = Kim
Dept = Ops
Id = 1
Name = Jim
Replica 1
Replica 2
Replica 3
Partition 1 Partition 2 Partition N
16. Partition-sort key table
Partition key and sort key together uniquely identify an Item
Within unordered partition key-space, data is sorted by the sort key
No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes
00:0 FF:∞
Hash (2) = 48
Customer# = 2
Order# = 10
Item = Pen
Customer# = 2
Order# = 11
Item = Shoes
Customer# = 1
Order# = 10
Item = Toy
Customer# = 1
Order# = 11
Item = Boots
Hash (1) = 7B
Customer# = 3
Order# = 10
Item = Book
Customer# = 3
Order# = 11
Item = Paper
Hash (3) = CD
55:0 A9:∞54:∞ AA:0
Partition 1 Partition 2 Partition 3
18. Global secondary index (GSI)
Alternate partition (+sort) key
Index is across all table partition keys
GSIs
A5
(part.)
A4
(sort)
A1
(table key)
A3
(projected)
Table
INCLUDE A3
A4
(part.)
A5
(sort)
A1
(table key)
A2
(projected)
A3
(projected) ALL
A2
(part.)
A1
(table key) KEYS_ONLY
RCU/WCU provisioned
separately for GSIs
Online Indexing
A1
(partition)
A2 A3 A4 A5
19. How do GSI updates work?
Table
Primary
table
Primary
table
Primary
table
Primary
table
Global
Secondary
Index
Client
3. Asynchronous
update (in progress)
If GSIs don’t have enough write capacity, table writes will be throttled!
20. Local secondary index (LSI)
Alternate sort key attribute
Index is local to a partition key
A1
(partition)
A3
(sort)
A2
(table key)
A1
(partition)
A2
(sort)
A3 A4 A5
LSIs
A1
(partition)
A4
(sort)
A2
(table key)
A3
(projected)
Table
KEYS_ONLY
INCLUDE A3
A1
(partition)
A5
(sort)
A2
(table key)
A3
(projected)
A4
(projected)
ALL
10 GB max per partition
key, therefore LSIs limit
the # of sort keys!
22. Scaling: Throughput
Provisioned at the table level
Read and write throughput limits are independent
Tables re-partition according to provisioned throughput
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads
• Eventually consistent reads cost 1/2 of consistent reads
WCURCU
23. Scaling: Disk Size
Virtually limitless storage
DynamoDB automatically splits partitions after 10GB
Items are limited to 400KB in size
24. Partitioning math
# 𝑜𝑓 𝑃𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛𝑠 =
𝑇𝑎𝑏𝑙𝑒 𝑆𝑖𝑧𝑒 𝑖𝑛 𝐺𝐵
10 𝐺𝐵(𝑓𝑜𝑟 𝑠𝑖𝑧𝑒)
# 𝑜𝑓 𝑃𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛𝑠
(𝑓𝑜𝑟 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡)
=
𝑅𝐶𝑈𝑓𝑜𝑟 𝑟𝑒𝑎𝑑𝑠
3000 𝑅𝐶𝑈
+
𝑊𝐶𝑈𝑓𝑜𝑟 𝑤𝑟𝑖𝑡𝑒𝑠
1000 𝑊𝐶𝑈
(𝑓𝑜𝑟 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡)(𝑓𝑜𝑟 𝑠𝑖𝑧𝑒)(𝑡𝑜𝑡𝑎𝑙)
In the future, these details might change…
Not used until table is created
26. Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB
throughput, create tables where
the partition key has a large
number of distinct values, and
values are requested fairly
uniformly, as randomly as
possible.”
—DynamoDB Developer Guide
1. Space:
1. Key Choice: High key
cardinality
2. Uniform Access: access
is evenly spread over
the key-space
2. Time: requests arrive evenly
spaced in time
32. Burst capacity may not be sufficient
0
400
800
1200
1600
CapacityUnits
Time
Provisioned Consumed Attempted
Throttled requests
Don’t completely depend on burst capacity… provision sufficient throughput
Burst: 300 seconds
(1200 × 300 = 360k CU)
33. What causes throttling?
Non-uniform workloads
• Hot keys/hot partitions
• Large collection sizes [100s of MBs under one partition key]
Dilution of throughout across partitions caused by mixing
hot data with cold data
• Use a table per time period for storing time series data so WCUs
and RCUs are applied to the hot data set
35. 1:1 relationships or key-values
Use a table or GSI with a partition key
Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes
Users Table
Partition key Attributes
UserId = bob Email = bob@gmail.com, JoinDate = 2016-11-15
UserId = fred Email = fred@yahoo.com, JoinDate = 2016-12-01
Users-Email-GSI
Partition key Attributes
Email = bob@gmail.com UserId = bob, JoinDate = 2016-11-15
Email = fred@yahoo.com UserId = fred, JoinDate = 2016-12-01
36. 1:N relationships or parent-children
Use a table or GSI with partition and sort key
Use Query API
Example: Given a device, find all readings
between epoch X, Y
Device-measurements
Part. Key Sort key Attributes
DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90
DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
37. N:M relationships
Use a table and GSI with partition and sort key elements
switched
Use Query API
Example: Given a user, find all games. Or given a game,
find all users.
User-Games-Table
Part. Key Sort key
UserId = bob GameId = Game1
UserId = fred GameId = Game2
UserId = bob GameId = Game3
Game-Users-GSI
Part. Key Sort key
GameId = Game1 UserId = bob
GameId = Game2 UserId = fred
GameId = Game3 UserId = bob
38. Rich expressions
Projection expression
• Query/Get/Scan: ProductReviews.FiveStar[0]
Filter expression
• Query/Scan: #V > :num (#V is a place holder for keyword VIEWS)
Conditional expression
• Put/Update/DeleteItem: attribute_not_exists (#pr.FiveStar)
Update expression
• UpdateItem: set Replies = Replies + :num
39. DynamoDB Streams
Partition A
Partition B
Partition C
Ordered stream of item
changes
Exactly once, strictly
ordered by key
Highly durable, scalable
24 hour retention
Sub-second latency
Compatible with Kinesis
Client Library
DynamoDB Streams
1
Shards have a lineage and
automatically close after time or
when the associated DynamoDB
partition splits
2
3
Updates
KCL
Worker
Amazon
Kinesis Client
Library
Application
KCL
Worker
KCL
Worker
GetRecords
Amazon DynamoDB
Table
DynamoDB Streams
Stream
Shards
42. Time series tables
Events_table_2016_April
Event_id
(Partition key)
Timestamp
(sort key)
Attribute1 …. Attribute N
Events_table_2016_March
Event_id
(Partition key)
Timestamp
(sort key)
Attribute1 …. Attribute N
Events_table_2016_Feburary
Event_id
(Partition key)
Timestamp
(sort key)
Attribute1 …. Attribute N
Events_table_2016_January
Event_id
(Partition key)
Timestamp
(sort key)
Attribute1 …. Attribute N
RCUs = 1000
WCUs = 100
RCUs = 10000
WCUs = 10000
RCUs = 100
WCUs = 1
RCUs = 10
WCUs = 1
Current table
Older tables
HotdataColddata
Don’t mix hot and cold data; archive cold data to Amazon S3
43. DynamoDB TTL
RCUs = 10000
WCUs = 10000
RCUs = 100
WCUs = 1
HotdataColddata
Use DynamoDB TTL and Streams to archive
Events_table_2016_April
Event_id
(Partition key)
Timestamp
(sort key)
myTTL
1489188093
…. Attribute N
Current table
Events_Archive
Event_id
(Partition key)
Timestamp
(sort key)
Attribute1 …. Attribute N
44. Isolate cold data from hot data
Pre-create daily, weekly, monthly tables
Provision required throughput for current table
Writes go to the current table
Turn off (or reduce) throughput for older tables
OR move items to separate table with TTL
Dealing with time series data
49. What is DAX?
The benefits of DAX:
•Latency: <1ms
•Throughput: Millions of RPS for a 3-node (highly
available) cluster
•Simplified caching: DynamoDB API, AWS integration
•Hot-key/cost savings: reduce over provisioning for
frequently accessed data
53. How DAX works
DAX is API compatible with DynamoDB
Read APIs: GetItem, BatchGetItem, Query, Scan
Modify APIs: PutItem, UpdateItem, DeleteItem,
BatchWriteItem
Control plane APIs: Not supported (CreateTable,
DeleteTable, etc.)
56. Requirements for promotion tracking
Monitor customer promotion redemption
Support adhoc queries from Hive
Archive historical promotion redemption in Redshift
BI requirements are not known in advance
57. 1. Create Orders
2. Capture User Activity
CUSTOMER-ID ORDER-ITEM-ID PROMOTION-ID STATUS
CUST-1 ITEM-a PROMO-PRIMEDAY OK
CUST-1 ITEM-b PROMO-GOLDBOX OK
CUST-1 ITEM-c PROMO-GOLDBOX OK
CUST-2 ITEM-a PROMO-PRIMEDAY OK
CUST-3 ITEM-a PROMO-PRIMEDAY RETURNED
58. BI Questions
1. Which promotions were valued by customers aged 18-24 y/o?
2. How many PrimeDay promotions were returned, by country?
3. How did the PrimeDay sales compare to Black Friday last year?
CUSTOMER-ID ORDER-ITEM-ID PROMOTION-ID STATUS
CUST-1 ITEM-a PROMO-PRIMEDAY OK
CUST-1 ITEM-b PROMO-GOLDBOX OK
CUST-1 ITEM-c PROMO-GOLDBOX OK
CUST-2 ITEM-a PROMO-PRIMEDAY OK
CUST-3 ITEM-a PROMO-PRIMEDAY RETURNED
Amazon S3 Redshift Other data
sources
59. CUSTOMER-ID ORDER-ITEM-ID PROMOTION-ID STATUS
CUST-1 ITEM-a PROMO-PRIMEDAY OK
CUST-1 ITEM-b PROMO-GOLDBOX OK
CUST-1 ITEM-c PROMO-GOLDBOX OK
CUST-2 ITEM-a PROMO-PRIMEDAY OK
CUST-3 ITEM-a PROMO-PRIMEDAY RETURNED
Amazon
DynamoDB
DynamoDB
Streams
Amazon
Kinesis
Firehose
Redshift
1. Keep all promo redemption history in Redshift
CUSTOMER-ID ORDER-ITEM-ID PROMOTION-ID STATUS
CUST-1 ITEM-a PROMO-PRIMEDAY OK
CUST-1 ITEM-b PROMO-GOLDBOX OK
CUST-1 ITEM-c PROMO-GOLDBOX OK
CUST-2 ITEM-a PROMO-PRIMEDAY OK
CUST-3 ITEM-a PROMO-PRIMEDAY RETURNED
2. Support adhoc queries with HQL
Amazon EMR
Amazon
DynamoDB
Other data
sources
Amazon S3
60. Use DynamoDB Streams to record changes
Use S3 as intermediary data store
Use Hive at low priority for adhoc queries
Analyze Items out of band
Queries are unknown at design
Other data
sources
Amazon S3
RedshiftAmazon
DynamoDB
DynamoDB
Streams
68. Trade off read cost for write scalability
Consider throughput per partition key and per partition
Shard write-heavy partition keys
Your write workload is not horizontally
scalable
69. Correctness in voting
UserId Candidate Date
Alice A 2016-10-02
Bob B 2016-10-02
Eve B 2016-10-02
Chuck A 2016-10-02
RawVotes Table
Segment Votes
A_1 23
B_2 12
B_1 14
A_2 25
AggregateVotes Table
Voter
1. Record vote and de-dupe; retry 2. Increment candidate counter
70. Correctness in aggregation?
UserId Candidate Date
Alice A 2016-10-02
Bob B 2016-10-02
Eve B 2016-10-02
Chuck A 2016-10-02
RawVotes Table
Segment Votes
A_1 23
B_2 12
B_1 14
A_2 25
AggregateVotes Table
Voter
72. Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime
Sub-second latency
DynamoDB Streams
73. View Type Destination
Old image—before update Name = John, Destination = Mars
New image—after update Name = John, Destination = Pluto
Old and new images Name = John, Destination = Mars
Name = John, Destination = Pluto
Keys only Name = John
View types
UpdateItem (Name = John, Destination = Pluto)
75. DynamoDB Streams
Open Source Cross-
Region Replication Library
Asia Pacific (Sydney) EU (Ireland) Replica
US East (N. Virginia)
Cross-region replication
81. Analytics with
DynamoDB Streams
Collect and de-dupe data in DynamoDB
Aggregate data in-memory and flush periodically
Performing real-time aggregation and
analytics