3. What kind of data?
• App opened
• Killed a walker
• Bought something
• Heartbeat
• Memory usage report
• App error
• Declined a review prompt
• Finished the tutorial
• Clicked on that button
• Lost a battle
• Found a treasure chest
• Received a push message
• Finished a turn
• Sent an invite
• Scored a Yahtzee
• Spent 100 silver coins
• Anything else any game designer or developer wants to learn about
9. Where does this flow?
• Ariel / Real-Time: operational monitoring, business alerts, dashboarding
• Data Warehouse: funnel analysis, ad-hoc batch analysis, reporting, behavior analysis
• Elasticsearch: ad-hoc realtime analysis, fraud detection, top-K summaries, exploration
• Ad-Hoc Forwarding: data integration with partners, game-specific systems
11. Kinesis
• Distributed, sharded streams. Akin to Kafka.
• Get an iterator over the stream, and checkpoint with the current stream pointer occasionally.
• Workers coordinate shard leases and checkpoints in DynamoDB (via the KCL)
(Diagram: the stream split into Shard 0, Shard 1, Shard 2)
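A minimal sketch of that consume-and-checkpoint loop, using boto3 directly against one hypothetical shard (in production the KCL drives this and stores the checkpoints in its DynamoDB lease table):

    import boto3

    kinesis = boto3.client("kinesis")
    STREAM = "event-stream"                  # hypothetical names
    SHARD = "shardId-000000000000"
    checkpoints = {}                         # the KCL keeps these in DynamoDB

    def consume(process):
        last = checkpoints.get(SHARD)
        kwargs = ({"ShardIteratorType": "AFTER_SEQUENCE_NUMBER",
                   "StartingSequenceNumber": last}
                  if last else {"ShardIteratorType": "TRIM_HORIZON"})
        it = kinesis.get_shard_iterator(StreamName=STREAM, ShardId=SHARD,
                                        **kwargs)["ShardIterator"]
        seen = 0
        while it:
            out = kinesis.get_records(ShardIterator=it, Limit=100)
            for record in out["Records"]:
                process(record["Data"])
                seen += 1
                if seen % 5 == 0:
                    # checkpoint occasionally; a crash replays whatever came
                    # after the last checkpoint (see the notes further down)
                    checkpoints[SHARD] = record["SequenceNumber"]
            it = out["NextShardIterator"]

Checkpointing every record would be safer but slower; checkpointing occasionally is what makes the reprocessing scenario described in the notes possible, hence the auxiliary idempotence on the next slide.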
13. Auxiliary Idempotence
• Idempotence keys at each stage
• Redis sets of idempotence keys by time window
• Gives resilience against various types of failures
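A minimal sketch of what those idempotence sets could look like in Redis (the key prefix and window length are illustrative, not the production values):

    import redis

    r = redis.Redis()
    WINDOW = 3600  # seconds per idempotence window

    def already_processed(idempotence_key: str, event_time: int) -> bool:
        # One Redis set per time window; membership tells us whether this
        # batch was handled before, e.g. after a worker died between
        # processing a record and checkpointing it.
        bucket = event_time // WINDOW
        key = f"idem:{bucket}"
        is_new = r.sadd(key, idempotence_key)   # 1 if new, 0 if seen before
        r.expire(key, WINDOW * 2)               # windows age out on their own
        return is_new == 0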
17. 1. Deserialize event batch
2. Apply changes to application properties
3. Get current device and application properties
4. Get known facts about sending device
5. Emit each enriched event to Kinesis (sketched below)
(Diagram: Collection → Kinesis → Enrichment)
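A rough sketch of that worker loop; the batch shape and the property/device-store interfaces are hypothetical stand-ins for the real configuration and fact systems:

    import json

    def enrich_batch(raw_batch, property_store, device_store, kinesis, stream):
        batch = json.loads(raw_batch)                              # 1. deserialize the event batch
        property_store.apply(batch.get("property_changes", []))   # 2. apply changes to application properties
        props = property_store.current(batch["app_id"],
                                       batch["device_id"])        # 3. current device and application properties
        facts = device_store.facts(batch["device_id"])            # 4. known facts about the sending device
        for event in batch["events"]:                             # 5. emit each enriched event to Kinesis
            enriched = {**event, **props, **facts}
            kinesis.put_record(StreamName=stream,
                               Data=json.dumps(enriched),
                               PartitionKey=batch["device_id"])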
19. Now we have a stream of well-described, denormalized event facts.
20. Pipeline to HDFS
• Partitioned by event name and game, buffered in-memory and written to S3
• Picked up every hour by a Spark job
• Converted to Parquet and loaded into HDFS (see the sketch below)
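A minimal PySpark sketch of the hourly job, with hypothetical S3 and HDFS paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hourly-parquet-load").getOrCreate()

    # Buffered JSON written by the collectors for the previous hour
    events = spark.read.json("s3://event-buffer/dt=2015-06-01-13/")

    # Columnar Parquet in HDFS, partitioned by game and event name
    (events.write
           .partitionBy("game", "event_name")
           .mode("append")
           .parquet("hdfs:///warehouse/events/"))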
27. HyperLogLog
• High-level algorithm (four-bullet version stolen from my colleague, Cristian)
• b bits of the hash are used as an index pointer (Redis uses b = 14, i.e. m = 16384 registers)
• The rest of the hash is inspected for the longest run of zeroes we can encounter (N)
• The register pointed to by the index is replaced with max(currentValue, N + 1)
• An estimator function is used to calculate the approximated cardinality (a toy version of the update step follows below)
http://content.research.neustar.biz/blog/hll.html
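A toy Python version of the update step described above (the estimator that turns registers into a cardinality is omitted; Redis hides all of this behind PFADD/PFCOUNT):

    import hashlib

    B = 14                # index bits; Redis uses b = 14, so m = 16384 registers
    M = 1 << B
    registers = [0] * M

    def toy_pfadd(value: str) -> None:
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h & (M - 1)          # low b bits pick a register
        rest = h >> B              # remaining bits: find the run of zeroes (N)
        n = 0
        while n < 64 - B and not (rest >> n) & 1:
            n += 1
        registers[idx] = max(registers[idx], n + 1)   # max(currentValue, N + 1)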
31. Alarm Clocks
• Push the timestamp of current events to a per-game pub/sub channel
• A worker takes the 99th-percentile age of the last N events per title as the delay (sketched below)
• Use that time for alarm calculations
• Overlay delays on dashboards
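A minimal sketch of the delay calculation for a single title; N and the in-memory structure are illustrative:

    import time
    from collections import deque

    N = 1000
    recent = deque(maxlen=N)     # timestamps of the last N events for one title

    def on_event(event_ts: float) -> None:
        recent.append(event_ts)

    def pipeline_delay() -> float:
        # Age of each recent event; the 99th percentile becomes the delay
        # used for alarm calculations and overlaid on dashboards.
        now = time.time()
        ages = sorted(now - ts for ts in recent)
        return ages[int(0.99 * (len(ages) - 1))] if ages else 0.0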
32. Ariel, now with clocks
(Architecture diagram, components: Kinesis, Collector, Workers, Idempotence, PFADD, Aggregation, Event Clock, Web, PFCOUNT, "Are installs anomalous?")
33. Ariel 1.0
• ~30K metrics configured
• Aggregation into 30-minute buckets
• 12 kilobytes per HLL set (plus overhead)
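Back-of-envelope, assuming (pessimistically) that every metric were a full HLL: 30,000 metrics × 12 KB ≈ 360 MB per 30-minute bucket, on the order of 17 GB of register data per day, which is the kind of memory pressure that motivates the hybrid datastore plan below.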
36. Hybrid Datastore: Plan
• Move older HLL sets to DynamoDB
• They’re just strings!
• Cache reports aggressively
• Fetch backing HLL data from DynamoDB as needed on the web layer, merge using on-instance Redis (sketched below)
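A rough sketch of the merge path, with hypothetical table and key names. Because the archived values are just the raw Redis HLL strings, they can be restored with SET and folded together with PFMERGE on the web layer's on-instance Redis:

    import boto3
    import redis

    archive = boto3.resource("dynamodb").Table("ariel-hll-archive")  # hypothetical table
    r = redis.Redis()   # on-instance Redis used as a merge scratchpad

    def count_distinct(metric, buckets):
        scratch = f"merge:{metric}"
        r.delete(scratch)
        for bucket in buckets:
            item = archive.get_item(Key={"metric": metric, "bucket": bucket})["Item"]
            r.set("tmp:hll", item["hll"].value)    # the stored HLL is just a string
            r.pfmerge(scratch, "tmp:hll")          # fold it into the scratchpad
        return r.pfcount(scratch)                  # merged distinct-count estimate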
37. Ariel, now with hybrid datastore
(Architecture diagram: the components from slide 32 plus DynamoDB, Report Caches, Old Data Migration, and a Merge Scratchpad)
39. Redis Roles
• Idempotence
• Configuration Caching
• Aggregation
• Clock
• Scratchpad for merges
• Cache of reports
• Staging of DWH extracts
40. Other Considerations
• Multitenancy. We run parallel stacks and give games an assigned affinity, to insulate them from pipeline delays
• Backfill. The system is forward-looking only; we can replay Kinesis backups to backfill, or backfill from the warehouse
We also expect this to grow along with our user base, the launch of new titles, and, of course, every addition of new, useful functionality.
We’re just looking at one simple transformation of a stream, and the consumption of that stream by a variety of consumers. Since we’re using Kinesis, we can read the same stream in parallel from multiple applications safely.
We’ll consider major challenges moving from left to right across this architecture.
Primary collection is intended to be at-least-once; we currently support SQS and HTTP; all batches carry idempotence information to allow deduplication.
At this stage, we have minimal logic— we are focused on letting game servers and clients successfully unload their batches of user events, so they can be durably stored in our systems.
System configuration lives in DynamoDB; we use Netflix Archaius
App configuration lives in DynamoDB; we cache in-memory on instances and in Redis
Goals of SQS:
• Register and receive events asynchronously
• Provide elasticity when senders spike
• Reduce CPU burn for senders
Autoscaling group containing a simple Java service, deployed as a golden AMI provisioned with Packer and Ansible, using CloudFormation. We make lots of these — we call them our satellites. Usually we name them after moons.
The little orange symbol means we’re using Amazon’s KCL, so the fleet negotiates workers’ shard control using a lease table in DynamoDB. Monitoring is New Relic and lots of StatsD sent to Datadog.
So every time we see a gray square, assume we’re talking about 1-50 EC2 instances across several availability zones in one AWS region.
But first an aside on Kinesis.
Checkpointing and auxiliary idempotence
The data in our stream has monotonically increasing pointers (huge, huge numbers!). In our case, 1-22 and beyond.
A worker on this shard appears and checkpoints every 5 successfully processed records. But it dies after processing record 12.
When Worker B appears, it sees the checkpoint at 10 and picks up processing the shard at 11. But this means we’ll reprocess 11 and 12!
Similar issues can occur with out-of-order processing of data.
Expensive. Bloom filters may be a viable option some day
This stage is the latency-sensitive one.
This lets all downstream systems act on data without needing to hit any more systems.
We have considered a streaming ingest, but this has proven easier to reason about and has sufficient liveness at the moment.
Introduced in Redis 2.8.9 (http://antirez.com/news/75)
But I don’t want to really get into this too much…
The first complete implementation of this had three major components: collector, web and workers.
Caveat— not all metrics were HLLs; we also support sums, which take only several bytes. But only the sparsest of distinct metrics would require less than 12 KB for a time window.