5. Marketplace Data
● Mission: Empower teams throughout Uber to understand and improve marketplace efficiency by providing real-time data for our marketplace levers, modeling, and analysis across 15 teams
● Data sources: trip lifecycle, matching, rider, driver, fare, and mobile, across 100+ Apache Kafka topics
● Data categories: trip, rider, driver, session, product, geo, time
6. Marketplace Data
● Trip flow patterns
● Demand patterns
● Human consumption - incentives
● Human consumption - regional metrics
11. ● Metrics are aggregated on the fly from raw data.
● Metric definitions have a DAG structure.
● Metrics are usually aggregated over a limited set of dimensions, e.g., time, geo, etc.
Usage Pattern
13. Solution
● An E2E platform that:
○ Provides a unified DSL to define metrics over multiple data sources such as Kafka, Elasticsearch, etc.
○ Based on the metric definitions, optimizes the execution plan against objectives such as performance, isolation, etc.
○ Directly generates metrics, instead of generating an intermediate raw dataset and then aggregating on the fly.
15. ● Formula: a DSL expression describing an aggregation, or an arithmetic operation over aggregations
metric_A + metric_B * metric_C - avg((table_A where (exists(column_A) AND (column_B != 0))).column_C)
● Dimensions: the dimensions over which this metric is aggregated
Time, geo, product type, ...
● Sinks: the physical storage for this metric
Elasticsearch, Cassandra, MySQL, ...
Metric Definition
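As a rough illustration, a definition like the one above could be held in memory as a name, a formula, dimensions, and sinks. The Java sketch below is illustrative only; the class, field names, and example metric are assumptions, not Gairos internals.

    // Hypothetical in-memory form of a metric definition: a formula over other metrics or
    // tables, the dimensions to aggregate over, and the sinks the result is written to.
    import java.util.List;

    public final class MetricDefinition {
        final String name;
        final String formula;          // an expression in the DSL shown above
        final List<String> dimensions; // e.g. time, geo, product type
        final List<String> sinks;      // e.g. Elasticsearch, Cassandra, MySQL

        MetricDefinition(String name, String formula, List<String> dimensions, List<String> sinks) {
            this.name = name;
            this.formula = formula;
            this.dimensions = dimensions;
            this.sinks = sinks;
        }

        public static void main(String[] args) {
            MetricDefinition m = new MetricDefinition(
                "driver_utilization",
                "sum(minutes_with_client) / sum(minutes_online)",
                List.of("time_1m", "geo_hexagon", "product_type"),
                List.of("elasticsearch", "cassandra"));
            System.out.println(m.name + " over " + m.dimensions + " -> " + m.sinks);
        }
    }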
16. 1. Parse all metric definitions.
2. Build a DAG of the most basic Flink operations needed to generate these metrics.
3. Merge nodes and edges in the operation DAG to form a new DAG of logical metric groups, to reduce duplication, promote isolation, and improve performance (see the sketch below).
4. Generate metric group config files and feed them into the execution plan generator.
Execution Optimizer
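A minimal Java sketch of the merging in step 3, assuming operations that read the same source and aggregate over the same dimensions can share one metric group; all class, source, and metric names are illustrative.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class GroupOperations {
        // One basic operation in the operation DAG: the metric it serves, the source it
        // reads, and the dimensions it aggregates over.
        record Operation(String metric, String source, List<String> dimensions) {}

        public static void main(String[] args) {
            List<Operation> ops = List.of(
                new Operation("driver_utilization_hexagon", "driver_status", List.of("time_1m", "hexagon")),
                new Operation("open_cars_hexagon",          "driver_status", List.of("time_1m", "hexagon")),
                new Operation("driver_utilization_city",    "driver_status", List.of("time_1m", "city")));

            // Operations with the same (source, dimensions) key fall into one logical metric group.
            Map<String, List<Operation>> groups = ops.stream()
                .collect(Collectors.groupingBy(op -> op.source() + "|" + op.dimensions()));

            groups.forEach((key, members) -> System.out.println(
                "metric group [" + key + "]: " +
                members.stream().map(Operation::metric).collect(Collectors.joining(", "))));
        }
    }

Here the first two operations collapse into a single group, which is exactly the duplication the optimizer removes.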
17. 1. Parse the metric group config files.
2. Given the configs, build a DAG of stream operators; within each stream operator there may also be a DAG of event operators.
3. Apply pre-defined operator templates and generate the final Flink execution plan (see the sketch below).
Execution Plan Generator
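A self-contained Flink DataStream sketch of what a generated plan for one metric group might look like (open cars per hexagon per minute). The in-memory source, tuple layout, and print sink are stand-ins for the configured Kafka source and sinks; the real plan is generated from the config files, not hand-written.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class MetricGroupJobSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for a Kafka source of (hexagon_id, open_car_count) events; with the
            // real, unbounded source the 1-minute window fires continuously.
            env.fromElements(Tuple2.of("hex_1", 1), Tuple2.of("hex_1", 1), Tuple2.of("hex_2", 1))
               .keyBy(t -> t.f0)                                          // dimension from the config: hexagon
               .window(TumblingProcessingTimeWindows.of(Time.minutes(1))) // window from the config: 1 minute
               .sum(1)                                                    // aggregation from the operator template
               .print();                                                  // stand-in for the configured sink
            env.execute("metric-group-sketch");
        }
    }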
18. ● Given a specific set of metrics, export SQL for it
● Given a SQL query, analyze the percentage of overlap with data already in the platform (see the sketch below)
● Convert an ad hoc query to leverage existing data in the platform
● Auto-debugging to find the root cause of a data abnormality
● Metric discovery
● ...
Functionalities
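A minimal sketch of the overlap analysis, assuming both the ad hoc query and the platform's existing data can be summarized as sets of referenced fields; in practice this would work on the parsed SQL, and all names here are illustrative.

    import java.util.HashSet;
    import java.util.Set;

    public class OverlapAnalysis {
        // Percentage of the query's referenced fields that the platform already materializes.
        static double overlapPercent(Set<String> queryFields, Set<String> platformFields) {
            if (queryFields.isEmpty()) return 0.0;
            Set<String> common = new HashSet<>(queryFields);
            common.retainAll(platformFields);
            return 100.0 * common.size() / queryFields.size();
        }

        public static void main(String[] args) {
            Set<String> query    = Set.of("driver_status.hexagon", "driver_status.status", "trips.fare");
            Set<String> platform = Set.of("driver_status.hexagon", "driver_status.status");
            System.out.println(overlapPercent(query, platform) + "% of the ad hoc query is already covered");
        }
    }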
19. ● Avoid inconsistency (only one source of truth)
● Avoid inefficiency in development and duplicated effort
● Avoid cluster resource waste through the proposed optimization framework
● Avoid duplicate definitions by detecting duplication when a metric is added or updated
● Improve metric definitions (e.g., if metrics A and B already exist and C = A + B, C should not be defined from scratch)
● Decouple metric definition from generation techniques, so new techniques are easy to adopt
● Decouple the platform from storage
Benefits
20. Driver status summary over one minute:
1. Driver utilization per hexagon per minute
2. Driver utilization per city per minute
3. Driver total worked minutes per hour
4. Open cars in each hexagon at 1 PM in San Francisco
[Diagram: back-end service → Apache Kafka message every 4 seconds → pipeline → sinks]
Use Case Walkthrough
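A plain-Java sketch of what metric 1 above means, under the assumption that "utilization" is the share of online drivers in a hexagon who currently have a client during the minute; the record and status values are made up for illustration.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class UtilizationPerHexagon {
        record DriverStatus(String driverId, String hexagonId, String status) {}

        public static void main(String[] args) {
            // One minute's worth of driver status records (normally read from Kafka).
            List<DriverStatus> oneMinute = List.of(
                new DriverStatus("driver_1", "hex_1", "driving_client"),
                new DriverStatus("driver_2", "hex_1", "open"),
                new DriverStatus("driver_3", "hex_2", "driving_client"));

            // Utilization per hexagon = fraction of drivers in that hexagon with a client.
            Map<String, Double> utilization = oneMinute.stream()
                .collect(Collectors.groupingBy(DriverStatus::hexagonId,
                    Collectors.averagingDouble(s -> s.status().equals("driving_client") ? 1.0 : 0.0)));

            utilization.forEach((hex, u) -> System.out.println(hex + " utilization = " + u));
        }
    }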
21. There are two existing solutions that support these use cases:
1. Store per-driver, per-hexagon aggregations at one-minute granularity in Elasticsearch; different services query them with additional on-the-fly aggregation.
2. Run streaming jobs that calculate exactly the data needed to feed each service or model; each streaming job is maintained by a different team.
Dimension  | Value
Hexagon id | Hex_1
Driver id  | Driver_1
Status     | Driving_client
Others     | ...
Use Case Walkthrough
22. [Diagram: location-update events (driver_id, hex_id, status, others) are rolled up into a driver summary per minute, which feeds metric group 1 (driver per hour), metric group 2 (driver utilization per hexagon), metric group 3 (driver utilization per city), and metric group 4 (total open cars per hexagon), consumed by the Driver, Pricing, Surge, and Health teams]
Use Case Walkthrough
Our new framework supports all these use cases without any duplicate computation or inconsistency (see the sketch below).
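A self-contained Flink sketch of that fan-out: the location-update stream is parsed once and two metric groups attach their own keying, window, and sink to the same stream, so nothing is computed twice. The tuple layout, data, and print sinks are stand-ins for the real sources and sinks.

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class FanOutSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (hexagon_id, city_id, open_cars) -- stand-in for the parsed location-update stream.
            DataStream<Tuple3<String, String, Integer>> status = env.fromElements(
                Tuple3.of("hex_1", "sf", 1), Tuple3.of("hex_2", "sf", 1), Tuple3.of("hex_1", "sf", 1));

            // Metric group: open cars per hexagon per minute.
            status.keyBy(t -> t.f0)
                  .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                  .sum(2)
                  .print("per-hexagon");

            // Metric group: open cars per city per minute, reusing the same parsed stream.
            status.keyBy(t -> t.f1)
                  .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                  .sum(2)
                  .print("per-city");

            env.execute("fan-out-sketch");
        }
    }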
23. Complex topology
● At the scale of Uber's current metrics, we will encounter a very complex topology once most of the metrics are onboarded onto this framework.
● We may need to cluster the tasks and split them into a few separate jobs, connected through Apache Kafka.
Challenges
24. Unstable sinks
● Each metric group may have a list of different sinks, including Apache Hive, Apache Kafka, ELK, Cassandra, etc. These storage systems are owned by different teams at Uber, with different SLAs.
● A slow or unstable storage system could bring down the SLA of the whole pipeline, so it is desirable to separate the sink from the processor. At Uber, we have a dedicated publisher pipeline that fulfills the sink tasks and can turn specific sinks on or off (see the sketch below).
Challenges
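A minimal Java sketch of that decoupling, assuming the processor hands finished metrics to a publisher that writes to each enabled sink; the interface and sink names are illustrative, not the real publisher pipeline.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class PublisherSketch {
        interface Sink { void write(String record); }

        // Enabled flag per sink; flipping it to false detaches an unstable store from the pipeline.
        private final Map<String, Boolean> enabled = new LinkedHashMap<>();
        private final Map<String, Sink> sinks = new LinkedHashMap<>();

        void register(String name, Sink sink) { sinks.put(name, sink); enabled.put(name, true); }
        void setEnabled(String name, boolean on) { enabled.put(name, on); }

        void publish(String record) {
            sinks.forEach((name, sink) -> {
                if (enabled.get(name)) sink.write(record);
            });
        }

        public static void main(String[] args) {
            PublisherSketch publisher = new PublisherSketch();
            publisher.register("elasticsearch", r -> System.out.println("ES <- " + r));
            publisher.register("cassandra",     r -> System.out.println("Cassandra <- " + r));

            publisher.publish("hex_1,utilization=0.67");
            publisher.setEnabled("cassandra", false);   // Cassandra is slow today: turn it off
            publisher.publish("hex_2,utilization=0.50");
        }
    }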
25. Metrics evolution
● Users will add new metrics and update existing ones. How do we support this with minimum effort?
○ We need to distinguish the golden set from ad hoc cases. For the golden set, a batch of metric changes is reviewed and then put into the pipeline.
○ For ad hoc use cases, a user can run the pipeline by themselves but won't receive much support.
Challenges
26. Onboarding
● At Uber, 15 teams and 20+ services use Gairos. It is a big challenge to ask each client to change the way it consumes data.
● We host a gateway for querying data in Gairos; it is desirable to build conversion into the gateway so that the old way of consuming data is translated to the new approach.
Challenges