SlideShare a Scribd company logo
1 of 35
Schemas Beyond The Edge
Alexei Zenin
Platform Engineer, Uken Games
August 17th, 2021
Journey
Mobile Analytics at Uken Games
Key Features of Schema Registry
Revamped Mobile Analytics Pipeline: JSON to Protobuf
2
Mobile Analytics at Uken Games
Collection of data to drive:
● Retention & engagement analysis (AB tests, DAU)
● Operational debugging & investigation
(purchases, rewards)
● Lower-level metric analysis (HTTP, CPU, Memory)
Focus will be on the first two
3
Current Ingestion Pipeline Design
4
~200 million events per day
Schema Silos
5
● Each silo has its own processes and
tools
● Several teams managing the same data
asset definitions
● Duplication of work across the
boundaries
○ Structure
○ Data types
○ Naming
6
Schema Drift
Pros & Cons
Pros
● Adding a new event is quick
● JSON is easy to use
● JSON well supported across systems
Cons
● Duplicated effort across teams for data management
● Repeated schema definitions
● Manual processes to keep things in sync
● JSON retransmits schema information each time
7
Schema Registry Refresher
8
Overview of Schema Registry
● Provides a central place to upload your
schemas
● Immutable & Idempotent API
● Acts as “barcode system”
9
https://bit.ly/3ikXUci
Default Wire Format (Confluent)
10
Single Primary Architecture
11
● Primary is elected via
Kafka Group Protocol
● Each Secondary is informed of
the primary’s address
● Primary is responsible for writes
(e.g. registering new schemas)
● Every node can serve reads or
forward write requests
Ingestion Pipeline 2.0: Protobuf
12
Ingestion Pipeline 2.0
Goals:
● Decrease boilerplate work
● Leverage automation
● Increase transparency
● Single source of truth for schemas
Solution:
Gitops paradigm with Protobuf & Schema Registry
13
JSON to Protobuf: Schema Structures
● Legacy JSON schema structure use the concept of an envelope with various subdivisions
● Convert into Protobuf, preserve schema structure
14
Protobuf Equivalent
Protobuf features:
● Can express custom types and use
composition to glue together
schemas in 1 top level schema
● Has support for some native types
like google.protobuf.Timestamp
● Ability to generate code from schema
(e.g. Java, C#)
15
Envelope per payload?
● Uken has between 200-300 event
payloads per game
● One approach is to copy paste the
envelope per payload per game
Envelopes = Payloads X Games
● Downsides are duplication in schemas
and generated code
● Leads to a poor developer experience
16
“Oneof” Branching
● Try using the oneof construct to
enumerate all possible payloads at
the EventPayload level
● Only need an envelope per game or 1
mega envelope for all games
Envelopes = O(Games)
● Similar to Avro Unions
● Still has duplication of envelope and
EventPayload
17
How to get around strict Protobuf schemas
● Each definition needs to be defined upfront and explicitly in Protobuf
Solution:
Defer schema attached until runtime to be able to reuse envelope across games
18
Protobuf’s Any: Dynamic Message Container
● Allows you to use any embedded
Protobuf type
● Uses special “packing” code
● The type_url only accepts Protobuf
package names
● Requires class to be present in
application
19
https://bit.ly/2Uo6RcS
Schema Registry Compatible “Any”
● Enables attaching schemas at runtime
● Removes type_url for schema_id
● The value field is Proto3 encoded bytes
20
Dynamic Envelope
● Can define one envelope for
all games
● Use the AnyUken type for
EventPayload
● Tradeoff explicitness for
flexibility
● Elevate schema IDs to a first
class concept within the
schemas themselves
(schema pointers)
21
22
Protobuf View
How do we get these schemas
past the edge?
23
Integrating Schema Registry with Mobile clients
Problems:
● Generated Protobuf classes are not aware of schema IDs
● Client device needs access to schema IDs
24
“Online” Approach (traditional)
25
● Expose Schema Registry (SR) to
mobile clients directly
● Fetch required schema IDs
during app runtime
Disadvantages:
● Need to setup security for
schema registry
● Need to scale SR to millions of
clients
● Point of failure for client
● Adds network overhead to
client
“Embedded” Approach
26
Embedded Tradeoffs
● No need to expose SR to millions of clients
● Leverage the immutable property of schema
IDs
● Custom build process
● Total snapshot of schema IDs built into app
27
Serverless + Schema Registry
28
EventBatch wire format
29
● Define a generic envelope as the API contract
between API Gateway & Mobile client
● Able to take any resolvable schema, return
error if bad data
● Avoids hardcoding a specific version of
AnalyticsEvent
● Keeps wire format compliant to Proto3
Nesting Schema IDs Visualized
30
Performance Comparison
31
● 3x improvement in latency to ingest a
batch of events
● 2x reduction in size per event
● 2x increase in possible number of
events buffered on device
JSON Latency
Protobuf Latency
Schema Management: GitOps paradigm
● Version control Protobuf schemas in Git monorepo
● Use Merge Requests to collaborate on impending changes
● Run CI/CD on commits
● Place people into the right spots, let automation do the rest
32
33
34
Next Steps
● Adding a Data Dictionary
● Improving visibility with integrations to Slack
● Iterating on RACI matrix for data management
● Migrating Spark Jobs to utilize new data management process
35
Takeaways
● GitOps allows for data management
automation
● Schema Registry can empower devices outside
the data center
● Schema IDs allow flexible envelope designs that
operate better at scale
● Protobuf can help reduce costs by several times

More Related Content

What's hot

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack PresentationAmr Alaa Yassen
 
Distributed Transaction in Microservice
Distributed Transaction in MicroserviceDistributed Transaction in Microservice
Distributed Transaction in MicroserviceNghia Minh
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking VN
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Databricks
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Graph Databases & OrientDB
Graph Databases & OrientDBGraph Databases & OrientDB
Graph Databases & OrientDBArpit Poladia
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformJean-Paul Azar
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
Kafka as Message Broker
Kafka as Message BrokerKafka as Message Broker
Kafka as Message BrokerHaluan Irsad
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduGrant Henke
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkJen Aman
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 
Domain Driven Design và Event Driven Architecture
Domain Driven Design và Event Driven Architecture Domain Driven Design và Event Driven Architecture
Domain Driven Design và Event Driven Architecture IT Expert Club
 
Snowflake Data Loading.pptx
Snowflake Data Loading.pptxSnowflake Data Loading.pptx
Snowflake Data Loading.pptxParag860410
 
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an exampleArchitecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an examplehadooparchbook
 

What's hot (20)

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Distributed Transaction in Microservice
Distributed Transaction in MicroserviceDistributed Transaction in Microservice
Distributed Transaction in Microservice
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKI
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Graph Databases & OrientDB
Graph Databases & OrientDBGraph Databases & OrientDB
Graph Databases & OrientDB
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Kafka as Message Broker
Kafka as Message BrokerKafka as Message Broker
Kafka as Message Broker
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache Kudu
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Domain Driven Design và Event Driven Architecture
Domain Driven Design và Event Driven Architecture Domain Driven Design và Event Driven Architecture
Domain Driven Design và Event Driven Architecture
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Snowflake Data Loading.pptx
Snowflake Data Loading.pptxSnowflake Data Loading.pptx
Snowflake Data Loading.pptx
 
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an exampleArchitecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an example
 

Similar to Schemas Beyond The Edge

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudChris Schalk
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js PresentationExist
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in SmalltalkESUG
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...eMadrid network
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentDavid Galeano
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP명근 최
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesBethmi Gunasekara
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...David Geurts
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 

Similar to Schemas Beyond The Edge (20)

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
ARISE
ARISEARISE
ARISE
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the Cloud
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js Presentation
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in Smalltalk
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
COM+ & MSMQ
COM+ & MSMQCOM+ & MSMQ
COM+ & MSMQ
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologies
 
resume
resumeresume
resume
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Rashmi_Resume
Rashmi_ResumeRashmi_Resume
Rashmi_Resume
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Schemas Beyond The Edge

  • 1. Schemas Beyond The Edge Alexei Zenin Platform Engineer, Uken Games August 17th, 2021
  • 2. Journey Mobile Analytics at Uken Games Key Features of Schema Registry Revamped Mobile Analytics Pipeline: JSON to Protobuf 2
  • 3. Mobile Analytics at Uken Games Collection of data to drive: ● Retention & engagement analysis (AB tests, DAU) ● Operational debugging & investigation (purchases, rewards) ● Lower-level metric analysis (HTTP, CPU, Memory) Focus will be on the first two 3
  • 4. Current Ingestion Pipeline Design 4 ~200 million events per day
  • 5. Schema Silos 5 ● Each silo has its own processes and tools ● Several teams managing the same data asset definitions ● Duplication of work across the boundaries ○ Structure ○ Data types ○ Naming
  • 7. Pros & Cons Pros ● Adding a new event is quick ● JSON is easy to use ● JSON well supported across systems Cons ● Duplicated effort across teams for data management ● Repeated schema definitions ● Manual processes to keep things in sync ● JSON retransmits schema information each time 7
  • 9. Overview of Schema Registry ● Provides a central place to upload your schemas ● Immutable & Idempotent API ● Acts as “barcode system” 9 https://bit.ly/3ikXUci
  • 10. Default Wire Format (Confluent) 10
  • 11. Single Primary Architecture 11 ● Primary is elected via Kafka Group Protocol ● Each Secondary is informed of the primary’s address ● Primary is responsible for writes (e.g. registering new schemas) ● Every node can serve reads or forward write requests
  • 12. Ingestion Pipeline 2.0: Protobuf 12
  • 13. Ingestion Pipeline 2.0 Goals: ● Decrease boilerplate work ● Leverage automation ● Increase transparency ● Single source of truth for schemas Solution: Gitops paradigm with Protobuf & Schema Registry 13
  • 14. JSON to Protobuf: Schema Structures ● Legacy JSON schema structure use the concept of an envelope with various subdivisions ● Convert into Protobuf, preserve schema structure 14
  • 15. Protobuf Equivalent Protobuf features: ● Can express custom types and use composition to glue together schemas in 1 top level schema ● Has support for some native types like google.protobuf.Timestamp ● Ability to generate code from schema (e.g. Java, C#) 15
  • 16. Envelope per payload? ● Uken has between 200-300 event payloads per game ● One approach is to copy paste the envelope per payload per game Envelopes = Payloads X Games ● Downsides are duplication in schemas and generated code ● Leads to a poor developer experience 16
  • 17. “Oneof” Branching ● Try using the oneof construct to enumerate all possible payloads at the EventPayload level ● Only need an envelope per game or 1 mega envelope for all games Envelopes = O(Games) ● Similar to Avro Unions ● Still has duplication of envelope and EventPayload 17
  • 18. How to get around strict Protobuf schemas ● Each definition needs to be defined upfront and explicitly in Protobuf Solution: Defer schema attached until runtime to be able to reuse envelope across games 18
  • 19. Protobuf’s Any: Dynamic Message Container ● Allows you to use any embedded Protobuf type ● Uses special “packing” code ● The type_url only accepts Protobuf package names ● Requires class to be present in application 19 https://bit.ly/2Uo6RcS
  • 20. Schema Registry Compatible “Any” ● Enables attaching schemas at runtime ● Removes type_url for schema_id ● The value field is Proto3 encoded bytes 20
  • 21. Dynamic Envelope ● Can define one envelope for all games ● Use the AnyUken type for EventPayload ● Tradeoff explicitness for flexibility ● Elevate schema IDs to a first class concept within the schemas themselves (schema pointers) 21
  • 23. How do we get these schemas past the edge? 23
  • 24. Integrating Schema Registry with Mobile clients Problems: ● Generated Protobuf classes are not aware of schema IDs ● Client device needs access to schema IDs 24
  • 25. “Online” Approach (traditional) 25 ● Expose Schema Registry (SR) to mobile clients directly ● Fetch required schema IDs during app runtime Disadvantages: ● Need to setup security for schema registry ● Need to scale SR to millions of clients ● Point of failure for client ● Adds network overhead to client
  • 27. Embedded Tradeoffs ● No need to expose SR to millions of clients ● Leverage the immutable property of schema IDs ● Custom build process ● Total snapshot of schema IDs built into app 27
  • 28. Serverless + Schema Registry 28
  • 29. EventBatch wire format 29 ● Define a generic envelope as the API contract between API Gateway & Mobile client ● Able to take any resolvable schema, return error if bad data ● Avoids hardcoding a specific version of AnalyticsEvent ● Keeps wire format compliant to Proto3
  • 30. Nesting Schema IDs Visualized 30
  • 31. Performance Comparison 31 ● 3x improvement in latency to ingest a batch of events ● 2x reduction in size per event ● 2x increase in possible number of events buffered on device JSON Latency Protobuf Latency
  • 32. Schema Management: GitOps paradigm ● Version control Protobuf schemas in Git monorepo ● Use Merge Requests to collaborate on impending changes ● Run CI/CD on commits ● Place people into the right spots, let automation do the rest 32
  • 33. 33
  • 34. 34 Next Steps ● Adding a Data Dictionary ● Improving visibility with integrations to Slack ● Iterating on RACI matrix for data management ● Migrating Spark Jobs to utilize new data management process
  • 35. 35 Takeaways ● GitOps allows for data management automation ● Schema Registry can empower devices outside the data center ● Schema IDs allow flexible envelope designs that operate better at scale ● Protobuf can help reduce costs by several times