SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
DEEP DIVE INTO / INTERNALSDEEP DIVE INTO / INTERNALS
OFOF
KAFKA STREAMSKAFKA STREAMS
KAFKA STREAMS 2.2.0KAFKA STREAMS 2.2.0
//
//
THE "INTERNALS" BOOKS:THE "INTERNALS" BOOKS:
//
@JACEKLASKOWSKI@JACEKLASKOWSKI
STACKOVERFLOWSTACKOVERFLOW GITHUBGITHUB
KAFKAKAFKA
STREAMSSTREAMS APACHE KAFKAAPACHE KAFKA
MY VERY FIRST #KAFKASUMMIT!MY VERY FIRST #KAFKASUMMIT!
Jacek Laskowski is a freelance IT consultant
Core competencies in Spark, Kafka, Kafka
Streams, Scala
Development | Consulting | Training
Among contributors to
Contact me at jacek@japila.pl
Follow on twitter
for more #ApacheSpark, #ApacheKafka,
#KafkaStreams
Apache Spark
@JacekLaskowski
Jacek is best known by the online "Internals"
books:
1.
2.
3.
4.
5.
The Internals of Apache Spark
The Internals of Spark SQL
The Internals of Spark Structured
Streaming
The Internals of Kafka Streams
The Internals of Apache Kafka
Jacek is "active" on StackOverflow
AGENDAAGENDA
1.
2.
1.
2.
3.
1.
2.
3.
4.
5.
Kafka Streams
Main Development Entities
Topology
KafkaStreams
Main Execution Entities
StreamThread
TaskManager
StreamTask and StandbyTask
StreamsPartitionAssignor
RebalanceListener
KAFKA STREAMSKAFKA STREAMS (1 OF 2)(1 OF 2)
1. Kafka Streams is a client library for stream
processing applications that process records
in Kafka topics
Low-level Processor API
2. Stream processing primitives, e.g. KStream and
KTable
High-level Streams DSL
3. Topology to describe the processing flow
» One record at a time «
4. Wrapper around the Kafka Producer and
Consumer APIs
5. Supports fault­tolerant local state for stateful
operations (e.g. windowed aggregations and
joins)
KAFKA STREAMSKAFKA STREAMS (2 OF 2)(2 OF 2)
1. "Groundbreaking" facts (which changed my life):
1. When started, a topology processes one
record at a time
2. A Kafka Streams application can be run in
multiple instances (e.g. as Docker containers)
3. Increasing stream processing power is about
increasing number of threads or even
instances
MAIN DEVELOPMENT ENTITIESMAIN DEVELOPMENT ENTITIES
1. As a Kafka Streams developer you work with the
two main developer-facing entities:
1. Topology
2. KafkaStreams
TOPOLOGYTOPOLOGY
1. Represents the stream processing logic of a
Kafka Streams application
2. Directed Acyclic Graph of Processors (Stream
Processing Nodes)
3. Logical representation
4. Created directly or indirectly (using Streams
DSL)
5. Topology API
Adding sources, processors, sinks, state stores
6. Can be described (and printed out to stdout)
EXAMPLE: CREATING TOPOLOGYEXAMPLE: CREATING TOPOLOGY
// Creating directly
import org.apache.kafka.streams.Topology
val topology = new Topology
// Created using Streams DSL (StreamsBuilder API)
// Scala API for Kafka Streams
import org.apache.kafka.streams.scala._
import ImplicitConversions._
import Serdes._
val builder = new StreamsBuilder
val topology = builder.build
EXAMPLE: DESCRIBING TOPOLOGYEXAMPLE: DESCRIBING TOPOLOGY
scala> println(topology.describe)
Topologies:
Sub-topology: 0 for global store (will not generate tasks)
Source: demo-source-processor (topics: [demo-topic])
--> demo-processor-supplier
Processor: demo-processor-supplier (stores: [in-memory-key-v
--> none
<-- demo-source-processor
KAFKASTREAMSKAFKASTREAMS
1. Manages execution of a topology of a Kafka
Streams application
Start, close (shut down), state
2. Consumes messages from and produces
processing results to Kafka topics
3. Acceptable to create multiple KafkaStreams
instances per Kafka Streams application
4. For better performance, consider creating
multiple KafkaStreams instances as separate
instances of Kafka Streams application
EXAMPLE: CREATING KAFKASTREAMSEXAMPLE: CREATING KAFKASTREAMS
import org.apache.kafka.streams.KafkaStreams
val topology: Topology = ...
val config: StreamsConfig = ...
val ks = new KafkaStreams(topology, config)
ks.start // <-- starts execution
MAIN EXECUTION ENTITIESMAIN EXECUTION ENTITIES
1. StreamThread
2. TaskManager
3. StreamTask and StandbyTask
4. StreamsPartitionAssignor
5. RebalanceListener
STREAMTHREADSTREAMTHREAD (1 OF 2)(1 OF 2)
1. Stream Processor Thread
Runs the main record processing loop
Thread of execution (java.lang.Thread)
2. StreamThread uses a Kafka consumer (to poll for
records)
Subscribes to source topics
Think of Consumer Group
3. num.stream.threads configuration property
(default: 1)
4. Uses Kafka Consumer "tools" for operation
Registers StreamsPartitionAssignor
Uses RebalanceListener to intercept
changes to the partitions assigned
5. Uses TaskManager to manage processing tasks
(on next slide)
STREAMTHREADSTREAMTHREAD (2 OF 2)(2 OF 2)
TASKMANAGERTASKMANAGER (1 OF 2)(1 OF 2)
1. Task manager of a StreamThread
2. Manages active and standby tasks
Only active StreamTasks process records
3. Creates processor tasks for assigned partitions
RebalanceListener.onPartitionsAssigned
TASKMANAGERTASKMANAGER (2 OF 2)(2 OF 2)
STREAM PROCESSOR TASKSSTREAM PROCESSOR TASKS (1 OF 2)(1 OF 2)
1. Managed by TaskManager to run a topology of
stream processors
2. StreamTask - the default processor task
As many as partitions assigned
Managed as a group as
AssignedStreamsTasks
Only active StreamTasks process records
Use FIFO RecordQueues (per Kafka
TopicPartition)
3. StandbyTask - a backup processor task
"Ghost" tasks for active StreamTasks
Default: 0 standby tasks
Managed as a group as
AssignedStandbyTasks
STREAM PROCESSOR TASKSSTREAM PROCESSOR TASKS (2 OF 2)(2 OF 2)
STREAMSPARTITIONASSIGNORSTREAMSPARTITIONASSIGNOR (1 OF 2)(1 OF 2)
1. Custom PartitionAssignor from the Kafka Consum
Used for dynamic partition assignment and distr
across the members of a consumer group
partition.assignment.strategy /
ConsumerConfig.PARTITION_ASSIGNME
configuration property
2. Group management protocol
group membership
state synchronization
3. Assigns partitions dynamically across the instances
Required application.id /
StreamsConfig.APPLICATION_ID_CONFI
STREAMSPARTITIONASSIGNORSTREAMSPARTITIONASSIGNOR (2 OF 2)(2 OF 2)
REBALANCELISTENERREBALANCELISTENER (1 OF 2)(1 OF 2)
1. Custom ConsumerRebalanceListener from
the Kafka Consumer API
Callback interface for custom actions when
the set of partitions assigned to the consumer
changes
partition re-assignment will be triggered any
time the members of the consumer group
change
2. Intercepts changes to the partitions assigned to a
single StreamThread
REBALANCELISTENERREBALANCELISTENER (2 OF 2)(2 OF 2)
RECAPRECAP
1.
2.
1.
2.
3.
1.
2.
3.
4.
5.
Kafka Streams
Main Development Entities
Topology
KafkaStreams
Main Execution Entities
StreamThread
TaskManager
StreamTask and StandbyTask
StreamsPartitionAssignor
RebalanceListener
QUESTIONS?QUESTIONS?
Read
Read
Follow on twitter (DMs
open)
Upvote
Contact me at jacek@japila.pl
The Internals of Kafka Streams
The Internals of Apache Kafka
@jaceklaskowski
my questions and answers on
StackOverflow
© 2019 / / jacek@japila.plJacek Laskowski @JacekLaskowski

Weitere ähnliche Inhalte

Was ist angesagt?

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsRunning Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsLightbend
 
Integrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect FrameworkIntegrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect Frameworkconfluent
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...HostedbyConfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionJoel Koshy
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 

Was ist angesagt? (20)

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsRunning Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
 
Integrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect FrameworkIntegrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect Framework
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 

Ähnlich wie Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (Jacek Laskowski, Consultant) Kafka Summit London 2019

Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingAbdelhamide EL ARIB
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?Eventador
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?Kenny Gorman
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Lightbend
 
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopDataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopYu-Jhe Li
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingDatabricks
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...HostedbyConfluent
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...HostedbyConfluent
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Matt Stubbs
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesLightbend
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 

Ähnlich wie Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (Jacek Laskowski, Consultant) Kafka Summit London 2019 (20)

Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
2017 meetup-apache-kafka-nov
2017 meetup-apache-kafka-nov2017 meetup-apache-kafka-nov
2017 meetup-apache-kafka-nov
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Spark 101
Spark 101Spark 101
Spark 101
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
 
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopDataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 

Mehr von confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Kürzlich hochgeladen (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (Jacek Laskowski, Consultant) Kafka Summit London 2019

  • 1. DEEP DIVE INTO / INTERNALSDEEP DIVE INTO / INTERNALS OFOF KAFKA STREAMSKAFKA STREAMS KAFKA STREAMS 2.2.0KAFKA STREAMS 2.2.0 // // THE "INTERNALS" BOOKS:THE "INTERNALS" BOOKS: // @JACEKLASKOWSKI@JACEKLASKOWSKI STACKOVERFLOWSTACKOVERFLOW GITHUBGITHUB KAFKAKAFKA STREAMSSTREAMS APACHE KAFKAAPACHE KAFKA
  • 2. MY VERY FIRST #KAFKASUMMIT!MY VERY FIRST #KAFKASUMMIT!
  • 3. Jacek Laskowski is a freelance IT consultant Core competencies in Spark, Kafka, Kafka Streams, Scala Development | Consulting | Training Among contributors to Contact me at jacek@japila.pl Follow on twitter for more #ApacheSpark, #ApacheKafka, #KafkaStreams Apache Spark @JacekLaskowski
  • 4. Jacek is best known by the online "Internals" books: 1. 2. 3. 4. 5. The Internals of Apache Spark The Internals of Spark SQL The Internals of Spark Structured Streaming The Internals of Kafka Streams The Internals of Apache Kafka
  • 5. Jacek is "active" on StackOverflow
  • 7. KAFKA STREAMSKAFKA STREAMS (1 OF 2)(1 OF 2) 1. Kafka Streams is a client library for stream processing applications that process records in Kafka topics Low-level Processor API 2. Stream processing primitives, e.g. KStream and KTable High-level Streams DSL 3. Topology to describe the processing flow » One record at a time « 4. Wrapper around the Kafka Producer and Consumer APIs 5. Supports fault­tolerant local state for stateful operations (e.g. windowed aggregations and joins)
  • 8. KAFKA STREAMSKAFKA STREAMS (2 OF 2)(2 OF 2) 1. "Groundbreaking" facts (which changed my life): 1. When started, a topology processes one record at a time 2. A Kafka Streams application can be run in multiple instances (e.g. as Docker containers) 3. Increasing stream processing power is about increasing number of threads or even instances
  • 9. MAIN DEVELOPMENT ENTITIESMAIN DEVELOPMENT ENTITIES 1. As a Kafka Streams developer you work with the two main developer-facing entities: 1. Topology 2. KafkaStreams
  • 10. TOPOLOGYTOPOLOGY 1. Represents the stream processing logic of a Kafka Streams application 2. Directed Acyclic Graph of Processors (Stream Processing Nodes) 3. Logical representation 4. Created directly or indirectly (using Streams DSL) 5. Topology API Adding sources, processors, sinks, state stores 6. Can be described (and printed out to stdout)
  • 11. EXAMPLE: CREATING TOPOLOGYEXAMPLE: CREATING TOPOLOGY // Creating directly import org.apache.kafka.streams.Topology val topology = new Topology // Created using Streams DSL (StreamsBuilder API) // Scala API for Kafka Streams import org.apache.kafka.streams.scala._ import ImplicitConversions._ import Serdes._ val builder = new StreamsBuilder val topology = builder.build
  • 12. EXAMPLE: DESCRIBING TOPOLOGYEXAMPLE: DESCRIBING TOPOLOGY scala> println(topology.describe) Topologies: Sub-topology: 0 for global store (will not generate tasks) Source: demo-source-processor (topics: [demo-topic]) --> demo-processor-supplier Processor: demo-processor-supplier (stores: [in-memory-key-v --> none <-- demo-source-processor
  • 13. KAFKASTREAMSKAFKASTREAMS 1. Manages execution of a topology of a Kafka Streams application Start, close (shut down), state 2. Consumes messages from and produces processing results to Kafka topics 3. Acceptable to create multiple KafkaStreams instances per Kafka Streams application 4. For better performance, consider creating multiple KafkaStreams instances as separate instances of Kafka Streams application
  • 14. EXAMPLE: CREATING KAFKASTREAMSEXAMPLE: CREATING KAFKASTREAMS import org.apache.kafka.streams.KafkaStreams val topology: Topology = ... val config: StreamsConfig = ... val ks = new KafkaStreams(topology, config) ks.start // <-- starts execution
  • 15. MAIN EXECUTION ENTITIESMAIN EXECUTION ENTITIES 1. StreamThread 2. TaskManager 3. StreamTask and StandbyTask 4. StreamsPartitionAssignor 5. RebalanceListener
  • 16. STREAMTHREADSTREAMTHREAD (1 OF 2)(1 OF 2) 1. Stream Processor Thread Runs the main record processing loop Thread of execution (java.lang.Thread) 2. StreamThread uses a Kafka consumer (to poll for records) Subscribes to source topics Think of Consumer Group 3. num.stream.threads configuration property (default: 1) 4. Uses Kafka Consumer "tools" for operation Registers StreamsPartitionAssignor Uses RebalanceListener to intercept changes to the partitions assigned 5. Uses TaskManager to manage processing tasks (on next slide)
  • 18. TASKMANAGERTASKMANAGER (1 OF 2)(1 OF 2) 1. Task manager of a StreamThread 2. Manages active and standby tasks Only active StreamTasks process records 3. Creates processor tasks for assigned partitions RebalanceListener.onPartitionsAssigned
  • 20. STREAM PROCESSOR TASKSSTREAM PROCESSOR TASKS (1 OF 2)(1 OF 2) 1. Managed by TaskManager to run a topology of stream processors 2. StreamTask - the default processor task As many as partitions assigned Managed as a group as AssignedStreamsTasks Only active StreamTasks process records Use FIFO RecordQueues (per Kafka TopicPartition) 3. StandbyTask - a backup processor task "Ghost" tasks for active StreamTasks Default: 0 standby tasks Managed as a group as AssignedStandbyTasks
  • 21. STREAM PROCESSOR TASKSSTREAM PROCESSOR TASKS (2 OF 2)(2 OF 2)
  • 22. STREAMSPARTITIONASSIGNORSTREAMSPARTITIONASSIGNOR (1 OF 2)(1 OF 2) 1. Custom PartitionAssignor from the Kafka Consum Used for dynamic partition assignment and distr across the members of a consumer group partition.assignment.strategy / ConsumerConfig.PARTITION_ASSIGNME configuration property 2. Group management protocol group membership state synchronization 3. Assigns partitions dynamically across the instances Required application.id / StreamsConfig.APPLICATION_ID_CONFI
  • 24. REBALANCELISTENERREBALANCELISTENER (1 OF 2)(1 OF 2) 1. Custom ConsumerRebalanceListener from the Kafka Consumer API Callback interface for custom actions when the set of partitions assigned to the consumer changes partition re-assignment will be triggered any time the members of the consumer group change 2. Intercepts changes to the partitions assigned to a single StreamThread
  • 27. QUESTIONS?QUESTIONS? Read Read Follow on twitter (DMs open) Upvote Contact me at jacek@japila.pl The Internals of Kafka Streams The Internals of Apache Kafka @jaceklaskowski my questions and answers on StackOverflow © 2019 / / jacek@japila.plJacek Laskowski @JacekLaskowski