SlideShare ist ein Scribd-Unternehmen logo
1 von 54
KStreamBuilder builder =
new KStreamBuilder();
KStreamBuilder builder =
new KStreamBuilder();
KStream<String, String> textLines =
builder.stream(inputTopic);
KStreamBuilder builder =
new KStreamBuilder();
KStream<String, String> textLines =
builder.stream(inputTopic);
KTable<String, Long> wordCounts = textLines
.flatMapValues(value -> toWords(value))
.groupBy((key, word) -> word)
.count("Counts");
KStreamBuilder builder =
new KStreamBuilder();
KStream<String, String> textLines =
builder.stream(inputTopic);
KTable<String, Long> wordCounts = textLines
.flatMapValues(value -> toWords(value))
.groupBy((key, word) -> word)
.count("Counts");
wordCounts.toStream().to(outputTopic);
•elasticity and scalability
•Local/Remote state in
stream processing
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika
 Kafka Streams – The Power without the Weight at Streaming Morning@Lohika

Weitere ähnliche Inhalte

Ähnlich wie Kafka Streams – The Power without the Weight at Streaming Morning@Lohika

Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! aleks-f
 
Sql saturday 829_decalogo_powerbi
Sql saturday 829_decalogo_powerbiSql saturday 829_decalogo_powerbi
Sql saturday 829_decalogo_powerbiLorenzo Vercellati
 
In a few short paragraphs, explain which cloud services you use (G
In a few short paragraphs, explain which cloud services you use (GIn a few short paragraphs, explain which cloud services you use (G
In a few short paragraphs, explain which cloud services you use (GMalikPinckney86
 
Computer Programming -II (Lec. 10).pptx
Computer Programming -II (Lec. 10).pptxComputer Programming -II (Lec. 10).pptx
Computer Programming -II (Lec. 10).pptxSaurabhSharma783949
 
S01 e01 schema-design
S01 e01 schema-designS01 e01 schema-design
S01 e01 schema-designMongoDB
 
Connect() Mini 2016
Connect() Mini 2016Connect() Mini 2016
Connect() Mini 2016Jeff Chu
 
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup Munich
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup MunichKotlin and Domain-Driven Design: A perfect match - Kotlin Meetup Munich
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup MunichFlorian Benz
 
Heroku Postgres Cloud Database Webinar
Heroku Postgres Cloud Database WebinarHeroku Postgres Cloud Database Webinar
Heroku Postgres Cloud Database WebinarSalesforce Developers
 
C# Starter L04-Collections
C# Starter L04-CollectionsC# Starter L04-Collections
C# Starter L04-CollectionsMohammad Shaker
 
Работа с документами в JavaScript
Работа с документами в JavaScriptРабота с документами в JavaScript
Работа с документами в JavaScriptДмитрий Радыно
 
Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Christopher Batey
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...NoSQLmatters
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Beat Signer
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015StampedeCon
 
Webinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedWebinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedMongoDB
 
Type safe embedded domain-specific languages
Type safe embedded domain-specific languagesType safe embedded domain-specific languages
Type safe embedded domain-specific languagesArthur Xavier
 

Ähnlich wie Kafka Streams – The Power without the Weight at Streaming Morning@Lohika (20)

Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! 
 
Sql saturday 829_decalogo_powerbi
Sql saturday 829_decalogo_powerbiSql saturday 829_decalogo_powerbi
Sql saturday 829_decalogo_powerbi
 
In a few short paragraphs, explain which cloud services you use (G
In a few short paragraphs, explain which cloud services you use (GIn a few short paragraphs, explain which cloud services you use (G
In a few short paragraphs, explain which cloud services you use (G
 
Computer Programming -II (Lec. 10).pptx
Computer Programming -II (Lec. 10).pptxComputer Programming -II (Lec. 10).pptx
Computer Programming -II (Lec. 10).pptx
 
S01 e01 schema-design
S01 e01 schema-designS01 e01 schema-design
S01 e01 schema-design
 
Connect() Mini 2016
Connect() Mini 2016Connect() Mini 2016
Connect() Mini 2016
 
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup Munich
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup MunichKotlin and Domain-Driven Design: A perfect match - Kotlin Meetup Munich
Kotlin and Domain-Driven Design: A perfect match - Kotlin Meetup Munich
 
Heroku Postgres Cloud Database Webinar
Heroku Postgres Cloud Database WebinarHeroku Postgres Cloud Database Webinar
Heroku Postgres Cloud Database Webinar
 
C# Starter L04-Collections
C# Starter L04-CollectionsC# Starter L04-Collections
C# Starter L04-Collections
 
MongoDB
MongoDBMongoDB
MongoDB
 
Работа с документами в JavaScript
Работа с документами в JavaScriptРабота с документами в JavaScript
Работа с документами в JavaScript
 
Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
 
Javascripting.pptx
Javascripting.pptxJavascripting.pptx
Javascripting.pptx
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015
 
Webinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedWebinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting Started
 
Type safe embedded domain-specific languages
Type safe embedded domain-specific languagesType safe embedded domain-specific languages
Type safe embedded domain-specific languages
 
greenDAO
greenDAOgreenDAO
greenDAO
 

Kürzlich hochgeladen

8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 

Kürzlich hochgeladen (20)

young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 

Kafka Streams – The Power without the Weight at Streaming Morning@Lohika

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25. KStreamBuilder builder = new KStreamBuilder();
  • 26. KStreamBuilder builder = new KStreamBuilder(); KStream<String, String> textLines = builder.stream(inputTopic);
  • 27. KStreamBuilder builder = new KStreamBuilder(); KStream<String, String> textLines = builder.stream(inputTopic); KTable<String, Long> wordCounts = textLines .flatMapValues(value -> toWords(value)) .groupBy((key, word) -> word) .count("Counts");
  • 28. KStreamBuilder builder = new KStreamBuilder(); KStream<String, String> textLines = builder.stream(inputTopic); KTable<String, Long> wordCounts = textLines .flatMapValues(value -> toWords(value)) .groupBy((key, word) -> word) .count("Counts"); wordCounts.toStream().to(outputTopic);
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 42.
  • 43.
  • 44.

Hinweis der Redaktion

  1. Request–response, or request–reply, is one of the basic methods computers use to communicate with each other, in which the first computer sends a request for some data and the second computer responds to the request. Usually, there is a series of such interchanges until the complete message is sent; browsing a web page is an example of request–response communication.  Important thing to remember – that such application possible serve only future requests.
  2. Important thing to remember – that such application possible serve only historical data.
  3. A stream is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set, where unbounded means “of unknown or of unlimited size”. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a data record is defined as a key-value pair.
  4. The who’s who: Kafka distinguishes producers, consumers, and brokers. In short, producers publish data to Kafka brokers, and consumers read published data from Kafka brokers. Producers and consumers are totally decoupled. A Kafka clusterconsists of one or more brokers. The data: Data is stored in topics. The topic is the most important abstraction provided by Kafka: it is a category or feed name to which data is published by producers. Every topic in Kafka is split into one or more partitions, which are replicated across Kafka brokers for fault tolerance. Parallelism: Partitions of Kafka topics, and especially their number for a given topic, are also the main factor that determines the parallelism of Kafka with regards to reading and writing data. Because of their tight integration the parallelism of Kafka Streams is heavily influenced by and depending on Kafka’s parallelism.
  5. For each topic, the Kafka cluster maintains a partitioned log that looks like this: Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
  6. The Kafka cluster retains all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem. In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now". This combination of features means that Kafka consumers are very cheap—they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers. The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit.
  7. First option is to process streams if you have Kafka – its do it yourself – no frameworks, no libraries. Consume some messages from kafka Process Produce output
  8. Using the Kafka APIs directly works well for simple things. It doesn’t pull in any heavy dependencies to your app. We called this “hipster stream processing” since it is a kind of low-tech solution that appealed to people who liked to roll their own. This works well for simple one-message-at-a-time processing, but the problem comes when you want to do something more involved, say compute aggregations or join streams. In this case inventing a solution on top of the Kafka consumer APIs is fairly involved.
  9. Its hard, you need to deal with a lot of things and reinvent the wheel. Ordering? Partitioning? Fault tolerance? State management? Window operations? How to reprocess data?
  10. Pulling in a full-fledged stream processing framework gives you easy access to these more advanced operations. But the cost for a simple application is an explosion of complexity. This makes everything difficult, from debugging to performance optimization to monitoring to deployment. This is even worse if your app has both synchronous and asynchronous pieces as then you end up splitting your code between the stream processing framework and whatever mechanism you have for implementing services or apps. It’s just really hard to build and operationalize a critical part of your business in this way. This isn’t such a problem in all domains—after all, if you are already using Spark to build a batch workflow, and you want to add a Spark Streaming job into this mix for some real-time bits, the additional complexity is pretty low and it reuses the skills you already have. However if you are deploying a Spark cluster for the sole purpose of this new application, that is definitely a big complexity hit.
  11. Pulling in a full-fledged stream processing framework gives you easy access to these more advanced operations. But the cost for a simple application is an explosion of complexity. This makes everything difficult, from debugging to performance optimization to monitoring to deployment. This is even worse if your app has both synchronous and asynchronous pieces as then you end up splitting your code between the stream processing framework and whatever mechanism you have for implementing services or apps. It’s just really hard to build and operationalize a critical part of your business in this way. This isn’t such a problem in all domains—after all, if you are already using Spark to build a batch workflow, and you want to add a Spark Streaming job into this mix for some real-time bits, the additional complexity is pretty low and it reuses the skills you already have. However if you are deploying a Spark cluster for the sole purpose of this new application, that is definitely a big complexity hit.
  12. With kafka streams it looks much simple The inputs and outputs are just Kafka topics The data model is just Kafka’s keyed record data model throughout The partitioning model is just Kafka’s partitioning model, a Kafka partitioner works for streams too The group membership mechanism that manages partitions, assignment, and liveness is just Kafka’s group membership mechanism Tables and other stateful computations are just log compacted topics. Metrics are unified across the producer, consumer, and streams app so there is only one type of metric to capture for monitoring The position of your app is maintained by the application’s offsets, just as any Kafka consumer er.
  13. core abstractions in Kafka to be the primitives for stream processing we wanted to be able to give something that provides you what you would get out of a stream processing framework, but which has very little additional operational complexity beyond the normal Kafka producer and consumer APIs. In other words we were aiming for something like this
  14. Kafka Streams has a strong focus on usability and a great developer experience. It offers all the necessary stream processing primitives to allow applications to read data from Kafka as streams, process the data, and then either write the resulting data back to Kafka or send the final output to an external system. Developers can choose between a high-level DSL with commonly used operations like filter, map, join, as well as a low-level API for developers who need maximum control and flexibility.
  15. Designed as a lightweight library in Apache Kafka, much like the Kafka producer and consumer client libraries. You can easily embed and integrate Kafka Streams into your own applications, which is a significant departure from framework-based stream processing tools that dictate many requirements upon you such as how you must package and “submit” processing jobs to their cluster. Has no external dependencies on systems other than Apache Kafka and can be used in any Java application. Read: You do not need to deploy and operate a separate cluster for your stream processing needs. Your Operations and Info Sec teams, among others, will surely be happy to hear this. Leverages Kafka as its internal messaging layer instead of (re)implementing a custom messaging layer like many other stream processing tools. Notably, Kafka Streams uses Kafka’s partitioning model to horizontally scale processing while maintaining strong ordering guarantees. This ensures high performance, scalability, and operational simplicity for production environments. A key benefit of this design decision is that you do not have to understand and tune two different messaging layers – one for moving data streams at scale (Kafka) plus a separate one for your stream processing tool. Similarly, any performance and reliability improvements of Kafka will automatically be available to Kafka Streams, too, thus tapping into the momentum of Kafka’s strong developer community.
  16. Employs one-record-at-a-time processing to achieve low processing latency, which is crucial for a variety of use cases 
  17. Let’s illustrate this with an example. Imagine a table that tracks the total number of pageviews by user (first column of diagram below). Over time, whenever a new pageview event is processed, the state of the table is updated accordingly. Here, the state changes between different points in time – and different revisions of the table – can be represented as a changelog stream (second column).
  18. Interestingly, because of the stream-table duality, the same stream can be used to reconstruct the original table (third column):
  19. The stream-table duality describes the close relationship between streams and tables. Stream as Table: A stream can be considered a changelog of a table, where each data record in the stream captures a state change of the table. A stream is thus a table in disguise, and it can be easily turned into a “real” table by replaying the changelog from beginning to end to reconstruct the table. Similarly, in a more general analogy, aggregating data records in a stream – such as computing the total number of pageviews by user from a stream of pageview events – will return a table (here with the key and the value being the user and its corresponding pageview count, respectively). Table as Stream: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream’s data records are key-value pairs). A table is thus a stream in disguise, and it can be easily turned into a “real” stream by iterating over each key-value entry in the table.
  20. Figure 1: Before adding capacity, only a single instance of your Kafka Streams application is running.  At this point the corresponding “consumer group” of your application contains only a single member (this instance).  All data is being read and processed by this single instance.
  21.  After adding capacity, two additional instances of your Kafka Streams application are running, and they have automatically joined the application’s consumer group for a total of three current members. These three instances are automatically splitting the processing work between each other. The splitting is based on the Kafka topic partitions from which data is being read.
  22. If one of the application instances is stopped (e.g. intentional reduction of capacity, maintenance, machine failure), it will automatically leave the application’s consumer group, which causes the remaining instances to automatically take over the stopped instance’s processing work. how many instances can or should you run for your application?  Is there an upper limit for the number of instances and, similarly, for the parallelism of your application?  In a nutshell, the parallelism of a Kafka Streams application -- similar to the parallelism of Kafka -- is primarily determined by the number of partitions of the input topic(s) from which your application is reading. For example, if your application reads from a single topic that has 10 partitions, then you can run up to 10 instances of your applications (note that you can run further instances but these will be idle)
  23. A common pattern is for the stream processing job to take input records from its input stream, and for each input record make a remote call to a distributed database. The input stream is partitioned over multiple processors, each of which query a remote database. And, of course, since this is a distributed database, it is itself partitioning over multiple machines. One possibility is to co-partition the database and the input processing, and then move the data to be directly co-located with the processing
  24. An easy way to understand state in stream processing is to think about the kinds of operations you might do in SQL. Imagine running SQL queries against a real-time stream of data. If your SQL query contains only filtering and single-row transformations (a simple select and where clause, say), then it is stateless. That is, you can process a single row at a time without needing to remember anything in between rows. However, if your query involves aggregating many rows (a group by) or joining together data from multiple streams, then it must maintain some state in between rows. If you are grouping data by some field and counting, then the state you maintain would be the counts that have accumulated so far in the window you are processing. If you are joining two streams, the state would be the rows in each stream waiting to find a match in the other stream.
  25. Interactive Queries enables faster and more efficient use of the application state. Data is local to your application (in memory or possibly on SSDs); you can access it very quickly. This is especially useful for applications that need to access large amounts of application state, e.g., when they do joins. There is no duplication of data between the store doing aggregation for stream processing and the store answering queries. What do you need to do as a developer to make your Kafka Streams applications queryable? It turns out that Kafka Streams handles most of the low-level querying, metadata discovery and data fault tolerance for you. Depending on the application, you might be able to query straight out-of-the box with zero work (see local stores section below), or you might have to implement a layer of indirection for distributed querying  We start simple with a single app’s instance. That instance can query its own local state stores out-of-the-box. It can enumerate all the stores that are in that instance (in any processor node) as well as query the values of keys in those stores. This is simple and useful for apps that have a single instance, however apps can have multiple instances running in potentially different servers. Kafka Streams will partition up the data amongst these instances to scale out the capacity. If I want to get the latest value for key “stock X” and that key is not in my instance, how do I go about finding it?
  26. An application might have multiple instances running, each with its own set of state stores. We have made each instance aware of each other instance’s state stores through periodic metadata exchanges, which we provide through Kafka’s group membership protocol. Starting with Confluent Platform 3.1 and Apache Kafka 0.10.1, each instance may expose its endpoint information metadata (hostname and port, collectively known as the “application.server” config parameter) to other instances of the same application. The new Interactive Query APIs allow a developer to obtain the metadata for a given store name and key, and do that across the instances of an application. Hence, you can discover where the store is that holds a particular key by examining the metadata. So now we know which store on which application instance holds our key but how do we actually query that (potentially remote) store? It sounds like we need some sort of RPC mechanism.
  27. Out-of-the box, the Kafka Streams library includes all the low-level APIs to query state stores and discover state stores across the running instances of an application. The application can then implement its own query patterns on the data and route the requests among instances as needed. Apps often have intricate processing logic and might need non-trivial routing of requests among Kafka Streams instances. For example, an app might want to locate the instance that holds a particular customer ID, and then route a call to rank that customer’s stock to that particular instance. The business logic on that instance would sort the available data and return the result to the app. In more complex instances, we could have scatter-gather query patterns where a call to an instance results in N calls from that instance to other instances and N results being collated and returned to the original caller. It is clear from these examples that there is no one API for distributed querying. Furthermore, there is no single best transport layer for the RPCs either (some users favor or must use REST within their company, others might opt for Thrift, etc.) RPCs are needed for inter-instance communication (i.e., within the same app). Otherwise instance 3 of the app can’t actually retrieve state information from instance 1. They are also needed for inter-apps communication, where other applications query the original application’s state.