This document discusses using Apache Kafka Streams for stream processing. It begins with an overview of Apache Kafka and Kafka Streams. It then presents several real-life use cases that have been implemented with Kafka Streams, including data conversions from XML to Avro, stream-table joins for event propagation, duplicate elimination, and detecting absence of events. The document concludes with recommendations for developing and operating Kafka Streams applications.
1. Streaming all over the World
Real Life Use Cases with Kafka Streams
Dr. Benedikt Linse
2. About Me: Setting Data in Motion
● 3 years with Confluent
● Working with Customers mainly in DACH region
○ Architecture Workshops
○ Streaming Workshops
○ Multi-Data Center Workshops
○ Kafka Deployments and Upgrades
○ Kafka Security Workshops
○ Kafka Health Checks
● Long background in data integration and microservices
○ ThomsonReuters, HolidayInsider, MAN Truck and Bus, Pivotal / Dell-EMC
3. Outline
Apache Kafka
Apache Kafka Streams
Kafka Streams Real Life Use Cases
● Data Conversions: XML to Avro
● Stream-Table Joins: Container Event Propagation
● Duplicate Elimination (e.g. Crawled Data, Bulk Imports)
● Negation: Detecting Absence of Events
● Other Use Cases
Demo on Confluent Cloud
Key Takeaways / Outlook
Questions / Feedback
6. Kafka: Replication and High Availability
Source: https://confluentinc.wordpress.com/2015/04/07/hands-free-kafka-replication-a-lesson-in-operational-si
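The replication guarantees this slide refers to come down to a handful of settings. A minimal, illustrative configuration sketch (the values are examples, not recommendations from the slides):

```properties
# Topic-level: each partition is copied to 3 brokers,
# and at least 2 copies must be in sync for a write to succeed
replication.factor=3
min.insync.replicas=2

# Producer-level: wait until all in-sync replicas have acknowledged
acks=all
```

With `acks=all` and `min.insync.replicas=2`, a single broker failure neither loses acknowledged data nor blocks writes.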
7. What Kafka is famous for
● Open Source
● Active Community
● Scalability
● Durability
● Fault Tolerance
● Low Latency
● High Throughput
● Huge Ecosystem
● Extensibility
● Highly configurable
Source: Stackoverflow Trends
28. Development and Operations
Testing
● Use the Kafka Streams Topology Test Driver
● Kafka Streams Topology Visualizer
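The Topology Test Driver mentioned above exercises a topology synchronously, without a running broker. A minimal sketch (the topic names and the upper-casing logic are illustrative, not from the slides):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyDriverSketch {
    public static void main(String[] args) {
        // A trivial topology: upper-case every value.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted

        // Pipe records in and read results out, all in-process.
        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> in =
                driver.createInputTopic("input-topic", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("output-topic", new StringDeserializer(), new StringDeserializer());
            in.pipeInput("key", "hello");
            System.out.println(out.readValue()); // HELLO
        }
    }
}
```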
Multi-Data-Center Strategy
● Think about your MDC architectures
Cluster and Streams Deployment
● If at all possible, use a fully hosted solution (Confluent Cloud). The starter package is free to use.
● If not possible, use CP-Ansible or Confluent for Kubernetes
● Use Standby Replicas
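Standby replicas keep a warm copy of each state store on another instance, so a failover does not have to restore the full state from the changelog topic first. A hedged config sketch:

```properties
# StreamsConfig: maintain one warm copy of each state store
# on a different instance (value is illustrative)
num.standby.replicas=1
```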
Data Model
● Use Confluent Schema Registry to keep control over your data
Kafka Security
● Keep ACLs or role-bindings under version control
● Use TLS
Application Deployment
● Performance-monitor your Kafka Streams apps and state stores using JMX
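The same metrics that appear as JMX MBeans can also be read programmatically via `KafkaStreams#metrics()`. A sketch, assuming a running `KafkaStreams` instance (the group-name filter is illustrative):

```java
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class MetricsSketch {
    // Illustrative filter: keep state-store and processor-node metric groups.
    static boolean isStoreOrProcessorMetric(String group) {
        return group.contains("state") || group.contains("processor");
    }

    // Print selected metrics of a running Streams app. The same values are
    // exposed as JMX MBeans and can be scraped by any JMX tool.
    static void dumpStreamsMetrics(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> e : streams.metrics().entrySet()) {
            MetricName name = e.getKey();
            if (isStoreOrProcessorMetric(name.group())) {
                System.out.println(name.group() + " / " + name.name()
                        + " = " + e.getValue().metricValue());
            }
        }
    }
}
```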
Use Kafka Connectors: Confluent Hub
● Source connectors for importing data from external systems
● Sink connectors for writing data to external systems
30. Key Takeaways / Outlook
Kafka and Kafka Streams are a great platform for Stream Processing
● Horizontal Scalability
● Fault Tolerance / High Availability
● High Throughput
● Low Latency
● DSL (built-in functions + libraries)
There is a plethora of use cases that can be elegantly solved with Kafka Streams.
● Requires in-depth knowledge
● Not all customers are there yet
● More flexible than KSQL
Stateful transformers are a powerful tool for accessing state computed from previous records
● Internally, RocksDB is used as a key-value store to hold the state.
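The duplicate-elimination use case from the outline can be sketched with such a stateful transformer: a persistent (RocksDB-backed) store remembers which keys have been seen, and repeats are dropped. Topic and store names below are illustrative, not from the slides:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class DedupSketch {
    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // RocksDB-backed store remembering keys we have already seen
        StoreBuilder<KeyValueStore<String, String>> store =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("seen-keys"),
                Serdes.String(), Serdes.String());
        builder.addStateStore(store);

        builder.<String, String>stream("raw-events")
            .transform(() -> new Transformer<String, String, KeyValue<String, String>>() {
                private KeyValueStore<String, String> seen;

                @Override
                @SuppressWarnings("unchecked")
                public void init(ProcessorContext ctx) {
                    seen = (KeyValueStore<String, String>) ctx.getStateStore("seen-keys");
                }

                @Override
                public KeyValue<String, String> transform(String key, String value) {
                    if (seen.get(key) != null) {
                        return null; // duplicate: drop the record
                    }
                    seen.put(key, value);
                    return KeyValue.pair(key, value); // first occurrence: forward
                }

                @Override
                public void close() {}
            }, "seen-keys")
            .to("deduped-events");

        return builder;
    }
}
```

Note that this keeps every key forever; a windowed store with retention is the usual refinement when the duplicate window is bounded.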
Some use cases are better implemented using databases
● Full-text search
● Multiple secondary indexes
Monitoring and testing Kafka Streams apps is important
● Many JMX metrics exposed (state store and processor node metrics)
● Use the Kafka Streams Topology Test Driver