Learn the differences between an event-driven streaming platform and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.
This session discusses how teams in different industries solve these challenges by building a native streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows to build and deploy independent, mission-critical streaming real time application and microservices. The architecture leverages distributed processing and fault-tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and Stream Processing are still key functionality but can be realized in real time natively instead of using additional ETL, ESB or Stream Processing tools.
3. 3
Apache Kafka vs. Traditional Middleware
Agreement:
Kafka is the de facto standard for …
• messaging at scale!
• decoupling of microservices!
• reliable, lightweight stream processing!
Controversial discussion:
Use Apache Kafka
as middleware!
5. 5
Source A
Source B
Sink X
Sink Y
Connector A
REST / SOAP
Connector X
JMS
ETL / ESB
Product
XYZ
Dream
6. 6
ETL /
ESB
Passive Server
Source A
Source B
Sink X
Sink Y
Connector A
REST / SOAP
Connector X
JMS
ETL /
ESB
Active Server
Messaging
System A
(Real Time)
Messaging
System B
(Big Data)
Database
(Important Data)
In-Memory
Data Grid
(Cache)
API Gateway Stream
Processing
Engine
Build your custom integration layer… At least 4-5 different products or open source frameworks
Another
integration tool
(because one tool
cannot integrate
everything, buy more…)
Reality
7. 7
Challenges with current integration architectures (MQ / ETL / ESB)
Zoo of technologies
Integration platform (Extract-Transform-Load / Enterprise service Bus) + additional “optional” components
Messaging System (Message Queue, Peer-to-Peer)
In-Memory Cache or Data Grid
Database
Streaming Engine
API Gateway
Architectures with limited scalability and availability
No end-to-end scalability (only parts are built for high volume of messages and high throughput)
No native, built-in scalability (you cannot just add a broker to the running system)
Broker-per-scenario (e.g. hundreds of MQ clusters)
Active-passive clustering
Downtime for maintenance, upgrades, configuration changes
No backwards-compatibility
Tight coupling
Platform-centric integration implementation (à no separation of concerns, vendor lock-in)
Synchronous communication via request-response (SOAP / REST / gRPC / …)
No handling of backpressure or unavailable consumers (push-based messaging queues)
8. 8
Business Digitalization Trends are Driving the Need to Process
Events at a whole new Scale, Speed and Efficiency
The World has Changed
Mobile Cloud Microservices Internet of Things Machine Learning
9. 9
#NoMQ #NoESB #NoETL
can handle this well!
Tight coupling, limited scale, zoo of components
You need to think event-based to realize these use cases!
You need to leverage a single, scalable and loosely coupled platform
Event Streaming Platform!
?
11. 11
Source A
Source B
Sink X
Sink Y
Connector A
SOAP
Connector X
REST
Event
Streaming
Platform
Hmm…
Yet another one of
those magic boxes
in the middle, huh?
15. 15
Where are they?
Events haven’t had a
proper home in
infrastructure or in code.
They are implicit.
Here!
16. 16
Implementing an Event-driven Architecture Requires a Paradigm Shift
Relational DB
From data represented in static tables
and accessed with RPC...
...to data represented as streams of events
Apps Microservices
SaaS apps
Custom apps
Data warehouse
19. 19
A Streaming Platform is the Underpinning of an Event-driven Architecture
Ubiquitous connectivity
Globally scalable platform for all
event producers and consumers
Immediate data access
Data accessible to all
consumers in real time
Single system of record
Persistent storage to enable
reprocessing of past events
Continuous queries
Stream processing capabilities
for in-line data transformation
Microservices
DBs
SaaS apps
Mobile
Customer 360
Real-time fraud
detection
Data warehouse
Producers
Consumers
Database
change
Microservices
events
SaaS
data
Customer
experience
s
Streams of real time events
Stream processing apps
21. 21
Source A
Source B
Sink X
Sink Y
Connector A
SOAP
Connector X
REST
Integration
Layer
Middleware: Message Queue… ETL… Enterprise Service Bus…
Event Streaming Platform: Apache Kafka.
Spoilt for Choice!
22. 22
Business Digitalization Trends are Driving the Need to Process
Events at a whole new Scale, Speed and Efficiency
Why do you need an Event Streaming Platform? Remember:
Mobile Cloud Microservices Internet of Things Machine Learning
The World has Changed!
25. 25
● Global-scale
● Real-time
● Persistent Storage
● Stream Processing
Apache Kafka: The De-facto Standard for Real-Time Event Streaming
Edge
Cloud
Data LakeDatabases
Datacenter
IoT
SaaS AppsMobile
Microservices Machine
Learning
Apache Kafka
26. 26
Apache Kafka at Scale at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
27. 27
Event Streaming Platform - Value per Use Case
Improve
Customer
Experience
(CX)
Increase
Revenue
(make money)
Business
Value
Decrease
Costs
(save
money)
Core Business
Platform
Increase
Operational
Efficiency
Migrate to
Cloud
Mitigate
Risk (protect
money)
Key Drivers
Strategic Objectives
(sample)
Fraud
Detection
IoT sensor
ingestion
Digital
replatforming/
Mainframe Offload
Connected Car: Navigation & improved
in-car experience: Audi
Customer 360 Simplifying Omni-channel Retail at
Scale: Target
Faster transactional
processing / analysis
incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices
Architecture
Online Fraud Detection
Online Security
(syslog, log
aggregation, Splunk
replacement)
Middleware
replacement
Regulatory
Digital
Transformation
Application Modernization: Multiple
Examples
Website / Core
Operations
(Central Nervous System)
The [Silicon Valley] Digital Natives;
LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated
environment (e.g. Electronic Medical
Records): Celmatix
Real-time app
updates
Real Time Streaming Platform for
Communications and Beyond: Capital One
Developer Velocity - Building Stateful
Financial Applications with Kafka
Streams: Funding Circle
Detect Fraud & Prevent Fraud in Real
Time: PayPal
Kafka as a Service - A Tale of Security
and Multi-Tenancy: Apple
Example Use Cases
$↑
$↓
$
Example Case Studies
(of many)
28. 28
Why Apache Kafka instead of traditional middleware?
Event Streaming Platform
The core is event-based
Supports real time stream processing
But also Fire-and-Forget, Publish / Subscribe, Request-Response / RPC, Batch, …
Single infrastructure
Messaging, storage, processing
Extreme Scale and throughput
Reliability and zero downtime (“built for failure”)
High availability
Rolling upgrades and dynamic configuration changes
Backwards compatibility
Decoupling of clients
Agile microservices
Dumb pipes, smart endpoints
Handling backpressure
No vendor lock-in
29. 29
Why Apache Kafka instead of traditional middleware?
”Eat your own dog good!”
30. 30
Enemies!
ESB MQ
Storage
Streaming
Engine
Messaging: Kafka Core
Storage: Kafka Core
Caching: Kafka Core
Real-Time, Batch: Kafka Clients
Integration: Kafka Connect
Stream Processing: Kafka Streams / KSQL
Request-Response: REST Proxy
”Eat your own dog good”
vs.
Enemy 1
Enemy 2
Enemy 3
Enemy 4
…. More components, clusters, technologies means more conflicts, incompatibility, operations burden!
Enemy 5
31. 31
ETL /
ESB
Kafka Broker X
Source A
Source B
Sink X
Sink Y
Kafka Connect
Python
Kafka Connect
Java
Kafka
Broker 1
Schema
Registry
REST Proxy Kafka
Streams /
KSQL
No need for another infrastructure, cluster, database… High availability and scale handled by Kafka Topics!
Monitoring Tool
“Yet
another
Kafka
addon”
33. 34
Example - Kafka Streams / KSQL : Fault-tolerance, powered by Kafka
Processing fails over automatically, without data loss or miscomputation.
1 Kafka consumer group
rebalance is triggered
2 Processing and state of #3
is migrated via Kafka to
remaining servers #1 + #2
#3 died, so #1 and #2 take over
1 Kafka consumer group
rebalance is triggered
2 Part of processing incl.
state is migrated via Kafka
from #1 + #2 to server #3
#3 is back, so work is split again
34. 35
Example - Kafka Streams / KSQL: Elasticity and Scalability, powered by Kafka
You can add, remove, restart servers during live operations.
We need more processing power!” “Ok, we can scale down again.”
35. 36
Don’t use your ESB knowledge with Kafka!
https://www.thoughtworks.com/radar/techniques/r
ecreating-esb-antipatterns-with-kafka
!
37. 38
Friends!
• Kafka Connect connectors
(JMS, IBM MQ, RabbitMQ, etc.)
• JMS Client
(Kafka-native JMS Implementation)
• ESB or ETL tools
with their own connectors
• Kafka’s Client APIs
(like Java, .NET, Go, Python, Javascript)
• REST Proxy
• Etc.
Plenty of integration options between Kafka and traditional middleware
Traditional
Middleware
Apache
Kafka
38. 39
Why friends?
Kafka is NOT THE ALLROUNDER for every single problem!
Maybe you don’t need a scalable, reliable, distributed
system, but “just”:
• Integration with legacy components (Cobol, Edifact, …)
• Point-to-point messaging with active / passive HA for
(“a few”) messages
• Very specific messaging solution for “less than
millisecond performance”
• API Management
• “Real” Batch Processing (Hadoop, Flink, Informatica, …)
• …
X
39. 40
Choose the right integration technology
• Visual coding for complex
graphical mappings
• Integration components for
complex legacy standard
software and protocols
• (SAP BAPI / iDoc, Cobol,
Edifact, SOAP / WS*, etc.)
45. 46
An Event Streaming Platform gives services independence
Orders Customers
Payments
Stock
Freedom to tap into and manage shared data
REST
JMS
ESB
REST
CRM
Mainframe
SOAP
…
Kafka
Kafka
Kafka
Kafka
46. 47
If you really need Request-Response with Kafka à A Correlation ID is your friend!
47. 48
Before IQ L
With IQ J
If you need to combine RPC with Streaming: Interactive Queries (IQ) (Kafka Streams)
48. 49
Confluent’s Streaming Maturity Model - where are you?
Value
Maturity (Investment & time)
2
Enterprise
Streaming Pilot /
Early Production
Pub + Sub Store Process
5
Central Nervous
System
1
Developer
Interest
Pre-Streaming
4
Global
Streaming
3
SLA Ready,
Integrated
Streaming
Projects
Platform
50. 51
Mainframe offloading to Apache Kafka
Date Amount
1/27/2017 $4.56
1/22/2017 $32.14
Transaction Data
Vendor Description
Starbucks Coffee
Walmart Blu-Ray
Transaction Description
Integration via
- Kafka Connect (JMS, MQ)
- REST Proxy
- 3rd party CDC tool
- Etc.
Website
Client profiles
Mainframe MIPS = $$
Ability to migrate back if not working well
51. 52
MQ Integration (and later Replacement)
Date Amount
1/27/2017 $4.56
1/22/2017 $32.14
Transaction Data
Messaging Solution
(IBM MQ, RabbitMQ, etc.)
Application
1) Legacy MQ Communication with App
2) Kafka for decoupling between MQ and App
3) Direct communication via Kafka (no MQ anymore)
52. 53
New, Innovative Projects and Applications
Date Amount
1/27/2017 $4.56
1/22/2017 $32.14
Transaction Data
Messaging Solution
(IBM MQ, RabbitMQ, etc.)
Application
Kafka
Microservices
Agile, lightweight
(but scalable robust)
Kafka microservice
Big Data project
(Elastic, Spark, AWS
Sagemaker, …)
1) Direct Legacy MQ Communication with App
2) Kafka for decoupling between MQ and App
3) Direct communication via Kafka (no MQ anymore)
4) New projects and applications
(independent or related to the existing migration projects)
External
Solution
53. 54
Use the right tool for the job (and combine them where it makes sense!)
54. 55
This is just the beginning of a new era! Confluent Vision for Kafka:
Global
Automated disaster recovery
Global applications with geo-awareness
Infinite
Efficient and infinite data with tiered storage
Unlimited horizontal scalability for single clusters
Faster elastic scaling for brokers and partition
Elastic
Easy Container-based orchestration and management
Faster elastic scaling when adding brokers and partitions
Cloud-native Apache Kafka