Confluent Platform supports the London Metal Exchange's Kafka Centre of Excellence across a number of projects, with the main objective of providing a reliable, resilient, scalable, and efficient Kafka-as-a-Service model to teams across the entire London Metal Exchange estate.
10. Apache Kafka is a Distributed Commit Log
• Publish and subscribe to streams of events, similar to a message queue
• Store streams of events in a fault-tolerant way
• Process streams of events and produce new ones, in real time, as they occur
17. What happens inside a producer?
A ProducerRecord bundles a topic, an optional partition, timestamp, headers, and key, and the value. When the application calls send(), the record passes through the serializer and the partitioner and is appended to a batch for its target topic partition (e.g., Topic A / Partition 0 holds Batch 0, 1, 2). Batches are transmitted to the Kafka broker. If the write fails, the producer retries when the error is retriable; otherwise it surfaces a non-retriable exception. On success, the producer receives record metadata.
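A minimal sketch of this flow from the application side, assuming a hypothetical local broker and an orders topic; serialization, partitioning, batching, and retries all happen inside the client after send():

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerFlow {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.RETRIES_CONFIG, 3); // retriable errors are retried automatically

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "account-42", "total=100"); // hypothetical topic/key/value
            // The callback receives either success metadata or the final exception.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    System.err.println("Non-retriable failure: " + exception);
                } else {
                    System.out.printf("Written to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}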
18. Make Kafka Widely Accessible to Developers
Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients.
Confluent Clients: battle-tested and high-performing producer and consumer APIs (plus an admin client).
20. Connect Any Application to Kafka
REST Proxy provides a RESTful interface to a Kafka cluster. Native Kafka Java applications talk to the cluster directly, while non-Java applications go through REST Proxy (which also integrates with Schema Registry):
• Allows third-party apps to produce and consume messages
• Communicates with HTTP-connected devices
21. Admin REST APIs
Confluent Platform introduces REST APIs for administrative operations to simplify Kafka management.
Admin REST APIs add even greater flexibility in how you manage Kafka:
• Describe, list, and configure brokers
• Create, delete, describe, list, and configure topics
• Delete, describe, and list consumer groups
• Create, delete, describe, and list ACLs
• List partition reassignments
Confluent offers several options to run admin operations, including Control Center, the CLI, and Kafka clients.
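As an illustrative sketch of the Admin REST APIs, the snippet below lists clusters through the v3 API; the localhost:8082 address and exact path are assumptions based on a default REST Proxy deployment and may vary:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AdminRestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Assumed local REST endpoint and v3 Admin API path; adjust to your deployment.
        HttpRequest clusters = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8082/v3/clusters"))
            .GET()
            .build();
        // The cluster id returned here is what topic-level calls such as
        // GET /v3/clusters/{cluster_id}/topics expect.
        HttpResponse<String> response = client.send(clusters, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}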
22. REST Proxy: Key Features
API endpoints:
• Produce messages:
Topic : POST /topics/(string:topic_name)
Partition : POST /topics/(string:topic_name)/partitions/(int:partition_id)
• Consume messages (Note: requires stickiness to the REST Proxy instance):
Consumer Group : GET /consumers/(string:group_name)/instances/(string:instance)/records
• Consumer group management:
Add/Remove Instances : POST /consumers/(string:group_name), DELETE /consumers/(string:group_name)/instances/(string:instance)
Commit/Get Offsets : POST or GET /consumers/(string:group_name)/instances/(string:instance)/offsets
Modify Subscriptions: POST, GET or DELETE /consumers/(string:group_name)/instances/(string:instance)/subscription
Modify Assignments : POST or GET /consumers/(string:group_name)/instances/(string:instance)/assignments
Reposition : POST or GET /consumers/(string:group_name)/instances/(string:instance)/positions
• Get Metadata:
Topic : GET /topics, GET /topics/(string:topic_name)
Partition : GET /topics/(string:topic_name)/partitions/(int:partition_id)
Broker : GET /brokers
• Admin functions (preview):
Create Topic : POST /clusters/(string:cluster_id)/topics
Delete Topic: DELETE /clusters/(string:cluster_id)/topics/(string:topic_name)
List Topic Configs : GET /clusters/(string:cluster_id)/topics/(string:topic_name)/configs
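For illustration, here is a sketch of producing a JSON record through the v2 produce endpoint listed above, assuming a REST Proxy at localhost:8082 and a hypothetical orders topic:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduce {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // v2 embedded-format content type for JSON values.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8082/topics/orders")) // POST /topics/(string:topic_name)
            .header("Content-Type", "application/vnd.kafka.json.v2+json")
            .POST(HttpRequest.BodyPublishers.ofString(
                "{\"records\":[{\"key\":\"account-42\",\"value\":{\"total\":100}}]}"))
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // contains the partition and offset of each record
    }
}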
24. The Challenge of Data Compatibility at Scale
Many sources without a policy cause mayhem in a centralized data pipeline.
Ensuring downstream systems can use the data is key to an operational stream pipeline.
Even within a single application, different formats can be presented, leaving consumers to deal with incompatibly formatted messages.
26. Schema Registry: Key Features
• Manage schemas and enforce schema policies
  Define, per Kafka topic, a set of compatible schemas that are "allowed"
  Schemas can be defined by an admin or by clients at runtime
  Avro, Protobuf, and JSON schemas are all supported
• Automatic validation when data is written to a topic
  If the data doesn't match the schema, the producer gets an error
• Works transparently when used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams (illustrated below)
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
• Supports multi-datacenter deployments
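A minimal sketch of that transparent path, assuming Confluent's Avro serializer on the classpath, a Schema Registry at localhost:8081, and a hypothetical orders topic with an Order schema:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // hypothetical registry

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":"
          + "[{\"name\":\"accountId\",\"type\":\"string\"},"
          + "{\"name\":\"totalValue\",\"type\":\"int\"}]}");
        GenericRecord order = new GenericData.Record(schema);
        order.put("accountId", "account-42");
        order.put("totalValue", 100);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The serializer registers/validates the schema with Schema Registry on our behalf.
            producer.send(new ProducerRecord<>("orders", "account-42", order));
        }
    }
}

If the topic's compatibility policy rejects the schema, the send fails with an error instead of writing bad data.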
28. Kafka Connect
A no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka: data flows from data sources through Kafka Connect into Kafka, and from Kafka through Kafka Connect out to data sinks.
Custom transforms and data conversions can be written in code (see the sketch below), though many needs are covered out of the box by Single Message Transforms and Converters.
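A minimal sketch of such a custom Single Message Transform, assuming a hypothetical TopicPrefix transform that reroutes records to a prefixed topic name; the Transformation interface is Kafka Connect's extension point for this:

import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Hypothetical SMT: prefixes the destination topic name of every record.
public class TopicPrefix<R extends ConnectRecord<R>> implements Transformation<R> {
    private String prefix;

    @Override
    public void configure(Map<String, ?> configs) {
        Object p = configs.get("prefix");
        prefix = (p == null) ? "copy-" : p.toString();
    }

    @Override
    public R apply(R record) {
        // Copy the record unchanged except for the topic name.
        return record.newRecord(prefix + record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("prefix", ConfigDef.Type.STRING, "copy-",
                ConfigDef.Importance.MEDIUM, "Prefix added to the topic name");
    }

    @Override
    public void close() {}
}

A connector would then reference this class (by its fully qualified name) in its transforms.*.type configuration.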
29. Kafka Connect: Durable Data Pipelines
Integrate upstream and downstream systems with Apache Kafka®. A pool of Kafka Connect workers sits between the Kafka cluster and Schema Registry:
• Capture schema from sources, use schema to inform data sinks
• Highly available workers ensure data pipelines aren't interrupted
• Extensible framework API for building custom connectors
Connectors and their configs are submitted to the workers over the Connect REST API (sketch below).
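As a sketch of how a pipeline is typically created, the JSON config below registers a hypothetical JDBC source with a Connect worker's REST API; the worker address, connector name, and connection details are all placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnector {
    public static void main(String[] args) throws Exception {
        String body = """
            {
              "name": "jdbc-source-example",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db.example.com/orders",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "jdbc-"
              }
            }""";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors")) // Connect worker REST endpoint
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}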
30. Instantly Connect Popular Data Sources & Sinks
210+ pre-built connectors: 90+ Confluent supported, 60+ partner supported and Confluent verified.
31. Confluent HUB
Easily browse connectors by:
• Sources vs. sinks
• Confluent vs. partner supported
• Commercial vs. free
• Available in Confluent Cloud
confluent.io/hub
34. Stream Processing by Analogy
Connect API → Stream Processing → Connect API, on a Kafka cluster, is analogous to a Unix pipeline:
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
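Translated into Kafka Streams, the same three steps might look like this sketch, with hypothetical in-topic and out-topic standing in for in.txt and out.txt:

import org.apache.kafka.streams.StreamsBuilder;

public class GrepTrAnalogy {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("in-topic")             // cat < in.txt
               .filter((key, line) -> line.contains("ksql"))   // grep "ksql"
               .mapValues(line -> line.toUpperCase())          // tr a-z A-Z
               .to("out-topic");                               // > out.txt
        // The topology would then be run with new KafkaStreams(builder.build(), props).start()
    }
}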
35. Kafka Streams: Scalable Stream Processing
Build scalable, durable stream-processing services with the Kafka Streams Java library:
• Simple functional API
• Powerful Processor API
• No framework needed; it's a library, so use it and deploy it like any other JVM library

builder.stream(inputTopic)
       // re-key each event by account id, with the order total as the value
       .map((k, v) -> new KeyValue<>(
           (String) v.getAccountId(),
           (Integer) v.getTotalValue()))
       .groupByKey()
       .count()            // continuously count events per account
       .toStream()
       .to(outputTopic);   // write the changelog to the output topic
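A sketch of the boilerplate that runs the topology above as a plain JVM application; the application id, broker address, and shutdown handling are assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-by-account"); // consumer group / state prefix
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker

        StreamsBuilder builder = new StreamsBuilder();
        // ... topology from the slide goes here ...

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start(); // run as many instances as you need; partitions rebalance automatically
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}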
36. Where does the processing code run?
On the brokers? Nope! The Streams API runs inside your application: the same app, deployed as many instances.
37. Leverages Consumer Group Protocol
The same app, running as many instances: the instances form a consumer group, so partitions (and their processing work) are balanced across them automatically.
39. Stream Processing in Kafka
From most flexible to most simple:
● Producer/Consumer (flexibility): subscribe(), poll(), send(), flush()
● Kafka Streams API: filter(), map(), join(), aggregate()
● ksqlDB (simplicity): SELECT … FROM …, JOIN … WHERE …, GROUP BY …
40. Simplify Your Stream Processing Architecture
ksqlDB provides one solution for capturing events, stream processing, and serving both push and pull queries: (1) connectors and stream processing capture and transform events from upstream apps and databases, and (2) state stores serve the results back to applications as push and pull queries.
41. Streaming app with 4 SQL statements

-- 1. Capture data
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  ...);

-- 2. Perform continuous transformations
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid,
         TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
  FROM pageviews;

-- 3. Create materialized views
CREATE TABLE orders_by_country AS
  SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
  FROM purchases
  WINDOW TUMBLING (SIZE 5 MINUTES)
  LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  GROUP BY country
  EMIT CHANGES;

-- 4. Serve lookups against materialized views
SELECT * FROM orders_by_country WHERE country='usa';
43. Confluent Control Center
The simplest way to operate and build applications with Apache Kafka.
For operators: centrally manage and monitor multi-cluster environments and security.
For developers: view messages, topics, and schemas; manage connectors; and build ksqlDB queries.
44. Accelerate Application Development and Integration
• Messages: browse messages, and search offsets or timestamps by partition
• Topics: create, edit, delete, and view all topics in one place
• Schemas: create, edit, and view topic schemas, and compare schema versions
45. Adhere to Established Event Streaming SLAs
Monitor and optimize system health, with broker and cluster overviews:
• Broker and ZooKeeper uptime
• Under-replicated partitions
• Out-of-sync replicas
• Disk usage and distribution
• Alerting
48. What is Health+?
● Intelligent alerts: manage Health+ intelligent alerts via Confluent Cloud's UI
● Accelerated Confluent Support: Support uses the performance metadata to help you with questions or problems even faster
● Monitoring dashboards: view all of your critical metrics in a single cloud-based dashboard
● Confluent Telemetry Reporter: send performance metadata back to Confluent via the Confluent Telemetry Reporter plugin
49. Intelligent alerts
There are 50+ alerts available today (with many more to come), including:
● Request handler idle percentage
● Network processor idle percentage
● Active controller count
● Offline partitions
● Unclean leader elections
● Under-replicated partitions
● Under min in-sync replicas
● Disk usage
● Unused topics
● No metrics from cluster in one hour
50. How does it work?
● Confluent Telemetry Reporter is a plugin that runs inside each Confluent Platform service (only brokers at the moment) to push metadata about the service to Confluent
● Data is sent over HTTP using an encrypted connection, once per minute by default
● The _confluent-telemetry-metrics topic is where metrics are stored
55. Tiered Storage
Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka.
Event streaming is storage-intensive: logs, device logs, mainframes, databases, object storage, Hadoop, 3rd-party apps (e.g., SFDC, Splunk), and custom apps/microservices all feed and draw on the event stream.
56. Tiered Storage
Tiered Storage allows Kafka to recognize two layers of storage: brokers for recent data, and cost-effective object storage to which old data is offloaded.
57. Tiered Storage
Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka:
• Infinite data retention: reimagine what event streaming apps can do
• Reduced infrastructure costs: offload data to cost-effective object storage
• Platform elasticity: scale compute and storage independently
59. Multi-Region Cluster (MRC)
A cluster stretched across multiple regions that can replicate synchronously and asynchronously. It requires a minimum of 3 data centers/regions (at least for ZooKeeper).
It is offset-preserving and provides automatic client failover with no custom code.
Note: For a given topic, a rack can hold only synchronous or asynchronous replicas, not both; but you can have multiple racks in a DC/zone.
61. 84
Cluster Linking
Cluster Linking allows you to directly connect
clusters together and mirror topics from one
cluster to another without the need for Connect.
Cluster Linking makes it much easier to build
multi-datacenter, multi-cluster, and hybrid
cloud deployments.
62. Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka
Sharing data between independent clusters or migrating clusters presents two challenges:
1. It requires deploying a separate Connect cluster
2. Offsets are not preserved, so messages are at risk of being skipped or reread
(Diagram: the same messages in Topic 1 sit at offsets 0 1 2 3 4 ... in DC 1 but at offsets 4 5 6 7 8 ... in DC 2.)
63. Cluster Linking
Cluster Linking requires no additional infrastructure and preserves offsets, so you can, for example, migrate clusters to Confluent Cloud.