Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

Confluent Platform is supporting London Metal Exchange’s Kafka Centre of Excellence across a number of projects, with the main objective of providing a reliable, resilient, scalable and efficient Kafka-as-a-Service model to teams across the entire London Metal Exchange estate.

  1. 1. Confluent Platform Components Introduction Customer Success Engineering
  2. 2. Agenda 01 Confluent Platform What makes up the Confluent Platform 02 Basic Concepts Events, Distributed Commit Log, Event Streaming/Processing 03 Components Brokers, ZooKeeper, Clients, REST Proxy, Schema Registry, Connect, Kafka Streams, ksqlDB, Control Center, Health+ 04 Features Multi-Region Clusters, Tiered Storage, Cluster Linking, Self-Balancing Clusters
  3. 3. Motivation 4
  4. 4. Destination
  5. 5. What is the Confluent Platform? An Enterprise Event Streaming Platform built around Apache Kafka
  6. 6. Complete: Confluent completes Apache Kafka. Dynamic Performance & Elasticity: Elastic Scaling | Infinite Storage | Self-Balancing Clusters | Tiered Storage. Flexible DevOps Automation: Confluent for K8s | Ansible Playbooks | Marketplace Availability. Management & Monitoring: Cloud Data Flow | Metrics API | Control Center | Health+. Streaming Database: ksqlDB. Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry. Multi-language Development: Non-Java Clients | REST Proxy | Admin REST APIs. Global Resilience: Multi-AZ Clusters | 99.95% SLA | Replicator | Multi-Region Clusters | Cluster Linking. Data Compatibility: Schema Registry | Schema Validation. Enterprise-grade Security: RBAC | BYOK | Private Networking | Encryption | Audit Logs. Available as a fully managed cloud service or self-managed software, with training, partners, enterprise support and professional services for architects, operators, developers and executives (TCO/ROI and revenue/cost/risk impact, efficient operations at scale, unrestricted developer productivity, production-stage prerequisites, committer-driven expertise).
  7. 7. Basic Concepts
  8. 8. Apache Kafka 10
  9. 9. Confluent Platform Components (reference architecture: https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/): applications reach REST Proxy instances through a sticky load balancer; Kafka brokers (with the rebalancer) coordinate through a ZooKeeper ensemble; Schema Registry runs as leader and follower nodes; Confluent Control Center monitors the cluster; application clients and Kafka Streams apps talk to the brokers directly; Kafka Connect workers run connectors or Replicator; ksqlDB runs as a cluster of ksqlDB servers.
  10. 10. Apache Kafka is a distributed commit log: publish and subscribe to streams of events, similar to a message queue; store streams of events in a fault-tolerant way; process streams of events and produce new ones, in real time, as they occur.
  11. 11. Anatomy of a Kafka topic: a topic is divided into partitions (e.g. Partition 0, 1, 2), and each partition is an ordered, append-only sequence of records running from old to new. Producers write to the end of a partition, while each consumer reads at its own position, tracked as an offset (e.g. Consumer A at offset 4 and Consumer B at offset 7 of the same partition).
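  As a concrete illustration of the partition/offset mechanics above, here is a minimal consumer sketch in Java; the broker address, topic name and group id are placeholders rather than anything from the deck:

      import java.time.Duration;
      import java.util.List;
      import java.util.Properties;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;
      import org.apache.kafka.common.serialization.StringDeserializer;

      public class OffsetDemoConsumer {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
              props.put("group.id", "demo-group");                 // the group tracks its own offsets
              props.put("key.deserializer", StringDeserializer.class.getName());
              props.put("value.deserializer", StringDeserializer.class.getName());
              props.put("auto.offset.reset", "earliest");          // start from the oldest record if no committed offset

              try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                  consumer.subscribe(List.of("topic1"));           // placeholder topic
                  while (true) {
                      ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                      for (ConsumerRecord<String, String> r : records) {
                          // every record carries the partition and offset it was read from
                          System.out.printf("partition=%d offset=%d value=%s%n",
                                  r.partition(), r.offset(), r.value());
                      }
                  }
              }
          }
      }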
  12. 12. Components
  13. 13. Brokers & ZooKeeper
  14. 14. Apache Kafka: scale out and failover. Topic partitions are replicated across brokers (e.g. Topic1 partitions 1-4 each placed on three of Brokers 1-4), so load spreads across the cluster and a failed broker's partitions are served by replicas on the surviving brokers.
  15. 15. Apache ZooKeeper: cluster coordination. A ZooKeeper ensemble (here three nodes, one of them the leader) stores cluster metadata: heartbeats, watches, controller elections, cluster and topic configs, and permissions. Writes go to the ZooKeeper leader, and one broker (here Broker 2) acts as the cluster controller.
  16. 16. Clients: smart clients, dumb pipes
  17. 17. What happens inside a producer? A ProducerRecord (topic, optional partition, timestamp, headers, key and value) is run through the serializer and the partitioner, appended to a per-partition batch, and the batches are sent to the Kafka broker. If the send fails, the producer retries when the error is retriable and otherwise surfaces a non-retriable exception; on success it returns the record metadata.
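  A minimal Java producer sketch following that path; the send() callback receives either the record metadata or the exception. The broker address, topic, key/value and settings are placeholders:

      import java.util.Properties;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;
      import org.apache.kafka.common.serialization.StringSerializer;

      public class ProducerDemo {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
              props.put("key.serializer", StringSerializer.class.getName());
              props.put("value.serializer", StringSerializer.class.getName());
              props.put("acks", "all");                            // wait for all in-sync replicas
              props.put("retries", 10);                            // retriable errors are retried automatically

              try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                  // the partitioner chooses the partition from the key unless one is given explicitly
                  ProducerRecord<String, String> record = new ProducerRecord<>("topicA", "key-1", "value-1");
                  producer.send(record, (metadata, exception) -> {
                      if (exception != null) {
                          exception.printStackTrace();             // non-retriable failure or retries exhausted
                      } else {
                          System.out.printf("written to %s-%d at offset %d%n",
                                  metadata.topic(), metadata.partition(), metadata.offset());
                      }
                  });
                  producer.flush();
              }
          }
      }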
  18. 18. Make Kafka widely accessible to developers: enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients, battle-tested and high-performing producer and consumer APIs (plus an admin client).
  19. 19. REST Proxy
  20. 20. Connect any application to Kafka. REST Proxy provides a RESTful interface to a Kafka cluster: non-Java applications and HTTP-connected devices produce and consume messages over REST/HTTP (integrating with Schema Registry), while native Java applications use the Kafka clients directly.
  21. 21. Admin REST APIs. Confluent Platform introduces REST APIs for administrative operations to simplify Kafka management. Admin REST APIs add even greater flexibility in how you manage Kafka: describe, list, and configure brokers; create, delete, describe, list, and configure topics; delete, describe, and list consumer groups; create, delete, describe, and list ACLs; list partition reassignments. Confluent offers several options to run admin operations, including Control Center, the CLI, and Kafka clients (see the AdminClient sketch below).
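  As one illustration of the "Kafka clients" option mentioned on the slide, a short Java AdminClient sketch; the broker address, topic name, partition count and replication factor are placeholders:

      import java.util.List;
      import java.util.Properties;
      import org.apache.kafka.clients.admin.AdminClient;
      import org.apache.kafka.clients.admin.NewTopic;

      public class AdminDemo {
          public static void main(String[] args) throws Exception {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");          // placeholder broker address
              try (AdminClient admin = AdminClient.create(props)) {
                  // create a topic with 6 partitions and replication factor 3
                  admin.createTopics(List.of(new NewTopic("orders", 6, (short) 3))).all().get();
                  System.out.println(admin.listTopics().names().get());          // list topics
                  System.out.println(admin.listConsumerGroups().all().get());    // list consumer groups
              }
          }
      }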
  22. 22. REST Proxy: key features. API endpoints: • Produce messages: Topic: POST /topics/(string:topic_name); Partition: POST /topics/(string:topic_name)/partitions/(int:partition_id) • Consume messages (note: requires stickiness to a REST Proxy instance): Consumer group: GET /consumers/(string:group_name)/instances/(string:instance)/records • Consumer group management: Add/remove instances: POST /consumers/(string:group_name), DELETE /consumers/(string:group_name)/instances/(string:instance); Commit/get offsets: POST or GET /consumers/(string:group_name)/instances/(string:instance)/offsets; Modify subscriptions: POST, GET or DELETE /consumers/(string:group_name)/instances/(string:instance)/subscription; Modify assignments: POST or GET /consumers/(string:group_name)/instances/(string:instance)/assignments; Reposition: POST or GET /consumers/(string:group_name)/instances/(string:instance)/positions • Get metadata: Topic: GET /topics, GET /topics/(string:topic_name); Partition: GET /topics/(string:topic_name)/partitions/(int:partition_id); Broker: GET /brokers • Admin functions (preview): Create topic: POST /clusters/(string:cluster_id)/topics; Delete topic: DELETE /clusters/(string:cluster_id)/topics/(string:topic_name); List topic configs: GET /clusters/(string:cluster_id)/topics/(string:topic_name)/configs
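  As an example of the produce endpoint above, here is a hedged sketch of the HTTP call a non-Java application would make, written with Java's built-in HttpClient purely to show the request shape; the REST Proxy address, topic and payload are placeholders, and the media type follows the v2 JSON embedded format:

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;

      public class RestProxyProduceDemo {
          public static void main(String[] args) throws Exception {
              String body = "{\"records\":[{\"key\":\"alice\",\"value\":{\"amount\":42}}]}";
              HttpRequest request = HttpRequest.newBuilder()
                      .uri(URI.create("http://localhost:8082/topics/orders"))        // placeholder proxy address and topic
                      .header("Content-Type", "application/vnd.kafka.json.v2+json")  // v2 JSON embedded format
                      .POST(HttpRequest.BodyPublishers.ofString(body))
                      .build();
              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.statusCode() + " " + response.body());     // partitions/offsets of written records
          }
      }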
  23. 23. Confluent Schema Registry Enforce Producer/Consumer compatibility
  24. 24. The challenge of data compatibility at scale: many sources without a policy cause mayhem in a centralized data pipeline. Even within a single application, different formats can be presented (one incompatibly formatted message from App 01, 02 or 03 is enough), and ensuring downstream systems can use the data is key to an operational stream pipeline.
  25. 25. Schema Registry: enable application development compatibility. Producers serialize against schemas checked with Schema Registry before data reaches the Kafka topic. Develop using standard schemas: store and share a versioned history of all standard schemas and validate data compatibility at the client level. Reduce operational complexity: avoid time-consuming coordination among developers to standardize on schemas.
  26. 26. Schema Registry: key features. • Manage schemas and enforce schema policies: define, per Kafka topic, a set of compatible schemas that are “allowed”; schemas can be defined by an admin or by clients at runtime; Avro, Protobuf, and JSON schemas are all supported • Automatic validation when data is written to a topic: if the data doesn’t match the schema, the producer gets an error • Works transparently when used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams • Integrates with Kafka Connect and Kafka Streams • Supports high availability (within a datacenter) and multi-datacenter deployments
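  In practice the enforcement is mostly client configuration. A hedged sketch of a Java producer wired to Schema Registry: the KafkaAvroSerializer class and the schema.registry.url property come from Confluent's serializer library, while the schema, topic and addresses are illustrative placeholders:

      import java.util.Properties;
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      public class AvroProducerDemo {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");                                    // placeholder
              props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer"); // registers/validates schemas
              props.put("schema.registry.url", "http://localhost:8081");                           // placeholder

              Schema schema = new Schema.Parser().parse(
                      "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
                    + "{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}");
              GenericRecord payment = new GenericData.Record(schema);
              payment.put("id", "p-1");
              payment.put("amount", 99.95);

              try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                  // if the schema is incompatible with the topic's policy, the send fails with a serialization error
                  producer.send(new ProducerRecord<>("payments", "p-1", payment));
                  producer.flush();
              }
          }
      }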
  27. 27. Kafka Connect No/Low Code connectivity to many systems
  28. 28. Kafka Connect: a no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka. Custom transforms and data conversions can be written in code, though many Single Message Transforms and Converters are available out of the box. Connect sits between data sources and data sinks on either side of Kafka.
  29. 29. Kafka Connect: durable data pipelines. Integrate upstream and downstream systems with Apache Kafka®: capture schema from sources and use it to inform data sinks (via Schema Registry); highly available workers ensure data pipelines aren’t interrupted; an extensible framework API supports building custom connectors. Connect runs as a cluster of workers alongside the Kafka cluster.
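  Connectors are typically configured declaratively and submitted to a Connect worker's REST API. A hedged sketch of that call; the connector class, connection settings and worker address are illustrative placeholders:

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;

      public class ConnectSubmitDemo {
          public static void main(String[] args) throws Exception {
              // JDBC source connector config; class name and settings are illustrative, not prescriptive
              String connector = "{"
                      + "\"name\": \"orders-jdbc-source\","
                      + "\"config\": {"
                      + "\"connector.class\": \"io.confluent.connect.jdbc.JdbcSourceConnector\","
                      + "\"connection.url\": \"jdbc:postgresql://db:5432/shop\","
                      + "\"mode\": \"incrementing\","
                      + "\"incrementing.column.name\": \"id\","
                      + "\"topic.prefix\": \"shop-\""
                      + "}}";
              HttpRequest request = HttpRequest.newBuilder()
                      .uri(URI.create("http://localhost:8083/connectors"))   // placeholder Connect worker address
                      .header("Content-Type", "application/json")
                      .POST(HttpRequest.BodyPublishers.ofString(connector))
                      .build();
              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.statusCode() + " " + response.body());
          }
      }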
  30. 30. Instantly connect popular data sources & sinks: 210+ pre-built connectors, 90+ Confluent supported, 60+ partner supported and Confluent verified.
  31. 31. Confluent Hub (confluent.io/hub): easily browse connectors by source vs sink, Confluent vs partner supported, commercial vs free, and availability in Confluent Cloud.
  32. 32. Kafka Connect Connectors are reusable components that know how to talk to specific sources and sinks.
  33. 33. Kafka Streams: build apps with stream processing inside
  34. 34. Stream processing by analogy: $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt — the Connect API plays the role of cat and the output redirect (getting data in and out of the Kafka cluster), while stream processing plays the role of grep and tr in the middle.
  35. 35. Kafka Streams: scalable stream processing. Build scalable, durable stream-processing services with the Kafka Streams Java library: a simple functional API, a powerful Processor API, and no framework needed; it’s a library, so you use and deploy it like any other JVM library. builder.stream(inputTopic) .map((k, v) -> new KeyValue<>((String) v.getAccountId(), (Integer) v.getTotalValue())) .groupByKey() .count() .toStream().to(outputTopic);
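  Wrapped into a complete, runnable application, the snippet above looks roughly like this sketch; the application id, topic names and serdes are assumptions, not anything prescribed by the slide:

      import java.util.Properties;
      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.KafkaStreams;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.StreamsConfig;
      import org.apache.kafka.streams.kstream.KStream;
      import org.apache.kafka.streams.kstream.Produced;

      public class AccountCountApp {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put(StreamsConfig.APPLICATION_ID_CONFIG, "account-totals");     // also the consumer group id
              props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
              props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
              props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

              StreamsBuilder builder = new StreamsBuilder();
              KStream<String, String> input = builder.stream("transactions");       // placeholder input topic
              input.groupByKey()                                                     // group records per key (e.g. account id)
                   .count()                                                          // continuously maintained count
                   .toStream()
                   .to("transaction-counts", Produced.with(Serdes.String(), Serdes.Long()));  // placeholder output topic

              KafkaStreams streams = new KafkaStreams(builder.build(), props);
              streams.start();                                                       // it's a library: just run the JVM app
              Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
          }
      }

  Running several copies of the same application with the same application.id is how it scales out: the instances join one consumer group and share the input partitions, as the next two slides show.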
  36. 36. Where does the processing code run? Not on the brokers: it runs inside your own application instances (the same app, many instances, each embedding the Streams API).
  37. 37. Scaling leverages the consumer group protocol: instances of the same app join one consumer group and split the input partitions between them.
  38. 38. ksqlDB Stream processing using SQL and much more
  39. 39. Stream processing in Kafka, from most flexibility to most simplicity: Producer/Consumer APIs (subscribe(), poll(), send(), flush()), Kafka Streams API (filter(), map(), join(), aggregate()), and ksqlDB (SELECT … FROM …, JOIN … WHERE …, GROUP BY …).
  40. 40. Simplify your stream processing architecture: ksqlDB provides one solution for capturing events (connectors), stream processing and state stores, serving both push queries to applications and pull queries against materialized state.
  41. 41. Streaming app with 4 SQL statements. Capture data: CREATE SOURCE CONNECTOR jdbcConnector WITH (‘connector.class’ = '...JdbcSourceConnector', ‘connection.url’ = '...', …); Perform continuous transformations: CREATE STREAM purchases AS SELECT viewtime, userid, pageid, TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') FROM pageviews; Create materialized views: CREATE TABLE orders_by_country AS SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total FROM purchases WINDOW TUMBLING (SIZE 5 MINUTES) LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id GROUP BY country EMIT CHANGES; Serve lookups against materialized views: SELECT * FROM orders_by_country WHERE country='usa';
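  The final pull query can also be served to applications over ksqlDB's REST API. A hedged sketch against the documented /query endpoint, where the ksqlDB server address is a placeholder and the table is the one created above:

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;

      public class KsqlPullQueryDemo {
          public static void main(String[] args) throws Exception {
              String body = "{\"ksql\": \"SELECT * FROM orders_by_country WHERE country='usa';\","
                          + " \"streamsProperties\": {}}";
              HttpRequest request = HttpRequest.newBuilder()
                      .uri(URI.create("http://localhost:8088/query"))           // placeholder ksqlDB server address
                      .header("Content-Type", "application/vnd.ksql.v1+json")
                      .POST(HttpRequest.BodyPublishers.ofString(body))
                      .build();
              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.body());                              // rows from the materialized view
          }
      }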
  42. 42. Confluent Control Center
  43. 43. Confluent Control Center: the simplest way to operate and build applications with Apache Kafka. For operators: centrally manage and monitor multi-cluster environments and security. For developers: view messages, topics and schemas, manage connectors and build ksqlDB queries.
  44. 44. Accelerate application development and integration. Messages: browse messages, and search offsets or timestamps by partition. Topics: create, edit, delete and view all topics in one place. Schemas: create, edit and view topic schemas, and compare schema versions.
  45. 45. Adhere to established event streaming SLAs: monitor and optimize system health with broker and cluster overview dashboards covering broker and ZooKeeper uptime, under-replicated partitions, out-of-sync replicas, disk usage and distribution, and alerting.
  46. 46. Accelerate application development and integration: simplify the developer’s mental model for ksqlDB with the query editor • View a summary of all clusters • Develop and run queries • Support multiple ksqlDB clusters at a time
  47. 47. Health+
  48. 48. What is Health+? ● Intelligent alerts: manage Health+ intelligent alerts via Confluent Cloud’s UI ● Accelerated Confluent support: Support uses the performance metadata to help you with questions or problems even faster ● Monitoring dashboards: view all of your critical metrics in a single cloud-based dashboard ● Confluent Telemetry Reporter: send performance metadata back to Confluent via the Confluent Telemetry Reporter plugin
  49. 49. Intelligent alerts There are 50+ alerts available today (with many more to come), including: ● Request handler idle percentage ● Network processor idle percentage ● Active controller count ● Offline partitions ● Unclean leader elections ● Under replicated partitions ● Under min in-sync replicas ● Disk usage ● Unused topics ● No metrics from cluster in one hour
  50. 50. How does it work? ● Confluent Telemetry Reporter is a plugin that runs inside each Confluent Platform service (only brokers at the moment) and pushes metadata about the service to Confluent ● Data is sent over HTTP using an encrypted connection, once per minute by default ● Metrics are stored in the _confluent-telemetry-metrics topic
  51. 51. Self-Balancing Clusters
  52. 52. Self-Balancing Clusters automate partition rebalances to improve Kafka’s performance, elasticity, and ease of operations. Rebalances are required regularly to optimize cluster performance: after expansion, after shrinkage, and under uneven load.
  53. 53. Manual rebalance process: $ cat partitions-to-move.json { "partitions": [{ "topic": "foo", "partition": 1, "replicas": [1, 2, 4] }, ...], "version": 1 } followed by $ kafka-reassign-partitions ... With Self-Balancing in Confluent Platform there is no complex math and no risk of human error.
  54. 54. Tiered Storage
  55. 55. Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka. Event streaming is storage-intensive, with data flowing between microservices, SaaS apps, device logs, mainframes, Hadoop, data stores, third-party apps and custom applications, and object storage available to absorb it.
  56. 56. Tiered Storage allows Kafka to recognize two layers of storage: broker local disks and cost-effective object storage, with old data offloaded to the object store.
  57. 57. Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka: infinite data retention (reimagine what event streaming apps can do), reduced infrastructure costs (offload data to cost-effective object storage), and platform elasticity (scale compute and storage independently).
  58. 58. Multi-Region Clusters
  59. 59. Multi-Region Cluster (MRC): a cluster stretched across multiple regions that can replicate synchronously and asynchronously. It requires a minimum of three data centers/regions (at least for ZooKeeper). It is offset-preserving and provides automatic client failover with no custom code. Note: a rack can hold either synchronous or asynchronous replicas for a given topic, not both, but a DC/zone can contain multiple racks.
  60. 60. Cluster Linking
  61. 61. Cluster Linking allows you to directly connect clusters together and mirror topics from one cluster to another without the need for Connect. It makes it much easier to build multi-datacenter, multi-cluster, and hybrid cloud deployments.
  62. 62. Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka. Sharing data between independent clusters or migrating clusters presents two challenges: 1. it requires deploying a separate Connect cluster, and 2. offsets are not preserved, so messages are at risk of being skipped or reread (e.g. records at offsets 0-4 of Topic 1 in DC 1 may land at offsets 4-8 in DC 2).
  63. 63. Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka: it requires no additional infrastructure and preserves offsets, for example when migrating clusters to Confluent Cloud.
  64. 64. Questions? ableasdale@confluent.io
