Suppose you want to run analytics on your Pulsar topics, debug those hard corner cases that fail to be sent, or monitor your Pulsar deployment: how do you do it?
A tool exists to do this and more: Pulsar SQL. Since the 2.2.0 release, Pulsar SQL has provided an abstraction layer for running the SQL queries you need against Pulsar, effortlessly and without affecting performance. There is nothing like it in the pub-sub ecosystem.
In this short session, we will revisit what Pulsar SQL is, how to make the most of it, how to deploy it, and how to use it!
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europe 2021
1. Pulsar Virtual Summit Europe 2021
Interactive Analytics on Pulsar with Pulsar SQL
Axel Sirota
AI and Cloud Consultant
@AxelSirota
2. Who am I?
QR code to my Pluralsight courses
QR code to my O’Reilly trainings
–Microsoft Certified Trainer
–Author, Instructor and Editor at Pluralsight, O’Reilly Media, and Develop Intelligence
–AI and Cloud Consultant
3. Catalogue
• A Simple Scenario
• Inspecting and Debugging Topics with Pulsar SQL
• Interactive Analytics
5.
Diagram (Application Instance and Pulsar Deployment): records such as Anna,28,$50 go through a File Source into the Ingress topic; a Pulsar Function processes them and writes the results to the Processed topic.
6. Some issues appear…
1. You check the status of the Pulsar Function and there are some exceptions
2. And you haven’t set a log topic for each Pulsar Function (at least it happened to us)
3. You don’t want downtime to debug locally
What can you do?
7. Catalogue
• Inspecting and Debugging Topics with Pulsar SQL
8. Introducing… Pulsar SQL
Pulsar SQL builds on the Pulsar Presto connector to query topics interactively
One can easily and safely check every message that passed through the topic
It is lightweight and simple, supports highly concurrent access, and lets you reuse existing Presto clusters
10. Configuration file
Specify where the ZooKeeper servers and brokers are:
connector.name=pulsar
pulsar.broker-service-url=https://my-pulsar-deployment.com
pulsar.zookeeper-uri=my-pulsar-deployment.com:2181
Put this in conf/presto/catalog/pulsar.properties
11. Two commands and magic
Start the worker inside the Presto cluster:
$ ./bin/pulsar sql-worker start
Running in 6896
12. Two commands and magic
Start the console:
$ ./bin/pulsar sql
presto>
So simple, yet so powerful!
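Once the presto> prompt appears, a natural first step is discovering what the connector exposes. This is a minimal sketch, assuming the catalog file is named pulsar.properties (Presto names a catalog after its properties file, so the catalog is called pulsar) and using the public/default namespace purely as an example.

```sql
-- List the catalogs; with conf/presto/catalog/pulsar.properties the Pulsar catalog is "pulsar"
SHOW CATALOGS;

-- Each Pulsar namespace ("tenant/namespace") shows up as a schema in that catalog
SHOW SCHEMAS IN pulsar;

-- Each topic in a namespace shows up as a table (example namespace: public/default)
SHOW TABLES IN pulsar."public/default";
```

Namespace names contain a slash, so they must be double-quoted when used as schema names.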
14. But… why should we use it?
1. Validate schemas in a readable SQL format
2. Easily debug bad messages that make Pulsar Functions fail unexpectedly
3. Leverage SQL tools and queries for analytics
What can you do?
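To make point 2 concrete for the simple scenario, here is a sketch of hunting for malformed records on the ingress side. It is hypothetical throughout: the topic name ingress in public/default, the absence of a schema (so each row has a single __value__ string), and the expected record shape name,age,$amount inferred from the sample record Anna,28,$50 are all assumptions, not details from the talk.

```sql
-- HYPOTHETICAL topic: the scenario's "Ingress topic" assumed to be public/default/ingress, no schema.
-- ASSUMED well-formed shape, based on the sample record "Anna,28,$50": name,age,$amount
SELECT __message_id__,       -- identifies the exact message that broke the function
       __publish_time__,     -- when the broker stored it
       __producer_name__,    -- which producer sent it
       __value__             -- the raw payload
FROM pulsar."public/default"."ingress"
WHERE NOT regexp_like(__value__, '^[^,]+,[0-9]+,\$[0-9]+$')  -- keep only rows that do NOT match
ORDER BY __publish_time__ DESC
LIMIT 20;
```

Because the query reads the topic without a subscription, it can run while the Pulsar Function keeps consuming, with no downtime.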
17.
presto> show columns from pulsar."public/default"."voo";
Column | Type | Extra | Comment
-------------------+-----------+-------+-----------------------------------------------------------------------------
__value__ | varchar | | The value of the message with primitive type schema
__partition__ | integer | | The partition number which the message belongs to
__event_time__ | timestamp | | Application defined timestamp in milliseconds of when the event occurred
__publish_time__ | timestamp | | The timestamp in milliseconds of when event as published
__message_id__ | varchar | | The message ID of the message used to generate this row
__sequence_id__ | bigint | | The sequence ID of the message used to generate this row
__producer_name__ | varchar | | The name of the producer that publish the message used to generate this row
__key__ | varchar | | The partition key for the topic
__properties__ | varchar | | User defined properties
(9 rows)
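The metadata columns listed above can be selected like any other column. A minimal sketch that shows the ten most recently published messages on the same example topic, together with who produced them and their message IDs:

```sql
-- Most recent messages on the example topic, with the metadata columns Pulsar SQL adds
SELECT __message_id__,
       __producer_name__,
       __publish_time__,
       __key__,
       __value__
FROM pulsar."public/default"."voo"
ORDER BY __publish_time__ DESC
LIMIT 10;
```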
18. metrics topic without a schema in public/pulsar-summit
Messages in the topic: 2021-09-13,12 | 2021-09-14,9 | 2021-09-15,15
SELECT * FROM pulsar."public/pulsar-summit".metrics;
__value__
2021-09-13,12
2021-09-14,9
2021-09-15,15
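Without a schema, every message is one __value__ string, so any structure has to be recovered by hand. A sketch of that manual parsing, assuming every value looks like YYYY-MM-DD,<integer> as in the rows above (the aliases metric_date and metric are illustrative):

```sql
-- Split each raw "date,metric" string into typed columns (1-based field index)
SELECT
  CAST(split_part(__value__, ',', 1) AS date)          AS metric_date,
  CAST(trim(split_part(__value__, ',', 2)) AS integer) AS metric
FROM pulsar."public/pulsar-summit".metrics;
```

A single malformed message makes CAST fail for the whole query; try_cast returns NULL instead. Registering a schema on the topic avoids this parsing altogether.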
19. metrics topic with a schema in public/pulsar-summit (Date, Metric)
Messages in the topic: 2021-09-13,12 | 2021-09-14,9 | 2021-09-15,15
SELECT * FROM pulsar."public/pulsar-summit".metrics;
Date        Metric
2021-09-13  12
2021-09-14  9
2021-09-15  15
20. metrics topic with a schema in public/pulsar-summit (Date, Metric)
Messages in the topic: 2021-09-13,12 | 2021-09-14,9 | 2021-09-15,15
New message published: 2021-10-15,120
SELECT count(1) FROM pulsar."public/pulsar-summit".metrics WHERE Metric > 10;
Count
3
21. metrics topic with a schema in public/pulsar-summit (Date, Metric)
Messages in the topic: 2021-09-13,12 | 2021-09-14,9 | 2021-09-15,15
New message published: 2021-10-15,120
SELECT month(Date) AS month, SUM(Metric) AS agg_metric
FROM pulsar."public/pulsar-summit".metrics
GROUP BY 1
ORDER BY 2 DESC;
Month  agg_metric
10     120
9      36
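The monthly aggregation can be checked without a Pulsar cluster by inlining the four sample messages with VALUES. This is a sketch for any Presto (or Trino) console; the column names metric_date and metric are illustrative stand-ins for the topic's Date and Metric fields. September adds up to 12 + 9 + 15 = 36 and October holds the single value 120.

```sql
-- Same aggregation as the slide, over inlined copies of the four messages
SELECT month(metric_date) AS month, SUM(metric) AS agg_metric
FROM (
  VALUES
    (DATE '2021-09-13', 12),
    (DATE '2021-09-14', 9),
    (DATE '2021-09-15', 15),
    (DATE '2021-10-15', 120)
) AS metrics (metric_date, metric)
GROUP BY 1
ORDER BY 2 DESC;
-- month | agg_metric
-- 10    | 120
-- 9     | 36
```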
22. If you need to…
1. Interactively debug topics without opening subscriptions
2. Audit who sent each message, when, where, what it sent, and how long it took
3. Do analytics on the messages flowing through Pulsar
Then Pulsar SQL is what you are looking for!
And all of this without affecting production performance
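The audit in point 2 maps directly onto the metadata columns from slide 17. A sketch, using the earlier example topic as a placeholder, that summarizes each producer and estimates "how long it took" as the gap between the application-defined event time and the broker publish time:

```sql
-- Per-producer audit over the example topic
SELECT
  __producer_name__                                               AS producer,
  count(*)                                                        AS messages,
  min(__publish_time__)                                           AS first_published,
  max(__publish_time__)                                           AS last_published,
  -- date_diff(unit, t1, t2) returns t2 - t1; NULL when __event_time__ was not set
  avg(date_diff('millisecond', __event_time__, __publish_time__)) AS avg_delay_ms
FROM pulsar."public/default"."voo"
GROUP BY __producer_name__
ORDER BY messages DESC;
```

avg skips NULL values, so producers that never set an event time simply get a NULL avg_delay_ms rather than skewing the result.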
23. Thanks!!
Questions?
Axel Sirota
AI and Cloud Consultant
@AxelSirota