SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Real-time Analytics on
PostgreSQL at any Scale
Marco Slot <marco@citusdata.com>
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
You offer a product or service (e.g. SaaS, IoT platform, network telemetry, …)
that generates large volumes of time series data.
How to build an analytical dashboard for your customers that:
• Supports a large number of concurrent users
• Reflects new data within minutes
• Has subsecond response times
• Supports advanced analytics
What is real-time analytics?
2
(Heap Analytics)
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Pipeline of Collect - Aggregate - Query:
Real-time analytics architecture
3
Event source
Event source
Event source
Event source
Storage
(Database) Aggregate
Rollups
(Database)
Dashboard
(App)
Collect
Queries
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Pipeline of Collect - Aggregate - Query:
Real-time analytics architecture
4
Event source
Event source
Event source
Event source
Storage
(Database) Aggregate
Rollups
(Database)
Dashboard
(App)
Collect
Queries
Postgres/Citus
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Define a table for storing raw events:
CREATE TABLE events (
event_id bigserial,
event_time timestamptz default now(),
customer_id bigint,
event_type text,
…
event_details jsonb
);
Raw data table
5
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
COPY is by far the fastest way of loading data.
COPY events (customer_id, event_time, … ) FROM STDIN;
A few parallel COPY streams can load hundreds of thousands of events per
second!
Load data
6
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
To achieve fast data loading:
• Use COPY
• Don’t use indexes
To achieve fast reading of new events for aggregation:
• Use an index
Fast data loading
7
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
To achieve fast data loading:
• Use COPY
• Don’t use large indexes
To achieve fast reading of new events for aggregation:
• Use an index
Block-range index is suitable for ordered columns:
CREATE INDEX event_time_idx ON events USING BRIN (event_time);
Fast data loading
8
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Pre-computed aggregates for a period and set of (group by) dimensions.
Can be further filtered and aggregated to generate charts.
What is a rollup?
9
Period Customer Country Site Hit Count
SELECT…
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Append new data to a raw events table (avoid indexes!):
COPY events FROM ...
Periodically aggregate events into rollup table (index away!):
INSERT INTO rollup SELECT … FROM events … GROUP BY …
Application queries the rollup table:
SELECT … FROM rollup WHERE customer_id = 1238 …
Postgres recipe for real-time analytics
10
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Keep your data sorted into buckets
Partitioning
11
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Partitioning keeps indexes small by dividing tables into partitions:
Benefits:
• Avoid fragmentation
• Smaller indexes
• Partition pruning for queries that filter by partition column
• Drop old data quickly, without bloat/fragmentation
Partitioning
12
COPY COPY
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Defining a partitioned table:
CREATE TABLE events (...) PARTITION BY (event_time);
Setting up hourly partitioning with pg_partman:
SELECT partman.create_parent('public.events', 'event_time',
'native', 'hourly');
https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-sca
lable-time-series-database-on-PostgreSQL/
CREATE EXTENSION pg_partman
13
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
If you’re using partitioning, pg_partman can drop old partitions:
UPDATE partman.part_config
SET retention_keep_table = false, retention = '1 month'
WHERE parent_table = 'public.events';
Periodically run maintenance:
SELECT partman.run_maintenance();
Expiring old data in a partitioned table
14
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Run pg_partman maintenance every hour using pg_cron:
SELECT cron.schedule('3 * * * *', $$
SELECT partman.run_maintenance()
$$);
https://github.com/citusdata/pg_cron
Periodic partitioning maintenance
15
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
High vs. Low Cardinality
Designing Rollup Tables
16
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Define rollup tables containing aggregates:
CREATE TABLE rollup_by_period_and_dimensions (
<period>
<dimensions>
<aggregates>
primary key (<dimensions>,<period>)
);
Primary key index covers many queries, can also add additional indices:
CREATE INDEX usc_idx ON rollup (customer_id, site_id);
Rollup table
17
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Two rollups are smaller than one: A*B + A*C < A*B*C
But… up to 2x more aggregation work.
Choosing granularity and dimensions
18
Time Customer Country Aggregates
Time Customer Site Aggregates
Time Customer Country Site Aggregates
~100 rows per period/customer
~20 rows per period/customer
~20*100=2000 rows per period/customer
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Find balance between query performance and table management.
1. Identify dimensions, metrics (aggregates)
2. Try rollup with all dimensions:
3. Test compression/performance (goal is >5x smaller)
4. If too slow / too big, split rollup table based on query patterns
5. Go to 3
Usually ends up with 5-10 rollup tables
Guidelines for designing rollups
19
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Append-only vs. Incremental
Running Aggregations
20
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Use INSERT INTO rollup SELECT … FROM events … to populate rollup table.
Append-only aggregation (insert):
Supports all aggregates, including exact distinct, percentiles
Harder to handle late data
Incremental aggregation (upsert):
Supports late data
Cannot handle all aggregates (though can approximate using HLL, TopN)
Append-only vs. Incremental
21
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Aggregate events for a particular time period and append them to the rollup
table, once all the data for the period is available.
INSERT INTO rollup
SELECT period, dimensions, aggregates
FROM events
WHERE event_time::date = '2018-09-04'
GROUP BY period, dimensions;
Should keep track of which periods have been aggregated.
Append-only Aggregation
22
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Aggregate new events and upsert into rollup table.
INSERT INTO rollup
SELECT period, dimensions, aggregates
FROM events
WHERE event_id BETWEEN s AND e
GROUP BY period, dimensions
ON CONFLICT (dimensions, period) DO UPDATE
SET aggregates = aggregates + EXCLUDED.aggregates;
Need to be able to incrementally build aggregates.
Need to keep track of which events have been aggregated.
Incremental Aggregation
23
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Incremental aggregation
Technique for incremental aggregation using a sequence number shown on
the Citus Data blog.
Incrementally approximate distinct count:
HyperLogLog extension
Incrementally approximate top N:
TopN extension
24
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
CREATE EXTENSION Citus
Scaling out your analytics pipeline
25
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Citus is an open source extension to Postgres (9.6, 10, 11) for transparently
distributing tables across many Postgres servers.
CREATE EXTENSION citus
26
Coordinator
create_distributed_table('events', 'customer_id');events
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
Multi-tenancy
Tenant ID provides a natural sharding dimension for many applications.
Citus automatically co-locates event and rollup data for the same
SELECT create_distributed_table('events', 'tenant_id');
SELECT create_distributed_table('rollup', 'tenant_id');
Aggregations can be done locally, without network traffic:
INSERT INTO rollup SELECT tenant_id, … FROM events …
Dashboard queries are always for a particular tenant:
SELECT … FROM rollup WHERE tenant_id = 1238 …
27
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
COPY asynchronously scatters rows to different shards
Data loading in Citus
28
Coordinator
COPYevents
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
INSERT … SELECT can be parallelised across shards.
Aggregation in Citus
29
Coordinator
events
create_distributed_table('rollup', 'customer_id');
INSERT INTO rollup
SELECT … FROM events
GROUP BY customer_id, …rollup
INSERT INTO rollup_102182
SELECT … FROM events_102010
GROUP BY …
INSERT INTO rollup_102180
SELECT … FROM events_102008
GROUP BY …
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
SELECT on rollup for a particular customer (from the dashboard) can be
routed to the appropriate shard.
Querying rollups in Citus
30
Coordinator
events SELECT … FROM rollup
WHERE customer_id = 12834 …
…rollup
SELECT … FROM events_102180
WHERE customer_id = 1283 …
…
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
You should use:
• COPY to load raw data into a table
• BRIN index to find new events during aggregation
• Partitioning with pg_partman to expire old data
• Rollup tables built from raw event data
• Append-only aggregation if you need exact percentile/distinct count
• Incremental aggregation if you can have late data
• HLL to incrementally approximate distinct count
• TopN to incrementally approximate heavy hitters
• Citus to scale out
Summary
31
Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018
marco@citusdata.com
Q&A
32

Weitere ähnliche Inhalte

Was ist angesagt?

ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
SQL Server Integration Services
SQL Server Integration ServicesSQL Server Integration Services
SQL Server Integration ServicesRobert MacLean
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack PresentationAmr Alaa Yassen
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesDatabricks
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDBHow Discord Migrated Trillions of Messages from Cassandra to ScyllaDB
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDBScyllaDB
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino ProjectMartin Traverso
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareAltinity Ltd
 
Postgres on OpenStack
Postgres on OpenStackPostgres on OpenStack
Postgres on OpenStackEDB
 
Facebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System DesignFacebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System DesignElia Ahadi
 
Monitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixMonitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixGerger
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Altinity Ltd
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Trino at linkedIn - 2021
Trino at linkedIn - 2021Trino at linkedIn - 2021
Trino at linkedIn - 2021Akshay Rai
 

Was ist angesagt? (20)

ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
SQL Server Integration Services
SQL Server Integration ServicesSQL Server Integration Services
SQL Server Integration Services
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDBHow Discord Migrated Trillions of Messages from Cassandra to ScyllaDB
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
ClickHouse Intro
ClickHouse IntroClickHouse Intro
ClickHouse Intro
 
Postgres on OpenStack
Postgres on OpenStackPostgres on OpenStack
Postgres on OpenStack
 
Facebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System DesignFacebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System Design
 
Monitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixMonitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with Zabbix
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Trino at linkedIn - 2021
Trino at linkedIn - 2021Trino at linkedIn - 2021
Trino at linkedIn - 2021
 

Ähnlich wie Real time analytics at any scale | PostgreSQL User Group NL | Marco Slot

Cassandra at Finn.io — May 30th 2013
Cassandra at Finn.io — May 30th 2013Cassandra at Finn.io — May 30th 2013
Cassandra at Finn.io — May 30th 2013DataStax Academy
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotCitus Data
 
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdfClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdfAltinity Ltd
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
The Future of Sharding
The Future of ShardingThe Future of Sharding
The Future of ShardingEDB
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkDataWorks Summit
 
Rodney Matejek Portfolio
Rodney Matejek PortfolioRodney Matejek Portfolio
Rodney Matejek Portfoliormatejek
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinGuido Schmutz
 
A head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysA head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysSriskandarajah Suhothayan
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuCitus Data
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowLucas Arruda
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...tdc-globalcode
 
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Amazon Web Services
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunk
 
Making App Developers More Productive
Making App Developers More ProductiveMaking App Developers More Productive
Making App Developers More ProductivePostman
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDLionel Mommeja
 
Data Modeling for IoT and Big Data
Data Modeling for IoT and Big DataData Modeling for IoT and Big Data
Data Modeling for IoT and Big DataJayesh Thakrar
 
JKJ_SSRS PPS Excel Services Project.Documentation
JKJ_SSRS PPS Excel Services Project.DocumentationJKJ_SSRS PPS Excel Services Project.Documentation
JKJ_SSRS PPS Excel Services Project.DocumentationJeff Jacob
 

Ähnlich wie Real time analytics at any scale | PostgreSQL User Group NL | Marco Slot (20)

Cassandra at Finn.io — May 30th 2013
Cassandra at Finn.io — May 30th 2013Cassandra at Finn.io — May 30th 2013
Cassandra at Finn.io — May 30th 2013
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
 
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdfClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf
ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
The Future of Sharding
The Future of ShardingThe Future of Sharding
The Future of Sharding
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Rodney Matejek Portfolio
Rodney Matejek PortfolioRodney Matejek Portfolio
Rodney Matejek Portfolio
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
 
A head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysA head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadays
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
 
Cubes 1.0 Overview
Cubes 1.0 OverviewCubes 1.0 Overview
Cubes 1.0 Overview
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
 
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojo
 
Making App Developers More Productive
Making App Developers More ProductiveMaking App Developers More Productive
Making App Developers More Productive
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 
Data Modeling for IoT and Big Data
Data Modeling for IoT and Big DataData Modeling for IoT and Big Data
Data Modeling for IoT and Big Data
 
JKJ_SSRS PPS Excel Services Project.Documentation
JKJ_SSRS PPS Excel Services Project.DocumentationJKJ_SSRS PPS Excel Services Project.Documentation
JKJ_SSRS PPS Excel Services Project.Documentation
 

Mehr von Citus Data

Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
 
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...Citus Data
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Citus Data
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
 
When it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberCitus Data
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncCitus Data
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...Citus Data
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisCitus Data
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Citus Data
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncCitus Data
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Citus Data
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineCitus Data
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Citus Data
 
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberCitus Data
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineCitus Data
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Citus Data
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineCitus Data
 
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberCitus Data
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoCitus Data
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Citus Data
 

Mehr von Citus Data (20)

Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
When it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will Leinweber
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
 
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
 

Kürzlich hochgeladen

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Kürzlich hochgeladen (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

Real time analytics at any scale | PostgreSQL User Group NL | Marco Slot

  • 1. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Real-time Analytics on PostgreSQL at any Scale Marco Slot <marco@citusdata.com>
  • 2. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 You offer a product or service (e.g. SaaS, IoT platform, network telemetry, …) that generates large volumes of time series data. How to build an analytical dashboard for your customers that: • Supports a large number of concurrent users • Reflects new data within minutes • Has subsecond response times • Supports advanced analytics What is real-time analytics? 2 (Heap Analytics)
  • 3. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Pipeline of Collect - Aggregate - Query: Real-time analytics architecture 3 Event source Event source Event source Event source Storage (Database) Aggregate Rollups (Database) Dashboard (App) Collect Queries
  • 4. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Pipeline of Collect - Aggregate - Query: Real-time analytics architecture 4 Event source Event source Event source Event source Storage (Database) Aggregate Rollups (Database) Dashboard (App) Collect Queries Postgres/Citus
  • 5. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Define a table for storing raw events: CREATE TABLE events ( event_id bigserial, event_time timestamptz default now(), customer_id bigint, event_type text, … event_details jsonb ); Raw data table 5
  • 6. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 COPY is by far the fastest way of loading data. COPY events (customer_id, event_time, … ) FROM STDIN; A few parallel COPY streams can load hundreds of thousands of events per second! Load data 6
  • 7. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 To achieve fast data loading: • Use COPY • Don’t use indexes To achieve fast reading of new events for aggregation: • Use an index Fast data loading 7
  • 8. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 To achieve fast data loading: • Use COPY • Don’t use large indexes To achieve fast reading of new events for aggregation: • Use an index Block-range index is suitable for ordered columns: CREATE INDEX event_time_idx ON events USING BRIN (event_time); Fast data loading 8
  • 9. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Pre-computed aggregates for a period and set of (group by) dimensions. Can be further filtered and aggregated to generate charts. What is a rollup? 9 Period Customer Country Site Hit Count SELECT…
  • 10. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Append new data to a raw events table (avoid indexes!): COPY events FROM ... Periodically aggregate events into rollup table (index away!): INSERT INTO rollup SELECT … FROM events … GROUP BY … Application queries the rollup table: SELECT … FROM rollup WHERE customer_id = 1238 … Postgres recipe for real-time analytics 10
  • 11. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Keep your data sorted into buckets Partitioning 11
  • 12. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Partitioning keeps indexes small by dividing tables into partitions: Benefits: • Avoid fragmentation • Smaller indexes • Partition pruning for queries that filter by partition column • Drop old data quickly, without bloat/fragmentation Partitioning 12 COPY COPY
  • 13. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Defining a partitioned table: CREATE TABLE events (...) PARTITION BY (event_time); Setting up hourly partitioning with pg_partman: SELECT partman.create_parent('public.events', 'event_time', 'native', 'hourly'); https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-sca lable-time-series-database-on-PostgreSQL/ CREATE EXTENSION pg_partman 13
  • 14. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 If you’re using partitioning, pg_partman can drop old partitions: UPDATE partman.part_config SET retention_keep_table = false, retention = '1 month' WHERE parent_table = 'public.events'; Periodically run maintenance: SELECT partman.run_maintenance(); Expiring old data in a partitioned table 14
  • 15. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Run pg_partman maintenance every hour using pg_cron: SELECT cron.schedule('3 * * * *', $$ SELECT partman.run_maintenance() $$); https://github.com/citusdata/pg_cron Periodic partitioning maintenance 15
  • 16. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 High vs. Low Cardinality Designing Rollup Tables 16
  • 17. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Define rollup tables containing aggregates: CREATE TABLE rollup_by_period_and_dimensions ( <period> <dimensions> <aggregates> primary key (<dimensions>,<period>) ); Primary key index covers many queries, can also add additional indices: CREATE INDEX usc_idx ON rollup (customer_id, site_id); Rollup table 17
  • 18. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Two rollups are smaller than one: A*B + A*C < A*B*C But… up to 2x more aggregation work. Choosing granularity and dimensions 18 Time Customer Country Aggregates Time Customer Site Aggregates Time Customer Country Site Aggregates ~100 rows per period/customer ~20 rows per period/customer ~20*100=2000 rows per period/customer
  • 19. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Find balance between query performance and table management. 1. Identify dimensions, metrics (aggregates) 2. Try rollup with all dimensions: 3. Test compression/performance (goal is >5x smaller) 4. If too slow / too big, split rollup table based on query patterns 5. Go to 3 Usually ends up with 5-10 rollup tables Guidelines for designing rollups 19
  • 20. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Append-only vs. Incremental Running Aggregations 20
  • 21. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Use INSERT INTO rollup SELECT … FROM events … to populate rollup table. Append-only aggregation (insert): Supports all aggregates, including exact distinct, percentiles Harder to handle late data Incremental aggregation (upsert): Supports late data Cannot handle all aggregates (though can approximate using HLL, TopN) Append-only vs. Incremental 21
  • 22. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Aggregate events for a particular time period and append them to the rollup table, once all the data for the period is available. INSERT INTO rollup SELECT period, dimensions, aggregates FROM events WHERE event_time::date = '2018-09-04' GROUP BY period, dimensions; Should keep track of which periods have been aggregated. Append-only Aggregation 22
  • 23. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Aggregate new events and upsert into rollup table. INSERT INTO rollup SELECT period, dimensions, aggregates FROM events WHERE event_id BETWEEN s AND e GROUP BY period, dimensions ON CONFLICT (dimensions, period) DO UPDATE SET aggregates = aggregates + EXCLUDED.aggregates; Need to be able to incrementally build aggregates. Need to keep track of which events have been aggregated. Incremental Aggregation 23
  • 24. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Incremental aggregation Technique for incremental aggregation using a sequence number shown on the Citus Data blog. Incrementally approximate distinct count: HyperLogLog extension Incrementally approximate top N: TopN extension 24
  • 25. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 CREATE EXTENSION Citus Scaling out your analytics pipeline 25
  • 26. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Citus is an open source extension to Postgres (9.6, 10, 11) for transparently distributing tables across many Postgres servers. CREATE EXTENSION citus 26 Coordinator create_distributed_table('events', 'customer_id');events
  • 27. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 Multi-tenancy Tenant ID provides a natural sharding dimension for many applications. Citus automatically co-locates event and rollup data for the same SELECT create_distributed_table('events', 'tenant_id'); SELECT create_distributed_table('rollup', 'tenant_id'); Aggregations can be done locally, without network traffic: INSERT INTO rollup SELECT tenant_id, … FROM events … Dashboard queries are always for a particular tenant: SELECT … FROM rollup WHERE tenant_id = 1238 … 27
  • 28. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 COPY asynchronously scatters rows to different shards Data loading in Citus 28 Coordinator COPYevents
  • 29. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 INSERT … SELECT can be parallelised across shards. Aggregation in Citus 29 Coordinator events create_distributed_table('rollup', 'customer_id'); INSERT INTO rollup SELECT … FROM events GROUP BY customer_id, …rollup INSERT INTO rollup_102182 SELECT … FROM events_102010 GROUP BY … INSERT INTO rollup_102180 SELECT … FROM events_102008 GROUP BY …
  • 30. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 SELECT on rollup for a particular customer (from the dashboard) can be routed to the appropriate shard. Querying rollups in Citus 30 Coordinator events SELECT … FROM rollup WHERE customer_id = 12834 … …rollup SELECT … FROM events_102180 WHERE customer_id = 1283 … …
  • 31. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 You should use: • COPY to load raw data into a table • BRIN index to find new events during aggregation • Partitioning with pg_partman to expire old data • Rollup tables built from raw event data • Append-only aggregation if you need exact percentile/distinct count • Incremental aggregation if you can have late data • HLL to incrementally approximate distinct count • TopN to incrementally approximate heavy hitters • Citus to scale out Summary 31
  • 32. Marco Slot | Citus Data | PostgreSQL Meetup Amsterdam: November 2018 marco@citusdata.com Q&A 32