An Introduction to Prometheus
Time Series Denver - May 30, 2018
Introduction
● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation
○ “Simplifying Kubernetes Visibility”
● bob@freshtracks.io
● @bob_cotton
● Father, Fly Fisher & Avid Homebrewer
Agenda
● What is a Cloud Native Application?
● Cloud Native Application Challenges
● The 5 Pillars of Monitoring
● An Introduction to Prometheus
● What FreshTracks Provides
What is a Cloud Native Application?
Cloud Native Application
● Follows 12 Factor Application Practices
● Packaged into containers
● Follows a micro-service architecture
● Managed by a container orchestrator
○ Kubernetes, Docker Swarm, Mesos
● Usually deployed on dynamic infrastructure
○ VMware
○ Cloud providers
● Application lifecycle allows for
○ Auto-provisioning
○ Auto-scaling
○ Auto-redundancy
Cloud Native Applications Challenges
Cloud Native Challenges
● Containers are ephemeral
○ Scheduled on any node in the cluster
○ Move frequently on restarts and deployments
● Kubernetes needs to be monitored
● Kubernetes brings additional complexities
○ Resource Quotas
○ Pod and Cluster Scaling
● All of this challenges traditional monitoring tools
5 Pillars of Monitoring
The 5 Pillars of Monitoring
● Metrics and Alerting
● Log Analytics
● Distributed Tracing
● Application Performance Monitoring
● Real User Monitoring
Enter Prometheus
Prometheus
● Started in 2012 at SoundCloud by ex-Google Engineers
○ Publicly announced in 2015 (developed as open source from the start)
● Patterned after "Borgmon", Google's container monitoring system
● Second project accepted into the CNCF after Kubernetes
● Adoption surge is tracking Kubernetes
○ 63% of teams using Kubernetes use Prometheus
Prometheus Major Features
● Label/value based time series data model
● “Pull based” metrics collection
● Service discovery mechanism
● Simple metrics format with a rich set of “exporters”
● Extremely high-performance TSDB
● Extensive query language - PromQL
● Alertmanager
● Easily installable from Helm
○ Single, statically linked binary
● Open Source Grafana used for visualization
Time Series Data Model
<identifier> → [(t0, v0), (t1, v1), (t2, v2) …]
Identifier is a collection of label/value pairs
Time stored as int64 - Millis since the epoch
Values stored as float64
Efficient storage on disk -- 1.3 bytes/sample
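A minimal Python sketch of this data model, assuming an in-memory dictionary stands in for the TSDB. The names `series_id`, `append`, and `store` are illustrative and are not part of Prometheus.

```python
from collections import defaultdict

def series_id(name, labels):
    """Build a hashable identifier from a metric name and its label/value pairs."""
    return (name, tuple(sorted(labels.items())))

# identifier -> list of (timestamp in ms since the epoch, float value)
store = defaultdict(list)

def append(name, labels, timestamp_ms, value):
    """Append one sample, coercing to the int64-style timestamp and float value."""
    store[series_id(name, labels)].append((int(timestamp_ms), float(value)))

labels = {"job": "apache", "instance": "192.168.5.1", "path": "/home", "status": "200"}
append("http_requests_total", labels, 1527692400000, 5439)
append("http_requests_total", labels, 1527692415000, 5451)

# Label order does not matter: both dicts identify the same series.
reordered = dict(reversed(list(labels.items())))
same_series = series_id("http_requests_total", labels) == series_id("http_requests_total", reordered)
```

Sorting the label pairs before hashing is one simple way to make the identifier independent of label order.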
Label/Value Based Data Model
● Graphite/StatsD
○ apache.192-168-5-1.home.200.http_requests_total
○ apache.192-168-5-1.home.500.http_requests_total
○ apache.192-168-5-1.about.200.http_requests_total
● Prometheus
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
● Selecting Series
○ *.*.home.200.http_requests_total
○ http_requests_total{status="200", path="/home"}
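The label selector on the last line can be sketched in Python as a filter over label sets. This is only an illustration of equality matching. The `select` helper is hypothetical, and real PromQL also supports `!=`, `=~`, and `!~` matchers.

```python
def select(series, name, **matchers):
    """Return the label sets of `name` whose labels equal every given matcher."""
    return [
        labels
        for metric, labels in series
        if metric == name and all(labels.get(k) == v for k, v in matchers.items())
    ]

series = [
    ("http_requests_total", {"job": "apache", "instance": "192.168.5.1", "path": "/home", "status": "200"}),
    ("http_requests_total", {"job": "apache", "instance": "192.168.5.1", "path": "/home", "status": "500"}),
    ("http_requests_total", {"job": "apache", "instance": "192.168.5.1", "path": "/about", "status": "200"}),
]

# Equivalent in spirit to http_requests_total{status="200", path="/home"}
matched = select(series, "http_requests_total", status="200", path="/home")
print(matched)
```

Unlike the dotted Graphite path, the selector names only the labels it cares about, so adding a new label later does not break existing queries.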
Client Data Model
● Counters
○ Always go up or get reset to 0
● Gauges
○ Track a value that can go up or down, e.g. temperature
● Histogram and Summary
○ Used for percentiles
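The metric types above can be sketched in a few lines of plain Python. These classes are illustrative stand-ins, not the API of any Prometheus client library, and the bucket bounds are made up for the example.

```python
import bisect

class Counter:
    """Monotonically increasing value; only a process restart resets it to 0."""
    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """A value that can go up or down, such as a temperature."""
    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

class Histogram:
    """Counts observations into buckets keyed by an inclusive upper bound."""
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self):
        """Return (upper bound, cumulative count) pairs, ending with +Inf."""
        running, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out.append((bound, running))
        return out

requests = Counter()
requests.inc()
requests.inc(2)

latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)
buckets = latency.cumulative()
```

Cumulative bucket counts are what make percentile estimates possible at query time.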
Prometheus Service Discovery and Target Scrape
[Diagram: Prometheus uses service discovery against the K8s API server to find scrape targets, then scrapes the Kubelet (cAdvisor), node_exporter, kube-state-metrics, app containers, and other exporters on each node, and stores the samples in its TSDB]
Prometheus Exposition Format and Exporters
● The Prometheus exposition format: text over HTTP. Simple and human readable
● Supported by Sysdig and the TICK collector
○ Efforts to make it a standard
● Close to 100 exporters for various technologies
● The jmx_exporter can cover any Java/JMX application
● https://prometheus.io/docs/instrumenting/exporters/
Official Exporters:
● node_exporter
● jmx_exporter
● snmp_exporter
● haproxy_exporter
● cloudwatch_exporter
● collectd_exporter
● mysqld_exporter
● memcached_exporter
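To show how simple the text format is, here is a small Python sketch that renders one metric family. The `render` helper and the help text are assumptions made for illustration. The sketch skips label-value escaping and timestamps, which a real exporter would handle.

```python
def render(name, help_text, metric_type, samples):
    """Render one metric family in the text exposition format.

    samples is a list of (labels dict, value) pairs.
    Label values are not escaped in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

text = render(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [
        ({"path": "/home", "status": "200"}, 5439),
        ({"path": "/home", "status": "500"}, 77),
    ],
)
print(text, end="")
```

An exporter is essentially a process that serves text like this at an HTTP endpoint such as `/metrics`.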
Querying Series with PromQL
● PromQL is a functional query language, nothing like SQL
○ PromQL: rate(http_requests_total[5m])
○ Roughly equivalent pseudo-SQL: SELECT job, instance, path, status, rate(value, 5m) FROM http_requests_total;
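What `rate()` computes for a counter can be approximated in Python as follows. This is a simplified sketch with made-up sample data. The real function also extrapolates to the edges of the window, which is omitted here.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, value) samples.

    Any drop in value is treated as a counter reset.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from 0, so the whole new value is growth.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical 5-minute window, one sample per minute; the counter resets at t=120.
window = [(0, 100.0), (60, 160.0), (120, 30.0), (180, 90.0), (240, 150.0), (300, 210.0)]
per_second = simple_rate(window)
```

Handling the reset is the reason rates are computed from counters rather than from raw differences between the first and last sample.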
Querying Series with PromQL
Calculate the ratio of failed requests to all requests, per path:
sum(rate(http_requests_total{status="500"}[5m])) by (path) /
sum(rate(http_requests_total[5m])) by (path)
{path="/home"} 0.014
{path="/about"} 0.027
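The aggregation and division in this query can be mimicked in Python. The per-series rates below are hypothetical numbers chosen for easy arithmetic, not the values from the slide, and `sum_by` is an illustrative helper.

```python
from collections import defaultdict

def sum_by(rates, label):
    """Sum per-series values grouped by one label, like `sum(...) by (label)`."""
    totals = defaultdict(float)
    for labels, value in rates:
        totals[labels[label]] += value
    return dict(totals)

# Hypothetical output of rate(http_requests_total[5m]), in requests per second.
rates = [
    ({"path": "/home", "status": "200"}, 7.0),
    ({"path": "/home", "status": "500"}, 1.0),
    ({"path": "/about", "status": "200"}, 3.0),
    ({"path": "/about", "status": "500"}, 1.0),
]

errors = sum_by([r for r in rates if r[0]["status"] == "500"], "path")
totals = sum_by(rates, "path")

# As with PromQL's one-to-one vector matching, only paths present on both sides appear.
ratio = {path: errors[path] / totals[path] for path in errors if path in totals}
print(ratio)
```

Grouping both sides by the same label is what lets the two result sets line up for the division.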
Graphing
Dashboards with Grafana
Labels, Re-Label and Recording Rules
Oh My...
Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
○ This comes from service discovery
● We want to enhance the scraped metric labels with K8s labels
● This is why we need relabel rules in Prometheus
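[Diagram: Prometheus finds a scrape target through service discovery against the K8s API server and stores its samples in the TSDB. The labels discovered for one target are listed below.]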
0="{__address__ 100.96.17.41}"
1="{__meta_kubernetes_namespace default}"
2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
10="{__meta_kubernetes_pod_ip 100.96.17.41}"
11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
13="{__meta_kubernetes_pod_label_run data-sidecar}"
14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
16="{__meta_kubernetes_pod_ready false}"
17="{__metrics_path__ /metrics}"
18="{__scheme__ http}"
19="{job ftio-data-sidecar-calc}"
<relabel_config>
{__address__ 100.96.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}
Scraped sample:
http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
Stored sample, with the target labels attached (the scraped instance label conflicts with the target's and is renamed exported_instance by default):
http_requests_total{region="us-east",
az="us-east-1",
instance_type="m2.xlarge",
exported_instance="i-3582k8",
hostname="host1",
instance="100.96.17.41:8077",
job="ftio-data-sidecar-calc",
kubernetes_namespace="default",
container_name="prometheus-configmap-reload"
} = 5439
<metric_relabel_config>
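The relabeling step can be sketched in Python. This covers only a simplified version of the `replace` action, and the `relabel` function and rule dictionaries are illustrative rather than Prometheus code. Replacements use Python's `\1` where a Prometheus config would write `$1`. The discovered labels loosely follow the example above, with a valid pod IP.

```python
import re

def relabel(labels, rules):
    """Apply simplified 'replace' rules, then drop internal '__' labels."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ';' before matching.
        source = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        if match:
            labels[rule["target_label"]] = match.expand(rule.get("replacement", r"\1"))
    # `instance` defaults to the scrape address when no rule sets it.
    labels.setdefault("instance", labels.get("__address__", ""))
    return {k: v for k, v in labels.items() if not k.startswith("__")}

discovered = {
    "__address__": "100.96.17.41",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "job": "ftio-data-sidecar-calc",
}

rules = [
    {   # Rewrite the address to use the port from the pod annotation.
        "source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
        "regex": r"([^:]+)(?::\d+)?;(\d+)",
        "replacement": r"\1:\2",
        "target_label": "__address__",
    },
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "kubernetes_namespace"},
    {"source_labels": ["__meta_kubernetes_pod_container_name"], "target_label": "container_name"},
]

target_labels = relabel(discovered, rules)
print(target_labels)
```

Copying selected `__meta_*` values into ordinary labels is how the Kubernetes context ends up on every stored series.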
Recording Rules - Derivative Series
● New series can be generated by querying existing series and storing them
path:request_failures_per_requests:ratio_rate5m =
sum(rate(http_requests_total{status="500"}[5m])) by (path)
/
sum(rate(http_requests_total[5m])) by (path)
High Availability
[Diagram: two Prometheus servers running in parallel]
Federation
[Diagram: a hierarchy of Prometheus servers, where the higher-level servers collect a subset of metrics from the servers below them]
Long Term Storage and External Integrations
Prometheus integrates with external systems through remote_write and remote_read:
● AppOptics: write
● Chronix: write
● Cortex: read and write
● CrateDB: read and write
● Elasticsearch: write
● Gnocchi: write
● Graphite: write
● InfluxDB: read and write
● OpenTSDB: write
● PostgreSQL/TimescaleDB: read and write
● SignalFx: write
Alerting
Alert Definition
alert: <alert name>
expr: <expression>
[ for: <duration> ]
[ labels: <label set> ]
[ annotations: <label set> ]
alert: IngesterCrowding
expr: count by(ft_cluster, node) (cortex_ingester_ingested_samples_total) > 1
for: 30m
labels:
  severity: critical
annotations:
  description: https://github.com/Fresh-Tracks/gke-configs/blob/master/docs/alerts.md#ingestercrowding
  summary: Node {{ $labels.node }} is hosting {{ $value }} ingester pods
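The effect of the FOR clause can be sketched as a small state machine in Python: an alert is pending while its expression is true and only fires once it has stayed true for the whole duration. The function names, the 10-minute evaluation interval, and the sample values are assumptions for illustration.

```python
def alert_state(active_since, now, for_seconds):
    """Return 'inactive', 'pending', or 'firing' for one alert instance."""
    if active_since is None:
        return "inactive"
    return "firing" if now - active_since >= for_seconds else "pending"

def evaluate(values, threshold, for_seconds):
    """Walk (timestamp_seconds, value) rule evaluations and record the state at each."""
    active_since, states = None, []
    for ts, value in values:
        if value > threshold:
            if active_since is None:
                active_since = ts  # the expression just became true
        else:
            active_since = None  # any false evaluation resets the clock
        states.append(alert_state(active_since, ts, for_seconds))
    return states

# Hypothetical ingester pod counts on one node, evaluated every 10 minutes.
evaluations = [(0, 1), (600, 2), (1200, 2), (1800, 2), (2400, 2), (3000, 1)]
states = evaluate(evaluations, threshold=1, for_seconds=30 * 60)
print(states)
```

The waiting period keeps short-lived spikes from paging anyone.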
Alertmanager
● Deduplication
● Grouping
● Routing
● Suppression
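Deduplication and grouping can be sketched in Python as follows. This is an illustration of the idea only. The `group_alerts` helper and the `DiskFull` alert are hypothetical, and routing and suppression are left out.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Collapse alerts with identical label sets, then bucket them by `group_by` labels."""
    unique = {tuple(sorted(alert.items())): alert for alert in alerts}
    groups = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)

alerts = [
    {"alertname": "IngesterCrowding", "node": "node-1", "severity": "critical"},
    # The same alert again, as a second Prometheus server in an HA pair would send it.
    {"alertname": "IngesterCrowding", "node": "node-1", "severity": "critical"},
    {"alertname": "IngesterCrowding", "node": "node-2", "severity": "critical"},
    {"alertname": "DiskFull", "node": "node-1", "severity": "warning"},
]

groups = group_alerts(alerts, ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

Each group would then result in a single notification, so one incident affecting many nodes produces one page rather than many.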
Alertmanager
[Diagram: two Prometheus servers send alerts to two Alertmanager instances, which notify PagerDuty, VictorOps, and Slack]
FreshTracks.io
Simplifying Kubernetes Visibility
Filling the Gaps
● A small Kubernetes cluster generates > 500K unique samples
○ Which metrics are important?
● Seeing the performance of any one container is easy
○ How is the whole microservice behaving? Node? Cluster?
● Prometheus has no anomaly detection
● Dashboard creation is tedious, even if you know what to watch
● How is my service behaving in the context of the cluster?
○ How do node/container/application metrics correlate to each other?
Kubernetes Hierarchy Visibility
Namespace
Workload
Pod
Container
(A workload can be a Deployment, ReplicaSet, StatefulSet, DaemonSet, or similar)
Demo
Thanks!
We’re Hiring!

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Time Series Denver: An Introduction to Prometheus

  • 1. An Introduction to Prometheus Time Series Denver - May 30, 2018
  • 2. Introduction ● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation ○ “Simplifying Kubernetes Visibility” ● bob@freshtracks.io ● @bob_cotton ● Father, Fly Fisher & Avid Homebrewer
  • 3. Agenda ● What is a Cloud Native Application? ● Cloud Native Application Challenges ● The 5 Pillars of Monitoring ● An Introduction to Prometheus ● What FreshTracks Provides
  • 4. What is a Cloud Native Application?
  • 5. Cloud Native Application ● Follows 12 Factor Application Practices ● Packaged into containers ● Follows a micro-service architecture ● Managed by a container orchestrator ○ Kubernetes, Docker Swarm, Mesos ● Usually deployed on dynamic infrastructure ○ VMware ○ Cloud providers ● Application lifecycle allows for ○ Auto-provisioning ○ Auto-scaling ○ Auto-redundancy
  • 7. Cloud Native Challenges ● Containers are ephemeral ○ Scheduled on any node in the cluster ○ Move Frequently on restarts and deployments ● Kubernetes needs to be monitored ● Kubernetes brings additional complexities ○ Resource Quotas ○ Pod and Cluster Scaling ● Challenges traditional tools
  • 8. 5 Pillars of Monitoring
  • 9. The 5 Pillars of Monitoring Metrics and Alerting Log Analytics Distributed Tracing Application Performance Monitoring Real User Monitoring
  • 11. Prometheus ● Started in 2012 at SoundCloud by ex-Google Engineers ○ Open Sourced in 2015 ● Patterned after “BorgMon” - Google’s Container monitoring system ● Second project accepted into the CNCF after Kubernetes ● Adoption surge is tracking Kubernetes ○ 63% of teams using Kubernetes use Prometheus
  • 12. Prometheus Major Features ● Label/value based time series data model ● “Pull based” metrics collection ● Service discovery mechanism ● Simple metrics format with a rich set of “exporters” ● Extremely high-performance TSDB ● Extensive query language - PromQL ● Alert Manager ● Easily installable from Helm ○ Single, statically linked binary ● Open Source Grafana used for visualization
  • 13. Time Series Data Model <identifier> → [(t0, v0), (t1, v1), (t2, v2) …] Identifier is a collection of label/value pairs Time stored as int64 - Millis since the epoch Values stored as float64 Efficient storage on disk -- 1.3 bytes/sample
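The data model on this slide can be sketched in a few lines of Python. This is a toy in-memory illustration, not how the Prometheus TSDB is implemented: the names `series_id` and `ToyTSDB` are hypothetical, and the timestamps and values are arbitrary examples. It relies on one fact about Prometheus, that the metric name is itself stored as the reserved label `__name__`.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series identifier is a set of label/value pairs; the metric name is
# stored under the reserved label "__name__".
SeriesId = FrozenSet[Tuple[str, str]]
Sample = Tuple[int, float]  # (milliseconds since the epoch, value)


def series_id(name: str, **labels: str) -> SeriesId:
    """Build a hashable identifier from a metric name and its labels."""
    return frozenset({"__name__": name, **labels}.items())


class ToyTSDB:
    """In-memory stand-in for a TSDB: identifier -> ordered samples."""

    def __init__(self) -> None:
        self.series: Dict[SeriesId, List[Sample]] = {}

    def append(self, sid: SeriesId, t_ms: int, value: float) -> None:
        # Timestamps are kept as integers (ms), values as floats.
        self.series.setdefault(sid, []).append((int(t_ms), float(value)))


db = ToyTSDB()
sid = series_id("http_requests_total", job="apache", status="200")
db.append(sid, 1527699600000, 5439)
db.append(sid, 1527699615000, 5451)
print(db.series[sid])
```

Because the identifier is just a set of pairs, two series differ as soon as any one label value differs, which is what makes label-based selection possible.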
  • 14. Label/Value Based Data Model ● Graphite/StatsD ○ apache.192-168-5-1.home.200.http_requests_total ○ apache.192-168-5-1.home.500.http_requests_total ○ apache.192-168-5-1.about.200.http_requests_total ● Prometheus ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"} ● Selecting Series ○ Graphite/StatsD: *.*.home.200.http_requests_total ○ Prometheus: http_requests_total{status="200", path="/home"}
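Selecting series by label can be sketched as a filter over label sets. The helper `select` below is hypothetical and supports only equality matchers; PromQL itself also offers negative and regular-expression matchers, which this sketch leaves out.

```python
from typing import Dict, List

Labels = Dict[str, str]

# The three example series from the slide, with the metric name stored
# under the reserved "__name__" label.
SERIES: List[Labels] = [
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/home", "status": "200"},
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/home", "status": "500"},
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/about", "status": "200"},
]


def select(series: List[Labels], name: str, **matchers: str) -> List[Labels]:
    """Return series with the given name whose labels equal every matcher."""
    return [
        s for s in series
        if s.get("__name__") == name
        and all(s.get(label) == value for label, value in matchers.items())
    ]


# Equivalent in spirit to http_requests_total{status="200", path="/home"}
matches = select(SERIES, "http_requests_total", status="200", path="/home")
print(matches)
```

Unlike the dotted Graphite path, the matcher does not depend on the position of a label, so adding a new label to the series does not break existing selectors.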
  • 15. Client Data Model ● Counter ○ Only goes up, or is reset to 0 ● Gauge ○ Tracks a value that can go up or down, e.g. temperature ● Histogram and Summary ○ Used for percentiles
  • 16. Prometheus Service Discovery and Target Scrape (diagram: Prometheus, Service Discovery, K8s API Server, TSDB; scrape targets: Kubelet (cAdvisor), node_exporter, kube-state-metrics, app containers, other exporters)
  • 17. Prometheus Exposition Format and Exporters ● The Prometheus exposition format - Text over http. Simple, human readable ● Supported by Sysdig and the TICK collector ○ Efforts to make it a standard ● Close to 100 exporters for various technologies ● The jmx_exporter can cover any Java/JMX application ● https://prometheus.io/docs/instrumenting/exporters/ Official Exporters: ● node_exporter ● jmx_exporter ● snmp_exporter ● haproxy_exporter ● cloudwatch_exporter ● collectd_exporter ● mysql_exporter ● memcached_exporter
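To show how simple the text exposition format is, here is a minimal sketch that renders one metric family by hand. The function `render` is hypothetical and covers only `# HELP`, `# TYPE` and plain sample lines; in practice an official client library would normally produce this output.

```python
from typing import Dict, List, Tuple


def escape(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(name: str, help_text: str, metric_type: str,
           samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family as exposition-format text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            body = ",".join(
                f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [({"path": "/home", "status": "200"}, 5439)],
)
print(text, end="")
```

An exporter is essentially a small HTTP server that returns text like this from a path such as `/metrics` whenever Prometheus scrapes it.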
  • 18. Querying Series with PromQL ● PromQL is a functional query language, nothing like SQL ○ PromQL: rate(http_requests_total[5m]) ○ Rough SQL equivalent: SELECT job, instance, path, status, rate(value, 5m) FROM http_requests_total;
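What `rate()` does with a counter can be approximated as follows. This is a simplified sketch, not Prometheus's actual algorithm: the hypothetical `simple_rate` takes the samples that fall inside the window, treats any decrease as a counter reset, and divides by the time between the first and last sample. The real function also extrapolates towards the window boundaries, which is omitted here.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (milliseconds since the epoch, counter value)


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Per-second increase of a counter over time-ordered samples."""
    if len(samples) < 2:
        return None  # a rate needs at least two points
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went down, so it was reset to 0 and has since
            # climbed to `current`.
            increase += current
    elapsed_seconds = (samples[-1][0] - samples[0][0]) / 1000.0
    if elapsed_seconds <= 0:
        return None
    return increase / elapsed_seconds


# Arbitrary example data: one sample per minute, with a reset before the third.
window = [(0, 100.0), (60_000, 160.0), (120_000, 10.0), (180_000, 70.0)]
print(simple_rate(window))
```

The reset handling is why counters are preferred over gauges for request totals: a process restart loses the absolute value but not the rate.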
  • 19. Querying Series with PromQL Calculate the ratio of failures to total website hits: sum(rate(http_requests_total{status="500"}[5m])) by (path) / sum(rate(http_requests_total[5m])) by (path) Result: {path="/home"} 0.014 {path="/about"} 0.027
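The shape of that query, two `sum ... by (path)` aggregations divided label-by-label, can be sketched in Python. The helper `sum_by` and the per-second rates below are made up for illustration, so the resulting ratios differ from the ones on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RatedSeries = Tuple[Dict[str, str], float]  # (labels, per-second rate)


def sum_by(series: List[RatedSeries], label: str) -> Dict[str, float]:
    """Aggregate rates, keeping only the given label (like `sum by`)."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical 5-minute rates for two instances of the same job.
rates: List[RatedSeries] = [
    ({"path": "/home", "status": "200", "instance": "a"}, 4.5),
    ({"path": "/home", "status": "200", "instance": "b"}, 5.0),
    ({"path": "/home", "status": "500", "instance": "a"}, 0.5),
    ({"path": "/about", "status": "200", "instance": "a"}, 3.0),
    ({"path": "/about", "status": "500", "instance": "b"}, 1.0),
]

errors = sum_by([r for r in rates if r[0]["status"] == "500"], "path")
totals = sum_by(rates, "path")

# The division pairs up entries with the same remaining label value.
ratio = {path: errors[path] / totals[path] for path in errors if path in totals}
print(ratio)
```

Summing before dividing matters: the ratio is computed over all instances serving a path, not averaged from per-instance ratios.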
  • 23. Label/Value Based Data Model ● Graphite/StatsD ○ apache.192-168-5-1.home.200.http_requests_total ○ apache.192-168-5-1.home.500.http_requests_total ○ apache.192-168-5-1.about.200.http_requests_total ● Prometheus ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"} ● Selecting Series ○ Graphite/StatsD: *.*.home.200.http_requests_total ○ Prometheus: http_requests_total{status="200", path="/home"}
  • 24. Kubernetes Labels ● Kubernetes gives us labels on all the things ● Our scrape targets live in the context of the K8s labels ○ This comes from service discovery ● We want to enhance the scraped metric labels with K8s labels ● This is why we need relabel rules in Prometheus
  • 25. (diagram: K8s API Server, Service Discovery, Prometheus, Scrape Target, TSDB)
    Labels from service discovery:
      0="{__address__ 300.196.17.41}"
      1="{__meta_kubernetes_namespace default}"
      2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
      3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
      4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
      5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
      6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
      7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
      8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
      9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
      10="{__meta_kubernetes_pod_ip 100.96.17.41}"
      11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
      12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
      13="{__meta_kubernetes_pod_label_run data-sidecar}"
      14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
      15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
      16="{__meta_kubernetes_pod_ready false}"
      17="{__metrics_path__ /metrics}"
      18="{__scheme__ http}"
      19="{job ftio-data-sidecar-calc}"
    <relabel_config>
    Target labels after relabeling:
      {__address__ 300.196.17.41:8077}
      {__scheme__ http}
      {__metrics_path__ /metrics}
      {job ftio-data-sidecar-calc}
      {kubernetes_namespace default}
      {container_name prometheus-configmap-reload}
    Scraped sample:
      http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
    Sample with target labels attached:
      http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1", instance="300.196.17.41:8077", job="ftio-data-sidecar-calc", kubernetes_namespace="default", container_name="prometheus-configmap-reload"} = 5439
    <metric_relabel_config>
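The relabeling flow can be sketched as two small functions. The rule set (`COPY_RULES`, `PORT_ANNOTATION`) and the function names are hypothetical stand-ins for a `relabel_config`, and the address uses the pod IP from the discovery labels. Three behaviours mirror what Prometheus does: labels starting with `__` are dropped after target relabeling, `instance` defaults to `__address__`, and with the default `honor_labels: false` a scraped label that collides with a target label is renamed with an `exported_` prefix.

```python
from typing import Dict

Labels = Dict[str, str]

# Hypothetical rule set: discovery label -> target label to copy it into.
COPY_RULES = {
    "__meta_kubernetes_namespace": "kubernetes_namespace",
    "__meta_kubernetes_pod_container_name": "container_name",
}
PORT_ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_port"


def relabel_target(discovered: Labels) -> Labels:
    """Turn service-discovery labels into the final target labels."""
    labels = dict(discovered)
    port = labels.get(PORT_ANNOTATION)
    if port:
        # Rewrite the scrape address to use the annotated port.
        host = labels["__address__"].split(":")[0]
        labels["__address__"] = f"{host}:{port}"
    for source, target in COPY_RULES.items():
        if source in labels:
            labels[target] = labels[source]
    # `instance` defaults to the scrape address when no rule sets it.
    labels.setdefault("instance", labels["__address__"])
    # Labels with a double-underscore prefix do not reach stored series.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


def attach_target_labels(scraped: Labels, target: Labels) -> Labels:
    """Add target labels to a scraped sample, renaming on conflict."""
    merged = dict(scraped)
    for name, value in target.items():
        if name in merged:
            merged["exported_" + name] = merged.pop(name)
        merged[name] = value
    return merged


discovered = {
    "__address__": "100.96.17.41",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "job": "ftio-data-sidecar-calc",
}
target = relabel_target(discovered)
sample = attach_target_labels(
    {"region": "us-east", "instance": "i-3582k8", "hostname": "host1"}, target
)
print(target)
print(sample)
```

Keeping the Kubernetes context on every sample is what later allows queries to aggregate by namespace or container rather than by IP address.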
  • 26. Recording Rules - Derivative Series ● New series can be generated by querying existing series and storing the results path:request_failures_per_requests:ratio_rate5m = sum(rate(http_requests_total{status="500"}[5m])) by (path) / sum(rate(http_requests_total[5m])) by (path)
  • 29. Long Term Storage and External Integrations Prometheus remote_write and remote_read ● AppOptics: write ● Chronix: write ● Cortex: read and write ● CrateDB: read and write ● Elasticsearch: write ● Gnocchi: write ● Graphite: write ● InfluxDB: read and write ● OpenTSDB: write ● PostgreSQL/TimescaleDB: read and write ● SignalFx: write
  • 31. Alert Definition
    ALERT <alert name> EXPR <expression> [ FOR <duration> ] [ LABELS <label set> ] [ ANNOTATIONS <label set> ]
    ALERT: IngesterCrowding
    EXPR: count by(ft_cluster, node) (cortex_ingester_ingested_samples_total) > 1
    FOR: 30m
    LABELS:
      severity: critical
    ANNOTATIONS:
      description: https://github.com/Fresh-Tracks/gke-configs/blob/master/docs/alerts.md#ingestercrowding
      summary: Node {{ $labels.node }} is hosting {{ $value }} ingester pods
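The effect of the FOR clause can be sketched as a small state machine. The class `AlertState` is hypothetical; it only illustrates the inactive, pending and firing states an alert passes through as its expression is evaluated repeatedly.

```python
from typing import Optional


class AlertState:
    """Track one alert across successive rule evaluations."""

    def __init__(self, for_seconds: float) -> None:
        self.for_seconds = for_seconds
        self.active_since: Optional[float] = None

    def evaluate(self, condition_true: bool, now: float) -> str:
        if not condition_true:
            # Any evaluation where the expression is false resets the clock.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# FOR: 30m, evaluated at arbitrary example times (in seconds).
alert = AlertState(for_seconds=30 * 60)
states = [
    alert.evaluate(True, now=0),      # condition just became true
    alert.evaluate(True, now=900),    # 15 minutes in
    alert.evaluate(True, now=1800),   # 30 minutes in
    alert.evaluate(False, now=1860),  # condition cleared
    alert.evaluate(True, now=1920),   # true again, clock restarts
]
print(states)
```

The pending period is what keeps a brief spike from paging anyone: the condition has to hold for the whole duration before the alert fires.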
  • 32. Alert Manager ● Deduplication ● Grouping ● Routing ● Suppression
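Deduplication and grouping can be illustrated with a short sketch. The function `group_alerts` and the sample alerts are hypothetical, and the logic is far simpler than what Alert Manager does: alerts with identical label sets collapse into one, and the survivors are bucketed by the labels chosen for grouping so that each bucket could become a single notification.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]
GroupKey = Tuple[Tuple[str, str], ...]


def group_alerts(alerts: List[Alert],
                 group_by: List[str]) -> Dict[GroupKey, List[Alert]]:
    """Deduplicate alerts by label set, then group them by chosen labels."""
    groups: Dict[GroupKey, Dict[frozenset, Alert]] = {}
    for alert in alerts:
        key = tuple((name, alert.get(name, "")) for name in group_by)
        fingerprint = frozenset(alert.items())  # identical labels -> same alert
        groups.setdefault(key, {})[fingerprint] = alert
    return {key: list(members.values()) for key, members in groups.items()}


# Example alerts; the second is a duplicate sent by a second Prometheus.
incoming = [
    {"alertname": "IngesterCrowding", "node": "node-1", "severity": "critical"},
    {"alertname": "IngesterCrowding", "node": "node-1", "severity": "critical"},
    {"alertname": "IngesterCrowding", "node": "node-2", "severity": "critical"},
    {"alertname": "DiskFull", "node": "node-1", "severity": "warning"},
]
grouped = group_alerts(incoming, group_by=["alertname"])
for key, members in grouped.items():
    print(key, len(members))
```

Routing and suppression would then operate on these groups, for example sending critical groups to a pager and muting groups covered by a silence.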
  • 33. Alert Manager (diagram: two Prometheus servers, two Alert Manager instances, and the receivers PagerDuty, VictorOps and Slack)
  • 35. Filling the Gaps ● A small Kubernetes cluster generates > 500K unique samples ○ Which metrics are important? ● Seeing the performance of any one container is easy ○ How is the whole microservice behaving? Node? Cluster? ● Prometheus has no anomaly detection ● Dashboard creation is tedious, even if you know what to watch ● How is my service behaving in the context of the cluster? ○ How do node/container/application metrics correlate to each other?
  • 36. Kubernetes Hierarchy Visibility ● Namespace ○ Workload ○ Pod ○ Container (Workload can be a deployment, replicaSet, statefulSet, daemonSet or similar)
  • 37. Demo