This presentation covers the following topics:
What is logging?
The purpose of logging: Debugging
The purpose of logging: Security
The purpose of logging: Stats & analytics
Traditional logging
Traditional logging: Advantages
Traditional logging: Disadvantages
The solution: Large-scale logging
Large-scale logging: Core principles
Large-scale logging: Solution types
Large-scale logging: Cloud vs on-prem
Large-scale logging: Operational complexity
Large-scale logging: Security
Large-scale logging: Costs
Large-scale logging: On-prem comparison
- Elasticsearch
- Grafana Loki
- VictoriaLogs
On-prem comparison: Setup and operation
On-prem comparison: Costs
On-prem comparison: Full-text search support
On-prem comparison: How to efficiently query 100TB of logs?
On-prem comparison: Integration with CLI tools
VictoriaLogs for large-scale logging
VictoriaLogs demo instance
- Ingestion rate: 3600 messages / minute
- The number of log messages: 1.1 billion
- Uncompressed log messages’ size: 1.5TB
- Compressed log messages’ size: 23GB
- Compression ratio: 47x
- Memory usage: 150MB
VictoriaLogs CLI integration demo
- Which errors have occurred in all the apps during the last hour?
- How many errors have occurred during the last hour?
- Which apps generated the most errors during the last hour?
- The number of per-minute errors for the last 10 minutes
- Status codes for the last hour
- Non-200 status codes for the last week
- Top client IPs for the last 4 weeks with 404 and 500 response status codes
- Per-month stats for the given IP across all the logs
Large-scale logging solution MUST provide excellent CLI integration
VictoriaLogs: (temporary) drawbacks
VictoriaLogs: Recap
- Easy to set up and operate
- The lowest RAM and disk space usage (up to 30x less than Elasticsearch and Grafana Loki)
- Fast full-text search
- Excellent integration with traditional command-line tools for log analysis
- Accepts logs from popular log shippers (Filebeat, Fluent Bit, Logstash, Vector, Promtail, Grafana Agent)
- Open source and free to use!
10. The purpose of logging: debugging
● Which errors have occurred in the app during the last hour?
● Why did the app return an unexpected response?
● Why wasn't the app working correctly yesterday?
● What was the app doing during a particular time range?
11. The purpose of logging: security
● Who dropped the database in production?
● Which IP addresses were used for logging in as admin during the last hour?
● Who transferred all the money from my account?
● How many failed login attempts were there during the last day?
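Under traditional logging, a question like the last one is often a one-liner over the auth log. A sketch, assuming an sshd-style log; the path and exact message text vary by distribution:

```shell
# Count failed SSH login attempts recorded in the auth log.
# /var/log/auth.log and the "Failed password" wording are assumptions
# about a typical Debian/Ubuntu sshd setup.
grep -c 'Failed password' /var/log/auth.log
```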
12. The purpose of logging: stats and analytics
● How many requests were served per hour during the last day?
● How many unique users were accessing the app during the last month?
● How many requests were served for a particular IP range yesterday?
● What percentage of requests finished with errors during the last hour?
● What was the 95th percentile of request duration for the given web page yesterday?
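Even the percentile question can be answered with plain CLI tools using the nearest-rank method. A sketch, assuming the request duration is the last whitespace-separated field on each access-log line:

```shell
# Nearest-rank 95th percentile of request durations from an access log.
# The file name and the duration-as-last-field layout are assumptions.
awk '{print $NF}' access.log \
  | sort -n \
  | awk '{v[NR] = $1} END {print v[int(NR * 0.95)]}'
```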
15. Traditional logging
● Save logs to files on the local filesystem
● Use command-line tools for log analysis: cat, grep, awk, sort, uniq, head, tail, etc.
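As a sketch of that workflow, the following pipeline pulls the top error-producing request paths out of a local access log. The file name and field positions are assumptions about the common nginx/Apache combined log format:

```shell
# Top 10 request paths that returned HTTP 500.
# In the combined log format, field 9 is the status code and field 7 the path.
awk '$9 == 500 {print $7}' access.log \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 10
```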
19. Traditional logging: advantages
● Easy to set up and operate
● Easy to debug
● Easy to analyze logs with command-line tools and bash scripts
● Has worked perfectly for 50 years (since the 1970s)
23. Traditional logging: disadvantages
● Hard to analyze logs from thousands of hosts (hello, Kubernetes and microservices)
● Slow search speed over large log files (e.g. scanning a 1TB log file may take an hour)
● Imperfect support for structured logging (logs with arbitrary fields)
27. Large-scale logging: core principles
● Push logs from a large number of apps to a centralized system
● Provide fast querying over all the ingested logs
● Structured logging out of the box
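For reference, a structured log entry is a log message carrying arbitrary named fields, typically serialized as JSON. The field names below are illustrative; `_time` and `_msg` are the names VictoriaLogs reserves for the timestamp and the message text:

```json
{
  "_time": "2024-01-15T10:23:45Z",
  "level": "error",
  "app": "checkout",
  "trace_id": "9f3a1c",
  "_msg": "payment request failed"
}
```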
33. Large-scale logging: operational complexity
● Cloud: easy, since the cloud provider operates the system
● On-prem: harder, since you need to set up and operate the system yourself
42. On-prem comparison: setup and operation
● Elasticsearch: hard because of non-trivial indexing configs for logs
● Grafana Loki: hard because of microservice architecture and complex configs
● VictoriaLogs: easy because it runs out of the box from a single binary with default configs
45. On-prem comparison: costs
● Elasticsearch: high - it needs a lot of RAM and disk space
● Grafana Loki: medium - it needs a lot of RAM for high-cardinality labels
● VictoriaLogs: low - a single VictoriaLogs instance can replace a 30-node Elasticsearch or Loki cluster
48. On-prem comparison: full-text search support
● Elasticsearch: yes, but needs proper index configuration
● Grafana Loki: yes, but very slow
● VictoriaLogs: yes, works out of the box for all the ingested log fields
51. On-prem comparison: how to efficiently query 100TB of logs?
● Find all the log messages with the given IP address
● Find all the log messages with the given trace_id
● Find all the transactions for the given user
54. On-prem comparison: how to efficiently query 100TB of logs?
● Elasticsearch: run a cluster with 200TB of disk space and 6TB of RAM. Infrastructure costs at GCE or AWS: ~€50K/month
● Grafana Loki: impossible because the query takes days to execute :(
● VictoriaLogs: run a single node with 6TB of disk space and 200GB of RAM. Infrastructure costs at GCE or AWS: ~€2K/month
59. VictoriaLogs for large-scale logging
● Satisfies requirements for large-scale logging
○ Efficiently stores logs from a large number of distributed apps
○ Provides fast full-text search
○ Supports both structured and unstructured logs
● Supports traditional logging features
○ Ease of use
○ Great integration with CLI tools: grep, awk, head, tail, less, etc.
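A sketch of what that integration looks like in practice: VictoriaLogs exposes an HTTP querying endpoint (port 9428 by default) whose line-oriented JSON output can be piped straight into the usual tools. The LogsQL query and the `app` field name are illustrative:

```shell
# Which apps generated the most errors during the last hour?
# Each result line is a JSON object; extract the "app" field and tally.
curl -s http://localhost:9428/select/logsql/query -d 'query=_time:1h error' \
  | sed -n 's/.*"app":"\([^"]*\)".*/\1/p' \
  | sort | uniq -c | sort -rn | head
```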
91. VictoriaLogs: (temporary) drawbacks
● Missing data extraction and advanced stats functionality in LogsQL (but these can be replaced with traditional CLI tools)
● Missing cluster version (but a single-node VictoriaLogs can replace a 30-node Elasticsearch or Loki cluster)
● Both drawbacks will be resolved soon (but try VictoriaLogs in production right now!)
97. VictoriaLogs: recap
● Easy to set up and operate
● The lowest RAM and disk space usage (up to 30x less than Elasticsearch and Grafana Loki)
● Fast full-text search
● Excellent integration with traditional command-line tools for log analysis
● Accepts logs from popular log shippers (Filebeat, Fluent Bit, Logstash, Vector, Promtail, Grafana Agent)
● Open source and free to use!
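As an illustration of the shipper integration, here is a sketch of a Vector sink that pushes logs into VictoriaLogs through its Elasticsearch-compatible bulk endpoint. The option names follow Vector's `elasticsearch` sink as documented for VictoriaLogs, but both projects evolve, so verify against the current docs before use:

```toml
# Vector sink shipping logs to a local VictoriaLogs instance.
# "your_source" is a placeholder for an existing Vector source/transform.
[sinks.victorialogs]
type = "elasticsearch"
inputs = ["your_source"]
endpoints = ["http://localhost:9428/insert/elasticsearch/"]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false

# Tell VictoriaLogs which incoming fields hold the message and timestamp.
[sinks.victorialogs.query]
_msg_field = "message"
_time_field = "timestamp"
```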