1) The document discusses how monitoring micro-metrics like garbage collection logs and thread dumps can help predict production outages in applications. It provides examples of how specific micro-metrics could predict issues like memory leaks, backend slowdowns, CPU spikes, and poor response times.
2) The document also describes yCrash, a tool that captures micro-metrics every 3 minutes from applications and uses machine learning to detect potential problems and trigger full troubleshooting if an issue is forecasted. It provides open-source scripts to collect various system and application metrics for troubleshooting.
3) Real-world case studies are presented on how micro-metrics helped predict and solve issues for major financial, trading, and travel companies to prevent production
7. What is GC Throughput?
Amount of time the application spends processing customer transactions
vs.
Amount of time the application spends processing garbage collection activity
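The comparison above is commonly reported as a percentage: the share of elapsed time that was not spent in garbage collection. Here is a minimal sketch of that arithmetic, assuming you have already extracted the individual GC pause durations from a GC log. The function name `gc_throughput` and its parameters are illustrative and not taken from the deck or from any yCrash API.

```python
def gc_throughput(elapsed_seconds: float, gc_pause_seconds: list[float]) -> float:
    """Return the percentage of elapsed time NOT spent in garbage collection.

    elapsed_seconds  -- wall-clock length of the observation window
    gc_pause_seconds -- individual GC pause durations inside that window
    (Both inputs are assumed to have been parsed from a GC log already.)
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    paused = sum(gc_pause_seconds)
    if paused > elapsed_seconds:
        raise ValueError("GC pause time cannot exceed the elapsed time")
    # Time left for customer transactions, as a share of the whole window.
    return 100.0 * (elapsed_seconds - paused) / elapsed_seconds


if __name__ == "__main__":
    # A 100-second window with three pauses totalling 2 seconds.
    print(gc_throughput(100.0, [0.5, 1.0, 0.5]))
```

A falling throughput across successive windows means the application is spending a growing share of its time on garbage collection rather than on customer transactions.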
31. Micro-Metrics Monitoring Architecture
Diagram: My App with yCrash agent (Container/Machine); yCrash Server (Cloud/On-premise)
1. Every 3 minutes, Micro-Metrics* are captured
2. Metrics are transmitted
3. ML and patterns are applied to the Micro-Metrics
4. If a problem is forecasted, 360° data capture is triggered
* Micro-Metrics:
1. Garbage Collection Log
2. Thread Dump + top -H
3. Application Log
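The four numbered steps of this architecture can be sketched as one monitoring cycle. This is only an illustration of the control flow, not yCrash's implementation: `MicroMetrics`, `run_cycle`, and the three callbacks are hypothetical names, and the analysis rule in the usage example is a placeholder standing in for the ML and pattern step.

```python
from dataclasses import dataclass
from typing import Callable

# "Every 3 minutes" comes from the slide; a scheduler would sleep this long
# between calls to run_cycle().
CAPTURE_INTERVAL_SECONDS = 3 * 60


@dataclass
class MicroMetrics:
    """The three micro-metrics listed on the slide (hypothetical container)."""
    gc_log: str
    thread_dump: str  # thread dump plus `top -H` output
    app_log: str


def run_cycle(
    capture: Callable[[], MicroMetrics],
    analyze: Callable[[MicroMetrics], bool],
    full_capture: Callable[[], None],
) -> bool:
    """Run one cycle and report whether a problem was forecasted."""
    metrics = capture()                    # step 1: capture micro-metrics
    problem_forecasted = analyze(metrics)  # steps 2-3: transmit, then analyze
    if problem_forecasted:
        full_capture()                     # step 4: trigger 360-degree capture
    return problem_forecasted


if __name__ == "__main__":
    events: list[str] = []

    def fake_capture() -> MicroMetrics:
        return MicroMetrics(gc_log="Full GC", thread_dump="", app_log="")

    # Placeholder rule (an assumption): flag any GC log mentioning "Full GC".
    def fake_analyze(m: MicroMetrics) -> bool:
        return "Full GC" in m.gc_log

    run_cycle(fake_capture, fake_analyze, lambda: events.append("360 capture"))
    print(events)
```

Passing the capture, analysis, and trigger steps in as callbacks keeps the cycle itself free of any assumption about how the agent and server actually communicate.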
33. If you want to learn more …
Ram Lakshmanan, ram@tier1app.com
@tier1app, https://www.linkedin.com/company/ycrash
This deck will be published in: https://blog.ycrash.io
THANK YOU FRIENDS