SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
Top 5 Production
Performance
Problems
Ram Lakshmanan
Architect yCrash
Top 5 Performance problems
What you will be learning?
2
Real case studies with Real data
Most efficient way to troubleshoot these problems
3
Backend Slowdown
Application Architecture
JDBC
SOAP
MainFrame
REST
Server Thread Pool
Application Server
HTTP(S) request
4
Application Architecture
JDBC
SOAP
MainFrame
REST
Server Thread Pool
Application Server
HTTP(S) request
5
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
6
Open-source script:
https://github.com/ycrash/yc-data-script
360° Troubleshooting artifacts
1 2
3
1 Timestamp at which thread dump was triggered
2 JVM Version info
3 Thread Details - <<details in following slides>>
7
1 2 3 4 5
6
7
1 Thread Name - InvoiceThread-A996
2 Priority - Can have values from 1 to 10
3 Thread Id - 0x00002b7cfc6fb000 – Unique ID assigned by JVM. It's returned by calling the Thread.getId() method.
4 Native Id - 0x4479 - This ID is highly platform dependent. On Linux, it's the pid of the thread. On Windows, it's simply the OS-level thread ID with
a process. On Mac OS X, it is said to be the native pthread_t value.
5 Address space - 0x00002b7d17ab8000 -
6 Thread State - RUNNABLE
7 Stack trace -
8
How to analyze Thread dump?
https://www.ibm.com/support/pages
/ibm-thread-and-monitor-dump-
analyzer-java-tmda
IBM TDMA
FastThread
https://fastthread.io/
03
02
https://tinyurl.com/wq95weo
Sample thread
report
yCrash
https://ycrash.io/
01
9
10
Case Study
Backend Slowdown in a Major
Financial Institution in N.
America
OutOfMemoryError
11
12
Memory Leak Program
Open Source app to simulate Performance Problems
BuggyApp
MemoryLeakDemo
Object1
Object2
MapManager
Key1 Large String…1
Key2 Large String…2
key3 Large String…3
:
:
KeyN Large String…N
myMap
14
Memory of Healthy Application
- Full Garbage Collection Event
15
Acute Memory Leak
- Full Garbage Collection Event
16
Memory Leak
- Full Garbage Collection Event
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
17
Open-source script:
https://github.com/ycrash/yc-data-script
360° Troubleshooting artifacts
How to analyze Heap dump?
jhat (oracle.com)
Jhat
Eclipse MAT
https://www.eclipse.org/mat
HeapHero
https://heaphero.io/
04
03
02
https://tinyurl.com/5sxz7dsr
Sample heap report
yCrash
https://ycrash.io/
01
18
CPU Spike
19
top –H –p <PROCESS_ID>’
Secrete Option:
20
We might have used ‘top’
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
21
Open-source script:
https://github.com/ycrash/yc-data-script
360° Troubleshooting artifacts
Case Study
Major Trading app in N.
America
https://blog.fastthread.io/2020/04/23/troubleshooting-cpu-spike-in-a-major-trading-application/
22
Garbage Collection
23
What is Garbage?
HTTP Request
Objects
Memory
Garbage
24
25
3-4 Decades ago
Developer
Writes code to Manually evict Garbage
JVM
Automatically evicts Garbage
Now
How are objects Garbage Collected?
Evolution: Manual -> Automatic
26
Automatic GC sounds good right?
Yes, but for
GC pauses CPU consumption
Open-source script:
https://github.com/ycrash/yc-data-script
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
27
360° Troubleshooting artifacts
2019-08-31T01:09:19.397+0000: 1.606: [GC (Metadata GC Threshold) [PSYoungGen: 545393K->18495K(2446848K)] 545393K->18519K(8039424K),
0.0189376 secs] [Times: user=0.15 sys=0.01, real=0.02 secs]
2019-08-31T01:09:19.416+0000: 1.625: [Full GC (Metadata GC Threshold) [PSYoungGen: 18495K->0K(2446848K)] [ParOldGen: 24K-
>17366K(5592576K)] 18519K->17366K(8039424K), [Metaspace: 20781K->20781K(1067008K)], 0.0416162 secs] [Times: user=0.38 sys=0.03,
real=0.04 secs]
2019-08-31T01:18:19.288+0000: 541.497: [GC (Metadata GC Threshold) [PSYoungGen: 1391495K->18847K(2446848K)] 1408861K-
>36230K(8039424K), 0.0568365 secs] [Times: user=0.31 sys=0.75, real=0.06 secs]
2019-08-31T01:18:19.345+0000: 541.554: [Full GC (Metadata GC Threshold) [PSYoungGen: 18847K->0K(2446848K)] [ParOldGen: 17382K-
>25397K(5592576K)] 36230K->25397K(8039424K), [Metaspace: 34865K->34865K(1079296K)], 0.0467640 secs] [Times: user=0.31 sys=0.08,
real=0.04 secs]
2019-08-31T02:33:20.326+0000: 5042.536: [GC (Allocation Failure) [PSYoungGen: 2097664K->11337K(2446848K)] 2123061K->36742K(8039424K),
0.3298985 secs] [Times: user=0.00 sys=9.20, real=0.33 secs]
2019-08-31T03:40:11.749+0000: 9053.959: [GC (Allocation Failure) [PSYoungGen: 2109001K->15776K(2446848K)] 2134406K->41189K(8039424K),
0.0517517 secs] [Times: user=0.00 sys=1.22, real=0.05 secs]
2019-08-31T05:11:46.869+0000: 14549.079: [GC (Allocation Failure) [PSYoungGen: 2113440K->24832K(2446848K)] 2138853K->50253K(8039424K),
0.0392831 secs] [Times: user=0.02 sys=0.79, real=0.04 secs]
2019-08-31T06:26:10.376+0000: 19012.586: [GC (Allocation Failure) [PSYoungGen: 2122496K->25600K(2756096K)] 2147917K->58149K(8348672K),
0.0371416 secs] [Times: user=0.01 sys=0.75, real=0.04 secs]
2019-08-31T07:50:03.442+0000: 24045.652: [GC (Allocation Failure) [PSYoungGen: 2756096K->32768K(2763264K)] 2788645K->72397K(8355840K),
0.0709641 secs] [Times: user=0.16 sys=1.39, real=0.07 secs]
2019-08-31T09:04:21.406+0000: 28503.616: [GC (Allocation Failure) [PSYoungGen: 2763264K->32768K(2733568K)] 2802893K->83469K(8326144K),
0.0789178 secs] [Times: user=0.12 sys=1.59, real=0.08 secs]
Sample GC Log
How to analyze GC Log?
https://developer.ibm.co
m/javasdk/tools/
IBM GC & Memory visualizer
GCeasy
yCrash
https://gceasy.io/
Google Garbage cat (cms)
https://code.google.com/
archive/a/eclipselabs.org/
p/garbagecat
HP Jmeter
https://h20392.www2.hpe
.com/portal/swdepot/displ
ayProductInfo.do?produc
tNumber=HPJMETER
03
02
01
05
04
https://ycrash.io/
29
30
Case Study
Long GC Pauses in Top Cloud hosting
Provider
https://blog.gceasy.io/2022/03/04/garbage-collection-tuning-success-story-reducing-young-gen-size/
How does 96% GC Throughput sound?
1 day = 1440 Minutes (i.e., 24 hours x 60 minutes)
96% GC Throughput means app pausing for 57.6
minutes/day 31
What is GC Throughput?
Amount of time application spends in processing customer
transactions
vs
Amount of time application spends in processing garbage
collection activity
Concurrency Issues
32
public void synchronized getData() {
doSomething();
}
Thread 1
Thread 2
Thread 1
BLOCKED THREADS
BLOCKED thread state
33
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
34
Open-source script:
https://github.com/ycrash/yc-data-script
360° Troubleshooting artifacts
Case Study
Major Leisure Travel Service
Provider
https://blog.ycrash.io/2022/03/09/java-uuid-generation-performance-impact/
35
36
Environmental Issues
Bonus
Case Study
Intermittent HTTP 502 errors in
AWS EBS Service
37
EBS Architecture
38
Clue: Nginx Error
39
1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
40
Open-source script:
https://github.com/ycrash/yc-data-script
360° Data
41
Ram Lakshmanan ram@tier1app.com
@tier1app https://www.linkedin.com/company/ycrash
This deck will be published in:
https://blog.ycrash.io
If you want to learn more …
42
THANK YOU
FRIENDS

Weitere ähnliche Inhalte

Ähnlich wie Top-5-production-devconMunich-2023-v2.pptx

ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLETier1 app
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Ontico
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourselfaragozin
 
Top-5-java-perf-problems-jax_mainz_2024.pptx
Top-5-java-perf-problems-jax_mainz_2024.pptxTop-5-java-perf-problems-jax_mainz_2024.pptx
Top-5-java-perf-problems-jax_mainz_2024.pptxTier1 app
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsTier1app
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbageTier1 App
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemCyber Security Alliance
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Varnish @ Velocity Ignite
Varnish @ Velocity IgniteVarnish @ Velocity Ignite
Varnish @ Velocity IgniteArtur Bergman
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problemsTier1 app
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨flyinweb
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPFIvan Babrou
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)aragozin
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and ArchitectureSidney Chen
 
this-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptxthis-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptxTier1 app
 
Building an Automated Behavioral Malware Analysis Environment using Free and ...
Building an Automated Behavioral Malware Analysis Environment using Free and ...Building an Automated Behavioral Malware Analysis Environment using Free and ...
Building an Automated Behavioral Malware Analysis Environment using Free and ...Jim Clausing
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 

Ähnlich wie Top-5-production-devconMunich-2023-v2.pptx (20)

ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE
16 ARTIFACTS TO CAPTURE WHEN YOUR CONTAINER APPLICATION IS IN TROUBLE
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
 
Top-5-java-perf-problems-jax_mainz_2024.pptx
Top-5-java-perf-problems-jax_mainz_2024.pptxTop-5-java-perf-problems-jax_mainz_2024.pptx
Top-5-java-perf-problems-jax_mainz_2024.pptx
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day Devops
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbage
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Varnish @ Velocity Ignite
Varnish @ Velocity IgniteVarnish @ Velocity Ignite
Varnish @ Velocity Ignite
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problems
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
this-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptxthis-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptx
 
Building an Automated Behavioral Malware Analysis Environment using Free and ...
Building an Automated Behavioral Malware Analysis Environment using Free and ...Building an Automated Behavioral Malware Analysis Environment using Free and ...
Building an Automated Behavioral Malware Analysis Environment using Free and ...
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 

Mehr von Tier1 app

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
predicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptxTier1 app
 
predicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptxTier1 app
 
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...Tier1 app
 
7-JVM-arguments-JaxLondon-2023.pptx
7-JVM-arguments-JaxLondon-2023.pptx7-JVM-arguments-JaxLondon-2023.pptx
7-JVM-arguments-JaxLondon-2023.pptxTier1 app
 
KnowAPIs-UnknownPerf-confoo-2023 (1).pptx
KnowAPIs-UnknownPerf-confoo-2023 (1).pptxKnowAPIs-UnknownPerf-confoo-2023 (1).pptx
KnowAPIs-UnknownPerf-confoo-2023 (1).pptxTier1 app
 
memory-patterns-confoo-2023.pptx
memory-patterns-confoo-2023.pptxmemory-patterns-confoo-2023.pptx
memory-patterns-confoo-2023.pptxTier1 app
 
millions-gc-jax-2022.pptx
millions-gc-jax-2022.pptxmillions-gc-jax-2022.pptx
millions-gc-jax-2022.pptxTier1 app
 
lets-crash-apps-jax-2022.pptx
lets-crash-apps-jax-2022.pptxlets-crash-apps-jax-2022.pptx
lets-crash-apps-jax-2022.pptxTier1 app
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applicationsTier1 app
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applicationsTier1 app
 
7 habits of highly effective Performance Troubleshooters
7 habits of highly effective Performance Troubleshooters7 habits of highly effective Performance Troubleshooters
7 habits of highly effective Performance TroubleshootersTier1 app
 
Major outagesmajorenteprises 2021
Major outagesmajorenteprises 2021Major outagesmajorenteprises 2021
Major outagesmajorenteprises 2021Tier1 app
 
Jvm internals-1-slide
Jvm internals-1-slideJvm internals-1-slide
Jvm internals-1-slideTier1 app
 
Accelerating Incident Response To Production Outages
Accelerating Incident Response To Production OutagesAccelerating Incident Response To Production Outages
Accelerating Incident Response To Production OutagesTier1 app
 
7 jvm-arguments-Confoo
7 jvm-arguments-Confoo7 jvm-arguments-Confoo
7 jvm-arguments-ConfooTier1 app
 
7 jvm-arguments-v1
7 jvm-arguments-v17 jvm-arguments-v1
7 jvm-arguments-v1Tier1 app
 
How & why-memory-efficient?
How & why-memory-efficient?How & why-memory-efficient?
How & why-memory-efficient?Tier1 app
 
Top feedbacks
Top feedbacksTop feedbacks
Top feedbacksTier1 app
 

Mehr von Tier1 app (20)

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
predicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptx
 
predicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
 
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...
Predicting Production Outages: Unleashing the Power of Micro-Metrics – ADDO C...
 
7-JVM-arguments-JaxLondon-2023.pptx
7-JVM-arguments-JaxLondon-2023.pptx7-JVM-arguments-JaxLondon-2023.pptx
7-JVM-arguments-JaxLondon-2023.pptx
 
KnowAPIs-UnknownPerf-confoo-2023 (1).pptx
KnowAPIs-UnknownPerf-confoo-2023 (1).pptxKnowAPIs-UnknownPerf-confoo-2023 (1).pptx
KnowAPIs-UnknownPerf-confoo-2023 (1).pptx
 
memory-patterns-confoo-2023.pptx
memory-patterns-confoo-2023.pptxmemory-patterns-confoo-2023.pptx
memory-patterns-confoo-2023.pptx
 
millions-gc-jax-2022.pptx
millions-gc-jax-2022.pptxmillions-gc-jax-2022.pptx
millions-gc-jax-2022.pptx
 
lets-crash-apps-jax-2022.pptx
lets-crash-apps-jax-2022.pptxlets-crash-apps-jax-2022.pptx
lets-crash-apps-jax-2022.pptx
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
7 habits of highly effective Performance Troubleshooters
7 habits of highly effective Performance Troubleshooters7 habits of highly effective Performance Troubleshooters
7 habits of highly effective Performance Troubleshooters
 
Major outagesmajorenteprises 2021
Major outagesmajorenteprises 2021Major outagesmajorenteprises 2021
Major outagesmajorenteprises 2021
 
Jvm internals-1-slide
Jvm internals-1-slideJvm internals-1-slide
Jvm internals-1-slide
 
Accelerating Incident Response To Production Outages
Accelerating Incident Response To Production OutagesAccelerating Incident Response To Production Outages
Accelerating Incident Response To Production Outages
 
7 jvm-arguments-Confoo
7 jvm-arguments-Confoo7 jvm-arguments-Confoo
7 jvm-arguments-Confoo
 
7 jvm-arguments-v1
7 jvm-arguments-v17 jvm-arguments-v1
7 jvm-arguments-v1
 
How & why-memory-efficient?
How & why-memory-efficient?How & why-memory-efficient?
How & why-memory-efficient?
 
Top feedbacks
Top feedbacksTop feedbacks
Top feedbacks
 

Kürzlich hochgeladen

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Kürzlich hochgeladen (20)

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Top-5-production-devconMunich-2023-v2.pptx

  • 1. Top 5 Production Performance Problems Ram Lakshmanan Architect yCrash
  • 2. Top 5 Performance problems What you will be learning? 2 Real case studies with Real data Most efficient way to troubleshoot these problems
  • 4. Application Architecture JDBC SOAP MainFrame REST Server Thread Pool Application Server HTTP(S) request 4
  • 5. Application Architecture JDBC SOAP MainFrame REST Server Thread Pool Application Server HTTP(S) request 5
  • 6. 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 6 Open-source script: https://github.com/ycrash/yc-data-script 360° Troubleshooting artifacts
  • 7. 1 2 3 1 Timestamp at which thread dump was triggered 2 JVM Version info 3 Thread Details - <<details in following slides>> 7
  • 8. 1 2 3 4 5 6 7 1 Thread Name - InvoiceThread-A996 2 Priority - Can have values from 1 to 10 3 Thread Id - 0x00002b7cfc6fb000 – Unique ID assigned by JVM. It's returned by calling the Thread.getId() method. 4 Native Id - 0x4479 - This ID is highly platform dependent. On Linux, it's the pid of the thread. On Windows, it's simply the OS-level thread ID with a process. On Mac OS X, it is said to be the native pthread_t value. 5 Address space - 0x00002b7d17ab8000 - 6 Thread State - RUNNABLE 7 Stack trace - 8
  • 9. How to analyze Thread dump? https://www.ibm.com/support/pages /ibm-thread-and-monitor-dump- analyzer-java-tmda IBM TDMA FastThread https://fastthread.io/ 03 02 https://tinyurl.com/wq95weo Sample thread report yCrash https://ycrash.io/ 01 9
  • 10. 10 Case Study Backend Slowdown in a Major Financial Institution in N. America
  • 12. 12 Memory Leak Program Open Source app to simulate Performance Problems BuggyApp
  • 13. MemoryLeakDemo Object1 Object2 MapManager Key1 Large String…1 Key2 Large String…2 key3 Large String…3 : : KeyN Large String…N myMap
  • 14. 14 Memory of Healthy Application - Full Garbage Collection Event
  • 15. 15 Acute Memory Leak - Full Garbage Collection Event
  • 16. 16 Memory Leak - Full Garbage Collection Event
  • 17. 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 17 Open-source script: https://github.com/ycrash/yc-data-script 360° Troubleshooting artifacts
  • 18. How to analyze Heap dump? jhat (oracle.com) Jhat Eclipse MAT https://www.eclipse.org/mat HeapHero https://heaphero.io/ 04 03 02 https://tinyurl.com/5sxz7dsr Sample heap report yCrash https://ycrash.io/ 01 18
  • 20. top –H –p <PROCESS_ID>’ Secrete Option: 20 We might have used ‘top’
  • 21. 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 21 Open-source script: https://github.com/ycrash/yc-data-script 360° Troubleshooting artifacts
  • 22. Case Study Major Trading app in N. America https://blog.fastthread.io/2020/04/23/troubleshooting-cpu-spike-in-a-major-trading-application/ 22
  • 24. What is Garbage? HTTP Request Objects Memory Garbage 24
  • 25. 25 3-4 Decades ago Developer Writes code to Manually evict Garbage JVM Automatically evicts Garbage Now How are objects Garbage Collected? Evolution: Manual -> Automatic
  • 26. 26 Automatic GC sounds good right? Yes, but for GC pauses CPU consumption
  • 27. Open-source script: https://github.com/ycrash/yc-data-script 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 27 360° Troubleshooting artifacts
  • 28. 2019-08-31T01:09:19.397+0000: 1.606: [GC (Metadata GC Threshold) [PSYoungGen: 545393K->18495K(2446848K)] 545393K->18519K(8039424K), 0.0189376 secs] [Times: user=0.15 sys=0.01, real=0.02 secs] 2019-08-31T01:09:19.416+0000: 1.625: [Full GC (Metadata GC Threshold) [PSYoungGen: 18495K->0K(2446848K)] [ParOldGen: 24K- >17366K(5592576K)] 18519K->17366K(8039424K), [Metaspace: 20781K->20781K(1067008K)], 0.0416162 secs] [Times: user=0.38 sys=0.03, real=0.04 secs] 2019-08-31T01:18:19.288+0000: 541.497: [GC (Metadata GC Threshold) [PSYoungGen: 1391495K->18847K(2446848K)] 1408861K- >36230K(8039424K), 0.0568365 secs] [Times: user=0.31 sys=0.75, real=0.06 secs] 2019-08-31T01:18:19.345+0000: 541.554: [Full GC (Metadata GC Threshold) [PSYoungGen: 18847K->0K(2446848K)] [ParOldGen: 17382K- >25397K(5592576K)] 36230K->25397K(8039424K), [Metaspace: 34865K->34865K(1079296K)], 0.0467640 secs] [Times: user=0.31 sys=0.08, real=0.04 secs] 2019-08-31T02:33:20.326+0000: 5042.536: [GC (Allocation Failure) [PSYoungGen: 2097664K->11337K(2446848K)] 2123061K->36742K(8039424K), 0.3298985 secs] [Times: user=0.00 sys=9.20, real=0.33 secs] 2019-08-31T03:40:11.749+0000: 9053.959: [GC (Allocation Failure) [PSYoungGen: 2109001K->15776K(2446848K)] 2134406K->41189K(8039424K), 0.0517517 secs] [Times: user=0.00 sys=1.22, real=0.05 secs] 2019-08-31T05:11:46.869+0000: 14549.079: [GC (Allocation Failure) [PSYoungGen: 2113440K->24832K(2446848K)] 2138853K->50253K(8039424K), 0.0392831 secs] [Times: user=0.02 sys=0.79, real=0.04 secs] 2019-08-31T06:26:10.376+0000: 19012.586: [GC (Allocation Failure) [PSYoungGen: 2122496K->25600K(2756096K)] 2147917K->58149K(8348672K), 0.0371416 secs] [Times: user=0.01 sys=0.75, real=0.04 secs] 2019-08-31T07:50:03.442+0000: 24045.652: [GC (Allocation Failure) [PSYoungGen: 2756096K->32768K(2763264K)] 2788645K->72397K(8355840K), 0.0709641 secs] [Times: user=0.16 sys=1.39, real=0.07 secs] 2019-08-31T09:04:21.406+0000: 28503.616: [GC (Allocation Failure) [PSYoungGen: 2763264K->32768K(2733568K)] 2802893K->83469K(8326144K), 0.0789178 secs] [Times: user=0.12 sys=1.59, real=0.08 secs] Sample GC Log
  • 29. How to analyze GC Log? https://developer.ibm.co m/javasdk/tools/ IBM GC & Memory visualizer GCeasy yCrash https://gceasy.io/ Google Garbage cat (cms) https://code.google.com/ archive/a/eclipselabs.org/ p/garbagecat HP Jmeter https://h20392.www2.hpe .com/portal/swdepot/displ ayProductInfo.do?produc tNumber=HPJMETER 03 02 01 05 04 https://ycrash.io/ 29
  • 30. 30 Case Study Long GC Pauses in Top Cloud hosting Provider https://blog.gceasy.io/2022/03/04/garbage-collection-tuning-success-story-reducing-young-gen-size/
  • 31. How does 96% GC Throughput sound? 1 day = 1440 Minutes (i.e., 24 hours x 60 minutes) 96% GC Throughput means app pausing for 57.6 minutes/day 31 What is GC Throughput? Amount of time application spends in processing customer transactions vs Amount of time application spends in processing garbage collection activity
  • 33. public void synchronized getData() { doSomething(); } Thread 1 Thread 2 Thread 1 BLOCKED THREADS BLOCKED thread state 33
  • 34. 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 34 Open-source script: https://github.com/ycrash/yc-data-script 360° Troubleshooting artifacts
  • 35. Case Study Major Leisure Travel Service Provider https://blog.ycrash.io/2022/03/09/java-uuid-generation-performance-impact/ 35
  • 37. Case Study Intermittent HTTP 502 errors in AWS EBS Service 37
  • 40. 1. GC Log 10. netstat 12. vmstat 2. Thread Dump 9. dmesg 3. Heap Dump 6. ps 8. Disk Usage 5. top 13. iostat 11. ping 14. Kernel Params 15. App Logs 16. metadata 4. Heap Substitute 7. top -H 40 Open-source script: https://github.com/ycrash/yc-data-script 360° Data
  • 41. 41
  • 42. Ram Lakshmanan ram@tier1app.com @tier1app https://www.linkedin.com/company/ycrash This deck will be published in: https://blog.ycrash.io If you want to learn more … 42 THANK YOU FRIENDS

Hinweis der Redaktion

  1. http://localhost:8080/yc-report.jsp?ou=SAP&de=198.134.23.1&app=yc&ts=2023-06-11T22-56-32
  2. http://localhost:8080/yc-report.jsp?ou=SAP&de=198.134.23.1&app=yc&ts=2023-06-11T22-56-32
  3. http://localhost:8080/yc-report.jsp?ou=SAP&de=192.168.17.183&app=yc&ts=2023-10-05T07-38-13
  4. http://localhost:8080/yc-load-report-hd?isWCReport=true&ou=SAP&de=192.168.17.183&app=yc&ts=2023-10-05T07-38-13
  5. http://localhost:8080/yc-load-report-hd?isWCReport=true&ou=SAP&de=192.168.17.183&app=yc&ts=2023-10-05T07-38-13
  6. http://localhost:8080/yc-report.jsp?ou=SAP&de=32.123.89.12&app=yc&ts=2023-06-11T23-54-10
  7. http://localhost:8080/yc-load-report-gc?ou=SAP&de=145.23.82.1&app=yc&ts=2023-06-11T23-03-50 http://localhost:8080/yc-load-report-gc?ou=SAP&de=193.45.89.12&app=yc&ts=2023-06-11T23-09-10
  8. http://localhost:8080/yc-report.jsp?ou=SAP&de=90.21.123.19&app=yc&ts=2023-12-03T19-11-33
  9. https://test.ycrash.io/yc-report-kernel.jsp?ou=czlWbG0rUko0UXAxazlSbjZrSUIwUT09&de=172.31.7.106&app=yc&ts=2023-09-01T10-25-39