© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
BPF Performance Analysis at Netflix
Brendan Gregg
OPN303
Senior Performance Architect
Netflix
Superpowers Demo
Agenda
Why BPF Is Changing Linux
BPF Internals
Performance Analysis
Tool Development
Why BPF Is Changing Linux
50 years, one (dominant) OS model
[Diagram: Applications, System Calls, Kernel, Hardware]
Origins: Multics, 1960s
[Diagram: Hardware, Supervisor, Applications; privilege rings: Ring 0, Ring 1, Ring 2, ...]
Modern Linux: a new OS model
[Diagram: user-mode Applications with System Calls; kernel-mode Applications (BPF) with BPF Helper Calls; Kernel; Hardware]
50 years, one process state model
[Diagram: process states grouped as On-CPU (User, Kernel) and Off-CPU (Runnable, Swapping, Wait, Block, Sleep, Idle). Transition labels: schedule; preemption or time quantum expired; resource I/O and wakeup; acquire lock and acquired; sleep and wakeup; wait for work and work arrives; swap out and swap in. Note: Linux groups most sleep states.]
BPF uses a new program state model
[Diagram: states Loaded, Enabled, BPF, Kernel (helpers), Spinning, grouped as Off-CPU and On-CPU. Transition labels: attach, event fires, program ended, helpers, spin lock.]
Netconf 2018
Alexei Starovoitov
Kernel Recipes 2019, Alexei Starovoitov
~40 active BPF programs on every Facebook server
>150K Amazon EC2 server instances
~34% US Internet traffic at night
>130M subscribers
~14 active BPF programs on every instance (so far)
Modern Linux: Event-based Applications
[Diagram: user-mode Applications; kernel-mode Applications (BPF); Scheduler; Kernel; Kernel Events; U.E.; Hardware Events (incl. clock)]
Modern Linux is becoming microkernel-ish
[Diagram: user-mode Applications; kernel-mode Services & Drivers (BPF, BPF, BPF); Smaller Kernel; Hardware]
The word “microkernel” has already been invoked by Jonathan Corbet, Thomas Graf, Greg Kroah-Hartman, ...
BPF Internals
BPF 1992: Berkeley Packet Filter
A limited virtual machine for efficient packet filters
# tcpdump -d host 127.0.0.1 and port 80
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 18
(002) ld [26]
(003) jeq #0x7f000001 jt 6 jf 4
(004) ld [30]
(005) jeq #0x7f000001 jt 6 jf 18
(006) ldb [23]
(007) jeq #0x84 jt 10 jf 8
(008) jeq #0x6 jt 10 jf 9
(009) jeq #0x11 jt 10 jf 18
(010) ldh [20]
(011) jset #0x1fff jt 18 jf 12
(012) ldxb 4*([14]&0xf)
(013) ldh [x + 14]
(014) jeq #0x50 jt 17 jf 15
(015) ldh [x + 16]
(016) jeq #0x50 jt 17 jf 18
(017) ret #262144
(018) ret #0
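The constants in the listing above can be decoded with a few lines of Python. This is a reading aid sketched for this transcript, not something from the original slides; the function name `decode_filter_constants` is hypothetical, and the lookup tables only cover the values that appear in the listing.

```python
import socket
import struct

# Well-known values from the Ethernet and IP specifications.
ETHERTYPES = {0x0800: "IPv4"}
IP_PROTOCOLS = {0x06: "TCP", 0x11: "UDP", 0x84: "SCTP"}


def decode_filter_constants():
    """Return readable meanings for the constants in the tcpdump -d listing."""
    return {
        "ethertype 0x800": ETHERTYPES[0x0800],
        # 0x7f000001 is a 32-bit IPv4 address in network byte order.
        "address 0x7f000001": socket.inet_ntoa(struct.pack("!I", 0x7F000001)),
        "protocol 0x84": IP_PROTOCOLS[0x84],
        "protocol 0x6": IP_PROTOCOLS[0x06],
        "protocol 0x11": IP_PROTOCOLS[0x11],
        "port 0x50": 0x50,
    }


if __name__ == "__main__":
    for name, meaning in decode_filter_constants().items():
        print(f"{name:20} -> {meaning}")
```

The load offsets follow from a 14-byte Ethernet header: [12] is the EtherType, [23] is the IP protocol byte, and [26] and [30] are the source and destination IPv4 addresses.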
BPF 2019: aka extended BPF
bpftrace, BPF Microconference, XDP, bpfconf, & Facebook Katran, Google KRSI, Netflix flowsrus, and many more
BPF 2019
[Diagram: User-Defined BPF Programs (SDN Configuration, DDoS Mitigation, Intrusion Detection, Container Security, Observability, Firewalls, Device Drivers, …); Kernel Runtime (BPF verifier, BPF actions); Event Targets (sockets, kprobes, uprobes, tracepoints, perf_events)]
BPF is open source and in the Linux kernel (you’re all getting it)
BPF is also now a technology name, and no longer an acronym
BPF Internals
[Diagram: BPF Instructions; Verifier; Interpreter; JIT Compiler; Machine Code Execution; 11 Registers; Map Storage (Mbytes); BPF Context; Events; BPF Helpers; Rest of Kernel]
Is BPF Turing complete?
BPF: a new type of software
        Execution model  User-defined  Compile     Security       Failure mode   Resource access
User    task             yes           any         user-based     abort          syscall, fault
Kernel  task             no            static      none           panic          direct
BPF     event            yes           JIT, CO-RE  verified, JIT  error message  restricted helpers
Performance Analysis
BPF enables a new class of custom, efficient, and production-safe performance analysis tools
[Book cover: BPF Performance Tools]
Tool Examples by Subsystem
1. CPUs (scheduling)
2. Memory
3. Disks
4. File Systems
5. Networking
6. Languages
7. Applications
8. Kernel
9. Hypervisors
10. Containers
Tool Extensions & Sources
.py: BCC (Python)
.bt: bpftrace
(some tools exist for both)
https://github.com/iovisor/bcc
https://github.com/iovisor/bpftrace
https://github.com/brendangregg/bpf-perf-tools-book
CPUs: execsnoop
# execsnoop.py -T
TIME(s) PCOMM PID PPID RET ARGS
0.506 run 8745 1828 0 ./run
0.507 bash 8745 1828 0 /bin/bash
0.511 svstat 8747 8746 0 /command/svstat /service/nflx-httpd
0.511 perl 8748 8746 0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
0.514 ps 8750 8749 0 /bin/ps --ppid 1 -o pid,cmd,args
0.514 grep 8751 8749 0 /bin/grep org.apache.catalina
0.514 sed 8752 8749 0 /bin/sed s/^ *//;
0.515 xargs 8754 8749 0 /usr/bin/xargs
0.515 cut 8753 8749 0 /usr/bin/cut -d -f 1
0.523 echo 8755 8754 0 /bin/echo
0.524 mkdir 8756 8745 0 /bin/mkdir -v -p /data/tomcat
[...]
1.528 run 8785 1828 0 ./run
1.529 bash 8785 1828 0 /bin/bash
1.533 svstat 8787 8786 0 /command/svstat /service/nflx-httpd
1.533 perl 8788 8786 0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
[...]
New process trace
CPUs: runqlat
# runqlat.py 10 1
Tracing run queue latency... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 1906 |*** |
2 -> 3 : 22087 |****************************************|
4 -> 7 : 21245 |************************************** |
8 -> 15 : 7333 |************* |
16 -> 31 : 4902 |******** |
32 -> 63 : 6002 |********** |
64 -> 127 : 7370 |************* |
128 -> 255 : 13001 |*********************** |
256 -> 511 : 4823 |******** |
512 -> 1023 : 1519 |** |
1024 -> 2047 : 3682 |****** |
2048 -> 4095 : 3170 |***** |
4096 -> 8191 : 5759 |********** |
8192 -> 16383 : 14549 |************************** |
16384 -> 32767 : 5589 |********** |
Scheduler latency (run queue latency)
CPUs: runqlen
# runqlen.py 10 1
Sampling run queue length... Hit Ctrl-C to end.
runqlen : count distribution
0 : 47284 |****************************************|
1 : 211 | |
2 : 28 | |
3 : 6 | |
4 : 4 | |
5 : 1 | |
6 : 1 | |
Run queue length
Memory: ffaults (book)
# ffaults.bt
Attaching 1 probe...
^C
[...]
@[dpkg]: 18
@[sudoers.so]: 19
@[ld.so.cache]: 27
@[libpthread-2.27.so]: 29
@[ld-2.27.so]: 32
@[locale-archive]: 34
@[system.journal]: 39
@[libstdc++.so.6.0.25]: 43
@[libapt-pkg.so.5.0.2]: 47
@[BrowserMetrics-5D8A6422-77F1.pma]: 86
@[libc-2.27.so]: 168
@[i915]: 409
@[pkgcache.bin]: 860
@[]: 25038
Page faults by filename
Disks: biolatency
# biolatency.py -mT 1 5
Tracing block device I/O... Hit Ctrl-C to end.
06:20:16
msecs : count distribution
0 -> 1 : 36 |**************************************|
2 -> 3 : 1 |* |
4 -> 7 : 3 |*** |
8 -> 15 : 17 |***************** |
16 -> 31 : 33 |********************************** |
32 -> 63 : 7 |******* |
64 -> 127 : 6 |****** |
06:20:17
msecs : count distribution
0 -> 1 : 96 |************************************ |
2 -> 3 : 25 |********* |
4 -> 7 : 29 |*********** |
[...]
Disk I/O latency histograms, per second
File Systems: xfsslower
# xfsslower.py 50
Tracing XFS operations slower than 50 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
21:20:46 java 112789 R 8012 13925 60.16 file.out
21:20:47 java 112789 R 3571 4268 136.60 file.out
21:20:49 java 112789 R 5152 1780 63.88 file.out
21:20:52 java 112789 R 5214 12434 108.47 file.out
21:20:52 java 112789 R 7465 19379 58.09 file.out
21:20:54 java 112789 R 5326 12311 89.14 file.out
21:20:55 java 112789 R 4336 3051 67.89 file.out
[...]
22:02:39 java 112789 R 65536 1486748 182.10 shuffle_6_646_0.data
22:02:39 java 112789 R 65536 872492 30.10 shuffle_6_646_0.data
22:02:39 java 112789 R 65536 1113896 309.52 shuffle_6_646_0.data
22:02:39 java 112789 R 65536 1481020 400.31 shuffle_6_646_0.data
22:02:39 java 112789 R 65536 1415232 324.92 shuffle_6_646_0.data
22:02:39 java 112789 R 65536 1147912 119.37 shuffle_6_646_0.data
[...]
XFS I/O slower than a threshold (variants for ext4, btrfs, zfs)
File Systems: xfsdist
# xfsdist.py 60
Tracing XFS operation latency... Hit Ctrl-C to end.
22:41:24:
operation = 'read'
usecs : count distribution
0 -> 1 : 382130 |****************************************|
2 -> 3 : 85717 |******** |
4 -> 7 : 23639 |** |
8 -> 15 : 5668 | |
16 -> 31 : 3594 | |
32 -> 63 : 21387 |** |
[...]
operation = 'write'
usecs : count distribution
0 -> 1 : 12925 |***** |
2 -> 3 : 83375 |************************************* |
[...]
XFS I/O latency histograms, by operation
Networking: tcplife
# tcplife.py
PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS
22597 recordProg 127.0.0.1 46644 127.0.0.1 28527 0 0 0.23
3277 redis-serv 127.0.0.1 28527 127.0.0.1 46644 0 0 0.28
22598 curl 100.66.3.172 61620 52.205.89.26 80 0 1 91.79
22604 curl 100.66.3.172 44400 52.204.43.121 80 0 1 121.38
22624 recordProg 127.0.0.1 46648 127.0.0.1 28527 0 0 0.22
3277 redis-serv 127.0.0.1 28527 127.0.0.1 46648 0 0 0.27
22647 recordProg 127.0.0.1 46650 127.0.0.1 28527 0 0 0.21
3277 redis-serv 127.0.0.1 28527 127.0.0.1 46650 0 0 0.26
[...]
TCP session lifespans with connection details
Networking: tcpsynbl (book)
# tcpsynbl.bt
Attaching 4 probes...
Tracing SYN backlog size. Ctrl-C to end.
^C
@backlog[backlog limit]: histogram of backlog size
@backlog[128]:
[0] 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@backlog[500]:
[0] 2783 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1] 9 | |
[2, 4) 4 | |
[4, 8) 1 | |
TCP SYN backlogs as histograms
Languages: funccount
# funccount.py 'tcp_s*'
Tracing 50 functions for "tcp_s*"... Hit Ctrl-C to end.
^C
FUNC COUNT
[...]
tcp_setsockopt 1839
tcp_shutdown 2690
tcp_sndbuf_expand 2862
tcp_send_delayed_ack 9457
tcp_set_state 10425
tcp_sync_mss 12529
tcp_sendmsg_locked 41012
tcp_sendmsg 41236
tcp_send_mss 42686
tcp_small_queue_check.isra.29 45724
tcp_schedule_loss_probe 64067
tcp_send_ack 66945
tcp_stream_memory_free 178616
Detaching...
Count native function calls (C, C++, Go, etc.)
Applications: mysqld_qslower
# mysqld_qslower.py $(pgrep mysqld)
Tracing MySQL server queries for PID 9908 slower than 1 ms...
TIME(s) PID MS QUERY
0.000000 9962 169.032 SELECT * FROM words WHERE word REGEXP '^bre.*n$'
1.962227 9962 205.787 SELECT * FROM words WHERE word REGEXP '^bpf.tools$'
9.043242 9962 95.276 SELECT COUNT(*) FROM words
23.723025 9962 186.680 SELECT count(*) AS count FROM words WHERE word REGEXP
'^bre.*n$'
30.343233 9962 181.494 SELECT * FROM words WHERE word REGEXP '^bre.*n$' ORDER BY word
[...]
MySQL queries slower than a threshold
Kernel: workq (book)
# workq.bt
Attaching 4 probes...
Tracing workqueue request latencies. Ctrl-C to end.
^C
@us[blk_mq_timeout_work]:
[1] 1 |@@ |
[2, 4) 11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@us[xfs_end_io]:
[1] 2 |@@@@@@@@ |
[2, 4) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32) 12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64) 3 |@@@@@@@@@@@@@ |
[...]
Work queue function execution times
Hypervisor: xenhyper (book)
# xenhyper.bt
Attaching 1 probe...
^C
@[mmu_update]: 44
@[update_va_mapping]: 78
@[mmuext_op]: 6473
@[stack_switch]: 23445
Count hypercalls from Xen PV guests
Containers: blkthrot (book)
# blkthrot.bt
Attaching 3 probes...
Tracing block I/O throttles by cgroup. Ctrl-C to end
^C
@notthrottled[1]: 506
@throttled[1]: 31
Count block I/O throttles by blk cgroup
That was only 14 out of 150+ tools
All are open source
Not all 150+ tools shown here
Coping with so many BPF tools at Netflix
● On Netflix servers, /apps/nflx-bpf-alltools has all the tools
  – BCC, bpftrace, my book, Netflix internal
  – Open source at: https://github.com/Netflix-Skunkworks/bpftoolkit
● Latest tools are fetched & put in a hierarchy: cpu, disk, …
● We are also building GUIs to front these tools
Tool Development
Only one engineer at your company needs to learn tool development
They can turn everyone’s ideas into tools
The Tracing Landscape, Dec 2019 (my opinion)
[Chart: tracers plotted by Ease of use (brutal to less brutal) against Scope & Capability, marked with Stage of Development (alpha to mature). Tracers: sysdig, perf, ftrace, C/BPF, stap, bcc/BPF, ply/BPF, Raw BPF, LTTng, bpftrace. Annotations: (hist triggers, synthetic events); recent changes; (many); (eBPF); (0.9.3)]
bcc/BPF (C & Python)
bcc examples/tracing/bitehist.py
entire program
bpftrace/BPF
https://github.com/iovisor/bpftrace
entire program
bpftrace -e 'kr:vfs_read { @ = hist(retval); }'
bpftrace Syntax
bpftrace -e 'k:do_nanosleep /pid > 100/ { @[comm]++ }'
Probe
Filter
(optional)
Action
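The probe / filter / action structure above can be illustrated by splitting a one-liner into its three parts. The following Python is only a sketch for reading the syntax: it is not how bpftrace parses programs, the name `split_clause` is hypothetical, and it assumes a single clause with no nested braces and no `/` in the probe name (so a uprobe with a file path would not match).

```python
import re

# probe, optional /filter/, then { action }.
CLAUSE_RE = re.compile(
    r"^\s*(?P<probe>[^/{]+?)\s*(?:/(?P<filter>[^/]*)/)?\s*\{(?P<action>[^}]*)\}\s*$"
)


def split_clause(program):
    """Split one simple bpftrace clause into probe, filter, and action."""
    match = CLAUSE_RE.match(program)
    if match is None:
        raise ValueError(f"not a simple probe /filter/ {{ action }} clause: {program!r}")
    filter_text = (match.group("filter") or "").strip()
    return {
        "probe": match.group("probe").strip(),
        "filter": filter_text or None,  # None when the optional filter is absent
        "action": match.group("action").strip(),
    }


if __name__ == "__main__":
    print(split_clause("k:do_nanosleep /pid > 100/ { @[comm]++ }"))
```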
Probe Type Shortcuts
tracepoint t Kernel static tracepoints
usdt U User-level statically defined tracing
kprobe k Kernel function tracing
kretprobe kr Kernel function returns
uprobe u User-level function tracing
uretprobe ur User-level function returns
profile p Timed sampling across all CPUs
interval i Interval output
software s Kernel software events
hardware h Processor hardware events
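The shortcut table above maps directly onto a lookup. As a small sketch (the helper name `expand_probe` is hypothetical and only the shortcuts listed in the table are covered), a probe written with a shortcut can be expanded to its full type like this:

```python
# Shortcut -> full probe type, copied from the table above.
PROBE_SHORTCUTS = {
    "t": "tracepoint",
    "U": "usdt",
    "k": "kprobe",
    "kr": "kretprobe",
    "u": "uprobe",
    "ur": "uretprobe",
    "p": "profile",
    "i": "interval",
    "s": "software",
    "h": "hardware",
}


def expand_probe(probe):
    """Replace a leading probe-type shortcut with its full name."""
    kind, sep, rest = probe.partition(":")
    # Unknown or already-expanded types are returned unchanged.
    return PROBE_SHORTCUTS.get(kind, kind) + sep + rest


if __name__ == "__main__":
    for p in ("kr:vfs_read", "t:syscalls:sys_enter_open", "kprobe:vfs_*"):
        print(p, "->", expand_probe(p))
```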
Filters
● /pid == 181/
● /comm != "sshd"/
● /@ts[tid]/
Actions
● Per-event output
  – printf()
  – system()
  – join()
  – time()
● Map summaries
  – @ = count() or @++
  – @ = hist()
  – …
The following is in the https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
Functions
● hist(n) Log2 histogram
● lhist(n, min, max, step) Linear hist.
● count() Count events
● sum(n) Sum value
● min(n) Minimum value
● max(n) Maximum value
● avg(n) Average value
● stats(n) Statistics
● str(s) String
● ksym(p) Resolve kernel addr
● usym(p) Resolve user addr
● kaddr(n) Resolve kernel symbol
● uaddr(n) Resolve user symbol
● printf(fmt, ...) Print formatted
● print(@x[, top[, div]]) Print map
● delete(@x) Delete map element
● clear(@x) Delete all keys/values
● reg(n) Register lookup
● join(a) Join string array
● time(fmt) Print formatted time
● system(fmt) Run shell command
● cat(file) Print file contents
● exit() Quit bpftrace
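To make the difference between hist() and lhist() concrete, the sketch below computes which bucket a value would fall into under each scheme. It is a Python illustration of power-of-2 and linear bucketing, not bpftrace's implementation; the function names are hypothetical, the label format only approximates bpftrace output (which abbreviates large bounds such as 1K), and negative values are left out of the log2 case.

```python
def hist_bucket(n):
    """Label of the power-of-2 bucket holding n, for n >= 0."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    if n < 2:
        return f"[{n}]"  # 0 and 1 each get their own bucket
    low = 1 << (n.bit_length() - 1)  # largest power of 2 that is <= n
    return f"[{low}, {low * 2})"


def lhist_bucket(n, lo, hi, step):
    """Label of the linear bucket holding n, for buckets of width step in [lo, hi)."""
    if n < lo:
        return f"(..., {lo})"
    if n >= hi:
        return f"[{hi}, ...)"
    start = lo + ((n - lo) // step) * step
    return f"[{start}, {min(start + step, hi)})"


if __name__ == "__main__":
    for value in (0, 1, 3, 4, 1500):
        print(value, hist_bucket(value))
    print(250, lhist_bucket(250, 0, 2000, 200))
```

A log2 histogram covers a wide range of values in few buckets, which suits latency data; a linear histogram gives even resolution over a known range.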
Variable Types
● Basic Variables
  – @global
  – @thread_local[tid]
  – $scratch
● Associative Arrays
  – @array[key] = value
● Builtins
  – pid
  – ...
Builtin Variables
● pid Process ID (kernel tgid)
● tid Thread ID (kernel pid)
● cgroup Current Cgroup ID
● uid User ID
● gid Group ID
● nsecs Nanosecond timestamp
● cpu Processor ID
● comm Process name
● kstack Kernel stack trace
● ustack User stack trace
● arg0, arg1, … Function args
● retval Return value
● args Tracepoint args
● func Function name
● probe Full probe name
● curtask Curr task_struct (u64)
● rand Random number (u32)
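The pid and tid entries above note that the user-visible process ID is the kernel's tgid and the thread ID is the kernel's pid. As a sketch of that naming, the Python below splits a 64-bit value laid out the way the kernel helper bpf_get_current_pid_tgid() documents it (tgid in the upper 32 bits, kernel pid in the lower 32 bits). The helper is not mentioned in the slides and is brought in here as an assumption about where these builtins come from; `split_pid_tgid` is a hypothetical name.

```python
def split_pid_tgid(pid_tgid):
    """Split a 64-bit (tgid << 32 | pid) value into bpftrace-style pid and tid."""
    kernel_tgid = pid_tgid >> 32          # what user space calls the process ID
    kernel_pid = pid_tgid & 0xFFFFFFFF    # what user space calls the thread ID
    return {"pid": kernel_tgid, "tid": kernel_pid}


if __name__ == "__main__":
    # Hypothetical example: process 8745 with a thread whose ID is 8750.
    print(split_pid_tgid((8745 << 32) | 8750))
```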
bpftrace: BPF observability front-end
Linux 4.9+, https://github.com/iovisor/bpftrace
# Files opened by process
bpftrace -e 't:syscalls:sys_enter_open { printf("%s %s\n", comm,
str(args->filename)) }'
# Read size distribution by process
bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret) }'
# Count VFS calls
bpftrace -e 'kprobe:vfs_* { @[func]++ }'
# Show vfs_read latency as a histogram
bpftrace -e 'k:vfs_read { @[tid] = nsecs }
kr:vfs_read /@[tid]/ { @ns = hist(nsecs - @[tid]); delete(@[tid]) }'
# Trace user-level function
bpftrace -e 'uretprobe:bash:readline { printf("%s\n", str(retval)) }'
...
Example: bpftrace biolatency
# biolatency.bt
Attaching 3 probes...
Tracing block device I/O... Hit Ctrl-C to end.
^C
@usecs:
[256, 512) 2 | |
[512, 1K) 10 |@ |
[1K, 2K) 426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K) 230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4K, 8K) 9 |@ |
[8K, 16K) 128 |@@@@@@@@@@@@@@@ |
[16K, 32K) 68 |@@@@@@@@ |
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 10 |@ |
[...]
Disk I/O latency histograms, per second
#!/usr/local/bin/bpftrace
BEGIN
{
	printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}
kprobe:blk_account_io_start
{
	@start[arg0] = nsecs;
}
kprobe:blk_account_io_done
/@start[arg0]/
{
	@usecs = hist((nsecs - @start[arg0]) / 1000);
	delete(@start[arg0]);
}
Example: bpftrace biolatency
Implemented in <20 lines of bpftrace
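The pattern in the bpftrace program (store a timestamp keyed by request on start, compute the delta on completion, add it to a histogram, delete the key) can be sketched in ordinary Python. This is an illustration of the logic only: it runs on a hypothetical list of (kind, request_id, nsecs) events rather than on kernel probes, and `io_latency_histogram` is a made-up name.

```python
from collections import Counter


def io_latency_histogram(events):
    """Build a log2 histogram of latencies in microseconds.

    events: iterable of (kind, request_id, nsecs), where kind is "start" or "done".
    Returns a dict mapping each bucket's lower bound (usecs) to its count.
    """
    start = {}         # plays the role of the @start map
    usecs = Counter()  # plays the role of the @usecs histogram
    for kind, request_id, nsecs in events:
        if kind == "start":
            start[request_id] = nsecs
        elif kind == "done" and request_id in start:  # like the /@start[arg0]/ filter
            # pop() reads the timestamp and deletes the key in one step.
            delta_us = (nsecs - start.pop(request_id)) // 1000
            low = 0 if delta_us == 0 else 1 << (delta_us.bit_length() - 1)
            usecs[low] += 1
    return dict(sorted(usecs.items()))


if __name__ == "__main__":
    sample = [
        ("start", 1, 1_000),
        ("start", 2, 2_000),
        ("done", 1, 1_501_000),  # 1500 us
        ("done", 2, 2_302_000),  # 2300 us
        ("done", 3, 5_000),      # no matching start, ignored
    ]
    print(io_latency_histogram(sample))
```

The filter matters because tracing can begin while I/O is already in flight; a completion with no recorded start would otherwise produce a bogus latency.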
Netflix Vector (old)
Grafana at Netflix
Takeaways
Add BCC & bpftrace packages to your servers
Start using BPF perf tools directly or via GUIs
Identify 1+ engineer at your company to develop tools & GUIs
From: BPF Performance Tools: Linux System and Application Observability, Brendan Gregg, Addison Wesley 2019
Thanks & URLs
BPF: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Linus Torvalds, BPF community
BCC: Brenden Blanco, Yonghong Song, Sasha Goldshtein, BCC community
bpftrace: Alastair Robertson, Matheus Marchini, Dan Xu, bpftrace community
https://github.com/iovisor/bcc
https://github.com/iovisor/bpftrace
https://github.com/brendangregg/bpf-perf-tools-book
http://www.brendangregg.com/ebpf.html
http://www.brendangregg.com/bpf-performance-tools-book.html
All diagrams and photos (slides 11 & 22) are my own; slide 12 is from KernelRecipes: https://www.youtube.com/watch?v=bbHFg9IsTk8
Thank you!
Brendan Gregg
@brendangregg
bgregg@netflix.com
Please complete the session survey in the mobile app.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Namespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersNamespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersKernel TLV
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracingViller Hsiao
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsBrendan Gregg
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingViller Hsiao
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsBrendan Gregg
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingAnalyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingScyllaDB
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumScyllaDB
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 InstancesBrendan Gregg
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisPaul V. Novarese
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 

Was ist angesagt? (20)

Namespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersNamespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containers
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame Graphs
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingAnalyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and Analysis
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 

Ähnlich wie BPF Performance Analysis at Netflix

Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Anne Nicolas
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFBrendan Gregg
 
USENIX ATC 2017 Performance Superpowers with Enhanced BPF
USENIX ATC 2017 Performance Superpowers with Enhanced BPFUSENIX ATC 2017 Performance Superpowers with Enhanced BPF
USENIX ATC 2017 Performance Superpowers with Enhanced BPFBrendan Gregg
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPFIvan Babrou
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
bcc/BPF tools - Strategy, current tools, future challenges
bcc/BPF tools - Strategy, current tools, future challengesbcc/BPF tools - Strategy, current tools, future challenges
bcc/BPF tools - Strategy, current tools, future challengesIO Visor Project
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019Brendan Gregg
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPFAlex Maestretti
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part IIIAlkin Tezuysal
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and ArchitectureSidney Chen
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoringIben Rodriguez
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 

Ähnlich wie BPF Performance Analysis at Netflix (20)

Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
USENIX ATC 2017 Performance Superpowers with Enhanced BPF
USENIX ATC 2017 Performance Superpowers with Enhanced BPFUSENIX ATC 2017 Performance Superpowers with Enhanced BPF
USENIX ATC 2017 Performance Superpowers with Enhanced BPF
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
bcc/BPF tools - Strategy, current tools, future challenges
bcc/BPF tools - Strategy, current tools, future challengesbcc/BPF tools - Strategy, current tools, future challenges
bcc/BPF tools - Strategy, current tools, future challenges
 
BPF Tools 2017
BPF Tools 2017BPF Tools 2017
BPF Tools 2017
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019
 
test
testtest
test
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part III
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoring
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
4 Sessions
4 Sessions4 Sessions
4 Sessions
 
C&C Botnet Factory
C&C Botnet FactoryC&C Botnet Factory
C&C Botnet Factory
 

Mehr von Brendan Gregg

UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsBrendan Gregg
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixBrendan Gregg
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityBrendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance AnalysisBrendan Gregg
 
EuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesBrendan Gregg
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 

BPF Performance Analysis at Netflix

  • 2. BPF Performance Analysis at Netflix (OPN303). Brendan Gregg, Senior Performance Architect, Netflix. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Superpowers Demo
  • 4. Agenda: Why BPF Is Changing Linux; BPF Internals; Performance Analysis; Tool Development
  • 5. Why BPF Is Changing Linux
  • 8. Modern Linux: a new OS model. Diagram labels: User-mode Applications, System Calls, Kernel, Hardware, Kernel-mode Applications (BPF), BPF Helper Calls.
  • 9. 50 years, one process state model. Diagram states: On-CPU (User, Kernel) and Off-CPU (Runnable, Swapping, Wait, Block, Sleep, Idle). Transition labels: schedule, resource I/O, acquire lock, sleep, wait for work, wakeup, acquired, wakeup, work arrives, preemption or time quantum expired, swap out, swap in. Note: Linux groups most sleep states.
  • 10. BPF uses a new program state model. Diagram states: Loaded, Enabled, Off-CPU, On-CPU (BPF, Kernel helpers, Spinning). Transition labels: attach, event fires, program ended, spin lock.
  • 12. Kernel Recipes 2019, Alexei Starovoitov ~40 active BPF programs on every Facebook server
  • 13. >150K Amazon EC2 server instances; ~34% US Internet traffic at night; >130M subscribers; ~14 active BPF programs on every instance (so far)
  • 14. Modern Linux: Event-based Applications. Diagram labels: User-mode Applications, Kernel-mode Applications (BPF), Kernel, Scheduler, Kernel Events, U.E., Hardware Events (incl. clock).
  • 15. Modern Linux is becoming microkernel-ish. Diagram labels: User-mode Applications, Kernel-mode Services & Drivers (BPF, BPF, BPF), Smaller Kernel, Hardware. The word “microkernel” has already been invoked by Jonathan Corbet, Thomas Graf, Greg Kroah-Hartman, ...
  • 16. BPF Internals
  • 17. BPF 1992: Berkeley Packet Filter. A limited virtual machine for efficient packet filters.
        # tcpdump -d host 127.0.0.1 and port 80
        (000) ldh      [12]
        (001) jeq      #0x800           jt 2    jf 18
        (002) ld       [26]
        (003) jeq      #0x7f000001      jt 6    jf 4
        (004) ld       [30]
        (005) jeq      #0x7f000001      jt 6    jf 18
        (006) ldb      [23]
        (007) jeq      #0x84            jt 10   jf 8
        (008) jeq      #0x6             jt 10   jf 9
        (009) jeq      #0x11            jt 10   jf 18
        (010) ldh      [20]
        (011) jset     #0x1fff          jt 18   jf 12
        (012) ldxb     4*([14]&0xf)
        (013) ldh      [x + 14]
        (014) jeq      #0x50            jt 17   jf 15
        (015) ldh      [x + 16]
        (016) jeq      #0x50            jt 17   jf 18
        (017) ret      #262144
        (018) ret      #0
  • 18. BPF 2019: aka extended BPF. bpftrace, BPF Microconference, XDP, bpfconf & Facebook Katran, Google KRSI, Netflix flowsrus, and many more
  • 19. BPF 2019. Diagram labels: User-Defined BPF Programs (SDN Configuration, DDoS Mitigation, Intrusion Detection, Container Security, Observability, Firewalls, Device Drivers, …); Kernel Runtime (BPF verifier, BPF actions); Event Targets (sockets, kprobes, uprobes, tracepoints, perf_events).
  • 20. BPF is open source and in the Linux kernel (you’re all getting it). BPF is also now a technology name, and no longer an acronym.
  • 21. BPF Internals. Diagram labels: BPF Instructions, Verifier, Interpreter, JIT Compiler, Machine Code Execution, 11 Registers, Map Storage (Mbytes), BPF Helpers, BPF Context, Events, Rest of Kernel.
  • 22. Is BPF Turing complete?
  • 23. BPF: a new type of software
        Execution model | User-defined | Compile    | Security      | Failure mode  | Resource access
        User task       | yes          | any        | user-based    | abort         | syscall, fault
        Kernel task     | no           | static     | none          | panic         | direct
        BPF event       | yes          | JIT, CO-RE | verified, JIT | error message | restricted helpers
  • 24. Performance Analysis
  • 25. BPF enables a new class of custom, efficient, and production-safe performance analysis tools
  • 27. Tool Examples by Subsystem 1. CPUs (scheduling) 2. Memory 3. Disks 4. File Systems 5. Networking 6. Languages 7. Applications 8. Kernel 9. Hypervisors 10. Containers
  • 28. Tool Extensions & Sources. .py: BCC (Python); .bt: bpftrace (some tools exist for both). https://github.com/iovisor/bcc https://github.com/iovisor/bpftrace https://github.com/brendangregg/bpf-perf-tools-book
  • 29. CPUs: execsnoop. New process trace.
        # execsnoop.py -T
        TIME(s) PCOMM    PID    PPID   RET ARGS
        0.506   run      8745   1828     0 ./run
        0.507   bash     8745   1828     0 /bin/bash
        0.511   svstat   8747   8746     0 /command/svstat /service/nflx-httpd
        0.511   perl     8748   8746     0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
        0.514   ps       8750   8749     0 /bin/ps --ppid 1 -o pid,cmd,args
        0.514   grep     8751   8749     0 /bin/grep org.apache.catalina
        0.514   sed      8752   8749     0 /bin/sed s/^ *//;
        0.515   xargs    8754   8749     0 /usr/bin/xargs
        0.515   cut      8753   8749     0 /usr/bin/cut -d -f 1
        0.523   echo     8755   8754     0 /bin/echo
        0.524   mkdir    8756   8745     0 /bin/mkdir -v -p /data/tomcat
        [...]
        1.528   run      8785   1828     0 ./run
        1.529   bash     8785   1828     0 /bin/bash
        1.533   svstat   8787   8786     0 /command/svstat /service/nflx-httpd
        1.533   perl     8788   8786     0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
        [...]
  • 30. CPUs: runqlat. Scheduler latency (run queue latency).
        # runqlat.py 10 1
        Tracing run queue latency... Hit Ctrl-C to end.
        usecs          : count   distribution
        0 -> 1         : 1906    |*** |
        2 -> 3         : 22087   |****************************************|
        4 -> 7         : 21245   |************************************** |
        8 -> 15        : 7333    |************* |
        16 -> 31       : 4902    |******** |
        32 -> 63       : 6002    |********** |
        64 -> 127      : 7370    |************* |
        128 -> 255     : 13001   |*********************** |
        256 -> 511     : 4823    |******** |
        512 -> 1023    : 1519    |** |
        1024 -> 2047   : 3682    |****** |
        2048 -> 4095   : 3170    |***** |
        4096 -> 8191   : 5759    |********** |
        8192 -> 16383  : 14549   |************************** |
        16384 -> 32767 : 5589    |********** |
  • 31. CPUs: runqlen. Run queue length.
        # runqlen.py 10 1
        Sampling run queue length... Hit Ctrl-C to end.
        runqlen : count   distribution
        0       : 47284   |****************************************|
        1       : 211     | |
        2       : 28      | |
        3       : 6       | |
        4       : 4       | |
        5       : 1       | |
        6       : 1       | |
  • 32. Memory: ffaults (book). Page faults by filename.
        # ffaults.bt
        Attaching 1 probe...
        ^C
        [...]
        @[dpkg]: 18
        @[sudoers.so]: 19
        @[ld.so.cache]: 27
        @[libpthread-2.27.so]: 29
        @[ld-2.27.so]: 32
        @[locale-archive]: 34
        @[system.journal]: 39
        @[libstdc++.so.6.0.25]: 43
        @[libapt-pkg.so.5.0.2]: 47
        @[BrowserMetrics-5D8A6422-77F1.pma]: 86
        @[libc-2.27.so]: 168
        @[i915]: 409
        @[pkgcache.bin]: 860
        @[]: 25038
  • 33. Disks: biolatency. Disk I/O latency histograms, per second.
        # biolatency.py -mT 1 5
        Tracing block device I/O... Hit Ctrl-C to end.
        06:20:16
        msecs     : count   distribution
        0 -> 1    : 36      |**************************************|
        2 -> 3    : 1       |* |
        4 -> 7    : 3       |*** |
        8 -> 15   : 17      |***************** |
        16 -> 31  : 33      |********************************** |
        32 -> 63  : 7       |******* |
        64 -> 127 : 6       |****** |
        06:20:17
        msecs     : count   distribution
        0 -> 1    : 96      |************************************ |
        2 -> 3    : 25      |********* |
        4 -> 7    : 29      |*********** |
        [...]
  • 34. File Systems: xfsslower. XFS I/O slower than a threshold (variants for ext4, btrfs, zfs).
        # xfsslower.py 50
        Tracing XFS operations slower than 50 ms
        TIME     COMM   PID    T BYTES OFF_KB  LAT(ms) FILENAME
        21:20:46 java   112789 R 8012  13925     60.16 file.out
        21:20:47 java   112789 R 3571  4268     136.60 file.out
        21:20:49 java   112789 R 5152  1780      63.88 file.out
        21:20:52 java   112789 R 5214  12434    108.47 file.out
        21:20:52 java   112789 R 7465  19379     58.09 file.out
        21:20:54 java   112789 R 5326  12311     89.14 file.out
        21:20:55 java   112789 R 4336  3051      67.89 file.out
        [...]
        22:02:39 java   112789 R 65536 1486748  182.10 shuffle_6_646_0.data
        22:02:39 java   112789 R 65536 872492    30.10 shuffle_6_646_0.data
        22:02:39 java   112789 R 65536 1113896  309.52 shuffle_6_646_0.data
        22:02:39 java   112789 R 65536 1481020  400.31 shuffle_6_646_0.data
        22:02:39 java   112789 R 65536 1415232  324.92 shuffle_6_646_0.data
        22:02:39 java   112789 R 65536 1147912  119.37 shuffle_6_646_0.data
        [...]
  • 35. File Systems: xfsdist. XFS I/O latency histograms, by operation.
        # xfsdist.py 60
        Tracing XFS operation latency... Hit Ctrl-C to end.
        22:41:24:
        operation = 'read'
        usecs    : count    distribution
        0 -> 1   : 382130   |****************************************|
        2 -> 3   : 85717    |******** |
        4 -> 7   : 23639    |** |
        8 -> 15  : 5668     | |
        16 -> 31 : 3594     | |
        32 -> 63 : 21387    |** |
        [...]
        operation = 'write'
        usecs    : count    distribution
        0 -> 1   : 12925    |***** |
        2 -> 3   : 83375    |************************************* |
        [...]
  • 36. Networking: tcplife. TCP session lifespans with connection details.
        # tcplife.py
        PID   COMM       LADDR        LPORT RADDR         RPORT TX_KB RX_KB MS
        22597 recordProg 127.0.0.1    46644 127.0.0.1     28527     0     0 0.23
        3277  redis-serv 127.0.0.1    28527 127.0.0.1     46644     0     0 0.28
        22598 curl       100.66.3.172 61620 52.205.89.26  80        0     1 91.79
        22604 curl       100.66.3.172 44400 52.204.43.121 80        0     1 121.38
        22624 recordProg 127.0.0.1    46648 127.0.0.1     28527     0     0 0.22
        3277  redis-serv 127.0.0.1    28527 127.0.0.1     46648     0     0 0.27
        22647 recordProg 127.0.0.1    46650 127.0.0.1     28527     0     0 0.21
        3277  redis-serv 127.0.0.1    28527 127.0.0.1     46650     0     0 0.26
        [...]
  • 37. Networking: tcpsynbl (book). TCP SYN backlogs as histograms.
        # tcpsynbl.bt
        Attaching 4 probes...
        Tracing SYN backlog size. Ctrl-C to end.
        ^C
        @backlog[backlog limit]: histogram of backlog size
        @backlog[128]:
        [0]       2    |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        @backlog[500]:
        [0]       2783 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [1]       9    | |
        [2, 4)    4    | |
        [4, 8)    1    | |
  • 38. Languages: funccount. Count native function calls (C, C++, Go, etc.).
        # funccount.py 'tcp_s*'
        Tracing 50 functions for "tcp_s*"... Hit Ctrl-C to end.
        ^C
        FUNC                           COUNT
        [...]
        tcp_setsockopt                 1839
        tcp_shutdown                   2690
        tcp_sndbuf_expand              2862
        tcp_send_delayed_ack           9457
        tcp_set_state                  10425
        tcp_sync_mss                   12529
        tcp_sendmsg_locked             41012
        tcp_sendmsg                    41236
        tcp_send_mss                   42686
        tcp_small_queue_check.isra.29  45724
        tcp_schedule_loss_probe        64067
        tcp_send_ack                   66945
        tcp_stream_memory_free         178616
        Detaching...
  • 39. Applications: mysqld_qslower. MySQL queries slower than a threshold.
        # mysqld_qslower.py $(pgrep mysqld)
        Tracing MySQL server queries for PID 9908 slower than 1 ms...
        TIME(s)   PID  MS      QUERY
        0.000000  9962 169.032 SELECT * FROM words WHERE word REGEXP '^bre.*n$'
        1.962227  9962 205.787 SELECT * FROM words WHERE word REGEXP '^bpf.tools$'
        9.043242  9962 95.276  SELECT COUNT(*) FROM words
        23.723025 9962 186.680 SELECT count(*) AS count FROM words WHERE word REGEXP '^bre.*n$'
        30.343233 9962 181.494 SELECT * FROM words WHERE word REGEXP '^bre.*n$' ORDER BY word
        [...]
  • 40. Kernel: workq (book). Work queue function execution times.
        # workq.bt
        Attaching 4 probes...
        Tracing workqueue request latencies. Ctrl-C to end.
        ^C
        @us[blk_mq_timeout_work]:
        [1]       1  |@@ |
        [2, 4)    11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [4, 8)    18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        @us[xfs_end_io]:
        [1]       2  |@@@@@@@@ |
        [2, 4)    6  |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [4, 8)    6  |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [8, 16)   12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [16, 32)  12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [32, 64)  3  |@@@@@@@@@@@@@ |
        [...]
  • 41. Hypervisor: xenhyper (book). Count hypercalls from Xen PV guests.
        # xenhyper.bt
        Attaching 1 probe...
        ^C
        @[mmu_update]: 44
        @[update_va_mapping]: 78
        @[mmuext_op]: 6473
        @[stack_switch]: 23445
  • 42. Containers: blkthrot (book). Count block I/O throttles by blk cgroup.
        # blkthrot.bt
        Attaching 3 probes...
        Tracing block I/O throttles by cgroup. Ctrl-C to end
        ^C
        @notthrottled[1]: 506
        @throttled[1]: 31
  • 43. That was only 14 out of 150+ tools. All are open source. Not all 150+ tools shown here.
  • 44. Coping with so many BPF tools at Netflix
        ● On Netflix servers, /apps/nflx-bpf-alltools has all the tools
            – BCC, bpftrace, my book, Netflix internal
            – Open source at: https://github.com/Netflix-Skunkworks/bpftoolkit
        ● Latest tools are fetched & put in a hierarchy: cpu, disk, …
        ● We are also building GUIs to front these tools
  • 45. Tool Development
  • 46. Only one engineer at your company needs to learn tool development. They can turn everyone’s ideas into tools.
  • 47. The Tracing Landscape, Dec 2019 (my opinion). Chart axes: Scope & Capability; Ease of use, from (brutal) to (less brutal); Stage of Development, from (alpha) to (mature). Tools plotted: sysdig, perf, ftrace, C/BPF, stap, bcc/BPF, ply/BPF, Raw BPF, LTTng, bpftrace. Annotations: (hist triggers, synthetic events), recent changes, (many), (eBPF), (0.9.3).
  • 48. bcc/BPF (C & Python) bcc examples/tracing/bitehist.py entire program
  • 50. bpftrace Syntax
        bpftrace -e 'k:do_nanosleep /pid > 100/ { @[comm]++ }'
        Probe: k:do_nanosleep
        Filter (optional): /pid > 100/
        Action: { @[comm]++ }
  • 51. Probe Type Shortcuts
        Type       | Shortcut | Description
        tracepoint | t        | Kernel static tracepoints
        usdt       | U        | User-level statically defined tracing
        kprobe     | k        | Kernel function tracing
        kretprobe  | kr       | Kernel function returns
        uprobe     | u        | User-level function tracing
        uretprobe  | ur       | User-level function returns
        profile    | p        | Timed sampling across all CPUs
        interval   | i        | Interval output
        software   | s        | Kernel software events
        hardware   | h        | Processor hardware events
  • 52. Filters
        ● /pid == 181/
        ● /comm != "sshd"/
        ● /@ts[tid]/
  • 53. Actions
        ● Per-event output
            – printf()
            – system()
            – join()
            – time()
        ● Map summaries
            – @ = count() or @++
            – @ = hist()
            – …
        The following is in the reference guide: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
  • 54. Functions
        ● hist(n): Log2 histogram
        ● lhist(n, min, max, step): Linear hist.
        ● count(): Count events
        ● sum(n): Sum value
        ● min(n): Minimum value
        ● max(n): Maximum value
        ● avg(n): Average value
        ● stats(n): Statistics
        ● str(s): String
        ● ksym(p): Resolve kernel addr
        ● usym(p): Resolve user addr
        ● kaddr(n): Resolve kernel symbol
        ● uaddr(n): Resolve user symbol
        ● printf(fmt, ...): Print formatted
        ● print(@x[, top[, div]]): Print map
        ● delete(@x): Delete map element
        ● clear(@x): Delete all keys/values
        ● reg(n): Register lookup
        ● join(a): Join string array
        ● time(fmt): Print formatted time
        ● system(fmt): Run shell command
        ● cat(file): Print file contents
        ● exit(): Quit bpftrace
  • 55. Variable Types
        ● Basic Variables
            – @global
            – @thread_local[tid]
            – $scratch
        ● Associative Arrays
            – @array[key] = value
        ● Builtins
            – pid
            – ...
  • 56. Builtin Variables
        ● pid: Process ID (kernel tgid)
        ● tid: Thread ID (kernel pid)
        ● cgroup: Current Cgroup ID
        ● uid: User ID
        ● gid: Group ID
        ● nsecs: Nanosecond timestamp
        ● cpu: Processor ID
        ● comm: Process name
        ● kstack: Kernel stack trace
        ● ustack: User stack trace
        ● arg0, arg1, …: Function args
        ● retval: Return value
        ● args: Tracepoint args
        ● func: Function name
        ● probe: Full probe name
        ● curtask: Curr task_struct (u64)
        ● rand: Random number (u32)
  • 57. bpftrace: BPF observability front-end. Linux 4.9+, https://github.com/iovisor/bpftrace
        # Files opened by process
        bpftrace -e 't:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)) }'
        # Read size distribution by process
        bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret) }'
        # Count VFS calls
        bpftrace -e 'kprobe:vfs_* { @[func]++ }'
        # Show vfs_read latency as a histogram
        bpftrace -e 'k:vfs_read { @[tid] = nsecs } kr:vfs_read /@[tid]/ { @ns = hist(nsecs - @[tid]); delete(@[tid]) }'
        # Trace user-level function
        bpftrace -e 'uretprobe:bash:readline { printf("%s\n", str(retval)) }'
        ...
  • 58. Example: bpftrace biolatency. Disk I/O latency histograms, per second.
        # biolatency.bt
        Attaching 3 probes...
        Tracing block device I/O... Hit Ctrl-C to end.
        ^C
        @usecs:
        [256, 512)     2   | |
        [512, 1K)      10  |@ |
        [1K, 2K)       426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [2K, 4K)       230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [4K, 8K)       9   |@ |
        [8K, 16K)      128 |@@@@@@@@@@@@@@@ |
        [16K, 32K)     68  |@@@@@@@@ |
        [32K, 64K)     0   | |
        [64K, 128K)    0   | |
        [128K, 256K)   10  |@ |
        [...]
  • 59. Example: bpftrace biolatency. Implemented in <20 lines of bpftrace.
        #!/usr/local/bin/bpftrace
        BEGIN
        {
            printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
        }
        kprobe:blk_account_io_start
        {
            @start[arg0] = nsecs;
        }
        kprobe:blk_account_io_done
        /@start[arg0]/
        {
            @usecs = hist((nsecs - @start[arg0]) / 1000);
            delete(@start[arg0]);
        }
  • 62. Takeaways
        ● Add BCC & bpftrace packages to your servers
        ● Start using BPF perf tools directly or via GUIs
        ● Identify 1+ engineer at your company to develop tools & GUIs
        From: BPF Performance Tools: Linux System and Application Observability, Brendan Gregg, Addison Wesley 2019
  • 63. Thanks & URLs
        BPF: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Linus Torvalds, BPF community
        BCC: Brenden Blanco, Yonghong Song, Sasha Goldshtein, BCC community
        bpftrace: Alastair Robertson, Matheus Marchini, Dan Xu, bpftrace community
        https://github.com/iovisor/bcc
        https://github.com/iovisor/bpftrace
        https://github.com/brendangregg/bpf-perf-tools-book
        http://www.brendangregg.com/ebpf.html
        http://www.brendangregg.com/bpf-performance-tools-book.html
        All diagrams and photos (slides 11 & 22) are my own; slide 12 is from KernelRecipes: https://www.youtube.com/watch?v=bbHFg9IsTk8
  • 64. Thank you! Brendan Gregg, @brendangregg, bgregg@netflix.com
  • 65. Please complete the session survey in the mobile app.
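
Several of the tools in this deck print power-of-two histograms, and slide 54 lists hist(n) as a "Log2 histogram". As a rough illustration of that bucketing idea, here is a minimal Python sketch. It is not the bpftrace or BCC implementation; the function names (log2_bucket, log2_hist, render) and the sample latencies are hypothetical and chosen only for this example.

```python
from collections import Counter


def log2_bucket(value):
    """Return the [low, high) power-of-two range that contains value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return (low, low * 2)


def log2_hist(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


def render(hist, width=40):
    """Format buckets as text rows, with bars scaled to the largest count."""
    peak = max(hist.values())
    rows = []
    for low, high in sorted(hist):
        count = hist[(low, high)]
        bar = "@" * (count * width // peak)
        rows.append(f"[{low}, {high}) {count:>6} |{bar:<{width}}|")
    return rows


if __name__ == "__main__":
    # Hypothetical latencies in microseconds, not data from the talk.
    sample_usecs = [300, 700, 1500, 1900, 2500, 3000, 9000]
    for row in render(log2_hist(sample_usecs)):
        print(row)
```

Storing only a counter per power-of-two bucket keeps the summary small no matter how many events are recorded, which is one reason this style of histogram suits in-kernel aggregation.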