SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
Receive side scaling (RSS) with
eBPF in QEMU and virtio-net
Yan Vugenfirer - CEO, Daynix
Agenda
• What is RSS?


• History: RSS and virtio-net


• What is eBPF?


• Using eBPF for packet steering (RSS) in
virtio-net
What is RSS?
• Receive side scaling - distribution of packets’ processing
among CPUs


• A NIC uses a hashing function to compute a hash value
over a defined area


• Hash value is used to index an indirection table


• The values in the indirection table are used to assign the
received data to a CPU


• With MSI support, a NIC can also interrupt the associated
CPU
CPU1 CPU2 CPU3
CPU0
What is RSS?
NIC
RX	handling RX	handling RX	handling RX	handling
NIC	Driver
HW	
queue
HW	
queue
HW	
queue
HW	
queue
Packet	classification
What is RSS?
Packet	
header
Hash	
function	
(Toeplitz)
QueueCPU	
number
Redirection	table	(up	to	128	entries)
History: RSS and virtio-net
• Let’s use RSS with virtio-net!


• Utilisation CPUs for packet processing


• Cash locality for network applications


• Microsoft WHQL requirement for high speed
devices
History: RSS and virtio-net
• No multi-queue in virtio


• SW implementation in Windows guest driver (similar to RFS in Linux)
CPU1 CPU2 CPU3
CPU0
virtio-net
RX	
handling
RX	
handling
RX	
handling
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio-net became multi queue device


• Due to Windows requirements - hybrid
model. Interrupt received on specific CPU
core, but could be rescheduled to another


• Works good for TCP


• Might not work for UDP


• Support legacy interrupts for old OSes
History: RSS and virtio-net
CPU1
CPU0
virtio-net
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio spec changes


• Set steering mode


• Pass the device redirection tables


• Set hash value in virtio-net-hdr


• No inter-processor interrupts due to re-scheduling


• Vision: HW will do all the heavy work


• Implementations


• SW only POC in QEMU


• eBPF
virtio spec changes - capabilities
• VIRTIO_NET_F_RSS


• VIRTIO_NET_F_MQ must be set
virtio spec changes - device configuration
• virtio_net_config
struct virtio_net_con
fi
g
{

u8 mac[6]
;

le16 status
;

le16 max_virtqueue_pairs
;

le16 mtu
;

le32 speed
;

u8 duplex
;

u8 rss_max_key_size
;

le16 rss_max_indirection_table_length
;

le32 supported_hash_types
;

};
virtio spec changes - setting RSS parameters
• VIRTIO_NET_CTRL_MQ_RSS_CONFIG
struct virtio_net_rss_con
fi
g
{

le32 hash_types
;

le16 indirection_table_mask
;

le16 unclassi
fi
ed_queue
;

le16 indirection_table[indirection_table_length]
;

le16 max_tx_vq
;

u8 hash_key_length
;

u8 hash_key_data[hash_key_length]
;

};
virtio spec changes - virtio-net-hdr
struct virtio_net_hdr
{

u8
fl
ags
;

u8 gso_type
;

le16 hdr_len
;

le16 gso_size
;

le16 csum_start
;

le16 csum_offset
;

le16 num_buffers
;

le32 hash_value; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 hash_report; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 padding_reserved; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

};
What is eBPF?
• Enable running sandboxed code in Linux
kernel


• The code can be loaded at run time


• Used for security, tracing, networking,
observability
How can eBPF help us?
• Calculate the RSS hash and return the
queue index for incoming packets


• Populate the hash value in virtio_net_hdr
(work in progress)
The “magic”
• Loading eBPF program using IOCTL
TUNSETSTEERINGEBPF


• tun_struct has steering_prog field


• If eBPF program for steering is loaded,
tun_select_queue will call it with
bpf_prog_run_clear_cb
Hash population (work in progress)
• Population from eBPF program


• virtio_net_hdr with additional fields


• Work in progress in kernel


• Enlarge virtio_net_hdr in all kernel
modules


• Keep calculated hash in SKB and copy
it to virtio_net_hdr
eBPF program source in QEMU
• tun_rss_steering_prog


• tools/ebpf/rss.bpf.c


• Use clang to compile


• tools/ebpf/Makefile.ebpf
eBPF program skeleton
• During QEMU compilation include file is populated
with the compiled binary


• bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h


• Helpers to initialise maps (mechanism to share data
between eBPF program and kerneluserspace)


• Some changes to support libvirt - mmapping the
shared data structure to user space (3 maps in
current main branch without mmaping, 1 map in
pending patches)
Configuration map
• The configuration map is BPF array map that
contains everything required for RSS:


• Supported hash flows: IPv4, TCPv4, UDPv4,
IPv6, IPv6ex, TCPv6, UDPv6


• Indirections table size (max 128)


• Default queue


• Toeplitz hash key - 40 bytes


• Indirections table - 128 entries
Loading eBPF program
• Two mechanisms


• QEMU using function in skeleton file.
Calling bpf syscall


• eBPF helper program (with libvirt) -
QEMU gets file descriptors from libvirt
with already loaded ebpf program and
mapping of the ebpf map (patches under
review)
Loading eBPF program
• Possible load failures


• Kernel support. Current solution requires 5.8+


• Without helper


• QEMU process capabilities: CAP_BPF,
CAP_NET_ADMIN


• sysctl kernel.unprivileged_bpf_disabled=1


• libbpf not present


• In case of helper usage - mismatch between helper and
QEMU


• Stamp is a hash of skeleton include file
Fallback
• Built it QEMU RSS steering


• Can be triggered also by live migration


• Hash population is enabled in QEMU
command line, because there is still not
hash population from eBPF program
Live migration
• Known issue: migrating to old kernel


• eBPF load failure


• Fallback to in-QEMU RSS steering
QEMU command line
• Multi-queue should be enabled


• -smp with vCPU for each queue-pair


• -device virtio-net-pci,
rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
QEMU command line
• rss=on


• Try to load eBPF from skeleton or by using
provided file descriptors


• Fallback to “built-in” RSS steering in QEMU if
cannot load eBPF program


• hash=on


• Populate hash in virtio_net_hdr


• ebpf_rss_fds - optional, provide file descriptors for
eBPF program and map
libvirt integration
• QEMU should run with least possible privileges


• eBPF helper


• Stamping the helper during compilation time


• Redirection table mapping


• Additional command line options to provide file
descriptors to QEMU


• Patches under review
Current status
• Initial support was merged to QEMU


• libvirt integration patches in QEMU and
libvirt are under discussion on mailing lists


• Hash population by eBPF program -
pending additional work for next set of
patches
Pending patches
• QEMU libvirt integration: https://lists.nongnu.org/
archive/html/qemu-devel/2021-07/msg03535.html


• libvirt patches: https://listman.redhat.com/archives/
libvir-list/2021-July/msg00836.html


• RSS support in Linux virtio-net driver: https://
lists.linuxfoundation.org/pipermail/virtualization/
2021-August/055940.html


• In kernel hash calculation reporting to guest driver:
https://lkml.org/lkml/2021/1/12/1329
virtio-net and eBPF future
• Packet filtering with vhost


• Security?
yan@daynix.com
Links
• https://www.kernel.org/doc/Documentation/networking/scaling.txt


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/
introduction-to-receive-side-scaling


• https://ebpf.io


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-a-single-hardware-receive-queue


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-hardware-queuing


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-message-signaled-interrupts


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
hashing-functions

Weitere ähnliche Inhalte

Was ist angesagt?

VXLAN BGP EVPN: Technology Building Blocks
VXLAN BGP EVPN: Technology Building BlocksVXLAN BGP EVPN: Technology Building Blocks
VXLAN BGP EVPN: Technology Building BlocksAPNIC
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)Kentaro Ebisawa
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabMichelle Holley
 
XDP in Practice: DDoS Mitigation @Cloudflare
XDP in Practice: DDoS Mitigation @CloudflareXDP in Practice: DDoS Mitigation @Cloudflare
XDP in Practice: DDoS Mitigation @CloudflareC4Media
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USASegment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USAJose Liste
 
"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越Kentaro Ebisawa
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPThomas Graf
 
Poll mode driver integration into dpdk
Poll mode driver integration into dpdkPoll mode driver integration into dpdk
Poll mode driver integration into dpdkVipin Varghese
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsHisaki Ohara
 
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking ShapeBlue
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
Cisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceCisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceBertrand Duvivier
 

Was ist angesagt? (20)

Deploying IPv6 on OpenStack
Deploying IPv6 on OpenStackDeploying IPv6 on OpenStack
Deploying IPv6 on OpenStack
 
VXLAN BGP EVPN: Technology Building Blocks
VXLAN BGP EVPN: Technology Building BlocksVXLAN BGP EVPN: Technology Building Blocks
VXLAN BGP EVPN: Technology Building Blocks
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
 
SCTP Tutorial
SCTP TutorialSCTP Tutorial
SCTP Tutorial
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
XDP in Practice: DDoS Mitigation @Cloudflare
XDP in Practice: DDoS Mitigation @CloudflareXDP in Practice: DDoS Mitigation @Cloudflare
XDP in Practice: DDoS Mitigation @Cloudflare
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USASegment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
 
"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
 
Poll mode driver integration into dpdk
Poll mode driver integration into dpdkPoll mode driver integration into dpdk
Poll mode driver integration into dpdk
 
100Gbps OpenStack For Providing High-Performance NFV
100Gbps OpenStack For Providing High-Performance NFV100Gbps OpenStack For Providing High-Performance NFV
100Gbps OpenStack For Providing High-Performance NFV
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Cisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceCisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advance
 
OVS v OVS-DPDK
OVS v OVS-DPDKOVS v OVS-DPDK
OVS v OVS-DPDK
 

Ähnlich wie Receive side scaling (RSS) with eBPF in QEMU and virtio-net

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!Linaro
 
Load Balancing
Load BalancingLoad Balancing
Load Balancingoptalink
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformRedge Technologies
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology義洋 顏
 
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Bruno Castelucci
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettJim St. Leger
 
2016 NCTU P4 Workshop
2016 NCTU P4 Workshop2016 NCTU P4 Workshop
2016 NCTU P4 WorkshopYi Tseng
 

Ähnlich wie Receive side scaling (RSS) with eBPF in QEMU and virtio-net (20)

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Linux Kernel Live Patching
Linux Kernel Live PatchingLinux Kernel Live Patching
Linux Kernel Live Patching
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
15 ia64
15 ia6415 ia64
15 ia64
 
ch2.pptx
ch2.pptxch2.pptx
ch2.pptx
 
Ebpf ovsconf-2016
Ebpf ovsconf-2016Ebpf ovsconf-2016
Ebpf ovsconf-2016
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
Load Balancing
Load BalancingLoad Balancing
Load Balancing
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology
 
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
2016 NCTU P4 Workshop
2016 NCTU P4 Workshop2016 NCTU P4 Workshop
2016 NCTU P4 Workshop
 

Mehr von Yan Vugenfirer

HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...Yan Vugenfirer
 
Implementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationYan Vugenfirer
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototypingYan Vugenfirer
 
Windows network teaming
Windows network teamingWindows network teaming
Windows network teamingYan Vugenfirer
 
Rebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpRebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpYan Vugenfirer
 
Rebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyRebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyYan Vugenfirer
 
Contributing to open source using Git
Contributing to open source using GitContributing to open source using Git
Contributing to open source using GitYan Vugenfirer
 
Microsoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupMicrosoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupYan Vugenfirer
 
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Yan Vugenfirer
 
Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest optionsYan Vugenfirer
 
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...Yan Vugenfirer
 
Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Yan Vugenfirer
 

Mehr von Yan Vugenfirer (14)

HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
 
Implementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migration
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
Windows network teaming
Windows network teamingWindows network teaming
Windows network teaming
 
Rebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpRebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUp
 
Rebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyRebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday party
 
Contributing to open source using Git
Contributing to open source using GitContributing to open source using Git
Contributing to open source using Git
 
Introduction to Git
Introduction to GitIntroduction to Git
Introduction to Git
 
Microsoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupMicrosoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setup
 
UsbDk at a Glance 
UsbDk at a Glance UsbDk at a Glance 
UsbDk at a Glance 
 
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
 
Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest options
 
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
 
Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012
 

Kürzlich hochgeladen

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Kürzlich hochgeladen (20)

Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 

Receive side scaling (RSS) with eBPF in QEMU and virtio-net

  • 1. Receive side scaling (RSS) with eBPF in QEMU and virtio-net Yan Vugenfirer - CEO, Daynix
  • 2. Agenda • What is RSS? • History: RSS and virtio-net • What is eBPF? • Using eBPF for packet steering (RSS) in virtio-net
  • 3. What is RSS? • Receive side scaling - distribution of packets’ processing among CPUs • A NIC uses a hashing function to compute a hash value over a defined area • Hash value is used to index an indirection table • The values in the indirection table are used to assign the received data to a CPU • With MSI support, a NIC can also interrupt the associated CPU
  • 4. CPU1 CPU2 CPU3 CPU0 What is RSS? NIC RX handling RX handling RX handling RX handling NIC Driver HW queue HW queue HW queue HW queue Packet classification
  • 6. History: RSS and virtio-net • Let’s use RSS with virtio-net! • Utilisation CPUs for packet processing • Cash locality for network applications • Microsoft WHQL requirement for high speed devices
  • 7. History: RSS and virtio-net • No multi-queue in virtio • SW implementation in Windows guest driver (similar to RFS in Linux) CPU1 CPU2 CPU3 CPU0 virtio-net RX handling RX handling RX handling NetKVM RX virt-queue Packet classification RX handling
  • 8. History: RSS and virtio-net • virtio-net became multi queue device • Due to Windows requirements - hybrid model. Interrupt received on specific CPU core, but could be rescheduled to another • Works good for TCP • Might not work for UDP • Support legacy interrupts for old OSes
  • 9. History: RSS and virtio-net CPU1 CPU0 virtio-net NetKVM RX virt-queue Packet classification RX handling RX virt-queue Packet classification RX handling
  • 10. History: RSS and virtio-net • virtio spec changes • Set steering mode • Pass the device redirection tables • Set hash value in virtio-net-hdr • No inter-processor interrupts due to re-scheduling • Vision: HW will do all the heavy work • Implementations • SW only POC in QEMU • eBPF
  • 11. virtio spec changes - capabilities • VIRTIO_NET_F_RSS • VIRTIO_NET_F_MQ must be set
  • 12. virtio spec changes - device configuration • virtio_net_config struct virtio_net_con fi g { u8 mac[6] ; le16 status ; le16 max_virtqueue_pairs ; le16 mtu ; le32 speed ; u8 duplex ; u8 rss_max_key_size ; le16 rss_max_indirection_table_length ; le32 supported_hash_types ; };
  • 13. virtio spec changes - setting RSS parameters • VIRTIO_NET_CTRL_MQ_RSS_CONFIG struct virtio_net_rss_con fi g { le32 hash_types ; le16 indirection_table_mask ; le16 unclassi fi ed_queue ; le16 indirection_table[indirection_table_length] ; le16 max_tx_vq ; u8 hash_key_length ; u8 hash_key_data[hash_key_length] ; };
  • 14. virtio spec changes - virtio-net-hdr struct virtio_net_hdr { u8 fl ags ; u8 gso_type ; le16 hdr_len ; le16 gso_size ; le16 csum_start ; le16 csum_offset ; le16 num_buffers ; le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) };
  • 15. What is eBPF? • Enable running sandboxed code in Linux kernel • The code can be loaded at run time • Used for security, tracing, networking, observability
  • 16. How can eBPF help us? • Calculate the RSS hash and return the queue index for incoming packets • Populate the hash value in virtio_net_hdr (work in progress)
  • 17. The “magic” • Loading eBPF program using IOCTL TUNSETSTEERINGEBPF • tun_struct has steering_prog field • If eBPF program for steering is loaded, tun_select_queue will call it with bpf_prog_run_clear_cb
  • 18. Hash population (work in progress) • Population from eBPF program • virtio_net_hdr with additional fields • Work in progress in kernel • Enlarge virtio_net_hdr in all kernel modules • Keep calculated hash in SKB and copy it to virtio_net_hdr
  • 19. eBPF program source in QEMU • tun_rss_steering_prog • tools/ebpf/rss.bpf.c • Use clang to compile • tools/ebpf/Makefile.ebpf
  • 20. eBPF program skeleton • During QEMU compilation include file is populated with the compiled binary • bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h • Helpers to initialise maps (mechanism to share data between eBPF program and kerneluserspace) • Some changes to support libvirt - mmapping the shared data structure to user space (3 maps in current main branch without mmaping, 1 map in pending patches)
  • 21. Configuration map • The configuration map is BPF array map that contains everything required for RSS: • Supported hash flows: IPv4, TCPv4, UDPv4, IPv6, IPv6ex, TCPv6, UDPv6 • Indirections table size (max 128) • Default queue • Toeplitz hash key - 40 bytes • Indirections table - 128 entries
  • 22. Loading eBPF program • Two mechanisms • QEMU using function in skeleton file. Calling bpf syscall • eBPF helper program (with libvirt) - QEMU gets file descriptors from libvirt with already loaded ebpf program and mapping of the ebpf map (patches under review)
  • 23. Loading eBPF program • Possible load failures • Kernel support. Current solution requires 5.8+ • Without helper • QEMU process capabilities: CAP_BPF, CAP_NET_ADMIN • sysctl kernel.unprivileged_bpf_disabled=1 • libbpf not present • In case of helper usage - mismatch between helper and QEMU • Stamp is a hash of skeleton include file
  • 24. Fallback • Built it QEMU RSS steering • Can be triggered also by live migration • Hash population is enabled in QEMU command line, because there is still not hash population from eBPF program
  • 25. Live migration • Known issue: migrating to old kernel • eBPF load failure • Fallback to in-QEMU RSS steering
  • 26. QEMU command line • Multi-queue should be enabled • -smp with vCPU for each queue-pair • -device virtio-net-pci, rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
  • 27. QEMU command line • rss=on • Try to load eBPF from skeleton or by using provided file descriptors • Fallback to “built-in” RSS steering in QEMU if cannot load eBPF program • hash=on • Populate hash in virtio_net_hdr • ebpf_rss_fds - optional, provide file descriptors for eBPF program and map
  • 28. libvirt integration • QEMU should run with least possible privileges • eBPF helper • Stamping the helper during compilation time • Redirection table mapping • Additional command line options to provide file descriptors to QEMU • Patches under review
  • 29. Current status • Initial support was merged to QEMU • libvirt integration patches in QEMU and libvirt are under discussion on mailing lists • Hash population by eBPF program - pending additional work for next set of patches
  • 30. Pending patches • QEMU libvirt integration: https://lists.nongnu.org/ archive/html/qemu-devel/2021-07/msg03535.html • libvirt patches: https://listman.redhat.com/archives/ libvir-list/2021-July/msg00836.html • RSS support in Linux virtio-net driver: https:// lists.linuxfoundation.org/pipermail/virtualization/ 2021-August/055940.html • In kernel hash calculation reporting to guest driver: https://lkml.org/lkml/2021/1/12/1329
  • 31. virtio-net and eBPF future • Packet filtering with vhost • Security?
  • 33.
  • 34. Links • https://www.kernel.org/doc/Documentation/networking/scaling.txt • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ introduction-to-receive-side-scaling • https://ebpf.io • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-a-single-hardware-receive-queue • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-hardware-queuing • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-message-signaled-interrupts • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- hashing-functions