Re: AMD EPYC virtual network performances
On 09.07.24 10:36, Andrei Semenov wrote:
> Hello,
>
> As reported by David Morel (mail of 4 Jan 2024), our customers experience very poor virtual network performance in HVM guests on AMD EPYC platforms. After some investigation we noticed a huge performance drop (performance divided by a factor of 5) starting from Linux kernel version 5.10.88 on AMD EPYC platforms.
>
> The patch introduced in this kernel version that pinpoints the buggy behavior is:
>
> "xen/netfront: harden netfront against event channel storms"
> d31b3379179d64724d3bbfa87bd4ada94e3237de
>
> The patch basically binds the network frontend to the xen_lateeoi_chip irq_chip (instead of xen_dynamic_chip), which allows its clients to inform the chip when spurious interrupts are detected, so that the chip can introduce a delay in interrupt treatment.
>
> We tried to measure how many spurious interrupts (no work for the driver to do) are raised. We used iperf2 to benchmark the network bandwidth on an AMD EPYC 7262 (8-core):
>
> Dom0> iperf -s
> DomU> iperf -c $DOM0_IP_ADDRESS
>
> From our observations, we see approximately 1 spurious interrupt for every "useful" interrupt (frontend TX interrupts) in HVM guests. We ran the same benchmark on the same platform with PV and PVH guests, and the spurious/useful interrupt ratio was much lower: 1 to 20 (so the network performance is much better). We also ran this benchmark on an Intel platform (Intel Xeon Bronze 3106 CPU); there the spurious/useful ratio was about 1 to 30 for HVM guests.
>
> This makes us think that the buggy behavior is related to an abnormal number of spurious interrupts. The spurious/useful interrupt ratio is particularly elevated in HVM guests on AMD platforms, so virtual network bandwidth is heavily penalized: in our particular benchmark we get 1.5 Gbps instead of the 7 Gbps we see when no slowdown is introduced by the irq_chip.
>
> Has anybody else noticed this behavior? Can we do something about it?

In the guest you could raise the spurious event threshold by writing a higher number to /sys/devices/vif-0/xenbus/spurious_threshold (default is 1). There is a similar file on the backend side whose value might be worth raising as well. In both directories you can see the number of spurious events by looking at the spurious_events file (see the example commands below).

In the end the question is why so many spurious events are happening. Finding the reason might be hard, though.

Juergen
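As a concrete illustration of the knobs Juergen mentions (the threshold value 4 below is purely an example, and the exact sysfs path may differ per vif):

DomU> cat /sys/devices/vif-0/xenbus/spurious_events
DomU> echo 4 > /sys/devices/vif-0/xenbus/spurious_threshold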
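For reference, the handler pattern introduced by the hardening patch discussed above looks roughly like the sketch below. This paraphrases the pattern, not the literal upstream code; xennet_process_tx() is a hypothetical stand-in for the driver's real TX-response processing. xen_irq_lateeoi() and XEN_EOI_FLAG_SPURIOUS are declared in include/xen/events.h, and the handler must be registered with bind_evtchn_to_irqhandler_lateeoi() for the lateeoi chip to be used.

    static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
    {
            /* Assume the event is spurious until real work is found. */
            unsigned int eoiflag = XEN_EOI_FLAG_SPURIOUS;

            if (xennet_process_tx(dev_id))
                    eoiflag = 0;    /* TX responses consumed: not spurious */

            /* Signal the late EOI; the lateeoi irq_chip counts events
             * flagged as spurious and starts delaying event delivery when
             * too many occur in a row, which is the slowdown observed in
             * the benchmark above. */
            xen_irq_lateeoi(irq, eoiflag);
            return IRQ_HANDLED;
    }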