[Xen-devel] Interesting observation with network event notification and batching
Hi all,

I'm hacking on netback, trying to identify whether TLB flushes cause a heavy performance penalty on the Tx path. The hack is quite nasty (you would not want to know, trust me). Basically what it does is: 1) alter the network protocol to pass along MFNs instead of grant references, 2) when the backend sees a new MFN, map it read-only and cache it in its own address space.

With this hack we now have a sort of zero-copy Tx path. The backend doesn't need to issue any grant copy / map operations any more: when it sees a new packet in the ring, it just picks up the pages already mapped in its own address space, assembles the packet from them and passes it on to the network stack.

In theory this should boost performance, but in practice it is the other way around: the hack makes Xen networking more than 50% slower than before (OMG). Further investigation shows that with this hack the batching ability is gone. Before the hack, netback batched around 64 slots per interrupt event; after the hack it only batches 3 slots per event -- which is no batching at all, since a single packet can be expected to occupy 3 slots.

Time for some figures (iperf from DomU to Dom0):

 * Before the hack (grant copy): throughput 7.9 Gb/s, average slots per batch 64.
 * After the hack: throughput 2.5 Gb/s, average slots per batch 3.
 * After the hack, with 64 HYPERVISOR_xen_version calls (it just context-switches into the hypervisor and back) added to the Tx path: throughput 3.2 Gb/s, average slots per batch 6.
 * Same, with 256 dummy hypercalls: throughput 5.2 Gb/s, average slots per batch 26.
 * Same, with 512 dummy hypercalls: throughput 7.9 Gb/s, average slots per batch 26.
 * Same, with 768 dummy hypercalls: throughput 5.6 Gb/s, average slots per batch 25.
 * Same, with 1024 dummy hypercalls: throughput 4.4 Gb/s, average slots per batch 25.

(See the first sketch at the end of this mail for what "adding N dummy hypercalls" means.)

Average slots per batch is calculated as follows (second sketch below):

 1. count total_slots, the number of slots processed since start of day
 2. count tx_count, the number of times the tx_action function gets invoked
 3. avg_slots_per_tx = total_slots / tx_count

These counter-intuitive figures imply that there is something wrong with the current batching mechanism. Probably we need to fine-tune the batching behaviour for network and play with the event pointers in the ring (I'm actually looking into that now; the last sketch below shows how the event pointer gates notifications).

It would be good to have some input on this. Konrad, IIRC you once mentioned you discovered something with event notification -- what was that?

To all, any thoughts?

Wei.
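Sketch 1: what "adding N dummy hypercalls in the Tx path" looks like. This is a minimal illustration, not the actual instrumentation; the function and the NR_DUMMY_HYPERCALLS constant are made up for the example. HYPERVISOR_xen_version with XENVER_version ignores its argument, so each call is effectively a no-op trap into the hypervisor and back:

/*
 * Artificially delay the Tx processing path by issuing N cheap
 * hypercalls.  Each HYPERVISOR_xen_version(XENVER_version, NULL)
 * just context-switches into the hypervisor and returns.
 */
#include <xen/interface/version.h>
#include <asm/xen/hypercall.h>

#define NR_DUMMY_HYPERCALLS 512	/* 64 / 256 / 512 / 768 / 1024 in the tests above */

static void delay_by_dummy_hypercalls(void)
{
	int i;

	for (i = 0; i < NR_DUMMY_HYPERCALLS; i++)
		(void)HYPERVISOR_xen_version(XENVER_version, NULL);
}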
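Sketch 2: the slots-per-batch accounting. It is just two counters bumped from the tx action path; the names here are illustrative, not the ones actually used in netback:

/* Counters accumulated since start of day. */
static unsigned long total_slots;	/* ring slots consumed in total */
static unsigned long tx_count;		/* number of tx_action invocations */

/* Called once per tx_action run with the slots consumed in that run. */
static void account_tx_batch(unsigned int slots_this_run)
{
	total_slots += slots_this_run;
	tx_count++;
}

/* avg_slots_per_tx = total_slots / tx_count */
static unsigned long avg_slots_per_tx(void)
{
	return tx_count ? total_slots / tx_count : 0;
}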
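Sketch 3: for completeness, a condensed view of how the req_event pointer in the shared ring gates Tx notifications. This is modelled on the RING_PUSH_REQUESTS_AND_CHECK_NOTIFY / RING_FINAL_CHECK_FOR_REQUESTS macros in xen/include/public/io/ring.h, with memory barriers and error handling omitted, so treat it as an approximation. The point is that the backend re-arms req_event to req_cons + 1 as soon as it runs out of work, so the frontend sends a notification for the very next packet -- which is consistent with the tiny batches seen above when the backend processes slots too quickly:

#include <xen/interface/io/ring.h>
#include <xen/interface/io/netif.h>

/*
 * Frontend side: after publishing new requests, only notify if
 * req_prod has moved past the backend's req_event mark since the
 * last notification.
 */
static int frontend_should_notify(struct xen_netif_tx_front_ring *ring,
				  RING_IDX old_prod, RING_IDX new_prod)
{
	return (RING_IDX)(new_prod - ring->sring->req_event) <
	       (RING_IDX)(new_prod - old_prod);
}

/*
 * Backend side: when it finds no more unconsumed requests it re-arms
 * the event pointer one past what it has consumed, so the very next
 * request produced by the frontend raises an interrupt.
 */
static void backend_rearm_event(struct xen_netif_tx_back_ring *ring)
{
	ring->sring->req_event = ring->req_cons + 1;
}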