
[Xen-devel] Interesting observation with network event notification and batching



Hi all

I'm hacking on netback, trying to identify whether TLB flushes cause a
heavy performance penalty on the Tx path. The hack is quite nasty (you
would not want to know, trust me).

Basically what it does is: 1) alter the network protocol to pass along
MFNs instead of grant references, and 2) when the backend sees a new MFN,
map it RO and cache it in its own address space.

With this hack we now have some sort of zero-copy Tx path. The backend
doesn't need to issue any grant copy / map operations any more. When it
sees a new packet in the ring, it just picks up the pages already mapped
in its own address space, assembles a packet from those pages and passes
it on to the network stack.
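
As a purely illustrative sketch of the "map once, cache by MFN" idea
(the struct and the helpers below are hypothetical, not real netback
code):

    /*
     * Hypothetical sketch of the per-MFN cache on the Tx path.  The
     * helpers do not exist in netback; the point is just that a page
     * is mapped RO at most once and reused for later packets.
     */
    static struct page *get_tx_page(struct backend *be, unsigned long mfn)
    {
        struct page *pg = mfn_cache_lookup(be, mfn);     /* hypothetical */

        if (!pg) {
            pg = map_foreign_mfn_ro(be->domid, mfn);     /* hypothetical */
            mfn_cache_insert(be, mfn, pg);               /* hypothetical */
        }

        return pg;      /* no grant copy / map per packet */
    }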

In theory this should boost performance, but in practice it is the other
way around. This hack makes Xen networking more than 50% slower than
before (OMG). Further investigation shows that with this hack the
batching ability is gone. Before the hack, netback batched around 64
slots per interrupt event; after the hack it only batches about 3 slots
per interrupt event -- which is no batching at all, given that one packet
can be expected to occupy 3 slots.

Time to have some figures (iperf from DomU to Dom0).

Before the hack (doing grant copy): throughput 7.9 Gb/s, average slots
per batch 64.

After the hack, with N extra HYPERVISOR_xen_version hypercalls (each call
just does a context switch into the hypervisor and back) added to the Tx
path:

  extra hypercalls | throughput | avg slots per batch
  -----------------+------------+---------------------
                 0 |  2.5 Gb/s  |   3
                64 |  3.2 Gb/s  |   6
               256 |  5.2 Gb/s  |  26
               512 |  7.9 Gb/s  |  26
               768 |  5.6 Gb/s  |  25
              1024 |  4.4 Gb/s  |  25
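
To be clear about what "extra hypercalls" means above: I just stuff a
loop of dummy HYPERVISOR_xen_version(XENVER_version, NULL) calls into the
Tx path, roughly like this (minimal sketch; the helper name and the exact
place it is called from are specific to my hack):

    /*
     * Burn time in the Tx path with a harmless hypercall that only
     * round-trips into the hypervisor and back.  'nr' corresponds to
     * the 64/256/512/768/1024 column above.
     */
    static void burn_hypercalls(unsigned int nr)
    {
        unsigned int i;

        for (i = 0; i < nr; i++)
            HYPERVISOR_xen_version(XENVER_version, NULL);
    }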

Average slots per batch is calculated as follows (see the counter sketch
below):
 1. count total_slots, the total number of slots processed since start
    of day
 2. count tx_count, the number of times the tx_action function gets
    invoked
 3. avg_slots_per_tx = total_slots / tx_count
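
In code that amounts to something like this (a minimal sketch; the
counter and helper names are mine, not netback's):

    /*
     * Accounting sketch: tx_count is bumped once per tx_action
     * invocation, total_slots once per consumed slot.
     */
    static unsigned long total_slots;
    static unsigned long tx_count;

    static inline void account_tx_run(unsigned int slots_this_run)
    {
        tx_count++;
        total_slots += slots_this_run;
        /* avg_slots_per_tx = total_slots / tx_count */
    }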

These counter-intuitive figures imply that there is something wrong with
the current batching mechanism. Probably we need to fine-tune the
batching behaviour for network and play with the event pointers in the
ring (I'm actually looking into that now). It would be good to have some
input on this.
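
For context, the reason I suspect the event pointers: the frontend only
sends a notification when req_prod passes the ring's req_event, so where
the backend re-arms req_event decides how many slots pile up before the
next interrupt. Roughly (paraphrasing the standard ring.h macros from
memory, so take the details with a grain of salt):

    /*
     * Rough shape of a backend Tx loop using the ring macros.  The
     * interesting bit is RING_FINAL_CHECK_FOR_REQUESTS, which advances
     * sring->req_event past req_cons; until req_prod passes that point
     * the frontend keeps producing slots without sending another event.
     * Deferring this re-arm is one way to get bigger batches.
     */
    static void tx_action(struct xen_netif_tx_back_ring *ring)
    {
        struct xen_netif_tx_request *req;
        int more_to_do;

        do {
            while (RING_HAS_UNCONSUMED_REQUESTS(ring)) {
                req = RING_GET_REQUEST(ring, ring->req_cons);
                ring->req_cons++;
                /* ... map/copy the slot, build the packet ... */
            }

            /* Re-arm notifications and pick up anything that raced in. */
            RING_FINAL_CHECK_FOR_REQUESTS(ring, more_to_do);
        } while (more_to_do);
    }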

Konrad, IIRC you once mentioned you discovered something with event
notification, what's that?

To all, any thoughts?


Wei.
