
[Xen-devel] [Hackathon minutes] PV network improvements



Hi all,
these are Konrad's and my notes (mostly Konrad's), taken at the
Hackathon, on possible improvements to the PV network protocol.


A) Network bandwidth: multipage rings
The maximum amount of outstanding data the ring can hold is 896KB (a
64KB packet uses 18 slots out of 256; 256 / 18 = 14 packets, and
14 * 64KB = 896KB). This could be increased by using multiple pages
for the ring. That would benefit NFS and bulk data transfers (such as
netperf streams).
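
A minimal back-of-the-envelope sketch of how capacity scales with the
number of ring pages, assuming the current layout of 256 slots per 4KB
ring page and roughly 18 slots per 64KB packet (the constants are
illustrative, not taken from any particular implementation):

    #include <stdio.h>

    #define SLOTS_PER_PAGE     256  /* 4KB ring page                   */
    #define SLOTS_PER_64K_PKT   18  /* slots one 64KB packet consumes  */

    static unsigned long max_outstanding_kb(unsigned int ring_pages)
    {
            unsigned int slots = SLOTS_PER_PAGE * ring_pages;
            unsigned int pkts  = slots / SLOTS_PER_64K_PKT;

            return (unsigned long)pkts * 64;        /* in KB */
    }

    int main(void)
    {
            unsigned int pages;

            for (pages = 1; pages <= 8; pages *= 2)
                    printf("%u ring page(s): ~%lu KB outstanding\n",
                           pages, max_outstanding_kb(pages));
            return 0;
    }

With one page this gives the 896KB above; two pages roughly doubles it.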


B) Producer and consumer indexes are on the same cache line
On current hardware this means the reader and the writer compete for
the same cacheline, causing it to ping-pong between sockets.
This can be solved by a feature-split-indexes (or a better name) where
the (req_prod, req_event) tuple is kept separate from the
(rsp_prod, rsp_event) tuple. This would use 128 bytes at the start of
the ring, one cacheline for each tuple.
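
A minimal sketch of such a layout, assuming 64-byte cachelines; the
structure and field names are illustrative, not the actual netif ABI:

    #include <stdint.h>

    #define CACHELINE_SIZE 64

    struct split_ring_indexes {
            /* request-side indexes */
            struct {
                    uint32_t req_prod;
                    uint32_t req_event;
            } req __attribute__((aligned(CACHELINE_SIZE)));

            /* response-side indexes */
            struct {
                    uint32_t rsp_prod;
                    uint32_t rsp_event;
            } rsp __attribute__((aligned(CACHELINE_SIZE)));
    };
    /* sizeof(struct split_ring_indexes) == 128: one cacheline per tuple */

Each tuple then gets its own cacheline, so updates to the request-side
indexes no longer invalidate the line holding the response-side ones.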


C) Cache alignment of requests
The fix is to make the request structures more cache-friendly: for
networking that means padding each request to 16 bytes, and for block
to 64 bytes. Since this does not shrink the structure but only expands
it, it could be called feature-align-slot.
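
A minimal sketch of a padded network TX request, mirroring the fields
of the existing public netif TX request; the trailing pad (and the
struct name) are the illustrative part:

    #include <stdint.h>

    struct netif_tx_request_padded {
            uint32_t gref;     /* grant reference for the packet page  */
            uint16_t offset;   /* offset within the granted page       */
            uint16_t flags;    /* NETTXF_* flags                       */
            uint16_t id;       /* echoed in the corresponding response */
            uint16_t size;     /* packet size in bytes                 */
            uint8_t  pad[4];   /* pad the 12-byte slot up to 16 bytes  */
    };

At 16 bytes per slot, and assuming the ring array itself starts
cacheline-aligned, four slots fit exactly in a 64-byte cacheline and no
slot straddles a cacheline boundary.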


E) Multiqueue (request-feature-multiqueue)
This means creating multiple TX and RX rings for each vif.
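
A minimal sketch of what a multiqueue vif could look like on the
frontend side; all structure and field names here are illustrative,
not the real netfront/netback data structures:

    struct vif_queue {
            unsigned int id;          /* queue index                      */
            void        *tx_ring;     /* per-queue TX shared ring mapping */
            void        *rx_ring;     /* per-queue RX shared ring mapping */
            unsigned int tx_evtchn;   /* per-queue TX event channel       */
            unsigned int rx_evtchn;   /* per-queue RX event channel       */
    };

    struct vif {
            unsigned int      num_queues;  /* negotiated with the backend */
            struct vif_queue *queues;      /* array of num_queues entries */
    };

Each queue would carry its own rings and event channels, so different
vCPUs can transmit and receive without contending on a single ring.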


F) Don't gnt_copy all of the requests
Instead, leave the pages untouched and let the Xen IOMMU create the
appropriate entries. This would require the DMA API in dom0 to be aware
of whether the grant mapping has been done; if it has not (the page is
FOREIGN, i.e. no m2p_override), do a hypercall to tell the hypervisor
that this grant is going to be used by a specific PCI device. Xen would
then create the IOMMU entry.
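
A very rough sketch of where such a check could sit in dom0's DMA
mapping path; every function name below (page_is_foreign,
has_m2p_override, gnttab_iommu_map_for_device, page_to_grant,
to_bus_addr) is a hypothetical placeholder, not an existing kernel or
Xen API:

    /* Hypothetical dom0 DMA-map hook: if the buffer lives in a foreign
     * (granted) page that was never grant-mapped (no m2p_override
     * entry), tell Xen which PCI device will DMA to it so Xen can
     * install the IOMMU entry itself instead of dom0 doing a gnt_copy.
     */
    static dma_addr_t pv_dma_map_page(struct device *dev, struct page *page,
                                      unsigned long offset, size_t size)
    {
            if (page_is_foreign(page) && !has_m2p_override(page)) {
                    /* "grant G will be used by device D" -> IOMMU entry */
                    gnttab_iommu_map_for_device(dev, page_to_grant(page));
            }

            return to_bus_addr(page, offset, size);
    }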


G) On the TX side, do persistent grant mapping
This would only be done on the frontend -> backend path. It means that
we could exhaust the initial domain's memory.
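
A minimal sketch of the backend side of persistent grants, modelled
loosely on how blkback caches its mappings; lookup_gnt() and
map_and_cache_gnt() are hypothetical placeholders:

    #include <stdint.h>

    /* one cached mapping per grant the frontend keeps reusing */
    struct persistent_gnt {
            uint32_t gref;    /* grant reference supplied by the frontend */
            void    *vaddr;   /* backend-virtual address of the mapping   */
    };

    /* TX path: reuse an existing mapping if we already have one,
     * otherwise map the grant once and cache it for later packets. */
    static void *persistent_map(uint32_t gref)
    {
            struct persistent_gnt *pg = lookup_gnt(gref);

            if (!pg)
                    pg = map_and_cache_gnt(gref);

            return pg ? pg->vaddr : NULL;
    }

The cached mappings are exactly what consumes the initial domain's
memory mentioned above: they stay in place for the lifetime of the vif
instead of being torn down per packet.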


H) Affinity of the frontend and backend being on the same NUMA node
This touches upon the discussion about NUMA and making PV guests aware
of the memory layout. It also means that each backend kthread needs to
run on a different NUMA node, namely the one where its frontend's
memory lives.
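
A minimal sketch of placing a backend kthread on a given NUMA node
using the existing kthread_create_on_node() API; struct backend_queue,
backend_thread_fn() and the way frontend_node is discovered are all
illustrative assumptions:

    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/err.h>

    struct backend_queue;   /* illustrative per-queue context */

    static int backend_thread_fn(void *data)
    {
            /* per-queue TX/RX processing would go here */
            return 0;
    }

    /* frontend_node: the NUMA node holding the frontend's rings and
     * buffers -- discovering it is the hard part discussed above. */
    static int start_backend_thread(struct backend_queue *queue,
                                    int frontend_node)
    {
            struct task_struct *task;

            task = kthread_create_on_node(backend_thread_fn, queue,
                                          frontend_node, "vif-be/%d",
                                          frontend_node);
            if (IS_ERR(task))
                    return PTR_ERR(task);

            wake_up_process(task);
            return 0;
    }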


I) Separate request and response rings for TX and RX


J) Map the whole physical memory of the machine in dom0
If mapping/unmapping or copying slows us down, could we just keep the
whole physical memory of the machine mapped in dom0 (with corresponding
IOMMU entries)?
At that point the frontend could just pass MFNs to the backend, and the
backend would already have them mapped.
From a security perspective it doesn't change anything when running the
backend in dom0, because dom0 is already capable of mapping arbitrary
pages of any guest; QEMU instances do that all the time.
But it would take away one of the benefits of deploying driver domains:
we wouldn't be able to run the backends at a lower privilege level.
However, it might still be worth considering as an option: the backend
would still be trusted and protected from the frontend, but the
frontend wouldn't be protected from the backend.



 

