
Re: [win-pv-devel] [PATCH xenbus 3/3] Stop using BAR space to host Xen data structures


  • To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Paul Durrant <xadimgnik@xxxxxxxxx>
  • Date: Tue, 9 Mar 2021 15:29:55 +0000
  • Delivery-date: Tue, 09 Mar 2021 15:30:01 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>

On 28/02/2021 17:31, Jinoh Kang wrote:
> On 1/31/18 2:59 PM, Paul Durrant wrote:
>> Currently XENBUS makes use of the memory BAR of the PCI device to which it
>> binds as a source of unpopulated GFNs to host Xen data structures, such as
>> the shared info and grant table.
>>
>> There is a problem with doing this, which is that Windows (unsurprisingly)
>> sets up a non-cached MTRR for the page range covering PCI BARs, so accesses
>> to BAR space (and hence the Xen data structures) should be non-cached.
>> However, Xen itself contains a work-around to avoid the slow access times
>> that would ordinarily result from this; it ignores the MTRRs if no
>> real devices are passed through to the guest, so accesses are actually
>> cached. Thus, in the normal case, there is no penalty to pay... but as soon
>> as hardware is passed through to a guest, the work-around no longer applies
>> and there is a noticeable drop in PV driver performance (e.g. network
>> throughput can drop by ~30-40%).
>>
>> This patch modifies XENBUS to allocate a 2MB area of RAM

> Some time ago I discovered that the PV driver fails with
> STATUS_INSUFFICIENT_RESOURCES if the grant table configured for the
> Windows HVM is larger than 2MB.
>
> Perhaps it might be a good idea to let unpopulated GFNs be allocated
> dynamically from FdoAllocateHole, possibly reviving the original purpose
> of range_set in the process.
>
> Or, at a minimum, call GrantTableQuerySize early and take the
> MaximumFrameCount into account when allocating the initial "unpopulated"
> GFN range.

>> (which will always fall into a cached MTRR),

> Isn't MmAllocateContiguousNodeMemory expected to either return memory
> with the correct cacheability or fail completely? In the absence of the
> PAGE_NOCACHE or PAGE_WRITECOMBINE flags, it makes sense for the caller
> to safely assume that the allocated memory is WB-cached.


I'd assume that is the case, hence we now allocate memory that way and then decrease_reservation it out, to ensure we have a hole in a cached region.
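
(To make that concrete: a rough sketch of the allocate-then-depopulate step
against the WDK and the public Xen memory.h definitions. The HoleCreate() name,
the HYPERVISOR_memory_op() wrapper and the error handling are illustrative
assumptions, not the driver's actual code, which goes through its own hypercall
and range-tracking plumbing.)

    #include <ntddk.h>
    #include "xen.h"            /* Xen public headers: xen.h, memory.h */
    #include "memory.h"

    #define HOLE_SIZE   (2ull << 20)                    /* 2MB */
    #define HOLE_PAGES  ((ULONG)(HOLE_SIZE / PAGE_SIZE))

    /* Allocate ordinary (WB-cached) RAM, then hand its frames back to Xen
     * with XENMEM_decrease_reservation. The GFNs stay inside a cached
     * range but are no longer backed by host memory, so they can be used
     * to map the shared info page, grant table frames, etc.
     */
    static NTSTATUS
    HoleCreate(xen_pfn_t *PfnArray, PVOID *VirtAddr)
    {
        PHYSICAL_ADDRESS              Low;
        PHYSICAL_ADDRESS              High;
        PHYSICAL_ADDRESS              Boundary;
        struct xen_memory_reservation reservation;
        PVOID                         Buffer;
        ULONG                         Index;

        Low.QuadPart = 0;
        High.QuadPart = ~0ull;
        Boundary.QuadPart = 0;

        /* No PAGE_NOCACHE/PAGE_WRITECOMBINE: the mapping is WB-cached. */
        Buffer = MmAllocateContiguousNodeMemory(HOLE_SIZE, Low, High,
                                                Boundary, PAGE_READWRITE,
                                                MM_ANY_NODE_OK);
        if (Buffer == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        for (Index = 0; Index < HOLE_PAGES; Index++) {
            PHYSICAL_ADDRESS Pa =
                MmGetPhysicalAddress((PUCHAR)Buffer + Index * PAGE_SIZE);

            PfnArray[Index] = (xen_pfn_t)(Pa.QuadPart >> PAGE_SHIFT);
        }

        /* De-populate one page (order 0) at a time; a single order-9
         * extent would avoid shattering a 2MB superpage mapping but
         * requires a 2MB-aligned allocation.
         */
        RtlZeroMemory(&reservation, sizeof(reservation));
        set_xen_guest_handle(reservation.extent_start, PfnArray);
        reservation.nr_extents = HOLE_PAGES;
        reservation.extent_order = 0;
        reservation.domid = DOMID_SELF;

        if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
                                 &reservation) != HOLE_PAGES)
            return STATUS_UNSUCCESSFUL;

        *VirtAddr = Buffer;
        return STATUS_SUCCESS;
    }

Because the virtual mapping came from a plain MmAllocateContiguousNodeMemory()
call, the resulting hole sits in a WB-cached region regardless of the MTRRs
covering BAR space.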

> I suppose the "fail completely" case could be alleviated via dynamic
> allocation.


Yes, we could conceivably grab memory a page at a time. Perhaps that would be the best way to go. We do take the hit of potentially shattering superpage mappings if we don't grab in 2M chunks though.

>> use a decrease_reservation hypercall to de-populate the area,

> An alternative method would be to copy the unpopulated-alloc facility
> in Linux, merged into mainline fairly recently (5.9), which avoids being
> entangled with ballooning entirely.
>
> An obvious approach would be to have hotplug PDOs convince the NT PnP
> manager to hand us cacheable memory resources. Implementing it sounds
> pretty complicated, though.

Yep, I've wanted to sort out hotplug memory for a long time and that may well offer a way to get hold of suitable ranges.

  Paul


>> and then use that as a source of GFNs instead of the
>> BAR. Hence, the work-around in Xen no longer has any bearing on accesses to
>> Xen data structures and thus there is no longer any performance penalty
>> when hardware is passed through to a guest.
>>
>> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>

