
Re: PCI pass-through vs PoD



On 17/11/2021 11:23, Jan Beulich wrote:
On 17.11.2021 12:09, Andrew Cooper wrote:
On 17/11/2021 10:13, Jan Beulich wrote:
On 17.11.2021 09:55, Roger Pau Monné wrote:
On Wed, Nov 17, 2021 at 09:39:17AM +0100, Jan Beulich wrote:
On 13.09.2021 11:02, Jan Beulich wrote:
libxl__domain_config_setdefault() checks whether PoD is going to be
enabled and fails domain creation if at the same time devices would get
assigned. Nevertheless setting up of IOMMU page tables is allowed.
I'm unsure whether allowing enabling the IOMMU with PoD is the right
thing to do, at least for our toolstack.
May I ask about the reasons of you being unsure?
PoD and passthrough is a total nonsense.  You cannot have IOMMU mappings
to bits of the guest physical address space which don't exist.

It is now the case that IOMMU (or not) must be specified at domain
creation time, which is ahead of creating PoD pages.  Certainly as far
as Xen is concerned, the logic probably wants reversing to have
add_to_physmap & friends reject PoD if an IOMMU was configured.

A toolstack could, in principle, defer the decision to first device
assignment.
Right, which is what I consider the preferred approach.

Why?

Just because something is technically possible does not mean it is an appropriate or clever thing to do.

In this case, we're talking about extra complexity in Xen and the toolstack, which in the very best case comes with unattractive user experience properties, to "fix" an issue which doesn't happen in practice.

and liable to suffer -ENOMEM,
Not if (as suggested) we first check that the PoD cache is large enough
to cover all PoD entries.

Just because at this instant we have enough free RAM to force-populate all PoD entries doesn't mean the same is true in 2 minutes time after we've been slowly force-populating a massive VM.

Yes, there are heuristics we can use to short-circuit the failure early, but that's still spelt -ENOMEM and reported to the user as such.

The only way to succeed here is to force populate the VM and to have not suffered -ENOMEM by the end of this task.

or we have
to reject a control operation with -EBUSY for a task which is dependent
on the guest kernel actions in a known-buggy area.
Why reject anything?

Because the guest kernel has no knowledge of nor the ability to query the PoD status of a page, the only way to not have things malfunction is to enforce that there are no P2M entries of type PoD when devices are assigned.

If you don't want to / can't force-populate the entire VM prior to having device assigned, then the assign operation needs to fail.
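The two outcomes described above (-EBUSY when PoD entries remain and nothing is populated, -ENOMEM when force-populating runs out of memory part-way) can be sketched as a toy model. This is not Xen code: `struct dom_state`, `force_populate()` and `assign_device()` are hypothetical names, and the single `free_pages` counter is an assumed stand-in for whatever memory is actually available to back PoD entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified domain state; not Xen's real structures. */
struct dom_state {
    unsigned long pod_entry_count; /* outstanding P2M entries of type PoD */
    unsigned long free_pages;      /* assumed pool of pages usable to back them */
};

/*
 * Populate PoD entries one at a time. Memory can run out part-way through,
 * in which case the work already done is kept and -ENOMEM is returned.
 */
static int force_populate(struct dom_state *d)
{
    while (d->pod_entry_count > 0) {
        if (d->free_pages == 0)
            return -ENOMEM;
        d->free_pages--;
        d->pod_entry_count--;
    }
    return 0;
}

/*
 * Assignment succeeds only once no PoD entries remain. If the caller does
 * not want (or cannot afford) a force-populate, the operation fails.
 */
int assign_device(struct dom_state *d, bool allow_populate)
{
    assert(d != NULL);

    if (d->pod_entry_count == 0)
        return 0;
    if (!allow_populate)
        return -EBUSY;
    return force_populate(d);
}
```

In this model the only successful path for a domain with PoD entries is a force-populate that runs to completion; an up-front check of `free_pages` would only move the -ENOMEM earlier, not remove it.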

There is no point trying to make this work.  If a user wants a device,
they don't get to have PoD.  Anything else is a waste of time and effort
on our behalf for a use case that doesn't exist in practice.
Not sure where you take the latter from. I suppose I'll submit the patch
as I have it now (once I have properly resolved dependencies on other
patches I have queued and/or pending), and if that's not deemed acceptable
plus if at the same time I don't really agree with proposed alternatives,
I'll leave fixing the bug to someone else. Of course the expectation then
is that such a bug fix come forward within a reasonable time frame ...

What bug?  PoD and PCI Passthrough are mutually exclusive technologies.

We can (now) tell up front when a VM is configured with these mutually exclusive options.  Such a configuration should be rejected as early as possible.
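A minimal sketch of such an up-front check, for illustration only. `struct dom_config` and `validate_dom_config()` are hypothetical and are not libxl's actual types or functions; the sketch assumes PoD is implied whenever the initial memory target is below the static maximum, and treats either an enabled IOMMU or a non-empty device list as "passthrough".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified domain configuration; not libxl's real types. */
struct dom_config {
    uint64_t max_memkb;     /* static maximum memory */
    uint64_t target_memkb;  /* initial target; assumed to imply PoD when below max */
    bool     iommu_enabled; /* IOMMU page tables requested at creation */
    unsigned num_pcidevs;   /* devices to be assigned at creation */
};

/* Assumption: a target below the maximum means the domain would boot in PoD mode. */
static bool config_wants_pod(const struct dom_config *cfg)
{
    return cfg->target_memkb < cfg->max_memkb;
}

/*
 * Reject the mutually exclusive combination before any domain is built.
 * Returns 0 if the configuration is acceptable, -EINVAL otherwise.
 */
int validate_dom_config(const struct dom_config *cfg)
{
    assert(cfg != NULL);

    bool passthrough = cfg->iommu_enabled || cfg->num_pcidevs > 0;

    if (config_wants_pod(cfg) && passthrough)
        return -EINVAL;
    return 0;
}
```

Because the check depends only on the configuration, it can run before any memory is allocated, so the user gets the error immediately rather than at device-assignment time.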


What you're talking about is introducing extra complexity to explicitly support running the VM in a known-incompatible configuration, with the decision point for fixing said incompatibility deferred until runtime, and now with a possibility of "genuinely can't fix this to become compatible".

Failing device assignment (potentially after a multi-minute wait) with "well you shouldn't have enabled PoD to begin with, you fool" is clearly worse behaviour than refusing to create such a VM in the first place, and you need a far far better reason than "because it's technically possible" to justify doing this.

~Andrew



 

