
Re: [Xen-devel] [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors



>>> On 01.04.16 at 12:56, <Paul.Durrant@xxxxxxxxxx> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: 01 April 2016 10:59
>> To: xen-devel
>> Cc: Andrew Cooper; Anthony Perard; Paul Durrant; Stefano Stabellini; Keir
>> (Xen.org)
>> Subject: Re: [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors
>> 
>> >>> On 01.04.16 at 11:15, <JBeulich@xxxxxxxx> wrote:
>> > Recent changes to Linux result in there just being a single unmask
>> > operation prior to expecting the first interrupts to arrive. However,
>> > we've had a chicken-and-egg problem here: Qemu invokes
>> > xc_domain_update_msi_irq(), ultimately leading to
>> > msixtbl_pt_register(), upon seeing that first unmask operation. Yet
>> > for msixtbl_range() to return true (in order for msixtbl_write() to get
>> > invoked at all) msixtbl_pt_register() must have completed.
>> >
>> > Deal with this by snooping suitable writes in msixtbl_range() and
>> > triggering the invocation of msix_write_completion() from
>> > msixtbl_pt_register() when the latter gets called in the context of
>> > a vector control field write that is still in progress.
>> >
>> > Note that the seemingly unrelated deletion of the redundant
>> > irq_desc->msi_desc checks in msixtbl_pt_register() is to make clear to
>> > any compiler version used that the "msi_desc" local variable isn't
>> > being used uninitialized. (Doing the same in msixtbl_pt_unregister() is
>> > just for consistency reasons.)
>> >
>> > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> > ---
>> > TODO: How to deal with REP MOVS to the MSI-X table (in msixtbl_range())?
>> 
>> Some more detail on the thoughts I have had so far on this aspect:
>> It has always puzzled me that the hypervisor doesn't see _all_
>> the MSI-X table accesses (a result of the addresses only getting
>> registered via XEN_DOMCTL_bind_pt_irq); quite naturally, this is
>> at least a latent issue, possibly causing guest misbehavior. I
>> cannot, however, currently see any way to address this without
>> altering both Xen and qemu, since for Xen to see all accesses it
>> would need to become aware of the GPA of the MSI-X table much
>> earlier (read: before the domain actually starts, or at the latest
>> when the domain first enables memory decoding on the device).
>> 
>> The mapping of the MMIO BARs of the device into guest memory,
>> however, intentionally excludes the page(s) covering the MSI-X
>> table, so the hypervisor can't become aware of them just by
>> looking at the data it is presented with today. Hence either we need to
>> add some new hypercall for qemu to invoke, or we need to make
>> qemu map the full BAR ranges, filtering out MSI-X table pages
>> in the hypervisor (using those mapping requests just to learn the
>> GPA of the MSI-X table, without entering them into P2M).
>> 
>> Unless someone can think of a way which doesn't require altering
>> both qemu and Xen (creating the well-known compatibility issues
>> between unmatched pairs), I think the patch as presented should
>> be okay without handling this case, i.e. as a best-effort fix, and a
>> subsequent change should then deal with this by changing both
>> components. In that case I'd suggest that the change here go into
>> 4.7, with the full fix then likely needing to be deferred until
>> 4.8.
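A minimal model of the snoop-and-replay idea in the patch description quoted above may help. It is a sketch under stated assumptions, not Xen's implementation: `range_check()`, `pt_register()`, `write_completion()` and the per-vCPU `struct vcpu_io` are hypothetical stand-ins for msixtbl_range(), msixtbl_pt_register() and msix_write_completion(), and the rule "snoop any aligned dword write with the mask bit clear" is an assumption made for illustration. Only the MSI-X table layout (16-byte entries, Vector Control at offset 12, mask bit in bit 0) comes from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not Xen code. MSI-X table layout per the PCI spec. */
#define MSIX_ENTRY_SIZE   16u
#define MSIX_VCTRL_OFF    12u   /* Vector Control dword within an entry */
#define MSIX_VCTRL_MASK   0x1u  /* mask bit */
#define MODEL_MAX_ENTRIES 8u

/* Hypothetical per-vCPU state for a write that is still in progress. */
struct vcpu_io {
    bool snooped;
    uint64_t snoop_addr;
};

struct msix_table {
    bool registered;
    uint64_t gpa;
    unsigned int nr;
    bool masked[MODEL_MAX_ENTRIES];
};

static bool in_table(uint64_t gpa, unsigned int nr, uint64_t addr)
{
    return addr >= gpa && addr - gpa < (uint64_t)nr * MSIX_ENTRY_SIZE;
}

/* Claims only accesses to a registered table; otherwise remembers a dword
 * write with the mask bit clear, since it may turn out to be an unmask. */
static bool range_check(const struct msix_table *t, struct vcpu_io *io,
                        uint64_t addr, unsigned int len, uint32_t val)
{
    if (t->registered)
        return in_table(t->gpa, t->nr, addr);
    if (len == 4 && (addr & 3) == 0 && !(val & MSIX_VCTRL_MASK)) {
        io->snooped = true;
        io->snoop_addr = addr;
    }
    return false;   /* not claimed: the write goes to the device model */
}

/* Applies the snooped unmask to the entry it addressed. */
static void write_completion(struct msix_table *t, struct vcpu_io *io)
{
    t->masked[(io->snoop_addr - t->gpa) / MSIX_ENTRY_SIZE] = false;
    io->snooped = false;
}

/* Registration, invoked by the device model while that write is in flight. */
static void pt_register(struct msix_table *t, struct vcpu_io *io,
                        uint64_t gpa, unsigned int nr)
{
    if (nr > MODEL_MAX_ENTRIES)
        nr = MODEL_MAX_ENTRIES;
    t->gpa = gpa;
    t->nr = nr;
    t->registered = true;
    for (unsigned int i = 0; i < nr; i++)
        t->masked[i] = true;

    /* Replay only if the snooped write really hit a Vector Control field. */
    if (io->snooped && in_table(gpa, nr, io->snoop_addr) &&
        (io->snoop_addr - gpa) % MSIX_ENTRY_SIZE == MSIX_VCTRL_OFF)
        write_completion(t, io);
}
```

In this model the guest's single unmask of an entry is not claimed by `range_check()`, yet it still takes effect once `pt_register()` runs. Like the TODO above, the model does not handle REP MOVS: it only ever looks at one address and length per access.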
> 
> I guess it could be handled entirely in Xen if we are willing to snoop on
> PCI configuration space accesses. It would not be too hard to snoop guest
> writes to the BARs in config space so that Xen can keep track of where
> they are. Snooping on the MSI-X capability could then tell Xen when to
> start interposing on the table, and allow it to discover the GPA at that
> point (via the BIR and offset values).
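The BIR/offset route mentioned above relies on the standard MSI-X capability layout. As a sketch, assuming the guest-visible BAR values and the two capability registers have already been snooped (how they get captured is left open here), the table's GPA and the 4 KiB-aligned page range that a BAR mapping would leave out could be computed as below. The register layouts (Message Control bits 10:0 = table size minus one, bit 15 = MSI-X Enable; Table Offset/BIR bits 2:0 = BIR; 16-byte table entries; memory BAR type in bits 2:1) follow the PCI specification. `msix_table_window()`, `bar_base()` and `struct msix_window` are hypothetical names, and gating on MSI-X Enable is one possible reading of "when to start interposing".

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_NUM_BARS          6u
#define MSIX_CTRL_ENABLE      0x8000u  /* Message Control bit 15 */
#define MSIX_CTRL_TABLE_SIZE  0x07ffu  /* bits 10:0 encode N - 1 */
#define MSIX_BIR_MASK         0x7u     /* Table BIR, bits 2:0 */
#define MSIX_ENTRY_SIZE       16u
#define MODEL_PAGE_SIZE       0x1000ull /* assumed 4 KiB pages */

/* Hypothetical result type. */
struct msix_window {
    uint64_t table_gpa;   /* guest-physical address of entry 0 */
    uint64_t page_start;  /* first page covering the table */
    uint64_t page_end;    /* exclusive end, page aligned */
};

/* Decodes a guest-visible memory BAR, including the 64-bit case. */
static bool bar_base(const uint32_t bars[PCI_NUM_BARS], unsigned int idx,
                     uint64_t *base)
{
    uint32_t lo = bars[idx];

    if (lo & 0x1)                       /* I/O BAR: cannot hold the table */
        return false;
    *base = lo & ~0xfu;
    if (((lo >> 1) & 0x3) == 0x2) {     /* 64-bit memory BAR */
        if (idx + 1 >= PCI_NUM_BARS)
            return false;
        *base |= (uint64_t)bars[idx + 1] << 32;
    }
    return true;
}

static bool msix_table_window(const uint32_t bars[PCI_NUM_BARS],
                              uint16_t msg_ctrl, uint32_t table_off_bir,
                              struct msix_window *w)
{
    unsigned int bir = table_off_bir & MSIX_BIR_MASK;
    uint64_t base, size;

    if (!(msg_ctrl & MSIX_CTRL_ENABLE) || bir >= PCI_NUM_BARS ||
        !bar_base(bars, bir, &base))
        return false;

    size = ((uint64_t)(msg_ctrl & MSIX_CTRL_TABLE_SIZE) + 1) * MSIX_ENTRY_SIZE;
    w->table_gpa = base + (table_off_bir & ~MSIX_BIR_MASK);
    w->page_start = w->table_gpa & ~(MODEL_PAGE_SIZE - 1);
    w->page_end = (w->table_gpa + size + MODEL_PAGE_SIZE - 1) &
                  ~(MODEL_PAGE_SIZE - 1);
    return true;
}
```

A table near the end of a page can straddle two pages, which is why the end of the range is rounded up separately rather than assuming a single page.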

Well, that's a possibility, but, afaict, it won't work without qemu's
help at another point: so far we don't know the guest's PCI bus
topology, hence we can't correlate any vBAR writes we might snoop
with the physical devices they correspond to.
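The objection can be made concrete with a small sketch. Recognising a snooped config write as a BAR update is easy (for a type 0 header, BAR0 through BAR5 sit at config offsets 0x10 through 0x24), but attributing it to a physical device needs a guest-to-host device table that, per the point above, only qemu has today. `struct bdf_map`, `attribute_bar_write()` and the BDF packing used here are hypothetical; nothing in this sketch is an existing Xen or qemu interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_BAR0_OFF  0x10u   /* type 0 header: BAR0..BAR5 at 0x10..0x24 */
#define PCI_NUM_BARS  6u

/* Hypothetical guest-to-host assignment that qemu would have to supply. */
struct bdf_map {
    uint16_t guest_bdf;   /* bus << 8 | devfn, as the guest sees it */
    uint32_t host_sbdf;   /* segment << 16 | bus << 8 | devfn on the host */
};

struct snooped_bar {
    uint32_t host_sbdf;
    unsigned int bar;
    uint32_t value;
};

/* Returns true only for a full BAR write to a device we can identify. */
static bool attribute_bar_write(const struct bdf_map *map, size_t n,
                                uint16_t guest_bdf, unsigned int reg,
                                unsigned int len, uint32_t val,
                                struct snooped_bar *out)
{
    if (len != 4 || (reg & 3) || reg < PCI_BAR0_OFF ||
        reg >= PCI_BAR0_OFF + 4 * PCI_NUM_BARS)
        return false;                   /* not a BAR register write */

    for (size_t i = 0; i < n; i++) {
        if (map[i].guest_bdf == guest_bdf) {
            out->host_sbdf = map[i].host_sbdf;
            out->bar = (reg - PCI_BAR0_OFF) / 4;
            out->value = val;
            return true;
        }
    }
    return false;                       /* unknown topology: cannot attribute */
}
```

Without the `map` contents, every lookup falls through to the final `return false`, which is exactly the gap described above.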

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel