[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > Sent: 01 April 2016 14:43 > To: Paul Durrant > Cc: Andrew Cooper; Anthony Perard; Stefano Stabellini; xen-devel; Keir > (Xen.org) > Subject: RE: [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors > > >>> On 01.04.16 at 15:01, <Paul.Durrant@xxxxxxxxxx> wrote: > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > >> Sent: 01 April 2016 12:21 > >> >>> On 01.04.16 at 12:56, <Paul.Durrant@xxxxxxxxxx> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > >> >> Sent: 01 April 2016 10:59 > >> >> >>> On 01.04.16 at 11:15, <JBeulich@xxxxxxxx> wrote: > >> >> > Recent changes to Linux result in there just being a single unmask > >> >> > operation prior to expecting the first interrupts to arrive. However, > >> >> > we've had a chicken-and-egg problem here: Qemu invokes > >> >> > xc_domain_update_msi_irq(), ultimately leading to > >> >> > msixtbl_pt_register(), upon seeing that first unmask operation. Yet > >> >> > for msixtbl_range() to return true (in order to msixtbl_write() to get > >> >> > invoked at all) msixtbl_pt_register() must have completed. > >> >> > > >> >> > Deal with this by snooping suitable writes in msixtbl_range() and > >> >> > triggering the invocation of msix_write_completion() from > >> >> > msixtbl_pt_register() when that happens in the context of a still in > >> >> > progress vector control field write. > >> >> > > >> >> > Note that the seemingly unrelated deletion of the redundant > >> >> > irq_desc->msi_desc checks in msixtbl_pt_register() is to make clear > to > >> >> > any compiler version used that the "msi_desc" local variable isn't > >> >> > being used uninitialized. (Doing the same in msixtbl_pt_unregister() > is > >> >> > just for consistency reasons.) > >> >> > > >> >> > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > >> >> > --- > >> >> > TODO: How to deal with REP MOVS to the MSI-X table (in > >> msixtbl_range())? > >> >> > >> >> Some more detail on the thoughts I so far had for this aspect: > >> >> It has always been puzzling me that the hypervisor doesn't > >> >> see _all_ the MSI-X table accesses (which is a result of the > >> >> addresses only getting registered via XEN_DOMCTL_bind_pt_irq); > >> >> it's quite natural that this is an at least latent issue possibly > >> >> causing guest misbehavior. I cannot, however, currently see any > >> >> way to address this without altering both Xen and qemu, since for > >> >> Xen to see all accesses it would need to become aware of the > >> >> GPA of the MSI-X table much earlier (read: before the domain > >> >> actually start, or at the latest when the domain first enables > >> >> memory decoding on the device). > >> >> > >> >> The mapping of the MMIO BARs of the device into guest memory, > >> >> however, intentionally excludes the page(s) covering the MSI-X > >> >> table, so the hypervisor can't become aware of them by just > >> >> looking at data it gets presented today. Hence either we need to > >> >> add some new hypercall for qemu to invoke, or we need to make > >> >> qemu map the full BAR ranges, filtering out MSI-X table pages > >> >> in the hypervisor (using those mapping requests just to learn the > >> >> GPA of the MSI-X table, without entering them into P2M). > >> >> > >> >> Unless someone can think of a way which doesn't require altering > >> >> both qemu and Xen (creating the well known compatibility issues > >> >> between unmatched pairs), I think the patch as presented should > >> >> be okay without handling this case, i.e. best possible effort, and a > >> >> subsequent change then ought to be to deal with this by changing > >> >> both components. In which case I'd suggest that the change here > >> >> go into 4.7, but the full fix would then likely need deferring until > >> >> 4.8. > >> > > >> > I guess it could be handled entirely in Xen if we are willing to snoop on > >> > PCI configuration. It would not be too hard to snoop guest writes to the > >> BARs > >> > in config space so that Xen can keep track of where they are. Snooping > on > >> the > >> > MSI-X capability could then tell Xen when to start interposing on the > table, > >> > and allow it to discover the GPA at that point (via the BIR and offset > >> > values). > >> > >> Well, that's a possibility, but won't - afaict - work without qemu's > >> help at another point: So far we don't know the guest's PCI bus > >> topology, hence we can't correlate vBAR writes we might snoop > >> with the physical devices they correspond to. > > > > Does Xen need to know that correspondence just to track state? I thought > the > > problem here was that Xen does not see every guest access to an MSI-X > table. > > If Xen always interposes on MSI-X tables then it can at least track the > > state > > of the emulated table, even if we end up just forwarding the access for > QEMU > > to handle at first. When the mapping is created to the actual h/w table > then, > > presumably, Xen's idea of state should correspond to QEMU's. > > But Xen doesn't see the guest view of config space, Well Xen interposes on every single config cycle so arguably it sees exactly what the guest sees. > and the > whereabouts of the MSI-X table are in read-only config space fields > (i.e. snooping writes to those doesn't make sense, as the guest may > not ever write them or write anything, which would just get dropped > on the floor). I was thinking of snooping the reads. The guest has to find out where the table is before it can write to it, right? Or Xen could even pro-actively read the capability chain of any pcidev registered with it. > > And additionally msixtbl_addr_to_desc() needs to know the physical > device. > Yes, but msixtbl_range() could be trivially changed to accept any access where msixtbl_find_entry() returns non-NULL. That would allow msixtbl_write() to manipulate entry->flags even if msixtbl_addr_to_desc() returns NULL. Paul > Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |