[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot



On Tue, Mar 28, 2023 at 9:17 AM Jason Andryuk <jandryuk@xxxxxxxxx> wrote:
>
> On Tue, Mar 28, 2023 at 9:04 AM Marek Marczykowski-Górecki
> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > > On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > > MASKALL set. Xen relies on initial state being both disabled.
> > > > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > > > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > > > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > >
> > > But pci_reset_msix_state() comes into play only when assigning a device
> > > to a DomU. If the tool stack doing a reset doesn't properly clear the
> > > bit, how would it be cleared the next time round (i.e. after the guest
> > > stopped and then possibly was started again)? It feels like the issue
> > > wants dealing with elsewhere, possibly in the tool stack.
> >
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> >
> > I have considered changing pci_reset_msix_state() to not choke on
> > MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> > there is a lot of assumptions all over the place and I may miss some.
>
> Hi, Marek and Jan,
>
> Marek, thank you for working on MSI-X support.
>
> As Jan says, the clearing here works during system boot.  However, I
> have found that Xen itself is setting MASKALL in __pci_disable_msix()
> when shutting down a domU.  When that is called, memory_decoded(dev)
> returns false, and Xen prints "cannot disable IRQ 137: masking MSI-X
> on 0000:00:14.3".  That makes the device unavailable for subsequent
> domU assignment.  I haven't investigated where and why memory decoding
> gets disabled for the device.
>
> Testing was with this v2 patchset integrated into OpenXT w/ Xen 4.16.
> We have some device reset changes, so I'll have to look at them again.
> Hmmm, they move the libxl device reseting from pci_remove_detached()
> to libxl__destroy_domid() to ensure all devices are de-assign after
> the domain is destroyed.  A kernel patch implements a "more thorough
> reset" which could do a slot or bus level reset, and the desire is to
> have all devices deassigned before that.  Maybe the shift later is
> throwing off Xen's expectations?

I dropped the OpenXT libxl patch, and Xen is not setting MASKALL.

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.