
Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot



On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 28.03.2023 15:32, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was
> >>>>>> Xen setting it due to msix->host_maskall or msix->guest_maskall.
> >>>>>> Clearing just MASKALL might be unsafe if ENABLE is set, so clear them
> >>>>>> both.
> >>>>>
> >>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>
> >>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>> such at the start of the day), there was no way to clean it up.
> >>>
> >>> Iirc Roger and you already discussed that there needs to be an
> >>> indication of device reset having happened, so that Xen can resync
> >>> from this "behind its back" operation. That would look to be the
> >>> point/place where such inconsistencies should be eliminated.
> >>
> >> I think that was a different conversation with Huang Rui related to
> >> the AMD GPU work, see:
> >>
> >> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>
> >> I understood the problem Marek was trying to solve was that some
> >> devices were initialized with the MASKALL bit set (likely by the
> >> firmware?) and that prevented Xen from using them.  But now seeing the
> >> further replies on this patch I'm unsure whether that's the case.
> >
> > In my case, Xen's setting of MASKALL persists through a warm reboot,
>
> And does this get in the way of Dom0 using the device? (Before a DomU
> gets to use it, things should be properly reset anyway.)

Dom0 doesn't have drivers for the device, so I am not sure.  I don't
seem to have the logs around, but I believe when MASKALL is set, the
initial quarantine of the device fails.  Yes, some notes I have
mention:

It's getting -EBUSY from pdev_msix_assign(), which means
pci_reset_msix_state() is failing:
    if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
         PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;
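
To make the failure mode concrete, here is a minimal standalone sketch of how that check interacts with the boot-time cleanup the patch describes. It is not the Xen code: struct fake_pdev and the fake_* functions are hypothetical stand-ins that model only the MSI-X Message Control register, and the two flag values are the standard bit positions for that register.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Standard bit positions in the MSI-X Message Control register. */
#define PCI_MSIX_FLAGS_ENABLE  0x8000u
#define PCI_MSIX_FLAGS_MASKALL 0x4000u

/* Hypothetical stand-in for a device: only its Message Control register. */
struct fake_pdev {
    uint16_t msix_control;
};

/* Mirrors the quoted check: a set MASKALL bit is reported as -EBUSY. */
static int fake_reset_msix_state(const struct fake_pdev *pdev)
{
    if ( pdev->msix_control & PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;
    return 0;
}

/*
 * Boot-time cleanup as the patch description words it: clear ENABLE and
 * MASKALL together, since clearing MASKALL alone might be unsafe while
 * ENABLE is still set. All other bits are left untouched.
 */
static void fake_clear_initial_msix_state(struct fake_pdev *pdev)
{
    pdev->msix_control &=
        (uint16_t)~(PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
}

/* Scenario from this thread: MASKALL left set across a warm reboot. */
static void example(void)
{
    struct fake_pdev dev = { .msix_control = PCI_MSIX_FLAGS_MASKALL };

    assert(fake_reset_msix_state(&dev) == -EBUSY); /* stale bit blocks reset */
    fake_clear_initial_msix_state(&dev);
    assert(fake_reset_msix_state(&dev) == 0);      /* reset now succeeds */
}
```

In this model a stale MASKALL is enough on its own to produce -EBUSY, which matches what my notes show for the initial quarantine.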

Regards,
Jason



 

