
Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot



On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 28.03.2023 15:43, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >>
> >> On 28.03.2023 15:32, Jason Andryuk wrote:
> >>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>>>
> >>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>>>
> >>>>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>>>> such at the start of the day), there was no way to clean it up.
> >>>>>
> >>>>> Iirc Roger and you already discussed that there needs to be an
> >>>>> indication of device reset having happened, so that Xen can resync
> >>>>> from this "behind its back" operation. That would look to be the
> >>>>> point/place where such inconsistencies should be eliminated.
> >>>>
> >>>> I think that was a different conversation with Huang Rui related to
> >>>> the AMD GPU work, see:
> >>>>
> >>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>>>
> >>>> I understood the problem Marek was trying to solve was that some
> >>>> devices were initialized with the MASKALL bit set (likely by the
> >>>> firmware?) and that prevented Xen from using them.  But now seeing the
> >>>> further replies on this patch I'm unsure whether that's the case.
> >>>
> >>> In my case, Xen's setting of MASKALL persists through a warm reboot,
> >>
> >> And does this get in the way of Dom0 using the device? (Before a DomU
> >> gets to use it, things should be properly reset anyway.)
> >
> > Dom0 doesn't have drivers for the device, so I am not sure.  I don't
> > seem to have the logs around, but I believe when MASKALL is set, the
> > initial quarantine of the device fails.  Yes, some notes I have
> > mention:
> >
> > It's getting -EBUSY from pdev_msix_assign() which means
> > pci_reset_msix_state() is failing:
> >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> >          PCI_MSIX_FLAGS_MASKALL )
> >         return -EBUSY;
>
> Arguably this check may want skipping when moving to quarantine. I'd
> still be curious to know whether the device works in Dom0, and
> confirmation of device reset's effect on the bit would also be helpful.

echo 1 > /sys/.../reset does not clear MASKALL.
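
A minimal sketch of the two bit operations discussed in this thread, assuming
the standard MSI-X Message Control layout (ENABLE is bit 15, MASKALL is bit 14).
The helper names are hypothetical and work on a plain 16-bit value rather than
real config space, so this illustrates the idea and is not the Xen code:

```c
#include <stdbool.h>
#include <stdint.h>

/* MSI-X Message Control bits (standard PCI layout). */
#define PCI_MSIX_FLAGS_ENABLE  0x8000u
#define PCI_MSIX_FLAGS_MASKALL 0x4000u

/*
 * Hypothetical helper: the condition that makes the quoted check in
 * pci_reset_msix_state() return -EBUSY.
 */
static bool msix_maskall_is_set(uint16_t control)
{
    return (control & PCI_MSIX_FLAGS_MASKALL) != 0;
}

/*
 * Hypothetical helper: clear ENABLE and MASKALL together, as the patch
 * description proposes, leaving the remaining bits untouched.
 */
static uint16_t msix_clear_initial_state(uint16_t control)
{
    return control & (uint16_t)~(PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
}
```

With a stale control value of 0x4007 (MASKALL set, ENABLE clear), the first
helper reports the busy condition, and the second returns 0x0007.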

Regards,
Jason



 

