Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 28.03.2023 15:43, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >>
> >> On 28.03.2023 15:32, Jason Andryuk wrote:
> >>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>>>
> >>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>>>
> >>>>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>>>> such at the start of the day), there was no way to clean it up.
> >>>>>
> >>>>> Iirc Roger and you already discussed that there needs to be an
> >>>>> indication of device reset having happened, so that Xen can resync
> >>>>> from this "behind its back" operation. That would look to be the
> >>>>> point/place where such inconsistencies should be eliminated.
> >>>>
> >>>> I think that was a different conversation with Huang Rui related to
> >>>> the AMD GPU work, see:
> >>>>
> >>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>>>
> >>>> I understood the problem Marek was trying to solve was that some
> >>>> devices were initialized with the MASKALL bit set (likely by the
> >>>> firmware?) and that prevented Xen from using them. But now seeing the
> >>>> further replies on this patch I'm unsure whether that's the case.
> >>>
> >>> In my case, Xen's setting of MASKALL persists through a warm reboot,
> >>
> >> And does this get in the way of Dom0 using the device? (Before a DomU
> >> gets to use it, things should be properly reset anyway.)
> >
> > Dom0 doesn't have drivers for the device, so I am not sure. I don't
> > seem to have the logs around, but I believe when MASKALL is set, the
> > initial quarantine of the device fails. Yes, some notes I have
> > mention:
> >
> > It's getting -EBUSY from pdev_msix_assign() which means
> > pci_reset_msix_state() is failing:
> >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> >          PCI_MSIX_FLAGS_MASKALL )
> >         return -EBUSY;
>
> Arguably this check may want skipping when moving to quarantine. I'd
> still be curious to know whether the device works in Dom0, and
> confirmation of device reset's effect on the bit would also be helpful.

echo 1 > /sys/.../reset does not clear MASKALL.

Regards,
Jason
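For readers following the thread, here is a minimal, self-contained C model of the two code paths being discussed: the MASKALL check quoted from pci_reset_msix_state(), and the patch's idea of clearing ENABLE and MASKALL together at boot. The bit values follow the standard MSI-X Message Control layout (MSI-X Enable is bit 15, Function Mask is bit 14, Table Size is bits 10:0), using the same constant names as Xen/Linux. The simulated config-space array and the helpers cfg_read16(), cfg_write16(), reset_msix_state_check() and clear_initial_msix_state() are hypothetical stand-ins. This is a sketch under those assumptions, not Xen's actual implementation of either path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* MSI-X Message Control register layout; names mirror pci_regs.h. */
#define PCI_MSIX_FLAGS          2       /* Message Control offset within the capability */
#define PCI_MSIX_FLAGS_QSIZE    0x07FF  /* Table Size minus one (read-only on hardware) */
#define PCI_MSIX_FLAGS_MASKALL  0x4000  /* Function Mask */
#define PCI_MSIX_FLAGS_ENABLE   0x8000  /* MSI-X Enable */

/* Hypothetical model: 256 bytes of config space for a single function. */
static uint8_t cfg_space[256];

static uint16_t cfg_read16(unsigned int off)
{
    assert(off + 1 < sizeof(cfg_space));
    /* Config space is little-endian. */
    return (uint16_t)(cfg_space[off] | (cfg_space[off + 1] << 8));
}

static void cfg_write16(unsigned int off, uint16_t val)
{
    assert(off + 1 < sizeof(cfg_space));
    cfg_space[off] = (uint8_t)(val & 0xff);
    cfg_space[off + 1] = (uint8_t)(val >> 8);
}

/* pos is the offset of the MSI-X capability in config space. */
static unsigned int msix_control_reg(unsigned int pos)
{
    return pos + PCI_MSIX_FLAGS;
}

/*
 * Models only the check quoted in the thread: if MASKALL is already set,
 * assume someone else owns that state and refuse with -EBUSY.
 */
static int reset_msix_state_check(unsigned int pos)
{
    if ( cfg_read16(msix_control_reg(pos)) & PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;
    return 0;
}

/*
 * Models the patch's approach: at boot, clear ENABLE and MASKALL together
 * (clearing only MASKALL while ENABLE is set could let the device fire
 * interrupts early). The Table Size bits are written back unchanged.
 */
static void clear_initial_msix_state(unsigned int pos)
{
    const uint16_t bits = PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL;
    uint16_t ctrl = cfg_read16(msix_control_reg(pos));

    if ( ctrl & bits )
        cfg_write16(msix_control_reg(pos), (uint16_t)(ctrl & ~bits));
}
```

Usage: write `PCI_MSIX_FLAGS_MASKALL | 0x001F` to the control register of a capability at, say, offset 0x40, as a stale state left over from a warm reboot. `reset_msix_state_check(0x40)` then returns `-EBUSY`, which matches the quarantine failure described above. After `clear_initial_msix_state(0x40)`, the register reads back as `0x001F` and the check passes. The model does not capture the open question in the thread, which is why a device reset from Dom0 leaves MASKALL set.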