[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled



On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> Hi, Marek,
> 
> On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > I was trying to test your xen-pciback v3 patch, and I am having
> > > assignment fail consistently now.  It is actually failing to
> > > quarantine to domIO in the first place, which matches the failure from
> > > the other day (when I more carefully read through the logs).  It now
> > > consistently fails to quarantine on every boot unlike the other day
> > > where it happened once.
> >
> > Does this include the very first assignment too, or only after domain
> > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> 
> It's the quarantine during dom0 boot that fails.  Later assignment
> during VM boot fails.  I tried warm reboots and cold boots and it
> happened both times.
> 
> I also modified my initrd to halt in there and checked the config
> space.  MASKALL wasn't set at that time.  I need to double check -
> MASKALL may have been unset after dom0 booted in that case.
> 
> I'll test more to figure when and how MASKALL is getting set.
> 
> > FWIW, the patch applied to Qubes
> > (https://github.com/QubesOS/qubes-linux-kernel/pull/680) seems to work
> > fine (the full test run is still in progress, but I see some green marks
> > already).
> 
> Does Qubes quarantine devices explicitly, or are they in dom0 and
> libvirt/libxl just assigns them when a domain boots?

We do quarantine them explicitly, still in initramfs.

> > > I added some printks and it 's getting -EBUSY from pdev_msix_assign()
> > > which means pci_reset_msix_state() is failing:
> > >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> > >          PCI_MSIX_FLAGS_MASKALL )
> > >         return -EBUSY;
> > >
> > > # lspci -vv -s 14.3
> > > ...
> > >     Capabilities: [80] MSI-X: Enable- Count=16 Masked+
> > >         Vector table: BAR=0 offset=00002000
> > >         PBA: BAR=0 offset=00003000
> > >
> > > So it looks like MASKALL is set and prevents assignment.
> > >
> > > setpci -s 00:14.3 82.W=f
> > > cleared that out for me and I could assign the device.
> > >
> > > My dom0 boots, it runs flask-label-pci for a set of PCI devices
> > > (including iwlwifi), then xl pci-assignable-add for all PCI devices
> > > which will be passed through, then a little later it boots the
> > > associated domains.  Dom0 does not have a driver for iwlwifi.
> > >
> > > I'll have to investigate more to see how MASKALL is getting set.  This
> > > had not been an issue before your recent patches.
> >
> > I guess before the patches nothing set anything in MSI-X capability,
> > because it was hidden...
> 
> Well, stubdom hasn't even booted when, so it would be the Xen or
> pciback change to modify MASKALL?

Weird...

> > Anyway, to support my cleanup hypothesis, I tried to destroy a
> > PCI-having domain, and it left MSI-X enabled (at least according to the
> > config space). MASKALL was _not_ set, but I haven't checked masking of
> > individual vectors. TBH, I'm not sure what should be responsible for the
> > MSI-X cleanup after guest destroy. Should it be Xen? Qemu? Pciback?
> > Pciback calls PHYSDEVOP_{prepare,release}_msix only when
> > binding/unbinding from the device (so - xl pci-assignable-{add,remove}),
> > so this isn't the right place.
> 
> I need to review all this code to give a meaningful response.  Would
> xen-pciback set MASKALL when it binds a device?  That happens before
> xl pci-assignable-add tries to quarantine (assign to to domIO).

I don't see pciback doing that. And also, my patches shouldn't change
behaviour of pciback when binding to a device (so, if it would be doing
that, it would happen before my patches too).

Maybe that's an interaction with some other patches?

> > Should that be in Xen, in deassign_device() (part of
> > DOMCTL_deassign_device)?
> 
> It seems to me that Xen needs to ultimately disable the device.

That's my intuition too.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.