[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/pt: fix some pass-thru devices don't work across reboot

>>> On 09.11.18 at 01:11, <chao.gao@xxxxxxxxx> wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].

I find "some" above and in the title misleading. According to the
following description, the issue ought to affect exactly all MSI-X
capable devices.

> The root cause is that xen sets the maskall bit in MSI-x capability
> during guest shutdown.

That's in __pci_disable_msix()? Please help readers by being

> @@ -1439,7 +1440,27 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>      }
>      if ( pdev->msix )
> +    {
> +        uint8_t slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
> +        uint8_t pos = pci_find_cap_offset(seg, bus, slot, func,
> +                                          PCI_CAP_ID_MSIX);
> +        uint16_t control = pci_conf_read16(seg, bus, slot, func,
> +                                           msix_control_reg(pos));
> +        uint16_t new_control;
> +
> +        /* Reset status owned by Xen */
> +        pdev->msix->host_maskall = false;
> +        pdev->msix->warned = DOMID_INVALID;
> +
> +        /* Update 'maskall' bit in MSI-X Capability */
> +        new_control = (control & ~PCI_MSIX_FLAGS_MASKALL) |
> +                      pdev->msix->guest_maskall;
> +        if ( new_control != control )
> +            pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),
> +                             control);
> +
>          msixtbl_init(d);
> +    }

First of all, all maskall bit management is in x86/msi.c. Please keep it
that way, by introducing a function there, called from here (if need
be, read on). Next I'm weary of you clearing the device's maskall
bit without also clearing the enable bit (or ASSERT()ing that it's
cleared, if that's guaranteed). I also don't follow why you OR in
->guest_maskall: Isn't it supposed to be clear anyway?

From a more general perspective, forcing ->host_maskall to true
in msi_set_mask_bit() when memory decoding is disabled is a
safety measure. I'd like to see justification (in the description) why
it is safe to clear the bit again at a later point. Thing is that _if_ it
is safe to clear the bit when assigning the device to another guest,
why wouldn't it even more so be safe to do so when assigning it
back to Dom0 (i.e. in deassign_device())?

And why would it not be safe to clear the bit perhaps already at
the point MSI-X gets disabled? It would seem to me that
_pci_cleanup_msix() could call msix_set_enable(, false), and
msix_set_enable() might legitimately clear host_maskall in that
case (but please double check; another possibility could be that
pci_cleanup_msi() needs a second call site added somewhere).
In the end the state after disabling MSI-X should match that
before enabling it the first time.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.