
Re: [Xen-devel] [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot



On Fri, Nov 16, 2018 at 03:30:11PM +0100, Roger Pau Monné wrote:
>On Fri, Nov 16, 2018 at 02:59:41AM -0700, Jan Beulich wrote:
>> >>> On 16.11.18 at 10:35, <roger.pau@xxxxxxxxxx> wrote:
>> > On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
>> >> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
>> >> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
>> >> >> +    if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
>> >> >> +    {
>> >> >> +        if ( pdev->msix->host_maskall )
>> >> >> +            printk(XENLOG_G_WARNING
>> >> >> +                   "Resetting msix status of %04x:%02x:%02x.%u\n",
>> >> >> +                   pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> >> +                   PCI_FUNC(pdev->devfn));
>> >> >> +        pdev->msix->host_maskall = false;
>> >> >> +        pdev->msix->warned = DOMID_INVALID;
>> > 
>> > AFAICT a guest could trigger this message multiple times by forcing a
>> > PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
>> > console since this is not rate limited. Since I think a guest can
>> > manage to reach this code path while running, clearing warned is not
>> > correct.
>> 
>> Did you overlook the _G_ infix? That guarantees rate limiting, unless
>> the admin specified a non-default "guest_loglvl=".
>
>Right, I tend to use the gprintk variant and I've indeed overlooked
>the _G_.
>
>> > Also, if a guest can manage to trigger this path during its runtime,
>> > could it also hit the issue of getting host_maskall set and not being
>> > able to clear it?
>> 
>> But _can_ a guest trigger this path? So far I didn't think it can.
>
>AFAICT (and I might have missed something) a guest can trigger the
>execution of unmap_domain_pirq which ends up calling msi_free_irq by
>enabling and then disabling MSIX after having setup some vectors. This
>is the trace from QEMU and Xen:
>
>xen_pt_msixctrl_reg_write
>    xen_pt_msix_disable
>       msi_msix_disable
>            xc_physdev_unmap_pirq
>                -> PHYSDEVOP_unmap_pirq hypercall
>                    physdev_unmap_pirq
>                        unmap_domain_pirq
>                            msi_free_irq
>
>Given this I would only clean host_maskall in msi_free_irq if the
>domain is being destroyed (d->is_shutting_down),

Considering the hot-unplug case, that isn't a good idea. Although qemu always
disables MSI-X when hot-unplugging a device, qemu itself can be compromised.

>or even better I
>would consider using something like PHYSDEVOP_prepare_msix in order to
>reset Xen's internal MSI state after device reset.

It might be a clean solution, but to me the current code is complicated enough.
Extending what the two sub-hypercalls do, and wrapping device reset with them,
would need great care. One obvious problem is that pci_prepare_msix() will
return -EBUSY if 'msix->used_entries' isn't 0 or 1. To make it work, we would
also rely on qemu to disable MSI-X so that Xen decreases used_entries.
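
To illustrate the -EBUSY concern, here is a rough standalone sketch. It is
not Xen code: struct toy_msix, toy_prepare_msix() and toy_unmap_vector() are
hypothetical names, and the exact check is an assumption based on the
description above (prepare expects used_entries == 0, release expects
used_entries == 1).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for Xen's per-device MSI-X bookkeeping (hypothetical). */
struct toy_msix {
    unsigned int used_entries;
};

/*
 * Assumed check: "prepare" (off == false) needs used_entries == 0,
 * "release" (off == true) needs used_entries == 1; anything else is busy.
 */
static int toy_prepare_msix(struct toy_msix *m, bool off)
{
    unsigned int expected = off ? 1u : 0u;

    if (m->used_entries != expected)
        return -EBUSY;

    m->used_entries = off ? 0u : 1u;
    return 0;
}

/* Stand-in for qemu disabling MSI-X: each unmapped vector drops the count. */
static void toy_unmap_vector(struct toy_msix *m)
{
    if (m->used_entries > 0)
        m->used_entries--;
}

/* Wrapping a reset with release/prepare only works once qemu has cleaned up. */
void toy_reset_demo(void)
{
    struct toy_msix m = { .used_entries = 3 };  /* guest left 3 vectors set up */

    assert(toy_prepare_msix(&m, true) == -EBUSY);   /* release refused */

    toy_unmap_vector(&m);
    toy_unmap_vector(&m);                           /* now used_entries == 1 */

    assert(toy_prepare_msix(&m, true) == 0);        /* release succeeds */
    assert(toy_prepare_msix(&m, false) == 0);       /* prepare after reset */
}
```

In this model the release/prepare pair cannot succeed unless something has
already brought used_entries back down, which is the dependency on qemu
mentioned above.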

Another solution came to mind:

The intention of Xen setting 'host_maskall' is to mask a single vector. How
about converting host_maskall into masking all vectors individually when Xen
tries to init the first vector in msix_capability_init()? Actually, on
hardware, every vector's mask bit is already set when pciback performs the
device reset, so it won't break anything. With commit 69d99d1b223, even if a
guest has cleared some vectors' mask bits before the conversion, it won't be
an issue.
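
A rough standalone sketch of the idea, to make the proposal concrete. This is
not Xen code: the struct layout, TOY_NR_VECTORS and all toy_* functions are
hypothetical, and the model ignores the distinction between guest and host
per-vector mask state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_VECTORS 4  /* hypothetical MSI-X table size */

struct toy_msix {
    bool host_maskall;                   /* software "mask everything" flag */
    bool vector_masked[TOY_NR_VECTORS];  /* per-vector mask bits */
    unsigned int used_entries;
};

/* Device reset (as pciback would trigger): every vector comes back masked. */
static void toy_device_reset(struct toy_msix *m)
{
    for (size_t i = 0; i < TOY_NR_VECTORS; i++)
        m->vector_masked[i] = true;
}

/*
 * First-vector init: if a stale host_maskall is still set, convert it into
 * per-vector masks and clear the flag, so later unmasks can take effect.
 */
static void toy_capability_init(struct toy_msix *m)
{
    if (m->used_entries == 0 && m->host_maskall) {
        for (size_t i = 0; i < TOY_NR_VECTORS; i++)
            m->vector_masked[i] = true;
        m->host_maskall = false;
    }
    m->used_entries++;
}

/* Guest (via qemu) unmasks one vector. */
static void toy_unmask_vector(struct toy_msix *m, size_t i)
{
    m->vector_masked[i] = false;
}

/* A vector can deliver interrupts only if neither mask applies to it. */
static bool toy_vector_can_fire(const struct toy_msix *m, size_t i)
{
    return !m->host_maskall && !m->vector_masked[i];
}

void toy_maskall_demo(void)
{
    struct toy_msix m = { .host_maskall = true };  /* left over from old guest */

    toy_device_reset(&m);
    toy_capability_init(&m);

    assert(!m.host_maskall);
    assert(!toy_vector_can_fire(&m, 0));  /* still masked per-vector */

    toy_unmask_vector(&m, 0);
    assert(toy_vector_can_fire(&m, 0));
    assert(!toy_vector_can_fire(&m, 1));
}
```

The point of the sketch is that after the conversion no stale device-wide
flag survives into the next guest, while every vector stays masked until it
is explicitly unmasked.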

Do you think this solution is theoretically correct?

Thanks
Chao
