
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal




On 16.11.21 10:01, Jan Beulich wrote:
> On 16.11.2021 08:32, Oleksandr Andrushchenko wrote:
>> On 15.11.21 18:56, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>> scheduled a delayed work for map/unmap operations for that device.
>>>> For example, the following scenario can illustrate the problem:
>>>>
>>>> pci_physdev_op
>>>>      pci_add_device
>>>>          init_bars -> modify_bars -> defer_map ->
>>>>                       raise_softirq(SCHEDULE_SOFTIRQ)
>>>>      iommu_add_device <- FAILS
>>>>      vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>      vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>>>
>>>> For the hardware domain we continue execution as the worst that
>>>> could happen is that MMIO mappings are left in place when the
>>>> device has been deassigned.
>>> Is continuing safe in this case? I.e. isn't there the risk of a NULL
>>> deref?
>> I think it is safe to continue
> And why do you think so? I.e. why is there no race for Dom0 when there
> is one for DomU?
Well, then we need to use a lock to synchronize the two.
I guess this needs to be the pcidevs lock, unfortunately.
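
E.g. something along these lines (only a sketch, assuming the pcidevs lock
can be taken on this path and that the existing map/unmap processing stays
unchanged):

bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem )
    {
        /*
         * Sketch: take the pcidevs lock so vpci_remove_device() cannot
         * free pdev->vpci while the deferred work is being processed.
         */
        pcidevs_lock();

        if ( !v->vpci.pdev->vpci )
        {
            /* The vPCI state went away while the work was pending. */
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
            pcidevs_unlock();
            return false;
        }

        /* ... existing map/unmap processing, unchanged ... */

        pcidevs_unlock();
    }

    return false;
}
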
>
>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>> {un}map operation we need to destroy them, as we don't know in which
>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>> DomUs as they won't be allowed to call pci_add_device.
>>> You saying "we need to destroy them" made me look for a new domain_crash()
>>> that you add, but there is none. What is this about?
>> Yes, I guess we need to implicitly destroy the domain,
> What do you mean by "implicitly"?
I mean we should explicitly crash a DomU from vpci_process_pending on
failure, e.g.:

@@ -151,14 +151,18 @@ bool vpci_process_pending(struct vcpu *v)

          vpci_cancel_pending(v->vpci.pdev);
          if ( rc )
+        {
              /*
               * FIXME: in case of failure remove the device from the domain.
               * Note that there might still be leftover mappings. While this is
+             * safe for Dom0, for DomUs the domain needs to be killed in order
+             * to avoid leaking stale p2m mappings on failure.
               */
              vpci_remove_device(v->vpci.pdev);
+
+            if ( !is_hardware_domain(v->domain) )
+                domain_crash(v->domain);

>
>>>> @@ -165,6 +164,18 @@ bool vpci_process_pending(struct vcpu *v)
>>>>        return false;
>>>>    }
>>>>    
>>>> +void vpci_cancel_pending(const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vcpu *v = current;
>>>> +
>>>> +    /* Cancel any pending work now. */
>>> Doesn't "any" include pending work on all vCPU-s of the guest, not
>>> just current? Is current even relevant (as in: part of the correct
>>> domain), considering ...
>>>
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
>>>>            xfree(r);
>>>>        }
>>>>        spin_unlock(&pdev->vpci->lock);
>>>> +
>>>> +    vpci_cancel_pending(pdev);
>>> ... this code path, when coming here from pci_{add,remove}_device()?
>>>
>>> I can agree that there's a problem here, but I think you need to
>>> properly (i.e. in a race free manner) drain pending work.
>> Yes, the code is inconsistent in this respect. I am thinking about:
>>
>> void vpci_cancel_pending(const struct pci_dev *pdev)
>> {
>>       struct domain *d = pdev->domain;
>>       struct vcpu *v;
>>
>>       /* Cancel any pending work now. */
>>       domain_lock(d);
>>       for_each_vcpu ( d, v )
>>       {
>>           vcpu_pause(v);
>>           if ( v->vpci.mem && v->vpci.pdev == pdev)
> Nit: Same style issue as in the original patch.
Will fix
>
>>           {
>>               rangeset_destroy(v->vpci.mem);
>>               v->vpci.mem = NULL;
>>           }
>>           vcpu_unpause(v);
>>       }
>>       domain_unlock(d);
>> }
>>
>> which seems to solve all the concerns. Is this what you mean?
> Something along these lines. I expect you will want to make use of
> domain_pause_except_self(),
Yes, this is what is needed here, thanks. The only question is this:

int domain_pause_except_self(struct domain *d)
{
[snip]
         /* Avoid racing with other vcpus which may want to be pausing us */
         if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
             return -ERESTART;

so it is not clear what we should do in case of -ERESTART: do we want to spin?
Otherwise we will leave the job undone, effectively not canceling the
pending work. Any idea other than spinning?
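
For reference, roughly the shape I have in mind (only a sketch; the retry
loop below is exactly the spinning in question, and it reuses the per-vCPU
fields from the snippet above):

void vpci_cancel_pending(const struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;
    struct vcpu *v;

    /* Open question: simply retry on -ERESTART, or bail out instead? */
    while ( domain_pause_except_self(d) )
        cpu_relax();

    for_each_vcpu ( d, v )
        if ( v->vpci.mem && v->vpci.pdev == pdev )
        {
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
        }

    domain_unpause_except_self(d);
}
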
>   and I don't understand the purpose of
> acquiring the domain lock.
You are right, no need
>
> Jan
>
Thank you,
Oleksandr

 

