Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Hi Jan,
Jan Beulich <jbeulich@xxxxxxxx> writes:
> On 05.07.2023 10:59, Roger Pau Monné wrote:
>> On Wed, Jul 05, 2023 at 09:11:10AM +0200, Jan Beulich wrote:
>>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>>> I am currently implementing your proposal (along with Jan's
>>>> suggestions), but I am facing ABBA deadlock with IOMMU's
>>>> reassign_device() call, which has this piece of code:
>>>>
>>>> list_move(&pdev->domain_list, &target->pdev_list);
>>>>
>>>> My immediate change was:
>>>>
>>>>     write_lock(&pdev->domain->pci_lock);
>>>>     list_del(&pdev->domain_list);
>>>>     write_unlock(&pdev->domain->pci_lock);
>>>>
>>>>     write_lock(&target->pci_lock);
>>>>     list_add(&pdev->domain_list, &target->pdev_list);
>>>>     write_unlock(&target->pci_lock);
>>>>
>>>> But this will not work because reassign_device is called from
>>>> pci_release_devices() which iterates over d->pdev_list, so we need to
>>>> take a d->pci_lock early.
>>>>
>>>> Any suggestions on how to fix this? My idea is to remove a device from a
>>>> list one at time:
>>>>
>>>> int pci_release_devices(struct domain *d)
>>>> {
>>>>     struct pci_dev *pdev;
>>>>     u8 bus, devfn;
>>>>     int ret;
>>>>
>>>>     pcidevs_lock();
>>>>     write_lock(&d->pci_lock);
>>>>     ret = arch_pci_clean_pirqs(d);
>>>>     if ( ret )
>>>>     {
>>>>         pcidevs_unlock();
>>>>         write_unlock(&d->pci_lock);
>>>>         return ret;
>>>>     }
>>>>
>>>>     while ( !list_empty(&d->pdev_list) )
>>>>     {
>>>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>>         bus = pdev->bus;
>>>>         devfn = pdev->devfn;
>>>>         list_del(&pdev->domain_list);
>>>>         write_unlock(&d->pci_lock);
>>>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>>         write_lock(&d->pci_lock);
>>>
>>> I think it needs doing almost like this, but with two more tweaks and
>>> no list_del() right here (first and foremost to avoid having to figure
>>> out whether removing early would subtly break anything; see below for
>>> an error case that would end up with changed behavior):
>>>
>>> while ( !list_empty(&d->pdev_list) )
>>> {
>>>     const struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>>>                                                   struct pci_dev,
>>>                                                   domain_list);
>>>     uint16_t seg = pdev->seg;
>>>     uint8_t bus = pdev->bus;
>>>     uint8_t devfn = pdev->devfn;
>>>
>>>     write_unlock(&d->pci_lock);
>>
>> I think you need to remove the device from the pdev_list before
>> dropping the lock, or else release could race with other operations.
>>
>> That's unlikely, but still, if the lock is dropped and the routine
>> needs to operate on the device, it is better to remove the device from
>> the domain so other operations cannot get a reference to it.
>>
>> Otherwise you could modify reassign_device() implementations so they
>> require the caller to hold the source->pci_lock when calling the
>> routine, but that's ugly because the lock would need to be dropped in
>> order to reassign the device from source to target domains.
>>
>> Another option would be to move the whole d->pdev_list to a local
>> variable (so that d->pdev_list would be empty) and then iterate over
>> it without the d->pci_lock. On failure you would take the lock and
>> add the failing device back into d->pdev_list.
>
> Conceptually I like this last variant, but like the individual list_del()
> it requires auditing the code to make sure nothing depends on the device
> still being on that list. In fact, deassign_device()'s use of
> pci_get_pdev() does depend on it. The function would then need changing
> to take a struct pci_dev * instead. Yet who knows where else there are
> uses of pci_get_pdev() lurking.
Okay, so I changed the deassign_device() signature and reworked the loop
in pci_release_devices() in the following way:
    INIT_LIST_HEAD(&tmp_list);

    /* Move all entries to tmp_list, so we can drop d->pci_lock */
    list_splice_init(&d->pdev_list, &tmp_list);
    write_unlock(&d->pci_lock);

    list_for_each_entry_safe ( pdev, tmp, &tmp_list, domain_list )
    {
        rc = deassign_device(d, pdev);
        if ( rc )
        {
            /* Return device back to the domain list */
            write_lock(&d->pci_lock);
            list_add(&pdev->domain_list, &d->pdev_list);
            write_unlock(&d->pci_lock);
            func_ret = rc;
        }
    }
Also, I checked all pci_get_pdev() calls and found that a struct domain
(the first parameter) is passed in only a handful of places:
*** xen/drivers/vpci/vpci.c:
vpci_read[504] pdev = pci_get_pdev(d, sbdf);
vpci_read[506] pdev = pci_get_pdev(dom_xen, sbdf);
vpci_write[621] pdev = pci_get_pdev(d, sbdf);
vpci_write[623] pdev = pci_get_pdev(dom_xen, sbdf);
*** xen/arch/x86/irq.c:
map_domain_pirq[2166] pdev = pci_get_pdev(d, msi->sbdf);
*** xen/drivers/passthrough/pci.c:
XEN_GUEST_HANDLE_PARAM[1712] pdev = pci_get_pdev(d, machine_sbdf);
The last one is due to my change to deassign_device() signature.
==============================
d->pdev_list is accessed in the following places:
*** xen/drivers/passthrough/amd/pci_amd_iommu.c:
reassign_device[489] list_add(&pdev->domain_list, &target->pdev_list);
*** xen/drivers/passthrough/pci.c:
_pci_hide_device[463] list_add(&pdev->domain_list, &dom_xen->pdev_list);
pci_get_pdev[561] list_for_each_entry ( pdev, &d->pdev_list, domain_list )
pci_add_device[759] list_add(&pdev->domain_list, &hardware_domain->pdev_list);
pci_release_devices[917] list_splice_init(&d->pdev_list, &tmp_list);
pci_release_devices[922] pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
pci_release_devices[928] list_add(&pdev->domain_list, &d->pdev_list);
_setup_hwdom_pci_devices[1155] list_add(&pdev->domain_list, &ctxt->d->pdev_list);
*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2819] list_add(&pdev->domain_list, &target->pdev_list);
*** xen/include/xen/pci.h:
for_each_pdev[149] list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
has_arch_pdevs[151] #define has_arch_pdevs(d) (!list_empty(&(d)->pdev_list))
==============================
And has_arch_pdevs() is used in the following places:
*** xen/arch/x86/hvm/hvm.c:
hvm_set_cr0[2388] has_arch_pdevs(d)) )
*** xen/arch/x86/hvm/vmx/vmcs.c:
vmx_do_resume[1892] if ( has_arch_pdevs(v->domain) && !iommu_snoop
*** xen/arch/x86/mm.c:
l1_disallow_mask[172] !has_arch_pdevs(d) && \
*** xen/arch/x86/mm/p2m-pod.c:
p2m_pod_set_mem_target[352] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
guest_physmap_mark_populate_on_demand[1404] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
*** xen/arch/x86/mm/paging.c:
paging_log_dirty_enable[208] if ( has_arch_pdevs(d) )
*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2773] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2807] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2825] if ( !has_arch_pdevs(source) )
has_arch_pdevs() bothers me most, actually, because it is not always
obvious how to add locking for its callers. I am planning to rework it
in the following way:
#define has_arch_pdevs_unlocked(d) (!list_empty(&(d)->pdev_list))

static inline bool has_arch_pdevs(struct domain *d)
{
    bool ret;

    read_lock(&d->pci_lock);
    ret = has_arch_pdevs_unlocked(d);
    read_unlock(&d->pci_lock);

    return ret;
}
And then use appropriate macro/function depending on circumstances.
--
WBR, Volodymyr