
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.



On 31/10/2019 11:15, Jan Beulich wrote:
> On 30.10.2019 23:21, Sander Eikelenboom wrote:
>> Call trace seems to be the same in all cases.
>>
>> --
>> Sander
>>
>>
>> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access 
>> pdev_list without aquiring pcidevs_lock.
>> (XEN) [2019-10-30 22:07:05.748] ----[ Xen-4.13.0-rc  x86_64  debug=y   Not 
>> tainted ]----
>> (XEN) [2019-10-30 22:07:05.748] CPU:    1
>> (XEN) [2019-10-30 22:07:05.748] RIP:    e008:[<ffff82d080265748>] 
>> iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 0000000000010286   CONTEXT: 
>> hypervisor (d0v2)
> 
> I didn't pay attention to this when writing my earlier reply: The
> likely culprit looks to be f89f555827 ("remove late (on-demand)
> construction of IOMMU page tables"). Prior to this I assume IOMMU
> page tables got constructed only after ...

OK, I tested f89f555827 and f89f555827~1. My observations:

    with f89f555827~1:
        - I'm NOT seeing the aquiring pcidevs_lock message
        - the usb3 controller is also working.

    with f89f555827:
        - I'm now seeing the aquiring pcidevs_lock messages.
        - I'm seeing them not just *once* per guest boot, but multiple times.
        - the usb3 controller is still working.

    with staging:
        - Seeing the aquiring pcidevs_lock messages, but only *once* per guest boot.
        - the usb3 controller goes haywire in the guest.

So you seem to be right about both things:
    - f89f555827 is the culprit for the aquiring pcidevs_lock messages.
      I get fewer of them with current staging, though, so some later patch
      must have reduced their number.

    - The usb3 controller malfunction does indeed seem to be a separate issue
      (which is unfortunate, because a bisect becomes even nastier with all
      the intertwined pci-passthrough issues).
      
      Perhaps this one is then related to this message, which occurs only *once*:
          (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 00000800 8a000000 f8000840 000000fd
     
      While in the guest it is endlessly repeating:
          [  231.385566] xhci_hcd 0000:00:05.0: Max number of devices this xHCI host supports is 32.
          [  231.407351] usb usb1-port2: couldn't allocate usb_device

      Hopefully this also gives you a hunch as to which commits to look at.
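
To compare the number of lock warnings between builds (f89f555827~1,
f89f555827 and staging above), something like the following could be used.
It is only a minimal sketch: the function name count_lock_warnings is made
up for illustration, and it assumes the hypervisor log was saved as plain
text (for instance from `xl dmesg`) rather than copied from a quoted mail.

```python
import re

# Message text exactly as Xen prints it, including the "aquiring" spelling.
# \s+ tolerates a line wrap between "access" and "pdev_list".
LOCK_WARNING = re.compile(
    r"AMD-Vi: update_paging_mode Try to access\s+"
    r"pdev_list without aquiring pcidevs_lock"
)


def count_lock_warnings(log_text: str) -> int:
    """Return how many times the pcidevs_lock warning appears in log_text."""
    return len(LOCK_WARNING.findall(log_text))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_warnings.py xen-console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(count_lock_warnings(fh.read()))
```

Running it on a log captured around a single guest boot, once per build,
would give the per-boot counts to compare.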

--
Sander

>> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d080265748>] R 
>> iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d080265ded>] F 
>> amd_iommu_map_page+0x72/0x1c2
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d0802583b6>] F 
>> iommu_map+0x98/0x17e
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d0802586fb>] F 
>> iommu_legacy_map+0x28/0x73
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d08034a4a6>] F 
>> p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d080342e13>] F 
>> p2m_set_entry+0x91/0x128
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d080343c52>] F 
>> guest_physmap_add_entry+0x39f/0x5a3
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d080343f85>] F 
>> guest_physmap_add_page+0x12f/0x138
>> (XEN) [2019-10-30 22:07:05.748]    [<ffff82d0802201ee>] F 
>> memory.c#populate_physmap+0x2e3/0x505
> 
> ... Dom0 had populated the new guest's physmap.
> 
> Anyway, as odd as it may seem I guess there's little choice
> besides making populate_physmap() (and likely a few others)
> acquire the lock.
> 
> Jan
> 

