
[RFC XEN PATCH v7 0/5] Support device passthrough when dom0 is PVH on Xen


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jiqian Chen <Jiqian.Chen@xxxxxxx>
  • Date: Fri, 19 Apr 2024 11:53:35 +0800
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, "Juergen Gross" <jgross@xxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, Stewart Hildebrand <Stewart.Hildebrand@xxxxxxx>, Huang Rui <Ray.Huang@xxxxxxx>, Jiqian Chen <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Fri, 19 Apr 2024 03:54:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi All,
This is the v7 series to support device passthrough when dom0 is PVH.
v6->v7 changes:
* patch#4: Because the kernel side changed how it obtains the gsi, add a new 
function to get the gsi from the irq instead of reading the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

Best regards,
Jiqian Chen


v5->v6 changes:
* patch#1: Add Reviewed-by from Stefano and Stewart. Rebase the code and switch 
from the old functions vpci_remove_device and vpci_add_handlers to 
vpci_deassign_device and vpci_assign_device
* patch#2: Add Reviewed-by from Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues under the tools directory
* patch#5: Modify some variable names and code logic to make the code easier to 
understand: use gsi by default, and stay compatible with older kernel versions 
by continuing to use irq


v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if 
the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in 
hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: was patch#5 in v4; moved because patch#5 in v5 depends on it. Also 
add errno handling and the Reviewed-by from Stefano
* patch#5: was patch#4 in v4. New implementation that adds a new hypercall, 
XEN_DOMCTL_gsi_permission, to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH 
is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to 
grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi 
sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-Jiqian.Chen@xxxxxxx/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete 
pci_reset_device_state; add xsm_resource_setup_pci check for 
PHYSDEVOP_pci_device_state_reset; add description for 
PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the 
kernel side (it now does setup_gsi and map_pirq when assigning a device to 
passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self 
mapping.
* patch#3: due to changes in the implementation of the second patch on the 
kernel side (it adds a new sysfs for gsi instead of a new syscall), read the 
gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-Jiqian.Chen@xxxxxxx/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-Jiqian.Chen@xxxxxxx/T/#t
Below is the description of v2 cover letter:
This patch series is v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent v1 upstream before, but it had many problems and we received 
lots of suggestions.
I will introduce all the issues that these patches try to fix and the 
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add <sbdf>" to assign a device, 
pci_stub calls "pcistub_init_device() -> pci_restore_state() -> 
pci_restore_config_space() -> pci_restore_config_space_range() -> 
pci_restore_config_dword() -> pci_write_config_dword()". The pci config write 
triggers an io interrupt to bar_write() in Xen, but because bar->enabled was 
set before, the write is not allowed. Later, when Qemu configures the 
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the state cached 
in pdev->vpci is out of date and differs from the real device state.

Solution: the first patch of kernel(xen/pci: Add xen_reset_device_state 
function) and the first patch of xen(xen/vpci: Clear all vpci status of device) 
add a new hypercall to reset the state stored in vPCI when the state of the 
real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from v1 
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@xxxxxxx/), 
which simply allowed domU to write the pci bar and so did not comply with the 
design principles of vPCI.
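
To make the stale-cache problem above concrete, here is a minimal toy model 
in C. It is not Xen code: toy_vpci_bar, toy_bar_write() and 
toy_reset_device_state() are hypothetical names invented for this sketch, and 
the real vPCI handling is considerably more involved. It only shows why a 
write is dropped while the cached "enabled" flag is stale, and why clearing 
the cached state lets the write through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the per-device BAR state cached by vPCI (hypothetical). */
struct toy_vpci_bar {
    uint32_t addr;     /* cached BAR value */
    bool     enabled;  /* cached "decoding enabled" flag */
};

/*
 * Mimics the rule described above: a BAR write is refused while the cached
 * state says the BAR is enabled. Returns true if the write was accepted.
 */
bool toy_bar_write(struct toy_vpci_bar *bar, uint32_t val)
{
    assert(bar != NULL);
    if (bar->enabled)
        return false;  /* write dropped: the cache believes decoding is on */
    bar->addr = val;
    return true;
}

/*
 * Mimics the effect of the new reset hypercall: discard the cached state so
 * that it matches a freshly reset device.
 */
void toy_reset_device_state(struct toy_vpci_bar *bar)
{
    assert(bar != NULL);
    memset(bar, 0, sizeof(*bar));
}
```

In this model, a write to a BAR whose cached flag is still set returns false 
and leaves addr untouched; after toy_reset_device_state() the same write is 
accepted.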

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: for an HVM domU, PHYSDEVOP_map_pirq is done for a passthrough device 
by using the gsi. See xen_pt_realize->xc_physdev_map_pirq and 
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into 
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0, and PVH has no 
X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 
(at present dom0 is PVH). The second patch of xen(x86/pvh: Open 
PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This 
v2 patch is better than v1, which simply removed the has_pirq check(xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@xxxxxxx/).
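
The has_pirq gating described above can be sketched as a small toy model. 
Everything here is hypothetical (toy_domain, toy_op, toy_physdev_op_old() and 
toy_physdev_op_new() are made-up names, and -ENOSYS as the error value is an 
assumption of this sketch); it is not the code from the patch, which also 
adds further checks elsewhere. The "new" variant follows the v5 changelog 
note, "just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of physdev operations (hypothetical names). */
enum toy_op { TOY_MAP_PIRQ, TOY_UNMAP_PIRQ, TOY_EOI };

/* Toy domain: has_pirq stands in for the X86_EMU_USE_PIRQ flag. */
struct toy_domain {
    bool has_pirq;
};

/* Before: every PIRQ-related op requires the caller to have has_pirq. */
int toy_physdev_op_old(const struct toy_domain *currd, enum toy_op op)
{
    assert(currd != NULL);
    switch (op) {
    case TOY_MAP_PIRQ:
    case TOY_UNMAP_PIRQ:
    case TOY_EOI:
        if (!currd->has_pirq)
            return -ENOSYS;  /* assumed error value for this sketch */
        break;
    }
    return 0;
}

/* After: (un)map_pirq simply passes this filter; other ops stay gated. */
int toy_physdev_op_new(const struct toy_domain *currd, enum toy_op op)
{
    assert(currd != NULL);
    switch (op) {
    case TOY_MAP_PIRQ:
    case TOY_UNMAP_PIRQ:
        break;               /* no has_pirq requirement at this point */
    case TOY_EOI:
        if (!currd->has_pirq)
            return -ENOSYS;
        break;
    }
    return 0;
}
```

With a domain whose has_pirq is false (the PVH dom0 case), the old filter 
rejects TOY_MAP_PIRQ while the new one lets it through and still rejects 
TOY_EOI.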

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 the callback function pci_add_dm_done() is called when qemu configures a 
passthrough device for domU.
This function calls xc_domain_irq_permission()-> pirq_access_permitted() to 
check whether the gsi has a corresponding mapping in dom0. It does not, so the 
call fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is 
PVH dom0 and the returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the devices of PVH are using MSI(-X) interrupts. However, the 
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" 
are done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the 
gsi(aka ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device not being 
unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a 
device to be passthrough, so that passthrough devices have the gsi mapping on 
PVH dom0 and the gsi can be registered. This v2 patch is different from 
v1( kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@xxxxxxx/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@xxxxxxx/), 
which performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0; 
that is unnecessary and may cause multiple registration.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map 
a passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes it an irq and treats that irq as the 
gsi. The irq is read from the file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in 
function xen_host_pci_device_get(). However, the gsi number is not necessarily 
equal to the irq. On PVH dom0, irqs are allocated for gsis dynamically in 
function acpi_register_gsi_ioapic(), on a first come, first served basis. If 
you debug the kernel code (see function __irq_alloc_descs), you will find that 
irq numbers are allocated in ascending order, but gsis are not registered in 
ascending order: gsi 38 may be registered before gsi 28, so gsi 38 gets a 
smaller irq number than gsi 28, and then gsi != irq.
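
The ordering effect described above can be reproduced with a tiny toy 
allocator. This is only an illustration under stated assumptions: 
toy_irq_alloc, toy_irq_alloc_init() and toy_register_gsi() are hypothetical 
names, and the starting irq number used in the example is arbitrary. The 
kernel's real allocator is far more complex; the sketch keeps only the 
"next free number goes to whoever asks first" behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_GSI 64

/* Toy allocator: hands out irq numbers in ascending order (hypothetical). */
struct toy_irq_alloc {
    int next_irq;                 /* next free irq number */
    int irq_of_gsi[TOY_MAX_GSI];  /* 0 means "gsi not registered yet" */
};

void toy_irq_alloc_init(struct toy_irq_alloc *a, int first_irq)
{
    assert(a != NULL && first_irq > 0);
    a->next_irq = first_irq;
    for (int i = 0; i < TOY_MAX_GSI; i++)
        a->irq_of_gsi[i] = 0;
}

/*
 * Register a gsi: the first registration takes the next free irq number,
 * later registrations of the same gsi return the irq already assigned.
 */
int toy_register_gsi(struct toy_irq_alloc *a, int gsi)
{
    assert(a != NULL && gsi >= 0 && gsi < TOY_MAX_GSI);
    if (a->irq_of_gsi[gsi] == 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

If the allocator starts at, say, irq 24 and gsi 38 is registered before 
gsi 28, then gsi 38 receives irq 24 and gsi 28 receives irq 25, so neither 
irq equals its gsi.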

Solution: we can record the relation between gsi and irq, and then translate 
when userspace(qemu) wants to use the gsi. The third patch of 
kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from the irq. The third patch 
of xen(tools: Add new function to get gsi from irq) adds a new function 
xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed. This v2 patch is the same as v1( kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@xxxxxxx/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@xxxxxxx/)
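
The record-then-translate idea above can be sketched with a small lookup 
table. This is a toy model, not the kernel or libxc implementation: 
toy_gsi_map, toy_record_mapping() and toy_gsi_from_irq() are hypothetical 
names, and the fixed table size and linear search are simplifications chosen 
for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 16

/* Toy table of (irq, gsi) pairs recorded at registration time. */
struct toy_gsi_map {
    size_t count;
    struct {
        int irq;
        int gsi;
    } entry[TOY_MAX_MAPPINGS];
};

/* Record one irq -> gsi relation. Returns 0 on success, -1 if full. */
int toy_record_mapping(struct toy_gsi_map *m, int irq, int gsi)
{
    assert(m != NULL);
    if (m->count >= TOY_MAX_MAPPINGS)
        return -1;
    m->entry[m->count].irq = irq;
    m->entry[m->count].gsi = gsi;
    m->count++;
    return 0;
}

/* Translate an irq back to its gsi. Returns -1 if the irq is unknown. */
int toy_gsi_from_irq(const struct toy_gsi_map *m, int irq)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++) {
        if (m->entry[i].irq == irq)
            return m->entry[i].gsi;
    }
    return -1;
}
```

A caller that only knows the irq read from sysfs would look it up in the 
table and pass the resulting gsi on, instead of treating the irq as the gsi.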

The v2 qemu patch only changes an included header file; the rest is similar to 
v1 ( qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@xxxxxxx/), 
and just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  tools: Add new function to get gsi from irq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xencall.h        |  2 ++
 tools/include/xenctrl.h        |  7 +++++
 tools/libs/call/core.c         |  5 +++
 tools/libs/call/libxencall.map |  2 ++
 tools/libs/call/linux.c        | 15 +++++++++
 tools/libs/call/private.h      |  9 ++++++
 tools/libs/ctrl/xc_domain.c    | 15 +++++++++
 tools/libs/ctrl/xc_physdev.c   |  4 +++
 tools/libs/light/libxl_pci.c   | 57 ++++++++++++++++++++++++++++------
 xen/arch/x86/domctl.c          | 31 ++++++++++++++++++
 xen/arch/x86/hvm/hypercall.c   |  8 +++++
 xen/arch/x86/physdev.c         | 24 ++++++++++++++
 xen/drivers/pci/physdev.c      | 36 +++++++++++++++++++++
 xen/drivers/vpci/vpci.c        | 10 ++++++
 xen/include/public/domctl.h    |  9 ++++++
 xen/include/public/physdev.h   |  7 +++++
 xen/include/xen/vpci.h         |  6 ++++
 xen/xsm/flask/hooks.c          |  1 +
 18 files changed, 238 insertions(+), 10 deletions(-)

-- 
2.34.1




 

