
[RFC XEN PATCH v2 0/3] Support device passthrough when dom0 is PVH on Xen


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Stefano Stabellini" <sstabellini@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jiqian Chen <Jiqian.Chen@xxxxxxx>
  • Date: Fri, 24 Nov 2023 18:41:33 +0800
  • Cc: Stewart Hildebrand <Stewart.Hildebrand@xxxxxxx>, Alex Deucher <Alexander.Deucher@xxxxxxx>, Xenia Ragiadakou <xenia.ragiadakou@xxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxx>, Huang Rui <Ray.Huang@xxxxxxx>, Honglei Huang <Honglei1.Huang@xxxxxxx>, Julia Zhang <Julia.Zhang@xxxxxxx>, Jiqian Chen <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Fri, 24 Nov 2023 10:42:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi All,

This series of patches is v2 of the implementation of device passthrough
when dom0 is PVH on Xen.
We sent v1 upstream earlier, but it had many problems and we received a lot
of suggestions.
Below I describe each issue these patches try to fix and the differences
between v1 and v2.

v2 of the kernel-side patches:
https://lore.kernel.org/lkml/20231124103123.3263471-1-Jiqian.Chen@xxxxxxx/T/#t

Issues we encountered:
1. pci_stub failed to write the BARs of a passthrough device.
Problem: when we run "sudo xl pci-assignable-add <sbdf>" to make a device
assignable, pci_stub calls "pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword()". That config space
write traps into Xen and is handled by bar_write(), but bar->enabled was
already set, so the write is not allowed. Later, when QEMU configures the
passthrough device in xen_pt_realize(), it reads invalid BAR values.

Reason: we don't tell vPCI that the device has been reset, so the state
cached in pdev->vpci is out of date and differs from the real device state.

Solution: the first kernel patch (xen/pci: Add xen_reset_device_state
function) and the first Xen patch (xen/vpci: Clear all vpci status of device)
add a new hypercall that resets the state stored in vPCI when the state of
the real device has changed.
Thanks to Roger for suggesting this approach. It differs from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@xxxxxxx/),
which simply allowed domU to write the PCI BARs and so did not comply with
the design principles of vPCI.
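A toy C model of the stale-cache problem may make issue 1 easier to follow.
It is only a sketch under simplifying assumptions, not Xen's vPCI code: the
model_* names and structures are hypothetical, a single "enabled" flag stands
in for all of the cached BAR state, and the reset call only illustrates the
effect the new hypercall is meant to have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified view of one BAR as cached by vPCI. */
struct model_bar {
    uint32_t addr;  /* cached BAR value */
    bool enabled;   /* cached "BAR is enabled" flag */
};

/* Hypothetical device: what the hardware holds vs. what vPCI cached. */
struct model_pdev {
    uint32_t hw_bar;         /* value currently held by the real device */
    struct model_bar cached; /* stands in for the state in pdev->vpci */
};

/* Follows the rule described above: a BAR write is refused while the
 * cached state says the BAR is enabled. Returns true if it was applied. */
static bool model_bar_write(struct model_pdev *pdev, uint32_t val)
{
    assert(pdev != NULL);
    if (pdev->cached.enabled)
        return false;  /* dropped: hardware and cache stay out of sync */
    pdev->cached.addr = val;
    pdev->hw_bar = val;
    return true;
}

/* A device reset clears the hardware BAR, but nothing updates the cache. */
static void model_hw_reset(struct model_pdev *pdev)
{
    assert(pdev != NULL);
    pdev->hw_bar = 0;
}

/* The effect the new hypercall aims for: throw away the stale cached
 * state so that it matches a freshly reset device again. */
static void model_reset_cached_state(struct model_pdev *pdev)
{
    assert(pdev != NULL);
    memset(&pdev->cached, 0, sizeof(pdev->cached));
}
```

In this model the restore that pci_stub performs after a reset is rejected
while the stale cache still says "enabled"; once the cached state has been
cleared, the same write goes through and the hardware value is correct again.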

2. PHYSDEVOP_map_pirq failed when dom0 is PVH
Problem: for an HVM domU, PHYSDEVOP_map_pirq is issued for a passthrough
device using its GSI; see xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into
Xen, but hvm_physdev_op() does not allow PHYSDEVOP_map_pirq.

Reason: in hvm_physdev_op(), "currd" is the PVH dom0, and a PVH domain does
not have the X86_EMU_USE_PIRQ flag, so the has_pirq check fails.

Solution: I think we need to allow PHYSDEVOP_map_pirq when "currd" is dom0
(which here is PVH). The second Xen patch (x86/pvh: Open PHYSDEVOP_map_pirq
for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This is better than
v1, which simply removed the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@xxxxxxx/).
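The difference between the old check and the v2 change can be sketched as a
pair of C predicates. This is a hypothetical model, not the code in
xen/arch/x86/hvm/hypercall.c: the enum, the struct and the model_* functions
are made up for illustration, and every non-PIRQ operation is simply allowed
here although the real hypercall path has its own checks for those.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHYSDEVOP commands that need PIRQs. */
enum model_physdevop { MODEL_MAP_PIRQ, MODEL_UNMAP_PIRQ, MODEL_OTHER_OP };

/* Hypothetical stand-in for the calling domain ("currd"). */
struct model_domain {
    bool is_hardware_domain;  /* dom0 */
    bool has_pirq;            /* X86_EMU_USE_PIRQ is set */
};

/* Before: PIRQ-related ops require has_pirq, which PVH dom0 lacks.
 * Other ops are simply let through in this toy model. */
static bool model_op_allowed_before(const struct model_domain *currd,
                                    enum model_physdevop op)
{
    assert(currd != NULL);
    if (op == MODEL_MAP_PIRQ || op == MODEL_UNMAP_PIRQ)
        return currd->has_pirq;
    return true;
}

/* After: the hardware domain may additionally do map_pirq, so that it
 * can map a passthrough device's GSI on behalf of an HVM domU. */
static bool model_op_allowed_after(const struct model_domain *currd,
                                   enum model_physdevop op)
{
    assert(currd != NULL);
    if (op == MODEL_MAP_PIRQ && currd->is_hardware_domain)
        return true;
    return model_op_allowed_before(currd, op);
}
```

The point of the narrower change is visible in the model: only map_pirq from
the hardware domain is newly allowed, while a PVH domU without
X86_EMU_USE_PIRQ is still refused, which is what removing the has_pirq check
outright (v1) would have lost.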

3. The GSI of a passthrough device is never unmasked
 3.1 the pirq permission check fails
 3.2 the GSI of the passthrough device is not registered in PVH dom0

Problem:
3.1 The callback pci_add_dm_done() is called when QEMU configures a
passthrough device for domU. It calls xc_domain_irq_permission() ->
pirq_access_permitted() to check whether the GSI has a corresponding mapping
in dom0. There is none, so the check fails. See
XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH dom0 and
the returned irq is 0.
3.2 A GSI (i.e. a vIO-APIC pin) may never get registered on PVH dom0,
because the devices of a PVH dom0 use MSI(-X) interrupts. However, the
IO-APIC pin must be configured before it can be mapped into a domU.

Reason: after reading the code, I found that "map_pirq" and "register_gsi"
are done in vioapic_write_redirent->vioapic_hwdom_map_gsi when the GSI (i.e.
the IO-APIC pin) is unmasked in PVH dom0.
So both problems come down to the GSI of the passthrough device never being
unmasked.

Solution: the second kernel patch (xen/pvh: Unmask irq for passthrough
device in PVH dom0) calls unmask_irq() when a device is assigned for
passthrough, so that passthrough devices get a GSI mapping on PVH dom0 and
the GSI gets registered. This differs from v1
(kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@xxxxxxx/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@xxxxxxx/),
which performed "map_pirq" and "register_gsi" on all PCI devices on PVH dom0;
that is unnecessary and may cause multiple registrations.
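The dependency between unmasking and the later permission check can be shown
with a small C model of the vIO-APIC pins. It is a hypothetical sketch, not
Xen's vioapic.c: the pin count, the model_* names and the single "mapped"
flag (standing for "map_pirq" plus "register_gsi") are assumptions, and the
model only maps a pin on its first unmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PINS 48  /* hypothetical number of vIO-APIC pins */

/* Hypothetical per-pin state as seen from PVH dom0. */
struct model_pin {
    bool masked;
    bool mapped;  /* stands for "map_pirq" + "register_gsi" done */
};

static struct model_pin model_vioapic[MODEL_NR_PINS];

/* All pins start masked: a dom0 using MSI(-X) never touches them. */
static void model_vioapic_init(void)
{
    for (size_t i = 0; i < MODEL_NR_PINS; i++)
        model_vioapic[i] = (struct model_pin){ .masked = true, .mapped = false };
}

/* Analogue of vioapic_write_redirent() -> vioapic_hwdom_map_gsi():
 * the mapping is only set up when dom0 unmasks the pin. */
static void model_write_redirent(unsigned int gsi, bool masked)
{
    assert(gsi < MODEL_NR_PINS);
    struct model_pin *pin = &model_vioapic[gsi];
    pin->masked = masked;
    if (!masked && !pin->mapped)
        pin->mapped = true;
}

/* Analogue of the pirq_access_permitted() check reached from
 * pci_add_dm_done(): it only succeeds once dom0 has a mapping. */
static bool model_access_permitted(unsigned int gsi)
{
    assert(gsi < MODEL_NR_PINS);
    return model_vioapic[gsi].mapped;
}
```

Without the unmask, the permission check for the passthrough GSI fails
(problem 3.1); unmasking only that GSI is enough to make it pass, while pins
of other devices stay unmapped, which is the v2 behaviour compared with
mapping every PCI device as v1 did.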

4. Failed to map a pirq for the GSI
Problem: in xen_pt_realize(), QEMU calls xc_physdev_map_pirq() to map the
GSI of a passthrough device to a pirq, but the call fails.

Reason: xc_physdev_map_pirq() expects a GSI, but QEMU passes the IRQ and
treats it as the GSI. QEMU reads that value from
/sys/bus/pci/devices/xxxx:xx:xx.x/irq in xen_host_pci_device_get(). However,
the GSI number is not necessarily equal to the IRQ number. On PVH dom0, the
IRQ for a GSI is allocated dynamically in acpi_register_gsi_ioapic(), on a
first-come, first-served basis. If you debug the kernel code (see
__irq_alloc_descs), you will find that IRQ numbers are handed out in
increasing order, but GSIs are not registered in increasing order: GSI 38
may come before GSI 28, so GSI 38 gets a smaller IRQ number than GSI 28, and
then gsi != irq.

Solution: we can record the relation between GSI and IRQ, so that when
userspace (QEMU) needs the GSI, it can do a translation. The third kernel
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices,
and provides a syscall for userspace to get the GSI from an IRQ. The third
Xen patch (tools: Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), which calls the new syscall added on the kernel
side. Userspace can then use that function to get the GSI, and
xc_physdev_map_pirq() succeeds. This v2 patch is the same as v1
(kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@xxxxxxx/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@xxxxxxx/).
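To make the ordering argument concrete, here is a small C model of
first-come, first-served IRQ allocation together with a recorded irq-to-GSI
table. The model_* names, the table size and the starting IRQ number 24 are
hypothetical; in the series the allocation happens in the kernel
(__irq_alloc_descs), the recording in acpi_register_gsi_xen_pvh(), and the
lookup goes through xc_physdev_gsi_from_irq(), none of which is reproduced
here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_IRQS 64
#define MODEL_FIRST_DYN_IRQ 24  /* hypothetical first free IRQ number */

/* irq -> gsi table, filled as GSIs get registered; -1 means unused. */
static int model_irq_to_gsi[MODEL_MAX_IRQS];
static int model_next_irq;

static void model_irq_init(void)
{
    for (size_t i = 0; i < MODEL_MAX_IRQS; i++)
        model_irq_to_gsi[i] = -1;
    model_next_irq = MODEL_FIRST_DYN_IRQ;
}

/* First come, first served: IRQ numbers are handed out in increasing
 * order regardless of the GSI value, and the pair is recorded. */
static int model_register_gsi(int gsi)
{
    assert(model_next_irq < MODEL_MAX_IRQS);
    int irq = model_next_irq++;
    model_irq_to_gsi[irq] = gsi;
    return irq;
}

/* Translation for userspace: returns the recorded GSI, or -1 if the
 * IRQ is out of range or was never registered. */
static int model_gsi_from_irq(int irq)
{
    if (irq < 0 || irq >= MODEL_MAX_IRQS)
        return -1;
    return model_irq_to_gsi[irq];
}
```

Registering GSI 38 before GSI 28 gives GSI 38 IRQ 24 and GSI 28 IRQ 25 in
this model, so passing either IRQ number to a function that expects a GSI
would target the wrong pin, while the recorded table returns the right GSI.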

For QEMU, the v2 patch only changes an included header file; the rest is
similar to v1
(qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@xxxxxxx/):
it just calls xc_physdev_gsi_from_irq() to get the GSI from the IRQ.


Jiqian Chen (3):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0
  tools: Add new function to get gsi from irq

 tools/include/xen-sys/Linux/privcmd.h |  7 +++++++
 tools/include/xencall.h               |  2 ++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/call/core.c                |  5 +++++
 tools/libs/call/libxencall.map        |  2 ++
 tools/libs/call/linux.c               | 14 ++++++++++++++
 tools/libs/call/private.h             |  9 +++++++++
 tools/libs/ctrl/xc_physdev.c          |  4 ++++
 tools/libs/light/libxl_pci.c          |  1 +
 xen/arch/x86/hvm/hypercall.c          |  3 +++
 xen/drivers/passthrough/pci.c         | 21 +++++++++++++++++++++
 xen/drivers/pci/physdev.c             | 14 ++++++++++++++
 xen/drivers/vpci/vpci.c               |  9 +++++++++
 xen/include/public/physdev.h          |  2 ++
 xen/include/xen/pci.h                 |  1 +
 xen/include/xen/vpci.h                |  6 ++++++
 16 files changed, 102 insertions(+)

-- 
2.34.1




 

