
[Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies:

Global RDM parameter:
    rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter, "type", lets the user specify reserved regions
explicitly, e.g. 'host' includes all reserved regions reported on this
platform, which is useful for handling the hotplug scenario. In the future
this parameter may be extended to allow specifying arbitrary regions,
e.g. even those belonging to another platform, in preparation for live
migration with passthrough devices. By contrast, 'none' means we do nothing
with any reserved regions and ignore all policies, so the guest works as
before.

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward
with a warning message.
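
As an illustration of the strict/relaxed behaviour described above, here is a
minimal sketch. It is not code from this patch: the names rdm_policy,
addr_range, guest_ram_range and check_rdm_conflicts are hypothetical, and it
assumes guest RAM is a single flat range starting at address 0, which ignores
the MMIO hole a real guest has.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for illustration only; not the libxl or Xen API. */
enum rdm_policy { RDM_POLICY_STRICT, RDM_POLICY_RELAXED };

struct addr_range {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/* Assumption: guest RAM is one flat range starting at 0 (no MMIO hole). */
static struct addr_range guest_ram_range(uint64_t mem_mib)
{
    struct addr_range r = { 0, mem_mib * 1024 * 1024 - 1 };
    return r;
}

/* Two inclusive ranges overlap if each one starts before the other ends. */
static int ranges_overlap(struct addr_range a, struct addr_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/*
 * Check every RDM region against guest RAM.
 * Returns 0 if setup may continue, -1 if it must be aborted.
 * 'strict' fails on the first conflict; 'relaxed' only warns.
 */
static int check_rdm_conflicts(struct addr_range ram,
                               const struct addr_range *rdm, size_t nr,
                               enum rdm_policy policy)
{
    for (size_t i = 0; i < nr; i++) {
        if (!ranges_overlap(ram, rdm[i]))
            continue;
        if (policy == RDM_POLICY_STRICT) {
            fprintf(stderr, "error: RDM %#" PRIx64 "-%#" PRIx64
                    " conflicts with guest RAM\n", rdm[i].start, rdm[i].end);
            return -1;
        }
        fprintf(stderr, "warning: RDM %#" PRIx64 "-%#" PRIx64
                " conflicts with guest RAM\n", rdm[i].start, rdm[i].end);
    }
    return 0;
}
```

With the values used in the xl.cfg text of this patch ("memory = 2800" and
RMRR regions ac6d3000-ac6e6fff and ad800000-afffffff), guest RAM under the
flat-range assumption ends at 0xaeffffff, so both regions conflict: the
sketch would abort under 'strict' and continue with warnings under 'relaxed'.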

The default per-device RDM policy is 'strict', while the default global RDM
policy is 'relaxed'. The per-device policy overrides the global policy.

CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
CC: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>

* Just make sure the per-device policy always overrides the global policy,
  and so clean up some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm


* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Grab those changes to xl/libxlu as a final patch

 docs/man/xl.cfg.pod.5        | 84 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 144 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..a2dc343 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,83 @@ assigned slave device.
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+=over 4
+=item B<KEY=VALUE>
+Possible B<KEY>s are:
+=over 4
+=item B<type="STRING">
+Currently there are only two valid types:
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global RDM parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+"none" is the default value and it means we don't check any reserved regions
+and then all rdm policies would be ignored. Guest just works as before and
+the conflict of RDM and guest address space wouldn't be handled, and then
+this may result in the associated device not being able to work or even crash
+the VM. So if you're assigning this kind of device, this option is not
+recommended unless you can make sure any conflict doesn't exist.
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+RMRR region: base_addr ac6d3000 end_address ac6e6fff
+RMRR region: base_addr ad800000 end_address afffffff
+In this conflict case,
+#1. If the type options is set with "none",
+rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
+mean we don't handle any conflict just to make VM keep running as before.
+Note this is our default behavior.
+#2. If the type options is set with "host",
+rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"
+mean all conflict would be handled according to our policies which is
+introduced by the reserve option as described below.
+=item B<reserve="STRING">
+Specifies how to deal with conflicts discovered when reserving reserved device
+memory in the guest address space.
+When that conflict is unsolved,
+"strict" means this VM can't be created successfully, or the associated device
+can't be attached in the case of hotplug;
+"relaxed" allows a VM to be created to keep running with a warning message
+thrown out. But this may crash this VM if this device accesses RDM. For
+example, Windows IGD GFX driver always access these regions so this lead to
+a blue screen to crash VM in such a case.
+Note this may be overridden by rdm_reserve option in PCI device configuration.
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 Specifies the host PCI devices to passthrough to this guest. Each 
@@ -717,6 +794,13 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
+=item B<rdm_reserve="STRING">
+(HVM/x86 only) This is same as reserve option inside the rdm option
+but just specific to a given device, and "strict" is default here.
+Note this would override global B<rdm> option.
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..7d63c47 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
        pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
+RDM, 'reserved device memory', for PCI Device Passthrough
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+To enable this globally, add "rdm" in the config file:
+    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
+Or just for a specific device:
+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
+For all the options available to RDM, see xl.cfg(5).
 Caveat on Conventional PCI Device Passthrough
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..b884fa1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+    if (b_info->u.hvm.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        b_info->u.hvm.rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
+        libxl__rdm_setdefault(gc, b_info);
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..f0a32fa 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_reserve != LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if ( pci->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        pci->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
     return 0;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..0602fff 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ])
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("type",    libxl_rdm_reserve_type),
+    ("reserve",   libxl_rdm_reserve_flag),
+    ])
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
 libxl_device_dtdev = Struct("device_dtdev", [
