[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
    rdm = "strategy=host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Global RDM parameter, "strategy", allows user to specify reserved regions
explicitly, Currently, using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like

CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
CC: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>

* Some rename to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description


* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm


* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5        | 81 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..091e80d 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+=over 4
+=item B<KEY=VALUE>
+Possible B<KEY>s are:
+=over 4
+=item B<strategy="STRING">
+Currently there is only one valid type:
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+In this conflict case,
+#1. If B<strategy> is set to "host", for example,
+rdm = "strategy=host,reserve=strict" or rdm = "strategy=host,reserve=relaxed"
+It means all conflicts will be handled according to the policy
+introduced by B<reserve> as described below.
+#2. If B<strategy> is not set at all, but
+pci = [ 'sbdf_A, rdm_reserve=xxxxx' ]
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_reserve="STRING"> as described inside pci options.
+=item B<reserve="STRING">
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+When that conflict is unsolved,
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For exampl,e Windows IGD GFX driver
+always accessed RDM regions so it leads to VM crash.
+Note this may be overridden by rdm_reserve option in PCI device configuration.
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 Specifies the host PCI devices to passthrough to this guest. Each 
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
+=item B<rdm_reserve="STRING">
+(HVM/x86 only) This is same as reserve option inside the rdm option but
+just specific to a given device. Therefore the default is "relaxed" as
+same as reserve option as well.
+Note this would override global B<rdm> option.
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..c0824a9 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
        pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
+RDM, 'reserved device memory', for PCI Device Passthrough
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+To enable this globally, add "rdm" in the config file:
+    rdm = "strategy=host, reserve=relaxed"   (default policy is "relaxed")
+Or just for a specific device:
+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
+For all the options available to RDM, see xl.cfg(5).
 Caveat on Conventional PCI Device Passthrough
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..b884fa1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+    if (b_info->u.hvm.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        b_info->u.hvm.rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
+        libxl__rdm_setdefault(gc, b_info);
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, 
libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..f0a32fa 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_reserve != LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, 
unsigned int domain, unsigned
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if ( pci->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        pci->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
     return 0;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..546686d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+    (0, "ignore"),
+    (1, "host"),
+    ])
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("strategy",    libxl_rdm_reserve_strategy),
+    ("reserve",     libxl_rdm_reserve_flag),
+    ])
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
 libxl_device_dtdev = Struct("device_dtdev", [

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.