
Re: [Xen-devel] [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm




On 2014/12/3 3:39, Konrad Rzeszutek Wilk wrote:
On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
This should be controlled by a new global parameter, 'pci_rdmforce'.

pci_rdmforce = 1 => force the check (0 by default).

'1' means we force a check to reserve all such ranges; if that fails,
the VM will not be created. This also gives the user a chance to make
later hotplug work properly, even if no device is assigned while
creating the VM.

But we can override that for one specific PCI device:

pci = [ 'AA:BB.CC,rdmforce=0/1' ]

This 'rdmforce' should be 1 by default, since obviously any passthrough
device always needs this. Actually no one really wants to set it to '0',
so it may be unnecessary, but I'd like to leave it as a potential
approach.

So this domctl provides a way for the tools to control how reserved
device memory is populated.

Note we always post a message to the user about this once we own an
RMRR.

Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
---
  docs/man/xl.cfg.pod.5              |  6 +++++
  docs/misc/vtd.txt                  | 15 ++++++++++++
  tools/libxc/include/xenctrl.h      |  6 +++++
  tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
  tools/libxl/libxl_create.c         |  3 +++
  tools/libxl/libxl_dm.c             | 47 ++++++++++++++++++++++++++++++++++++++
  tools/libxl/libxl_internal.h       |  4 ++++
  tools/libxl/libxl_types.idl        |  2 ++
  tools/libxl/libxlu_pci.c           |  2 ++
  tools/libxl/xl_cmdimpl.c           | 10 ++++++++

In the past we had split the hypervisor and the
toolstack patches in two. So that one could focus
on the hypervisor ones first, and then in another
patch on the toolstack.


Yes.

But perhaps this was intended to be in one patch?

This change also involves docs, so it's a little bit harder to understand the whole picture if we split it.


  xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
  xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
  xen/include/asm-x86/hvm/domain.h   |  4 ++++

I don't see ARM here? Should there be an ARM variant of this? If not

ARM doesn't need this feature.

should the toolstack ones only run under x86?

And I don't think this breaks the current ARM path either; the call would simply return, since ARM doesn't have such a hypercall handler.


  xen/include/public/domctl.h        | 21 +++++++++++++++++
  xen/xsm/flask/hooks.c              |  1 +
  15 files changed, 196 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 622ea53..9adc41e 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
  D0-D3hot power management states for the PCI device. False (0) by
  default.

+=item B<rdmforce=BOOLEAN>
+
+(HVM/x86 only) Specifies that the VM would force to check and try to

s/force/forced/

I guess you're saying 'be forced'.

+reserve all reserved device memory, like RMRR, associated to the PCI
+device. False (0) by default.

Not sure I understand. How would the VM be forced to do this? Or is
it that the hvmloader would force to do this? And if it fails (as you

Yes.

say 'try') ? What then?

In most cases we can reserve these regions, but if an RMRR region overlaps a fixed range, like the guest BIOS, we can't succeed.
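
(To make that failure mode concrete, below is a minimal standalone sketch of the kind of overlap check involved. This is not the actual hvmloader code, and the address values are invented purely for illustration.)

    /* Illustrative only: does an RMRR overlap a fixed guest range, e.g. the
     * guest BIOS area?  The address values are made up for this example. */
    #include <stdint.h>
    #include <stdio.h>

    struct range {
        uint64_t start, end;    /* [start, end) */
    };

    static int ranges_overlap(struct range a, struct range b)
    {
        return a.start < b.end && b.start < a.end;
    }

    int main(void)
    {
        struct range guest_bios = { 0xE0000, 0x100000 };  /* hypothetical */
        struct range rmrr       = { 0xE8000, 0xE9000 };   /* hypothetical */

        if ( ranges_overlap(rmrr, guest_bios) )
            printf("RMRR overlaps a fixed guest range: cannot reserve it\n");
        else
            printf("RMRR can be reserved in the guest address space\n");
        return 0;
    }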


+
  =back

  =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..23544d5 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,21 @@ in the config file:
  To override for a specific device:
        pci = [ '01:00.0,msitranslate=0', '03:00.0' ]

+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+The BIOS controls some devices in terms of some reginos of memory used for

Could you elaborate what 'some devices' are? Network cards? GPUs? What
are the most commons ones.

Some legacy USB devices used for PS2 emulation, and GPUs, which have stolen memory, as I remember.


s/reginos/regions/

Fixed.


And by regions you mean BAR regions?

No. I guess you want to know some background about RMRR :)

There's a good brief description in Linux:

What is RMRR?
-------------

There are some devices the BIOS controls, for e.g USB devices to perform
PS2 emulation. The regions of memory used for these devices are marked
reserved in the e820 map. When we turn on DMA translation, DMA to those
regions will fail. Hence BIOS uses RMRR to specify these regions along with
devices that need to access these regions. OS is expected to setup
unity mappings for these regions for these devices to access these regions.
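
(For readers unfamiliar with the term, "unity mappings" just means 1:1 IOMMU mappings covering the RMRR range. A rough standalone sketch is below; map_identity_page() is a made-up stand-in for the real IOMMU mapping primitive, not an actual Xen or Linux function.)

    /* Illustrative only: walk an RMRR and map every page frame 1:1.
     * map_identity_page() is a hypothetical stand-in for the real IOMMU
     * page-table update. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    static void map_identity_page(uint64_t pfn)
    {
        /* Real code would install an IOMMU mapping pfn -> pfn here. */
        printf("map pfn %#llx -> %#llx\n",
               (unsigned long long)pfn, (unsigned long long)pfn);
    }

    static void rmrr_unity_map(uint64_t base_address, uint64_t end_address)
    {
        uint64_t pfn;

        /* RMRR base/end addresses in the DMAR table are inclusive. */
        for ( pfn = base_address >> PAGE_SHIFT;
              pfn <= end_address >> PAGE_SHIFT;
              pfn++ )
            map_identity_page(pfn);
    }

    int main(void)
    {
        rmrr_unity_map(0xE8000, 0xE8FFF);   /* hypothetical RMRR */
        return 0;
    }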


+these devices. This kind of region should be reserved before creating a VM
+to make sure they are not occupied by RAM/MMIO to conflict, and also we can

You said 'This' but here you are using the plural ' are'. IF you want it plural
it needs to be 'These regions'

Thanks for your correction.

+create necessary IOMMU table successfully.
+
+To enable this globally, add "pci_rdmforce" in the config file:
+
+       pci_rdmforce = 1         (default is 0)

The guest config file? Or /etc/xen/xl.conf ?

The guest config file. Here I just follow the existing 'pci_msitranslate' usage, since they work the same way.


+
+Or just enable for a specific device:
+       pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
+

  Caveat on Conventional PCI Device Passthrough
  ---------------------------------------------
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h

[snip]

--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -34,6 +34,7 @@
  #include <xen/tasklet.h>
  #include <xsm/xsm.h>
  #include <asm/msi.h>
+#include <xen/stdbool.h>

  struct pci_seg {
      struct list_head alldevs_list;
@@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
          }
          break;

+    case XEN_DOMCTL_set_rdm:
+    {
+        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
+        struct xen_guest_pcidev_info *pcidevs = NULL;
+        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
+
+        if ( d == NULL )
+            return -ESRCH;
+

What if this is called on a PV domain?

Currently we just support this in HVM, so I'd like to add this,

         if ( d == NULL )
             return -ESRCH;

+        ASSERT( is_hvm_domain(d) );
+


You are also missing the XSM checks.

Please see the XSM hook change below.


What if this is called multiple times. Is it OK to over-ride
the 'pci_force' or should it stick once?

It should be fine, since only xc/hvmloader need this information while creating a VM.

And currently we only call this once to set it, so why would we need to call it again and again? I think anyone who wants to extend it to the case you're worried about should understand the effects before taking action, right?



+        d->arch.hvm_domain.pci_force =
+                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;

Won't we crash here if this is called for PV guests?

After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem should be gone.


+        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;

What if the 'num_pcidevs' has some bogus value. You need to check for that.

This value is grabbed from the existing assign_device interface, so it is already checked there.



+        d->arch.hvm_domain.pcidevs = NULL;

Please first free it. It might be that the toolstack
is doing this a couple of times. You don't want to leak memory.


Okay,

+        if ( d->arch.hvm_domain.pcidevs )
+            xfree(d->arch.hvm_domain.pcidevs);


+
+        if ( xdsr->num_pcidevs )
+        {
+            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
+                                    xdsr->num_pcidevs);
+            if ( pcidevs == NULL )
+            {
+                rcu_unlock_domain(d);
+                return -ENOMEM;

But you already have set 'num_pcidevs' to some value. This copying/check
should be done before you modify 'd->arch.hvm_domain'...

This makes sense, so I'll move this fragment down.

+            }
+
+            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
+                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
+            {
+                xfree(pcidevs);
+                rcu_unlock_domain(d);

Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.

+                return -EFAULT;
+            }
+        }
+
+        d->arch.hvm_domain.pcidevs = pcidevs;
+        rcu_unlock_domain(d);
+    }
+        break;
+
      case XEN_DOMCTL_assign_device:
          if ( unlikely(d->is_dying) )
          {
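
(For reference, a rough sketch of the reordered handler, combining the comments above: the HVM-only assertion, validating and copying the guest buffer before touching d->arch.hvm_domain, and freeing any previously installed array. This is only an illustration of the suggested ordering, not the final patch.)

    case XEN_DOMCTL_set_rdm:
    {
        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
        struct xen_guest_pcidev_info *pcidevs = NULL;
        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);

        if ( d == NULL )
            return -ESRCH;

        ASSERT( is_hvm_domain(d) );

        /* A sanity check on xdsr->num_pcidevs could also go here. */
        if ( xdsr->num_pcidevs )
        {
            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
                                    xdsr->num_pcidevs);
            if ( pcidevs == NULL )
            {
                rcu_unlock_domain(d);
                return -ENOMEM;
            }

            /* copy_from_guest() takes an element count. */
            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
                                 xdsr->num_pcidevs) )
            {
                xfree(pcidevs);
                rcu_unlock_domain(d);
                return -EFAULT;
            }
        }

        /* Only now touch the domain; free any previous array first. */
        if ( d->arch.hvm_domain.pcidevs )
            xfree(d->arch.hvm_domain.pcidevs);
        d->arch.hvm_domain.pci_force =
            (xdsr->flags & PCI_DEV_RDM_CHECK) ? true : false;
        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
        d->arch.hvm_domain.pcidevs = pcidevs;

        rcu_unlock_domain(d);
    }
        break;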
diff --git a/xen/drivers/passthrough/vtd/dmar.c 
b/xen/drivers/passthrough/vtd/dmar.c
index 1152c3a..5e41e7a 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                          "  RMRR region: base_addr %"PRIx64
                          " end_address %"PRIx64"\n",
                          rmrru->base_address, rmrru->end_address);
+            /*
+             * TODO: we may provide a precise paramter just to reserve

s/paramter/parameter/

Fixed.

+             * RMRR range specific to one device.
+             */
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    "So please set pci_rdmforce to reserve these ranges"
+                    " if you need such a device in hotplug case.\n");

s/hotplug/passthrough


'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this 
device.'

But then this is going to be a bit verbose, so perhaps:

'Ranges %lx-%lx need rdmforce to properly work.' ?

It's unnecessary to output the range again since we already have such a print message here.


+
              acpi_register_rmrr_unit(rmrru);
          }
      }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 2757c7f..38530e5 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -90,6 +90,10 @@ struct hvm_domain {
      /* Cached CF8 for guest PCI config cycles */
      uint32_t                pci_cf8;


Maybe a comment explaining its purpose?

Okay.

/* Force to check/reserve Reserved Device Memory. */
    bool_t                  pci_force;


+    bool_t                  pci_force;
+    uint32_t                num_pcidevs;
+    struct xen_guest_pcidev_info      *pcidevs;
+

You are also missing freeing of this in the hypervisor when the guest
is destroyed. Please fix that.

You're right. I will fix that in the next revision.
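
(A minimal sketch of that cleanup, illustrative only; where exactly it lands in the HVM domain destruction path is left to the next revision.)

    /* On domain destruction, release the array installed by
     * XEN_DOMCTL_set_rdm so it does not leak. */
    xfree(d->arch.hvm_domain.pcidevs);
    d->arch.hvm_domain.pcidevs = NULL;
    d->arch.hvm_domain.num_pcidevs = 0;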


      struct pl_time         pl_time;

      struct hvm_io_handler *io_handler;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 57e2ed7..ba8970d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
  typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
  DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);

+/* Currently just one bit to indicate force to check Reserved Device Memory. */

Not sure I understand. Did you mean:

'Check Reserved Device Memory'.

I can change this to '...force checking Reserved Device Memory.'


What happens if you do not have this flag? What are the semantics
of this hypercall - as in what will it mean.

Without this flag, devices that have an associated RMRR can't work in the passthrough case.


+#define PCI_DEV_RDM_CHECK   0x1
+struct xen_guest_pcidev_info {
+    uint16_t    seg;
+    uint8_t     bus;
+    uint8_t     devfn;
+    uint32_t    flags;
+};
+typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
+/* Control whether/how we check and reserve device memory. */
+struct xen_domctl_set_rdm {
+    uint32_t    flags;

What is this 'flags' field's purpose compared to 'pcidevs.flags'? Please
explain.

I replied something to Kevin, and we just need a global flag so we can remove pcidevs.flags.


+    uint32_t    num_pcidevs;
+    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
+};
+typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
+
  /* Pass-through interrupts: bind real irq -> hvm devfn. */
  /* XEN_DOMCTL_bind_pt_irq */
  /* XEN_DOMCTL_unbind_pt_irq */
@@ -1070,6 +1089,7 @@ struct xen_domctl {
  #define XEN_DOMCTL_setvnumainfo                  74
  #define XEN_DOMCTL_psr_cmt_op                    75
  #define XEN_DOMCTL_arm_configure_domain          76
+#define XEN_DOMCTL_set_rdm                       77
  #define XEN_DOMCTL_gdbsx_guestmemio            1000
  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1135,6 +1155,7 @@ struct xen_domctl {
          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
          struct xen_domctl_vnuma             vnuma;
          struct xen_domctl_psr_cmt_op        psr_cmt_op;
+        struct xen_domctl_set_rdm           set_rdm;
          uint8_t                             pad[128];
      } u;
  };
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index d48463f..5a760e2 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
      case XEN_DOMCTL_test_assign_device:
      case XEN_DOMCTL_assign_device:
      case XEN_DOMCTL_deassign_device:
+    case XEN_DOMCTL_set_rdm:

There is more to XSM than just this file.

But I don't see any other places to change, just like for XEN_DOMCTL_assign_device.


Please compile with XSM enabled.

Anyway, I added XSM_ENABLE = y and FLASK_ENABLE = y in Config.mk and then recompiled, but it looks good.

Anything I'm missing?

  #endif
          return 0;


Also how does this work with 32-bit dom0s? Is there a need to use the
compat layer?

Are you referring to the XSM case? Or something else?

Actually this new DOMCTL is similar to XEN_DOMCTL_assign_device in some respects, but I don't see the issue you're pointing to.

Thanks
Tiejun

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

