[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4 of 5] libxl: Add support for passing in the machine's E820 for PCI passthrough



On Fri, Apr 08, 2011 at 09:36:45AM +0100, Ian Campbell wrote:
> On Thu, 2011-04-07 at 21:25 +0100, Konrad Rzeszutek Wilk wrote:
> > # HG changeset patch
> > # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > # Date 1302206363 14400
> > # Node ID 2e464234c94cfd29a98a9011a46d76846b87f7f8
> > # Parent  546d8a03d5cbe0ceddadf701174f2417a0b72891
> > libxl: Add support for passing in the machine's E820 for PCI passthrough.
> > 
> > The code (libxl_e820_alloc) calls the xc_get_machine_memory_map to
> > retrieve the systems E820. Then the E820 is sanitized to weed out E820
> > entries below 16MB, and as well remove any E820_RAM or E820_UNUSED
> > regions as the guest does not need to know abou them. The guest
> > only needs the E820_ACPI, E820_NVS, E820_RESERVED to get an idea of
> > where the PCI I/O space is. Mostly.. The Linux kernel assumes that any
> > gap in the E820 is considered PCI I/O space which means that if we pass
> > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the
> > gap between 2GB and 3GB will be considered as PCI I/O space. To guard 
> > against
> > that we also create an E820_UNUSABLE between the region of 'target_kb'
> > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] 
> > region.
> > 
> > Memory that is slack or for balloon (so 'maxmem' in guest configuration)
> > is put behind the machine E820. Which in most cases is after the 4GB.
> > 
> > The reason for doing the fetching of the E820 using the hypercall in
> > the toolstack (instead of the guest doing it) is that when a Linux
> > guest would do a hypercall to 'XENMEM_machine_memory_map' it would
> > retrieve an E820 with I/O range caps added in. Meaning that the
> > region after 4GB up to end of possible memory would be marked as unusuable
> > and the Linux kernel would not have any space to allocate a balloon
> > region.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > 
> > diff -r 546d8a03d5cb -r 2e464234c94c tools/libxl/libxl.h
> > --- a/tools/libxl/libxl.h       Thu Apr 07 15:18:46 2011 -0400
> > +++ b/tools/libxl/libxl.h       Thu Apr 07 15:59:23 2011 -0400
> > @@ -204,6 +204,14 @@
> >  } libxl_file_reference;
> >  void libxl_file_reference_destroy(libxl_file_reference *p);
> > 
> > +#include <xc_e820.h>
> 
> We are trying to avoid exposing libxc headers to libxl users. struct
> e820entry is well defined and unchanging so I don't think there is much
> harm in duplicating it.

OK.
> 
> > +typedef struct {
> > +    uint32_t nr_entries;
> > +    struct e820entry *entry;
> > +} libxl_e820;
> 
> BTW does the e820entry array need to be packed at some point before
> getting returned to the guest? (I'm not sure what the expected alignment

The struct e820entry has the __packed__ attribute on it.

> is, nor how a 20 byte struct containing two u64s and a u32 naturally
> aligns on x86_64).

It seems to work OK.. (I booted a 32-bit guest under a 64-bit hypervisor
with a 64-bit dom0).
> 
> > diff -r 546d8a03d5cb -r 2e464234c94c tools/libxl/libxl.idl
> > --- a/tools/libxl/libxl.idl     Thu Apr 07 15:18:46 2011 -0400
> > +++ b/tools/libxl/libxl.idl     Thu Apr 07 15:59:23 2011 -0400
> > @@ -22,6 +22,7 @@
> > 
> >  libxl_hwcap = Builtin("hwcap")
> > 
> > +libxl_e820 = Builtin("e820", destructor_fn="libxl_e820_destroy", 
> > passby=PASS_BY_REFERENCE)
> 
> Hmmm, We are starting to collect a lot of builtins of the form {len,
> pointer}. Time to start thinking of a new type class abstraction, I
> think... (not your problem in this series though)
> 
> > diff -r 546d8a03d5cb -r 2e464234c94c tools/libxl/libxl_pci.c
> > --- a/tools/libxl/libxl_pci.c   Thu Apr 07 15:18:46 2011 -0400
> > +++ b/tools/libxl/libxl_pci.c   Thu Apr 07 15:59:23 2011 -0400
> > @@ -37,6 +37,8 @@
> >  #include "libxl_internal.h"
> >  #include "flexarray.h"
> > 
> > +#include <xc_e820.h>
> > +
> >  #define PCI_BDF                "%04x:%02x:%02x.%01x"
> >  #define PCI_BDF_SHORT          "%02x:%02x.%01x"
> >  #define PCI_BDF_VDEVFN         "%04x:%02x:%02x.%01x@%02x"
> > @@ -1047,3 +1049,154 @@
> >      free(pcidevs);
> >      return 0;
> >  }
> > +
> > +#define E820MAX (128)
> 
> Should be somewhere public for the benefit of libxl users who want to
> setup an e820? Even if it should be private it should be in a .h.

Aye aye.
> 
> Are we expecting that libxl users might want to modify the e820? If not
> then why expose libxl_e820_alloc and libxl_e820_sanitize at all, just
> add a flag to the libxl interface which says whether or not to provide a
> host-derived e820.

OK. Where should I put the definitions of these two functions? (They are
called from libxl_dom.c and xl_cmdimpl.c) Is there a internal header file?

> 
> If users are expected to modify it then I'm in two minds about doing the
> sanitize step at domain build time instead of in libxl_e820_alloc. If we

The issue I had was that to construct the e820, I needed the
'target_kb' and 'max_memkb' and 'slack_memkb'. If I can get those two values
when the guest config file is parsed (so xl_cmdimpl.c) I think moving
the sanitization in there (and not doing it during domain build time)
is the right thing to do. Let me see if I can do that.

> provide a default sanitized e820 to the caller and they want to modify
> it to be non-santized should we let them? I guess the question here is
> whether we are sanitizing actually invalid e820 maps or if we are also
> enforcing some higher level policy based on a specific class Linux
> guest's requirements.

Which might be quite different if the guest is a BSD one. I guess
it is time to start launching those NetBSD guests :-)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.