[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] Re: [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough
On Tue, May 10, 2011 at 09:29:50AM +0100, Ian Campbell wrote: > On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote: > > # HG changeset patch > > # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > # Date 1304518568 14400 > > # Node ID ba218fa1a48ed682651fd90dd8eeb68eeab68e7a > > # Parent b6af9b428bb16c4c5364ace0617923ffa44ad887 > > libxl: Add support for passing in the machine's E820 for PCI passthrough > > > > The code that populates E820 is unconditionally triggered by the guest > > configuration having "pci=['<BDF>,..']", being a PV guest, and if > > b_info->u.pv.machine_e820 is set. > > > > The code do_domain_create calls the libxl__e820_alloc when > > it notices that the guest is PV, has at least one PCI devices, and has > > the machine_e820 flag set. > > > > libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the > > systems > > E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as > > well remove any E820_RAM or E820_UNUSED regions as the guest does not need > > to > > know about them. The guest only needs the E820_ACPI, E820_NVS, > > E820_RESERVED to > > get an idea of where the PCI I/O space is. Mostly.. The Linux kernel > > assumes that any > > gap in the E820 is considered PCI I/O space which means that if we pass > > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the > > gap between 2GB and 3GB will be considered as PCI I/O space. To guard > > against > > that we also create an E820_UNUSABLE between the region of 'target_kb' > > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] > > region. > > Lastly, the xc_domain_set_memory_map is called to install the new E820. > > Do we need to document the requirements this places on a PV guest > somewhere more prominent? > > Phrases like "up to the first E820...." make me worry that this will > only work on "sensible" machines, but I suppose we can cross those > bridges when we come to them... I did test it with non-sensible machines that have a E820 peppered with E820_RAM amongs the E820_RESERVED. The last patches makes that work. > > > When tested with another PV guest (NetBSD 5.1) the modified E820 gave > > it no trouble. The code has also been tested with older "classic" Xen Linux > > and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid, > > Debian Squeeze, 2.6.37, 2.6.38, 2.6.39). > > > > Memory that is slack or for balloon (so 'maxmem' in guest configuration) > > is put behind the machine E820. Which in most cases is after the 4GB. > > > > The reason for doing the fetching of the E820 using the hypercall in > > the toolstack (instead of the guest doing it) is that when a guest > > would do a hypercall to 'XENMEM_machine_memory_map' it would > > retrieve an E820 with I/O range caps added in. Meaning that the > > region after 4GB up to end of possible memory would be marked as unusable > > and the kernel would not have any space to allocate a balloon > > region. > > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > > > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl.idl > > --- a/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400 > > +++ b/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400 > > @@ -180,6 +180,7 @@ libxl_domain_build_info = Struct("domain > > ("cmdline", string), > > ("ramdisk", libxl_file_reference), > > ("features", string, True), > > + ("machine_e820", bool, False, "Use > > machine's E820 for PCI passthrough."), > > ])), > > ])), > > ], > > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_create.c > > --- a/tools/libxl/libxl_create.c Wed May 04 10:16:08 2011 -0400 > > +++ b/tools/libxl/libxl_create.c Wed May 04 10:16:08 2011 -0400 > > @@ -519,6 +519,14 @@ static int do_domain_create(libxl__gc *g > > for (i = 0; i < d_config->num_pcidevs; i++) > > libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1); > > > > + if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) { > > + int rc = 0; > > + rc = libxl__e820_alloc(ctx, domid, d_config); > > rc is pretty comprehensively initialised ;-0 Duh! > > > + if (rc) > > + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > > + "Failed while collecting E820 with: %d (errno:%d)\n", > > + rc, errno); > > LIBXL__LOG_ERRNO takes care of logging errno for you. OK. > > > + } > > if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader > > )) { > > if ( (*cb)(ctx, domid, priv) ) > > goto error_out; > > > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_pci.c > > --- a/tools/libxl/libxl_pci.c Wed May 04 10:16:08 2011 -0400 > > +++ b/tools/libxl/libxl_pci.c Wed May 04 10:16:08 2011 -0400 > > @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx > > free(pcidevs); > > return 0; > > } > > + > > +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[], > > + uint32_t *nr_entries, > > + unsigned long map_limitkb, > > + unsigned long balloon_kb) > > +{ > > + uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end; > > + uint32_t i, idx = 0, nr; > > + struct e820entry e820[E820MAX]; > > + > > + if (!src || !map_limitkb || !balloon_kb || !nr_entries) > > + return ERROR_INVAL; > > + > > + nr = *nr_entries; > > + if (!nr) > > + return ERROR_INVAL; > > + > > + if (nr > E820MAX) > > + return ERROR_NOMEM; > > + > > + /* Weed out anything under 16MB */ > > + for (i = 0; i < nr; i++) { > > + if (src[i].addr > 0x100000) > > 0x100000 is 1MB not 16MB. I'm not sure if the code or the comment is > correct. Comment should have said 1MB. > > > + continue; > > + > > + src[i].type = 0; > > + src[i].size = 0; > > + src[i].addr = -1ULL; > > + } > > + > > + /* Find the lowest and highest entry in E820, skipping over > > + * undersired entries. */ > > undesired ? > > > + start = -1ULL; > > + last = 0; > > + for (i = 0; i < nr; i++) { > > + if ((src[i].type == E820_RAM) || > > + (src[i].type == E820_UNUSABLE) || > > + (src[i].type == 0)) > [...] > > nr = idx; > > + > > + for (i = 0; i < nr; i++) { > > + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]", > > + e820[i].type == E820_RAM ? "RAM " : > > + (e820[i].type == E820_RESERVED ? "RSV " : > > + e820[i].type == E820_ACPI ? "ACPI" : > > + (e820[i].type == E820_NVS ? "NVS " : > > + (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))), > > + e820[i].addr >> 12, > > + (e820[i].addr + e820[i].size) >> 12); > > This screams out for a e820_type_as_string style function or a lookup > table, or just about anything else really ;-). <nods> I had it at some point in the patches, not sure why it got lost. Let me add it back in. > Also you can use "%-4s" instead of manually aligning all the strings. /me nods. > > > + } > > + > > + /* Done: copy the sanitized version. */ > > + *nr_entries = nr; > > + memcpy(src, e820, nr * sizeof(struct e820entry)); > > + return 0; > > +} > > + > > Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |