
Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)



On Tue, Sep 03, 2013 at 09:35:50PM +0100, Gordan Bobic wrote:
> First attempt at a test run predictably failed. I added e820_host=1
> to a VM config and tried starting it:
> 
> [root@normandy ~]# xl create /etc/xen/edi
> Parsing config from /etc/xen/edi
> libxl: error: libxl_x86.c:307:libxl__arch_domain_create: Failed while collecting E820 with: -3 (errno:-1)
> 
> libxl: error: libxl_create.c:901:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: error: libxl_dm.c:1300:libxl__destroy_device_model: could not find device-model's pid for dom 1
> libxl: error: libxl.c:1415:libxl__destroy_domid: libxl__destroy_device_model failed for 1
> 
> xl-edi.log, qemu-dm-edi.log attached.
> Both actually look identical to previous logs before the patch.
> 
> Is this something that is clearly a consequence of the patch being
> incomplete? Or did I break something?

You are missing the hypervisor patch to set the E820 for HVM guests.
http://lists.xen.org/archives/html/xen-devel/2013-05/msg01603.html

That should make it possible to "stash" the E820 in the hypervisor.

After that, hvmloader will need to issue the XENMEM_memory_map
hypercall to get the E820 and do something with it.


Oh, and something like this probably should do it - not compile tested
in any way:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1fcaed0..7b38890 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3146,6 +3146,7 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = do_memory_op(cmd, arg);
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
@@ -3216,10 +3217,10 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = compat_memory_op(cmd, arg);
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..86fb20a 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -68,16 +68,42 @@ void dump_e820_table(struct e820entry *e820, unsigned int nr)
     }
 }
 
+static const char *e820_names(int type)
+{
+    switch (type) {
+        case E820_RAM: return "RAM";
+        case E820_RESERVED: return "Reserved";
+        case E820_ACPI: return "ACPI";
+        case E820_NVS: return "ACPI NVS";
+        case E820_UNUSABLE: return "Unusable";
+        default: break;
+    }
+    return "Unknown";
+}
+
+
 /* Create an E820 table based on memory parameters provided in hvm_info. */
 int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
     unsigned int nr = 0;
+    struct xen_memory_map op;
+    struct e820entry map[E820MAX];
+    int rc;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
 
+    op.nr_entries = E820MAX; /* capacity of map[]; updated on return */
+    set_xen_guest_handle(op.buffer, map);
+
+    rc = hypercall_memory_op ( XENMEM_memory_map, &op);
+    if ( rc != -ENOSYS) { /* It works!? */
+        int i;
+        for ( i = 0; i < op.nr_entries; i++ )
+            printf("    %lx -> %lx %s\n", map[i].addr >> 12,
+                   (map[i].addr + map[i].size) >> 12, e820_names(map[i].type));
+    }
     /* Lowmem must be at least 512K to keep Windows happy) */
     ASSERT ( lowmem_reserved_base > 512<<10 );
 
> 
> Gordan
> 
> On 09/03/2013 08:47 PM, Gordan Bobic wrote:
> >On 09/03/2013 03:59 PM, Konrad Rzeszutek Wilk wrote:
> >
> >>>>>2) Further, I'm finding myself motivated to write that
> >>>>>auto-set (as opposed to hard coded) vBAR=pBAR patch discussed
> >>>>>briefly a week or so ago (have an init script read the BAR
> >>>>>info from dom0 and put it in xenstore, plus a patch to
> >>>>>make pBAR=vBAR reservations built dynamically rather than
> >>>>>statically, based on this data). Now, I'm quite fluent in C,
> >>>>>but my familiarity with Xen source code is nearly non-existent
> >>>>>(limited to studying an old unsupported patch every now and then
> >>>>>in order to make it apply to a more recent code release).
> >>>>>Can anyone help me out with a high level view WRT where
> >>>>>this would be best plumbed in (which files and the flow of
> >>>>>control between the affected files)?
> >>>>
> >>>>hvmloader probably and the libxl e820 code. What from a
> >>>>high view needs to happen is that:
> >>>>1). Need to relax the check in libxl for e820_hole
> >>>>    to also do it for HVM guests. Said code just iterates over the
> >>>>    host E820 and sanitizes it a bit and makes a E820 hypercall to
> >>>>    set it for the guest.
> >[snip]
> >
> >OK, I have attached a preliminary patch against 4.3.0 for the libxl
> >part. It compiles. I haven't tried running it to see if it actually
> >works or does something, but my packages build.
> >
> >Please let me know if I've missed anything. On its own, I don't think
> >this patch will do much (apart from maybe break HVM hosts with
> >e820_host=1 set).
> >
> >>>>2). Figure out whether the E820 hypercall (which sets the E820
> >>>>    layout for a guest) can be run on HVM guests. I think it
> >>>>    could not and Mukesh in his PVH patches posted a patch
> >>>>    to enable that - "..Move e820 fields out of pv_domain struct"
> >
> >Is this already in 4.3.0 or is this an out-of-tree patch? Do you have a
> >link to it handy?
> >
> >>>>3). Hvmloader should do an E820 get machine memory hypercall
> >>>>    to see if there is anything there. If there is - that means
> >>>>    the toolstack has requested a "new" type of E820. Iterate
> >>>>    over the E820 and make it look like that.
> >>>>    You can look in the Linux arch/x86/xen/setup.c to see how
> >>>>    it does that.
> >>>>
> >>>>    The complication there is that hvmloader needs to fit the
> >>>>    ACPI code (the guest type one) and such.
> >>>>    Presumably you can just re-use the existing spaces that
> >>>>    the host has marked as E820_RESERVED or E820_ACPI..
> >>>
> >>>Yup, I get it. Not only that, but it should also ideally (not
> >>>strictly necessary, but it'd be handy) map the IOMEM for devices
> >>>it is passed so that pBAR=vBAR (as opposed to just leaving all
> >>>the host e820 reserved areas well alone - which would work for
> >>>most things).
> >>
> >>Yes. That is an extra complication that could be done in subsequent
> >>patches. But in theory if you have the E820 mirrored from the host the
> >>pBAR=vBAR should be easy enough as the values from the host BARs can
> >>easily fit in the E820 gaps.
> >
> >Agreed. Let's leave the pBAR=vBAR part for a separate patch set. I'll
> >have to figure out a sensible way to query the IOMEM regions for each of
> >the devices passed to the VM and make sure they are in the same hole.
> >
> >>>>   Then the SMBIOS would need to move, and the BIOS
> >>>>   might need to be relocated - but I think those are relocatable
> >>>>   in some form.
> >
> >[bit above left for later reference]
> >
> >>>>Well, I am more than happy to help you with this.
> >>>
> >>>Thanks, much appreciated. :)
> >>
> >>Yeeey! Vict^H^H^H^Hvolunteer :-)! <maniacal laughter in the background>
> >>
> >>I am also reachable on IRC (FreeNode mostly) as either darnok or konrad
> >>if that would be more convenient to discuss this.
> >
> >Thanks. I'll keep that in mind. :)
> >
> >Gordan
> >
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@xxxxxxxxxxxxx
> >http://lists.xen.org/xen-devel
> >
> 

> domid: 1
> Using file /dev/zvol/ssd/edi in read-write mode
> Watching /local/domain/0/device-model/1/logdirty/cmd
> Watching /local/domain/0/device-model/1/command
> Watching /local/domain/1/cpu
> char device redirected to /dev/pts/3
> qemu_map_cache_init nr_buckets = 10000 size 4194304
> shared page at pfn feffd
> buffered io page at pfn feffb
> Guest uuid = a57e6840-e9f5-4a14-a822-b2cc662c177f
> populating video RAM at ff000000
> mapping video RAM from ff000000
> Register xen platform.
> Done register platform.
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is rw state.
> xs_read(/local/domain/0/device-model/1/xen_extended_power_mgmt): read error
> xs_read(): vncpasswd get error. /vm/a57e6840-e9f5-4a14-a822-b2cc662c177f/vncpasswd.
> Log-dirty: no command yet.
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> vcpu-set: watch node error.
> [xenstore_process_vcpu_set_event]: /local/domain/1/cpu has no CPU!
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> xs_read(/local/domain/1/log-throttling): read error
> qemu: ignoring not-understood drive `/local/domain/1/log-throttling'
> medium change watch on `/local/domain/1/log-throttling' - unknown device, ignored
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> dm-command: hot insert pass-through pci dev 
> register_real_device: Assigning real physical device 08:00.0 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No such file or directory: 0x8:0x0.0x0
> pt_register_regions: IO region registered (size=0x02000000 base_addr=0xf8000000)
> pt_register_regions: IO region registered (size=0x08000000 base_addr=0xb800000c)
> pt_register_regions: IO region registered (size=0x04000000 base_addr=0xb400000c)
> pt_register_regions: IO region registered (size=0x00000080 base_addr=0x0000cf81)
> pt_register_regions: Expansion ROM registered (size=0x00080000 base_addr=0xfbc00000)
> pci_intx: intx=1
> register_real_device: Real physical device 08:00.0 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev 
> register_real_device: Assigning real physical device 08:00.1 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No such file or directory: 0x8:0x0.0x1
> pt_register_regions: IO region registered (size=0x00004000 base_addr=0xfbcfc000)
> pci_intx: intx=2
> register_real_device: Real physical device 08:00.1 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev 
> register_real_device: Assigning real physical device 0c:00.0 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No such file or directory: 0xc:0x0.0x0
> pt_register_regions: IO region registered (size=0x00004000 base_addr=0xd7efc000)
> pci_intx: intx=1
> register_real_device: Real physical device 0c:00.0 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev 
> register_real_device: Assigning real physical device 00:1a.1 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No such file or directory: 0x0:0x1a.0x1
> pt_register_regions: IO region registered (size=0x00000020 base_addr=0x00008a01)
> pci_intx: intx=2
> register_real_device: Real physical device 00:1a.1 registered successfuly!
> IRQ type = INTx
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1 first_map=1
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3 first_map=1
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0 first_map=1
> vga s->lfb_addr = ef000000 s->lfb_end = ef800000
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0 first_map=1
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0 first_map=1
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=1
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=1
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is rw state.
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is ro state.
> Unknown PV product 2 loaded in guest
> PV driver build 1
> region type 0 at [ef880000,ef8a0000).
> squash iomem [ef880000, ef8a0000).
> region type 1 at [c180,c1c0).
> vga s->lfb_addr = ef000000 s->lfb_end = ef800000 
> pt_iomem_map: e_phys=ffffffff maddr=f8000000 type=0 len=33554432 index=0 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b8000000 type=8 len=134217728 index=1 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b4000000 type=8 len=67108864 index=3 first_map=0
> pt_ioport_map: e_phys=ffff pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0 first_map=0
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1 first_map=0
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3 first_map=0
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0 first_map=0
> pt_pci_write_config: [00:06:0] Warning: Guest attempt to set address to unused Base Address Register. [Offset:30h][Length:4]
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0 first_map=0
> pt_pci_write_config: [00:07:0] Warning: Guest attempt to set address to unused Base Address Register. [Offset:30h][Length:4]
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0 first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> pt_pci_write_config: [00:08:0] Warning: Guest attempt to set address to unused Base Address Register. [Offset:30h][Length:4]
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=f8000000 type=0 len=33554432 index=0 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b8000000 type=8 len=134217728 index=1 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b4000000 type=8 len=67108864 index=3 first_map=0
> pt_ioport_map: e_phys=ffff pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0 first_map=0
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1 first_map=0
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3 first_map=0
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0 first_map=0
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0 first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0 first_map=0
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0 first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> shutdown requested in cpu_handle_ioreq
> Issued domain 1 poweroff

> Waiting for domain edi (domid 1) to die [pid 8363]
> Domain 1 has shut down, reason code 0 0x0
> Action for shutdown reason code 0 is destroy
> Domain 1 needs to be cleaned up: destroying the domain
> libxl: error: libxl_pci.c:990:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:08:00.0
> libxl: error: libxl_pci.c:990:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:08:00.1
> Done. Exiting now

