Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> Xen tries to load a PVH dom0 kernel at the fixed guest physical address
> from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
> it can be configured.
>
> Unfortunately there exist firmwares that have reserved regions at this
> address, so Xen fails to load the dom0 kernel since it's not RAM.
>
> The PVH entry code is not relocatable - it loads from absolute
> addresses, which fail when the kernel is loaded at a different address.
> With a suitably modified kernel, a relocatable entry point is possible.
>
> Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
> alignment needed for the kernel.  The presence of the NOTE indicates the
> kernel supports a relocatable entry path.
>
> Change the loading to check for an acceptable load address.  If the
> kernel is relocatable, support finding an alternate load address.
>
> Link: https://gitlab.com/xen-project/xen/-/issues/180
> Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
> ---
> ELF Note printing looks like:
> (XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align:
> 0x200000
>
> v2:
> Use elfnote for min, max & align - use 64bit values.
> Print original and relocated memory addresses
> Use check_and_adjust_load_address() name
> Return relocated base instead of offset
> Use PAGE_ALIGN
> Don't load above max_phys (expected to be 4GB in kernel elf note)
> Use single line comments
> Exit check_load_address loop earlier
> Add __init to find_kernel_memory()
> ---
>  xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
>  xen/common/libelf/libelf-dominfo.c |  13 ++++
>  xen/include/public/elfnote.h       |  11 +++
>  xen/include/xen/libelf.h           |   3 +
>  4 files changed, 135 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index 0ceda4140b..5c6c0d2db3 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>      return INVALID_PADDR;
>  }
>
> +static bool __init check_load_address(
> +    const struct domain *d, const struct elf_binary *elf)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;

Are you sure this is correct?  If a program header specifies a non-4K
aligned load address we should still try to honor it.  I think this is
very unlikely, but still we shouldn't apply non-requested alignments to
addresses coming from the ELF headers.

> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base +
> +                                    elf->dest_size);
> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.

Relying on sizes to be page aligned seems fragile: it might work now
because of the order in which pvh_setup_vmx_realmode_helpers() first
reserves memory for the TSS and afterwards for the identity page tables,
but it's not a property this code should assume.
> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +
> +        if ( start >= kernel_end )
> +            return false;
> +
> +        if ( start <= kernel_start &&
> +             end >= kernel_end &&
> +             d->arch.e820[i].type == E820_RAM )

Nit: I would maybe do the type check before the boundary ones, as it's
pointless to do boundary checking on a region of a non-RAM type.  But I
guess you could also see it the other way around.

> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,

const for elf also.

> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base +
> +                                    elf->dest_size);
> +    paddr_t kernel_size = kernel_end - kernel_start;

Hm, I'm again unsure about the alignments applied here.  I think if
anything we want to assert that dest_base is aligned to phys_align.

> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.
> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM )
> +            continue;
> +
> +        if ( d->arch.e820[i].size < kernel_size )
> +            continue;

You can unify both checks in a single condition.

> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
> +        kend = kstart + kernel_size;
> +
> +        if ( kend > parms->phys_max )
> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Check the kernel load address, and adjust if necessary and possible.
> + */
> +static bool __init check_and_adjust_load_address(
> +    const struct domain *d, struct elf_binary *elf,
> +    struct elf_dom_parms *parms)
> +{
> +    paddr_t reloc_base;
> +
> +    if ( check_load_address(d, elf) )
> +        return true;
> +
> +    if ( parms->phys_align == UNSET_ADDR )
> +    {
> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
> +        return false;
> +    }
> +
> +    reloc_base = find_kernel_memory(d, elf, parms);
> +    if ( reloc_base == 0 )
> +    {
> +        printk("Failed find a load address for the kernel\n");

Since you print the domain in the error message prior to this one, I
would consider also printing it here (or not printing it in both
cases).

> +        return false;
> +    }
> +
> +    if ( opt_dom0_verbose )
> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
> +               (paddr_t)elf->dest_base,
> +               (paddr_t)elf->dest_base + elf->dest_size,
> +               reloc_base, reloc_base + elf->dest_size);

I think you need `- 1` for the end calculation if you use inclusive
ranges [, ].  Otherwise [, ) should be used.

> +
> +    parms->phys_entry += (reloc_base - (paddr_t)elf->dest_base);

This seems to assume that the image is always relocated at a higher
address than the original one?

parms->phys_entry = reloc_base + (parms->phys_entry - elf->dest_base);

> +    elf->dest_base = (char *)reloc_base;
> +
> +    return true;
> +}
> +
>  static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>                                    unsigned long image_headroom,
>                                    module_t *initrd, void *image_base,
> @@ -585,6 +687,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>      elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
>      elf.dest_size = parms.virt_kend - parms.virt_kstart;
>
> +    if ( !check_and_adjust_load_address(d, &elf, &parms) )
> +    {
> +        printk("Unable to load kernel\n");

check_and_adjust_load_address() already prints an error message,
probably no need to print another message.
> +        return -ENOMEM;
> +    }
> +
>      elf_set_vcpu(&elf, v);
>      rc = elf_load_binary(&elf);
>      if ( rc < 0 )
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 7cc7b18a51..837a1b0f21 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>          [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
>          [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
>          [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
> +        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
>      };
>      /* *INDENT-ON* */
>
> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>                  elf_note_numeric_array(elf, note, 8, 0),
>                  elf_note_numeric_array(elf, note, 8, 1));
>          break;
> +
> +    case XEN_ELFNOTE_PVH_RELOCATION:
> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> +            return -1;
> +
> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);

Size for those needs to be 4 (32bits) as the entry point is in 32bit
mode?  I don't see how we can start past the 4GB boundary.

> +        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
> +                parms->phys_min, parms->phys_max, parms->phys_align);
> +        break;
>      }
>      return 0;
>  }
> @@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>      parms->p2m_base = UNSET_ADDR;
>      parms->elf_paddr_offset = UNSET_ADDR;
>      parms->phys_entry = UNSET_ADDR32;
> +    parms->phys_align = UNSET_ADDR;

For correctness I would also init phys_{min,max}.

Thanks, Roger.