[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/PVH: extend checking in hwdom_fixup_p2m()
On 07.07.2025 16:44, Jan Beulich wrote: > We're generally striving to minimize behavioral differences between PV > and PVH Dom0. Using (just?) is_memory_hole() in the PVH case looks quite > a bit weaker to me, compared to the page ownership check done in the PV > case. Extend checking accordingly. > > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > --- > The addition may actually be suitable to replace the use of > is_memory_hole() here. While dropping that would in particular extend > coverage to E820_RESERVED regions, those are identity-mapped anyway > (albeit oddly enough still by IOMMU code). This really is more of a by-product of me looking into the failing (by default) memory accesses that Dom0 is doing on systems I can put my hands on. Those accesses are indeed occurring in the context of ACPI evaluating e.g. _INI or _STA methods, and at least some of the underlying memory regions also appear in /proc/iomem. Which means that in principle there could be the option of Dom0 informing us of (at least some of) such regions. However, the accesses occur ahead of the kernel actually collecting resource data. In particular, the intention would have been for the to-be-added sub-op to be invoked from drivers/pnp/system.c:reserve_range(). That runs from an fs_initcall though, while the accesses occur from underneath acpi_bus_init() and acpi_scan_init(), both called from acpi_init(), which is a subsys_initcall. It also doesn't look as if re-arranging things in Linux would be reasonably possible. Hence the only (vague) option might be to duplicate some of what is already being done. Yet besides this appearing undesirable to me, even if we hooked such duplicate code up from acpi_arch_init(), that would only cover accesses from underneath acpi_scan_init(); acpi_bus_init() runs yet earlier. Nevertheless, for the sake of completeness I'll reproduce my draft, patch (not having the intended effect) below. For the moment, however, I think I need to give up my hope that we could deal with the problem by other than the "dom0=pf-fixup" approach. Jan --- Along the lines of the respective remark in "x86/PVH: extend checking in hwdom_fixup_p2m()" submitted earlier, is_memory_hole() isn't used in PHYSDEVOP_system_memory handling. The new physdev-op is intended to be invoked from e.g. Linux'es drivers/pnp/system.c:reserve_range(). Whether to make this a physdev-op or a platform-op wasn't quite clear to me; the two have a pretty fuzzy boundary between them. By observation, the size field doesn't need to be 64 bits wide. If we still wanted to make it so, we'd definitely need to add preemption checking, too. (We may need to do so anyway, I simply wasn't sure.) There are sub-page regions being reported on the system I'm looking at. Not sure what to do about them. --- unstable.orig/xen/arch/x86/hvm/hypercall.c +++ unstable/xen/arch/x86/hvm/hypercall.c @@ -88,6 +88,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_H case PHYSDEVOP_pci_device_remove: case PHYSDEVOP_pci_device_reset: case PHYSDEVOP_dbgp_op: + case PHYSDEVOP_system_memory: if ( !is_hardware_domain(currd) ) return -ENOSYS; break; --- unstable.orig/xen/arch/x86/physdev.c +++ unstable/xen/arch/x86/physdev.c @@ -27,6 +27,8 @@ int physdev_unmap_pirq(struct domain *d, #ifndef COMPAT typedef long ret_t; +#include <asm/altp2m.h> + static int physdev_hvm_map_pirq( struct domain *d, int type, int *index, int *pirq) { @@ -619,6 +621,71 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; } +#ifndef COMPAT + case PHYSDEVOP_system_memory: { + struct xen_physdev_sysmem_op op; + + if ( !is_hardware_domain(currd) ) + ret = -EPERM; + else if ( copy_from_guest(&op, arg, 1) ) + ret = -EFAULT; + else if ( !op.size ) + ret = 0; + else if ( (op.addr | op.size) & (PAGE_SIZE - 1) || + (op.addr >> paddr_bits) || + ((op.addr + op.size - 1) >> paddr_bits) || + op.flags ) + ret = -EINVAL; + else + { + mfn_t mfn = maddr_to_mfn(op.addr); + + if ( mfn_valid(mfn) ) + { + struct page_info *pg = mfn_to_page(mfn); + const struct domain *owner = page_get_owner_and_reference(pg); + + if ( owner ) + put_page(pg); + if ( owner != dom_io ) + return -EINVAL; + } + + /* + * Avoid the need to fix up behind a PVH Dom0, by inserting the + * desired mapping right away. See also hwdom_fixup_p2m(). + */ + if ( !is_hvm_domain(currd) ) + ret = 0; + else if ( !iomem_access_permitted(currd, mfn_x(mfn), + (mfn_x(mfn) + + PFN_DOWN(op.size) - 1)) || + altp2m_active(currd) ) + ret = -EACCES; + else +{ printk("sysmem: %08lx (%04x bytes; hole:%d)\n",//temp + op.addr, op.size, is_memory_hole(mfn, mfn_add(mfn, PFN_DOWN(op.size) - 1))); + ret = 0; break;//temp + do { + p2m_type_t type; + unsigned long gfn = mfn_x(mfn); + mfn_t omfn = get_gfn(currd, gfn, &type); + + if ( !mfn_eq(omfn, INVALID_MFN) || !p2m_is_hole(type) ) + ret = mfn_eq(omfn, mfn) ? 0 : -ENOTEMPTY; + else + ret = set_mmio_p2m_entry(currd, _gfn(gfn), mfn, 0); + put_gfn(currd, gfn); + + mfn = mfn_add(mfn, 1); + op.size -= PAGE_SIZE; + } while ( !ret && op.size ); +} + } + break; + } +#endif /* !COMPAT */ + default: ret = pci_physdev_op(cmd, arg); break; --- unstable.orig/xen/include/public/physdev.h +++ unstable/xen/include/public/physdev.h @@ -344,6 +344,20 @@ typedef struct physdev_dbgp_op physdev_d DEFINE_XEN_GUEST_HANDLE(physdev_dbgp_op_t); /* + * Notify the hypervisor of a system memory range ("Motherboard resource"), + * some of which can only be discovered via interpreting ACPI's AML. For PVH + * Dom0 such ranges, if deemed valid, would then have a 1:1 translation + * inserted into their P2M. + */ +#define PHYSDEVOP_system_memory 33 +struct xen_physdev_sysmem_op { + uint64_t addr; + uint32_t size; + /* No flags defined so far; field must be zero. */ + uint32_t flags; +}; + +/* * Notify that some PIRQ-bound event channels have been unmasked. * ** This command is obsolete since interface version 0x00030202 and is ** * ** unsupported by newer versions of Xen. **
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |