
Re: [PATCH 3/5] x86/hvm: fix handling of accesses to partial r/o MMIO pages



On Mon, Apr 14, 2025 at 08:33:44AM +0200, Jan Beulich wrote:
> On 11.04.2025 12:54, Roger Pau Monne wrote:
> > The current logic to handle accesses to MMIO pages partially read-only is
> > based on the (now removed) logic used to handle accesses to the r/o MMCFG
> > region(s) for PVH v1 dom0.  However that has issues when running on AMD
> > hardware, as in that case the guest linear address that triggered the fault
> > is not provided as part of the VM exit.  This caused
> > mmio_ro_emulated_write() to always fail before calling
> > subpage_mmio_write_emulate() when running on AMD and called from an HVM
> > context.
> > 
> > Take a different approach and convert the handling of partial read-only
> > MMIO page accesses into an HVM MMIO ops handler, as that's the more natural
> > way to handle this kind of emulation for HVM domains.
> > 
> > This allows getting rid of hvm_emulate_one_mmio() and its single call
> > site in hvm_hap_nested_page_fault().
> > 
> > Note a small adjustment is needed to the `pf-fixup` dom0 PVH logic: avoid
> > attempting to fixup faults resulting from accesses to read-only MMIO
> > regions, as handling of those accesses is now done by handle_mmio().
> > 
> > Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in 
> > HVM containers')
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > The fixes tag is maybe a bit wonky, it's either this or:
> > 
> > 8847d6e23f97 ('x86/mm: add API for marking only part of a MMIO page read 
> > only')
> > 
> > However the addition of subpage r/o access handling to the existing
> > mmio_ro_emulated_write() function was done based on the assumption that the
> > current code was working - which turned out to not be the case for AMD,
> > hence my preference for blaming the commit that actually introduced the
> > broken logic.
> 
> I agree.
> 
> > ---
> >  xen/arch/x86/hvm/emulate.c             | 47 +-------------
> >  xen/arch/x86/hvm/hvm.c                 | 89 +++++++++++++++++++++++---
> 
> It's quite a bit of rather special code you add to this catch-all file. How
> about introducing a small new one, e.g. mmio.c (to later maybe become home
> to some further stuff)?

Yes, I didn't find any suitable place.  I was also considering
hvm/io.c or hvm/intercept.c, but none looked very good TBH.

The main benefit of placing it in hvm.c is that the functions can all
be static and confined to hvm.c as its only user.
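For reference, this is roughly how I'd expect the wiring to look if it
stays in hvm.c (the ops structure name and the exact registration site
are illustrative here, not lifted from the patch):

    static const struct hvm_mmio_ops subpage_mmio_ops = {
        .check = subpage_mmio_accept,
        .read  = subpage_mmio_read,
        .write = subpage_mmio_write,
    };

    /* From the HVM domain initialization path in hvm.c: */
    register_mmio_handler(d, &subpage_mmio_ops);

Having the handlers and the ops structure in the same file is what
allows all of them to stay static.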

> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -370,7 +370,8 @@ static int hvmemul_do_io(
> >          /* If there is no suitable backing DM, just ignore accesses */
> >          if ( !s )
> >          {
> > -            if ( is_mmio && is_hardware_domain(currd) )
> > +            if ( is_mmio && is_hardware_domain(currd) &&
> > +                 !rangeset_contains_singleton(mmio_ro_ranges, 
> > PFN_DOWN(addr)) )
> 
> I think this needs a brief comment - it otherwise looks misplaced here.
> 
> > @@ -585,9 +585,81 @@ static int cf_check hvm_print_line(
> >      return X86EMUL_OKAY;
> >  }
> >  
> > +static int cf_check subpage_mmio_accept(struct vcpu *v, unsigned long addr)
> > +{
> > +    p2m_type_t t;
> > +    mfn_t mfn = get_gfn_query_unlocked(v->domain, addr, &t);
> > +
> > +    return !mfn_eq(mfn, INVALID_MFN) && t == p2m_mmio_direct &&
> > +           !!subpage_mmio_find_page(mfn);
> 
> The !! isn't needed here, is it?

IIRC clang complained about conversion from pointer to integer without
a cast, but maybe that was before adding the further conditions here.
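As a standalone illustration of that warning (not code from the patch):

    /* Returning the pointer directly from an int function triggers the
     * pointer-to-integer conversion warning; !! (or a comparison) avoids it. */
    int accept_ptr_only(const void *p)
    {
        return !!p;
    }

    /* Once the lookup result is part of a && chain, the chain itself
     * evaluates to 0 or 1, so the !! becomes redundant. */
    int accept_with_conditions(bool ok, const void *p)
    {
        return ok && p;
    }

So with the extra conditions in place the !! can indeed be dropped.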

> > +}
> > +
> > +static int cf_check subpage_mmio_read(
> > +    struct vcpu *v, unsigned long addr, unsigned int len, unsigned long 
> > *data)
> > +{
> > +    struct domain *d = v->domain;
> > +    p2m_type_t t;
> > +    mfn_t mfn = get_gfn_query(d, addr, &t);
> > +    struct subpage_ro_range *entry;
> > +    volatile void __iomem *mem;
> > +
> > +    *data = ~0UL;
> > +
> > +    if ( mfn_eq(mfn, INVALID_MFN) || t != p2m_mmio_direct )
> > +    {
> > +        put_gfn(d, addr);
> > +        return X86EMUL_RETRY;
> > +    }
> > +
> > +    entry = subpage_mmio_find_page(mfn);
> > +    if ( !entry )
> > +    {
> > +        put_gfn(d, addr);
> > +        return X86EMUL_RETRY;
> > +    }
> > +
> > +    mem = subpage_mmio_map_page(entry);
> > +    if ( !mem )
> > +    {
> > +        put_gfn(d, addr);
> > +        gprintk(XENLOG_ERR, "Failed to map page for MMIO read at %#lx\n",
> > +                mfn_to_maddr(mfn) + PAGE_OFFSET(addr));
> 
> Makes me wonder whether the function parameter wouldn't better be named
> "curr" and the local variable "currd", to reflect that this log message
> will report appropriate context.

Sure, can adjust.

> Would also logging the guest physical address perhaps be worthwhile here?

Possibly, my first approach was to print gfn -> mfn, but I ended up
copying the same message from subpage_mmio_write_emulate() for
symmetry reasons.
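Something along these lines would work if we want both addresses in
the message (just a sketch, the exact wording isn't final):

    gprintk(XENLOG_ERR,
            "Failed to map page for MMIO read at gpa %#lx (ma %#lx)\n",
            addr, mfn_to_maddr(mfn) + PAGE_OFFSET(addr));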

> > +        return X86EMUL_OKAY;
> > +    }
> > +
> > +    *data = read_mmio(mem + PAGE_OFFSET(addr), len);
> > +
> > +    put_gfn(d, addr);
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +static int cf_check subpage_mmio_write(
> > +    struct vcpu *v, unsigned long addr, unsigned int len, unsigned long 
> > data)
> > +{
> > +    struct domain *d = v->domain;
> > +    p2m_type_t t;
> > +    mfn_t mfn = get_gfn_query(d, addr, &t);
> > +
> > +    if ( mfn_eq(mfn, INVALID_MFN) || t != p2m_mmio_direct )
> > +    {
> > +        put_gfn(d, addr);
> > +        return X86EMUL_RETRY;
> > +    }
> > +
> > +    subpage_mmio_write_emulate(mfn, PAGE_OFFSET(addr), data, len);
> > +
> > +    put_gfn(d, addr);
> > +    return X86EMUL_OKAY;
> > +}
> 
> Unlike the read path this doesn't return RETRY when subpage_mmio_find_page()
> fails (indicating something must have changed after "accept" said yes).

Yeah, I've noticed this, but didn't feel like modifying
subpage_mmio_write_emulate() for this.  The list of partial r/o MMIO
pages cannot change after system boot, so accept returning yes and not
finding a page here would likely warrant an ASSERT_UNREACHABLE().

Would you like me to modify subpage_mmio_write_emulate to make it
return an error code?
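Alternatively, the caller here could do the check itself, something
along these lines (this duplicates the lookup that
subpage_mmio_write_emulate() already does internally, and the ASSERT
placement is only a suggestion):

    if ( !subpage_mmio_find_page(mfn) )
    {
        /*
         * The list of partial r/o MMIO pages cannot change after boot,
         * so failing to find the page after "accept" returned true
         * shouldn't happen.
         */
        ASSERT_UNREACHABLE();
        put_gfn(d, addr);
        return X86EMUL_RETRY;
    }

    subpage_mmio_write_emulate(mfn, PAGE_OFFSET(addr), data, len);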

> > @@ -1981,7 +2056,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> > long gla,
> >       */
> >      if ( (p2mt == p2m_mmio_dm) ||
> >           (npfec.write_access &&
> > -          (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
> > +          (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server) ||
> > +           /* MMIO entries can be r/o if the target mfn is in 
> > mmio_ro_ranges. */
> > +           (p2mt == p2m_mmio_direct))) )
> >      {
> >          if ( !handle_mmio_with_translation(gla, gfn, npfec) )
> >              hvm_inject_hw_exception(X86_EXC_GP, 0);
> 
> Aren't we handing too many things to handle_mmio_with_translation() this
> way? At the very least you're losing ...
> 
> > @@ -2033,14 +2110,6 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> > long gla,
> >          goto out_put_gfn;
> >      }
> >  
> > -    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present 
> > &&
> 
> ... the .present check.

Isn't the p2mt == p2m_mmio_direct check already ensuring the entry is
present?  Otherwise its type would be p2m_invalid or p2m_mmio_dm?

The other checks in this function already seem to assume that a valid
type implies the entry is present.

> I'm also concerned of e.g. VT-x'es APIC access MFN, which is
> p2m_mmio_direct.

But that won't go into hvm_hap_nested_page_fault() when using
cpu_has_vmx_virtualize_apic_accesses (and thus having an APIC page
mapped as p2m_mmio_direct)?

It would instead be an EXIT_REASON_APIC_ACCESS vmexit which is handled
differently?

Thanks, Roger.