Xen project Mailing List

Re: [PATCH 3/5] x86/hvm: fix handling of accesses to partial r/o MMIO pages

From: Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Date: Tue, 15 Apr 2025 12:40:11 +0200

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 15 Apr 2025 10:40:22 +0000

Feedback-id: i1568416f:Fastmail

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Apr 15, 2025 at 12:18:04PM +0200, Jan Beulich wrote:
> On 15.04.2025 12:04, Roger Pau Monné wrote:
> > On Tue, Apr 15, 2025 at 11:41:27AM +0200, Jan Beulich wrote:
> >> On 15.04.2025 10:34, Roger Pau Monné wrote:
> >>> On Tue, Apr 15, 2025 at 09:32:37AM +0200, Jan Beulich wrote:
> >>>> On 14.04.2025 18:13, Roger Pau Monné wrote:
> >>>>> On Mon, Apr 14, 2025 at 05:24:32PM +0200, Jan Beulich wrote:
> >>>>>> On 14.04.2025 15:53, Roger Pau Monné wrote:
> >>>>>>> On Mon, Apr 14, 2025 at 08:33:44AM +0200, Jan Beulich wrote:
> >>>>>>>> I'm also concerned of e.g. VT-x'es APIC access MFN, which is
> >>>>>>>> p2m_mmio_direct.
> >>>>>>>
> >>>>>>> But that won't go into hvm_hap_nested_page_fault() when using
> >>>>>>> cpu_has_vmx_virtualize_apic_accesses (and thus having an APIC page
> >>>>>>> mapped as p2m_mmio_direct)?
> >>>>>>>
> >>>>>>> It would instead be an EXIT_REASON_APIC_ACCESS vmexit which is handled
> >>>>>>> differently?
> >>>>>>
> >>>>>> All true as long as things work as expected (potentially including the
> >>>>>> guest also behaving as expected). Also this was explicitly only an
> >>>>>> example I could readily think of. I'm simply wary of
> >>>>>> handle_mmio_with_translation() now getting things to handle it's not
> >>>>>> meant to ever see.
> >>>>>
> >>>>> How was access to MMIO r/o regions supposed to be handled before
> >>>>> 33c19df9a5a0 (~2015)? I see that setting r/o MMIO p2m entries was
> >>>>> added way before to p2m_type_to_flags() and ept_p2m_type_to_flags()
> >>>>> (~2010), yet I can't figure out how writes would be handled back then
> >>>>> that didn't result in a p2m fault and crashing of the domain.
> >>>>
> >>>> Was that handled at all before said change?
> >>>
> >>> Not really AFAICT, hence me wondering how were write accesses to r/o
> >>> MMIO regions supposed to be handled by (non-priv) domains. Was the
> >>> expectation that those writes trigger a p2m violation thus crashing
> >>> the domain?
> >>
> >> I think so, yes.
> >> Devices with such special areas weren't (aren't?) supposed
> >> to be handed to DomU-s.
> >
> > Oh, I see. That makes stuff a bit clearer. I think we would then
> > also want to add some checks to {ept_}p2m_type_to_flags()?
> >
> > I wonder why handling of mmio_ro_ranges was added to the HVM p2m code
> > in ~2010 then. If mmio_ro_ranges is only supposed to be relevant for
> > the hardware domain in ~2010 an HVM dom0 was not even in sight?
>
> I fear because I was wrong with what I said in the earlier reply: There's
> one exception - the MSI-X tables of devices. DomU-s (and even Dom0) aren't
> supposed to access them directly, but we'd permit reads (which, at least
> back at the time, were also required to keep qemu working).

And there is also a case where some devices have other registers on the
same page as MSI-X tables. But this case is handled specially in the
MSI-X code, not via sub-page R/O API.

> > Sorry to ask so many questions, I'm a bit confused about how this
> > was/is supposed to work.
>
> No worries - as you can see, I'm not getting it quite straight either.
>
> >>>> mmio_ro_do_page_fault() was
> >>>> (and still is) invoked for the hardware domain only, and quite likely
> >>>> the need for handling (discarding) writes for PVHv1 had been overlooked
> >>>> until someone was hit by the lack thereof.
> >>>
> >>> I see, I didn't realize r/o MMIO was only handled for PV hardware
> >>> domains only. I could arguably do the same for HVM in
> >>> hvm_hap_nested_page_fault().
> >>>
> >>> Not sure whether the subpage stuff is supposed to be functional for
> >>> domains different than the hardware domain? It seems to be available
> >>> to the hardware domain only for PV guests, while for HVM is available
> >>> for both PV and HVM domains:
> >>
> >> DYM Dom0 and DomU here?
> >
> > Indeed, sorry.

I'm not sure about the PV case and domU. I think I tested it at some
iteration, but it isn't a configuration that I care much about.
If it doesn't work (and fixing it would make it even more complex), IMO
we can simply adjust documentation of XHCI_SHARE_ANY to say it works
only with HVM domU.

The domU case exists mostly (only?) to enable automated testing. I do a
lot of that on laptops, which have only a single USB controller (no way
to plug any extra one), and I need that USB controller in a domU for
several tests. In fact, the XHCI console is a debugging feature in the
first place. So, the domU part doesn't need security support, can
require extra hoops to jump through etc.

> >>> is_hardware_domain(currd) || subpage_mmio_write_accept(mfn, gla)
> >>>
> >>> In hvm_hap_nested_page_fault().
> >>
> >> See the three XHCI_SHARE_* modes. When it's XHCI_SHARE_ANY, even DomU-s
> >> would require this handling. It looks like a mistake that we permit the
> >> path to be taken for DomU-s even when the mode is XHCI_SHARE_HWDOM.
> >
> > Arguably a domU will never get the device assigned in the first place
> > unless the share mode is set to XHCI_SHARE_ANY. For the other modes
> > the device is hidden, and hence couldn't be assigned to a domU anyway.
>
> Correct. Yet then we permit a code path to be taken which is supposedly
> unnecessary, but potentially (if something went wrong) harmful.

Since the XHCI_SHARE_ANY case is rare (and not security-supported),
maybe there should be a global variable guarding this part? It would be
set to true only if XHCI_SHARE_ANY is used (or some future use of this
subpage-ro API with a domU). Then, that code would still be potentially
reachable for all domUs (if XHCI_SHARE_ANY is used), but that's still
better?

Anyway, I'm still not sure what the concern is. What is the (not purely
theoretical) case where domU gains access to the emulator, where without
this feature it wouldn't have it already? Any HVM can hit the emulator
already, regardless of this feature, no?
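
To make the global-guard idea concrete, here is a rough standalone
sketch of what such a gate might look like. It is not Xen code: the enum,
the struct, the guard variable and every function name below are
hypothetical stand-ins (page_has_subpage_ro() merely plays the role that
subpage_mmio_write_accept() has in the quoted condition), and the real
hook point in hvm_hap_nested_page_fault() would look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three XHCI_SHARE_* modes. */
enum share_mode { SHARE_NONE, SHARE_HWDOM, SHARE_ANY };

/* Hypothetical stand-in for a domain; only the hwdom flag matters here. */
struct domain_stub {
    bool is_hwdom; /* plays the role of is_hardware_domain(d) */
};

/*
 * The proposed global guard (assumed name). It stays false unless some
 * user of the sub-page r/o API needs the handling for a domU.
 */
static bool subpage_ro_domu_enabled;

/* Called once when the share mode is parsed (assumed hook). */
static void set_share_mode(enum share_mode mode)
{
    assert(mode >= SHARE_NONE && mode <= SHARE_ANY);
    subpage_ro_domu_enabled = (mode == SHARE_ANY);
}

/* Stub lookup: pretend exactly one MFN has a sub-page r/o region. */
static bool page_has_subpage_ro(unsigned long mfn)
{
    return mfn == 0x1234;
}

/*
 * Decide whether a write fault on a r/o MMIO page is passed on to the
 * emulator. The hardware domain is always allowed; a domU only gets
 * there when the guard is set and the page is a registered one.
 */
static bool may_emulate_ro_mmio_write(const struct domain_stub *d,
                                      unsigned long mfn)
{
    if (d->is_hwdom)
        return true;

    return subpage_ro_domu_enabled && page_has_subpage_ro(mfn);
}
```

With the guard checked first, a domU never reaches the per-page lookup
in the XHCI_SHARE_HWDOM or disabled configurations, which is the
narrowing being suggested.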

> >>>>> I'm happy to look at other ways of handling this, but given there's
> >>>>> current logic for handling accesses to read-only regions in
> >>>>> hvm_hap_nested_page_fault() I think re-using that was the best way to
> >>>>> also handle accesses to MMIO read-only regions.
> >>>>>
> >>>>> Arguably it would already be the case that for other reasons Xen would
> >>>>> need to emulate an instruction that accesses a read-only MMIO region?
> >>>>
> >>>> Aiui hvm_translate_get_page() will yield HVMTRANS_bad_gfn_to_mfn for
> >>>> p2m_mmio_direct (after all, "direct" means we expect no emulation is
> >>>> needed; while arguably wrong for the introspection case, I'm not sure
> >>>> that and pass-through actually go together). Hence it's down to
> >>>> hvmemul_linear_mmio_access() -> hvmemul_phys_mmio_access() ->
> >>>> hvmemul_do_mmio_buffer() -> hvmemul_do_io_buffer() -> hvmemul_do_io(),
> >>>> which means that if hvm_io_intercept() can't handle it, the access
> >>>> will be forwarded to the responsible DM, or be "processed" by the
> >>>> internal null handler.
> >>>>
> >>>> Given this, perhaps what you do is actually fine. At the same time
> >>>> note how several functions in hvm/emulate.c simply fail upon
> >>>> encountering p2m_mmio_direct. These are all REP handlers though, so
> >>>> the main emulator would then try emulating the insn the non-REP way.
> >>>
> >>> I'm open to alternative ways of handling such accesses, just used what
> >>> seemed more natural in the context of hvm_hap_nested_page_fault().
> >>>
> >>> Emulation of r/o MMIO accesses failing wouldn't be an issue from Xen's
> >>> perspective, that would "just" result in the guest getting a #GP
> >>> injected.
> >>
> >> That's not the part I'm worried about. What worries me is that we open up
> >> another (or better: we're widening a) way to hit the emulator in the first
> >> place. (Plus, as said, the issue with the not really tidy P2M type system.)
> >
> > But the hit would be limited to domains having r/o p2m_mmio_direct
> > entries in the p2m, as otherwise the path would be unreachable?
>
> I fear I don't follow - all you look for in the newly extended conditional
> is the type being p2m_mmio_direct. There's no r/o-ness being checked for
> until we'd make it through the emulator and into subpage_mmio_accept().

But an EPT violation can be hit on a p2m_mmio_direct page only if it's a
write and the page is read-only, no? Is there any other case that exists
today?

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

