Re: [PATCH v7 1/2] x86/mm: add API for marking only part of a MMIO page read only
On 26.07.2024 03:55, Marek Marczykowski-Górecki wrote:
> In some cases, only a few registers on a page need to be write-protected.
> Examples include the USB3 console (64 bytes worth of registers) or MSI-X's
> PBA table (which doesn't need to span the whole table either), although
> in the latter case the spec forbids placing other registers on the same
> page. The current API allows only marking whole pages read-only, which
> sometimes may cover other registers that the guest may need to write to.
>
> Currently, when a guest tries to write to an MMIO page on the
> mmio_ro_ranges, it's either immediately crashed on EPT violation - if
> that's HVM - or, if PV, it gets #PF. In the case of Linux PV, if the
> access was from userspace (e.g. via /dev/mem), it will try to fix up by
> updating the page tables (which Xen will again force to read-only) and
> will hit that #PF again (looping endlessly). Both behaviors are
> undesirable if the guest could actually be allowed the write.
>
> Introduce an API that allows marking part of a page read-only. Since
> sub-page permissions are not a thing in page tables (they are in EPT,
> but not granular enough), do this via emulation (or simply the page
> fault handler for PV) that handles writes that are supposed to be
> allowed. The new subpage_mmio_ro_add() takes a start physical address
> and the region size in bytes. Both the start address and the size need
> to be 8-byte aligned, as a practical simplification (this allows using
> a smaller bitmask, and a smaller granularity isn't really necessary
> right now). It will internally add the relevant pages to
> mmio_ro_ranges, but if either the start or the end address is not
> page-aligned, it additionally adds that page to a list for sub-page R/O
> handling. The list holds a bitmask of which qwords are supposed to be
> read-only and an address where the page is mapped for write emulation -
> this mapping is done only on the first access.
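The bitmask arithmetic implied by the 8-byte granularity above can be sketched as follows. This is a minimal, hypothetical illustration (not the actual Xen code; all names are made up): a 4 KiB page holds 4096/8 = 512 qwords, so 512 bits of bitmask suffice to record which qwords are read-only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the per-page qword bitmask, not Xen code. */
#define PAGE_SIZE        4096
#define SUBPAGE_GRAN     8                           /* one qword */
#define QWORDS_PER_PAGE  (PAGE_SIZE / SUBPAGE_GRAN)  /* 512 */

typedef struct {
    uint64_t ro_qwords[QWORDS_PER_PAGE / 64];  /* 512-bit bitmask */
} subpage_ro_page;

/* Mark [offset, offset + size) within one page as read-only;
 * both arguments are assumed 8-byte aligned, as per the API. */
static void mark_subpage_ro(subpage_ro_page *p, unsigned int offset,
                            unsigned int size)
{
    for ( unsigned int o = offset; o < offset + size; o += SUBPAGE_GRAN )
    {
        unsigned int q = o / SUBPAGE_GRAN;

        p->ro_qwords[q / 64] |= UINT64_C(1) << (q % 64);
    }
}

/* Is the qword containing @offset write-protected? */
static int subpage_is_ro(const subpage_ro_page *p, unsigned int offset)
{
    unsigned int q = offset / SUBPAGE_GRAN;

    return !!(p->ro_qwords[q / 64] & (UINT64_C(1) << (q % 64)));
}
```

For example, protecting 64 bytes of registers (the USB3 console case mentioned above) sets exactly eight consecutive bits, leaving the remaining 504 qwords of the page writable.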
> A plain list is used instead of a more efficient structure, because
> there aren't supposed to be many pages needing this precise r/o
> control.
>
> The mechanism this API is plugged into is slightly different for PV
> and HVM. For both paths, it's plugged into mmio_ro_emulated_write().
> For PV, that is already called for a #PF on a read-only MMIO page. For
> HVM however, an EPT violation on a p2m_mmio_direct page results in a
> direct domain_crash() for non-hardware domains. To reach
> mmio_ro_emulated_write(), change how write violations for
> p2m_mmio_direct are handled - specifically, check whether they relate
> to such a partially protected page via subpage_mmio_write_accept() and,
> if so, call hvm_emulate_one_mmio() for them too. This decodes what the
> guest is trying to write and finally calls mmio_ro_emulated_write().
> The EPT write violation is detected as npfec.write_access and
> npfec.present both being true (similar to other places), which may
> cover some other (future?) cases - if that happens, the emulator might
> get involved unnecessarily, but since it's limited to pages marked with
> subpage_mmio_ro_add() only, the impact is minimal.
>
> Both of those paths need the MFN the guest tried to write to (to check
> which part of the page is supposed to be read-only, and where the page
> is mapped for writes). This information currently isn't available
> directly in mmio_ro_emulated_write(), but in both cases it is already
> resolved somewhere higher in the call tree. Pass it down to
> mmio_ro_emulated_write() via the new mmio_ro_emulate_ctxt.mfn field.
>
> This may give a bit more access to the instruction emulator to HVM
> guests (the change in hvm_hap_nested_page_fault()), but only for pages
> explicitly marked with subpage_mmio_ro_add() - so, only if the guest
> has a passed-through device partially used by Xen.
> As of the next patch, it applies only to a configuration explicitly
> documented as not security supported.
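The emulation described above ultimately comes down to a per-qword decision: a write landing on a protected qword is discarded, while a write elsewhere on the page is forwarded through the writable mapping. A hedged sketch of that decision (all names and the struct layout here are hypothetical, and it simplifies by checking only the first qword a write touches):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGE_GRAN 8

/* Hypothetical per-page state: the qword R/O bitmask, plus the page's
 * writable mapping (in Xen, established lazily on first access). */
struct subpage_ro_state {
    uint64_t ro_qwords[8];  /* 512-bit mask, one bit per qword */
    uint8_t *mapped;        /* writable mapping of the MMIO page */
};

/* Emulate a write of @size bytes at page @offset: drop it if it hits a
 * protected qword, otherwise forward it to the mapped page.
 * (Simplification: only the qword containing @offset is checked.) */
static void subpage_emulated_write(struct subpage_ro_state *s,
                                   unsigned int offset, const void *data,
                                   unsigned int size)
{
    unsigned int q = offset / SUBPAGE_GRAN;

    if ( s->ro_qwords[q / 64] & (UINT64_C(1) << (q % 64)) )
        return;                     /* read-only: discard the write */

    memcpy(s->mapped + offset, data, size);
}
```

In the real patch this logic sits behind mmio_ro_emulated_write(), reached via #PF handling for PV and via subpage_mmio_write_accept() plus hvm_emulate_one_mmio() for HVM.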
>
> The subpage_mmio_ro_add() function cannot be called with overlapping
> ranges, nor on pages already added to mmio_ro_ranges separately.
> Successful calls would result in correct handling, but error paths may
> result in incorrect state (like pages removed from mmio_ro_ranges too
> early). The debug build has asserts for the relevant cases.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
with the prior restriction that this does not include the usually implied ack.

> +int __init subpage_mmio_ro_add(
> +    paddr_t start,
> +    size_t size)
> +{
> +    mfn_t mfn_start = maddr_to_mfn(start);
> +    paddr_t end = start + size - 1;
> +    mfn_t mfn_end = maddr_to_mfn(end);
> +    unsigned int offset_end = 0;
> +    int rc;
> +    bool subpage_start, subpage_end;
> +
> +    ASSERT(IS_ALIGNED(start, MMIO_RO_SUBPAGE_GRAN));
> +    ASSERT(IS_ALIGNED(size, MMIO_RO_SUBPAGE_GRAN));
> +    if ( !IS_ALIGNED(start, MMIO_RO_SUBPAGE_GRAN) ||
> +         !IS_ALIGNED(size, MMIO_RO_SUBPAGE_GRAN) )
> +        return -EINVAL;

As a minor remark: I can certainly see the value of having separate
assertions. For the if() I wonder though whether it wouldn't better be

    if ( !IS_ALIGNED(start | size, MMIO_RO_SUBPAGE_GRAN) )
        return -EINVAL;

Jan
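Jan's suggested simplification relies on the granularity being a power of two: the low bits of `start | size` are the union of the low bits of both operands, so a single alignment check on the OR covers both. A small standalone demonstration (assuming the granularity is 8, per the commit message, and modeling IS_ALIGNED with its usual mask-based definition):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, per the commit message's 8-byte alignment requirement. */
#define MMIO_RO_SUBPAGE_GRAN 8

/* Usual mask-based alignment check for a power-of-two granularity. */
#define IS_ALIGNED(x, g) (((x) & ((g) - 1)) == 0)

/* Jan's combined form: one check on the OR of both operands. */
static int both_aligned_or(uint64_t a, uint64_t b)
{
    return IS_ALIGNED(a | b, MMIO_RO_SUBPAGE_GRAN);
}

/* The original form: two separate checks. */
static int both_aligned_sep(uint64_t a, uint64_t b)
{
    return IS_ALIGNED(a, MMIO_RO_SUBPAGE_GRAN) &&
           IS_ALIGNED(b, MMIO_RO_SUBPAGE_GRAN);
}
```

Since `(a | b) & (g - 1)` equals `(a & (g - 1)) | (b & (g - 1))`, the OR is aligned exactly when both operands are, so the two forms are equivalent for any power-of-two granularity.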