[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage



On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
> On 22.06.2020 11:31, Roger Pau Monné wrote:
> > On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
> >> On 18.06.2020 18:04, Roger Pau Monne wrote:
> >>> Commit e9aca9470ed86 introduced a regression when avoiding sending
> >>> IPIs for certain flush operations. Xen page fault handler
> >>> (spurious_page_fault) relies on blocking interrupts in order to
> >>> prevent handling TLB flush IPIs and thus preventing other CPUs from
> >>> removing page tables pages. Switching to assisted flushing avoided such
> >>> IPIs, and thus can result in pages belonging to the page tables being
> >>> removed (and possibly re-used) while __page_fault_type is being
> >>> executed.
> >>>
> >>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
> >>> TLB flush. Those selected flushes are the page type change (when
> >>> switching from a page table type to a different one, ie: a page that
> >>> has been removed as a page table) and page allocation. This sadly has
> >>> a negative performance impact on the pvshim, as less assisted flushes
> >>> can be used.
> >>>
> >>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> >>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> >>> meaningfully defined when the hypervisor supports PV mode, as
> >>> otherwise translated domains are in charge of their page tables and
> >>> won't share page tables with Xen, thus not influencing the result of
> >>> page walks performed by the spurious fault handler.
> >>
> >> Is this true for shadow mode? If a page shadowing a guest one was
> >> given back quickly enough to the allocator and then re-used, I think
> >> the same situation could in principle arise.
> > 
> > Hm, I think it's not applicable to HVM shadow mode at least, because
> > CR3 is switched as part of vmentry/vmexit, and the page tables are not
> > shared between Xen and the guest, so there's no way for a HVM shadow
> > guest to modify the page-tables while Xen is walking them in
> > spurious_page_fault (note spurious_page_fault is only called when the
> > fault happens in non-guest context).
> 
> I'm afraid I disagree, because of shadow's use of "linear page tables".

You will have to bear with me, but I don't follow.

Could you provide some pointers at how/where the shadow (I assume
guest controlled) "linear page tables" are used while in Xen
context?

do_page_fault will only call spurious_page_fault (and thus attempt a
page walk) when the fault happens in non-guest context, yet on HVM all
accesses to guest memory are performed using __hvm_copy which doesn't
load any guest page tables into CR3, but instead performs a software
walk of the paging structures in order to find and map the destination
address into the hypervisor page tables and perform the read or copy.

> >>> Note the flag is not defined on Arm, and the introduced helper is just
> >>> a dummy alias to the existing flush_tlb_mask.
> >>>
> >>> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when 
> >>> available')
> >>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >>> ---
> >>> It's my understanding that not doing such IPI flushes could lead to
> >>> the pages tables being read by __page_fault_type being modified by a
> >>> third party, which could make them point to other mfns out of the
> >>> scope of the guest allowed physical memory addresses. However those
> >>> accesses would be limited to __page_fault_type, and hence the main
> >>> worry would be that a guest could make it point to read from a
> >>> physical memory region that has side effects?
> >>
> >> I don't think so, no - the memory could be changed such that the
> >> PTEs are invalid altogether (like having reserved bits set). Consider
> >> for example the case of reading an MFN out of such a PTE that's larger
> >> than the physical address width supported by the CPU. Afaict
> >> map_domain_page() will happily install a respective page table entry,
> >> but we'd get a reserved-bit #PF upon reading from that mapping.
> > 
> > So there are no hazards from executing __page_fault_type against a
> > page-table that could be modified by a user?
> 
> There are - I realize only now that the way I started my earlier
> reply was ambiguous. I meant to negate your "only with side effects"
> way of thinking.

Ack.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.