Xen project Mailing List

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage

To: Roger Pau Monne <roger.pau@xxxxxxxxxx>

Date: Fri, 19 Jun 2020 16:06:55 +0200

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, paul@xxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Delivery-date: Fri, 19 Jun 2020 14:07:16 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.06.2020 18:04, Roger Pau Monne wrote: > Commit e9aca9470ed86 introduced a regression when avoiding sending > IPIs for certain flush operations. Xen page fault handler > (spurious_page_fault) relies on blocking interrupts in order to > prevent handling TLB flush IPIs and thus preventing other CPUs from > removing page tables pages. Switching to assisted flushing avoided such > IPIs, and thus can result in pages belonging to the page tables being > removed (and possibly re-used) while __page_fault_type is being > executed. > > Force some of the TLB flushes to use IPIs, thus avoiding the assisted > TLB flush. Those selected flushes are the page type change (when > switching from a page table type to a different one, ie: a page that > has been removed as a page table) and page allocation. This sadly has > a negative performance impact on the pvshim, as less assisted flushes > can be used. > > Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush > using an IPI (flush_tlb_mask_sync). Note that the flag is only > meaningfully defined when the hypervisor supports PV mode, as > otherwise translated domains are in charge of their page tables and > won't share page tables with Xen, thus not influencing the result of > page walks performed by the spurious fault handler. Is this true for shadow mode? If a page shadowing a guest one was given back quickly enough to the allocator and then re-used, I think the same situation could in principle arise. > Note the flag is not defined on Arm, and the introduced helper is just > a dummy alias to the existing flush_tlb_mask. > > Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available') > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > --- > It's my understanding that not doing such IPI flushes could lead to > the pages tables being read by __page_fault_type being modified by a > third party, which could make them point to other mfns out of the > scope of the guest allowed physical memory addresses. However those > accesses would be limited to __page_fault_type, and hence the main > worry would be that a guest could make it point to read from a > physical memory region that has side effects? I don't think so, no - the memory could be changed such that the PTEs are invalid altogether (like having reserved bits set). Consider for example the case of reading an MFN out of such a PTE that's larger than the physical address width supported by the CPU. Afaict map_domain_page() will happily install a respective page table entry, but we'd get a reserved-bit #PF upon reading from that mapping. > --- > xen/arch/x86/mm.c | 12 +++++++++++- > xen/common/memory.c | 2 +- > xen/common/page_alloc.c | 2 +- > xen/include/asm-arm/flushtlb.h | 1 + > xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++ > xen/include/xen/mm.h | 8 ++++++-- > 6 files changed, 34 insertions(+), 5 deletions(-) Not finding a consumer of the new flag, my first reaction was to ask whether there's code missing somewhere. Having looked at flush_area_mask() another time I now understand the itended behavior results because of the extra flag now allowing hypervisor_flush_tlb() to be entered. I think that's something that's worth calling out in the description, or perhaps even in the comment next to the #define. > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, > unsigned long type, > ((nx & PGT_type_mask) == PGT_writable_page)) ) > { > perfc_incr(need_flush_tlb_flush); > - flush_tlb_mask(mask); > + if ( (x & PGT_type_mask) && > + (x & PGT_type_mask) <= PGT_l4_page_table ) With there being 5-level page tables around the corner, I think we ought to get used to use PGT_root_page_table (or alike) whenever possible, to avoid having to touch such code when adding support for the new paging mode. > --- a/xen/include/asm-x86/flushtlb.h > +++ b/xen/include/asm-x86/flushtlb.h > @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long > cr4); > #else > #define FLUSH_HVM_ASID_CORE 0 > #endif > +#if CONFIG_PV #ifdef > +/* Force an IPI to be sent */ > +# define FLUSH_FORCE_IPI 0x8000 > +#else > +# define FLUSH_FORCE_IPI 0 > +#endif If my shadow mode concern above is unwarranted, this overhead could also be avoided if there's no PV domain at all in the system. Perhaps an improvement not for now, but for the future ... Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.