[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage



On 18.06.2020 18:04, Roger Pau Monne wrote:
> Commit e9aca9470ed86 introduced a regression when avoiding sending
> IPIs for certain flush operations. Xen page fault handler
> (spurious_page_fault) relies on blocking interrupts in order to
> prevent handling TLB flush IPIs and thus preventing other CPUs from
> removing page tables pages. Switching to assisted flushing avoided such
> IPIs, and thus can result in pages belonging to the page tables being
> removed (and possibly re-used) while __page_fault_type is being
> executed.
> 
> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
> TLB flush. Those selected flushes are the page type change (when
> switching from a page table type to a different one, ie: a page that
> has been removed as a page table) and page allocation. This sadly has
> a negative performance impact on the pvshim, as less assisted flushes
> can be used.
> 
> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> meaningfully defined when the hypervisor supports PV mode, as
> otherwise translated domains are in charge of their page tables and
> won't share page tables with Xen, thus not influencing the result of
> page walks performed by the spurious fault handler.

Is this true for shadow mode? If a page shadowing a guest one was
given back quickly enough to the allocator and then re-used, I think
the same situation could in principle arise.

> Note the flag is not defined on Arm, and the introduced helper is just
> a dummy alias to the existing flush_tlb_mask.
> 
> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> It's my understanding that not doing such IPI flushes could lead to
> the pages tables being read by __page_fault_type being modified by a
> third party, which could make them point to other mfns out of the
> scope of the guest allowed physical memory addresses. However those
> accesses would be limited to __page_fault_type, and hence the main
> worry would be that a guest could make it point to read from a
> physical memory region that has side effects?

I don't think so, no - the memory could be changed such that the
PTEs are invalid altogether (like having reserved bits set). Consider
for example the case of reading an MFN out of such a PTE that's larger
than the physical address width supported by the CPU. Afaict
map_domain_page() will happily install a respective page table entry,
but we'd get a reserved-bit #PF upon reading from that mapping.

> ---
>  xen/arch/x86/mm.c              | 12 +++++++++++-
>  xen/common/memory.c            |  2 +-
>  xen/common/page_alloc.c        |  2 +-
>  xen/include/asm-arm/flushtlb.h |  1 +
>  xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++
>  xen/include/xen/mm.h           |  8 ++++++--
>  6 files changed, 34 insertions(+), 5 deletions(-)

Not finding a consumer of the new flag, my first reaction was to
ask whether there's code missing somewhere. Having looked at
flush_area_mask() another time I now understand the itended
behavior results because of the extra flag now allowing
hypervisor_flush_tlb() to be entered. I think that's something
that's worth calling out in the description, or perhaps even in
the comment next to the #define.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, 
> unsigned long type,
>                        ((nx & PGT_type_mask) == PGT_writable_page)) )
>                  {
>                      perfc_incr(need_flush_tlb_flush);
> -                    flush_tlb_mask(mask);
> +                    if ( (x & PGT_type_mask) &&
> +                         (x & PGT_type_mask) <= PGT_l4_page_table )

With there being 5-level page tables around the corner, I think
we ought to get used to use PGT_root_page_table (or alike)
whenever possible, to avoid having to touch such code when
adding support for the new paging mode.

> --- a/xen/include/asm-x86/flushtlb.h
> +++ b/xen/include/asm-x86/flushtlb.h
> @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long 
> cr4);
>  #else
>  #define FLUSH_HVM_ASID_CORE 0
>  #endif
> +#if CONFIG_PV

#ifdef

> +/* Force an IPI to be sent */
> +# define FLUSH_FORCE_IPI 0x8000
> +#else
> +# define FLUSH_FORCE_IPI 0
> +#endif

If my shadow mode concern above is unwarranted, this overhead could
also be avoided if there's no PV domain at all in the system.
Perhaps an improvement not for now, but for the future ...

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.