Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
On 21/10/2020 14:56, Jan Beulich wrote:
> On 21.10.2020 15:07, Andrew Cooper wrote:
>> @@ -4037,6 +4037,9 @@ long do_mmu_update(
>>                      break;
>>                  rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
>>                                    cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> +                /* Paging structure maybe changed. Flush linear range. */
>> +                if ( !rc )
>> +                    flush_flags_all |= FLUSH_TLB;
>>                  break;
>>
>>              case PGT_l3_page_table:
>> @@ -4044,6 +4047,9 @@ long do_mmu_update(
>>                      break;
>>                  rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
>>                                    cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> +                /* Paging structure maybe changed. Flush linear range. */
>> +                if ( !rc )
>> +                    flush_flags_all |= FLUSH_TLB;
>>                  break;
>>
>>              case PGT_l4_page_table:
>> @@ -4051,27 +4057,28 @@ long do_mmu_update(
>>                      break;
>>                  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>>                                    cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> -                if ( !rc && pt_owner->arch.pv.xpti )
>> +                /* Paging structure maybe changed. Flush linear range. */
>> +                if ( !rc )
>>                  {
>> -                    bool local_in_use = false;
>> +                    bool local_in_use = mfn_eq(
>> +                        pagetable_get_mfn(curr->arch.guest_table), mfn);
>>
>> -                    if ( mfn_eq(pagetable_get_mfn(curr->arch.guest_table),
>> -                                mfn) )
>> -                    {
>> -                        local_in_use = true;
>> -                        get_cpu_info()->root_pgt_changed = true;
>> -                    }
>> +                    flush_flags_all |= FLUSH_TLB;
>> +
>> +                    if ( local_in_use )
>> +                        flush_flags_local |= FLUSH_TLB | FLUSH_ROOT_PGTBL;
> Aiui here (and in the code consuming the flags) you build upon
> flush_flags_local, when not zero, always being a superset of
> flush_flags_all. I think this is a trap to fall into when later
> wanting to change this code, but as per below this won't hold
> anymore anyway, I think. Hence here I think you want to set
> FLUSH_TLB unconditionally, and above for L3 and L2 you want to
> set it in both variables. Or, if I'm wrong below, a comment to
> that effect may help people avoid falling into this trap.
>
> An alternative would be to have
>
> flush_flags_local |= (flush_flags_all & FLUSH_TLB);
>
> before doing the actual flush.
Honestly, this is what I meant by stating that the asymmetry is a total
mess.
I originally named 'all' as 'remote', but that is even less accurate, as it
may still include the current CPU.
Our matrix of complexity:
* FLUSH_TLB for L2+ structure changes
* FLUSH_TLB_GLOBAL/FLUSH_ROOT_PGTBL for XPTI
with optimisations to skip GLOBAL/ROOT_PGTBL on the local CPU if none of
the updates hit the in-use L4, and to skip the remote flush if we hold all
references on the L4.
Everything is complicated because pt_owner may not be current, for
toolstack operations constructing a new domain.
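
To make that concrete, what the two flag words are meant to accumulate is
roughly the following (a sketch only - 'modified_l2_or_l3', 'modified_l4',
'l4_in_use_locally' and 'l4_possibly_in_use_elsewhere' are stand-in
conditions, not the real code):

    unsigned int flush_flags_all = 0;   /* flushing for pt_owner's dirty CPUs */
    unsigned int flush_flags_local = 0; /* extra flushing for the current CPU only */

    /* Any L2+ write may rearrange the paging structure => flush linear range. */
    if ( modified_l2_or_l3 || modified_l4 )
        flush_flags_all |= FLUSH_TLB;

    if ( modified_l4 )
    {
        /* XPTI: this CPU is running on a copy of the modified L4. */
        if ( l4_in_use_locally )
            flush_flags_local |= FLUSH_TLB | FLUSH_ROOT_PGTBL;

        /* XPTI: other CPUs may also be running on copies of it. */
        if ( l4_possibly_in_use_elsewhere )
            flush_flags_all |= FLUSH_ROOT_PGTBL;
    }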
>
>>                      /*
>>                       * No need to sync if all uses of the page can be
>>                       * accounted to the page lock we hold, its pinned
>>                       * status, and uses on this (v)CPU.
>>                       */
>> -                    if ( (page->u.inuse.type_info & PGT_count_mask) >
>> +                    if ( pt_owner->arch.pv.xpti &&
> I assume you've moved this here to avoid the further non-trivial
> checks when possible, but you've not put it around the setting
> of FLUSH_ROOT_PGTBL in flush_flags_local because setting
> ->root_pgt_changed is benign when !XPTI?
No - that was accidental, while attempting to reduce the diff.
>
>> +                         (page->u.inuse.type_info & PGT_count_mask) >
>>                           (1 + !!(page->u.inuse.type_info & PGT_pinned) +
>>                            mfn_eq(pagetable_get_mfn(curr->arch.guest_table_user),
>>                                   mfn) + local_in_use) )
>> -                        sync_guest = true;
>> +                        flush_flags_all |= FLUSH_ROOT_PGTBL;
> This needs to also specify FLUSH_TLB_GLOBAL, or else original
> XPTI behavior gets broken. Since the local CPU doesn't need this,
> the variable may then better be named flush_flags_remote?
See above.  'remote' is even more confusing than 'all'.
>
> Or if I'm wrong here, the reasoning behind dropping the global
> flush in this case needs putting in the description, not least
> because it would mean the change introducing it went too far.
>
>> @@ -4173,18 +4180,36 @@ long do_mmu_update(
>>      if ( va )
>>          unmap_domain_page(va);
>>
>> -    if ( sync_guest )
>> +    /*
>> +     * Flushing needs to occur for one of several reasons.
>> +     *
>> +     * 1) An update to an L2 or higher occurred. This potentially changes the
>> +     *    pagetable structure, requiring a flush of the linear range.
>> +     * 2) An update to an L4 occurred, and XPTI is enabled. All CPUs running
>> +     *    on a copy of this L4 need refreshing.
>> +     */
>> +    if ( flush_flags_all || flush_flags_local )
> Minor remark: At least in x86 code it is more efficient to use
> | instead of || in such cases, to avoid relying on the compiler
> to carry out this small optimization.
This transformation should not be recommended in general.  There are cases,
including this one, where it has no effect at best, and is wrong at worst.
I don't care about people using ancient compilers. They've got far
bigger (== more impactful) problems than (the absence of) this
transformation, and the TLB flush will dwarf anything the compiler does
here.
However, the hand-"optimised" form stops the compiler from spotting that the
entire second clause is actually redundant for now.
I specifically didn't encode the dependency, to avoid subtle bugs
if/when someone alters the logic.
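
To spell out the general difference (illustration only, not Xen code - the
helper names here are made up):

    /* '||' short-circuits: the right operand is evaluated only if needed. */
    if ( cheap_check() || expensive_check() )
        do_work();

    /* '|' always evaluates both operands, then tests the combined value.
     * For two plain flag words already computed this is at best a wash; if
     * either operand is expensive or has side effects, it is a pessimisation
     * or outright wrong. */
    if ( flags_a | flags_b )
        do_work();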
>
>>      {
>> +        cpumask_t *mask = pt_owner->dirty_cpumask;
>> +
>>          /*
>> -         * Force other vCPU-s of the affected guest to pick up L4 entry
>> -         * changes (if any).
>> +         * Local flushing may be asymmetric with remote. If there is local
>> +         * flushing to do, perform it separately and omit the current CPU from
>> +         * pt_owner->dirty_cpumask.
>>          */
>> -        unsigned int cpu = smp_processor_id();
>> -        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
>> +        if ( flush_flags_local )
>> +        {
>> +            unsigned int cpu = smp_processor_id();
>> +
>> +            mask = per_cpu(scratch_cpumask, cpu);
>> +            cpumask_copy(mask, pt_owner->dirty_cpumask);
>> +            __cpumask_clear_cpu(cpu, mask);
>> +
>> +            flush_local(flush_flags_local);
>> +        }
>>
>> -        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
> I understand you're of the opinion that cpumask_copy() +
> __cpumask_clear_cpu() is more efficient than cpumask_andnot()?
> (I guess I agree for high enough CPU counts.)
It's faster in all cases, even at low CPU counts.
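
Concretely, the two ways of building "dirty CPUs minus the current one" being
compared (same result either way; just a sketch for reference):

    unsigned int cpu = smp_processor_id();
    cpumask_t *mask = per_cpu(scratch_cpumask, cpu);

    /* Old: dst = dirty_cpumask & ~cpumask_of(cpu), combining both masks. */
    cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));

    /* New: copy one mask, then clear a single bit. */
    cpumask_copy(mask, pt_owner->dirty_cpumask);
    __cpumask_clear_cpu(cpu, mask);

The saving presumably being that the copy-and-clear form reads only one source
mask per word and then flips a single bit, rather than combining two masks.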
~Andrew