Re: [Xen-devel] [PATCH] x86/ctxt-switch: Document and improve GDT handling
On 04.07.19 19:57, Andrew Cooper wrote:

write_full_gdt_ptes() has a latent bug.  Using virt_to_mfn() and iterating
with (mfn + i) is wrong, because of PDX compression.  The context switch
path only functions correctly because NR_RESERVED_GDT_PAGES is 1.

As this is exceedingly unlikely to change moving forward, drop the loop
rather than inserting a BUILD_BUG_ON(NR_RESERVED_GDT_PAGES != 1).

With the loop dropped, write_full_gdt_ptes() becomes more obviously a poor
name, so rename it to update_xen_slot_in_full_gdt().

Furthermore, calling virt_to_mfn() in the context switch path is a lot of
wasted cycles for a result which is constant after boot.

Begin by documenting how Xen handles the GDTs across context switch.

From this, we observe that load_full_gdt() is completely independent of the
current CPU, and load_default_gdt() only gets passed the current CPU's
regular GDT.

Add two extra per-cpu variables which cache the L1e for the regular and
compat GDT, calculated in cpu_smpboot_alloc()/trap_init() as appropriate,
so update_xen_slot_in_full_gdt() doesn't need to waste time performing the
same calculation on every context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

I did a small performance test with this patch: on an 8-cpu system I
started 2 mini-os domains (1 vcpu each) doing a busy loop sending events to
dom0.  On dom0 I did a build of the hypervisor via "make -j 8" and measured
the time for that build, then took the average of 5 such builds (doing a
make clean in between).

            elapsed     user   system
Unpatched    66.51    232.93   109.21
Patched      57.00    225.47   105.47

This is a very clear performance win!

Tested-by: Juergen Gross <jgross@xxxxxxxx>
Reviewed-by: Juergen Gross <jgross@xxxxxxxx>


Juergen

---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Juergen Gross <jgross@xxxxxxxx>

Slightly RFC.
I'm fairly confident this is better, but Juergen says that some of his
scheduling perf tests notice large differences from subtle changes in
__context_switch(), so it would be useful to get some numbers from this
change.

The delta from this change is:

add/remove: 2/0 grow/shrink: 1/1 up/down: 320/-127 (193)
Function                                     old     new   delta
cpu_smpboot_callback                        1152    1456    +304
per_cpu__gdt_table_l1e                         -       8      +8
per_cpu__compat_gdt_table_l1e                  -       8      +8
__context_switch                            1238    1111    -127
Total: Before=3339227, After=3339420, chg +0.01%

I'm not overly happy about the special case in trap_init(), but I can't
think of a better place to put this.

Also, it should now be very obvious to people that Xen's current GDT
handling for non-PV vcpus is a recipe for subtle bugs, if we ever manage to
execute a stray mov/pop %sreg instruction.  We really ought to have Xen's
regular GDT in an area where slots 0-13 are either mapped to the zero page,
or not present, so we don't risk loading a non-faulting garbage selector.
---
 xen/arch/x86/domain.c      | 52 ++++++++++++++++++++++++++++++----------------
 xen/arch/x86/smpboot.c     |  4 ++++
 xen/arch/x86/traps.c       | 10 +++++++++
 xen/include/asm-x86/desc.h |  2 ++
 4 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 84cafbe558..147f96a09e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1635,23 +1635,42 @@ static void _update_runstate_area(struct vcpu *v)
         v->arch.pv.need_update_runstate_area = 1;
 }
 
+/*
+ * Overview of Xen's GDTs.
+ *
+ * Xen maintains per-CPU compat and regular GDTs which are both a single page
+ * in size.  Some content is specific to each CPU (the TSS, the per-CPU marker
+ * for #DF handling, and optionally the LDT).  The compat and regular GDTs
+ * differ by the layout and content of the guest accessible selectors.
+ *
+ * The Xen selectors live from 0xe000 (slot 14 of 16), and need to always
+ * appear in this position for interrupt/exception handling to work.
+ *
+ * A PV guest may specify GDT frames of their own (slots 0 to 13).  Room for
+ * a full GDT exists in the per-domain mappings.
+ *
+ * To schedule a PV vcpu, we point slot 14 of the guest's full GDT at the
+ * current CPU's compat or regular (as appropriate) GDT frame.  This is so
+ * that the per-CPU parts still work correctly after switching pagetables and
+ * loading the guest's full GDT into GDTR.
+ *
+ * To schedule Idle or HVM vcpus, we load a GDT base address which causes the
+ * regular per-CPU GDT frame to appear with selectors at the appropriate
+ * offset.
+ */
 static inline bool need_full_gdt(const struct domain *d)
 {
     return is_pv_domain(d) && !is_idle_domain(d);
 }
 
-static void write_full_gdt_ptes(seg_desc_t *gdt, const struct vcpu *v)
+static void update_xen_slot_in_full_gdt(const struct vcpu *v, unsigned int cpu)
 {
-    unsigned long mfn = virt_to_mfn(gdt);
-    l1_pgentry_t *pl1e = pv_gdt_ptes(v);
-    unsigned int i;
-
-    for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
-        l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
-                  l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
+    l1e_write(pv_gdt_ptes(v) + FIRST_RESERVED_GDT_PAGE,
+              !is_pv_32bit_vcpu(v) ? per_cpu(gdt_table_l1e, cpu)
+                                   : per_cpu(compat_gdt_table_l1e, cpu));
 }
 
-static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
+static void load_full_gdt(const struct vcpu *v)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
@@ -1661,11 +1680,12 @@ static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
     lgdt(&gdt_desc);
 }
 
-static void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
+static void load_default_gdt(unsigned int cpu)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
-        .base = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY),
+        .base = (unsigned long)(per_cpu(gdt_table, cpu) -
+                                FIRST_RESERVED_GDT_ENTRY),
     };
 
     lgdt(&gdt_desc);
@@ -1678,7 +1698,6 @@ static void __context_switch(void)
     struct vcpu          *p = per_cpu(curr_vcpu, cpu);
     struct vcpu          *n = current;
     struct domain        *pd = p->domain, *nd = n->domain;
-    seg_desc_t           *gdt;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1718,15 +1737,12 @@ static void __context_switch(void)
 
     psr_ctxt_switch_to(nd);
 
-    gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
-                                    per_cpu(compat_gdt_table, cpu);
-
     if ( need_full_gdt(nd) )
-        write_full_gdt_ptes(gdt, n);
+        update_xen_slot_in_full_gdt(n, cpu);
 
     if ( need_full_gdt(pd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
-        load_default_gdt(gdt, cpu);
+        load_default_gdt(cpu);
 
     write_ptbase(n);
 
@@ -1739,7 +1755,7 @@ static void __context_switch(void)
 
     if ( need_full_gdt(nd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
-        load_full_gdt(n, cpu);
+        load_full_gdt(n);
 
     if ( pd != nd )
         cpumask_clear_cpu(cpu, pd->dirty_cpumask);

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 730fe141fa..004285d14c 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -985,6 +985,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     if ( gdt == NULL )
         goto out;
     per_cpu(gdt_table, cpu) = gdt;
+    per_cpu(gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -992,6 +994,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     per_cpu(compat_gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(compat_gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_compat_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8097ef3bf5..25b4b47e5e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -97,7 +97,9 @@ DEFINE_PER_CPU(uint64_t, efer);
 static DEFINE_PER_CPU(unsigned long, last_extable_addr);
 
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, gdt_table_l1e);
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, compat_gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, compat_gdt_table_l1e);
 
 /* Master table, used by CPU0. */
 idt_entry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
@@ -2059,6 +2061,14 @@ void __init trap_init(void)
         }
     }
 
+    /* Cache {,compat_}gdt_table_l1e now that physical relocation is done. */
+    this_cpu(gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+    this_cpu(compat_gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_compat_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+
     percpu_traps_init();
 
     cpu_init();
 
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 85e83bcefb..e565727dc0 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -206,8 +206,10 @@ struct __packed desc_ptr {
 
 extern seg_desc_t boot_cpu_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, gdt_table_l1e);
 extern seg_desc_t boot_cpu_compat_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, compat_gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, compat_gdt_table_l1e);
 
 extern void load_TR(void);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel