Re: [PATCH 22/22] x86/mm: zero stack on stack switch or reset
On 26/07/2024 4:22 pm, Roger Pau Monne wrote:
> With the stack mapped on a per-CPU basis there's no risk of other CPUs being
> able to read the stack contents, but vCPUs running on the current pCPU could
> read stack rubble from operations of previous vCPUs.
>
> The #DF stack is not zeroed because handling of #DF results in a panic.
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
>  xen/arch/x86/include/asm/current.h | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/include/asm/current.h b/xen/arch/x86/include/asm/current.h
> index 75b9a341f814..02b4118b03ef 100644
> --- a/xen/arch/x86/include/asm/current.h
> +++ b/xen/arch/x86/include/asm/current.h
> @@ -177,6 +177,14 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
>  # define SHADOW_STACK_WORK ""
>  #endif
>
> +#define ZERO_STACK \
> +    "test %[stk_size], %[stk_size];" \
> +    "jz .L_skip_zeroing.%=;" \
> +    "std;" \
> +    "rep stosb;" \
> +    "cld;" \
> +    ".L_skip_zeroing.%=:"
> +
>  #if __GNUC__ >= 9
>  # define ssaj_has_attr_noreturn(fn) __builtin_has_attribute(fn, __noreturn__)
>  #else
> @@ -187,10 +195,24 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
>  #define switch_stack_and_jump(fn, instr, constr) \
>  ({ \
>      unsigned int tmp; \
> +    bool zero_stack = current->domain->arch.asi; \
>      BUILD_BUG_ON(!ssaj_has_attr_noreturn(fn)); \
> +    ASSERT(IS_ALIGNED((unsigned long)guest_cpu_user_regs() - \
> +                      PRIMARY_STACK_SIZE + \
> +                      sizeof(struct cpu_info), PAGE_SIZE)); \
> +    if ( zero_stack ) \
> +    { \
> +        unsigned long stack_top = get_stack_bottom() & \
> +                                  ~(STACK_SIZE - 1); \
> +        \
> +        clear_page((void *)stack_top + IST_MCE * PAGE_SIZE); \
> +        clear_page((void *)stack_top + IST_NMI * PAGE_SIZE); \
> +        clear_page((void *)stack_top + IST_DB * PAGE_SIZE); \
> +    } \
>      __asm__ __volatile__ ( \
>          SHADOW_STACK_WORK \
>          "mov %[stk], %%rsp;" \
> +        ZERO_STACK \
>          CHECK_FOR_LIVEPATCH_WORK \
>          instr "[fun]" \
>          : [val] "=&r" (tmp), \
> @@ -201,7 +223,13 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
>            ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8), \
>            [stack_mask] "i" (STACK_SIZE - 1), \
>            _ASM_BUGFRAME_INFO(BUGFRAME_bug, __LINE__, \
> -                             __FILE__, NULL) \
> +                             __FILE__, NULL), \
> +          /* For stack zeroing. */ \
> +          "D" ((void *)guest_cpu_user_regs() - 1), \
> +          [stk_size] "c" \
> +          (zero_stack ? PRIMARY_STACK_SIZE - sizeof(struct cpu_info) \
> +                      : 0), \
> +          "a" (0) \
>          : "memory" ); \
>      unreachable(); \
>  })

This looks very expensive.

For starters, switch_stack_and_jump() is used twice in a typical context
switch: once in the schedule tail, and again out of hvm_do_resume().

Furthermore, #MC happens never (to many, many significant figures), #DB
happens never for HVM guests (but does happen for PV), and NMIs are either
~never, or 2Hz, which is far less often than the 30ms default timeslice.
So, the overwhelming majority of the time, those 3 calls to clear_page()
will be re-zeroing blocks of zeroes.

This can probably be avoided by making use of ist_exit (held in %r12) to
only zero an IST stack when leaving it.  That leaves the IRET frame able
to be recovered, but with e.g. RFDS you can do that irrespective of the
zeroing, and it's not terribly sensitive.

What about shadow stacks?  You're not zeroing those, and while they're
less sensitive than the data stack, there ought to be some reasoning
about them.

~Andrew