Re: [PATCH 22/22] x86/mm: zero stack on stack switch or reset
On Mon, Jul 29, 2024 at 04:40:24PM +0100, Andrew Cooper wrote:
> On 26/07/2024 4:22 pm, Roger Pau Monne wrote:
> > With the stack mapped on a per-CPU basis there's no risk of other CPUs being
> > able to read the stack contents, but vCPUs running on the current pCPU could
> > read stack rubble from operations of previous vCPUs.
> >
> > The #DF stack is not zeroed because handling of #DF results in a panic.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> >  xen/arch/x86/include/asm/current.h | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/include/asm/current.h b/xen/arch/x86/include/asm/current.h
> > index 75b9a341f814..02b4118b03ef 100644
> > --- a/xen/arch/x86/include/asm/current.h
> > +++ b/xen/arch/x86/include/asm/current.h
> > @@ -177,6 +177,14 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >  # define SHADOW_STACK_WORK ""
> >  #endif
> >
> > +#define ZERO_STACK                                                      \
> > +    "test %[stk_size], %[stk_size];"                                    \
> > +    "jz .L_skip_zeroing.%=;"                                            \
> > +    "std;"                                                              \
> > +    "rep stosb;"                                                        \
> > +    "cld;"                                                              \
> > +    ".L_skip_zeroing.%=:"
> > +
> >  #if __GNUC__ >= 9
> >  # define ssaj_has_attr_noreturn(fn) __builtin_has_attribute(fn, __noreturn__)
> >  #else
> > @@ -187,10 +195,24 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >  #define switch_stack_and_jump(fn, instr, constr)                        \
> >  ({                                                                      \
> >      unsigned int tmp;                                                   \
> > +    bool zero_stack = current->domain->arch.asi;                        \
> >      BUILD_BUG_ON(!ssaj_has_attr_noreturn(fn));                          \
> > +    ASSERT(IS_ALIGNED((unsigned long)guest_cpu_user_regs() -            \
> > +                      PRIMARY_STACK_SIZE +                              \
> > +                      sizeof(struct cpu_info), PAGE_SIZE));             \
> > +    if ( zero_stack )                                                   \
> > +    {                                                                   \
> > +        unsigned long stack_top = get_stack_bottom() &                  \
> > +                                  ~(STACK_SIZE - 1);                    \
> > +                                                                        \
> > +        clear_page((void *)stack_top + IST_MCE * PAGE_SIZE);            \
> > +        clear_page((void *)stack_top + IST_NMI * PAGE_SIZE);            \
> > +        clear_page((void *)stack_top + IST_DB * PAGE_SIZE);             \
> > +    }                                                                   \
> >      __asm__ __volatile__ (                                              \
> >          SHADOW_STACK_WORK                                               \
> >          "mov %[stk], %%rsp;"                                            \
> > +        ZERO_STACK                                                      \
> >          CHECK_FOR_LIVEPATCH_WORK                                        \
> >          instr "[fun]"                                                   \
> >          : [val] "=&r" (tmp),                                            \
> > @@ -201,7 +223,13 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >            ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8),                   \
> >            [stack_mask] "i" (STACK_SIZE - 1),                            \
> >            _ASM_BUGFRAME_INFO(BUGFRAME_bug, __LINE__,                    \
> > -                             __FILE__, NULL)                            \
> > +                             __FILE__, NULL),                           \
> > +          /* For stack zeroing. */                                      \
> > +          "D" ((void *)guest_cpu_user_regs() - 1),                      \
> > +          [stk_size] "c"                                                \
> > +              (zero_stack ? PRIMARY_STACK_SIZE - sizeof(struct cpu_info)\
> > +                          : 0),                                         \
> > +          "a" (0)                                                       \
> >          : "memory" );                                                   \
> >      unreachable();                                                      \
> >  })
>
> This looks very expensive.
>
> For starters, switch_stack_and_jump() is used twice in a typical context
> switch; once in the schedule tail, and again out of hvm_do_resume().

Right, it's the reset_stack_and_call_ind() at the end of context_switch()
and then the reset_stack_and_jump() in the HVM tail context switch
handlers.

One option would be to only do the stack zeroing from the
reset_stack_and_call_ind() call in context_switch().

I've got no idea how expensive this is; I'll try to run some benchmarks
to get some figures.  I was planning on running two VMs with 1 vCPU each,
both pinned to the same pCPU.
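For anyone following along, the hot part is the downward "rep stosb".
Here's a minimal standalone sketch of what it does (illustrative names,
not Xen code): with the direction flag set, stosb decrements %rdi after
each store, so the fill runs from 'top' down towards lower addresses,
equivalent to memset(top - len + 1, 0, len).

    #include <assert.h>
    #include <string.h>

    /* Zero 'len' bytes ending at (and including) 'top', going downwards. */
    static void zero_downwards(unsigned char *top, unsigned long len)
    {
        if ( !len )            /* mirrors the jz .L_skip_zeroing guard */
            return;

        asm volatile ( "std; rep stosb; cld"
                       : "+D" (top), "+c" (len)
                       : "a" (0)
                       : "memory" );
    }

    int main(void)
    {
        unsigned char buf[64];

        memset(buf, 0xff, sizeof(buf));
        zero_downwards(&buf[63], 32);   /* clears buf[32..63] */
        assert(buf[31] == 0xff && buf[32] == 0 && buf[63] == 0);
        return 0;
    }

In the patch, %rdi starts at guest_cpu_user_regs() - 1 with a count of
PRIMARY_STACK_SIZE - sizeof(struct cpu_info), so the whole primary stack
below the preserved struct cpu_info gets cleared.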
> Furthermore, #MC happen never (to many many significant figures), #DB
> happens never for HVM guests (but does happen for PV), and NMIs are
> either ~never, or 2Hz, which is far less often than the 30ms default
> timeslice.
>
> So, the overwhelming majority of the time, those 3 calls to clear_page()
> will be re-zeroing blocks of zeroes.
>
> This can probably be avoided by making use of ist_exit (held in %r12) to
> only zero an IST stack when leaving it.  This leaves the IRET frame able
> to be recovered, but with e.g. RFDS you can do that irrespective, and
> it's not terribly sensitive.

I could look into that.  TBH, I was borderline about clearing the IST
stacks at all: I wasn't convinced there could be anything sensitive on
them, but neither could I convince myself that there's nothing sensitive
there now, nor that there can't be in the future.
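If it helps, here's a rough C rendering of that suggestion (hypothetical:
ist_exit is held in %r12 on the asm exit paths, so the real change would
be in assembly, and the constants are stubbed so the sketch reads
standalone):

    #include <stdbool.h>

    #define PAGE_SIZE   4096UL
    #define STACK_SIZE  (8 * PAGE_SIZE)    /* 8-page Xen stacks */

    extern void clear_page(void *);        /* Xen's clear_page() */

    /*
     * Zero one IST stack page only when actually leaving that IST
     * vector, instead of unconditionally re-zeroing all three IST
     * pages on every context switch.  'ist_slot' would be one of
     * IST_MCE, IST_NMI or IST_DB.
     */
    static void maybe_zero_ist(bool ist_exit, unsigned long stack_bottom,
                               unsigned int ist_slot)
    {
        unsigned long stack_top = stack_bottom & ~(STACK_SIZE - 1);

        if ( ist_exit )
            clear_page((void *)(stack_top + ist_slot * PAGE_SIZE));
    }

That would leave the context switch path zeroing only the primary stack
in the common case.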
> What about shadow stacks?  You're not zeroing those, and while they're
> less sensitive than the data stack, there ought to be some reasoning
> about them.

I've assumed that shadow stacks only contain the expected return
addresses, and hence wouldn't be considered sensitive information, but
maybe I was too lax.  An attacker could get execution traces of the
previous vCPU, and those might be useful for some exploits?

Thanks, Roger.