x86/CET: Fix S3 resume with shadow stacks active
The original shadow stack support has an error on S3 resume with very
bizarre fallout.  The BSP comes back up, but APs fail with:

  (XEN) Enabling non-boot CPUs ...
  (XEN) Stuck ??
  (XEN) Error bringing CPU1 up: -5

and then later (on at least two Intel TigerLake platforms), the next HVM
vCPU to be scheduled on the BSP dies with:

  (XEN) d1v0 Unexpected vmexit: reason 3
  (XEN) domain_crash called from vmx.c:4304
  (XEN) Domain 1 (vcpu#0) crashed on cpu#0:

The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
scheduled vCPU, and will be addressed in a subsequent patch.  It is a
consequence of the APs triple faulting.

The APs triple fault because we don't tear down the stacks on suspend.
The idle/play_dead loop is killed in the middle of running, meaning that
the supervisor token is left busy.

On resume, SETSSBSY finds the token already busy, suffers #CP, and
triple faults because the IDT isn't configured this early.

Rework the AP bringup path to (re)create the supervisor token.  This
ensures the primary shadow stack is non-busy before use.

Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
Link: https://github.com/QubesOS/qubes-issues/issues/7283
Reported-by: Thiner Logoer <logoerthiner1@xxxxxxx>
Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Tested-by: Thiner Logoer <logoerthiner1@xxxxxxx>
Tested-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Thiner Logoer <logoerthiner1@xxxxxxx>
CC: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Slightly RFC.  This does fix the crash encountered, but it occurs to me
that there's a race condition when S3 platform powerdown is incident
with an NMI/#MC, where more than just the primary shadow stack can end
up busy on resume.

A larger fix would be to change how we allocate tokens, and always have
each CPU set up its own tokens.  I didn't do this originally in the
hopes of having WRSSQ generally disabled, but that plan failed when
encountering reality...

diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index fa41990dde0f..5d12937a0e40 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -51,13 +51,21 @@ ENTRY(__high_start)
         test    $CET_SHSTK_EN, %al
         jz      .L_ap_cet_done
 
-        /* Derive MSR_PL0_SSP from %rsp (token written when stack is allocated). */
-        mov     $MSR_PL0_SSP, %ecx
+        /* Derive the supervisor token address from %rsp. */
         mov     %rsp, %rdx
+        and     $~(STACK_SIZE - 1), %rdx
+        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %rdx
+
+        /*
+         * Write a new supervisor token.  Doesn't matter on boot, but for S3
+         * resume this clears the busy bit.
+         */
+        wrssq   %rdx, (%rdx)
+
+        /* Point MSR_PL0_SSP at the token. */
+        mov     $MSR_PL0_SSP, %ecx
+        mov     %edx, %eax
         shr     $32, %rdx
-        mov     %esp, %eax
-        and     $~(STACK_SIZE - 1), %eax
-        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %eax
         wrmsr
 
         setssbsy
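
For readers following along, below is a minimal C sketch (not part of
the patch) of the token handling the new assembly performs.  The
constant values and helper names are illustrative stand-ins for Xen's
own definitions, not its actual code:

  /* Assumed stand-ins for Xen's per-CPU stack layout constants. */
  #define PAGE_SIZE           4096UL
  #define STACK_SIZE          (8 * PAGE_SIZE)  /* 8-page per-CPU stack */
  #define PRIMARY_SHSTK_SLOT  5UL              /* primary shadow stack page */

  #define SSTK_TOKEN_BUSY     (1UL << 0)       /* busy flag, bit 0 of token */

  /*
   * Derive the supervisor token address from a stack pointer, mirroring
   * the and/or sequence on %rsp in the patch: mask down to the stack
   * base, then index the last 8 bytes of the shadow stack page.
   */
  static unsigned long sstk_token_addr(unsigned long sp)
  {
      unsigned long stack_base = sp & ~(STACK_SIZE - 1);

      return stack_base + (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
  }

  /*
   * A free supervisor token is its own linear address with the busy bit
   * clear; this is the value the patch writes with WRSSQ.  SETSSBSY then
   * sets the busy bit atomically, raising #CP if it was already set,
   * which is the failure mode described above.
   */
  static unsigned long sstk_token_free_value(unsigned long addr)
  {
      return addr & ~SSTK_TOKEN_BUSY;
  }

Writing a fresh token unconditionally lets boot and S3 resume share a
single path, instead of resume having to special-case a stale busy bit.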