Re: x86/CET: Fix S3 resume with shadow stacks active
On 24.02.2022 20:48, Andrew Cooper wrote:
> The original shadow stack support has an error on S3 resume with very
> bizarre fallout. The BSP comes back up, but APs fail with:
>
> (XEN) Enabling non-boot CPUs ...
> (XEN) Stuck ??
> (XEN) Error bringing CPU1 up: -5
>
> and then later (on at least two Intel TigerLake platforms), the next HVM
> vCPU to be scheduled on the BSP dies with:
>
> (XEN) d1v0 Unexpected vmexit: reason 3
> (XEN) domain_crash called from vmx.c:4304
> (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
>
> The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
> scheduled vCPU, and will be addressed in a subsequent patch. It is a
> consequence of the APs triple faulting.
>
> The APs triple fault because we don't tear down the stacks on suspend.
> The idle/play_dead loop is killed in the middle of running, meaning that
> the supervisor token is left busy.
>
> On resume, SETSSBSY finds the token already busy, suffers #CP and triple
> faults because the IDT isn't configured this early.
>
> Rework the AP bringup path to (re)create the supervisor token. This
> ensures the primary stack is non-busy before use.
>
> Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
> Link: https://github.com/QubesOS/qubes-issues/issues/7283
> Reported-by: Thiner Logoer <logoerthiner1@xxxxxxx>
> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Tested-by: Thiner Logoer <logoerthiner1@xxxxxxx>
> Tested-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

> Slightly RFC. This does fix the crash encountered, but it occurs to me
> that there's a race condition when S3 platform powerdown is incident with
> an NMI/#MC, where more than just the primary shadow stack can end up busy
> on resume.
>
> A larger fix would be to change how we allocate tokens, and always have
> each CPU set up its own tokens. I didn't do this originally in the hopes
> of having WRSSQ generally disabled, but that plan failed when
> encountering reality...

While I think this wants fixing one way or another, I also think this
shouldn't block the immediate fix here (which addresses an unconditional
crash rather than a pretty unlikely one).

Jan
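[Editorial illustration of the mechanism under discussion, not the actual Xen patch: architecturally, the supervisor shadow stack token is an 8-byte value at the top of the shadow stack whose contents are the token's own linear address, with bit 0 acting as the busy bit. SETSSBSY raises #CP if the busy bit is already set, which is what happens when a suspend kills the idle loop mid-run and the token is never released. A minimal sketch of re-creating a free token with WRSSQ might look like the following; the helper name reset_supervisor_token and the sstk_top parameter are hypothetical, and it assumes shadow stacks and shadow-stack writes are already enabled on the CPU doing the write (MSR_S_CET.SH_STK_EN and WR_SHSTK_EN set, CR4.CET set).]

    #include <stdint.h>

    /*
     * Hypothetical sketch, not the actual Xen change: overwrite the
     * 8-byte supervisor shadow stack token so it is "free" again before
     * SETSSBSY runs on the AP.
     *
     * Assumptions: sstk_top is the linear address of the token slot
     * (the value programmed into MSR_PL0_SSP), and WRSSQ is usable
     * because shadow-stack writes are enabled.
     */
    static inline void reset_supervisor_token(uint64_t sstk_top)
    {
        /*
         * A free supervisor token is the linear address of the token
         * slot itself with bit 0 (the busy bit) clear.  If a previous
         * suspend left the busy bit set, this write clears it.
         */
        asm volatile ( "wrssq %[tok], (%[slot])"
                       :: [tok] "r" (sstk_top), [slot] "r" (sstk_top)
                       : "memory" );
    }

[With the token re-created in this way, the subsequent SETSSBSY finds it non-busy, marks it busy, and switches onto the primary shadow stack as intended.]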