[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/shutdown: change default reboot method preference


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 23 Oct 2023 13:02:18 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=l1TXcisM53Gw0IVgqi7OG7S4Z1pX9BFTfRW+VAwnCNI=; b=h/sQZL9s9dTRhm8e52qb753gNAS6T1seE+QQtyytKJcBJq2P10p75hPHeRXESAIp9l/GgwxKkJ821cOCqtw06bAnyve/U1oFSn8MvYbf29F2PqkowAf439F1G0L1Kr3X+mDoo6Tzr9ypAYTCtjSBPRy2N8RV7YYYQV6gQUWTH3U4XDkWoZ3sKHon3OBlzJgW4jbeRkQo0g7sHe66KQksKjV1x0FdxVCuzRdWuPRUn62Rw7HFWBFqoc8+010vYxp1kDibaxzqG0gwhrefEMyCby05U+pOYGYEsVqbg5VlTFR2HIDeLcG7/ozXpijzaQnbgM+uv5MuFnKls7z70hapig==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=UBhRykrJjzQJ/LfHr9qT2cimUMcf8lcFtzYuNyp+rk+9NMEParPWhBhNptUTZhnnnn6q9xdKuiRQuR4oSVSXYvexHG0N+Ibua7neXEeGV+rs8dOX3yx/3Geg3sidXH4nqxqS1qW381uzMvyxLMGurtof/cbV+o0dLRbdG9CqA1coHAxQqEEFpbPpbInlfYKafGhYdwuLTq6BeRsbBCScakCEpKcxspMOsT3SWRPUKYiRWhVq/OOePGTnykugnDB3z+JHg3TKV/Obwcsojld/pzqudZXo0Hbe8DTgD/N1zv7ooXEVEuMmmHCTCEfa9JDlG6GsJ5B2W+3BsMN/nvHhNQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 23 Oct 2023 11:02:46 +0000
  • Ironport-data: A9a23:+BSwUKxe5f2WRscb9ax6t+cRxyrEfRIJ4+MujC+fZmUNrF6WrkVRy zYYX2GAb/jcNGD0fot1Pd+wph5Qv5HQzoNhQABs/iAxQypGp/SeCIXCJC8cHc8wwu7rFxs7s ppEOrEsCOhuExcwcz/0auCJQUFUjPzOHvykTrecZkidfCc8IA85kxVvhuUltYBhhNm9Emult Mj75sbSIzdJ4RYtWo4vw/zF8EgHUMja4mtC5QVmP6sT5TcyqlFOZH4hDfDpR5fHatE88t6SH 47r0Ly/92XFyBYhYvvNfmHTKxBirhb6ZGBiu1IOM0SQqkEqSh8ai87XAME0e0ZP4whlqvgqo Dl7WT5cfi9yVkHEsLx1vxC1iEiSN4UekFPMCSDXXcB+UyQq2pYjqhljJBheAGEWxgp4KV9jy cUfJQoCVBfAg8/t5uiFFOtSqu12eaEHPKtH0p1h5RfwKK9+BLzmHeDN79Ie2yosjMdTG/qYf 9AedTdkcBXHZVtIJ0sTD5U92uyvgxETcRUB8A7T+fVxvjiVlVQpuFTuGIO9ltiiX8Jak1zev mvb12/4HgsbJJqUzj/tHneE37WSw3uhAtpDfFG+3v1LuECW/1UONEJIEl2f++mmhkesf+sKf iT4/QJr98De7neDTNPwQhm5q36spQMHVpxbFOhSwB6J4rrZ5UCeHGdsZi5MbpkqudE7QRQu1 0SVhJX5CDp3qrqXRHmBsLCOoluP1TM9KGYDYWoISFUD6ty6+IUr1EuXFpBkDbK/icDzFXfo2 TeWoSMihrIVy8kWy6G8+lOBiDWpznTUcjMICszsdjrNxmtEiESNPuRENXCzAS58Ebuk
  • Ironport-hdrordr: A9a23:3rIQ2qvExgZRd3mPJ6ywir0P7skDS9V00zEX/kB9WHVpm6uj5q eTdZUgpHvJYVMqM03I9urvBEDtexLhHPxOkOos1MaZLWzbUQKTRekP0WKL+VLd8kbFh4xgPM lbE5SXdLfLfD5HZY6T2mOF+5xJ+rS6GO7Cv5am80tQ
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Oct 03, 2023 at 01:35:25PM +0200, Roger Pau Monné wrote:
> On Wed, Sep 27, 2023 at 10:21:44AM +0200, Jan Beulich wrote:
> > On 15.09.2023 09:43, Roger Pau Monne wrote:
> > > The current logic to chose the preferred reboot method is based on the 
> > > mode Xen
> > > has been booted into, so if the box is booted from UEFI, the preferred 
> > > reboot
> > > method will be to use the ResetSystem() run time service call.
> > > 
> > > However, that method seems to be widely untested, and quite often leads 
> > > to a
> > > result similar to:
> > > 
> > > Hardware Dom0 shutdown: rebooting machine
> > > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > > CPU:    0
> > > RIP:    e008:[<0000000000000017>] 0000000000000017
> > > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > > [...]
> > > Xen call trace:
> > >    [<0000000000000017>] R 0000000000000017
> > >    [<ffff83207eff7b50>] S ffff83207eff7b50
> > >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> > >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> > >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> > >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> > >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> > >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> > >    [<ffff82d040283d33>] F 
> > > arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> > >    [<ffff82d04028436c>] F 
> > > arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> > >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > > 
> > > ****************************************
> > > Panic on CPU 0:
> > > FATAL TRAP: vector = 6 (invalid opcode)
> > > ****************************************
> > > 
> > > Which in most cases does lead to a reboot, however that's unreliable.
> > > 
> > > Change the default reboot preference to prefer ACPI over UEFI if 
> > > available and
> > > not in reduced hardware mode.
> > > 
> > > This is in line to what Linux does, so it's unlikely to cause issues on 
> > > current
> > > and future hardware, since there's a much higher chance of vendors testing
> > > hardware with Linux rather than Xen.
> > > 
> > > Add a special case for one Acer model that does require being rebooted 
> > > using
> > > ResetSystem().  See Linux commit 0082517fa4bce for rationale.
> > > 
> > > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > > properly implemented ResetSystem() methods.
> > 
> > A data point from a new system I'm still in the process of setting up: The
> > ACPI reboot method, as used by Linux, unconditionally means a warm reboot.
> > The EFI method, otoh, properly distinguishes "reboot=warm" from our default
> > of explicitly requesting cold reboot. (Without taking the EFI path, I
> > assume our write to the relevant BDA location simply has no effect, for
> > this being a legacy BIOS thing, and the system apparently defaults to warm
> > reboot when using the ACPI method.)
> 
> This is unfortunate, but IMO not as worse as getting a #UD or any
> other fault while attempting a reboot.  We can always force this
> system to use UEFI reboot, if that does work better than ACPI.
> 
> > Clearly, as a secondary effect, this system adds to my personal experience
> > of so far EFI reboot consistently working on all x86 hardware I have (had)
> > direct access to. (That said, this is the first non-Intel system, which
> > likely biases my overall experience.)
> 
> I can try to gather some data, I can at least tell you that the Intel
> NUC11TNHi7 TGL does also hit a fault when attempting UEFI reboot.
> The above crash was from a Dell PowerEdge R6625.  I do recall seeing
> this with other boxes on the Citrix lab, but don't know the exact
> models.  I'm quite sure other downstreams can provide similar
> feedback.

As a further data point, Dasharo [0] a coreboot downstream was also
providing a firmware with a broken ResetSystem() method, and they
didn't notice until someone reported errors on Xen reboot:

https://github.com/Dasharo/edk2/pull/99/commits/dee75be10ac9387168bd3a8cad0f1ec6e372129a

It's quite clear no one is testing ResetSystem(), the UEFI spec
doesn't mandate using it, and we are just hurting ourselves by forcing
its usage.

Regards, Roger.

[0] https://github.com/Dasharo



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.