[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/shutdown: change default reboot method preference


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 3 Oct 2023 13:35:25 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9nxNyHYLGzv+R974aDBtvbrfosZ9EgYWNEetS75XAJA=; b=cITUcZEJHRIWoMNK7aAtQjEUj9qKpU0PzSU8eC98mUPQao3Q24Juu72oFmGVuA4AKxTAmN7mQNFrrGqvLCx4eerYdLM/hqnI7qPALxGq6855m/kEuDdrCDYVQBNjexTMPM4SGMm6umxbJQlJyHovmCI7nOGpSB2v8BflFOxlqqMOkvaCz8HWsEUloyApWn44r6cN0GGWCvLKU0CaSGsNBcly1hiIG454mI7PViJIUiRalEvvlnCWEheyVjRU2H65xO4+LWfOz1KTPPpqpSSO+CveAQDmjkLwlYtwWm9hRGyzMFe1x7kNfeuVKV2i2FErAaAo78NpalcZ1xIcl/C0xA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jD3G4COI+sDEZgRl6GTRpW2wU7/5W9u8k/v+9OHWSg1ms/g24zK6RZtZwSuK7wKEemSP0oTJsexKJjUNy9dZcF3EFfK5xF9mE7k/AntER46LTB1qCRAM03hjToE3KIcs8BlW4w0oHRy+wTDvHW3bbziCkA8wcN92tD6THoSiSX4gNs7PZ8g1MTAdsL68mBEYyDvNhrl3J4qYs7RzkJyB/9sg5GfnC0SUChPrcoxju6Y3KApUobuDMFmcrMhrec/anDEx1lo3CM58k5bwm9jGupbhKaZ7lVeD34Z93q3+YPZvitPwwGT2eYvA7idFW02tKtcJEV+hE/lSP0Zoiasnfg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 03 Oct 2023 11:36:11 +0000
  • Ironport-data: A9a23:RXnswalVbZP4Sk1nsVc3xMro5gygJ0RdPkR7XQ2eYbSJt1+Wr1Gzt xIdWWmFMq2LZWP0LY91a9ywoEpT7Z6Hn9BhQVQ9rCgyECMWpZLJC+rCIxarNUt+DCFhoGFPt JxCN4aafKjYaleG+39B55C49SEUOZmgH+e6UKicfHkpGWeIcQ954Tp7gek1n4V0ttawBgKJq LvartbWfVSowFaYCEpNg064gE0p5K+aVA8w5ARkPqkT5ASGyxH5MbpETU2PByqgKmVrNrbSq 9brlNmR4m7f9hExPdKp+p6TnpoiG+O60aCm0xK6aoD66vRwjnVaPpUTbZLwXXx/mTSR9+2d/ f0W3XCGpaXFCYWX8AgVe0Ew/yiTpsSq8pefSZS0mZT7I0Er7xIAahihZa07FdRwxwp5PY1B3 dwTM2ldaRCBvOC/zLmHDddu3+4NIeC+aevzulk4pd3YJdAPZMmaBonvu5pf1jp2gd1SF/HDY cZfcSBocBnLfxxIPBEQFY46m+CrwHL4dlW0qnrM/fZxvzeVkVM3ieeyWDbWUoXiqcF9hEGXq 3iA523kKhobKMae2XyO9XfEaurnxHmgANNDT+HknhJsqH3Q9mkiNiIqaUq2rKiajFCgZO92L WVBr0LCqoB3riRHVOLVXRe1vXqFtR40QMdLHqsx7wTl4rXQyxaUAC4DVDEpQMwrsoo6SCIn0 neNnsj1Hnp/vbuNU3Wf+7yI6zSoNkAowXQqYCYFSU4J5oflqYRq1xbXFI88T+iyk8H/Hiz2z 3aSti8iir4PjMkNkaKm4VTAhDHqrZ/MJuIo2jjqsquexlsRTOaYi0aAsgSzASpoRGpBcmS8g Q==
  • Ironport-hdrordr: A9a23:8bboR6OEEu+z8MBcTuOjsMiBIKoaSvp037BL7TETdfUxSKelfq +V/cjzuSWE7Qr5IUtQ4exoRpPwO080hKQU3WB5B97LMDUOl1HHEGgI1/qB/9SPIVybygZBvp 0OT0DMZeeAdGSTxa7BijVRWb4breVuvsuT9IDjJ+YHd3ANV0mBhz0JcTqmLg==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Sep 27, 2023 at 10:21:44AM +0200, Jan Beulich wrote:
> On 15.09.2023 09:43, Roger Pau Monne wrote:
> > The current logic to chose the preferred reboot method is based on the mode 
> > Xen
> > has been booted into, so if the box is booted from UEFI, the preferred 
> > reboot
> > method will be to use the ResetSystem() run time service call.
> > 
> > However, that method seems to be widely untested, and quite often leads to a
> > result similar to:
> > 
> > Hardware Dom0 shutdown: rebooting machine
> > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > CPU:    0
> > RIP:    e008:[<0000000000000017>] 0000000000000017
> > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > [...]
> > Xen call trace:
> >    [<0000000000000017>] R 0000000000000017
> >    [<ffff83207eff7b50>] S ffff83207eff7b50
> >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> >    [<ffff82d040283d33>] F 
> > arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> >    [<ffff82d04028436c>] F 
> > arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > 
> > ****************************************
> > Panic on CPU 0:
> > FATAL TRAP: vector = 6 (invalid opcode)
> > ****************************************
> > 
> > Which in most cases does lead to a reboot, however that's unreliable.
> > 
> > Change the default reboot preference to prefer ACPI over UEFI if available 
> > and
> > not in reduced hardware mode.
> > 
> > This is in line to what Linux does, so it's unlikely to cause issues on 
> > current
> > and future hardware, since there's a much higher chance of vendors testing
> > hardware with Linux rather than Xen.
> > 
> > Add a special case for one Acer model that does require being rebooted using
> > ResetSystem().  See Linux commit 0082517fa4bce for rationale.
> > 
> > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > properly implemented ResetSystem() methods.
> 
> A data point from a new system I'm still in the process of setting up: The
> ACPI reboot method, as used by Linux, unconditionally means a warm reboot.
> The EFI method, otoh, properly distinguishes "reboot=warm" from our default
> of explicitly requesting cold reboot. (Without taking the EFI path, I
> assume our write to the relevant BDA location simply has no effect, for
> this being a legacy BIOS thing, and the system apparently defaults to warm
> reboot when using the ACPI method.)

This is unfortunate, but IMO not as worse as getting a #UD or any
other fault while attempting a reboot.  We can always force this
system to use UEFI reboot, if that does work better than ACPI.

> Clearly, as a secondary effect, this system adds to my personal experience
> of so far EFI reboot consistently working on all x86 hardware I have (had)
> direct access to. (That said, this is the first non-Intel system, which
> likely biases my overall experience.)

I can try to gather some data, I can at least tell you that the Intel
NUC11TNHi7 TGL does also hit a fault when attempting UEFI reboot.
The above crash was from a Dell PowerEdge R6625.  I do recall seeing
this with other boxes on the Citrix lab, but don't know the exact
models.  I'm quite sure other downstreams can provide similar
feedback.

I think it's clear now that using ResetSystem() when booted from UEFI
is not mandated by the UEFI specification, so I still stand by this
patch and think we should select the default reboot method that has
the highest chance of succeeding.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.