[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/shutdown: change default reboot method preference


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 18 Sep 2023 17:09:17 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mjn9/0FlY8u/9J1L5W0VPRgG4dVPwDDwB3/UuSeomd8=; b=EogD/aCuyt/l+bhaBMtCHKgErVvZJIPfv6Na3hh+n8dbf3M1BwxAMqKeKniJvNjCKhnHtjIwnW8xOasANTe6+//yjW2e4XbMSxjAVfD3ra6eIHu9dqLWf90nScggE2WbamhUa+5cQQJA6naBcS3sIeRIR417WzDJ+bJ2v5Oy6888C85/wZguNpFC7LLo/e+LefqH0MzLFuKLQD4P+yiUcKqrPCsRExUwawPArbSyxQyUIPEW51JKUjKG6t3FGeBfTZHfwD47PFJhkC7IvfC9A24REVLEWv2fqYafenw2BXhdek9EEYwmeqj5AQbBE/EhCUe0hz8XPlBhEEDx6u4sEA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=JKAdgL/M72Yemclbrhpt/a5hU1DhcRFASdclJm4Hy6XHYrbEUPxaWA1a1euv/NAW7L0FvVzsGSncEsOtyvDq9WbvWTR5pTomVYUiGd7aYIVx/lZQmIWFGXY4SJkrbz69pEsfFkAv4fJQJD/UTSw0FFXGIvDBWG/F1ES42dbski74UYBhQvo59fT0thU72BO/RmIrfXmumnm+LcLpI+0Ard9ueZs5Yyo9HURXVZZ82fOnSf/cYkbuA/McVt8IxG+5+9Il+a2sNOdOBypauYC+ykIZyXLAu5LIXyn+ZwXAvbQIhSrdFTzvKcjvABJRQv0fwrBqwG4jdxCNVZhK/w6h5A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 18 Sep 2023 15:10:10 +0000
  • Ironport-data: A9a23:f2TJEKJ+lq7qWpHOFE+R95QlxSXFcZb7ZxGr2PjKsXjdYENS02dWn TNMWj2HOq3cY2T9Ko93Oo6zphwO65OAyYRkTlZlqX01Q3x08seUXt7xwmUcnc+xBpaaEB84t ZV2hv3odp1coqr0/0/1WlTZhSAhk/nOHvylULKs1hlZHWdMUD0mhQ9oh9k3i4tphcnRKw6Ws Jb5rta31GWNglaYCUpKrfrZwP9TlK6q4mhA7wVuPakjUGL2zBH5MrpOfcldEFOgKmVkNrbSb /rOyri/4lTY838FYj9yuu+mGqGiaue60Tmm0hK6aYD76vRxjnVaPpIAHOgdcS9qZwChxLid/ jnvWauYEm/FNoWU8AgUvoIx/ytWZcWq85efSZSzXFD6I+QrvBIAzt03ZHzaM7H09c5rHnxB9 uA5DgwJQTeivr6W3bS1bbNF05FLwMnDZOvzu1lG5BSAVLMKZM6GRK/Ho9hFwD03m8ZCW+7EY NYUYiZuaxKGZABTPlAQC9Q1m+LAanvXKmUE7g7K4/dopTGNnGSd05C0WDbRUsaNSshP2F6Ru 0rN/njjAwFcP9uaodaA2iv237GVzH+qCOr+EpWd/99MrHbCzVcvMz0xelWJudCZk2uhDoc3x 0s8v3BGQbIJ3E6hQ8T5Xha4iGWZpRNaUN1Ve8Ul7Cmdx6yS5ByWbkAUQzgEZNE4ucseQT0xy kTPj97vHSZosrCeVTSa7Lj8kN+pES0cLGtHaSpaSwIAuoDnuNtq0EKJSct/GqmoiNGzASv33 z2BsCk5gfMUkNIP0KK4u1vAhlpAu6T0c+L83S2PNkrN0++zTNfNi1CAgbQD0ct9EQ==
  • Ironport-hdrordr: A9a23:aNdiwaEGM8pKGWtDpLqEGMeALOsnbusQ8zAXPiBKJCC9E/bo8/ xG+c5w6faaslkssR0b9+xoW5PwJE80l6QFgrX5VI3KNGXbUQ2TTb2KhbGI/9SKIVydygcy78 ddmtNFebrN5VgRt7eH3OG7eexQv+VuJsqT9JnjJ3QGd3AaV0l5hT0JbDpyiidNNXN77ZxSLu vk2uN34wCOVF4wdcqBCnwMT4H41qD2fMKPW29/O/Y/gjP+9g+V1A==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Sep 18, 2023 at 02:26:51PM +0200, Jan Beulich wrote:
> On 15.09.2023 09:43, Roger Pau Monne wrote:
> > The current logic to chose the preferred reboot method is based on the mode 
> > Xen
> > has been booted into, so if the box is booted from UEFI, the preferred 
> > reboot
> > method will be to use the ResetSystem() run time service call.
> > 
> > However, that method seems to be widely untested, and quite often leads to a
> > result similar to:
> > 
> > Hardware Dom0 shutdown: rebooting machine
> > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > CPU:    0
> > RIP:    e008:[<0000000000000017>] 0000000000000017
> > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > [...]
> > Xen call trace:
> >    [<0000000000000017>] R 0000000000000017
> >    [<ffff83207eff7b50>] S ffff83207eff7b50
> >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> >    [<ffff82d040283d33>] F 
> > arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> >    [<ffff82d04028436c>] F 
> > arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > 
> > ****************************************
> > Panic on CPU 0:
> > FATAL TRAP: vector = 6 (invalid opcode)
> > ****************************************
> > 
> > Which in most cases does lead to a reboot, however that's unreliable.
> > 
> > Change the default reboot preference to prefer ACPI over UEFI if available 
> > and
> > not in reduced hardware mode.
> > 
> > This is in line to what Linux does, so it's unlikely to cause issues on 
> > current
> > and future hardware, since there's a much higher chance of vendors testing
> > hardware with Linux rather than Xen.
> 
> I certainly appreciate this as a goal. However, ...
> 
> > Add a special case for one Acer model that does require being rebooted using
> > ResetSystem().  See Linux commit 0082517fa4bce for rationale.
> 
> ... this is precisely what I'd like to avoid: Needing workarounds on spec-
> conforming systems.

I wouldn't call that platform spec-conforming when ACPI reboot doesn't
work reliably on it either.  I haven't been able to find a wording on
the UEFI specification that mandates using ResetSystem() in order to
reset the platform.  I've only found this wording:

"... then the UEFI OS Loader has taken control of the platform, and
EFI will not regain control of the system until the platform is reset.
One method of resetting the platform is through the EFI Runtime
Service ResetSystem()."

And this reads to me as a mere indication that one option is to use
ResetSystem(), but that there are likely other platform specific reset
methods that are suitable to be used for OSes and still be compliant
with the UEFI spec.

> 
> > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > properly implemented ResetSystem() methods.
> 
> I'm also puzzled by this statement: That Acer aspect is a clear indication
> of there being an issue.

Hm yes, I had that sentence from v1, before realizing the Acer quirk.
So there's one know issue with using ACPI as the default reboot
method vs many issues when using the UEFI one.

> Plus it's quite easy to see that hooks may be put
> in place by various firmware components that would then be used to make
> certain adjustments to the platform, ahead of an orderly reboot / shutdown.

Well, I very much doubt any vendor would rely on this, seeing as both
Linux and Windows both default to ACPI reboot, and the UEFI spec not
mandating the use of ResetSystem() anyway.

> > --- a/xen/arch/x86/shutdown.c
> > +++ b/xen/arch/x86/shutdown.c
> > @@ -150,19 +150,20 @@ static void default_reboot_type(void)
> >  
> >      if ( xen_guest )
> >          reboot_type = BOOT_XEN;
> > +    else if ( !acpi_disabled && !acpi_gbl_reduced_hardware )
> > +        reboot_type = BOOT_ACPI;
> >      else if ( efi_enabled(EFI_RS) )
> >          reboot_type = BOOT_EFI;
> > -    else if ( acpi_disabled )
> > -        reboot_type = BOOT_KBD;
> >      else
> > -        reboot_type = BOOT_ACPI;
> > +        reboot_type = BOOT_KBD;
> >  }
> >  
> >  static int __init cf_check override_reboot(const struct dmi_system_id *d)
> >  {
> >      enum reboot_type type = (long)d->driver_data;
> >  
> > -    if ( type == BOOT_ACPI && acpi_disabled )
> > +    if ( (type == BOOT_ACPI && acpi_disabled) ||
> > +         (type == BOOT_EFI && !efi_enabled(EFI_RS)) )
> >          type = BOOT_KBD;
> 
> I guess I don't follow this adjustment: Why would we fall back to KBD
> first thing? Wouldn't it make sense to try ACPI first if EFI cannot
> be used?

This is IMO a weird corner case, we have a explicit request to use one
reboot method, but we cannot do so because the component is disabled.
I've assumed that falling back to KBD was the safest option.

For example if we have to explicitly reboot using UEFI it's likely
because ACPI (the proposed default method) is not suitable, and hence
falling back to ACPI here won't help.

> And go further to KBD only if ACPI then also turns out
> disabled (a mode that Xen quite likely won't correctly operate in
> anymore anyway, due to bitrot)?
> 
> As an aside, KBD likely is unusable on hw-reduced systems, for there
> simply not being a legacy keyboard controller. Instead we may need to
> fall back to CF9 in such a case.

Hm, I can send a followup patch for that, but not part of this
change.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.