[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Yet another S3 issue in Xen 4.14


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 2 Oct 2020 16:42:50 +0100
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 02 Oct 2020 15:43:00 +0000
  • Ironport-sdr: nd7SNFVxjW5sRcXrQRWBKfq5bq1a61ky3GiRdH5dh+nokrGVxvpLLORvv6oJFes4fAKeHJKH7G FObcTadzFUsgQHvrmzvgnQ5pw3YcBWMmTeXYkVR41vjBlZ4tgw6kCoJ/Oi33UWfuQSb+SUq0b4 xWIpgEyVpWAyX32iAfNY5cDPQYlJ3YzO3Ph1cbWeveQIJCkYkBqyfHGbzLPkBYy5m9CGug14CK 1S65GYSgu+A/SpnWLKWMJ9o17/Y6ACkZ3pH8k5k/pDoC7uI85JRXv0ox8v4KZY7MjkugiCK4uu 6AY=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02/10/2020 16:39, Jan Beulich wrote:
> On 02.10.2020 17:08, Marek Marczykowski-Górecki wrote:
>> I've done another bisect on the commit broken up in separate changes
>> (https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/dbg-s3)
>> and the bad part seems to be this:
>>
>> From dbdb32f8c265295d6af7cd4cd0aa12b6d04a0430 Mon Sep 17 00:00:00 2001
>> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Date: Fri, 2 Oct 2020 15:40:22 +0100
>> Subject: [PATCH 1/1] CR4
> Interesting - I was wild guessing so yesterday, but couldn't come
> up with even a vague reason why this would be. I think you could
> further split it up:
>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -195,7 +195,6 @@ static int enter_state(u32 state)
>>      unsigned long flags;
>>      int error;
>>      struct cpu_info *ci;
>> -    unsigned long cr4;
>>  
>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>>          return -EINVAL;
>> @@ -270,15 +269,15 @@ static int enter_state(u32 state)
>>  
>>      system_state = SYS_STATE_resume;
>>  
>> -    /* Restore CR4 and EFER from cached values. */
>> -    cr4 = read_cr4();
>> -    write_cr4(cr4 & ~X86_CR4_MCE);
>> +    /* Restore EFER from cached value. */
>>      write_efer(read_efer());
> This one should be possible to leave in place despite ...
>
>>      device_power_up(SAVED_ALL);
>>  
>>      mcheck_init(&boot_cpu_data, false);
>> -    write_cr4(cr4);
>> +
>> +    /* Restore CR4 from cached value, now MCE is set up. */
>> +    write_cr4(read_cr4());
> ... this change.
>
> Further, while I can't see how the set_in_cr4() in mcheck_init()
> could badly interact with the CR4 writes here, another option
> might be to suppress it when system_state == SYS_STATE_resume
> && c == &boot_cpu_data (or !bsp && c == &boot_cpu_data).
>
>> --- a/xen/arch/x86/acpi/suspend.c
>> +++ b/xen/arch/x86/acpi/suspend.c
>> @@ -23,7 +23,4 @@ void save_rest_processor_state(void)
>>  void restore_rest_processor_state(void)
>>  {
>>      load_system_tables();
>> -
>> -    /* Restore full CR4 (inc MCE) now that the IDT is in place. */
>> -    write_cr4(mmu_cr4_features);
>>  }
> This one should be possible to leave in place despite the other
> changes.

We're continuing to debug in private.  mmu_cr4_features and read_cr4()
are equivalent (as expected), but very different from MINIMAL_CR4 which
is what the trampoline configures, so I think we're suffering a
CR4-related #UD/#GP somewhere in device_power_up() or mcheck_init().

Its not INVPCID.  Trying others.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.