
Re: [Xen-devel] Resume from suspend to RAM broken when using early microcode updates



>>> On 11.04.18 at 14:46, <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Jan Beulich:
>>>>> On 11.04.18 at 14:11, <JBeulich@xxxxxxxx> wrote:
>>>>>> On 11.04.18 at 14:01, <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> Andrew Cooper:
>>>>> On 11/04/18 12:48, Simon Gaiser wrote:
>>>>>> Hi,
>>>>>>
>>>>>> when I use early microcode loading with the microcode update that
>>>>>> carries the BTI mitigations, resuming from suspend to RAM is broken.
>>>>>>
>>>>>> Based on added logging to enter_state() (from power.c) it doesn't
>>>>>> survive the local_irq_restore(flags) call (at least a printk() after the
>>>>>> call doesn't output anything on the serial console).
>>>>>>
>>>>>> I guess that some irq handler tries to use IBRS/IBPB. But the microcode
>>>>>> is only loaded later.
>>>>>>
>>>>>> If I simply move the microcode_resume_cpu(0) directly before the
>>>>>> local_irq_restore(flags) everything seems to work fine. But I'm not sure
>>>>>> if this has unintended consequences.
>>>>>>
>>>>>> I tested the above with Xen 4.8.3 from Qubes which includes the BTI and
>>>>>> microcode patches from staging-4.8. AFAICS there are no commits which
>>>>>> change the affected code or other commits which sound relevant, so this
>>>>>> probably also affects all the newer branches.
>>>>>
>>>>> S3 support is a very unloved area of the hypervisor.
>>>>>
>>>>> Yes - we definitely need to get microcode reloaded before interrupts are
>>>>> enabled.
>>>>
>>>> Do you see any problems with simply moving microcode_resume_cpu(0)
>>>> directly before the local_irq_restore(flags) call? (I'm not familiar
>>>> with the code at all and (early) resume handling sounds like something
>>>> which is easy to break in non-obvious ways.)
>>>
>>> Yes, there would be a problem: microcode_resume_cpu()
>>> spin_lock()-s almost first thing, and this would break our
>>> (simplistic) lock checking. Putting it also ahead of
>>> spin_debug_enable() should work otoh.
>>>
>>> Once at it, cpufreq_add_cpu() should be moved ahead of the
>>> enable_cpu label as well, as cpufreq_del_cpu() wasn't called
>>> yet at the point of the only goto to that label.
>> 
>> And I think console_end_sync() wants to be moved earlier then
>> as well.
> 
> Where exactly? console_end_sync() seems to match the position of
> console_start_sync().

The question isn't symmetry with the start_sync, but the fact that at
the right log level (and on big systems) microcode updates can be
quite verbose. We don't want all this output to go out in sync mode,
I think, albeit then again doing the output in normal mode may mean
some of it gets discarded (but personally I think that's acceptable).

As to where exactly, the easiest seems to be to hand you a patch.
Please give this a try. Of course none of this addresses a possible
NMI or #MC occurring before the microcode loading.

Jan

x86: correct ordering of operations during S3 resume

Microcode loading needs to happen before re-enabling interrupts, in case
only updated microcode allows the use of e.g. the SPEC_CTRL and PRED_CMD MSRs.
Otoh it doesn't need to happen at all when we didn't suspend in the
first place. It needs to happen before spin_debug_enable() though, as it
acquires a lock and hence would otherwise make
common/spinlock.c:check_lock() unhappy. As microcode loading can be
pretty verbose, also make sure it only runs after console_end_sync().

cpufreq_add_cpu() doesn't need calling on the only "goto enable_cpu"
path, which sits ahead of cpufreq_del_cpu().

Reported-by: Simon Gaiser <simon@xxxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

--- unstable.orig/xen/arch/x86/acpi/power.c
+++ unstable/xen/arch/x86/acpi/power.c
@@ -203,6 +203,7 @@ static int enter_state(u32 state)
         printk(XENLOG_ERR "Some devices failed to power down.");
         system_state = SYS_STATE_resume;
         device_power_up(error);
+        console_end_sync();
         error = -EIO;
         goto done;
     }
@@ -243,17 +244,19 @@ static int enter_state(u32 state)
     if ( (state == ACPI_STATE_S3) && error )
         tboot_s3_error(error);
 
+    console_end_sync();
+
+    microcode_resume_cpu(0);
+
  done:
     spin_debug_enable();
     local_irq_restore(flags);
-    console_end_sync();
     acpi_sleep_post(state);
     if ( hvm_cpu_up() )
         BUG();
+    cpufreq_add_cpu(0);
 
  enable_cpu:
-    cpufreq_add_cpu(0);
-    microcode_resume_cpu(0);
     rcu_barrier();
     mtrr_aps_sync_begin();
     enable_nonboot_cpus();
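
For orientation only, the ordering constraints discussed in this thread can be
written down as a small stand-alone toy model. This is not Xen code: the
struct, the model_*() helpers and first_violation() are hypothetical names
made up for illustration, and the model assumes that the boot CPU comes back
from S3 without the updated microcode. It only encodes three rules taken from
the discussion: microcode loading has to precede spin_debug_enable(), it
should follow console_end_sync(), and interrupts must not be re-enabled
before the microcode is back.

```c
/* Toy model of the S3 resume ordering discussed in this thread; NOT Xen code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags standing in for real hypervisor/hardware state. */
struct resume_state {
    bool console_sync;  /* console still in synchronous mode */
    bool spin_debug;    /* lock checking (check_lock()) active */
    bool irqs_enabled;  /* interrupts enabled on the boot CPU */
    bool ucode_loaded;  /* updated microcode present on the boot CPU */
};

/* A step returns false if it would violate one of the modelled rules. */
typedef bool (*step_fn)(struct resume_state *);

static bool model_console_end_sync(struct resume_state *s)
{
    s->console_sync = false;
    return true;
}

static bool model_microcode_resume(struct resume_state *s)
{
    /* Acquires a lock, so it must run before lock checking is re-enabled. */
    if (s->spin_debug)
        return false;
    /* Can be verbose; the thread prefers it after sync mode has ended. */
    if (s->console_sync)
        return false;
    s->ucode_loaded = true;
    return true;
}

static bool model_spin_debug_enable(struct resume_state *s)
{
    s->spin_debug = true;
    return true;
}

static bool model_irq_restore(struct resume_state *s)
{
    /* A handler may touch MSRs that exist only with updated microcode. */
    if (!s->ucode_loaded)
        return false;
    s->irqs_enabled = true;
    return true;
}

/* Returns the index of the first step breaking a rule, or -1 if none does. */
static int first_violation(const step_fn *steps, size_t n)
{
    /* Assumed wakeup state: sync console, checks off, IRQs off, ucode lost. */
    struct resume_state s = { true, false, false, false };

    assert(steps != NULL);
    for (size_t i = 0; i < n; i++)
        if (!steps[i](&s))
            return (int)i;
    return -1;
}

/* Order before the patch (unrelated calls omitted). */
static const step_fn old_order[] = {
    model_spin_debug_enable, model_irq_restore,
    model_console_end_sync, model_microcode_resume,
};

/* Microcode loading moved directly before local_irq_restore(). */
static const step_fn proposed_order[] = {
    model_spin_debug_enable, model_microcode_resume, model_irq_restore,
};

/* Order established by the patch. */
static const step_fn patched_order[] = {
    model_console_end_sync, model_microcode_resume,
    model_spin_debug_enable, model_irq_restore,
};
```

Feeding the three arrays to first_violation() flags the old order at the
interrupt restore step and the originally proposed order at the microcode
step, while the patched order passes. The model says nothing about an NMI or
#MC arriving before the microcode is loaded.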




 

