Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

To: Andy Lutomirski <luto@xxxxxxxxxx>
From: Borislav Petkov <bp@xxxxxxxxx>
Date: Mon, 14 Mar 2016 13:02:03 +0100
Cc: KVM list <kvm@xxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, X86 ML <x86@xxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, xen-devel <Xen-devel@xxxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Delivery-date: Mon, 14 Mar 2016 12:02:45 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Sat, Mar 12, 2016 at 10:08:49AM -0800, Andy Lutomirski wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ONCE and, for RDMSR, a return value of zero.  If
> panic_on_oops is set, then failed unsafe MSR accesses will still
> oops and panic.
> 
> To be clear, this type of failure should *not* happen.  This patch
> exists to minimize the chance of nasty undebuggable failures on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
> 
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
>  arch/x86/include/asm/msr.h | 10 ++++++++--
>  arch/x86/mm/extable.c      | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 93fb7c1cffda..1487054a1a70 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned int msr)
>  {
>       DECLARE_ARGS(val, low, high);
>  
> -     asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
> +     asm volatile("1: rdmsr\n"
> +                  "2:\n"
> +                  _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
> +                  : EAX_EDX_RET(val, low, high) : "c" (msr));
>       if (msr_tracepoint_active(__tracepoint_read_msr))
>               do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0);
>       return EAX_EDX_VAL(val, low, high);
> @@ -119,7 +122,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
>  static inline void native_write_msr(unsigned int msr,
>                                   unsigned low, unsigned high)
>  {
> -     asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory");
> +     asm volatile("1: wrmsr\n"
> +                  "2:\n"
> +                  _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)

This might be a good idea:

[    0.220066] cpuidle: using governor menu
[    0.224000] ------------[ cut here ]------------
[    0.224000] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:74 ex_handler_wrmsr_unsafe+0x73/0x80()
[    0.224000] unchecked MSR access error: WRMSR to 0xdeadbeef (tried to write 0x000000000000caca)
[    0.224000] Modules linked in:
[    0.224000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc7+ #7
[    0.224000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    0.224000]  0000000000000000 ffff88007c0d7c08 ffffffff812f13a3 ffff88007c0d7c50
[    0.224000]  ffffffff81a40ffe ffff88007c0d7c40 ffffffff8105c3b1 ffffffff81717710
[    0.224000]  ffff88007c0d7d18 0000000000000000 ffffffff816207d0 0000000000000000
[    0.224000] Call Trace:
[    0.224000]  [<ffffffff812f13a3>] dump_stack+0x67/0x94
[    0.224000]  [<ffffffff8105c3b1>] warn_slowpath_common+0x91/0xd0
[    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
[    0.224000]  [<ffffffff8105c43c>] warn_slowpath_fmt+0x4c/0x50
[    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
[    0.224000]  [<ffffffff8131de53>] ? __this_cpu_preempt_check+0x13/0x20
[    0.224000]  [<ffffffff8104efe3>] ex_handler_wrmsr_unsafe+0x73/0x80

and it looks helpful and all, but not when you do it pretty early. For
example, I added a

         wrmsrl(0xdeadbeef, 0xcafe);

at the end of pat_bsp_init() and the machine explodes with an early
panic. I'm wondering which is better: an early panic or an early #GP
from a missing MSR.

And more specifically, can we do better to handle the early case
gracefully too?
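
For anyone following along, the policy described in the quoted commit
message can be modelled in user space roughly as below. This is only a
sketch under stated assumptions, not the code in arch/x86/mm/extable.c:
the names msr_fault_state, msr_fault_action and handle_unsafe_msr_fault()
are hypothetical, and it deliberately says nothing about the early-boot
case asked about above.

```c
/* User-space model only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum msr_fault_action {
	MSR_FAULT_WARN_AND_CONTINUE,	/* first failure: warn, then carry on */
	MSR_FAULT_CONTINUE_SILENT,	/* later failures: carry on quietly */
	MSR_FAULT_OOPS,			/* leave it to the normal oops path */
};

struct msr_fault_state {
	bool panic_on_oops;	/* models the panic_on_oops setting */
	bool warned;		/* models the "once" part of WARN_ONCE */
};

/*
 * Decide what to do after a non-"safe" MSR access has faulted.
 * For a read, *read_val is forced to zero when execution carries on.
 */
static enum msr_fault_action
handle_unsafe_msr_fault(struct msr_fault_state *st, bool is_read,
			uint64_t *read_val)
{
	assert(st != NULL);

	/* With panic_on_oops set, the failure still oopses and panics. */
	if (st->panic_on_oops)
		return MSR_FAULT_OOPS;

	/* A failed RDMSR yields zero instead of stale register contents. */
	if (is_read) {
		assert(read_val != NULL);
		*read_val = 0;
	}

	if (st->warned)
		return MSR_FAULT_CONTINUE_SILENT;

	st->warned = true;
	fprintf(stderr, "unchecked MSR access error\n");
	return MSR_FAULT_WARN_AND_CONTINUE;
}
```

A caller would keep one msr_fault_state for the whole system and pass
is_read = false with a NULL read_val for the WRMSR case.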

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

