[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

To: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Mon, 21 Sep 2015 10:46:42 +0200
Cc: KVM list <kvm@xxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, the arch/x86 maintainers <x86@xxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, xen-devel <Xen-devel@xxxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 21 Sep 2015 08:46:58 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

* Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:

> On Sep 20, 2015 5:15 PM, "Linus Torvalds" <torvalds@xxxxxxxxxxxxxxxxxxxx> 
> wrote:
> >
> > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > > We still write a pr_info entry unconditionally for debugging.
> >
> > No, this is wrong.
> >
> > If you really want to do something like this, then just make all MSR reads 
> > safe. So the only difference between "safe" and "unsafe" is that the unsafe 
> > version just doesn't check the return value, and silently just returns zero 
> > for reads (or writes nothing).
> >
> > To quote Obi-Wan: "Use the exception table, Luke".
> >
> > Because decoding instructions is just too ugly. We'll do it for CPU errata 
> > where we might have to do it for user space code too (ie the AMD prefetch 
> > mess), but for code that _we_ control? Hell no.
> >
> > So NAK on this.
> 
> My personal preference is to just not do this at all.  A couple people 
> disagree.  
> If we make the unsafe variants not oops, then I think we want to have the 
> nice 
> loud warning, since these issues are bugs if they happen.
> 
> We could certainly use the exception table for this, but it'll result in 
> bigger 
> core, since each MSR access will need an exception table entry and an 
> associated 
> fixup to call some helper that warns and sets the result to zero.
> 
> I'd be happy to implement that, but only if it'll be applied. Otherwise I'd 
> rather just drop this patch and keep the rest of the series.

Linus, what's your preference?

Due to the bug mentioned earlier in this thread all MSR reads are currently 
'safe' 
on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 
'fixing' them we'd reintroduce random crashes into various fragile pieces of 
code...

To add insult to injury, the current 'silently safe by accident' MSR code isn't 
so 
safe: because it leaves the result of the read uninitialized...

To fix this all I'd really like to have:

 - safe MSR reads by default (i.e. never boot crash the kernel on some rare 
   condition - which to most users is either a silent boot hang or an instant 
   restart). Historicaly we had a stream of 'silly boot crashes' due to MSR 
reads 
   that generate a #GPF. They make Linux less usable around the edges, 
especially 
   in the x86 non-server (desktop) space where most hardware vendors are either 
   openly Linux hostile, or, at best, Linux oblivious.

 - proper result-zeroing behavior on exceptions

 - and we should also generate _some_ sort of warning when MSR exceptions happen
   in an 'unintended' fashion.

Maybe the warning could be put under a (default-enabled) config option for the 
size conscious.

Or we could extend exception table entry encoding to include a 'warning bit', 
to 
not bloat the kernel. If the exception handler code encounters such an 
exception 
it would generate a one-time warning for that entry, but otherwise not crash 
the 
kernel and continue execution with an all-zeroes result for the MSR read.

Thanks,

        Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  - From: Linus Torvalds
- Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  - From: Paolo Bonzini

References:
- [Xen-devel] [PATCH v2 0/2] x86/msr: MSR access failure changes
  - From: Andy Lutomirski
- [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  - From: Andy Lutomirski
- Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  - From: Linus Torvalds
- Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  - From: Andy Lutomirski

Prev by Date: Re: [Xen-devel] [qemu-mainline bisection] complete test-armhf-armhf-xl-vhd
Next by Date: Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device
Previous by thread: Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
Next by thread: Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.