[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler
Jan Beulich wrote: >>>> On 08.10.11 at 10:29, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> >>>> wrote: > >> >>> -----Original Message----- >>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] >>> Sent: Friday, September 30, 2011 3:25 PM >>> To: Liu, Jinsong; Jiang, Yunhong >>> Cc: keir.xen@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx >>> Subject: RE: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler >>> >>>>>> On 30.09.11 at 04:51, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> >>>>>> wrote: >>> >>>> >>>>> -----Original Message----- >>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] >>>>> This made me look at the current source, and there I see in >>>>> mce_urgent_action() >>>>> >>>>> if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs)) >>>>> return -1; >>>>> >>>>> which I think should say ... _EIPV and use || instead. Thoughts? >>>> >>>> I think this code means, if the error happens in hypervisor mode >>>> (i.e. !guest_mode()), and RIPV indicate the RIP in stack can't be >>>> restarted, we have to panic. >>> >>> Then the guest_mode() check still lacks an extra check of EIPV, like >>> >>> if ( !(gstatus & MCG_STATUS_RIPV) && >>> (!(gstatus & MCG_STATUS_EIPV) || !guest_mode(regs))) >>> return -1; >>> >> >> The RIPV is not related to the EIPV. RIPV means the context saved in >> the stack can't be restarted anymore. According to the SDM, RIPV >> means "execution can be restarted reliably at the instruction >> pointed to by the instruction pointer pushed on the stack". It's not >> about error happened synchronously or asynchronously. The point is, >> if the program is running in hypervisor context, and RIPV tells us >> that the program can't be restarted, we can't do anything but panic, >> because we can't switch context while we are in xen. So this code >> have nothing to do with EIPV. > > I continue to disagree (including the statement in your other > response): RIPV only tells us whether we can resume, not in which > context the error occurred. EIPV tells us whether, by looking at the > saved registers, we can determine the context that the error occurred > in. Since with !RIPV > we have to determine in what context the error occurred in order to > decide whether to panic or just kill a guest, we can't ignore EIPV > (and if it's not set we have to assume the worst case, since even if > the registers indicate guest mode the error may have occurred in > hypervisor context or accessing hypervisor structures [consider e.g. a > data load error during a GDT access]). > > Jan Yes, I agree EIPV=0 may indicate async error, but I think your solution *overkilled* most cases (i.e. the real guest instruction fetch error). Our idea is, * xen mce would flush prefetched instruction so we can delay handle it until if real need; * a h/w error will not disappear, but if it was not being *consumed*, it's OK for system keep going (like SRAO error which do not need s/w handle immediately); Suppose an async instruction fetch error (RIVP=EIVP=0), triggered at guest context but instruction prefetch hypervisor context. The scenario is, * at xen mce, the prefetched instruction has been flushed. xen mce handler needn't panic, instead it mark the page as broken page, then trigger vmce to guest; * guest may kill app, kernel thread, guest itself, or whatever; The error is still an error, w/ 2 possibilities in the future: 1. it may not be consumed as an SRAR error, system keep going, h/w mechanism may detect a SRAO error (i.e. memroy scrub) at some time point and handled then; 2. it may be consumed at some time point and a SRAR error triggered again. At this time, 1). if srar occurred at hypervisor context, xen will panic. or, 2). if srar occurred at guest context, xen kill the guest as a malicious one (as what the 2nd patch do), and move the page to broken page list; Considering the rare possibility of the above case, I think it's acceptable to handle it in this way. Thoughts? Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |