
Re: [Xen-devel] Xen mce bugfix


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
  • Date: Wed, 27 Feb 2013 12:19:36 +0000
  • Accept-language: en-US
  • Cc: "Ren, Yongjie" <yongjie.ren@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 27 Feb 2013 12:20:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHOFN5kqZTiEd/FTjyR1sPJXJqZr5iNml7w
  • Thread-topic: Xen mce bugfix

Jan Beulich wrote:
>>>> On 27.02.13 at 12:08, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 27.02.13 at 11:37, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>>> The reason the former patch clears MCi_ADDR/MISC is that this is
>>>> recommended by the Intel SDM:
>>>>            LOG MCA REGISTER:
>>>>            SAVE IA32_MCi_STATUS;
>>>>            If MISCV in IA32_MCi_STATUS
>>>>            THEN
>>>>                    SAVE IA32_MCi_MISC;
>>>>            FI;
>>>>            IF ADDRV in IA32_MCi_STATUS
>>>>            THEN
>>>>                    SAVE IA32_MCi_ADDR;
>>>>            FI;
>>>>            IF CLEAR_MC_BANK = TRUE
>>>>            THEN
>>>>                    SET all 0 to IA32_MCi_STATUS;
>>>>            If MISCV in IA32_MCi_STATUS
>>>>            THEN
>>>>                    SET all 0 to IA32_MCi_MISC;
>>>>            FI;
>>>>            IF ADDRV in IA32_MCi_STATUS
>>>>            THEN
>>>>                    SET all 0 to IA32_MCi_ADDR;
>>>>            FI;
>>>> 
>>>> For Xen mce, reading MCi_ADDR/MISC is meaningful only when a real
>>>> error occurs (as indicated by MCi_STATUS), so clearing only
>>>> MCi_STATUS in the mce handler is an acceptable workaround -- after
>>>> all, reading MCi_ADDR/MISC is pointless if MCi_STATUS is 0.
>>> 
>>> So then what - revert your original patch (and ignore the SDM)?
>>> I'm not in favor of this...
>> 
>> Not revert the entire 23327, but only use this patch to revert the
>> MCi_ADDR/MISC clear.
>> 
>> I also agree it's not good, but currently it seems we don't have a
>> simple and clean way to fix it, unless we spend much time to
>> update the xen-mceinj *tools* -- and even so, it's low-priority?
> 
> No, fixing the tool seems unnecessary for this problem, all we
> need is a way to avoid the problematic MSR writes when finishing
> an injected MCE. That's fully contained to the hypervisor.
> 
> Jan

The problem comes from the xen-mceinj tools simulating only *some* banks for
*some* CPUs (the intpose_arr array). An access sometimes hits the simulated value
and sometimes hits real hardware --> that inconsistent behaviour is the problem
and is what really needs fixing.
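
To make the mixing concrete, here is a minimal sketch in C. It is not the
actual Xen code: the table layout and the names intpose_tbl, intpose_add,
mca_rdmsr, mca_wrmsr, clear_bank, hw_msr and hw_writes are all made up for
illustration, and hw_msr merely stands in for real rdmsr/wrmsr. Only the MSR
numbers and status bit positions are the architectural ones from the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR numbers and status bits (Intel SDM). */
#define MSR_MC_STATUS(b) (0x401u + 4u * (b))
#define MSR_MC_ADDR(b)   (0x402u + 4u * (b))
#define MSR_MC_MISC(b)   (0x403u + 4u * (b))
#define MCi_STATUS_VAL   (1ULL << 63)
#define MCi_STATUS_MISCV (1ULL << 59)
#define MCi_STATUS_ADDRV (1ULL << 58)

/* Everything below is hypothetical and only models the behaviour. */
#define NR_CPUS    2
#define NR_BANKS   4
#define NR_INTPOSE 8

static uint64_t hw_msr[NR_CPUS][4 * NR_BANKS]; /* stand-in for real MSRs */
unsigned int hw_writes;                        /* writes that reached "hardware" */

struct intpose_ent {
    bool valid;
    unsigned int cpu;
    uint32_t msr;
    uint64_t val;
};
static struct intpose_ent intpose_tbl[NR_INTPOSE];

/* Find the simulated value for (cpu, msr), or NULL if none was injected. */
struct intpose_ent *intpose_lookup(unsigned int cpu, uint32_t msr)
{
    for (int i = 0; i < NR_INTPOSE; i++)
        if (intpose_tbl[i].valid && intpose_tbl[i].cpu == cpu &&
            intpose_tbl[i].msr == msr)
            return &intpose_tbl[i];
    return NULL;
}

/* Injection: simulate ONE MSR on ONE cpu. Returns false if the table is full. */
bool intpose_add(unsigned int cpu, uint32_t msr, uint64_t val)
{
    struct intpose_ent *e = intpose_lookup(cpu, msr);

    for (int i = 0; !e && i < NR_INTPOSE; i++)
        if (!intpose_tbl[i].valid)
            e = &intpose_tbl[i];
    if (!e)
        return false;
    *e = (struct intpose_ent){ .valid = true, .cpu = cpu, .msr = msr, .val = val };
    return true;
}

/* Read: simulated value if one exists, otherwise fall through to hardware. */
uint64_t mca_rdmsr(unsigned int cpu, uint32_t msr)
{
    struct intpose_ent *e = intpose_lookup(cpu, msr);

    assert(cpu < NR_CPUS && msr >= 0x400u && msr < 0x400u + 4u * NR_BANKS);
    return e ? e->val : hw_msr[cpu][msr - 0x400u];
}

/* Write: same split, so one handler can touch both worlds in a single pass. */
void mca_wrmsr(unsigned int cpu, uint32_t msr, uint64_t val)
{
    struct intpose_ent *e = intpose_lookup(cpu, msr);

    assert(cpu < NR_CPUS && msr >= 0x400u && msr < 0x400u + 4u * NR_BANKS);
    if (e) {
        e->val = val;
        return;
    }
    hw_msr[cpu][msr - 0x400u] = val;
    hw_writes++;
}

/*
 * Clear sequence in the spirit of the SDM pseudocode. Assumption: MISCV and
 * ADDRV are tested on the status value saved before MCi_STATUS is zeroed.
 */
void clear_bank(unsigned int cpu, unsigned int bank)
{
    uint64_t status = mca_rdmsr(cpu, MSR_MC_STATUS(bank));

    mca_wrmsr(cpu, MSR_MC_STATUS(bank), 0);
    if (status & MCi_STATUS_MISCV)
        mca_wrmsr(cpu, MSR_MC_MISC(bank), 0);
    if (status & MCi_STATUS_ADDRV)
        mca_wrmsr(cpu, MSR_MC_ADDR(bank), 0);
}
```

In this model, if only MCi_STATUS (with ADDRV set) is injected for cpu 0,
bank 1 and clear_bank(0, 1) then runs, the MCi_STATUS write lands in the
simulated entry while the MCi_ADDR write falls through to "hardware". That is
the kind of mixed access described above.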

Thanks,
Jinsong