[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check


  • To: Christoph Egger <chegger@xxxxxxxxx>
  • From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
  • Date: Tue, 14 May 2013 15:29:18 +0000
  • Accept-language: en-US
  • Cc: Jan Beulich <JBeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Tue, 14 May 2013 15:29:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHOUIrDGsdbS84pvUuls/cWiRHQfJkEziRw
  • Thread-topic: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check

Christoph Egger wrote:
> On 13.05.13 17:21, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>> Christoph Egger wrote:
>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>> 
>>>>>>>>     * For dom0, if really need check, it should check whether
>>>>>>>>     vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>     check, which has been done at inject_vmce()), not check
>>>>>>>>     dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>> before send global virq to dom0). 
>>>>>>>> 
>>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>>> current code has checked whether mcelog ready at dom0 or not
>>>>>>>> --> that's the right place for your concern, and it has indeed
>>>>>>>> done check.
>>>>>>> 
>>>>>>> I think, I do not understand the patch description.
>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>> 
>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>> to provide the machine check logging service.
>>>>>> 
>>>>>> Yes.
>>>>>> 
>>>>>>> Xen checks if a virq handler has been registered
>>>>>> 
>>>>>> Yes.
>>>>>> 
>>>>>>> but does not check
>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>> This patch fixes this.
>>>>>>> 
>>>>>> 
>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>> your concern at this point?
>>>>> 
>>>>> Yes, this is exactly my question. You got it.
>>>>> 
>>>>> Christoph
>>>> 
>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>> I don't know what you are asking by 'does not check if the dom0
>>>> handler is actually ready to take the errors'. Could you elaborate
>>>> more your question?
>>> 
>>> I reread your patch description:
>>> 
>>>> * For dom0, if really need check, it should check whether vMCE
>>>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>   ^^^^^^^^^^^^^^^^^^^^^^^^
>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> has been done at mce_softirq() before send global virq to dom0).
>>> 
>>> My question is: Is it possible when mcelog driver registers
>>> the virq handler that it cannot deal with machine check errors
>>> immediately? 
>>> 
>>> Christoph
>> 
>> Yes, there is a small time window when dom0 mcelog driver init:
>> 
>>   The last step of xen_late_init_mcelog()
>>     --> bind_virq_for_mce
>>       --> bind_virq_to_irqhandler
>> 
>>         irq = bind_virq_to_irq(virq, cpu);
>>         if (irq < 0)
>>                 return irq;
>>         retval = request_irq(irq, handler, irqflags, devname,
>> dev_id); 
>> 
>> Time window: between bind_virq_to_irq and request_irq
>> 
>> If hypervisor inject virq to notify error log to dom0 exactly during
>> this init time window, dom0 no-ops handler will not fetch error log.
>> However, it's OK since error log in hypervisor is still there, until
>> next time when hypervisor inject virq again, dom0 mcelog driver will
>> fetch them. Considering it occurs rarely (and no harm), I think it's
>> OK. 
>> 
>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>> is_vmce_ready() check since it's problematic (wrong check),
>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>> to host MCE any more) and redundant (both vmce trap callback and
>> mcelog virq has been checked in other hypervisor code points).
>> 
>> I agree the description is not clear at some points, so I will
>> re-write the description later.
> 
> Thanks. From reading the new description
> I think, I got it now why this check is wrong:
> 
> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
> installed an virq handler for logging and in case dom0 does no logging
> dom0 is crashed instead.
> 
> Assuming above is right then:
> the code path in mcaction.c not only runs when a vmce is injected into
> dom0, it also runs when a vmce is injected into a domU.
> So that means when dom0 does no logging then it is crashed whenever
> a vMCE is injected into any guest. OUCH!
> 
> Make Jan happy with his 'goto vmce_failed' concern and you have my
> ack. 
> 
> Christoph

Thanks ack!

Jan, any more concern?

Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.