[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check



>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Christoph Egger wrote:
>> On 13.05.13 17:21, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>> Christoph Egger wrote:
>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>> 
>>>>>>>>>     * For dom0, if really need check, it should check whether
>>>>>>>>>     vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>     check, which has been done at inject_vmce()), not check
>>>>>>>>>     dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>> before send global virq to dom0). 
>>>>>>>>> 
>>>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>>>> current code has checked whether mcelog ready at dom0 or not
>>>>>>>>> --> that's the right place for your concern, and it has indeed
>>>>>>>>> done check.
>>>>>>>> 
>>>>>>>> I think, I do not understand the patch description.
>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>> 
>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>> to provide the machine check logging service.
>>>>>>> 
>>>>>>> Yes.
>>>>>>> 
>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>> 
>>>>>>> Yes.
>>>>>>> 
>>>>>>>> but does not check
>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>> This patch fixes this.
>>>>>>>> 
>>>>>>> 
>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>> your concern at this point?
>>>>>> 
>>>>>> Yes, this is exactly my question. You got it.
>>>>>> 
>>>>>> Christoph
>>>>> 
>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>> handler is actually ready to take the errors'. Could you elaborate
>>>>> more your question?
>>>> 
>>>> I reread your patch description:
>>>> 
>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>   ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>>>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>> 
>>>> My question is: Is it possible when mcelog driver registers
>>>> the virq handler that it cannot deal with machine check errors
>>>> immediately? 
>>>> 
>>>> Christoph
>>> 
>>> Yes, there is a small time window when dom0 mcelog driver init:
>>> 
>>>   The last step of xen_late_init_mcelog()
>>>     --> bind_virq_for_mce
>>>       --> bind_virq_to_irqhandler
>>> 
>>>         irq = bind_virq_to_irq(virq, cpu);
>>>         if (irq < 0)
>>>                 return irq;
>>>         retval = request_irq(irq, handler, irqflags, devname,
>>> dev_id); 
>>> 
>>> Time window: between bind_virq_to_irq and request_irq
>>> 
>>> If hypervisor inject virq to notify error log to dom0 exactly during
>>> this init time window, dom0 no-ops handler will not fetch error log.
>>> However, it's OK since error log in hypervisor is still there, until
>>> next time when hypervisor inject virq again, dom0 mcelog driver will
>>> fetch them. Considering it occurs rarely (and no harm), I think it's
>>> OK. 
>>> 
>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>> is_vmce_ready() check since it's problematic (wrong check),
>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>> to host MCE any more) and redundant (both vmce trap callback and
>>> mcelog virq has been checked in other hypervisor code points).
>>> 
>>> I agree the description is not clear at some points, so I will
>>> re-write the description later.
>> 
>> Thanks. From reading the new description
>> I think, I got it now why this check is wrong:
>> 
>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>> installed an virq handler for logging and in case dom0 does no logging
>> dom0 is crashed instead.
>> 
>> Assuming above is right then:
>> the code path in mcaction.c not only runs when a vmce is injected into
>> dom0, it also runs when a vmce is injected into a domU.
>> So that means when dom0 does no logging then it is crashed whenever
>> a vMCE is injected into any guest. OUCH!
>> 
>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>> ack. 
>> 
>> Christoph
> 
> Jan, any more concern?

No, I was fine with your earlier explanation.

Christoph - I wasn't sure whether your above statement implied
Jinsong needing to do further changes, or whether you really just
referred to the initial discussion Jinsong and I had which got
already settled. Please clarify.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.