[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check


  • To: Jan Beulich <JBeulich@xxxxxxxx>, Christoph Egger <chegger@xxxxxxxxx>
  • From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
  • Date: Thu, 16 May 2013 05:53:25 +0000
  • Accept-language: en-US
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Thu, 16 May 2013 05:54:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHOULixGsdbS84pvUuls/cWiRHQfJkHUCCg
  • Thread-topic: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check

Jan Beulich wrote:
>>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>> wrote: Christoph Egger wrote: On 13.05.13 17:21, Liu, Jinsong
>>>> wrote: Christoph Egger wrote: 
>>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>>> Christoph Egger wrote:
>>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>>> 
>>>>>>>>>>     * For dom0, if really need check, it should check whether
>>>>>>>>>>     vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>     check, which has been done at inject_vmce()), not check
>>>>>>>>>>     dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>> 
>>>>>>>>>> Which means before hypervisor send error log via virq to
>>>>>>>>>> dom0, current code has checked whether mcelog ready at dom0
>>>>>>>>>> or not --> that's the right place for your concern, and it
>>>>>>>>>> has indeed done check.
>>>>>>>>> 
>>>>>>>>> I think, I do not understand the patch description.
>>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>>> 
>>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>>> to provide the machine check logging service.
>>>>>>>> 
>>>>>>>> Yes.
>>>>>>>> 
>>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>>> 
>>>>>>>> Yes.
>>>>>>>> 
>>>>>>>>> but does not check
>>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>>> This patch fixes this. 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>>> your concern at this point?
>>>>>>> 
>>>>>>> Yes, this is exactly my question. You got it.
>>>>>>> 
>>>>>>> Christoph
>>>>>> 
>>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>>> handler is actually ready to take the errors'. Could you
>>>>>> elaborate more your question?
>>>>> 
>>>>> I reread your patch description:
>>>>> 
>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>   ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>                                  (which
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
>>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>> 
>>>>> My question is: Is it possible when mcelog driver registers
>>>>> the virq handler that it cannot deal with machine check errors
>>>>> immediately? 
>>>>> 
>>>>> Christoph
>>>> 
>>>> Yes, there is a small time window when dom0 mcelog driver init:
>>>> 
>>>>   The last step of xen_late_init_mcelog()
>>>>     --> bind_virq_for_mce
>>>>       --> bind_virq_to_irqhandler
>>>> 
>>>>         irq = bind_virq_to_irq(virq, cpu);
>>>>         if (irq < 0)
>>>>                 return irq;
>>>>         retval = request_irq(irq, handler, irqflags, devname,
>>>> dev_id); 
>>>> 
>>>> Time window: between bind_virq_to_irq and request_irq
>>>> 
>>>> If hypervisor inject virq to notify error log to dom0 exactly
>>>> during this init time window, dom0 no-ops handler will not fetch
>>>> error log. However, it's OK since error log in hypervisor is still
>>>> there, until next time when hypervisor inject virq again, dom0
>>>> mcelog driver will fetch them. Considering it occurs rarely (and
>>>> no harm), I think it's OK. 
>>>> 
>>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>>> is_vmce_ready() check since it's problematic (wrong check),
>>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>>> to host MCE any more) and redundant (both vmce trap callback and
>>>> mcelog virq has been checked in other hypervisor code points).
>>>> 
>>>> I agree the description is not clear at some points, so I will
>>>> re-write the description later.
>>> 
>>> Thanks. From reading the new description
>>> I think, I got it now why this check is wrong:
>>> 
>>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>>> installed an virq handler for logging and in case dom0 does no
>>> logging dom0 is crashed instead. 
>>> 
>>> Assuming above is right then:
>>> the code path in mcaction.c not only runs when a vmce is injected
>>> into dom0, it also runs when a vmce is injected into a domU.
>>> So that means when dom0 does no logging then it is crashed whenever
>>> a vMCE is injected into any guest. OUCH!
>>> 
>>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>>> ack. 
>>> 
>>> Christoph
>> 
>> Jan, any more concern?
> 
> No, I was fine with your earlier explanation.
> 
> Christoph - I wasn't sure whether your above statement implied
> Jinsong needing to do further changes, or whether you really just
> referred to the initial discussion Jinsong and I had which got
> already settled. Please clarify.
> 
> Jan

I think Christoph's statement agrees that we need patch 2/2, so ack patch 2/2 
as far as you have no more concern about 'goto vmce_failed'.
Christoph, am I right?

Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.