[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check



On 16.05.13 07:53, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>> wrote: Christoph Egger wrote: On 13.05.13 17:21, Liu, Jinsong
>>>>> wrote: Christoph Egger wrote: 
>>>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>>>> Christoph Egger wrote:
>>>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>>>>
>>>>>>>>>>>     * For dom0, if really need check, it should check whether
>>>>>>>>>>>     vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>>     check, which has been done at inject_vmce()), not check
>>>>>>>>>>>     dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>>
>>>>>>>>>>> Which means before hypervisor send error log via virq to
>>>>>>>>>>> dom0, current code has checked whether mcelog ready at dom0
>>>>>>>>>>> or not --> that's the right place for your concern, and it
>>>>>>>>>>> has indeed done check.
>>>>>>>>>>
>>>>>>>>>> I think, I do not understand the patch description.
>>>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>>>>
>>>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>>>> to provide the machine check logging service.
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> but does not check
>>>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>>>> This patch fixes this. 
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>>>> your concern at this point?
>>>>>>>>
>>>>>>>> Yes, this is exactly my question. You got it.
>>>>>>>>
>>>>>>>> Christoph
>>>>>>>
>>>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>>>> handler is actually ready to take the errors'. Could you
>>>>>>> elaborate more your question?
>>>>>>
>>>>>> I reread your patch description:
>>>>>>
>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>>   ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>>                                  (which
>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
>>>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>>>
>>>>>> My question is: Is it possible when mcelog driver registers
>>>>>> the virq handler that it cannot deal with machine check errors
>>>>>> immediately? 
>>>>>>
>>>>>> Christoph
>>>>>
>>>>> Yes, there is a small time window when dom0 mcelog driver init:
>>>>>
>>>>>   The last step of xen_late_init_mcelog()
>>>>>     --> bind_virq_for_mce
>>>>>       --> bind_virq_to_irqhandler
>>>>>
>>>>>         irq = bind_virq_to_irq(virq, cpu);
>>>>>         if (irq < 0)
>>>>>                 return irq;
>>>>>         retval = request_irq(irq, handler, irqflags, devname,
>>>>> dev_id); 
>>>>>
>>>>> Time window: between bind_virq_to_irq and request_irq
>>>>>
>>>>> If hypervisor inject virq to notify error log to dom0 exactly
>>>>> during this init time window, dom0 no-ops handler will not fetch
>>>>> error log. However, it's OK since error log in hypervisor is still
>>>>> there, until next time when hypervisor inject virq again, dom0
>>>>> mcelog driver will fetch them. Considering it occurs rarely (and
>>>>> no harm), I think it's OK. 
>>>>>
>>>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>>>> is_vmce_ready() check since it's problematic (wrong check),
>>>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>>>> to host MCE any more) and redundant (both vmce trap callback and
>>>>> mcelog virq has been checked in other hypervisor code points).
>>>>>
>>>>> I agree the description is not clear at some points, so I will
>>>>> re-write the description later.
>>>>
>>>> Thanks. From reading the new description
>>>> I think, I got it now why this check is wrong:
>>>>
>>>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>>>> installed an virq handler for logging and in case dom0 does no
>>>> logging dom0 is crashed instead. 
>>>>
>>>> Assuming above is right then:
>>>> the code path in mcaction.c not only runs when a vmce is injected
>>>> into dom0, it also runs when a vmce is injected into a domU.
>>>> So that means when dom0 does no logging then it is crashed whenever
>>>> a vMCE is injected into any guest. OUCH!
>>>>
>>>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>>>> ack. 
>>>>
>>>> Christoph
>>>
>>> Jan, any more concern?
>>
>> No, I was fine with your earlier explanation.
>>
>> Christoph - I wasn't sure whether your above statement implied
>> Jinsong needing to do further changes, or whether you really just
>> referred to the initial discussion Jinsong and I had which got
>> already settled. Please clarify.
>>
>> Jan
> 
> I think Christoph's statement agrees that we need patch 2/2, so ack patch 2/2 
> as far as you have no more concern about 'goto vmce_failed'.
> Christoph, am I right?
> 

Yes, right.

Christoph


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.