[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check

>>> On 06.05.13 at 10:54, Christoph Egger <chegger@xxxxxxxxx> wrote:
> On 03.05.13 17:51, Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>>>> wrote: 
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>>>>>> wrote:
>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>>>>> 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx>
>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>> is_vmce_ready check 
>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>>>>> option: (XEN) MCE: This error page is ownded by DOM 0 (XEN) DOM0
>>>>>>>> not ready for vMCE (XEN) domain_crash called from mcaction.c:133
>>>>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>>> (which has been done at mce_softirq() before send global virq to
>>>>>>>> dom0). 
>>>>>>> Following the argumentation above, I wonder which of the other
>>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>>>>> shouldn't be extended (at least for the Dom0 case).
>>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>>>>> quite clear your point)?
>>>>> Yes.
>>>>>> Would you please point out which point you think not appropriate?
>>>>> I question whether it is correct/necessary to crash the domain in
>>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>>>>> appear to be valid reasons once the is_vmce_ready() path is being
>>>>> dropped.
>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still
>>>> set when next vMCE# occur, means the 2nd vMCE# occur when the 1st
>>>> vMCE# not handled yet. Per SDM it should shutdown.
>>>> For inject_vmce(), it failed when
>>>> 1). vcpu is still mce_pending, or
>>>> 2). pv not register trap callback
>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>> kill them), but any graceful way to quit?
>>> Just exit and do nothing (except perhaps log a rate limited
>>> message)?
>> One concern of quiet exit is, the error will be totally ignored by guest --> 
>> it 
> didn't get preperly handled, and may recursively occur to make worse error 
> --> 
> it's better to kill guest under such case.
>>>> or, considering it rarely happens, how about keep current way (kill
>>>> guest no matter dom0 or not)?
>>> Possibly - I was merely asking why this one condition was found to
>>> be too strict, while the others are being left as is.
>>> Jan
>> Ah, the reason of removing is_vmce_ready check is, it's
>> problematic (check mcelog driver, not vmce tap callback),
>> and overkilled (since defaultly dom0 will not start mcelog driver,
>> under which case system will crash whenever vmce inject to dom0)
>> --> So patch 2/2 is not too strict for dom0.
> Please keep in mind the mcelog userland/kernel interface is not designed
> with xen in mind. mcelog cannot report which guest is impacted for
> example, although xen reports that to dom0.
> I object 'fixing' the hypervisor to come over with mcelog drawbacks.
> I prefer fixing Dom0 instead.
> From the design perspective, the virq for Dom0 is for logging purpose
> only and the trap handler has equal purpose for both Dom0 and DomU.

So as this doesn't read like "don't care" - is this an ack, nak, or
a request to Jinsong to change something for the patch to be


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.