
Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check



>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Jan Beulich wrote:
>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>>> wrote: 
>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>> 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx>
>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>> is_vmce_ready check 
>>>>> 
>>>>> is_vmce_ready() is problematic:
>>>>> * For dom0, it checks whether the virq is bound to the dom0 mcelog
>>>>> driver. If not, it crashes dom0. However, that is problematic and
>>>>> overkill, since mcelog, as a dom0 feature, can be enabled or
>>>>> disabled by a dom0 option:
>>>>> (XEN) MCE: This error page is ownded by DOM 0
>>>>> (XEN) DOM0 not ready for vMCE
>>>>> (XEN) domain_crash called from mcaction.c:133
>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>> 
>>>>> * For dom0, if a check is really needed, it should check whether
>>>>> vMCE injection for dom0 is ready (say, the exception trap bounce
>>>>> check, which is already done in inject_vmce()), not whether dom0
>>>>> mcelog is ready (which is already checked in mce_softirq() before
>>>>> the global virq is sent to dom0).
>>>> 
>>>> Following the argumentation above, I wonder which of the other
>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>> shouldn't be extended (at least for the Dom0 case).
>>> 
>>> You mean the other 'goto vmce_failed' cases are also not appropriate
>>> (I'm not quite clear on your point)?
>> 
>> Yes.
>> 
>>> Would you please point out which ones you think are not appropriate?
>> 
>> I question whether it is correct/necessary to crash the domain in
>> any of those failure cases. Perhaps when we fail to unmap the
>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>> appear to be valid reasons once the is_vmce_ready() path is being
>> dropped.
> 
> For fill_vmsr_data(), it fails only when the MCG_STATUS_MCIP bit is still
> set when the next vMCE# occurs, meaning the 2nd vMCE# occurs while the 1st
> vMCE# has not been handled yet. Per the SDM, it should shut down.
> 
> For inject_vmce(), it fails when
> 1). the vcpu is still mce_pending, or
> 2). a PV guest has not registered a trap callback.
> Maybe it's somewhat overkill for dom0 (for other guests, it's OK to kill
> them), but is there any graceful way to quit?

Just exit and do nothing (except perhaps log a rate limited
message)?
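
To make the alternative concrete, here is a minimal sketch of the policy
being discussed, not the actual Xen code. Everything in it is
hypothetical and for illustration only: struct vcpu_model, vmce_decide(),
struct ratelimit and ratelimit_ok() are made-up names, and the choice to
skip only for Dom0 (while still crashing on MCIP and for other guests) is
just one possible reading of the cases listed above. The only thing taken
from the architecture is that MCIP is bit 2 of MCG_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCG_STATUS.MCIP: "machine check in progress" (bit 2 of MCG_STATUS). */
#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical outcome of a vMCE delivery attempt. */
enum vmce_action { VMCE_INJECT, VMCE_SKIP, VMCE_CRASH };

/* Hypothetical, simplified stand-in for the per-vcpu state involved. */
struct vcpu_model {
    uint64_t mcg_status;            /* virtual MCG_STATUS */
    bool mce_pending;               /* an earlier vMCE is not yet delivered */
    bool trap_callback_registered;  /* PV guest registered its trap handler */
    bool is_dom0;
};

/*
 * Assumed policy: a second vMCE while MCIP is still set is fatal for any
 * domain; a failed injection is fatal for ordinary guests, but Dom0 just
 * skips the injection instead of being crashed.
 */
static enum vmce_action vmce_decide(const struct vcpu_model *v)
{
    if (v->mcg_status & MCG_STATUS_MCIP)
        return VMCE_CRASH;
    if (v->mce_pending || !v->trap_callback_registered)
        return v->is_dom0 ? VMCE_SKIP : VMCE_CRASH;
    return VMCE_INJECT;
}

/* Hypothetical rate limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    unsigned long window_start;
    unsigned long interval;
    unsigned int burst;
    unsigned int printed;
};

/* Returns true if a message may be logged at time `now` (in ticks). */
static bool ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;     /* start a new window */
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false;               /* budget for this window is used up */
    rl->printed++;
    return true;
}
```

A caller would log only when vmce_decide() returns VMCE_SKIP and
ratelimit_ok() permits it, and otherwise return silently.
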

> or, considering it rarely happens, how about keeping the current behavior
> (kill the guest, whether it is dom0 or not)?

Possibly - I was merely asking why this one condition was found to
be too strict, while the others are being left as is.

Jan




 

