
Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
  • Date: Fri, 3 May 2013 14:16:46 +0000
  • Accept-language: en-US
  • Cc: Christoph Egger <chegger@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Fri, 03 May 2013 14:17:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHOR+ES+eOpm1T9R9a1HzHcYbeCX5jzfSCQ
  • Thread-topic: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check

Jan Beulich wrote:
>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>> wrote: 
>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>> 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx>
>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>> is_vmce_ready check 
>>>> 
>>>> is_vmce_ready() is problematic:
>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>> it results dom0 crash. However, it's problematic and overkilled
>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>> option: (XEN) MCE: This error page is ownded by DOM 0
>>>> (XEN) DOM0 not ready for vMCE
>>>> (XEN) domain_crash called from mcaction.c:133
>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>> 
>>>> * For dom0, if really need check, it should check whether vMCE
>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>> has been done at mce_softirq() before send global virq to dom0).
>>> 
>>> Following the argumentation above, I wonder which of the other
>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>> shouldn't be extended (at least for the Dom0 case).
>> 
>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>> quite clear your point)?
> 
> Yes.
> 
>> Would you please point out which point you think not appropriate?
> 
> I question whether it is correct/necessary to crash the domain in
> any of those failure cases. Perhaps when we fail to unmap the
> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
> appear to be valid reasons once the is_vmce_ready() path is being
> dropped.
> 
> Jan

fill_vmsr_data() fails only when the MCG_STATUS_MCIP bit is still set when the
next vMCE# occurs, i.e. the 2nd vMCE# arrives before the 1st vMCE# has been
handled. Per the SDM, that case should result in a shutdown.
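
To illustrate the MCIP condition, here is a rough sketch. It is not the actual
Xen code: struct vmce_state, fill_vmsr_sketch() and guest_handler_done() are
made-up names, and only the MCIP bit position (bit 2 of IA32_MCG_STATUS) is
taken from the SDM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MCIP ("machine check in progress") is bit 2 of IA32_MCG_STATUS. */
#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical, heavily simplified per-vCPU state (not a Xen structure). */
struct vmce_state {
    uint64_t mcg_status;
};

/*
 * Sketch of the "fill" step: refuse to record a new vMCE while the
 * previous one is still marked in progress.
 * Returns 0 on success, -1 if MCIP is still set.
 */
static int fill_vmsr_sketch(struct vmce_state *v, uint64_t new_status)
{
    assert(v != NULL);

    if (v->mcg_status & MCG_STATUS_MCIP)
        return -1;              /* 2nd vMCE# before the 1st was handled */

    v->mcg_status = new_status | MCG_STATUS_MCIP;
    return 0;
}

/* The guest's #MC handler clears MCIP once it has finished. */
static void guest_handler_done(struct vmce_state *v)
{
    assert(v != NULL);
    v->mcg_status &= ~MCG_STATUS_MCIP;
}
```

With this model, a second fill attempt fails until the guest handler has
cleared MCIP, which is the only failure path described above.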

inject_vmce() fails when
1). the vcpu is still mce_pending, or
2). a PV guest has not registered a trap callback.
Crashing may be somewhat overkill for dom0 (for other guests it's OK to kill
them), but is there any graceful way to quit?
Or, considering it rarely happens, how about keeping the current behavior
(kill the guest whether it is dom0 or not)?
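
The two inject_vmce() failure conditions can be sketched roughly as below.
This is only an illustration under my own assumptions: struct vcpu_sketch,
enum inject_result and inject_vmce_sketch() are hypothetical names, not the
real Xen interfaces. Distinct return codes are used only so that a caller
could tell the two cases apart if it wanted to treat dom0 differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes, one per failure condition. */
enum inject_result {
    INJECT_OK,
    INJECT_BUSY,        /* 1) vcpu still has a vMCE pending        */
    INJECT_NO_HANDLER   /* 2) PV guest registered no trap callback */
};

/* Hypothetical, simplified vcpu state (not a Xen structure). */
struct vcpu_sketch {
    bool is_pv;
    bool mce_pending;
    bool trap_callback_registered;
};

static enum inject_result inject_vmce_sketch(struct vcpu_sketch *v)
{
    assert(v != NULL);

    if (v->mce_pending)
        return INJECT_BUSY;

    if (v->is_pv && !v->trap_callback_registered)
        return INJECT_NO_HANDLER;

    v->mce_pending = true;      /* mark the vMCE as queued for delivery */
    return INJECT_OK;
}
```

Keeping the current behavior would mean crashing the domain on any result
other than INJECT_OK, dom0 included.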

Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

