
RE: [Xen-devel] Re: [Patch 1/3]RAS(Part II)--Intel MCA enabling in XEN



Christoph Egger <mailto:Christoph.Egger@xxxxxxx> wrote:
> On Friday 20 March 2009 10:23:58 Jiang, Yunhong wrote:
>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
>>> Seconded.
>>> 
>>> With Wei Huang's 1GB page support patch, one question comes to my mind
>>> regarding the page offline support / recovery action:
>>> 
>>> How does the interface deal with different page sizes?
>>> IMO we should only offline the smallest possible unit of the
>>> page in error,
>>> which is 4KB. So larger pages (2MB, 1GB) must be split in that case.
>>> 
>>> Christoph
>> 
>> Christoph, I think these are two separate issues. How to offline a page is a
>> specific recovery action, not part of the MCA handling itself. Or did I miss
>> anything?
> 
> You're right here.
> 
>> For guest owned page offline caused by #MC, Xen can't do anything but mark
>> it (the page frame, 4k) pending, so that when it is freed, it will not be
>> accessed anymore.
> 
> Once Xen supports page-relocation, it can re-map the guest
> page in error
> with a new one on the fly. For HVM guests w/o PV drivers, this should work.
> Need some thinking how that should work for PV guests and HVM guests with
> PV drivers.

If it is triggered by a #MC, normally it means the hardware can't recover from
the error, and the page is already broken. In that situation, Xen can't do
anything transparently to the guest. We have to have the guest offline the page,
or kill the domain.
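
As a rough illustration of the "mark it pending" idea mentioned earlier in this
thread, a broken frame that a guest still owns could be tracked as below. This
is only a sketch: the type and function names (frame_state, frame_mark_broken,
frame_release) are made up for this mail and are not existing Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-frame state; Xen's real page_info layout is different. */
enum frame_state {
    FRAME_FREE,            /* in the free pool, may be handed out */
    FRAME_IN_USE,          /* owned by a guest */
    FRAME_OFFLINE_PENDING, /* broken, but the guest has not released it yet */
    FRAME_OFFLINED         /* broken and retired, never handed out again */
};

struct frame {
    enum frame_state state;
};

/* Called when a #MC reports this 4k frame as broken. */
static void frame_mark_broken(struct frame *f)
{
    if (f->state == FRAME_FREE)
        f->state = FRAME_OFFLINED;        /* nobody owns it: retire it now */
    else if (f->state == FRAME_IN_USE)
        f->state = FRAME_OFFLINE_PENDING; /* retire it once the owner frees it */
    /* Frames already pending or offlined keep their state. */
}

/* Called when the owner releases the frame.
 * Returns true if the frame went back to the free pool. */
static bool frame_release(struct frame *f)
{
    assert(f->state == FRAME_IN_USE || f->state == FRAME_OFFLINE_PENDING);

    if (f->state == FRAME_OFFLINE_PENDING) {
        f->state = FRAME_OFFLINED;
        return false;
    }
    f->state = FRAME_FREE;
    return true;
}
```

The point is only that a flagged frame never returns to the free pool; making
sure the guest stops touching it is still up to the recovery action.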

For a CE (i.e. a recoverable page error), since the offending page's content
is still intact, we can certainly do that. How to do it may depend on the
design. In fact, the mail thread "RE: [Xen-devel] Re: [PATCH] Support swap
a page from user space tools -- Was RE: [RFC][PATCH] Basic support for page
offline" aims to achieve this. That patch set is for PV, in fact.

For HVM, it is not so easy without stubdomain support, since QEMU will map the
page, and changing that mapping will be much more complex.
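
To summarize the cases above, the choice of recovery action could be sketched
roughly as follows. All names are hypothetical and none of this is existing Xen
code. The HVM branch for a corrected error is an assumption: the text above
only says that HVM is hard without stubdomain support.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in Xen. */
enum mem_error  { MEM_ERR_CORRECTED, MEM_ERR_UNCORRECTED };
enum guest_kind { GUEST_PV, GUEST_HVM };

enum recover_action {
    ACT_SWAP_PAGE,   /* content still valid: move it to a healthy frame */
    ACT_INJECT_VMC,  /* tell the guest so it can offline the page itself */
    ACT_KILL_DOMAIN, /* nothing safe left to do for this guest */
    ACT_DEFER        /* leave the page in place and decide later */
};

static enum recover_action
choose_action(enum mem_error err, enum guest_kind kind,
              bool guest_handles_vmc, bool has_stubdomain)
{
    assert(kind == GUEST_PV || kind == GUEST_HVM);

    if (err == MEM_ERR_UNCORRECTED) {
        /* Page content is already lost; Xen cannot hide this from the guest. */
        return guest_handles_vmc ? ACT_INJECT_VMC : ACT_KILL_DOMAIN;
    }

    /* Corrected error: the data is intact, so a transparent swap is possible. */
    if (kind == GUEST_PV)
        return ACT_SWAP_PAGE;

    /* Assumption: with a stubdomain the QEMU mapping can be updated;
     * without one, the swap is deferred. */
    return has_stubdomain ? ACT_SWAP_PAGE : ACT_DEFER;
}
```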

Thanks
Yunhong Jiang

> 
>> It depends on the actual recovery action to make sure
>> the page will not be accessed by the guest anymore. For example, inject a vMC
>> into the guest, so that the guest can offline the page itself, or kill the
>> domain directly in some situations.
>> 
>> Or did I miss anything?
>> 
>> Thanks
>> Yunhong Jiang
>> 
>>> On Friday 20 March 2009 09:51:24 Keir Fraser wrote:
>>>> These changes need an ack from Sun.
>>>> 
>>>>  -- Keir
>>>> 
>>>> On 20/03/2009 05:03, "Ke, Liping" <liping.ke@xxxxxxxxx> wrote:
>>>>> Hi, Keir
>>>>> 
>>>>> This patch is the basic MCA enabling support for Intel.
>>>>> 
>>>>> For implementation details, please refer to the discussion thread:
>>> 
>>> http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00509.html
>>> 
>>>>> Thanks & Regards,
>>>>> Criping
>>> 
>>> --
>>> ---to satisfy European Law for business letters:
>>> Advanced Micro Devices GmbH
>>> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
>>> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
>>> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
>>> Registergericht Muenchen, HRB Nr. 43632
>>> 
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel


 

