[Xen-devel] RE: [PATCH] Unmmap guest's EPT mapping for poison me

To:	Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Subject:	[Xen-devel] RE: [PATCH] Unmmap guest's EPT mapping for poison memory
From:	"Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date:	Wed, 14 Jul 2010 21:56:28 +0800
Accept-language:	en-US
Cc:	xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>
Delivery-date:	Wed, 14 Jul 2010 06:57:41 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20100714092808.GB13291@xxxxxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<789F9655DD1B8F43B48D77C5D30659731F571361@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20100714092808.GB13291@xxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcsjNvNJG9wGWOzoSHmbKgF2xhFpwQAI2QMw
Thread-topic:	[PATCH] Unmmap guest's EPT mapping for poison memory

>-----Original Message-----
>From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxx]
>Sent: Wednesday, July 14, 2010 5:28 PM
>To: Jiang, Yunhong
>Cc: Keir Fraser; xen-devel
>Subject: Re: [PATCH] Unmmap guest's EPT mapping for poison memory
>
>Hi,
>
>At 08:41 +0100 on 14 Jul (1279096872), Jiang, Yunhong wrote:
>> diff -r bf51b671f269 xen/arch/x86/cpu/mcheck/vmce.c
>> --- a/xen/arch/x86/cpu/mcheck/vmce.c Mon Jul 12 13:59:39 2010 +0800
>> +++ b/xen/arch/x86/cpu/mcheck/vmce.c Mon Jul 12 14:30:21 2010 +0800
>> @@ -558,3 +558,28 @@ int is_vmce_ready(struct mcinfo_bank *ba
>>
>>      return 0;
>>  }
>> +
>> +/* Now we only have support for HAP guest */
>> +int unmmap_broken_page(struct domain *d, unsigned long mfn, unsigned long gfn)
>> +{
>> +    /* Always trust dom0 */
>> +    if ( d == dom0 )
>> +        return 0;
>> +
>> +    if (is_hvm_domain(d) && (paging_mode_hap(d)) )
>> +    {
>> +        p2m_type_t pt;
>> +
>> +        gfn_to_mfn_query(d, gfn, &pt);
>> +        /* What will happen if is paging-in? */
>> +        if ( pt == p2m_ram_rw )
>
>Or any of the other types?  This should be called for ram_ro, and
>ram_logdirty certainly, and probably mmio_direct too.

Yes, we need to consider rw/ro/logdirty. Thanks for the reminder, I will fix it.
But why should we cover mmio_direct? Can you please give some hints?

ram_shared deserves more consideration: it seems the shared memory case is
currently not handled anywhere in the offline page flow.
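
As a rough, standalone illustration of the widened type check discussed above
(not the actual patch): the enum below is a hypothetical stand-in for Xen's
p2m_type_t that lists only the types named in this thread, and the helper name
is an assumption made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Xen's p2m_type_t. Only the types mentioned in
 * this thread are modelled; the numeric values are illustrative. */
typedef enum {
    p2m_invalid,
    p2m_ram_rw,
    p2m_ram_logdirty,
    p2m_ram_ro,
    p2m_mmio_direct,
    p2m_ram_shared
} sketch_p2m_type_t;

/* Hypothetical helper: returns true for the RAM types that the discussion
 * agrees should be handled when a page is found to be broken. */
static bool sketch_type_needs_unmap(sketch_p2m_type_t pt)
{
    switch ( pt )
    {
    case p2m_ram_rw:
    case p2m_ram_ro:
    case p2m_ram_logdirty:
        return true;
    default:
        /* mmio_direct and ram_shared are still open questions here. */
        return false;
    }
}
```

With a helper of this shape, the single comparison against p2m_ram_rw would
become one call, and the open cases stay visible in the default branch.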

>
>I'm not sure that it's safe to nobble other types - e.g. changing from
>grant_map_*, paging_* or ram_shared might break state-machines/refcounts
>elsewhere.

I think this code does not change anything for the refcounts; we simply destroy
the guest.
Or do you mean a race can happen when other components are also changing the
p2m table? I assume that should be OK, since we only query the type and destroy
the guest. Did I miss anything?

>
>Actually wouldn't it be better to encode brokenness in the frametable

Encoding brokenness in the frametable is done already. But that is only a mark,
which ensures the page will not be allocated anymore. If the page is being used
by a guest, we also need to unmap it from the guest, so that the guest can't
access the memory anymore.
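
The difference between the mark and the unmap can be shown with a toy model.
This is a sketch only: the struct and function names are hypothetical and do
not correspond to Xen's frametable or p2m code.

```c
#include <stdbool.h>

/* Toy model of one page. Both fields are assumptions for illustration. */
struct sketch_page {
    bool broken;        /* models the "broken" mark in the frametable */
    bool guest_mapped;  /* models an existing guest mapping of the page */
};

/* Allocator side: a page marked broken is never handed out again. */
static bool sketch_can_allocate(const struct sketch_page *pg)
{
    return !pg->broken;
}

/* Guest side: the mark alone does not remove an existing mapping. */
static bool sketch_guest_can_access(const struct sketch_page *pg)
{
    return pg->guest_mapped;
}

/* Offlining a page that is in use needs both steps. */
static void sketch_offline_page(struct sketch_page *pg)
{
    pg->broken = true;         /* step 1: mark, stops future allocation */
    pg->guest_mapped = false;  /* step 2: unmap, stops guest access */
}
```

In this model, setting only the broken flag leaves sketch_guest_can_access()
returning true, which is the gap the unmap step is meant to close.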

The background here is: on some platforms, the system can find poisoned memory
through mechanisms like memory scrubbing or explicit L3 cache write-back (i.e.
async memory checking, not in the current context). However, whenever the
poisoned memory is accessed, it will cause a fatal MCE and a system crash. So
we need to make sure the guest can't access the broken memory.

>instead of the P2M and then forbid new mappings of broken MFNs?  It's
>not really a property of the PFN (wasn't there a patch series a while
>ago that swapped broken MFNs under a VM's feet?).

Swapping a broken mfn is in fact for when the page is likely to become broken
(i.e. the page can still be accessed). For example, for a page with ECC
support, when too many corrected errors (i.e. 1-bit errors) happen on a page,
we assume the page is fragile and may have an uncorrectable error (a two-bit
error) in future, and swapping it with a new page will keep things running.
However, if the page is broken already, we can't access the page anymore
(doing so usually causes an MCE). In that situation we can't swap the page,
only unmap it.
Hope this makes things clear.
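
The swap-versus-unmap distinction above can be written down as a small
decision function. This is only a sketch under stated assumptions: the type
names, the function name and the threshold value are all hypothetical and are
not taken from Xen.

```c
/* Hypothetical model of the decision, not Xen code. */
enum sketch_page_action { SKETCH_KEEP, SKETCH_SWAP, SKETCH_UNMAP };

struct sketch_page_errors {
    unsigned int corrected;  /* count of corrected (1-bit) ECC errors */
    int uncorrectable;       /* non-zero once the page is already broken */
};

/* Illustrative threshold only; a real policy would be platform specific. */
#define SKETCH_CORRECTED_LIMIT 16u

static enum sketch_page_action
sketch_choose_action(const struct sketch_page_errors *e)
{
    /* Already broken: reading the page to copy it would itself raise an
     * MCE, so the only safe action is to unmap it. */
    if ( e->uncorrectable )
        return SKETCH_UNMAP;

    /* Fragile but still readable: copy to a new page and swap. */
    if ( e->corrected > SKETCH_CORRECTED_LIMIT )
        return SKETCH_SWAP;

    return SKETCH_KEEP;
}
```

The uncorrectable case is checked first on purpose, so a page that is already
broken is never routed to the swap path regardless of its corrected count.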

Thanks
--jyh

>
>Cheers,
>
>Tim.
>
>--
>Tim Deegan <Tim.Deegan@xxxxxxxxxx>
>Principal Software Engineer, XenServer Engineering
>Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
