
RE: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler



Jan Beulich wrote:
>>>> On 11.10.11 at 11:51, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>> If the prefetch was from Xen space (only in guest context),
>>> delivering a vMCE to the guest is pointless (and perhaps confusing
>>> to the guest). 
>>> 
>> 
>> Yes, exactly. How about deferring the handling as follows:
>> * in the MCE ISR:
>>      if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs) )
>>          xen panic;
>> * in the MCE softirq:
>>      if ( (SRAR error) && (EIPV == 0) && (broken page owned by
>>           hypervisor) ) xen panic;
> 
> Possible, but I'm not convinced.
> 
>>>>   * the guest may kill the application, a kernel thread, itself, or whatever is appropriate;
>>>> 
>>>> The error is still an error, with two possible outcomes later:
>>>>   1. It may never be consumed as an SRAR error and the system keeps
>>>>      going; a hardware mechanism (e.g. memory scrubbing) may detect
>>>>      it as an SRAO error at some point, and it is handled then.
>>>>   2. It may be consumed at some point, triggering an SRAR error
>>>>      again. Then: 1) if the SRAR occurs in hypervisor context, Xen
>>>>      panics; or 2) if the SRAR occurs in guest context, Xen kills
>>>>      the guest as a malicious one (as the 2nd patch does) and moves
>>>>      the page to the broken page list.
>>>> 
>>>> Given how unlikely the above case is, I think it's acceptable to
>>>> handle it this way. Thoughts?
>>> 
>>> You're only discussing instruction fetches (which can be discarded),
>>> but you're not covering the other example I gave (GDT access from
>>> guest context: just as this is a ring-0 operation from the paging
>>> unit's point of view, it ought to be an out-of-context operation
>>> from MCE's perspective).
>> 
>> That would be a data load error (EIPV=1), i.e. a synchronous error.
> 
> If indeed implemented that way in hardware, that would make the
> handling ambiguous: A GDT access must not (unconditionally) be
> attributed to the (pv) guest, as it is not a problem the guest can
> (necessarily) deal with (considering the split page ownership of
> what constitutes the GDT under Xen, the guest should only be
> accountable for the non-reserved part of the GDT, the rest should
> be attributed back to Xen).
> 
> The same would go for (perhaps speculative) page table walks.
> 

It does not seem ambiguous here: whoever owns the page takes the consequences.
If the error is caused by a hypervisor access to the broken page, Xen panics;
if it is caused by a guest access, the guest handles it (normally, I would guess,
by killing itself);
if the guest maliciously accesses the page again, the hypervisor kills it.
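The ownership rule above, combined with the ISR/softirq split proposed earlier in the thread, can be sketched as a small decision function. This is a minimal C sketch, not Xen code: the enum names, `mce_isr_check()` and `mce_softirq_srar()` are hypothetical, and the page owner and the "already on the broken page list" flag are assumed to be supplied by the caller. Only the RIPV bit position (bit 0 of IA32_MCG_STATUS) comes from the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS.RIPV (bit 0): the interrupted context can be restarted. */
#define MCG_STATUS_RIPV (1ULL << 0)

/* Hypothetical: who owns the page that contains the poisoned memory. */
enum page_owner { OWNER_XEN, OWNER_GUEST };

/* Hypothetical outcomes of the policy discussed in the thread. */
enum mce_action {
    ACT_DEFER,        /* ISR: nothing fatal yet, finish in the softirq */
    ACT_XEN_PANIC,    /* the hypervisor owns/consumed the bad page */
    ACT_INJECT_VMCE,  /* first guest access: let the guest handle it */
    ACT_KILL_GUEST    /* guest touched a page already marked broken */
};

/* Stage 1, MCE ISR: RIPV clear means the interrupted context cannot be
 * safely restarted; if that context was Xen (not guest mode), panic. */
static enum mce_action mce_isr_check(uint64_t gstatus, bool in_guest_mode)
{
    if (!(gstatus & MCG_STATUS_RIPV) && !in_guest_mode)
        return ACT_XEN_PANIC;
    return ACT_DEFER;
}

/* Stage 2, MCE softirq, for an SRAR error: "who owns, who takes". */
static enum mce_action mce_softirq_srar(enum page_owner owner,
                                        bool page_already_broken)
{
    /* Covers the proposed (SRAR && EIPV == 0 && Xen-owned) check and, by
     * the ownership rule, Xen-owned pages with EIPV == 1 as well. */
    if (owner == OWNER_XEN)
        return ACT_XEN_PANIC;
    /* A repeated access to a page already on the broken list is
     * treated as malicious. */
    if (page_already_broken)
        return ACT_KILL_GUEST;
    /* Otherwise deliver a vMCE; the guest may kill an application,
     * a kernel thread, or itself. */
    return ACT_INJECT_VMCE;
}
```

For example, `mce_softirq_srar(OWNER_GUEST, true)` yields `ACT_KILL_GUEST`, matching the "maliciously accesses the page again" case.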

> Furthermore, data prefetching is possible too - how would a problem
> there get reported?
> 

It may be reported as an unknown error, or not at all, but not as an SRAR data
load error with EIPV=1.
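To make that distinction concrete: whether an error counts as SRAR is decided by the UC, PCC, S and AR bits of IA32_MCi_STATUS (as defined in the Intel SDM's machine-check chapter for processors advertising MCG_CAP.SER_P), while EIPV in IA32_MCG_STATUS says whether the saved IP points at the instruction that caused the error. The sketch below is illustrative only; `mce_classify()`, `mce_is_srar_with_eipv()` and the enum names are hypothetical, not Xen's.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL  (1ULL << 63) /* register contents valid */
#define MCI_STATUS_UC   (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_PCC  (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S    (1ULL << 56) /* signaled via #MC (SER_P only) */
#define MCI_STATUS_AR   (1ULL << 55) /* action required (SER_P only) */

/* IA32_MCG_STATUS bit. */
#define MCG_STATUS_EIPV (1ULL << 1)  /* saved IP points at the error site */

/* Hypothetical classification names. */
enum mce_class {
    MCE_NOT_VALID,
    MCE_CORRECTED,
    MCE_FATAL,      /* PCC=1: context corrupt, not recoverable */
    MCE_UCNA,       /* UC=1, S=0, AR=0: not signaled via #MC */
    MCE_SRAO,       /* UC=1, S=1, AR=0: e.g. found by memory scrubbing */
    MCE_SRAR,       /* UC=1, S=1, AR=1: consumed, action required */
    MCE_RESERVED    /* S=0, AR=1: not a defined combination */
};

static enum mce_class mce_classify(uint64_t status)
{
    bool s  = (status & MCI_STATUS_S) != 0;
    bool ar = (status & MCI_STATUS_AR) != 0;

    if (!(status & MCI_STATUS_VAL))
        return MCE_NOT_VALID;
    if (!(status & MCI_STATUS_UC))
        return MCE_CORRECTED;
    if (status & MCI_STATUS_PCC)
        return MCE_FATAL;
    if (s)
        return ar ? MCE_SRAR : MCE_SRAO;
    return ar ? MCE_RESERVED : MCE_UCNA;
}

/* SRAR with a valid error IP: the synchronous case the thread treats as
 * attributable to the context executing at that IP. */
static bool mce_is_srar_with_eipv(uint64_t status, uint64_t gstatus)
{
    return mce_classify(status) == MCE_SRAR && (gstatus & MCG_STATUS_EIPV);
}
```

With these definitions, a status of `MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_S | MCI_STATUS_AR` classifies as `MCE_SRAR`, while clearing AR gives `MCE_SRAO`.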

Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
