[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler



>>> On 30.09.11 at 04:51, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] 
>> Sent: Thursday, September 29, 2011 11:42 PM
>> To: Liu, Jinsong
>> Cc: Keir Fraser; Jiang, Yunhong; xen-devel@xxxxxxxxxxxxxxxxxxx 
>> Subject: Re: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler
>> 
>> >>> On 29.09.11 at 17:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> > @@ -782,8 +821,12 @@ static void intel_default_mce_uhandler(
>> >
>> >      switch (type)
>> >      {
>> > -    /* Panic if no handler for SRAR error */
>> >      case intel_mce_ucr_srar:
>> > +        if ( !guest_mode(regs) )
>> > +            *result = MCER_RESET;
>> > +        else
>> > +            *result = MCER_CONTINUE;
>> > +        break;
>> >      case intel_mce_fatal:
>> >          *result = MCER_RESET;
>> >          break;
>> 
>> Using the stack based registers for any decision in an MCE handler
>> seems bogus to me - without knowing that the error occurred
>> synchronously, the result is meaningless. Unfortunately I wasn't
> 
> I think the usage of stack in MCE handler should be case by case. 
> For example, it's ok to use it at data load instruction since the EIPV is
> valid for it.

According to the table in the manual, this is only the case on the local
thread.

> For the instruction load, not that sure and I will check it internally.
> 
> But agree that we should not do this depends on the error type (like SRAR),
> but should depends on the specific error code.
> 
>> able to spot - throughout your patch - what SRAR actually stands
>> for, and the manual is no help in that respect either. It does state,
>> however, that EIPV in three of four cases would be clear for these,
>> so using the registers on stack is likely wrong here.
>> 
>> This made me look at the current source, and there I see in
>> mce_urgent_action()
>> 
>>     if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs))
>>         return -1;
>> 
>> which I think should say ... _EIPV and use || instead. Thoughts?
> 
> I think this code means, if the error happens in hypervisor mode (i.e.
> !guest_mode()), and RIPV indicate the RIP in stack can't be restarted, we
> have to panic.

Then the guest_mode() check still lacks an extra check of EIPV, like

     if ( !(gstatus & MCG_STATUS_RIPV) &&
          (!(gstatus & MCG_STATUS_EIPV) || !guest_mode(regs)))
         return -1;

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.