[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler



Jan Beulich wrote:
>>>> On 30.09.11 at 04:51, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
>>>> wrote: 
> 
>> 
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>>> Sent: Thursday, September 29, 2011 11:42 PM
>>> To: Liu, Jinsong
>>> Cc: Keir Fraser; Jiang, Yunhong; xen-devel@xxxxxxxxxxxxxxxxxxx
>>> Subject: Re: [Xen-devel] [PATCH] X86 MCE: Add SRAR handler
>>> 
>>>>>> On 29.09.11 at 17:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>> wrote: 
>>>> @@ -782,8 +821,12 @@ static void intel_default_mce_uhandler(
>>>> 
>>>>      switch (type)
>>>>      {
>>>> -    /* Panic if no handler for SRAR error */
>>>>      case intel_mce_ucr_srar:
>>>> +        if ( !guest_mode(regs) )
>>>> +            *result = MCER_RESET;
>>>> +        else
>>>> +            *result = MCER_CONTINUE;
>>>> +        break;
>>>>      case intel_mce_fatal:
>>>>          *result = MCER_RESET;
>>>>          break;
>>> 
>>> Using the stack based registers for any decision in an MCE handler
>>> seems bogus to me - without knowing that the error occurred
>>> synchronously, the result is meaningless. Unfortunately I wasn't
>> 
>> I think the usage of stack in MCE handler should be case by case.
>> For example, it's ok to use it at data load instruction since the
>> EIPV is valid for it.
> 
> According to the table in the manual, this is only the case on the
> local thread.
> 

OK, we can remove srar check at mce isr 'uhandler'.
so at mce isr, the check
        if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs)) 
                return -1;
is a total insurance for the error which triggered at hypervisor while cannot 
restart from RIP.


>> For the instruction load, not that sure and I will check it
>> internally. 
>> 
>> But agree that we should not do this depends on the error type (like
>> SRAR), but should depends on the specific error code.
>> 
>>> able to spot - throughout your patch - what SRAR actually stands
>>> for, and the manual is no help in that respect either. It does
>>> state, however, that EIPV in three of four cases would be clear for
>>> these, so using the registers on stack is likely wrong here.
>>> 
>>> This made me look at the current source, and there I see in
>>> mce_urgent_action() 
>>> 
>>>     if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs))        
>>> return -1; 
>>> 
>>> which I think should say ... _EIPV and use || instead. Thoughts?
>> 
>> I think this code means, if the error happens in hypervisor mode
>> (i.e. !guest_mode()), and RIPV indicate the RIP in stack can't be
>> restarted, we have to panic.
> 
> Then the guest_mode() check still lacks an extra check of EIPV, like
> 
>      if ( !(gstatus & MCG_STATUS_RIPV) &&
>           (!(gstatus & MCG_STATUS_EIPV) || !guest_mode(regs)))
>          return -1;
> 

That would be overkilled.
Considering instruction fetch error occur at guest context, hypervisor deliver 
to guest to handle the error is perfer, not panic all system.

Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.