
Re: [Xen-devel] [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs



On 10.02.14 08:41, Jan Beulich wrote:
>>>> On 07.02.14 at 22:27, Aravind Gopalakrishnan 
>>>> <aravind.gopalakrishnan@xxxxxxx>
> wrote:
>> On Fri, Feb 07, 2014 at 11:05:17AM +0000, Jan Beulich wrote:
>>>>>> On 07.02.14 at 01:32, Aravind Gopalakrishnan 
>>>>>> <aravind.gopalakrishnan@xxxxxxx> 
>> wrote:
>>>> -  case MSR_F10_MC4_MISC1: /* DRAM error type */
>>>> -          v->arch.vmce.bank[1].mci_misc = val; 
>>>> -          mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>>>> -          break;
>>>> -  case MSR_F10_MC4_MISC2: /* Link error type */
>>>> -  case MSR_F10_MC4_MISC3: /* L3 cache error type */
>>>> -          /* ignore write: we do not emulate link and l3 cache errors
>>>> -           * to the guest.
>>>> -           */
>>>> -          mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>>>> -          break;
>>>> -  default:
>>>> -          return 0;
>>>> -  }
>>>> +    /* If not present, #GP fault, else do nothing as we don't emulate */
>>>> +    if ( !amd_thresholding_reg_present(msr) )
>>>> +        return -1;
>>>
>>> The one thing I'm concerned about making this #GP in the guest is
>>> migration: With it being _newer_ CPUs implementing fewer of these
>>> MSRs, it would be impossible to migrate a guest from an older system
>>> to a newer one - a direction that (as long as the newer system
>>> provides all the hardware capabilities the older one has) is generally
>>> assumed to work. Bottom line - we're probably better off always
>>> dropping writes, and always returning zero for reads. Which will
>>> eliminate the need for amd_thresholding_reg_present().
>>>
>>
>> Before I go ahead and remove the function, few questions-
>>
>> Assuming there is a tool in the guest that accesses these MSRs,
>> wouldn't it be fair to expect that the tool keep in mind these MSRs
>> exist only in certain families?
>>
>> For example:
>> if there's a guest running on F10 that accesses 0xc000040a, that would
>> be fine. But once we migrate to a newer family, then the guest should
>> not even generate accesses to the MSR.
> 
> All correct, provided the family check and the MSR access aren't
> separated by a migration.
> 
>> Also, returning #GP to guests would mean keeping it consistent with HW
>> behavior. If we return zero for reads, (IMHO) it's not necessarily
>> correct information as the register does not even exist.. 
>>
>> Bare-metal cases will face same problems too.. but if a register doesn't
>> exist, then shouldn't OS/hypervisor just say so and let whoever
>> generated the access deal with it?
> 
> That's all valid argumentation as long as you leave migration out
> of the picture.

I agree with Jan. All of these arguments are valid from a hardware
perspective.

Apart from migration, there is another perspective you miss completely:
the vmce_amd_* functions (and the corresponding Intel functions) deal
with *virtual* MSRs, i.e. they decide what should happen to the guest
when the guest accesses them.

This has absolutely nothing to do with what the hardware does or does
not provide. The point is that the guest knows (or rather assumes) which
MSRs exist from the CPU family/model information it gets via CPUID. The
question is what should happen when the guest accesses these MSRs.

To get this right, the questions are:
What should the hypervisor do for recovery?
Does it make sense to make the guest aware of it?
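
To make the "virtual MSR" point concrete, here is a minimal standalone
sketch of the policy Jan suggested (always drop writes, always read as
zero), independent of what the host CPU implements. It is not the actual
patch: the sketch_* function names and the is_thresholding_msr() helper
are made up for illustration, and the MSR numbers are what I assume the
F10 thresholding registers to be. The return convention (1 = handled,
0 = not ours) follows the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering of the family 10h thresholding MSRs (illustrative). */
#define MSR_F10_MC4_MISC1 0xc0000408u
#define MSR_F10_MC4_MISC2 0xc0000409u
#define MSR_F10_MC4_MISC3 0xc000040au

/* Hypothetical helper: is this one of the virtual thresholding MSRs? */
static bool is_thresholding_msr(uint32_t msr)
{
    return msr >= MSR_F10_MC4_MISC1 && msr <= MSR_F10_MC4_MISC3;
}

/*
 * Guest read: always report zero, whatever the host hardware has.
 * Returns 1 if the MSR was handled here, 0 if it is not ours.
 */
int sketch_vmce_amd_rdmsr(uint32_t msr, uint64_t *val)
{
    if ( !is_thresholding_msr(msr) )
        return 0;

    *val = 0;
    return 1;
}

/*
 * Guest write: silently drop the value, never inject #GP.
 * Returns 1 if the MSR was handled here, 0 if it is not ours.
 */
int sketch_vmce_amd_wrmsr(uint32_t msr, uint64_t val)
{
    (void)val; /* nothing is emulated, so the value is discarded */

    return is_thresholding_msr(msr) ? 1 : 0;
}
```

With this shape the guest sees the same behaviour before and after a
migration, because the result never depends on the host's family.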

Christoph

