[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model



>>> On 05.07.12 at 14:46, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Jan Beulich wrote:
>>>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>>> wrote: @@ -68,7 +74,7 @@
>>>>> 
>>>>>  int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)  {
>>>>> -    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
>>>>> +    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT )
>>>>>      {
>>>>>          dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA
>>>>>                  capabilities" " %#" PRIx64 " for d%d:v%u
>>>>>          (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM;  
>>>>> } 
>>>>> 
>>>>> +    if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) +    {
>>>>> +        dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); +
>>>>> return -EPERM; +    } +
>>>>>      v->arch.mcg_cap = caps;
>>>>>      return 0;
>>>>>  }
>>>> 
>>>> These two changes, as pointed out before, need some further
>>>> thought and tweaking: As I said earlier, we are already shipping
>>>> the code in its prior form, so outright rejecting MCG_CTL_P set
>>>> and non-default bank counts is not a proper solution. Warning
>>>> about them being in an unsupported state is certainly acceptable.
>>>> 
>>>> And I think the guest visible MCG_CAP value also shouldn't
>>>> change across migration, so tolerating (but ignoring) a higher
>>>> bank count seems necessary. Not sure what to do when the
>>>> bank count is lower (0 or 1) - for 1, all communication to the
>>>> guest should probably go through bank 0, while 0 should
>>>> probably disable vMCE  for that vCPU.
>>>> 
>>>> Further I just noticed that you don't touch fill_vmsr_data() at
>>>> all (sending patches created with diff's -p option or equivalent
>>>> helps spotting where individual changes belong), yet that
>>>> function uses the hardware bank number to fill the struct
>>>> bank_entry. With the intended concept, the "bank" member
>>>> of that structure can probably go away altogether.
>>>> 
>>>> Jan
>>> 
>>> 
>>> Seems things become a little bit chaos, let's jump out from details,
>>> make agreement first about what's the general purpose of this
>>> middle-work patch. 
>>> 
>>> ============
>>> 1. current xen vmce status
>>>   1.1) current xen vmce has 2 kinds of bugs:
>>>     * functional bug: it actually doesn't work correctly for guest;
>>>     * migration bug: partly solved by c/s 24887;
>>>   1.2) c/s 24887 not in formal xen release version, it's a temporary
>>> fix (but unfortunately has been backported to SELS11 sp2)
>>> ============ 
>>> 2. target of this middle-work patch
>>>   2.1) it's not used to address functional bug
>>>   2.3) it does minimal work, just in order not to bring
>>>     trouble/dirty to future new vMCE, so it would * remove MCG_CTL
>>> --> otherwise new vMCE have to add useless MCG_CTL_P and MCG_CTL, w/
>>> potential model specific issue; 
>>>     * stick all 1's to MCi_CTL --> to avoid semantic issue;
>>>     * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty
>>> code at new vMCE, like bank number issue; ============
>>> 
>>> I think in this middle-work patch, we don't need constrained by c/s
>>> 24887: 
>>> 1. c/s 24887 not in formal xen release
>>> 2. the benefit of keep compatible w/ 24887 is minor:
>>>   * currently all xen version have migration bug, 3.x -> 4.0 -> 4.1
>>> -> 4.2 
>>>   * keep compatible w/ 24887 only benefit one case --> migration
>>> from SLES11 sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE
>>> actually doesn't functionally work fine to guest)
>>> 3. the price/hurt to future vMCE is high (to keep compatible w/
>>> 24887) 
>> 
>> Why do you consider the price high? I think it would require pretty
>> minor changes.
> 
> No, what I meant is not coding price. I mean the price that have to add 
> dirty and useless change to the new vMCE is high. Just curious why we cannot 
> simply get rid of c/s 24887? after all it only benefit SLES11 sp2.

This is what save MCG_CAP, and that this is necessary we agreed
on iirc. So why would you want to drop that code all of the sudden?

>>> Considering above, I prefer an outright cleanup in xen4.2, not
>>> constrained by c/s 24887 and not bring trouble/dirty to future vMCE.
>> 
>> Whether or not is was appropriate for us to backport this change
>> early is unclear, but given that back at that time I had already
>> pointed out the problems and asked for work to be done to clean
>> it up, and given that it has taken over four months to get going
>> with this, I don't think you would suggest the alternative of us
>> having left the bug unfixed for this entire time period.
> 
> Jan, I'm not challenge why you backport to SLES11 sp2. If there is anything 
> wrong, I agree it's my fault. Currently what I concern is if we can do an 
> outright cleanup at xen4.2 so that future vMCE could be simple and clean.

But we can't tell our customers that migration from, say, SP2 to
SP3 won't be possible.

>> So unless the price to pay is unreasonably high, I'd favor getting
>> this taken care of rather than ignored.
> 
> If so, why we need this middle-work patch? It could just keep current status 
> at xen 4.2, then start 'dirty' new vMCE implement at xen4.3 --> and the 
> 'dirty' 
> inherit from c/s 24887 which from code point of view would be totally removed 
> at new vMCE.

No, what we now want is to complete said c/s. While at that time
I thought we would need to save MCi_CTL, with the new concept
we won't need to. Instead, we need to enforce it to be all ones
universally.

The only compatibility thing is the need to deal with higher bank
counts perceived to be available by guests, and the possibly
previously seen set MCG_CTL_P bit.

And yes, if this created a lot of ugly and otherwise unnecessary
code, I'd agree not to care for this compatibility. But as I don't
expect this to be the case, I thought I'd still ask for it (since if
we don't do it in -unstable, we'll have to carry a private patch
in SLE to do so, and presumably for much longer than the 4.3
development period).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.