
Re: Help with Understanding vcpu xstate restore error during vm migration



----- Original Message -----
> From: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>
> To: "Fonyuy-Asheri Caleb" <fonyuy-asheri.caleb@xxxxxxxx>
> Cc: "xen-devel" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Jan Beulich" <jbeulich@xxxxxxxx>, "Roger Pau Monné" <roger.pau@xxxxxxxxxx>
> Sent: Thursday, July 11, 2024 3:04:05 PM
> Subject: Re: Help with Understanding vcpu xstate restore error during vm migration

> On 11/07/2024 1:18 pm, Fonyuy-Asheri Caleb wrote:
>>>> Would you mind giving me more insight into the logic currently
>>>> implemented, and perhaps what is wrong with it? This matters to me
>>>> because what I'm doing is research work.
>>> See 9e6dbbe8bf40^..267122a24c49
>> What reference is this please?
> 
> It's a git commit-range. You want:
> 
> $ git log 9e6dbbe8bf40^..267122a24c49
> 
> to view them.
> 
>>
>>>> How do the values evc->size and xfeature_mask relate to the source and
>>>> target processor xstates (or xstate management)?
>>> The lower bounds check is for normal reasons, while the upper bounds
>>> check is a sanity "does this image appear to have more states active
>>> than the current system".
>>>
>>> The upper bound is bogus, because "what this VM has" has no true
>>> relationship to "what Xen decided to turn on by default at boot".
>> I see. My initial question about this was more about understanding how this
>> information is gathered. Is it directly related to the CPUID of the VM, or
>> does it depend on the state of the VM at the moment of migrating it?
>>
>> If it is related to the CPUID, how is it constructed?
> 
> The size of the xsave area is a function of the *current* value in
> %xcr0.  (On Haswell.  Let's ignore MSR_XSS on newer systems for now.)
> 
> However, because guests can modify %xcr0 and turn states back off, Xen
> has to track xcr0_accum which is all bits we've ever seen the guest turn on.
> 
> CPUID (and in particular, the guest's CPU policy data) controls which
> states the guest is permitted to activate, which in turn influences the
> size.
> 
> Xen's normal CPUID handling logic *should* make it impossible for a
> guest to see features which hardware isn't capable of, and should block
> migration to a system which is less capable too.
> 
> I suspect what's going on here is that the destination has one or more
> of AVX|SSE|x87 disabled somehow, and this check is failing before the
> more coherent one which should explain why the VM can't migrate in...
> 

Thank you. This makes much more sense now. 
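
For reference, the relationship described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Xen's code: the class VcpuXstate, its methods and the helper xsave_area_size() are hypothetical names, and only the three Haswell-era components (x87, SSE, AVX) in the standard uncompressed XSAVE layout are modelled. The offsets and sizes are the architectural values that CPUID leaf 0xD reports for those components.

```python
# Hypothetical model of "area size is a function of %xcr0" plus xcr0_accum.
XSTATE_X87 = 1 << 0
XSTATE_SSE = 1 << 1
XSTATE_AVX = 1 << 2

LEGACY_AREA_SIZE = 512  # x87 + SSE state (the FXSAVE region)
XSAVE_HDR_SIZE = 64     # XSAVE header that follows the legacy region

# Component bit -> (offset, size) for state stored after the header.
EXTENDED_COMPONENTS = {XSTATE_AVX: (576, 256)}


def xsave_area_size(xcr0):
    """Size in bytes of the uncompressed XSAVE area for the given %xcr0."""
    size = LEGACY_AREA_SIZE + XSAVE_HDR_SIZE
    for bit, (offset, length) in EXTENDED_COMPONENTS.items():
        if xcr0 & bit:
            size = max(size, offset + length)
    return size


class VcpuXstate:
    """Hypothetical per-vCPU record; policy_mask stands in for the CPU policy."""

    def __init__(self, policy_mask):
        self.policy_mask = policy_mask
        self.xcr0 = XSTATE_X87        # x87 is always enabled
        self.xcr0_accum = XSTATE_X87  # every bit the guest has ever enabled

    def write_xcr0(self, value):
        if value & ~self.policy_mask:
            raise ValueError("state not permitted by the guest's policy")
        if not (value & XSTATE_X87):
            raise ValueError("x87 cannot be disabled")
        if (value & XSTATE_AVX) and not (value & XSTATE_SSE):
            raise ValueError("AVX requires SSE")
        self.xcr0 = value
        self.xcr0_accum |= value  # bits stay set even if later turned off


vcpu = VcpuXstate(XSTATE_X87 | XSTATE_SSE | XSTATE_AVX)
vcpu.write_xcr0(XSTATE_X87 | XSTATE_SSE | XSTATE_AVX)  # guest enables AVX
vcpu.write_xcr0(XSTATE_X87 | XSTATE_SSE)               # and turns it off again
image_size = xsave_area_size(vcpu.xcr0_accum)          # still sized for AVX
```

In this model the saved image is sized from xcr0_accum (832 bytes here), while a destination whose enabled set is only x87|SSE would compute 576 bytes as its maximum. That is one way an upper-bound size check could fail before any policy comparison runs, which matches the scenario suspected above.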


> 
>>
>>>>> To start with, which version (or versions?) of Xen, and what hardware?
>>>> Xen version 4.18.3-pre
>>> As you're not on a specific tag, exact changeset?
>> I am on the stable-4.18 tag.
> 
> That's a branch which moves, not a tag.
> 
> What does `git show` say?  Just need the first few lines.
Here is the output of git show:


commit 45c5333935628e7c80de0bd5a9d9eff50b305b16 (HEAD -> stable-4.18, origin/staging-4.18, origin/stable-4.18)
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Thu Jul 4 16:57:29 2024 +0200

    evtchn: build fix for Arm


> 
> ~Andrew

Caleb