
Re: [PATCH for-4.20 3/4] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu


  • To: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 19 Jul 2024 11:14:19 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 19 Jul 2024 09:14:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.07.2024 18:54, Alejandro Vallejo wrote:
> On Thu Jul 18, 2024 at 12:49 PM BST, Jan Beulich wrote:
>> On 09.07.2024 17:52, Alejandro Vallejo wrote:
>>> --- a/xen/arch/x86/include/asm/domain.h
>>> +++ b/xen/arch/x86/include/asm/domain.h
>>> @@ -591,12 +591,7 @@ struct pv_vcpu
>>>  
>>>  struct arch_vcpu
>>>  {
>>> -    /*
>>> -     * guest context (mirroring struct vcpu_guest_context) common
>>> -     * between pv and hvm guests
>>> -     */
>>> -
>>> -    void              *fpu_ctxt;
>>> +    /* Fixed point registers */
>>>      struct cpu_user_regs user_regs;
>>
>> Not exactly, no. Selector registers are there as well for example, which
>> I wouldn't consider "fixed point" ones. I wonder why the existing comment
>> cannot simply be kept, perhaps extended to mention that fpu_ctxt now lives
>> elsewhere.
> 
> Would you prefer "general purpose registers"? It's not quite that either, but
> it's arguably closer. I can part with the comment altogether but I'd rather
> leave a token amount of information to say "non-FPU register state" (but not
> that, because that would be a terrible description). 
> 
> I'd rather update it to something that better reflects reality, as I found it
> quite misleading when reading through. I initially thought it may have been
> related to struct layout (as in C-style single-level inheritance), but as it
> turns out it's merely establishing a vague relationship between arch_vcpu and
> vcpu_guest_context. I can believe once upon a time the relationship was closer
> than it is now, but with the guest context missing AVX state, MSR state and
> other bits and pieces I thought it better to avoid such confusions for future
> navigators down the line, so I limited its description to the line below.

As said, I'd prefer if you amended the existing comment. Properly describing
what's in cpu_user_regs isn't quite as easy in only very few words. Neither
"fixed point register" nor "general purpose registers" really covers it. And
I'd really like to avoid having potentially confusing comments.

>>> --- a/xen/arch/x86/xstate.c
>>> +++ b/xen/arch/x86/xstate.c
>>> @@ -507,9 +507,16 @@ int xstate_alloc_save_area(struct vcpu *v)
>>>      unsigned int size;
>>>  
>>>      if ( !cpu_has_xsave )
>>> -        return 0;
>>> -
>>> -    if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
>>> +    {
>>> +        /*
>>> +         * This is bigger than FXSAVE_SIZE by 64 bytes, but it helps treating
>>> +         * the FPU state uniformly as an XSAVE buffer even if XSAVE is not
>>> +         * available in the host. Note the alignment restrictions of the XSAVE
>>> +         * area are stricter than those of the FXSAVE area.
>>> +         */
>>> +        size = XSTATE_AREA_MIN_SIZE;
>>
>> What exactly would break if just (a little over) 512 bytes worth were allocated
>> when there's no XSAVE? If it was exactly 512, something like xstate_all() would
>> need to apply a little more care, I guess. Yet for that having just always-zero
>> xstate_bv and xcomp_bv there would already suffice (e.g. using
>> offsetof(..., xsave_hdr.reserved) here, to cover further fields gaining meaning
>> down the road). Remember that due to xmalloc() overhead and the 64-byte-aligned
>> requirement, you can only have 6 of them in a page the way you do it, when the
>> alternative way 7 would fit (if I got my math right).
> 
> I'm slightly confused.
> 
> XSTATE_AREA_MIN_SIZE is already 512 + 64 to account for the XSAVE header,
> including its reserved fields. Did you mean something else?

No, I didn't. I've in fact commented on it precisely because it is the value
you name. That's larger than necessary, and when suitably shrunk - as said -
one more of these structures could fit in a page. (That assumes they were all
allocated back-to-back, which isn't quite true right now; other intervening
allocations may or may not take space from the same page, so chances are the
ones here would still all come from one page as long as there's space left.)

>     #define XSAVE_HDR_SIZE            64
>     #define XSAVE_SSE_OFFSET          160
>     #define XSTATE_YMM_SIZE           256
>     #define FXSAVE_SIZE               512
>     #define XSAVE_HDR_OFFSET          FXSAVE_SIZE
>     #define XSTATE_AREA_MIN_SIZE      (FXSAVE_SIZE + XSAVE_HDR_SIZE)
> 
> Part of the rationale is to simplify other bits of code that are currently
> conditionalized on v->xsave_header being NULL. And for that the full xsave
> header must be present (even if unused because !cpu_has_xsave).

But that's my point: The reserved[] part doesn't need to be there; it's
not being accessed anywhere, I don't think.

Jan
