[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 13/13] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy


  • To: Jan Beulich <JBeulich@xxxxxxxx>, Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 4 Jul 2018 19:47:22 +0100
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABzSlBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPsLBegQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86M7BTQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAcLB XwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Daniel de Graaf <dgdegra@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 04 Jul 2018 18:47:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 04/07/18 11:16, Jan Beulich wrote:
>>>> On 03.07.18 at 22:55, <andrew.cooper3@xxxxxxxxxx> wrote:
>> From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
>>
>> This hypercall allows the toolstack to present one combined CPUID and MSR
>> policy for a domain, which can be audited in one go by Xen, which is 
>> necessary
>> for correctness of the auditing.
>>
>> A stub x86_policies_are_compatible() function is introduced, although at
>> present it will always fail the hypercall.
>>
>> The hypercall ABI allows for update of individual CPUID or MSR entries, so
>> begins by duplicating the existing policy (for which a helper is introduced),
>> merging the toolstack data, then checking compatibility of the result.
> This reads to me as if it was fine for the tool stack to supply only partial
> data (or else there would be no need to merge anything). What's the
> thinking behind this, rather than requiring complete sets of data to be
> supplied?

Unless you have an identical build of Xen and libxc, the toolstack won't
always provide the same leaves as Xen expects.  Such a restriction would
massively hamper development.

More importantly, with the max_extd_leaf being vendor dependent (which
is data earlier in the structure), the toolstack can't reasonably create
a dataset which matches those expectations.  Also you'll get into
problems when migrating from older hosts, and when levelling a policy
down with a lower max_leaf.

The reason for having an architectural representation was deliberately
to prohibit having a hypercall which was a memcpy of an unstable
structure in the background, but with this design comes the fact that
Xen must not expect to find an exact match of data.

>
>> One awkard corner case is re-deserialising of the vcpu msrs.  The correct fix
>> would be to allocate a buffer, copy the MSRs list, then deserialise from 
>> that,
>> but trips the bounds checks in the copy_from_guest() helpers.  The compat 
>> XLAT
>> are would work, but would require that we allocate it even for 64bit PV
>> guests.
>
>
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -330,6 +330,71 @@ static int update_domain_cpuid_info(struct domain *d,
>>      return 0;
>>  }
>>  
>> +static int update_domain_cpumsr_policy(struct domain *d,
>> +                                       xen_domctl_cpumsr_policy_t *xdpc)
>> +{
>> +    struct policy_group new = {};
>> +    const struct policy_group *sys = is_pv_domain(d)
>> +        ? &system_policies[XEN_SYSCTL_cpumsr_policy_pv_max]
>> +        : &system_policies[XEN_SYSCTL_cpumsr_policy_hvm_max];
>> +    struct vcpu *v = d->vcpu[0];
>> +    int ret = -ENOMEM;
>> +
>> +    /* Initialise some help identifying auditing errors. */
>> +    xdpc->err_leaf = xdpc->err_subleaf = XEN_CPUID_NO_SUBLEAF;
>> +    xdpc->err_msr_idx = ~0;
> I'm having trouble extracting information from the comment.

So am I...

These are the fields inside the domctl which try to provide some hint as
to which piece of data is problematic, rather than leaving the user with
simply "somthing is wrong".

>
>> +    /* Start with existing domain's policies */
>> +    if ( !(new.cp = xmemdup(d->arch.cpuid)) ||
>> +         !(new.dp = xmemdup(d->arch.msr)) ||
>> +         !(new.vp = xmemdup(v->arch.msr)) )
>> +        goto out;
>> +
>> +    /* Merge the toolstack provided data. */
>> +    if ( (ret = x86_cpuid_copy_from_buffer(
>> +              new.cp, xdpc->cpuid_policy, xdpc->nr_leaves,
>> +              &xdpc->err_leaf, &xdpc->err_subleaf)) )
>> +        goto out;
>> +
>> +    if ( (ret = x86_msr_copy_from_buffer(
>> +              new.dp, new.vp,
>> +              xdpc->msr_policy, xdpc->nr_msrs, &xdpc->err_msr_idx)) )
>> +        goto out;
>> +
>> +    /* Audit the combined dataset. */
>> +    ret = x86_policies_are_compatible(sys, &new);
>> +    if ( ret )
>> +        goto out;
> I'm afraid I don't follow - where's the merging? All you do is copy the
> first so many entries coming from libxc, and using the later so many
> entries from the previous policies. How's that going to provide a
> complete set, rather than e.g. some duplicate entries and some
> missing ones?

Consider the case where max_leaf is lower than Xen's maximum.  The data
provided by the toolstack can be self consistent and complete without
matching Xen's exact size of structure.

>
>> +    /*
>> +     * Audit was successful.  Replace existing policies, leaving the old
>> +     * policies to be freed.
>> +     */
>> +    SWAP(new.cp, d->arch.cpuid);
>> +    SWAP(new.dp, d->arch.msr);
>> +    SWAP(new.vp, v->arch.msr);
>> +
>> +    /* Merge the (now audited) vCPU MSRs into every other msr_vcpu_policy. 
>> */
>> +    for ( ; v; v = v->next_in_list )
> This open-coded almost-for_each_domain() doesn't look very nice.

ITYM for_each_vcpu()

And yes, but for_each_vcpu() is wrong to use here, and we don't have a
for_each_vcpu_other_than_0() helper.

>
>> +    {
>> +        /* XXX - Figure out how to avoid a TOCTOU race here.  XLAT area? */
>> +        if ( (ret = x86_msr_copy_from_buffer(
>> +                  NULL, v->arch.msr, xdpc->msr_policy, xdpc->nr_msrs, 
>> NULL)) )
> Why can't you go from vCPU 0's v->arch.msr here, which is the copied-in
> (and sanitized) representation already? Also, is it really a good idea to
> assume all vCPU-s have the same policies?

There are multiple colliding issues which lead to this code, but as
several people have pointed out, its probably over the top.

First, as to the same policy.  This hypercall can currently only be used
before the vcpu has started executing.

As such, it is setting the init state of the MSRs from the guests point
of view, and there is exactly one MSR I'm aware of which has an init
value which depends on the core (that being APIC_BASE.BSP which can
trivially be handled in Xen).  All other MSRs have identical init state
AFAICT, and I don't want to create an interface which makes it easy to
accidentally end up with wrong values.

The re-de-serialising actually came from your suggested usecase to be
able to increase the policy at runtime, .e.g. after a microcode update
and hypervisor livepatch.  In that case, a partial set doesn't want to
copy vCPU0's generally R/W MSRs over other vcpus, because that would be
a bad thing.

However, these two points aren't compatible, so if we accept the "only
before the domain has started running" aspect, then it should be safe to
copy v0's policy over all other vcpus.

Really, the problem here comes from (necessarily) how we will have to
implement SGX_LC and the fact that the LC Hash MSRs may be read-only
domain-wide state, or read-write per-vcpu state depending on a different
MSR setting (MSR_FEAT_CTRL.SGX_LC).

I was initially hoping to have only msr_domain_policy poked via this
interface, and msr_vcpu_policy poked exclusively via per-vcpu
hypercalls.  Then again, at least the complexity is visible at this
point, rather than at some point in the future when we need to try and
retrofit it.

>
>> @@ -1570,6 +1635,28 @@ long arch_do_domctl(
>>          domain_unpause(d);
>>          break;
>>  
>> +    case XEN_DOMCTL_set_cpumsr_policy:
>> +        if ( d == currd ||       /* no domain_pause() */
>> +             d->max_vcpus == 0 ) /* No vcpus yet. */
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        domain_pause(d);
>> +
>> +        if ( d->creation_finished )
>> +            ret = -EEXIST; /* No changing once the domain is running. */
>> +        else
>> +        {
>> +            ret = update_domain_cpumsr_policy(d, &domctl->u.cpumsr_policy);
>> +            if ( !ret ) /* Copy domctl->u.cpumsr_policy.err_* to guest. */
>> +                copyback = true;
> x86_cpuid_copy_from_buffer(), for example, sets the err_ fields
> only when returning -ERANGE. Is the if() condition inverted?

Oops yes.  This was part of a blanket "ret > 0" correction I needed to
do because the privcmd() ioctl interface isn't sufficiently expressive.

>
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -648,6 +648,12 @@ struct xen_domctl_cpumsr_policy {
>>                           * 'msr_domain_policy' */
>>      XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* IN/OUT: */
>>      XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* IN/OUT: */
>> +    uint32_t err_leaf, err_subleaf; /* OUT, set_policy only.  If not ~0,
>> +                                     * indicates the leaf/subleaf which
>> +                                     * auditing objected to. */
>> +    uint32_t err_msr_idx;           /* OUT, set_policy only.  If not ~0,
>> +                                     * indicates the MSR idx which
>> +                                     * auditing objected to. */
> Explicit padding again please, with the handler checking it to be
> zero.
>
>> --- a/xen/include/xen/xmalloc.h
>> +++ b/xen/include/xen/xmalloc.h
>> @@ -13,6 +13,13 @@
>>  #define xmalloc(_type) ((_type *)_xmalloc(sizeof(_type), 
>> __alignof__(_type)))
>>  #define xzalloc(_type) ((_type *)_xzalloc(sizeof(_type), 
>> __alignof__(_type)))
>>  
>> +/* Allocate space for a typed object and copy an existing instance. */
>> +#define xmemdup(ptr)                                    \
>> +    ({  typeof(*ptr) *n_ = xmalloc(typeof(*ptr));       \
>> +        if ( n_ )                                       \
>> +            memcpy(n_, ptr, sizeof(*ptr));              \
>> +        n_; })
> Would be nice if this could handle input pointers to const-qualified types.
> I vaguely recall having seen a solution to this recently, but I don't recall
> where that was or how it looked like. Until then, may I suggest to use
> void * instead, despite this opening the risk of type incompatibilities?

The only way I'm aware of which might fix the const qualified aspect is
_Generic(), but ISTR you can't use typeof() inside.

As for xmemdup() specific, no - this will never want const qualified
types.  The caller is always going to mutate the structure, or they
wouldn't have dup'd the object to begin with.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.