[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 6/8] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()



On 11.09.2019 22:05, Andrew Cooper wrote:
> The purpose of this change is to stop using xc_cpuid_do_domctl(), and to stop
> basing decisions on a local CPUID instruction.  This is not a correct or
> appropriate way to construct policy information for other domains.
> 
> The overwhelming majority of this logic is redundant with the policy logic in
> Xen, but has a habit of becoming stale (e.g. c/s 97e4ebdcd76 resulting in
> AVX512_BF16 not ever actually being offered to guests).

Well, not offering it to guests was intentional at that point,
but I guess you validly imply that by adding the A marker to the
public header it _still_ wouldn't have got exposed?

> ---
>  tools/libxc/xc_cpuid_x86.c      | 798 
> ++++++++++------------------------------
>  xen/include/xen/lib/x86/cpuid.h |  11 +-
>  2 files changed, 197 insertions(+), 612 deletions(-)

Nice.

> @@ -1057,3 +449,191 @@ int xc_cpuid_set(
>  
>      return rc;
>  }
> +
> +int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
> +                          const uint32_t *featureset, unsigned int 
> nr_features)
> +{
> +    int rc;
> +    xc_dominfo_t di;
> +    unsigned int i, nr_leaves, nr_msrs;
> +    xen_cpuid_leaf_t *leaves = NULL;
> +    struct cpuid_policy *p = NULL;
> +    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> +
> +    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
> +         di.domid != domid )
> +    {
> +        ERROR("Failed to obtain d%d info", domid);
> +        rc = -ESRCH;
> +        goto out;
> +    }
> +
> +    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain policy info size");
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = -ENOMEM;
> +    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
> +         (p = calloc(1, sizeof(*p))) == NULL )
> +        goto out;
> +
> +    nr_msrs = 0;
> +    rc = xc_get_domain_cpu_policy(xch, domid, &nr_leaves, leaves,
> +                                  &nr_msrs, NULL);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain d%d's policy", domid);
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
> +                                    &err_leaf, &err_subleaf);
> +    if ( rc )
> +    {
> +        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = 
> %s)",
> +              err_leaf, err_subleaf, -rc, strerror(-rc));
> +        goto out;
> +    }
> +
> +    if ( featureset )
> +    {
> +        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
> +            feat[FEATURESET_NR_ENTRIES] = {};
> +        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
> +        unsigned int i, b;
> +
> +        /*
> +         * The user supplied featureset may be shorter or longer than
> +         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
> +         * Longer is fine, so long as it only padded with zeros.
> +         */
> +        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
> +
> +        /* Check for truncated set bits. */
> +        rc = -EOPNOTSUPP;
> +        for ( i = user_len; i < nr_features; ++i )
> +            if ( featureset[i] != 0 )
> +                goto out;
> +
> +        memcpy(feat, featureset, sizeof(*featureset) * user_len);
> +
> +        /* Disable deep dependencies of disabled features. */
> +        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            disabled_features[i] = ~feat[i] & deep_features[i];
> +
> +        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
> +        {
> +            const uint32_t *dfs;
> +
> +            if ( !test_bit(b, disabled_features) ||
> +                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
> +                continue;
> +
> +            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            {
> +                feat[i] &= ~dfs[i];
> +                disabled_features[i] &= ~dfs[i];
> +            }
> +        }
> +
> +        cpuid_featureset_to_policy(feat, p);
> +    }
> +
> +    if ( !di.hvm )
> +    {
> +        uint32_t host_featureset[FEATURESET_NR_ENTRIES];
> +        uint32_t len = ARRAY_SIZE(host_featureset);
> +
> +        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
> +                                   &len, host_featureset);
> +        if ( rc )
> +        {
> +            /* Tolerate "buffer too small", as we've got the bits we need. */
> +            if ( errno == ENOBUFS )
> +                rc = 0;

So this is where I think returning an error (instead of a positive
number) from the hypercall is latently problematic: There's not
really any guarantee for ENOBUFS to not result from other than the
actual hypercall. I guess we have such dependencies elsewhere, so
having one more here isn't a big deal, but as a precaution against
using uninitialized data, wouldn't it be prudent for
host_featureset[] to get zero-initialized up front?

> +            else
> +            {
> +                PERROR("Failed to obtain host featureset");
> +                rc = -errno;
> +                goto out;
> +            }
> +        }
> +
> +        /*
> +         * On hardware without CPUID Faulting, PV guests see real topology.
> +         * As a consequence, they also need to see the host htt/cmp fields.
> +         */
> +        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
> +        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, 
> host_featureset);
> +    }
> +    else
> +    {
> +        /*
> +         * Topology for HVM guests is entirely controlled by Xen.  For now, 
> we
> +         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> +         */
> +        p->basic.htt = true;
> +        p->extd.cmp_legacy = false;
> +
> +        p->basic.lppp *= 2;
> +
> +        switch ( p->x86_vendor )
> +        {
> +        case X86_VENDOR_INTEL:
> +            for ( i = 0; (p->cache.subleaf[i].type &&
> +                          i < ARRAY_SIZE(p->cache.raw)); ++i )
> +            {
> +                p->cache.subleaf[i].cores_per_package =
> +                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
> +                p->cache.subleaf[i].threads_per_cache = 0;
> +            }
> +            break;

The original code masked EDX by 0x3ff. I don't see how this is reflected
here, and the description also doesn't indicate the change is on purpose.

> +        case X86_VENDOR_AMD:
> +        case X86_VENDOR_HYGON:
> +            p->extd.nc = (p->extd.nc << 1) | 1;

This actually fixes a latent "spill into bit 8" issue of the original
code.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.