
Re: [PATCH] x86/hvm: allow for more fine grained assisted flush



On 30.04.2020 11:17, Roger Pau Monne wrote:
> Improve the assisted flush by expanding the interface and allowing
> more fine-grained TLB flushes to be issued using the HVMOP_flush_tlbs
> hypercall. Support for such advanced flushes is signaled in CPUID
> using the XEN_HVM_CPUID_ADVANCED_FLUSH flag.
> 
> The new features make use of the hypercall parameter that was so far
> always passed as NULL in order to convey extra data and perform more
> selective flushes: a virtual address, an order field, a flags field and
> finally a vCPU bitmap. Note that not all features are implemented as
> part of this patch, but they are already added to the interface in
> order to avoid having to introduce yet another CPUID flag when the
> remaining features are implemented.
> 
> The feature currently implemented is the use of a guest-provided vCPU
> bitmap to signal which vCPUs require a TLB flush, instead of assuming
> all vCPUs must be flushed. Note that not implementing the rest of the
> features just makes the flush less efficient, but it's still correct
> and safe.

A risk of not supporting these right away is that guest bugs may go
unnoticed for quite some time. Just as a remark, not as a request
to do the implementation right away.

> --- a/xen/arch/x86/guest/xen/xen.c
> +++ b/xen/arch/x86/guest/xen/xen.c
> @@ -326,7 +326,27 @@ static void __init e820_fixup(struct e820map *e820)
>  
>  static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int flags)
>  {
> -    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
> +    xen_hvm_flush_tlbs_t op = {
> +        .va = (unsigned long)va,
> +        .order = (flags - 1) & FLUSH_ORDER_MASK,

I consider such an expression reasonable to use when there's a single,
central place for it (in flushtlb.c). For uses elsewhere (then better
mirrored back to that original place) I think we want to have a macro
inverting FLUSH_ORDER(), e.g. FLUSH_GET_ORDER().
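Something along these lines perhaps, just as a sketch (assuming
FLUSH_ORDER(x) stays defined as ((x) + 1) in flushtlb.h):

    /* flushtlb.h: inverse of FLUSH_ORDER(), for callers like the above. */
    #define FLUSH_GET_ORDER(f) (((f) - 1) & FLUSH_ORDER_MASK)

with the open-coded expression above then becoming
.order = FLUSH_GET_ORDER(flags).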

> +        .flags = ((flags & FLUSH_TLB_GLOBAL) ? HVMOP_flush_global : 0) |
> +                 ((flags & FLUSH_VA_VALID) ? HVMOP_flush_va_valid : 0),
> +        .mask_size = BITS_TO_LONGS(nr_cpu_ids),
> +    };
> +    static int8_t __read_mostly advanced_flush = -1;
> +
> +    if ( advanced_flush == -1 )
> +    {
> +        uint32_t eax, ebx, ecx, edx;
> +
> +        ASSERT(xen_cpuid_base);
> +        cpuid(xen_cpuid_base + 4, &eax, &ebx, &ecx, &edx);
> +        advanced_flush = (eax & XEN_HVM_CPUID_ADVANCED_FLUSH) ? 1 : 0;

Use the more conventional (in our code base) !! here? Also use
cpuid_eax(), to avoid several dead locals?
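I.e. something like (just a sketch):

    if ( advanced_flush == -1 )
    {
        ASSERT(xen_cpuid_base);
        advanced_flush = !!(cpuid_eax(xen_cpuid_base + 4) &
                            XEN_HVM_CPUID_ADVANCED_FLUSH);
    }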

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4011,17 +4011,42 @@ static void hvm_s3_resume(struct domain *d)
>      }
>  }
>  
> -static bool always_flush(void *ctxt, struct vcpu *v)
> +static bool flush_check(void *mask, struct vcpu *v)

const twice?

>  {
> -    return true;
> +    return mask ? test_bit(v->vcpu_id, (unsigned long *)mask) : true;

And a 3rd time?

>  }
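I.e. perhaps (sketch only, with the callback type adjusted to match):

    static bool flush_check(const void *mask, const struct vcpu *v)
    {
        return mask ? test_bit(v->vcpu_id, (const unsigned long *)mask)
                    : true;
    }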
>  
> -static int hvmop_flush_tlb_all(void)
> +static int hvmop_flush_tlb(XEN_GUEST_HANDLE_PARAM(xen_hvm_flush_tlbs_t) uop)
>  {
> +    xen_hvm_flush_tlbs_t op;

This could move into the more narrow scope below.

> +    DECLARE_BITMAP(mask, HVM_MAX_VCPUS) = { };
> +    bool valid_mask = false;
> +
>      if ( !is_hvm_domain(current->domain) )
>          return -EINVAL;
>  
> -    return paging_flush_tlb(always_flush, NULL) ? 0 : -ERESTART;
> +    if ( !guest_handle_is_null(uop) )
> +    {
> +        if ( copy_from_guest(&op, uop, 1) )
> +            return -EFAULT;
> +
> +        /*
> +         * TODO: implement support for the other features present in
> +         * xen_hvm_flush_tlbs_t: flushing a specific virtual address and not
> +         * flushing global mappings.
> +         */
> +
> +        if ( op.mask_size > ARRAY_SIZE(mask) )
> +            return -EINVAL;

While a proper safeguard for the implementation, this looks rather
arbitrary from the guest's pov: Bits beyond the guest's vCPU count
aren't meaningful anyway. They could be ignored by not copying them
here, rather than by never inspecting them in flush_check(). And
ignoring some bits here would then call for this to be done
consistently for all of them, i.e. not returning -EINVAL.
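I.e. instead of the size check above, perhaps simply clamp the copy
(untested):

    if ( copy_from_guest(mask, op.vcpu_mask,
                         min_t(unsigned int, op.mask_size,
                               ARRAY_SIZE(mask))) )
        return -EFAULT;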

> +        if ( copy_from_guest(mask, op.vcpu_mask, op.mask_size) )
> +            return -EFAULT;
> +
> +        valid_mask = true;
> +    }
> +
> +    return paging_flush_tlb(flush_check, valid_mask ? mask : NULL) ? 0
> +                                                                   : -ERESTART;

Just as a suggestion, this might look a little less odd when wrapped
as

    return paging_flush_tlb(flush_check,
                            valid_mask ? mask : NULL) ? 0 : -ERESTART;

But it's really up to you.

> @@ -5017,7 +5042,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>  
>      case HVMOP_flush_tlbs:
> -        rc = guest_handle_is_null(arg) ? hvmop_flush_tlb_all() : -EINVAL;
> +        rc = hvmop_flush_tlb(guest_handle_cast(arg, xen_hvm_flush_tlbs_t));

There's nothing to be passed back so maybe you could even cast to a
const handle here?
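I.e. (just a sketch, assuming the usual typedef plus
DEFINE_XEN_GUEST_HANDLE() pair accompanies the new struct, which also
makes the const_ handle variant available):

    static int hvmop_flush_tlb(
        XEN_GUEST_HANDLE_PARAM(const_xen_hvm_flush_tlbs_t) uop);

with the cast at the call site adjusted to match.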

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -99,8 +99,29 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_pci_link_route_t);
>  
>  #endif /* __XEN_INTERFACE_VERSION__ < 0x00040900 */
>  
> -/* Flushes all VCPU TLBs: @arg must be NULL. */
> +/*
> + * Flushes all VCPU TLBs: @arg can be NULL or xen_hvm_flush_tlbs_t.
> + *
> + * Support for passing a xen_hvm_flush_tlbs_t parameter is signaled in CPUID,
> + * see XEN_HVM_CPUID_ADVANCED_FLUSH.
> + */
>  #define HVMOP_flush_tlbs          5
> +struct xen_hvm_flush_tlbs {
> +    /* Virtual address to be flushed. */
> +    uint64_t va;
> +    uint16_t order;
> +    uint16_t flags;
> +/* Flush global mappings. */
> +#define HVMOP_flush_global      (1u << 0)
> +/* VA for the flush has a valid mapping. */
> +#define HVMOP_flush_va_valid    (1u << 1)
> +    /* Number of uint64_t elements in vcpu_mask. */
> +    uint32_t mask_size;
> +    /* Bitmask of vcpus that should be flushed. */
> +    XEN_GUEST_HANDLE(const_uint64) vcpu_mask;

This will make the structure have different sizes for 64- and 32-bit
callers. Apart from altp2m interfaces there's no precedent of a
handle in the hvmop structures, so I wonder whether this wouldn't
better be replaced, e.g. by having an effectively variable-size
array at the end of the struct. Simply forcing suitable padding,
e.g. by using uint64_aligned_t for the first field, won't help, as
this would lead to 32-bit callers not having suitable control over
the upper 32 bits of what Xen will take as the handle. (And of
course right now both uint64_aligned_t and XEN_GUEST_HANDLE_64 are
Xen+tools constructs only.)
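Just to illustrate the idea (not a concrete proposal; the exact
flexible-array idiom to use in the public headers would need settling):

    struct xen_hvm_flush_tlbs {
        uint64_t va;
        uint16_t order;
        uint16_t flags;
        /* Number of uint64_t elements in vcpu_mask. */
        uint32_t mask_size;
        /* Bitmask of vcpus to be flushed; mask_size elements follow. */
        uint64_t vcpu_mask[1]; /* Variable length. */
    };

This would keep the layout identical for 32- and 64-bit callers,
without any handle in the structure.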

Jan



 

