
Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush



On Mon, Feb 17, 2020 at 06:34:12PM +0100, Roger Pau Monné wrote:
> On Mon, Feb 17, 2020 at 01:55:17PM +0000, Wei Liu wrote:
> > Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
> > of several hypercalls:
> > 
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
> > 
> > Pick the most efficient hypercalls available.
> > 
> > Signed-off-by: Wei Liu <liuwe@xxxxxxxxxxxxx>
> 
> Just two comments below.
> 
> > ---
> > v3:
> > 1. Address more comments.
> > 2. Fix usage of max_vp_index.
> > 3. Use the fill_gva_list algorithm from Linux.
> > 
> > v2:
> > 1. Address Roger and Jan's comments re types etc.
> > 2. Fix pointer arithmetic.
> > 3. Misc improvement to code.
> > ---
> >  xen/arch/x86/guest/hyperv/Makefile  |   1 +
> >  xen/arch/x86/guest/hyperv/private.h |   9 ++
> >  xen/arch/x86/guest/hyperv/tlb.c     | 173 +++++++++++++++++++++++++++-
> >  xen/arch/x86/guest/hyperv/util.c    |  74 ++++++++++++
> >  4 files changed, 256 insertions(+), 1 deletion(-)
> >  create mode 100644 xen/arch/x86/guest/hyperv/util.c
> > 
> > diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> > index 18902c33e9..0e39410968 100644
> > --- a/xen/arch/x86/guest/hyperv/Makefile
> > +++ b/xen/arch/x86/guest/hyperv/Makefile
> > @@ -1,2 +1,3 @@
> >  obj-y += hyperv.o
> >  obj-y += tlb.o
> > +obj-y += util.o
> > diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> > index 509bedaafa..79a77930a0 100644
> > --- a/xen/arch/x86/guest/hyperv/private.h
> > +++ b/xen/arch/x86/guest/hyperv/private.h
> > @@ -24,12 +24,21 @@
> >  
> >  #include <xen/cpumask.h>
> >  #include <xen/percpu.h>
> > +#include <xen/types.h>
> 
> Do you still need to include types.h?
> 

Not anymore.

> None of the additions this patch makes to this header seems to
> require it, AFAICT.
> 
> >  
> >  DECLARE_PER_CPU(void *, hv_input_page);
> >  DECLARE_PER_CPU(void *, hv_vp_assist);
> >  DECLARE_PER_CPU(unsigned int, hv_vp_index);
> >  
> > +static inline unsigned int hv_vp_index(unsigned int cpu)
> > +{
> > +    return per_cpu(hv_vp_index, cpu);
> > +}
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags);
> >  
> > +/* Returns the number of banks, or -errno on error */
> > +int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
> > +
> >  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> > diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> > index 48f527229e..8cd1c6f0ed 100644
> > --- a/xen/arch/x86/guest/hyperv/tlb.c
> > +++ b/xen/arch/x86/guest/hyperv/tlb.c
> > @@ -19,17 +19,188 @@
> >   * Copyright (c) 2020 Microsoft.
> >   */
> >  
> > +#include <xen/cpu.h>
> >  #include <xen/cpumask.h>
> >  #include <xen/errno.h>
> >  
> > +#include <asm/guest/hyperv.h>
> > +#include <asm/guest/hyperv-hcall.h>
> > +#include <asm/guest/hyperv-tlfs.h>
> > +
> >  #include "private.h"
> >  
> > +/*
> > + * It is possible to encode up to 4096 pages using the lower 12 bits
> > + * in an element of gva_list
> > + */
> > +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> > +
> > +static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
> > +                                  unsigned int order)
> > +{
> > +    unsigned long cur = (unsigned long)va;
> > +    /* end is 1 past the range to be flushed */
> > +    unsigned long end = cur + (PAGE_SIZE << order);
> > +    unsigned int n = 0;
> > +
> > +    do {
> > +        unsigned long diff = end - cur;
> > +
> > +        gva_list[n] = cur & PAGE_MASK;
> > +
> > +        /*
> > +         * Use lower 12 bits to encode the number of additional pages
> > +         * to flush
> > +         */
> > +        if ( diff >= HV_TLB_FLUSH_UNIT )
> > +        {
> > +            gva_list[n] |= ~PAGE_MASK;
> > +            cur += HV_TLB_FLUSH_UNIT;
> > +        }
> > +        else
> > +        {
> > +            gva_list[n] |= (diff - 1) >> PAGE_SHIFT;
> > +            cur = end;
> > +        }
> > +
> > +        n++;
> > +    } while ( cur < end );
> > +
> > +    return n;
> > +}
> > +
> > +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> > +                             unsigned int flags)
> > +{
> > +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> > +    int nr_banks;
> > +    unsigned int max_gvas, order = flags & FLUSH_ORDER_MASK;
> > +    uint64_t *gva_list;
> > +
> > +    if ( !flush || local_irq_is_enabled() )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return ~0ULL;
> > +    }
> > +
> > +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> > +        return ~0ULL;
> > +
> > +    flush->address_space = 0;
> > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > +
> > +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> > +    if ( nr_banks < 0 )
> > +        return ~0ULL;
> > +
> > +    max_gvas =
> > +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> > +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> > +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> > +
> > +    /*
> > +     * Flush the entire address space if va is NULL or if there is not
> > +     * enough space for gva_list.
> > +     */
> > +    if ( !va || (PAGE_SIZE << order) / HV_TLB_FLUSH_UNIT > max_gvas )
> > +        return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> > +                                   nr_banks, virt_to_maddr(flush), 0);
> > +
> > +    /*
> > +     * The calculation of gva_list address requires the structure to
> > +     * be 64 bits aligned.
> > +     */
> > +    BUILD_BUG_ON(sizeof(*flush) % sizeof(uint64_t));
> > +    gva_list = (uint64_t *)flush + sizeof(*flush) / sizeof(uint64_t) + nr_banks;
> > +
> > +    return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> > +                               fill_gva_list(gva_list, va, order),
> > +                               nr_banks, virt_to_maddr(flush), 0);
> > +}
> > +
> > +/* Maximum number of gvas for hv_tlb_flush */
> > +#define MAX_GVAS ((PAGE_SIZE - sizeof(struct hv_tlb_flush)) / sizeof(uint64_t))
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags)
> >  {
> > -    return -EOPNOTSUPP;
> > +    unsigned long irq_flags;
> > +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> 
> I think you need a - 1 here, as FLUSH_ORDER(x) is defined as ((x)+1).
> So if a user has specified order 0 here you would get order 1 instead.
> 
> unsigned int order = (flags - 1) & FLUSH_ORDER_MASK;

Yes, indeed. That's what flush_area_local does. I will fix this.
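
For reference, the encode/decode pair looks roughly like this
(paraphrasing xen/include/asm-x86/flushtlb.h; the order is biased by
one so that a zero flags value carries no flush-by-VA request):

    #define FLUSH_ORDER_MASK 0xff
    #define FLUSH_ORDER(x)   ((x)+1)

    /* Decoding therefore has to subtract the bias, as
       flush_area_local does: */
    unsigned int order = (flags - 1) & FLUSH_ORDER_MASK;

Note the same decode is needed in flush_tlb_ex() above, which at the
moment also uses plain flags & FLUSH_ORDER_MASK.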

BTW, I think your series needs the same fix, in the patch that
introduced the hypervisor_flush_tlb hook. I took this snippet from
that patch directly.

> 
> Sorry for not noticing this earlier.

Thanks for noticing this. :-)

> 
> > +    uint64_t ret;
> > +
> > +    if ( !flush || cpumask_empty(mask) )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return -EINVAL;
> > +    }
> > +
> > +    local_irq_save(irq_flags);
> 
> I think you disable interrupts in order to prevent re-entering this
> function, and hence avoid an interrupt triggering in the middle and
> also attempting a TLB flush using the same per-CPU input page.
> 
> As pointed out to me by Jan, we can also get #MC and #NMI, which will
> still happen despite interrupts being disabled, and hence you might
> want to assert that you are not in #MC or #NMI context before
> accessing the per-CPU hv_input_page (or else just return an error
> and avoid using the assisted flush). I have a patch that will
> hopefully be able to signal when in #MC or #NMI context.
> 

This function should return an error in that case. It is better to fall
back to the native path than to crash.
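
Roughly like this at the top of hyperv_flush_tlb(), once such a
predicate exists (in_nmi_or_mc() below is only a placeholder for
whatever your patch ends up providing):

    /*
     * The per-CPU hv_input_page may already be in use by the context
     * we interrupted; return an error so the caller falls back to
     * the native flush path.
     */
    if ( in_nmi_or_mc() )
        return -EBUSY;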

Wei.


> Thanks, Roger.
