
Re: [Xen-devel] [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration



On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> Implement vlpt (virtual-linear page table) for fast access to the 3rd-level
> PTEs of the guest p2m. When creating a mapping for the vlpt, just copy the
> 1st-level PTEs of the guest p2m into xen's 2nd-level PTEs. Then the mapping
> becomes the following:
>     xen's 1st PTE -->
>     xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
>     guest p2m's 2nd PTE -->
>     guest p2m's 3rd PTE (the memory contents where the vlpt points)
> For more info about vlpt, see:
> http://www.technovelty.org/linux/virtual-linear-page-table.html
> 
> This function is used in dirty-page tracing: when a domU write fault is
> trapped by xen, xen can immediately locate the 3rd-level PTE of the
> guest p2m.

ISTR you had some numbers which backed up the choice of this solution.
Can you either include them here or link to the list archives?
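
For the benefit of the archives, my understanding of the intended fast
path is something like the following sketch. The handler name and the
mark_dirty() helper are made up (the real trap plumbing is in the other
patches of this series), and get_vlpt_3lvl_pte() is the helper added
below:

    /* Sketch only: how the permission-fault path might use the vlpt.
     * handle_dirty_write_fault() and mark_dirty() are hypothetical. */
    static void handle_dirty_write_fault(struct domain *d, paddr_t ipa)
    {
        lpae_t *pte = get_vlpt_3lvl_pte(ipa); /* no software table walk */
        lpae_t new = *pte;

        new.p2m.write = 1;     /* page becomes writable again */
        write_pte(pte, new);
        /* ... plus the appropriate guest TLB flush ... */

        mark_dirty(d, ipa);    /* hypothetical dirty-bitmap update */
    }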

> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> ---
>  xen/arch/arm/mm.c            | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/config.h |  6 ++++
>  xen/include/asm-arm/domain.h |  7 ++++
>  xen/include/asm-arm/mm.h     | 22 +++++++++++++
>  4 files changed, 113 insertions(+)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 120eae8..98edbc9 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1382,6 +1382,84 @@ int is_iomem_page(unsigned long mfn)
>          return 1;
>      return 0;
>  }
> +
> +/* flush the vlpt area */
> +static void flush_vlpt(struct domain *d)
> +{
> +    int flush_size;
> +    flush_size = (d->arch.dirty.second_lvl_end -
> +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> +    /* flushing the 3rd level mapping */
> +    flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START,
> +                                flush_size);

Shouldn't the base here be VIRT_LIN_P2M_START +
(d->arch.dirty.second_lvl_start << SECOND_SHIFT) or something like
that?

flush_xen_data_tlb_range_va just turns into a loop over the addresses,
so you might find you may as well do the flushes as you update the
PTEs below, perhaps with an optimisation to pull the barriers outside
that loop.
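
Something along these lines, perhaps, i.e. restore_vlpt's loop becomes
the following and the trailing flush_vlpt() goes away (sketch only:
this reuses flush_xen_data_tlb_range_va() from this patch for the 2MB
range each slot maps, and whether the barriers inside write_pte() can
really be hoisted out needs checking):

    for ( i = d->arch.dirty.second_lvl_start;
          i < d->arch.dirty.second_lvl_end;
          ++i, ++j )
    {
        if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits )
        {
            write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]);
            /* xen_second is indexed linearly by VA, so slot i covers
             * the 2MB of VLPT space starting at i << SECOND_SHIFT */
            flush_xen_data_tlb_range_va((vaddr_t)i << SECOND_SHIFT,
                                        1 << SECOND_SHIFT);
        }
    }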

> +}
> +
> +/* restore the xen page table for vlpt mapping for domain d */
> +void restore_vlpt(struct domain *d)
> +{
> +    int i, j = 0;
> +    int need_flush = 0;
> +
> +    for ( i = d->arch.dirty.second_lvl_start;
> +          i < d->arch.dirty.second_lvl_end;
> +          ++i, ++j )
> +    {
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]);
> +            need_flush = 1;
> +        }
> +    }
> +
> +    if ( need_flush ) flush_vlpt(d);
> +}
> +
> +/* setting up the xen page table for vlpt mapping for domain d */
> +void prepare_vlpt(struct domain *d)
> +{
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    vaddr_t gma_start = 0;
> +    vaddr_t gma_end = 0;
> +    lpae_t *first;
> +    int i, j;
> +
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +    get_gma_start_end(d, &gma_start, &gma_end);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    second_lvl_page = alloc_domheap_page(NULL, 0);

The p2m first level (p2m->first_level) is two concatenated pages with
a total of 1024 entries (which is needed to give the full 40-bit IPA
space). I have a feeling this means you need two pages here?

Or maybe we shouldn't be supporting the full 40 bit addresses on 32-bit?

NB currently create_p2m_entries doesn't actually support 40 bits, but I
need to fix that for an arm64 platform I'm looking at.
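
If it does need to cover all 1024 first-level entries, presumably the
allocation becomes order-1 and the shadow copy grows to match, e.g.
(sketch only; note that the two map_domain_page_global() calls you
would then need are not guaranteed to return adjacent virtual
addresses, so the mapping side needs more thought than this):

    /* hypothetical: two contiguous pages for the 1024 lpae_t
     * entries (8K) shadowing the concatenated p2m first level */
    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);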

> +    d->arch.dirty.second_lvl = map_domain_page_global(
> +                                 page_to_mfn(second_lvl_page) );
> +
> +    first = __map_domain_page(p2m->first_level);
> +    for ( i = gp2m_start_index, j = 0; i < gp2m_end_index; ++i, ++j )
> +    {
> +        write_pte(&xen_second[xen_second_linear_base+i],
> +                  first[i]);
> +
> +        /* we copy the mapping into domain's structure as a reference in case
> +         * of the context switch (used in restore_vlpt) */
> +        d->arch.dirty.second_lvl[j] = first[i];
> +    }
> +    unmap_domain_page(first);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    flush_vlpt(d);
> +}
> +
> +void cleanup_vlpt(struct domain *d)
> +{
> +    unmap_domain_page_global(d->arch.dirty.second_lvl);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index 9e395c2..47fc26e 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>   *   0  -   8M   <COMMON>
>   *
>   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>   *                    space
>   *
> @@ -132,6 +133,9 @@
>  
>  #define VMAP_VIRT_END    XENHEAP_VIRT_START
>  
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)

Can you put this up in the list of addresses in order please. 

> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START

> +
>  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>  
>  /* Number of domheap pagetable pages required at the second level (2MB mappings) */
> @@ -158,6 +162,8 @@
>  
>  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>  
> +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */
> +
>  #endif
>  
>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 755c7af..7e7743a 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -112,6 +112,13 @@ struct arch_domain
>          spinlock_t                  lock;
>      } vuart;
>  
> +    /* dirty-page tracing */
> +    struct {
> +        int second_lvl_start;            /* for context switch */
> +        int second_lvl_end;
> +        lpae_t *second_lvl;
> +    } dirty;
> +
>      struct dt_mem_info map_domain;
>      spinlock_t         map_lock;
>  }  __cacheline_aligned;
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 021e8d6..b9cbcf2 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -344,6 +344,28 @@ static inline void put_page_and_type(struct page_info *page)
>  
>  void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end);
>  
> +/* vlpt (virtual linear page table) related functions */
> +void prepare_vlpt(struct domain *d);
> +void cleanup_vlpt(struct domain *d);
> +void restore_vlpt(struct domain *d);
> +
> +/* calculate xen's virtual address for accessing the leaf PTE of
> + * a given guest physical address (GPA) */
> +static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr)

Argument is an IPA and should therefore be a paddr_t I think?

> +{
> +    unsigned long ipa_third = third_table_offset(addr);
> +    unsigned long ipa_second = second_table_offset(addr);
> +    unsigned long ipa_first = first_table_offset(addr);
> +
> +    /* guest p2m's first level index is the index of xen's second level,
> +       and guest p2m's second level index is the index of xen's third level */
> +    return (lpae_t *)(
> +        ( (second_linear_offset(VIRT_LIN_P2M_START) + ipa_first)
> +           << SECOND_SHIFT) +
> +        (ipa_second << THIRD_SHIFT) +
> +        (ipa_third << 3) );
> +}

Can we assign the table base to an "lpae_t *table" and then use
&table[ipa_third] rather than hardcoding << 3?

I can't quite shake the feeling that the "ipa_first<<SECOND_SHIFT" and
"ipa_second<<THIRD_SHIFT" elements of this could be achieved with a
single shift of the original address (and maybe a mask).
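
It can: since every 4K guest page has exactly one 8-byte leaf entry in
the linear table, those two terms plus the ipa_third one collapse to
pfn * sizeof(lpae_t), i.e. (addr >> 12) << 3 with the usual LPAE
shifts. Assuming a 2MB-aligned VIRT_LIN_P2M_START (so the
second_linear_offset() term is just the base address again), the whole
thing reduces to an array index (sketch, using paddr_t as suggested
above):

    /* (ipa_first << SECOND_SHIFT) + (ipa_second << THIRD_SHIFT) +
     * (ipa_third << 3) == ((addr >> PAGE_SHIFT) * sizeof(lpae_t)),
     * so the leaf PTE is a single array index off the linear base. */
    static inline lpae_t *get_vlpt_3lvl_pte(paddr_t addr)
    {
        lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
        return &table[addr >> PAGE_SHIFT];
    }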

> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:


