
Re: [Xen-devel] [PATCH v8] x86/mem-sharing: mem-sharing a range of memory



On 26/07/2016 16:49, George Dunlap wrote:
> On Tue, Jul 26, 2016 at 4:22 PM, Tamas K Lengyel
> <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>> On Tue, Jul 26, 2016 at 3:12 AM, George Dunlap
>>> <george.dunlap@xxxxxxxxxx> wrote:
>>> On Wed, Jul 20, 2016 at 7:01 PM, Tamas K Lengyel
>>> <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>> Currently mem-sharing can be performed on a page-by-page basis from the
>>>> control domain. However, this process is quite wasteful when a range of
>>>> pages has to be deduplicated.
>>>>
>>>> This patch introduces a new mem_sharing memop for range sharing where
>>>> the user doesn't have to separately nominate each page in both the source
>>>> and destination domain, and the looping over all pages happens in the
>>>> hypervisor. This significantly reduces the overhead of sharing a range
>>>> of memory.
>>>>
>>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@xxxxxxxxxxxx>
>>>> Acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
>>>> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> ---
>>>> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>>>> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>
>>>> v8: style fixes and minor adjustments
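>>>>
>>>> For illustration, a minimal caller of the new libxc wrapper could look
>>>> like the sketch below (the domain IDs and gfn range are made up, error
>>>> handling is elided, and both domains must be paused with sharing enabled):
>>>>
>>>>     #include <stdio.h>
>>>>     #include <xenctrl.h>
>>>>
>>>>     xc_interface *xch = xc_interface_open(NULL, NULL, 0);
>>>>     domid_t source = 1, client = 2;
>>>>
>>>>     /* Deduplicate gfns 0x0 through 0x1000 (inclusive) in both domains. */
>>>>     int rc = xc_memshr_range_share(xch, source, client, 0x0, 0x1000);
>>>>     if ( rc < 0 )
>>>>         perror("xc_memshr_range_share");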
>>>> ---
>>>>  tools/libxc/include/xenctrl.h        |  15 ++++
>>>>  tools/libxc/xc_memshr.c              |  19 +++++
>>>>  tools/tests/mem-sharing/memshrtool.c |  22 ++++++
>>>>  xen/arch/x86/mm/mem_sharing.c        | 140 +++++++++++++++++++++++++++++++++++
>>>>  xen/include/public/memory.h          |  10 ++-
>>>>  5 files changed, 205 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>>>> index e904bd5..3782eff 100644
>>>> --- a/tools/libxc/include/xenctrl.h
>>>> +++ b/tools/libxc/include/xenctrl.h
>>>> @@ -2334,6 +2334,21 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
>>>>                      domid_t client_domain,
>>>>                      unsigned long client_gfn);
>>>>
>>>> +/* Allows deduplicating a range of memory of a client domain. Using
>>>> + * this function is equivalent to calling xc_memshr_nominate_gfn for
>>>> + * each gfn in the two domains followed by xc_memshr_share_gfns.
>>>> + *
>>>> + * May fail with -EINVAL if the source and client domain have different
>>>> + * memory sizes or if memory sharing is not enabled on either of the
>>>> + * domains. May also fail with -ENOMEM if there isn't enough memory
>>>> + * available to store the sharing metadata before deduplication can
>>>> + * happen.
>>>> + */
>>>> +int xc_memshr_range_share(xc_interface *xch,
>>>> +                          domid_t source_domain,
>>>> +                          domid_t client_domain,
>>>> +                          uint64_t start,
>>>> +                          uint64_t end);
>>>> +
>>>>  /* Debug calls: return the number of pages referencing the shared frame backing
>>>>   * the input argument. Should be one or greater.
>>>>   *
>>>> diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
>>>> index deb0aa4..2b871c7 100644
>>>> --- a/tools/libxc/xc_memshr.c
>>>> +++ b/tools/libxc/xc_memshr.c
>>>> @@ -181,6 +181,25 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
>>>>      return xc_memshr_memop(xch, source_domain, &mso);
>>>>  }
>>>>
>>>> +int xc_memshr_range_share(xc_interface *xch,
>>>> +                          domid_t source_domain,
>>>> +                          domid_t client_domain,
>>>> +                          uint64_t start,
>>>> +                          uint64_t end)
>>>> +{
>>>> +    xen_mem_sharing_op_t mso;
>>>> +
>>>> +    memset(&mso, 0, sizeof(mso));
>>>> +
>>>> +    mso.op = XENMEM_sharing_op_range_share;
>>>> +
>>>> +    mso.u.range.client_domain = client_domain;
>>>> +    mso.u.range.start = start;
>>>> +    mso.u.range.end = end;
>>>> +
>>>> +    return xc_memshr_memop(xch, source_domain, &mso);
>>>> +}
>>>> +
>>>>  int xc_memshr_domain_resume(xc_interface *xch,
>>>>                              domid_t domid)
>>>>  {
>>>> diff --git a/tools/tests/mem-sharing/memshrtool.c b/tools/tests/mem-sharing/memshrtool.c
>>>> index 437c7c9..2af6a9e 100644
>>>> --- a/tools/tests/mem-sharing/memshrtool.c
>>>> +++ b/tools/tests/mem-sharing/memshrtool.c
>>>> @@ -24,6 +24,8 @@ static int usage(const char* prog)
>>>>      printf("  nominate <domid> <gfn>  - Nominate a page for sharing.\n");
>>>>      printf("  share <domid> <gfn> <handle> <source> <source-gfn> 
>>>> <source-handle>\n");
>>>>      printf("                          - Share two pages.\n");
>>>> +    printf("  range <source-domid> <destination-domid> <start-gfn> 
>>>> <end-gfn>\n");
>>>> +    printf("                          - Share pages between domains in 
>>>> range.\n");
>>>>      printf("  unshare <domid> <gfn>   - Unshare a page by grabbing a 
>>>> writable map.\n");
>>>>      printf("  add-to-physmap <domid> <gfn> <source> <source-gfn> 
>>>> <source-handle>\n");
>>>>      printf("                          - Populate a page in a domain with 
>>>> a shared page.\n");
>>>> @@ -180,6 +182,26 @@ int main(int argc, const char** argv)
>>>>          }
>>>>          printf("Audit returned %d errors.\n", rc);
>>>>      }
>>>> +    else if( !strcasecmp(cmd, "range") )
>>>> +    {
>>>> +        domid_t sdomid, cdomid;
>>>> +        int rc;
>>>> +        uint64_t start, end;
>>>> +
>>>> +        if ( argc != 6 )
>>>> +            return usage(argv[0]);
>>>> +
>>>> +        sdomid = strtol(argv[2], NULL, 0);
>>>> +        cdomid = strtol(argv[3], NULL, 0);
>>>> +        start = strtoul(argv[4], NULL, 0);
>>>> +        end = strtoul(argv[5], NULL, 0);
>>>> +
>>>> +        rc = xc_memshr_range_share(xch, sdomid, cdomid, start, end);
>>>> +        if ( rc < 0 )
>>>> +        {
>>>> +            printf("error executing xc_memshr_range_share: %s\n", 
>>>> strerror(errno));
>>>> +            return rc;
>>>> +        }
>>>> +    }
>>>>      return 0;
>>>>  }
>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>> index a522423..329fbd9 100644
>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>> @@ -1294,6 +1294,58 @@ int relinquish_shared_pages(struct domain *d)
>>>>      return rc;
>>>>  }
>>>>
>>>> +static int range_share(struct domain *d, struct domain *cd,
>>>> +                       struct mem_sharing_op_range *range)
>>>> +{
>>>> +    int rc = 0;
>>>> +    shr_handle_t sh, ch;
>>>> +    unsigned long start = range->_scratchspace ?: range->start;
>>>> +
>>>> +    while( range->end >= start )
>>>> +    {
>>>> +        /*
>>>> +         * We only break out if we run out of memory as individual pages may
>>>> +         * legitimately be unsharable and we just want to skip over those.
>>>> +         */
>>>> +        rc = mem_sharing_nominate_page(d, start, 0, &sh);
>>>> +        if ( rc == -ENOMEM )
>>>> +            break;
>>>> +
>>>> +        if ( !rc )
>>>> +        {
>>>> +            rc = mem_sharing_nominate_page(cd, start, 0, &ch);
>>>> +            if ( rc == -ENOMEM )
>>>> +                break;
>>>> +
>>>> +            if ( !rc )
>>>> +            {
>>>> +                /* If we get here this should be guaranteed to succeed. */
>>>> +                rc = mem_sharing_share_pages(d, start, sh,
>>>> +                                             cd, start, ch);
>>>> +                ASSERT(!rc);
>>>> +            }
>>>> +        }
>>>> +
>>>> +        /* Check for continuation if it's not the last iteration. */
>>>> +        if ( range->end >= ++start && hypercall_preempt_check() )
>>>> +        {
>>>> +            rc = 1;
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    range->_scratchspace = start;
>>>> +
>>>> +    /*
>>>> +     * The last page may fail with -EINVAL, and for range sharing we don't
>>>> +     * care about that.
>>>> +     */
>>>> +    if ( range->end < start && rc == -EINVAL )
>>>> +        rc = 0;
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>>  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>  {
>>>>      int rc;
>>>> @@ -1468,6 +1520,94 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>          }
>>>>          break;
>>>>
>>>> +        case XENMEM_sharing_op_range_share:
>>>> +        {
>>>> +            unsigned long max_sgfn, max_cgfn;
>>>> +            struct domain *cd;
>>>> +
>>>> +            rc = -EINVAL;
>>>> +            if ( mso.u.range._pad[0] || mso.u.range._pad[1] ||
>>>> +                 mso.u.range._pad[2] )
>>>> +                 goto out;
>>>> +
>>>> +            /*
>>>> +             * We use _scratchspace for the hypercall continuation value.
>>>> +             * Ideally the user sets this to 0 in the beginning but
>>>> +             * there is no good way of enforcing that here, so we just check
>>>> +             * that it's at least in range.
>>>> +             */
>>>> +            if ( mso.u.range._scratchspace &&
>>>> +                 (mso.u.range._scratchspace < mso.u.range.start ||
>>>> +                  mso.u.range._scratchspace > mso.u.range.end) )
>>>> +                goto out;
>>>> +
>>>> +            if ( !mem_sharing_enabled(d) )
>>>> +                goto out;
>>>> +
>>>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.range.client_domain,
>>>> +                                                   &cd);
>>>> +            if ( rc )
>>>> +                goto out;
>>>> +
>>>> +            /*
>>>> +             * We reuse the XENMEM_sharing_op_share XSM check here as this is
>>>> +             * essentially the same concept repeated over multiple pages.
>>>> +             */
>>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd,
>>>> +                                    XENMEM_sharing_op_share);
>>>> +            if ( rc )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( !mem_sharing_enabled(cd) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            /*
>>>> +             * Sanity check only, the client should keep the domains paused
>>>> +             * for the duration of this op.
>>>> +             */
>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>> +                 !atomic_read(&cd->pause_count) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>> +
>>>> +            if ( max_sgfn < mso.u.range.start || max_sgfn < mso.u.range.end ||
>>>> +                 max_cgfn < mso.u.range.start || max_cgfn < mso.u.range.end )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            rc = range_share(d, cd, &mso.u.range);
>>>> +            rcu_unlock_domain(cd);
>>>> +
>>>> +            if ( rc > 0 )
>>>> +            {
>>>> +                if ( __copy_to_guest(arg, &mso, 1) )
>>>> +                    rc = -EFAULT;
>>>> +                else
>>>> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
>>>> +                                                       "lh", XENMEM_sharing_op,
>>>> +                                                       arg);
>>>> +            }
>>>> +            else
>>>> +                mso.u.range._scratchspace = 0;
>>>> +        }
>>>> +        break;
>>>> +
>>>>          case XENMEM_sharing_op_debug_gfn:
>>>>          {
>>>>              unsigned long gfn = mso.u.debug.u.gfn;
>>>> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
>>>> index 29ec571..e0bc018 100644
>>>> --- a/xen/include/public/memory.h
>>>> +++ b/xen/include/public/memory.h
>>>> @@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>>>>  #define XENMEM_sharing_op_debug_gref        5
>>>>  #define XENMEM_sharing_op_add_physmap       6
>>>>  #define XENMEM_sharing_op_audit             7
>>>> +#define XENMEM_sharing_op_range_share       8
>>>>
>>>>  #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>>>>  #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
>>>> @@ -500,7 +501,14 @@ struct xen_mem_sharing_op {
>>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>>>              domid_t  client_domain; /* IN: the client domain id */
>>>> -        } share;
>>>> +        } share;
>>>> +        struct mem_sharing_op_range {         /* OP_RANGE_SHARE */
>>>> +            uint64_aligned_t start;          /* IN: start gfn. */
>>>> +            uint64_aligned_t end;            /* IN: end gfn (inclusive) */
>>>> +            uint64_aligned_t _scratchspace;  /* Must be set to 0 */
>>> Tamas,
>>>
>>> Why include this "scratchspace" that's not used in the interface,
>>> rather than just doing what the memory operations in
>>> xen/common/memory.c do, and storing it in EAX by shifting it over by
>>> MEMOP_EXTENT_SHIFT?  I looked through the history and I see that v1
>>> did something very much like that, but it was changed to using the
>>> scratch space without any explanation.
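>>>
>>> For reference, an untested sketch of the encoding I mean, following the
>>> pattern do_memory_op() already uses for start_extent (`start' standing
>>> in for whatever gfn the loop stopped at):
>>>
>>>     /* On preemption, stash the next gfn in the cmd value itself... */
>>>     rc = hypercall_create_continuation(__HYPERVISOR_memory_op, "lh",
>>>              XENMEM_sharing_op | (start << MEMOP_EXTENT_SHIFT), arg);
>>>
>>>     /* ...and recover it on re-entry, the way do_memory_op() does: */
>>>     start = cmd >> MEMOP_EXTENT_SHIFT;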
>>>
>>> Having this in the public interface is ugly; and what's worse, it
>>> exposes some of the internal mechanism to the guest.
>>>
>>>  -George
>> Well, that's exactly what I did in an earlier version of the patch but
>> it was requested that I change it to something like this by Andrew
>> (see https://lists.xen.org/archives/html/xen-devel/2015-10/msg00434.html).
>> Then over the various iterations it ended up looking like this.
> Oh right -- sorry, I did look but somehow I missed that Andrew had
> requested it.
>
> I would have read his comment to mean to put the _scratchspace
> variable in the larger structure.  But it has his R-b, so I'll
> consider myself answered.

I am open to other suggestions wrt naming, but wasn't looking to bikeshed
the issue.

_opaque is a common name used.
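
Applied to this patch, the hunk would then read something like (a sketch
only; the field rename is the point):

    struct mem_sharing_op_range {         /* OP_RANGE_SHARE */
        uint64_aligned_t start;           /* IN: start gfn. */
        uint64_aligned_t end;             /* IN: end gfn (inclusive) */
        uint64_aligned_t opaque;          /* Must be set to 0 */
    } range;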

~Andrew
