
Re: [Xen-devel] [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains



On Wed, Jun 22, 2016 at 9:38 AM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
> On Sun, Jun 12, 2016 at 12:24 AM, Tamas K Lengyel <tamas@xxxxxxxxxxxxx> wrote:
>> Currently mem-sharing can be performed on a page-by-page basis from the
>> control domain. However, when completely deduplicating (cloning) a VM, this
>> requires at least 3 hypercalls per page. As the user has to loop through all
>> pages up to max_gpfn, this process is very slow and wasteful.
>>
>> This patch introduces a new mem_sharing memop for bulk deduplication where
>> the user doesn't have to separately nominate each page in both the source and
>> destination domain, and the looping over all pages happens in the hypervisor.
>> This significantly reduces the overhead of completely deduplicating entire
>> domains.
>>
>> Signed-off-by: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>> Acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
>> ---
>> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>
> I'm sorry I'm a bit late to this party -- I'm not sure how I managed
> to miss the earlier posts of this.  A couple of questions...
>
>
>> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>          }
>>          break;
>>
>> +        case XENMEM_sharing_op_bulk_share:
>> +        {
>> +            unsigned long max_sgfn, max_cgfn;
>> +            struct domain *cd;
>> +
>> +            rc = -EINVAL;
>> +            if ( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
>> +                goto out;
>> +
>> +            if ( !mem_sharing_enabled(d) )
>> +                goto out;
>> +
>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>> +                                                   &cd);
>> +            if ( rc )
>> +                goto out;
>> +
>> +            /*
>> +             * We reuse XENMEM_sharing_op_share XSM check here as this is
>> +             * essentially the same concept repeated over multiple pages.
>> +             */
>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd,
>> +                                    XENMEM_sharing_op_share);
>> +            if ( rc )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                goto out;
>> +            }
>> +
>> +            if ( !mem_sharing_enabled(cd) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            if ( !atomic_read(&d->pause_count) ||
>> +                 !atomic_read(&cd->pause_count) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>
> I realize that Jan asked for this, but I'm really not sure what good
> this is supposed to do, since the guest can be un-paused at any point
> halfway through the transaction.
>
> Wouldn't it make more sense to have this function pause and unpause
> the domains itself?

The domains are paused by the user when this op is called, so this is
just a sanity check ensuring the op isn't being issued on some other
domain. If this function paused the domains as well, all it would do
is increase the pause count. This is the only intended use-case for
the op at this time. It would make no sense to issue it on domains
that are running, as that pretty much guarantees that some of their
memory has already diverged, and bulk-sharing their memory would then
have unintended side-effects.
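
To illustrate, the intended flow from the toolstack side is roughly the
sketch below; xc_memshr_bulk_dedup() is just a placeholder name for a
wrapper around this memop, not an existing libxc call, while
xc_domain_pause()/xc_domain_unpause() are the usual libxc helpers:

#include <xenctrl.h>

/* Hypothetical wrapper around XENMEM_sharing_op_bulk_share; not in libxc. */
int xc_memshr_bulk_dedup(xc_interface *xch, uint32_t source_dom,
                         uint32_t client_dom);

static int clone_memory(xc_interface *xch, uint32_t source_dom,
                        uint32_t client_dom)
{
    int rc;

    /* Both domains are kept paused for the duration of the op. */
    xc_domain_pause(xch, source_dom);
    xc_domain_pause(xch, client_dom);

    rc = xc_memshr_bulk_dedup(xch, source_dom, client_dom);

    xc_domain_unpause(xch, client_dom);
    xc_domain_unpause(xch, source_dom);

    return rc;
}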

>
>> +
>> +            max_sgfn = domain_get_maximum_gpfn(d);
>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>> +
>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);
>> +            if ( rc > 0 )
>> +            {
>> +                if ( __copy_to_guest(arg, &mso, 1) )
>> +                    rc = -EFAULT;
>> +                else
>> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
>> +                                                       "lh", XENMEM_sharing_op,
>> +                                                       arg);
>> +            }
>> +            else
>> +            {
>> +                mso.u.bulk.start = 0;
>> +                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
>> +            }
>> +
>> +            rcu_unlock_domain(cd);
>> +        }
>> +        break;
>> +
>>          case XENMEM_sharing_op_debug_gfn:
>>          {
>>              unsigned long gfn = mso.u.debug.u.gfn;
>> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
>> index 29ec571..084f06e 100644
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>>  #define XENMEM_sharing_op_debug_gref        5
>>  #define XENMEM_sharing_op_add_physmap       6
>>  #define XENMEM_sharing_op_audit             7
>> +#define XENMEM_sharing_op_bulk_share        8
>>
>>  #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>>  #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
>> @@ -500,7 +501,19 @@ struct xen_mem_sharing_op {
>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>              domid_t  client_domain; /* IN: the client domain id */
>> -        } share;
>> +        } share;
>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>> +                                                full deduplication. Field is
>> +                                                reset to 0 when hypercall
>> +                                                completes */
>
> It took me a while to figure out that this value isn't actually
> intended to be used as described.  You're actually intended to always
> set this to zero, and the hypervisor just uses it for scratch space.

My original patch had no such field; it was added at the request of
other maintainers. So technically it is now possible to start
bulk-sharing at a gfn other than 0 and the op will work, it will just
result in less than full sharing. I certainly have no use-case for it,
but since it is possible to use it that way I simply documented that
unintended "feature".
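
For example, something along these lines would dedup everything from
gfn 0x1000 upwards only (a sketch; the outer op/domain members and the
domid variables are assumed here for illustration):

xen_mem_sharing_op_t mso = {
    .op     = XENMEM_sharing_op_bulk_share,
    .domain = source_domid,             /* source ("parent") domain */
    .u.bulk = {
        .start         = 0x1000,        /* skip gfns below 0x1000 */
        .client_domain = client_domid,  /* clone being populated */
    },
};
/* Issue XENMEM_sharing_op via the memory_op hypercall with &mso; pages
 * below gfn 0x1000 are left alone, so only a partial dedup results. */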

>
> To start with, it seems like having a "bulk share" option which could
> do just a specific range would be useful as well as a "bulk share"
> which automatically deduped the entire set of memory.

I have no need for such options.

>
> Secondly, structuring the information like the other memory operations
> -- for example, "nr_extents" and "nr_done" -- would make it more
> consistent, and would also allow you to avoid overwriting one of
> the "in" values and having to restore it when you were done.

I don't see any harm in clearing this value to 0 when the op finishes,
so I don't think it would make much difference to have another field
just for scratch-space use. I'm fine with adding that, it just seems a
bit odd to me.
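
If we did restructure it that way, I imagine it would look roughly like
the sketch below (field names are only illustrative):

struct mem_sharing_op_bulk {             /* OP_BULK_SHARE */
    uint64_aligned_t start;              /* IN: first gfn to process */
    uint64_aligned_t nr_extents;         /* IN: number of gfns to process */
    uint64_aligned_t nr_done;            /* OUT: gfns processed so far; doubles
                                            as the continuation cursor so no
                                            IN field gets overwritten */
    uint64_aligned_t shared;             /* OUT: number of shared pages */
    domid_t          client_domain;      /* IN: the client domain id */
} bulk;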

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
