
Re: [Xen-devel] [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains



On Fri, May 13, 2016 at 9:09 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 13.05.16 at 16:50, <tamas@xxxxxxxxxxxxx> wrote:
>> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 12.05.16 at 17:25, <tamas@xxxxxxxxxxxxx> wrote:
>>>> +            if ( !rc )
>>>> +                mem_sharing_share_pages(d, bulk->start, sh, cd, 
>>>> bulk->start, ch);
>>>
>>> You shouldn't be ignoring errors here.
>>
>> The only error this function returns is if the sh/ch handles are
>> invalid. However we obtained those just now from successful
>> nominations, so we are guaranteed to have valid handles. This error
>> checking is only important when nominations/sharing happen
>> independently where the handle may go stale in-between. Here that is
>> not possible.
>
> You describe the state of things right now. What if someone
> adds an error path to that function without inspecting all callers,
> as it is clear that the function could have returned errors before.
> Ignoring errors is plain bad. If you're sure there can't be errors,
> at least add a respective ASSERT().
>

Sure, that sounds like a reasonable middle ground.
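
Something along these lines, then (untested sketch):

    if ( !rc )
    {
        /*
         * The handles were obtained from the nominations just above, so
         * sharing cannot fail with a stale-handle error here.
         */
        rc = mem_sharing_share_pages(d, bulk->start, sh, cd,
                                     bulk->start, ch);
        ASSERT(!rc);
    }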

>>>> +        }
>>>> +
>>>> +        ++(bulk->start);
>>>
>>> Pointless parentheses.
>>
>> Pointless but I prefer this style.
>
> But it's in contrast to what we do almost everywhere else ...
>

Alright, I'll drop the parentheses.

>>>> +        /* Check for continuation if it's not the last iteration. */
>>>> +        if ( bulk->start < max && hypercall_preempt_check() )
>>>
>>> The loop head has <=; why < here?
>>
>> Because we only do the preempt check if there is more than one page
>> left (as the comment states).
>
> No, the comment says "last iteration", whereas the check is for
> "next to last" (as the increment has already happened). And I
> also don't see when between any two iterations preemption is
> possible, just not between the second from last and the last one.
>

Ah I see, yes you are right.
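
So the continuation check should mirror the loop condition; roughly (sketch):

    ++bulk->start;

    /* Check for continuation while there are still pages to process. */
    if ( bulk->start <= max && hypercall_preempt_check() )
    {
        /* ... arrange for the continuation as before ... */
        break;
    }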

>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* We only propagate -ENOMEM so reset rc here */
>>>> +    if ( rc < 0 && rc != -ENOMEM )
>>>> +        rc = 0;
>>>
>>> What's the rationale for discarding all other errors? At least the
>>> patch description, but perhaps even the comment (which btw
>>> is lacking a full stop) should be explaining this.
>>
>> The reason we swallow errors here other than ENOMEM is that it's quite
>> possible that the max_gpfn page is unsharable, for example, in which
>> case rc would end up holding EINVAL. However, we don't care about the
>> success/failure of individual pages, only about the overall
>> state. For that, only ENOMEM is critical.
>
> And you think no possible caller would care about the hypercall
> reporting success yet not everything having got done that was
> requested? Sounds strange to me, but as said - at least a bold
> comment please.

The user has no way of knowing which pages will be sharable to begin
with, nor does he have any recourse when a page doesn't share. The
request is thus not to "share everything" but to share "as much as
possible". I'll state this explicitly in a comment.

>
>>>> @@ -1468,6 +1505,69 @@ int 
>>>> mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>          }
>>>>          break;
>>>>
>>>> +        case XENMEM_sharing_op_bulk_share:
>>>> +        {
>>>> +            unsigned long max_sgfn, max_cgfn;
>>>> +            struct domain *cd;
>>>> +
>>>> +            rc = -EINVAL;
>>>> +            if ( !mem_sharing_enabled(d) )
>>>> +                goto out;
>>>> +
>>>> +            rc = 
>>>> rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>>>> +                                                   &cd);
>>>> +            if ( rc )
>>>> +                goto out;
>>>> +
>>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>>>
>>> Either you pass XENMEM_sharing_op_share here, or you need to
>>> update xen/xsm/flask/policy/access_vectors (even if it's only a
>>> comment which needs updating).
>>
>> Right, it should actually be sharing_op_share here.
>>
>>>
>>> That said - are this and the similar pre-existing XSM checks actually
>>> correct? I.e. is one of the two domains here really controlling the
>>> other? I would have expected that a tool stack domain initiates the
>>> sharing between two domains it controls...
>>
>> Not sure what was the original rationale behind it either.
>
> Daniel - any opinion on this one?
>
>>>> +            if ( rc )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( !mem_sharing_enabled(cd) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>> +                 !atomic_read(&cd->pause_count) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>> +
>>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>>
>>> Why would the two domains need to agree in their maximum
>>> GPFN? There's nothing similar in this file so far. Nor does the
>>> right side of the || match anything pre-existing...
>>
>> The use-case for this function is to deduplicate identical VMs, not to
>> blindly share pages across arbitrary domains. So this is a safety
>> check to avoid accidentally running this function on domains that
>> obviously are not identical. The right-hand side is a safety check
>> against improperly initialized input structs where the start point
>> is obviously outside the memory of the domain.
>
> Is that use case the only possible one, or merely the one you care
> about? In the former case, I'd be okay as long as a respective
> comment got added.

I would say "blind" sharing like this only makes sense for identical
VMs. Replacing the memory of the VM with that of another one not in
the same state will lead to some spectacular crash for sure, so we
should do at least some sanity checking before.
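
I'll fold that into a comment, roughly:

    /*
     * Bulk sharing is only intended for deduplicating identical (cloned)
     * domains, so insist on matching maximum gpfns as a basic sanity
     * check, and reject a start gfn beyond the end of guest memory
     * (most likely an uninitialized input struct).
     */
    if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
    {
        rcu_unlock_domain(cd);
        rc = -EINVAL;
        goto out;
    }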

>
>>>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>>              uint64_aligned_t client_handle; /* IN: handle to the client 
>>>> page */
>>>>              domid_t  client_domain; /* IN: the client domain id */
>>>> -        } share;
>>>> +        } share;
>>>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>>>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 
>>>> for
>>>> +                                                full deduplication. Field 
>>>> is
>>>> +                                                used internally and may 
>>>> change
>>>> +                                                when the hypercall 
>>>> returns. */
>>>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>>>> +                                                that are shared after this
>>>> +                                                operation including pages
>>>> +                                                already shared before */
>>>> +            domid_t client_domain;           /* IN: the client domain id 
>>>> */
>>>> +        } bulk;
>>>
>>> Let's not repeat pre-existing mistakes: There is explicit padding
>>> missing here, which then also ought to be checked to be zero on
>>> input.
>>
>> This struct is part of a union and is smaller than the largest struct
>> in the union, even with padding. So how would padding have any effect
>> on anything?
>
> Being able to use the space for future extensions without having
> to bump the interface version. In domctl-s it's not as important
> due to them being separately versioned, but anyway.

I don't really follow; we are not growing the union here. This struct
is still smaller than the space available in the union, so what would
prevent us from later expanding this struct to the size of the
union without padding?
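
Just to make sure I understand the ask: is it explicit trailing padding,
checked to be zero on input, along these lines?

    struct mem_sharing_op_bulk {     /* OP_BULK_SHARE */
        uint64_aligned_t start;      /* IN: start gfn (0 for full dedup) */
        uint64_aligned_t shared;     /* OUT: number of gfns shared */
        domid_t client_domain;       /* IN: the client domain id */
        uint16_t _pad[3];            /* Must be set to 0 */
    } bulk;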

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
