[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] xenstore: add support for reading directory with many children



On 25/10/16 16:02, Jan Beulich wrote:
>>>> On 25.10.16 at 15:47, <JGross@xxxxxxxx> wrote:
>> On 25/10/16 15:20, Jan Beulich wrote:
>>>>>> On 25.10.16 at 13:41, <JGross@xxxxxxxx> wrote:
>>>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>>>> On 25.10.16 at 07:52, <JGross@xxxxxxxx> wrote:
>>>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>>>  
>>>>>>          /* List of changed domains - to record the changed domain entry 
>>>>>> number */
>>>>>>          struct list_head changed_domains;
>>>>>> +
>>>>>> +        /* Temporary buffer for XS_DIRECTORY_PART. */
>>>>>> +        char *dirpart_buf;
>>>>>> +        unsigned buf_off;
>>>>>> +        unsigned buf_len;
>>>>>>  };
>>>>>
>>>>> The buffer you allocate for this can - by nature of this change - be
>>>>> huge, and there's one per transaction. Isn't this causing a resource
>>>>> utilization issue?
>>>>
>>>> It will be allocated only when needed and its size is less than that of
>>>> the node to be read.
>>>
>>> That's not addressing my concern. A DomU could have multiple
>>> transactions each with such an operation in progress. And the
>>> space here doesn't get accounted against any of the quota.
>>
>> It is accounted against the maximum number of transactions.
>>
>> Additionally the buffer will be kept only if not all of the data
>> could be transferred in the first iteration of the read.
> 
> Which nevertheless means an ill behaved guest could (if its quota
> allow) create a node with sufficiently many children and then start
> as many transactions as it can, and do a partial directory listing
> over each of them. Perhaps, considering that you mainly need this
> for Dom0, the new verb could be restricted to the control domain
> for now?

What about driver domains? They might need it, too.

Regarding memory usage: current limits for a guest if no quota have been
specified are:

- 1000 nodes
- max. entry size of 2048 bytes
- 10 transactions

So each node will need max. 4kB (2048 bytes entry, 2048 bytes name),
total node memory will be 4MB for this guest.

Lets assume the domU puts all those nodes in just one directory and
tries to read it in all possible 10 transactions. For each
transaction we'll need:

- 2MB temp buffer (1000 nodes * 2048 bytes name)
- at least 4MB data base copy (each transaction needs a complete
  copy of the data base, which is one of the main reasons I want to
  rewrite the transaction handling)

So an ill-behaved domU can force xenstored to use at least 64MB of
memory. The (minimal) memory amount for n ill-behaved domUs is:

n * (24 MB + n * 40MB)

n=1:  64MB
n=2: 208MB
n=3: 432MB
n=4: 736MB

Without the XS_DIRECTORY_PART support the numbers are rather similar:

n * n * 40MB total memory, so e.g. 640MB for 4 evil domUs.

> 
>>>>> And then - why did you put this function here instead of in
>>>>> xenstored_core.c, e.g. next to send_directory()?
>>>>
>>>> It is using the xenstored_transaction.c private struct transaction.
>>>> Putting it in xenstored_core.c would have required to move the
>>>> struct definition into a header file.
>>>
>>> Yeah, I've realized this after having sent the reply, but then again
>>> this also relates to the first question above (whether this should be
>>> a per-transaction item in the first place). In particular I don't think
>>> this being per transaction helps with allowing the guest to (easily)
>>> run multiple such queries in parallel, as commonly simple operations
>>> get run with null transactions.
>>
>> The XS_DIRECTORY_PART command is valid in a transaction only. This is
>> a prerequisite to be able to split the operation into multiple pieces
>> while keeping the consistency.
> 
> But I'm sure other models could be come up with, ideally without
> requiring meaningful amounts of temporary storage. For instance
> nodes could be versioned, and if the version changed between
> iterations, the caller could be signaled to start over. And it looks
> like such a version wouldn't even need to be bumped for child
> additions, but only for child removals.

In case nobody objects to that idea I'll look into it. The node version
might even help for better transaction support.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.