
Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]



On 02.06.2020 13:23, Jürgen Groß wrote:
> On 02.06.20 13:12, Jan Beulich wrote:
>> On 02.06.2020 13:06, Jürgen Groß wrote:
>>> On 06.04.20 14:09, Jan Beulich wrote:
>>>> On 06.04.2020 13:54, Jürgen Groß wrote:
>>>>> On 06.04.20 13:11, Jan Beulich wrote:
>>>>>> On 06.04.2020 13:00, Ian Jackson wrote:
>>>>>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max 
>>>>>>> event channels dependant on vcpus"):
>>>>>>>> There is no correlation between event channels and vCPUs. The number
>>>>>>>> of event channels only depends on the number of frontends you have in
>>>>>>>> your guest. So...
>>>>>>>>
>>>>>>>> Hi Ian,
>>>>>>>>
>>>>>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>>>>>> If ARM folks want to have a different formula for the default then
>>>>>>>>> that is of course fine, but I wonder whether this might do ARM more
>>>>>>>>> harm than good in this case.
>>>>>>>>
>>>>>>>> ... 1023 event channels is going to be plenty for most of the use
>>>>>>>> cases.
>>>>>>>
>>>>>>> OK, thanks for the quick reply.
>>>>>>>
>>>>>>> So, Jürgen, I think everyone will be happy with this:
>>>>>>
>>>>>> I don't think I will be - my prior comment still stands: there are
>>>>>> no grounds for using a specific OS kernel's (and, to be precise, a
>>>>>> specific OS kernel version's) requirements to determine defaults.
>>>>>> If there were to be such a dependency, then the OS kernel [variant]
>>>>>> should be part of the inputs to such a (set of) formula(s).
>>>>>
>>>>> IMO this kind of striving for perfection will completely block having
>>>>> a sane heuristic for booting large guests at all.
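
(For context, the kind of libxl heuristic being debated here would look
roughly like the sketch below. The formula, the scaling factor and the
identifier names are illustrative assumptions, not the actual code from
the patch:)

    /* Hypothetical sketch of a vcpu-dependent default for max event
     * channels.  LIBXL_DEFAULT_EVTCHNS and the scaling factor are
     * made-up placeholders; the patch's real constants may differ. */
    #define LIBXL_DEFAULT_EVTCHNS 1023

    static unsigned int default_max_event_channels(unsigned int max_vcpus)
    {
        /* Per-vcpu consumers (timer, IPIs, per-queue frontends, ...)
         * roughly scale with the vcpu count, so grow the limit for
         * large guests while keeping the historic default as a floor. */
        unsigned int scaled = 4 * max_vcpus + LIBXL_DEFAULT_EVTCHNS;

        return scaled > LIBXL_DEFAULT_EVTCHNS ? scaled
                                              : LIBXL_DEFAULT_EVTCHNS;
    }
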
>>>>
>>>> This isn't about being perfect - I'm suggesting we leave the default
>>>> alone rather than improve the calculation, not least because I've
>>>> been implying ...
>>>>
>>>>> The patch isn't about finding as stringent an upper boundary as
>>>>> possible for huge guests, but a sane value allowing most of them to
>>>>> boot.
>>>>>
>>>>> And how should Xen know what the OS kernel needs exactly, after all?
>>>>
>>>> ... the answer of "It can't" to this question.
>>>>
>>>>> And it is not that we are talking about megabytes of additional
>>>>> memory. A guest with 256 vcpus will just be able to use an additional
>>>>> 36 memory pages. The maximum non-PV domain (probably the only
>>>>> relevant case of an OS other than Linux being used) with 128 vcpus
>>>>> would "waste" 32 kB - and only in case the guest misbehaves.
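
(Roughly, the Xen-side cost in question is the per-event-channel
bookkeeping, which Xen allocates in page-sized buckets only as ports
actually get bound - hence "in case the guest misbehaves". A
back-of-the-envelope sketch follows; the per-entry size is an
assumption, and the example limits are made up, so this will not
reproduce the exact figures quoted above:)

    /* Worst-case estimate of extra Xen pages when raising the event
     * channel limit, assuming every port up to the limit gets bound.
     * The 32-byte per-channel size is an assumed figure; the real
     * sizeof(struct evtchn) depends on the build (e.g. XSM), and the
     * actual limit formula differs, so the numbers are illustrative. */
    #include <stdio.h>

    #define PAGE_SIZE 4096u
    #define EVTCHN_SZ 32u                       /* assumption */
    #define PER_PAGE  (PAGE_SIZE / EVTCHN_SZ)

    static unsigned int pages_for(unsigned int nr_evtchns)
    {
        return (nr_evtchns + PER_PAGE - 1) / PER_PAGE;
    }

    int main(void)
    {
        unsigned int old_limit = 1023, new_limit = 2303; /* example values */

        printf("extra pages: %u\n",
               pages_for(new_limit) - pages_for(old_limit));
        return 0;
    }
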
>>>>
>>>> Any extra page counts, or else - where do you draw the line? Any
>>>> single page may decide between Xen (not) being out of memory,
>>>> and hence also not being able to fulfill certain other requests.
>>>>
>>>>> The alternative would be to do nothing and have to let the user
>>>>> experience a somewhat cryptic guest crash. He could google for a
>>>>> possible solution, which would probably end in a rather high static
>>>>> limit, wasting even more memory.
>>>>
>>>> I realize this. Otoh more people running into this will improve
>>>> the chances of later ones finding useful suggestions. Of course
>>>> there's also nothing wrong with trying to make the error less
>>>> cryptic.
>>>
>>> Reviving this discussion.
>>>
>>> I strongly disagree with your reasoning.
>>>
>>> Refusing to modify the tools' defaults so that large guests can boot
>>> is a bad move IMO. We are driving more people away from Xen this way.
>>>
>>> The fear that a misbehaving guest of that size might use a few
>>> additional pages on a machine with at least 100 cpus is fine from an
>>> academic point of view, but it should not be weighed higher than the
>>> usability aspect in this case IMO.
>>
>> Very simple question then: Where do you draw the boundary if you don't
>> want this to be a pure "is permitted" or "is not permitted" underlying
>> rule? If we had a model where _all_ resources consumed by a guest were
>> accounted against its toolstack-requested allocation, things would be
>> easier.
> 
> I'd say it should be allowed in case the additional resource use is much
> smaller than the implicit resources already used for such a guest (e.g.
> less than an additional 1% of implicitly used memory).
> 
> In cases like this, where a very small subset of guests is affected and
> the additional need for resources applies only in very extreme cases
> (I'm considering this case extreme, as only misbehaving non-Linux guests
> with huge numbers of vcpus will need additional resources), I'd even
> accept higher margins like 5%.
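
(Expressed as a rule of thumb, the criterion Jürgen proposes amounts to
something like the check below; the function name and parameters are
made up for illustration:)

    /* Hypothetical acceptance test for a default-raising change:
     * tolerate the worst-case extra allocation if it stays within a
     * small margin (1%..5%) of the memory the guest already consumes
     * implicitly. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool extra_use_acceptable(uint64_t implicit_bytes,
                                     uint64_t worst_case_extra_bytes,
                                     unsigned int margin_percent)
    {
        return worst_case_extra_bytes * 100 <= implicit_bytes * margin_percent;
    }
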

IOW if we had 20 such cases, doubling resource consumption would be
okay to you? Not to me...

FAOD: If I'm the (almost?) only one to object here, I'll be okay with
being outvoted. But I'd like the people agreeing with the change to
explicitly ack that they're fine with the unwanted (as I'd call it)
side effects.

Jan
