
Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling



On 18/09/2020 21:03, Jeff Kubascik wrote:
> On 9/17/2020 1:30 PM, Dario Faggioli wrote:
>> On Thu, 2020-09-17 at 15:59 +0000, Stewart Hildebrand wrote:
>>> On Thursday, September 17, 2020 11:20 AM, Dario Faggioli wrote:
>>>> On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
>>>>>> It might be worth considering using just the core scheduling
>>>>>> framework in order to achieve this. Using a sched_granularity
>>>>>> with the number of cpus in the cpupool running the ARINC653
>>>>>> scheduler should already do the trick. There should be no
>>>>>> further modification of the ARINC653 scheduler required.
>>>>>>
>>>>> This CAST-32A multicore patch series allows you to have a different
>>>>> number of vCPUs (UNITs, I guess) assigned to domUs.
>>>>>
>>>> And if you have domain A with 1 vCPU and domain B with 2 vCPUs, with
>>>> sched-gran=core:
>>>> - when the vCPU of domain A is scheduled on a pCPU of a core, no vCPU
>>>>   from domain B can be scheduled on the same core;
>>>> - when one of the vCPUs of domain B is scheduled on a pCPU of a core,
>>>>   no other vCPU, except the other vCPU of domain B, can run on the
>>>>   same core.
>>> Fascinating. Very cool, thanks for the insight. My understanding is
>>> that core scheduling is not currently enabled on arm. This series
>>> allows us to have multicore ARINC 653 on arm today without chasing
>>> down potential issues with core scheduling on arm...
>>>
>> Yeah, but at the cost of quite a bit of churn, and of a lot more code
>> in arinc653.c, basically duplicating the functionality.
>>
>> I appreciate how crude and inaccurate this is, but arinc653.c is
>> currently 740 LOCs, and this patch is adding 601 and removing 204.
>>
>> Add to this the fact that the architecture specific part of core-
>> scheduling should be limited to the handling of the context switches
>> (and that it may even work already, as what we weren't able to do was
>> proper testing).
>>
>> If I can cite an anecdote, back in the days when core-scheduling was
>> being developed, I sent my own series implementing it for both credit1
>> and credit2. It had its issues, of course, but I think it had some
>> merits, even compared with the current implementation we have
>> upstream (e.g., more flexibility, as core-scheduling could have been
>> enabled on a per-domain basis).
>>
>> At least for me, a very big plus of the other approach that Juergen
>> suggested and then also implemented was the fact that we would get the
>> feature for all the schedulers at once. And this (i.e., the fact that
>> it probably can be used for this purpose as well, without major changes
>> necessary inside ARINC653) seems to me to be further confirmation
>> that it was the right way forward.
>>
>> And don't think only of the need to write the code (as you kind of
>> have it already), but also of testing. As in, the vast majority of the
>> core-scheduling logic and code is scheduler independent, and hence has
>> been stressed and tested already, even by people using schedulers
>> other than ARINC.
> When is core scheduling expected to be available for ARM platforms? My
> understanding is that this only works for Intel.

x86, but the real answer is "any architecture which has some knowledge
of the thread/core/socket topology of its CPUs".  ARM currently lacks
this information.

The actual implementation is totally architecture agnostic, and
implemented within the scheduler interface itself.

> With core scheduling, is the pinning of vCPUs to pCPUs configurable? Or
> can the scheduler change it at will? One advantage of this patch is that
> you can explicitly pin a vCPU to a pCPU. This is a desirable feature for
> systems where you are looking for determinism.

All schedulers, including ARINC, now operate on sched_item's (previously
vCPUs) and sched_resource's (previously pCPUs), by virtue of the change
to the common scheduler interface.

In the default case (== thread scheduling), there is a granularity of 1:
a single sched_item maps to a single vCPU, and a single sched_resource
maps to a single pCPU.

For the "core" case, we interrogate hardware to see how many threads per
core there are.  There may be 1 (no SMT), 2 (common case) or 4 (Knights
Landing/Corner HPC systems).  This becomes the granularity under the hood.

Therefore, a single sched_resource maps to a single core, which is 1, 2,
or 4 threads.  A single sched_item maps to the same number of vCPUs
(currently taken sequentially from the domain's vcpu list, with any gaps
filled by the idle vcpus).
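
To make that concrete, here is a small standalone sketch (illustrative
only -- the names are made up and this is not Xen's actual code) of the
sequential grouping and idle-vCPU padding for a granularity of 2:

/*
 * Standalone sketch, not Xen code: groups a domain's vCPUs sequentially
 * into sched_items of size "granularity", padding the last group with
 * idle vCPUs.  Build with e.g.: gcc -Wall -o gran gran.c
 */
#include <stdio.h>

static void build_sched_items(unsigned int nr_vcpus, unsigned int granularity)
{
    unsigned int item = 0;

    for ( unsigned int v = 0; v < nr_vcpus; v += granularity, item++ )
    {
        printf("sched_item %u:", item);
        for ( unsigned int i = 0; i < granularity; i++ )
        {
            if ( v + i < nr_vcpus )
                printf(" vCPU%u", v + i);
            else
                printf(" idle");      /* gap filled by an idle vCPU */
        }
        printf("\n");
    }
}

int main(void)
{
    /* e.g. a 3-vCPU domain on SMT-2 hardware => granularity of 2. */
    build_sched_items(3, 2);
    return 0;
}

For that 3-vCPU example it prints "sched_item 0: vCPU0 vCPU1" and
"sched_item 1: vCPU2 idle", which is the gap-filling described above.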

The scheduling decision of when and where a sched_item runs is still up
to the scheduler.  Pinning is also now expressed at the sched_item =>
sched_resource level.
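
As a rough sketch of what "pinning at the sched_item => sched_resource
level" means (again with made-up names, not the real interface): a hard
affinity mask lives on the item and is checked against candidate
resources, along these lines:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch only: illustrative types, not Xen's actual scheduler interface. */
struct sched_item {
    unsigned int id;
    uint64_t hard_affinity;   /* bit n set => sched_resource n is allowed */
};

static bool item_may_run_on(const struct sched_item *item,
                            unsigned int resource_id)
{
    return item->hard_affinity & (UINT64_C(1) << resource_id);
}

int main(void)
{
    /* An item pinned to sched_resources 0 and 2 (whole cores, not threads). */
    struct sched_item item = { .id = 0, .hard_affinity = 0x5 };

    printf("resource 1 allowed: %d\n", item_may_run_on(&item, 1));  /* 0 */
    printf("resource 2 allowed: %d\n", item_may_run_on(&item, 2));  /* 1 */
    return 0;
}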

~Andrew



 

