
Re: [PATCH v2 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle




> On Mar 19, 2020, at 12:11 AM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
> 
> There have been reports of stalls of guest vCPUs when Credit2 was used.
> It seemed like these vCPUs were not getting scheduled for a very long
> time, even under light load conditions (e.g., during dom0 boot).
> 
> Investigations led to the discovery that, although rarely, it can
> happen that a vCPU manages to run for very long timeslices. In Credit2,
> this means that, when runtime accounting happens, the vCPU will lose a
> large quantity of credits. This in turn may lead to the vCPU having fewer
> credits than the idle vCPUs (-2^30). At this point, the scheduler will
> pick the idle vCPU, instead of the ready-to-run vCPU, for a few
> "epochs", which is often enough for the guest kernel to conclude that
> the vCPU is not responding and to crash.
> 
> An example of this situation is shown here. In fact, we can see d0v1
> sitting in the runqueue while all the CPUs are idle, as it has
> -1254238270 credits, which is smaller than -2^30 = -1073741824:
> 
>    (XEN) Runqueue 0:
>    (XEN)   ncpus              = 28
>    (XEN)   cpus               = 0-27
>    (XEN)   max_weight         = 256
>    (XEN)   pick_bias          = 22
>    (XEN)   instload           = 1
>    (XEN)   aveload            = 293391 (~111%)
>    (XEN)   idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>    (XEN)   tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
>    (XEN)   fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>    [...]
>    (XEN) Runqueue 0:
>    (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
>    (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
>    [...]
>    (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
>    (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
>    (XEN) RUNQ:
>    (XEN)     0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)
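
To make the comparison in the dump concrete, here is a minimal sketch. It is not the actual Credit2 code: it assumes the scheduler simply prefers whichever candidate has more credits, and the names `IDLE_CREDIT` and `pick_is_idle` are hypothetical. Only the -2^30 value and the credit figure for d0v1 come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Credits of the idle vCPU as stated in the commit message: -2^30. */
#define IDLE_CREDIT (-(INT32_C(1) << 30))

/*
 * Hypothetical helper: returns true if the idle vCPU would win a pure
 * "more credits first" comparison against a runnable vCPU.
 */
static bool pick_is_idle(int32_t runnable_credit)
{
    return runnable_credit < IDLE_CREDIT;
}
```

Under that assumption, `pick_is_idle(-1254238270)` is true, which matches d0v1 sitting in the runqueue while every CPU idles.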
> 
> We certainly don't want this to happen under any circumstances.
> Let's, therefore, define a minimum amount of credits a vCPU can have.
> During accounting, we make sure that, for however long the vCPU has
> run, it never ends up with fewer than this minimum amount of
> credits. Then, we set the credits of the idle vCPU to an even
> smaller value.
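
The approach described here could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the constant names, their values, and `burn_credits` are all hypothetical. The description only requires that there is a floor for vCPU credits and that the idle vCPU sits below it. The sketch also assumes `consumed` is non-negative.

```c
#include <stdint.h>

/* Hypothetical values: all that matters is IDLE < MIN < 0. */
#define SKETCH_CREDIT_MIN  (-(INT32_C(1) << 29))
#define SKETCH_CREDIT_IDLE (-(INT32_C(1) << 30))

/*
 * Charge a vCPU for the time it ran, but never let its credits drop
 * below the floor, however long the timeslice was.
 * Assumes consumed >= 0; the subtraction is done in 64 bits so that a
 * huge charge cannot wrap around.
 */
static int32_t burn_credits(int32_t credit, int64_t consumed)
{
    int64_t left = (int64_t)credit - consumed;

    return left < SKETCH_CREDIT_MIN ? SKETCH_CREDIT_MIN : (int32_t)left;
}
```

With the floor in place, even an absurdly long run leaves the vCPU with more credits than the idle vCPU, so a runnable vCPU is always preferred over idle.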
> 
> NOTE: investigations have been done about _how_ it is possible for a
> vCPU to execute for so much time that its credits become so low. While
> still not completely clear, there is evidence that:
> - it only happens very rarely,
> - it appears to be both machine and workload specific,
> - it does not look to be a Credit2 issue (e.g., it happens when
>  running with Credit1 as well), or a scheduler issue.
> 
> This patch makes Credit2 more robust to events like this, whatever
> the cause is, and should hence be backported (as far as possible).
> 
> Reported-by: Glen <glenbarney@xxxxxxxxx>
> Reported-by: Tomas Mozes <hydrapolic@xxxxxxxxx>
> Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxx>


 

