
Re: Recent upgrade of 4.13 -> 4.14 issue



On 26.10.20 17:31, Dario Faggioli wrote:
On Mon, 2020-10-26 at 15:30 +0100, Jürgen Groß wrote:
On 26.10.20 14:54, Andrew Cooper wrote:
On 26/10/2020 13:37, Frédéric Pierret wrote:

If anyone has any idea of what's going on, that would be much
appreciated. Thank you.

Does booting Xen with `sched=credit` make a difference?

Hmm, I think I have spotted a problem in credit2 which could explain
the hang:

csched2_unit_wake() will NOT put the sched unit on a runqueue in case
it has CSFLAG_scheduled set. This bit will be reset only in
csched2_context_saved().

Exactly, it does not put it back there. However, if it finds a vCPU
with the CSFLAG_scheduled flag set, it should set the
CSFLAG_delayed_runq_add flag.

Unless curr_on_cpu(cpu)==unit or unit_on_runq(svc)==true... which
should not be the case. Or were you saying that we actually are in one
of these situations?
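
To make the wake-up decision above concrete, here is a rough sketch. It
is a simplified stand-alone model, not the actual credit2 source: the
struct, the flag values and the model_* names are all made up for
illustration, and only the ordering of the checks follows what is
described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits: names echo the thread, values are invented. */
enum {
    MODEL_FLAG_SCHEDULED        = 1u << 0, /* context not yet saved   */
    MODEL_FLAG_DELAYED_RUNQ_ADD = 1u << 1  /* queue it once it is     */
};

/* Hypothetical stand-in for the per-unit scheduler state. */
struct model_unit {
    unsigned int flags;
    bool on_runq;     /* models unit_on_runq(svc)          */
    bool is_current;  /* models curr_on_cpu(cpu) == unit   */
};

enum wake_result { WAKE_NOOP, WAKE_DELAYED, WAKE_QUEUED };

/* Models the decision order discussed for csched2_unit_wake(). */
static enum wake_result model_unit_wake(struct model_unit *u)
{
    /* Already running or already queued: nothing to do. */
    if (u->is_current || u->on_runq)
        return WAKE_NOOP;

    /* Context not saved yet: do not queue now, only leave a note. */
    if (u->flags & MODEL_FLAG_SCHEDULED) {
        u->flags |= MODEL_FLAG_DELAYED_RUNQ_ADD;
        return WAKE_DELAYED;
    }

    /* Normal case: put the unit on the runqueue right away. */
    assert(!u->on_runq);
    u->on_runq = true;
    return WAKE_QUEUED;
}
```

In this model a wake-up that arrives while MODEL_FLAG_SCHEDULED is still
set is never lost: it is recorded in MODEL_FLAG_DELAYED_RUNQ_ADD for
whoever clears the scheduled flag later.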

In fact...

So in case a vcpu (and its unit, of course) is blocked and there has
been no other vcpu active on its physical cpu but the idle vcpu,
there will be no call of csched2_context_saved(). In theory this will
keep the vcpu from becoming active for eternity, in case there is no
need to run another vcpu on the physical cpu.

...maybe I am not seeing exactly what situation and sequence of events
you are thinking of. What I see is this: [*]

- vCPU V is running, i.e., CSFLAG_scheduled is set
- vCPU V blocks
- we enter schedule()
   - schedule calls do_schedule() --> csched2_schedule()
     - we pick idle, so CSFLAG_delayed_runq_add is set for V
   - schedule calls sched_context_switch()
     - sched_context_switch() calls context_switch()
       - context_switch() calls sched_context_switched()
         - sched_context_switched() calls:
           - vcpu_context_saved()
           - unit_context_saved()
             - unit_context_saved() calls sched_context_saved() -->
                                           csched2_context_saved()
               - csched2_context_saved():
                 - clears CSFLAG_scheduled
                 - checks (and clears) CSFLAG_delayed_runq_add

[*] this assumes granularity 1, i.e., no core-scheduling and no
     rendezvous. Or was core-scheduling actually enabled?

And if CSFLAG_delayed_runq_add is set **and** the vCPU is runnable, the
vCPU is added back to the runqueue.

So, even if we don't do the actual context switch (i.e., we don't call
__context_switch()) because the next vCPU that we pick when vCPU V
blocks is the idle one, it looks to me that we do get to call
csched2_context_saved().

And it also looks to me that, when we get to that, if the vCPU is
runnable, even if it still has CSFLAG_scheduled set, we do put it
back on the runqueue.

And if the vCPU blocked, but csched2_unit_wake() ran while
CSFLAG_scheduled was still set, it indeed should mean that the vCPU
itself will be runnable again when we get to csched2_context_saved().
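
The sequence argued above (V blocks, a wake-up arrives while
CSFLAG_scheduled is still set, then the context-saved hook runs) can be
replayed with a small model. Again, this is only a sketch under
assumptions: the F_* values, struct model_unit and the model_*
functions are hypothetical and merely mimic the behaviour described in
this thread, with granularity 1 assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the CSFLAG_* ones. */
enum { F_SCHEDULED = 1u << 0, F_DELAYED_RUNQ_ADD = 1u << 1 };

/* Hypothetical per-unit state. */
struct model_unit {
    unsigned int flags;
    bool runnable;
    bool on_runq;
};

/* Wake-up: queue immediately, or defer if the context is not saved. */
static void model_wake(struct model_unit *u)
{
    u->runnable = true;
    if (u->on_runq)
        return;
    if (u->flags & F_SCHEDULED) {
        u->flags |= F_DELAYED_RUNQ_ADD;
        return;
    }
    u->on_runq = true;
}

/* Context saved: clear both flags, then honour a deferred wake-up. */
static void model_context_saved(struct model_unit *u)
{
    bool delayed = (u->flags & F_DELAYED_RUNQ_ADD) != 0;

    u->flags &= ~(unsigned int)(F_SCHEDULED | F_DELAYED_RUNQ_ADD);
    if (delayed && u->runnable) {
        assert(!u->on_runq);
        u->on_runq = true;
    }
}

/* Replays: V running -> V blocks -> wake-up races in -> context saved. */
static bool model_replay_block_then_wake(void)
{
    struct model_unit v = { .flags = F_SCHEDULED, .runnable = true,
                            .on_runq = false };

    v.runnable = false;       /* V blocks, idle gets picked          */
    model_wake(&v);           /* wake-up while F_SCHEDULED still set */
    model_context_saved(&v);  /* reached even when idle was picked   */

    return v.on_runq && !(v.flags & F_SCHEDULED);
}
```

In the model, model_replay_block_then_wake() ends with V back on the
runqueue, which matches the conclusion above: as long as the
context-saved step is reached, a wake-up deferred via the delayed flag
is not lost.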

Or did you have something completely different in mind, and I'm missing
it?

No, I think you are right. I mixed that up with __context_switch() not
being called.

Sorry for the noise,


Juergen




 

