Re: [Xen-devel] CPU Lockup bug with the credit2 scheduler

On Mon, 2020-02-17 at 11:58 -0800, Sarah Newman wrote:
> On 1/7/20 6:25 AM, Alastair Browne wrote:
> > After the tests, we decided to stick with kernel and 4.12
> > for production use running credit1 as the default scheduler.
> One person CC'ed appears to be having the same experience, where the
> credit2 scheduler leads to lockups (in this case in the domU, not the
> dom0) under relatively heavy load. It seems possible they may have the
> same root cause.
Yeah, well, if booting with `sched=credit` makes the problem disappear,
then whatever the actual root cause is, it seems related to Credit2.
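
For anyone wanting to try this, here is a minimal sketch of setting the
parameter persistently. It assumes a Debian-style GRUB setup: the file
location and the `update-grub` step are assumptions about the distro, not
something stated in this thread, so adjust for your system.

```shell
# /etc/default/grub (assumed location; some distros use a file under
# /etc/default/grub.d/ instead).
# Add sched=credit to whatever Xen options are already set here.
GRUB_CMDLINE_XEN_DEFAULT="sched=credit"

# Then regenerate the GRUB config and reboot, e.g. on Debian/Ubuntu:
#   update-grub
# After the reboot, check which scheduler is active:
#   xl info | grep xen_scheduler
```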

> I don't think there are, but have there been any patches since the
> 4.13.0 release which might have fixed problems with credit 2
> scheduler? If not, what would the next step be to isolating the
> problem - a debug build of Xen or something else?
Yes, running a debug build of Xen and providing, for instance, the info
that Juergen is asking for later in this thread would help, i.e.:

xl vcpu-list
/usr/lib/xen/bin/xenctx -C -S -s <domu-system-map> <domid>

And I'd add myself:

xl debug-keys r ; xl dmesg

And, in general, hypervisor logs when the problem occurs (I've gone
through the threads, and I don't think I have seen any, but maybe I
missed something?).


is also another way to have a look, from Dom0, at whether (and if yes,
which ones and how much) the vCPUs are busy.
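
If it helps, the commands above could be gathered in one go with something
like the sketch below. The function names (`capture`, `collect_xen_diag`),
the output directory and the file names are made up for illustration; only
the `xl` and `xenctx` invocations come from this thread. It would need to be
run as root in Dom0, ideally while the lockup is happening.

```shell
#!/bin/sh
# Sketch only: collect the diagnostics requested in this thread into one
# directory. Function names, file names and directory layout are
# hypothetical; the xl/xenctx commands are the ones quoted above.

# Run a command, saving stdout+stderr to a file. If the tool is missing,
# record that instead of failing.
capture() {
    out="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1 || echo "command failed: $*" >>"$out"
    else
        echo "skipped (not found): $1" >"$out"
    fi
}

# $1 = output directory, $2 = domid (optional), $3 = domU System.map (optional)
collect_xen_diag() {
    dir="$1"; domid="$2"; sysmap="$3"
    mkdir -p "$dir" || return 1
    capture "$dir/vcpu-list.txt" xl vcpu-list
    # Dump the scheduler state to the hypervisor log, then save the log.
    capture "$dir/debug-keys.txt" xl debug-keys r
    capture "$dir/dmesg.txt" xl dmesg
    # xenctx needs both a domid and the guest's System.map.
    if [ -n "$domid" ] && [ -n "$sysmap" ]; then
        capture "$dir/xenctx.txt" /usr/lib/xen/bin/xenctx -C -S -s "$sysmap" "$domid"
    fi
}
```

Usage would be along the lines of
`collect_xen_diag /tmp/xen-diag <domid> <domu-system-map>`, with the
resulting files attached to a reply here.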

> If there are no merged or proposed fixes soon, it may be worth
> considering making the credit scheduler the default again until
> problems with the credit2 scheduler are resolved.
Nothing similar to what is being described has happened in our testing
(or we wouldn't have switched to Credit2, of course! :-D).

I will see about trying to reproduce this myself, but this may take a
little while. In the meantime, if you can help us by sending more logs,
we're happy to try diagnosing and fixing things.

