Re: [Xen-devel] CPU Lockup bug with the credit2 scheduler



On Mon, 2020-02-17 at 11:58 -0800, Sarah Newman wrote:
> On 1/7/20 6:25 AM, Alastair Browne wrote:
> > 
> > After the tests, we decided to stick with 4.9.0.9 kernel and 4.12
> > Xen
> > for production use running credit1 as the default scheduler.
> 
> One person CC'ed appears to be having the same experience, where the
> credit2 scheduler leads to lockups (in this case in the domU, not the
> dom0) under relatively heavy load. It seems possible they may have the
> same root cause.
> 
Yeah, well, if booting with `sched=credit` makes the problem disappear,
then whatever the root cause turns out to be, it seems related to
Credit2.
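
For anyone who wants to double check which scheduler a host is actually
running after changing the boot parameter, here is a minimal sketch. It
assumes `xl info` prints a line of the form `xen_scheduler : credit2`;
the helper name scheduler_from_xl_info is made up for illustration, and
the GRUB variable in the comment applies to Debian-style setups only.

```shell
# Assumption: on Debian-style GRUB setups the hypervisor command line can
# be set in /etc/default/grub, e.g.
#   GRUB_CMDLINE_XEN_DEFAULT="sched=credit"
# followed by update-grub and a reboot. Other distros may differ.

# Print the scheduler name found in `xl info` output read from stdin.
# Assumes a line of the form "xen_scheduler          : credit2".
scheduler_from_xl_info() {
    awk -F: '/^xen_scheduler[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on a Xen Dom0 (needs root):
#   xl info | scheduler_from_xl_info
```

If this prints credit2 after booting with `sched=credit`, the parameter
most likely did not reach the hypervisor command line.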

> I don't think there are, but have there been any patches since the
> 4.13.0 release which might have fixed problems with credit 2
> scheduler? If not, what would the next step be to isolating the
> problem - a debug build of Xen or something else?
> 
Yes, running a debug build of Xen and providing, for instance, the info
that Juergen asks for later in this thread, i.e.:

xl vcpu-list
/usr/lib/xen/bin/xenctx -C -S -s <domu-system-map> <domid>

To which I would add:

xl debug-keys r ; xl dmesg

And, in general, the hypervisor logs from when the problem occurs (I've
gone through the threads and I don't think I have seen any, but maybe I
missed something?).

xentop

is another way to check, from Dom0, whether the vCPUs are busy (and, if
so, which ones and how much).
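
To make it easier to grab all of the above in one go while the lockup is
actually happening, something like the following could be used. This is
only a sketch: the function names capture and collect_xen_diag and the
output file names are made up for illustration, and it assumes that
xentop supports batch mode via `-b -i 1`. The commands themselves are
the ones listed above and need to run as root in Dom0.

```shell
# Run one command, saving its stdout and stderr to a file.
# A missing or failing command is recorded in the file, not fatal.
capture() {
    outfile=$1; shift
    {
        echo "### $*"
        "$@" 2>&1 || echo "### exit status: $?"
    } > "$outfile"
}

# collect_xen_diag OUTDIR [DOMID SYSTEM_MAP]
collect_xen_diag() {
    outdir=$1; domid=${2:-}; sysmap=${3:-}
    mkdir -p "$outdir" || return 1
    capture "$outdir/vcpu-list.txt" xl vcpu-list
    # Dump scheduler state to the hypervisor console, then read it back.
    capture "$outdir/debug-keys-r.txt" xl debug-keys r
    capture "$outdir/xl-dmesg.txt" xl dmesg
    # One batch-mode iteration of xentop (assumes -b and -i are supported).
    capture "$outdir/xentop.txt" xentop -b -i 1
    # Guest vCPU contexts only when a domid and System.map were given.
    if [ -n "$domid" ] && [ -n "$sysmap" ]; then
        capture "$outdir/xenctx-$domid.txt" \
            /usr/lib/xen/bin/xenctx -C -S -s "$sysmap" "$domid"
    fi
}

# Usage: collect_xen_diag /tmp/xen-diag <domid> <domu-system-map>
```

Each file starts with the command that produced it, so the whole
directory can be archived and attached to a reply as is.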


> If there are no merged or proposed fixes soon, it may be worth
> considering making the credit scheduler the default again until
> problems with the credit2 scheduler are resolved.
> 
Nothing similar to what is being described has happened in our testing
(or we wouldn't have switched to Credit2, of course! :-D).

I will try to reproduce this myself, but that may take a little while.
In the meantime, if you can help us by sending more logs, we're happy
to try diagnosing and fixing things.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
