
Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13



[Adding Juergen]

On Wed, 2020-03-18 at 23:10 +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Mar 18, 2020 at 02:50:52PM +0000, Andrew Cooper wrote:
> > On 18/03/2020 14:16, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > > 
> > > In my test setup (inside KVM with nested virt enabled), I rather
> > > frequently get Xen crash on resume from S3. Full message below.
> > > 
> > > This is Xen 4.13.0, with some patches, including "sched: fix
> > > resuming from S3 with smt=0".
> > > 
> > > Contrary to the previous issue, this one does not always happen - I
> > > would say in about 40% of cases on this setup, but very rarely on a
> > > physical setup.
> > > 
> > > This is _without_ core scheduling enabled, and also with smt=off.
> > > 
> > > Do you think it would be any different on xen-unstable? I can try,
> > > but it isn't trivial in this setup, so I'd ask first.
> > > 
Well, Juergen has fixed quite a few issues.

Most of them were triggering with core-scheduling enabled, and I don't
recall any of them which looked similar or related to this.

Still, it's possible that the same issue causes different symptoms, and
hence that maybe one of the patches would fix this too.

But if it's difficult for you to try upstream, let's maybe wait and see
if he has an opinion about this bug.

I have just one question:

> (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at sched_credit2.c:2137
> (XEN) ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
> [...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022bee9>] R sched_credit2.c#csched2_unit_wake+0x174/0x176
> (XEN)    [<ffff82d0802346c6>] F vcpu_wake+0xdd/0x3ff
> (XEN)    [<ffff82d0802082f1>] F domain_unpause+0x2f/0x3b
> (XEN)    [<ffff82d08020843e>] F domain_unpause_by_systemcontroller+0x40/0x60
> (XEN)    [<ffff82d080205ea5>] F do_domctl+0x9e4/0x1952
> (XEN)    [<ffff82d08034d922>] F pv_hypercall+0x548/0x560
> (XEN)    [<ffff82d080354432>] F lstar_enter+0x112/0x120
> (XEN) 
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at sched_credit2.c:2137
> (XEN) ****************************************
> (XEN) 
>
Do you remember (or can easily test) whether this was also occurring on
Xen 4.12, i.e., without core-scheduling code even being there, when
this ASSERT was:

 ASSERT(c2rqd(ops, vc->processor) == svc->rqd );

If not, that might mean we have some scheduling resource and/or master
CPU issue on the S3 resume path.
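
To make the invariant behind that ASSERT concrete, here is a toy model in C.
It is not Xen code: every toy_* name is made up for illustration, and the
only thing taken from the real scheduler is the shape of the check itself,
i.e. "the runqueue that the unit's master CPU maps to must be the runqueue
cached in the unit's scheduler data". How the two could drift apart across
S3 is exactly the open question, so the "stale mapping" case in the usage
note is only a guess at one possible scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a credit2 runqueue; not the real structure. */
struct toy_rqd {
    int id;
};

/* Hypothetical stand-in for a scheduling unit plus its scheduler data. */
struct toy_unit {
    unsigned int master_cpu;  /* plays the role of sched_unit_master(unit) */
    struct toy_rqd *rqd;      /* plays the role of svc->rqd */
};

static struct toy_rqd toy_rqds[2] = { { 0 }, { 1 } };

/* Per-CPU mapping to a runqueue; NULL means "CPU not assigned". */
static struct toy_rqd *toy_cpu_to_rqd[TOY_NR_CPUS];

/* Plays the role of c2rqd(): look up the runqueue a CPU belongs to. */
static struct toy_rqd *toy_c2rqd(unsigned int cpu)
{
    return cpu < TOY_NR_CPUS ? toy_cpu_to_rqd[cpu] : NULL;
}

/* (Re)assign a CPU to a runqueue; out-of-range CPUs are ignored. */
static void toy_assign_cpu(unsigned int cpu, struct toy_rqd *rqd)
{
    if (cpu < TOY_NR_CPUS)
        toy_cpu_to_rqd[cpu] = rqd;
}

/* The condition the ASSERT checks on wake-up, returned instead of asserted. */
static bool toy_wake_invariant_holds(const struct toy_unit *u)
{
    return toy_c2rqd(u->master_cpu) == u->rqd;
}
```

In this model the check holds as long as the CPU-to-runqueue mapping and the
pointer cached in the unit are updated together. If CPU 1 is reassigned to
another runqueue (say, while CPUs are being brought back up) and the unit
still points at the old one, toy_wake_invariant_holds() returns false, which
is the situation the real ASSERT would catch.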

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

