
Re: Yet another S3 issue in Xen 4.14



On Fri, Oct 02, 2020 at 09:19:55PM +0200, Marek Marczykowski-Górecki wrote:
> Disabling efi_get_time() or setting CR4 earlier solves _this_ issue, but
> applied on top of stable-4.14 still doesn't work. Looks like there is
> yet another S3 breakage in between. I'm bisecting it further...

This time the bisection points at this commit:

commit 8e2aa76dc1670e82eaa15683353853bc66bf54fc (refs/bisect/bad)
Author: Dario Faggioli <dfaggioli@xxxxxxxx>
Date:   Thu May 28 23:29:44 2020 +0200

    xen: credit2: limit the max number of CPUs in a runqueue

The failure after S3 resume is slightly different and not really
deterministic: sometimes it hangs immediately, sometimes the system
stays interactive for a few seconds and then hangs, and sometimes it
crashes (looks like a panic).

I've tried switching to credit1, but that seems to be broken in yet
another way, much earlier (at commits where S3 works with credit2,
credit1 crashes on S3 resume).

(a few hours later)

I managed to set up a kdump kernel and got a copy of the vmcore after
the crash. Then I extracted the crash message using strings:

(XEN) Entering ACPI S3 state.
(XEN) [VT-D]Passed iommu=no-igfx option.  Disabling IGD VT-d engine.
(XEN) mce_intel.c:773: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf1) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) [VT-D]intremap.c:564: MSI index (65535) has an empty entry
(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2273
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    8
(XEN) RIP:    e008:[<ffff82d040242725>] credit2.c#csched2_unit_wake+0x14f/0x151
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor (d0v0)
(XEN) rax: ffff830250b609e0   rbx: ffff830250b18f10   rcx: 0000003210631000
(XEN) rdx: ffff830250b604a0   rsi: 0000000000000008   rdi: ffff830250b60846
(XEN) rbp: ffff830250ba7d98   rsp: ffff830250ba7d78   r8:  deadbeefdeadf00d
(XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff830250b0e040   r13: ffff82d04044abc0   r14: 0000000000000008
(XEN) r15: 2f3d053d56f91b80   cr0: 0000000080050033   cr4: 0000000000362660
(XEN) cr3: 0000000210270000   cr2: 0000000000000000
(XEN) fsb: 000077a6b25a2b80   gsb: ffff8881b5400000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d040242725> (credit2.c#csched2_unit_wake+0x14f/0x151):
(XEN)  df e8 dc bd ff ff eb ad <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
(XEN) Xen stack trace from rsp=ffff830250ba7d78:
(XEN)    ffff830250b10000 ffff830250b18f10 ffff830250b18f10 ffff830250b60840
(XEN)    ffff830250ba7de8 ffff82d04024b8eb 0000000000000202 ffff830250b60840
(XEN)    ffff830250b66018 0000000000000001 0000000000000000 0000000000000000
(XEN)    ffff830250b66018 ffff830250b10000 ffff830250ba7e58 ffff82d040207c3f
(XEN)    ffff82d0403673d4 ffff82d0403673c8 ffff82d0403673d4 ffff82d0403673c8
(XEN)    ffff82d0403673d4 ffff82d0403673c8 ffff82d0403673d4 ffff830250ba7ef8
(XEN)    0000000000000180 ffff830250b45000 deadbeefdeadf00d 0000000000000003
(XEN)    ffff830250ba7ee8 ffff82d0402e7759 0000000000000001 0000000000000005
(XEN)    0000000000000000 deadbeefdeadf00d deadbeefdeadf00d ffff82d0403673c8
(XEN)    ffff82d0403673d4 ffff82d0403673c8 ffff82d0403673d4 ffff82d0403673c8
(XEN)    ffff82d0403673d4 ffff830250b45000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cfdaf4580e7 ffff82d040367432
(XEN)    ffff888192157320 0000000000000000 ffffffff810eb370 0000000000000000
(XEN)    ffff8881b0a626c0 0000000000000005 0000000000000246 0000000000000001
(XEN)    ffffea0006087608 ffffea0006087608 0000000000000018 ffffffff8100230a
(XEN)    0000000000000000 0000000000000005 0000000000000001 0000010000000000
(XEN)    ffffffff8100230a 000000000000e033 0000000000000246 ffffc90002653cf0
(XEN)    000000000000e02b d2c2c2c2c2c2c2c2 c2c2c2c2c2c2c282 c2c2c2c2c2c2c2c2
(XEN)    c2e2c2c2c2c2c2c2 0000e01000000008 ffff830250b45000 0000003210631000
(XEN)    0000000000362660 0000000000000000 8000000250bc3002 0000060100000000
(XEN) Xen call trace:
(XEN)    [<ffff82d040242725>] R credit2.c#csched2_unit_wake+0x14f/0x151
(XEN)    [<ffff82d04024b8eb>] F vcpu_wake+0x105/0x52c
(XEN)    [<ffff82d040207c3f>] F do_vcpu_op+0x1b0/0x631
(XEN)    [<ffff82d0402e7759>] F pv_hypercall+0x28f/0x57d
(XEN)    [<ffff82d040367432>] F lstar_enter+0x112/0x120
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2273
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu8
(XEN) Shot down all CPUs
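
For anyone who wants to repeat the extraction step on their own dump,
here is a minimal Python sketch of roughly what "strings" plus a filter
for "(XEN)" does. It is not the command used above. It assumes the Xen
console messages survive in the vmcore as plain ASCII, and the names
extract_xen_lines and PRINTABLE_RUN are made up for illustration.

```python
import mmap
import re
import sys

# Runs of at least 4 printable ASCII characters (plus tab), similar to
# the default behaviour of strings(1). Newlines end a run.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def extract_xen_lines(data):
    """Return printable runs from a bytes-like object that contain a
    "(XEN)" prefix, trimmed so that each result starts at that prefix."""
    lines = []
    for match in PRINTABLE_RUN.finditer(data):
        run = match.group()
        start = run.find(b"(XEN)")
        if start >= 0:
            lines.append(run[start:].decode("ascii"))
    return lines


if __name__ == "__main__" and len(sys.argv) == 2:
    # Map the dump read-only instead of reading it all into memory.
    with open(sys.argv[1], "rb") as dump:
        with mmap.mmap(dump.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for line in extract_xen_lines(view):
                print(line)
```

Run it with the path to the vmcore as the only argument. The output is
in file order, which is not necessarily message order, and stale
fragments from earlier boots may show up as well.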

Looks pretty similar to the other thread "Xen crash after S3 suspend -
Xen 4.13" - adding Jürgen. Since I've seen this one on Xen 4.13 before,
I think the commit I've found just makes it much more likely to happen.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
