
Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13



On Thu, Mar 19, 2020 at 01:28:10AM +0100, Dario Faggioli wrote:
> [Adding Juergen]
> 
> On Wed, 2020-03-18 at 23:10 +0100, Marek Marczykowski-Górecki wrote:
> > On Wed, Mar 18, 2020 at 02:50:52PM +0000, Andrew Cooper wrote:
> > > On 18/03/2020 14:16, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > In my test setup (inside KVM with nested virt enabled), I rather
> > > > frequently get a Xen crash on resume from S3. Full message below.
> > > > 
> > > > This is Xen 4.13.0, with some patches, including "sched: fix
> > > > resuming from S3 with smt=0".
> > > > 
> > > > Unlike the previous issue, this one does not happen every time: I
> > > > would say in about 40% of cases on this setup, but very rarely on a
> > > > physical setup.
> > > > 
> > > > This is _without_ core scheduling enabled, and also with smt=off.
> > > > 
> > > > Do you think it would be any different on xen-unstable? I can try,
> > > > but it isn't trivial in this setup, so I'd ask first.
> > > > 
> Well, Juergen has fixed quite a few issues.
> 
> Most of them were triggered with core-scheduling enabled, and I don't
> recall any of them looking similar or related to this.
> 
> Still, it's possible that the same issue causes different symptoms, and
> hence that maybe one of the patches would fix this too.

I've tested on master (d094e95fb7c) and reproduced exactly the same crash
(pasted below for completeness).
But there is more: in most (all?) cases after resume I additionally get a
soft lockup in Linux dom0 in smp_call_function_single() (see below). It
didn't happen before, and the only change was Xen 4.13 -> master.

Xen crash:

(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2133
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d08023a3c5>] credit2.c#csched2_unit_wake+0x14f/0x151
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff8301ba8fafb0   rbx: ffff8300912238b0   rcx: 0000000000000000
(XEN) rdx: ffff8301ba8d81f0   rsi: 0000000000000000   rdi: ffff8301ba8d8016
(XEN) rbp: ffff830170db7d30   rsp: ffff830170db7d10   r8:  deadbeefdeadf00d
(XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff8300912239a0   r13: ffff82d080433780   r14: 0000000000000000
(XEN) r15: 0000005bdb5286ad   cr0: 0000000080050033   cr4: 0000000000000660
(XEN) cr3: 000000010e53c000   cr2: 00005ec1b2f56280
(XEN) fsb: 000079872ee29700   gsb: ffff88813ff00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d08023a3c5> (credit2.c#csched2_unit_wake+0x14f/0x151):
(XEN)  df e8 f9 c5 ff ff eb ad <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
(XEN) Xen stack trace from rsp=ffff830170db7d10:
(XEN)    ffff830090a33000 ffff8300912238b0 ffff8300912238b0 ffff8301ba8d8010
(XEN)    ffff830170db7d78 ffff82d08024253b 0000000000000202 ffff8301ba8d8010
(XEN)    ffff830090a33000 ffff8300a864b000 000079872c600010 0000000000000000
(XEN)    0000000000000001 ffff830170db7d90 ffff82d080206e09 ffff8300a864b000
(XEN)    ffff830170db7da8 ffff82d080206f1c 0000000000000000 ffff830170db7ec0
(XEN)    ffff82d080204de7 ffff8301ba8cb001 ffff830170db7fff 0000000470db7e10
(XEN)    0000000000000000 ffff82e0021d0160 ffff88813ff15b28 ffff8301ba8cb000
(XEN)    ffff8301ba8cb000 ffff8301ba88b000 ffff830170db7e10 0000001200000004
(XEN)    0000798728000005 0000000000000001 0000000000000005 000079872ee286e0
(XEN)    000079872c109e77 000000030000001c 00007986ec0013c0 ffff010a00000005
(XEN)    000000000002a240 000000000002bb30 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000001 00000005d1ea5ab2 0000000000000001
(XEN)    7ba0548d00000000 ffff830170db7ef8 ffff8301ba88b000 0000000000000001
(XEN)    0000000000000000 0000000000000000 ffff830170db7ee8 ffff82d0802d779d
(XEN)    ffff8301ba88b000 0000000000000000 0000000000000000 00007cfe8f2480e7
(XEN)    ffff82d080355432 ffff88813a1bef00 000079872ee28590 000079872ee28590
(XEN)    ffff8881358e9c40 ffff88813a1bef00 ffff88813a1bef01 0000000000000282
(XEN)    0000000000000000 ffffc90001923e08 0000000000000000 0000000000000024
(XEN)    ffffffff8100148a 0000000000000000 0000000000000000 000079872c600010
(XEN)    0000010000000000 ffffffff8100148a 000000000000e033 0000000000000282
(XEN) Xen call trace:
(XEN)    [<ffff82d08023a3c5>] R credit2.c#csched2_unit_wake+0x14f/0x151
(XEN)    [<ffff82d08024253b>] F vcpu_wake+0xdd/0x3ff
(XEN)    [<ffff82d080206e09>] F domain_unpause+0x2f/0x3b
(XEN)    [<ffff82d080206f1c>] F domain_unpause_by_systemcontroller+0x40/0x60
(XEN)    [<ffff82d080204de7>] F do_domctl+0x9e1/0x16f1
(XEN)    [<ffff82d0802d779d>] F pv_hypercall+0x548/0x560
(XEN)    [<ffff82d080355432>] F lstar_enter+0x112/0x120
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2133
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
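For readers less familiar with credit2: the failed ASSERT in csched2_unit_wake() checks, roughly, that the runqueue recorded in the unit's scheduler data (svc->rqd) is the same runqueue that the unit's current master pCPU belongs to (c2rqd() applied to sched_unit_master(unit)). The C sketch below models only that invariant. All structure and function names in it are hypothetical stand-ins, not Xen's real types, and it says nothing about how the mismatch actually arises after resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins; NOT Xen's real structures. */
struct rq_model {
    int id;
};

struct pcpu_model {
    struct rq_model *rqd;        /* analogue of what c2rqd(cpu) returns */
};

struct unit_model {
    unsigned int master_cpu;     /* analogue of sched_unit_master(unit) */
    struct rq_model *rqd;        /* analogue of svc->rqd */
};

/* True when the unit's recorded runqueue is the runqueue of its master CPU. */
static bool unit_rq_consistent(const struct pcpu_model *pcpus,
                               const struct unit_model *u)
{
    return pcpus[u->master_cpu].rqd == u->rqd;
}

/* Analogue of the ASSERT() in csched2_unit_wake(): with assertions enabled
 * (as in a debug=y Xen build), a mismatch aborts instead of continuing. */
static void wake_check(const struct pcpu_model *pcpus,
                       const struct unit_model *u)
{
    assert(unit_rq_consistent(pcpus, u));
}
```

A mismatch would occur, for example, if a unit's master CPU were switched to a pCPU on a different runqueue without svc->rqd being updated to match; whether that is what happens here after S3 is exactly the open question.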


Linux dom0 soft lockup:

[  524.742089] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd:1]
[  524.747897] Modules linked in: joydev br_netfilter xt_physdev xen_netback bridge stp llc loop ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter snd_hda_codec_generic ledtrig_audio ppdev snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr snd_timer snd parport_pc e1000e soundcore parport i2c_piix4 xenfs ip_tables dm_thin_pool dm_persistent_data libcrc32c dm_bio_prison bochs_drm drm_kms_helper drm_vram_helper ttm drm serio_raw ehci_pci ehci_hcd virtio_console virtio_scsi ata_generic pata_acpi floppy qemu_fw_cfg xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput pkcs8_key_parser
[  524.768696] CPU: 1 PID: 1 Comm: systemd Tainted: G        W         5.4.25-1.qubes.x86_64 #1
[  524.771407] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  524.775056] RIP: e030:smp_call_function_single+0xe0/0x110
[  524.776755] Code: 65 48 33 0c 25 28 00 00 00 75 3b c9 c3 4c 89 c2 4c 89 c9 48 89 e6 e8 5f fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 8b 54 24 18 <83> e2 01 75 f5 eb ca 8b 05 3b 92 e0 01 85 c0 75 80 0f 0b e9 79 ff
[  524.783649] RSP: e02b:ffffc90000c0fc60 EFLAGS: 00000202
[  524.788857] RAX: 0000000000000000 RBX: ffff888136632540 RCX: 0000000000000040
[  524.791207] RDX: 0000000000000003 RSI: ffffffff82824c60 RDI: ffffffff820107c0
[  524.793610] RBP: ffffc90000c0fca0 R08: 0000000000000000 R09: ffff88813b0007e8
[  524.795737] R10: 0000000000000000 R11: ffffffff8265b6e8 R12: 0000000000000001
[  524.797847] R13: ffffc90000c0fdb0 R14: ffffffff82feb744 R15: ffff88813b7c6800
[  524.800156] FS:  000074e59239e5c0(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
[  524.802883] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  524.804661] CR2: 000074e59345a400 CR3: 00000001337e0000 CR4: 0000000000000660
[  524.807097] Call Trace:
[  524.807959]  ? perf_cgroup_attach+0x70/0x70
[  524.809433]  ? _raw_spin_unlock_irqrestore+0x14/0x20
[  524.811179]  ? cgroup_move_task+0x109/0x150
[  524.812623]  task_function_call+0x4d/0x80
[  524.814179]  ? perf_cgroup_switch+0x190/0x190
[  524.815738]  perf_cgroup_attach+0x3f/0x70
[  524.817125]  cgroup_migrate_execute+0x35e/0x420
[  524.818704]  cgroup_attach_task+0x159/0x210
[  524.820158]  ? find_inode_fast.isra.0+0x8e/0xb0
[  524.822055]  cgroup_procs_write+0xd0/0x100
[  524.823692]  cgroup_file_write+0x9b/0x170
[  524.825220]  kernfs_fop_write+0xce/0x1b0
[  524.826598]  vfs_write+0xb6/0x1a0
[  524.827776]  ksys_write+0x67/0xe0
[  524.828969]  do_syscall_64+0x5b/0x1a0
[  524.830083]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  524.831599] RIP: 0033:0x74e5933894b7
[  524.832696] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  524.838570] RSP: 002b:00007ffdfc2df548 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  524.841100] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 000074e5933894b7
[  524.843469] RDX: 0000000000000005 RSI: 00007ffdfc2df70a RDI: 0000000000000017
[  524.846368] RBP: 00007ffdfc2df70a R08: 0000000000000000 R09: 00007ffdfc2df590
[  524.848816] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
[  524.851009] R13: 00006149cb4f3800 R14: 0000000000000005 R15: 000074e59345a700
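The dom0 RIP sits inside smp_call_function_single(), and the code bytes around it (f3 90 = pause, 8b 54 24 18 = mov 0x18(%rsp),%edx, 83 e2 01 = and $1,%edx, 75 f5 = jne back to the pause) look like a spin loop testing bit 0 of a flags word on the stack. My reading, which is an assumption, is that this is the synchronous wait for the target CPU to clear the CSD lock flag (csd_lock_wait() in kernel/smp.c, I believe); if the target vCPU never runs the IPI handler, for instance because it is not being scheduled after resume, the caller spins until the watchdog fires. The sketch below models that waiter with C11 atomics. The names are hypothetical, and unlike the kernel loop it is bounded so that it cannot hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical name; bit 0 matches the "and $1" test in the code bytes. */
#define CSD_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the call-single-data descriptor. */
struct csd_model {
    atomic_uint flags;
};

/* Waiter side: spin while the lock bit is set, giving up after max_spins
 * iterations. Returns the number of iterations spent waiting. */
static unsigned long csd_wait_bounded(struct csd_model *csd,
                                      unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(csd != NULL);
    while ((atomic_load_explicit(&csd->flags, memory_order_acquire) &
            CSD_FLAG_LOCK) != 0 && spins < max_spins) {
        /* The kernel executes PAUSE (f3 90) here via cpu_relax(). */
        spins++;
    }
    return spins;
}

/* Target side: what the IPI handler does once the requested function has
 * run; if this never happens, the waiter above never sees the bit clear. */
static void csd_unlock(struct csd_model *csd)
{
    atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
                              memory_order_release);
}
```

With the lock bit left set, csd_wait_bounded() returns max_spins, the bounded analogue of the 22s stall reported by the watchdog; after csd_unlock() it returns 0 immediately.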

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

