
[BUG] Core scheduling patches causing deadlock in some situations



Hello,

I'm running DRAKVUF on a Dell Inc. PowerEdge R640/08HT8T server with an Intel(R) 
Xeon(R) Gold 6132 CPU @ 2.60GHz.
After upgrading from Xen 4.12 to 4.13, we have noticed stability problems: 
Dom0 (Debian Buster) freezes and logs the following:

---

maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
maj 27 23:17:02 debian kernel: Call Trace:
maj 27 23:17:02 debian kernel: <IRQ>
maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60

---

This usually results in the machine becoming completely unresponsive and 
performing an automated reboot after some time.

I've bisected the commits between 4.12 and 4.13, and it seems that this is the 
patch which introduced the bug:
https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21

Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from a fresh 
boot of the machine on which the bug was reproduced.

I'm also including the `xl info` output from this machine:

---

release : 4.19.0-6-amd64
version : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
machine : x86_64
nr_cpus : 14
max_cpu_id : 223
nr_nodes : 1
cores_per_socket : 14
threads_per_core : 1
cpu_mhz : 2593.930
hw_caps : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
virt_caps : pv hvm hvm_directio pv_directio hap shadow
total_memory : 130541
free_memory : 63591
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 13
xen_extra : -unstable
xen_version : 4.13-unstable
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
xen_commandline : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0 altp2m=1 smt=0 no-real-mode edd=off
cc_compiler : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc_compile_by : root
cc_compile_domain :
cc_compile_date : Fri May 29 02:13:39 UTC 2020
build_id : 958cea737ee01f06e595d52191a6d7bb5ee67deb
xend_config_format : 4

---
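
In case it helps anyone comparing this setup against their own, here is a 
minimal Python sketch that pulls the scheduler and boot options out of 
`xl info`-style output. It assumes each entry is on a single "key : value" 
line, as `xl info` prints it (not re-wrapped by a mail client). The helper 
names parse_xl_info and parse_cmdline are hypothetical and not part of any 
Xen tooling; the sample text is copied from the output above.

```python
def parse_xl_info(text):
    """Parse `xl info`-style "key : value" lines into a dict.

    Assumption: one entry per line; only the first colon separates
    the key from the value, so values may contain colons themselves.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a key
        info[key.strip()] = value.strip()
    return info


def parse_cmdline(cmdline):
    """Split a Xen command line into {option: value}; bare flags map to ""."""
    options = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        options[name] = value
    return options


# Sample entries copied from the `xl info` output in this report.
SAMPLE = """\
nr_cpus : 14
threads_per_core : 1
xen_version : 4.13-unstable
xen_scheduler : credit2
xen_changeset : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
xen_commandline : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0 altp2m=1 smt=0 no-real-mode edd=off
"""

info = parse_xl_info(SAMPLE)
opts = parse_cmdline(info["xen_commandline"])
print(info["xen_scheduler"], opts["smt"], opts["dom0_max_vcpus"])
```

On a live system the same functions could be fed the captured output of 
`xl info` instead of the SAMPLE string.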


Best regards,
Michał Leszczyński
CERT Polska

Attachment: dmesg.txt
Description: Text document


 

