[BUG] Core scheduling patches causing deadlock in some situations
Hello,

I'm running DRAKVUF on a Dell Inc. PowerEdge R640/08HT8T server with an Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz. After upgrading from Xen RELEASE 4.12 to 4.13, we have noticed some stability problems concerning freezes of Dom0 (Debian Buster):

---
maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
maj 27 23:17:02 debian kernel: Call Trace:
maj 27 23:17:02 debian kernel: <IRQ>
maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60
---

This usually results in the machine becoming completely unresponsive and performing an automated reboot after some time.

I've bisected the commits between 4.12 and 4.13, and this seems to be the patch that introduced the bug:

https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21

Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from a fresh boot of the machine on which the bug was reproduced.
I'm also attaching the `xl info` output from this machine:

---
release                : 4.19.0-6-amd64
version                : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
machine                : x86_64
nr_cpus                : 14
max_cpu_id             : 223
nr_nodes               : 1
cores_per_socket       : 14
threads_per_core       : 1
cpu_mhz                : 2593.930
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow
total_memory           : 130541
free_memory            : 63591
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 13
xen_extra              : -unstable
xen_version            : 4.13-unstable
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
xen_commandline        : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0 altp2m=1 smt=0 no-real-mode edd=off
cc_compiler            : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc_compile_by          : root
cc_compile_domain      :
cc_compile_date        : Fri May 29 02:13:39 UTC 2020
build_id               : 958cea737ee01f06e595d52191a6d7bb5ee67deb
xend_config_format     : 4
---

Best regards,
Michał Leszczyński
CERT Polska

Attachment:
dmesg.txt
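
For anyone comparing a setup against the one reported above, the following is a minimal sketch (not part of the original report) of how `key : value` text in the style of `xl info` could be parsed to pull out the scheduler, the changeset and the hypervisor command-line options. The helper names `parse_xl_info` and `commandline_options` are hypothetical, and the sample is an abbreviated excerpt of the output quoted above.

```python
def parse_xl_info(text):
    """Parse `key : value` lines (xl info style) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines without a separator
        info[key.strip()] = value.strip()
    return info


def commandline_options(info):
    """Split xen_commandline into a dict; bare flags map to None."""
    options = {}
    for token in info.get("xen_commandline", "").split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


# Abbreviated excerpt of the report's output (assumption: same layout).
sample = """\
xen_scheduler          : credit2
xen_changeset          : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
xen_commandline        : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 smt=0
"""

info = parse_xl_info(sample)
opts = commandline_options(info)
print(info["xen_scheduler"], opts["smt"], opts["dom0_max_vcpus"])
```

On a live host the same functions could be fed the text captured from `xl info` instead of the embedded sample.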