Re: [BUG] Core scheduling patches causing deadlock in some situations
----- On 29 May 2020 at 15:15, Jürgen Groß jgross@xxxxxxxx wrote:

> On 29.05.20 14:51, Michał Leszczyński wrote:
>> ----- On 29 May 2020 at 14:44, Jürgen Groß jgross@xxxxxxxx wrote:
>>
>>> On 29.05.20 14:30, Michał Leszczyński wrote:
>>>> Hello,
>>>>
>>>> I'm running DRAKVUF on Dell Inc. PowerEdge R640/08HT8T server with
>>>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz CPU.
>>>> When upgrading from Xen RELEASE 4.12 to 4.13, we have noticed some
>>>> stability problems concerning freezes of Dom0 (Debian Buster):
>>>>
>>>> ---
>>>>
>>>> maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
>>>> maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
>>>> maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
>>>> maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
>>>> maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
>>>> maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
>>>> maj 27 23:17:02 debian kernel: Call Trace:
>>>> maj 27 23:17:02 debian kernel: <IRQ>
>>>> maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
>>>> maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
>>>> maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
>>>> maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
>>>> maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
>>>> maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
>>>> maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
>>>> maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
>>>> maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60
>>>>
>>>> ---
>>>>
>>>> This usually results in machine being completely unresponsive and
>>>> performing an automated reboot after some time.
>>>>
>>>> I've bisected commits between 4.12 and 4.13 and it seems like this is
>>>> the patch which introduced a bug:
>>>> https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21
>>>>
>>>> Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from
>>>> the fresh boot of the machine on which the bug was reproduced.
>>>>
>>>> I'm also attaching the `xl info` output from this machine:
>>>>
>>>> ---
>>>>
>>>> release                : 4.19.0-6-amd64
>>>> version                : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
>>>> machine                : x86_64
>>>> nr_cpus                : 14
>>>> max_cpu_id             : 223
>>>> nr_nodes               : 1
>>>> cores_per_socket       : 14
>>>> threads_per_core       : 1
>>>> cpu_mhz                : 2593.930
>>>> hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
>>>> virt_caps              : pv hvm hvm_directio pv_directio hap shadow
>>>> total_memory           : 130541
>>>> free_memory            : 63591
>>>> sharing_freed_memory   : 0
>>>> sharing_used_memory    : 0
>>>> outstanding_claims     : 0
>>>> free_cpus              : 0
>>>> xen_major              : 4
>>>> xen_minor              : 13
>>>> xen_extra              : -unstable
>>>> xen_version            : 4.13-unstable
>>>> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>>>> xen_scheduler          : credit2
>>>> xen_pagesize           : 4096
>>>> platform_params        : virt_start=0xffff800000000000
>>>> xen_changeset          : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
>>>
>>> Which is your original Xen base? This output is clearly obtained at the
>>> end of the bisect process.
>>>
>>> There have been quite some corrections since the release of Xen 4.13, so
>>> please make sure you are running the most actual version (4.13.1).
>>>
>>>
>>> Juergen
>>
>> Sure, we have tested both RELEASE 4.13 and RELEASE 4.13.1. Unfortunately
>> these corrections didn't help and the bug is still reproducible.
>>
>> From our testing it turns out that:
>>
>> Known working revision: 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
>> Broken revision: 6278553325a9f76d37811923221b21db3882e017
>> First bad commit: 7c7b407e77724f37c4b448930777a59a479feb21
>
> Would it be possible to test xen unstable, too?
>
> I could imagine e.g. commit b492c65da5ec5ed or 99266e31832fb4a4 to have
> an impact here.
>
>
> Juergen

I've tried revision b492c65da5ec5ed, but there seems to be a problem with
ALTP2M support, so I can't launch anything at all.

maj 29 15:45:32 debian drakrun[1223]: Failed to set HVM_PARAM_ALTP2M, RC: -1
maj 29 15:45:32 debian drakrun[1223]: VMI_ERROR: xc_altp2m_switch_to_view returned rc: -1

xl info:

---

release                : 4.19.0-6-amd64
version                : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
machine                : x86_64
nr_cpus                : 14
max_cpu_id             : 223
nr_nodes               : 1
cores_per_socket       : 14
threads_per_core       : 1
cpu_mhz                : 2593.977
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share
total_memory           : 130541
free_memory            : 63591
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 14
xen_extra              : -unstable
xen_version            : 4.14-unstable
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Thu May 14 17:36:13 2020 +0200 git:b492c65da5-dirty
xen_commandline        : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0 altp2m=1 smt=0 no-real-mode edd=off
cc_compiler            : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc_compile_by          : root
cc_compile_domain      :
cc_compile_date        : Fri May 29 13:18:41 UTC 2020
build_id               : cd3948792d88ec0bc45e03b227f6cbab9572b76b
xend_config_format     : 4

---

Best regards,
Michał Leszczyński