
Re: [BUG] Core scheduling patches causing deadlock in some situations



----- On 29 May 2020 at 15:15, Jürgen Groß jgross@xxxxxxxx wrote:

> On 29.05.20 14:51, Michał Leszczyński wrote:
>> ----- On 29 May 2020 at 14:44, Jürgen Groß jgross@xxxxxxxx wrote:
>> 
>>> On 29.05.20 14:30, Michał Leszczyński wrote:
>>>> Hello,
>>>>
>>>> I'm running DRAKVUF on a Dell Inc. PowerEdge R640/08HT8T server with an
>>>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz.
>>>> When upgrading from Xen RELEASE 4.12 to 4.13, we noticed some stability
>>>> problems concerning freezes of Dom0 (Debian Buster):
>>>>
>>>> ---
>>>>
>>>> maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
>>>> maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
>>>> maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
>>>> maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
>>>> maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
>>>> maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
>>>> maj 27 23:17:02 debian kernel: Call Trace:
>>>> maj 27 23:17:02 debian kernel: <IRQ>
>>>> maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
>>>> maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
>>>> maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
>>>> maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
>>>> maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
>>>> maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
>>>> maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
>>>> maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
>>>> maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60
>>>>
>>>> ---
>>>>
>>>> This usually results in the machine becoming completely unresponsive and
>>>> performing an automated reboot after some time.
>>>>
>>>> I've bisected the commits between 4.12 and 4.13, and it seems that this
>>>> is the patch which introduced the bug:
>>>> https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21
>>>>
>>>> Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from a
>>>> fresh boot of the machine on which the bug was reproduced.
>>>>
>>>> I'm also attaching the `xl info` output from this machine:
>>>>
>>>> ---
>>>>
>>>> release : 4.19.0-6-amd64
>>>> version : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
>>>> machine : x86_64
>>>> nr_cpus : 14
>>>> max_cpu_id : 223
>>>> nr_nodes : 1
>>>> cores_per_socket : 14
>>>> threads_per_core : 1
>>>> cpu_mhz : 2593.930
>>>> hw_caps : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
>>>> virt_caps : pv hvm hvm_directio pv_directio hap shadow
>>>> total_memory : 130541
>>>> free_memory : 63591
>>>> sharing_freed_memory : 0
>>>> sharing_used_memory : 0
>>>> outstanding_claims : 0
>>>> free_cpus : 0
>>>> xen_major : 4
>>>> xen_minor : 13
>>>> xen_extra : -unstable
>>>> xen_version : 4.13-unstable
>>>> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>>>> xen_scheduler : credit2
>>>> xen_pagesize : 4096
>>>> platform_params : virt_start=0xffff800000000000
>>>> xen_changeset : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
>>>
>>> Which is your original Xen base? This output is clearly obtained at the
>>> end of the bisect process.
>>>
>>> There have been quite a few fixes since the release of Xen 4.13, so
>>> please make sure you are running the most recent version (4.13.1).
>>>
>>>
>>> Juergen
>> 
>> Sure, we have tested both RELEASE 4.13 and RELEASE 4.13.1. Unfortunately,
>> these corrections didn't help and the bug is still reproducible.
>> 
>> From our testing it turns out that:
>> 
>> Known working revision: 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
>> Broken revision: 6278553325a9f76d37811923221b21db3882e017
>> First bad commit: 7c7b407e77724f37c4b448930777a59a479feb21
> 
> Would it be possible to test xen unstable, too?
> 
> I could imagine that e.g. commit b492c65da5ec5ed or 99266e31832fb4a4 has
> an impact here.
> 
> 
> Juergen


I've tried revision b492c65da5ec5ed, but there seems to be a problem with
ALTP2M support, so I can't launch anything at all.

maj 29 15:45:32 debian drakrun[1223]: Failed to set HVM_PARAM_ALTP2M, RC: -1
maj 29 15:45:32 debian drakrun[1223]: VMI_ERROR: xc_altp2m_switch_to_view returned rc: -1


xl info:

---

release                : 4.19.0-6-amd64
version                : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
machine                : x86_64
nr_cpus                : 14
max_cpu_id             : 223
nr_nodes               : 1
cores_per_socket       : 14
threads_per_core       : 1
cpu_mhz                : 2593.977
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share
total_memory           : 130541
free_memory            : 63591
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 14
xen_extra              : -unstable
xen_version            : 4.14-unstable
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Thu May 14 17:36:13 2020 +0200 git:b492c65da5-dirty
xen_commandline        : placeholder dom0_mem=65270M,max:65270M dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0 altp2m=1 smt=0 no-real-mode edd=off
cc_compiler            : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc_compile_by          : root
cc_compile_domain      :
cc_compile_date        : Fri May 29 13:18:41 UTC 2020
build_id               : cd3948792d88ec0bc45e03b227f6cbab9572b76b
xend_config_format     : 4

---
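
When comparing `xl info` dumps between bisect steps, it may help to parse them
rather than read them by eye. The following is only a minimal sketch and not
part of Xen or DRAKVUF: `parse_xl_info` and `parse_commandline` are
hypothetical helper names, and the parser assumes the "key : value" layout
shown above, treating any line that does not look like "key : ..." as a
continuation of the previous value (as happens when a mail client wraps long
lines). The sample text is an excerpt of the dump above.

```python
import re

# Matches "key   : value" as printed by `xl info`: the key is followed by
# whitespace and then a colon. Wrapped value lines do not have this shape.
KEY_VALUE = re.compile(r"^(\w+)\s+:\s*(.*)$")


def parse_xl_info(text):
    """Return a dict of fields; unmatched lines extend the previous value."""
    info = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "---":
            continue
        match = KEY_VALUE.match(line)
        if match:
            last_key, value = match.groups()
            info[last_key] = value
        elif last_key is not None:
            # Continuation of a value that was wrapped onto the next line.
            info[last_key] = (info[last_key] + " " + line).strip()
    return info


def parse_commandline(cmdline):
    """Split a Xen command line into {option: value}; bare flags map to True."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else True
    return options


# Excerpt of the `xl info` output above, including mail-wrapped lines.
SAMPLE = """\
hw_caps                :
bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow
iommu_hap_pt_share
xen_version            : 4.14-unstable
xen_changeset          : Thu May 14 17:36:13 2020 +0200 git:b492c65da5-dirty
xen_commandline        : placeholder dom0_mem=65270M,max:65270M
dom0_max_vcpus=6 dom0_vcpus_pin=1 force-ept=1 ept=pml=0 hap_1gb=0 hap_2mb=0
altp2m=1 smt=0 no-real-mode edd=off
"""

info = parse_xl_info(SAMPLE)
options = parse_commandline(info["xen_commandline"])

print("version:  ", info["xen_version"])
print("changeset:", info["xen_changeset"])
print("altp2m:   ", options.get("altp2m"))
```

This only confirms what was requested on the hypervisor command line (here
`altp2m=1`); it says nothing about why setting HVM_PARAM_ALTP2M fails.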

Best regards,
Michał Leszczyński



 

