
Re: [arm] Dom0 hangs after enable KPROBE_EVENTS and/or UPROBE_EVENTS in kernel config


  • To: Julien Grall <julien@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Oleksii Moisieiev <Oleksii_Moisieiev@xxxxxxxx>
  • Date: Wed, 21 Jul 2021 18:28:13 +0000
  • Accept-language: en-US
  • Cc: Andrii Anisov <Andrii_Anisov@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Wed, 21 Jul 2021 18:28:29 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXfk7sayRbwafWg0WziHoeeCAd5atNqbDZ
  • Thread-topic: [arm] Dom0 hangs after enable KPROBE_EVENTS and/or UPROBE_EVENTS in kernel config

Please see my answers below.


From: Julien Grall <julien@xxxxxxx>
Sent: Wednesday, July 21, 2021 7:39 PM
To: Oleksii Moisieiev <Oleksii_Moisieiev@xxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrii Anisov <Andrii_Anisov@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>
Subject: Re: [arm] Dom0 hangs after enable KPROBE_EVENTS and/or UPROBE_EVENTS in kernel config
 
On 21/07/2021 15:40, Oleksii Moisieiev wrote:
> Hello Julien,

Hello,

>>>
>>> My setup:
>>> Board: H3ULCB Kingfisher board
>>> Xen: revision dba774896f7dd74773c14d537643b7d7477fefcd (stable-4.15)
>>> https://urldefense.com/v3/__https://github.com/xen-project/xen.git__;!!GF_29dbcQIUBPA!m4NHC2XbbSHWWZjQ7CX1ZZhaET6l0bQhZo581jtCmpst8E8JBp8Qri3haIaks6cbo7Ri$
>>> Kernel: revision 09162bc32c880a791c6c0668ce0745cf7958f576 (v5.10-rc4)
>
>>Hmmm... 5.10 was released a few months ago and there are probably a few
>>stable releases for that version. Can you try the latest 5.10 stable?
>
> Switched to tag v5.10 rev: 2c85ebc57b3e of
> https://urldefense.com/v3/__https://github.com/torvalds/linux.git__;!!GF_29dbcQIUBPA!hJARiSsCASVNpAQxrnN-7sFsVHHTS39sjRraLqBkD6AoaCbplgoyiv-iCGlHhXafbPNc$ [github[.]com]
> and got the same problem: I see no output from the kernel. All tests
> were done with the earlycon parameter set in the kernel cmdline.
The tag v5.10 is the first official release. What I meant is using the
stable branch from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (branch
linux-5.10.y).

I need some time to download and build the mainline kernel. I'll test this scenario and send you the results tomorrow.
>
>>>
>>> https://urldefense.com/v3/__https://github.com/torvalds/linux.git__;!!GF_29dbcQIUBPA!m4NHC2XbbSHWWZjQ7CX1ZZhaET6l0bQhZo581jtCmpst8E8JBp8Qri3haIaks29w69MC$
>>>
>>> kernel config: see attached;
>>>
>>> dtb: see attached;
>
>>Please avoid large attachments as they will be duplicated in every
>>mailbox. Instead, in the future, please upload them somewhere (your own
>>web server, pastebin...) and provide a link in the e-mail.
>
> I'm sorry for that.
>
>>>
>>>
>>> If kprobe/uprobe events are enabled - I see no output after xen switched
>>> input to Dom0, if disabled - system boots up successfully.
>>The console subsystem tends to be enabled quite late in the boot
>>process. So this may mean a panic during early boot.
>
>>If you haven't done so yet, I would suggest adding earlycon=xenboot to the
>>dom0 command line. This will print some messages during early boot.
>
> All tests were done with earlycon parameter set in the kernel command
> line (xen, dom0-bootargs).
>
>>>
>>> Both configs work fine when I boot without xen.
>>>
>>>
>>> Dom0 information from the Xen console shows that only one CPU works, and the PC
>>> stays in the "__arch_counter_get_cntvct" function on the read_sysreg call.
>>>
>>> I did further investigation and found that kernel 5.4 doesn't have this
>>> issue.
>>> After bisecting the kernel between 5.4 and 5.10, I found that output
>>> disappeared on commit:
>>>
>>> 76085aff29f585139a37a10ea0a7daa63f70872c
>
>> From the information you provided so far, I am a bit confused how this
>>could be the source of the problem. But given this is not the latest
>>5.10, I will wait for you to confirm the bug is still present before
>>providing more input.
>
> I was confused by this commit too. As I mentioned above, I've
> checked with the latest stable 5.10 kernel and still got the same problem.

Thanks for the testing. I am not quite sure where this may fail.
Maybe Stefano has an idea?

If you have an external debugger, can you use it to get a stack trace?
Otherwise, I would suggest to add some xen_raw_printk() in the code to
figure out where it may fail.
Unfortunately, I don't have an external debugger right now (my test board is located in a different country).
Let me share the results of the investigation I did before asking the community for help. I haven't shared them before because I wasn't sure they were related.

I hit the missing printk output on the linux-bsp kernel taken from the latest Renesas Yocto release: https://elinux.org/R-Car/Boards/Yocto-Gen3/v5.1.0

My original kernel is based on rev 301d2c636929be96f3d87b1b5d287f87ed67a7be of the linux-bsp kernel.

I added HYPERVISOR_console_io calls to the code and got the following backtrace:
It's a little messy, but still readable. I added extra prints to the dump_stack function to be able to see the backtrace. In this case, it looks like the problem is that the system is unable to get a free descriptor from printk_ringbuffer.
In file kernel/printk/printk_ringbuffer.c, function desc_reserve, line:

prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */

we get 18446744073709551615 (0xffffffffffffffff) while expecting 0.
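
A note on the value above: 18446744073709551615 is the largest 64-bit unsigned integer, so every bit of the state word is set (read as a signed 64-bit value it is -1). The sketch below is a small user-space helper, not kernel code; the names format_state and is_all_ones are hypothetical and only illustrate how printing such a value in hex makes the all-ones pattern obvious.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit state word as fixed-width hex. */
static int format_state(char *buf, size_t len, uint64_t v)
{
    return snprintf(buf, len, "0x%016llx", (unsigned long long)v);
}

/* Hypothetical helper: true when every bit of the word is set. */
static int is_all_ones(uint64_t v)
{
    return v == UINT64_MAX;
}
```

Printing the descriptor state in hex rather than decimal in the debug output would make it easier to tell an all-ones word apart from an arbitrary large value.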

But the problem seems to lie deeper, because when I switched to the mainline kernel, it hangs on the read_sysreg call in the __arch_counter_get_cntvct function.


Then I reverted commit 76085aff29f585139a37a10ea0a7daa63f70872c and this fixed the problem with no printk output.
Now, with the commit reverted, I see the kernel output with the "CPUs started in inconsistent modes" error. It looks like commit 76085aff29f585139a37a10ea0a7daa63f70872c is the cause of the no-output issue.
>
>>>
>>>
>>> Another issue, which was revealed after I got kernel output was kernel
>>> oops with message that CPU is in inconsistent state.
>>>
>>> [0.415612] EFI services will not be available.
>>>
>>> [0.420267] smp: Bringing up secondary CPUs ...
>>>
>>> [0.425185] Detected PIPT I-cache on CPU1
>>>
>>> [0.425267] Xen: initializing cpu1
>>>
>>> [0.425292] CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>>>
>>> [0.425815] Detected PIPT I-cache on CPU2
>>>
>>> [0.425879] Xen: initializing cpu2
>>>
>>> [0.425899] CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>>>
>>> [0.426362] Detected PIPT I-cache on CPU3
>>>
>>> [0.426425] Xen: initializing cpu3
>>>
>>> [0.426444] CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>>>
>>> [0.426537] smp: Brought up 1 node, 4 CPUs
>>>
>>> [0.472807] SMP: Total of 4 processors activated.
>>>
>>> [0.477551] CPU features: detected: 32-bit EL0 Support
>>>
>>> [0.482745] CPU features: detected: CRC32 instructions
>>>
>>> [0.499470] ------------[ cut here ]------------
>>>
>>> [0.504034] CPU: CPUs started in inconsistent modes
>
>>Looking at Linux 5.7 code, this is printed when not all the CPUs are
>>booted in the same mode (i.e. EL1 or EL2).
>
>>This is quite odd. So let me ask a question first, did you see this
>>error during the bisection or on the latest 5.7?
>
> Switched to kernel v5.7 tag, rev:3d77e6a8804.

Similar to 5.10, please try the latest stable in the linux-stable repo
(branch linux-5.7.y). If this still doesn't help...

I need some time to download and build the mainline kernel. I'll test this scenario and send you the results tomorrow.
> On 5.7 kernel I can see printk output, but getting CPUs started in
> inconsistent modes error.
> Also, I tried with hmp-unsafe=false (in the Xen cmdline), so only CPUs 0-3
> were enabled, and still got the same issue.
... can you print __boot_cpu_mode[0] and __boot_cpu_mode[1]?
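
For context on why those two values matter, here is a minimal user-space model of how the arm64 kernel appears to track boot modes. It is a sketch written from memory of arch/arm64 (head.S and asm/virt.h), so the constant values and the slot logic are assumptions to verify against your tree. The struct and function names are hypothetical stand-ins, not kernel symbols.

```c
#include <stdint.h>

/* Assumed to match arch/arm64/include/asm/virt.h; verify in your tree. */
#define BOOT_CPU_MODE_EL1 0xe11u
#define BOOT_CPU_MODE_EL2 0xe12u

/* User-space stand-in for the kernel's __boot_cpu_mode[2] array. */
struct boot_modes {
    uint32_t mode[2];
};

/* Initial contents are deliberately mismatched: { EL2, EL1 }. */
static struct boot_modes boot_modes_init(void)
{
    struct boot_modes b = { { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 } };
    return b;
}

/* Each CPU records its mode: EL1 goes to slot 0, EL2 to slot 1. */
static void record_cpu_boot_mode(struct boot_modes *b, uint32_t mode)
{
    b->mode[mode == BOOT_CPU_MODE_EL2 ? 1 : 0] = mode;
}

/* The "inconsistent modes" warning corresponds to the slots differing. */
static int modes_mismatched(const struct boot_modes *b)
{
    return b->mode[0] != b->mode[1];
}
```

Under this model, a dom0 whose CPUs all enter at EL1 should end up with both slots equal to 0xe11. If the printed values are still 0xe12 and 0xe11, that would suggest the write from the boot path was not observed; if they are 0xe11 and 0xe12, some CPU reported EL2.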

Cheers,

--
Julien Grall

 

