Re: Recent upgrade of 4.13 -> 4.14 issue



On 16.12.2020 13:19, Liwei wrote:
> On Wed, 16 Dec 2020 at 16:12, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 15.12.2020 20:08, Liwei wrote:
>>> Hi list,
>>>     This is a reply to the thread of the same title (linked here:
>>> https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxxxxxxxxx/msg84916.html
>>> ) which I could not reply to because I receive this list by digest.
>>>
>>>     I'm unclear if this is exactly the reason, but I experienced the
>>> same symptoms when upgrading to 4.14. The issue does not occur if I
>>> downgrade to 4.11 (the previous version that was provided by Debian).
>>> Kernel is 5.9.11 and unchanged between Xen versions.
>>>
>>>     One thing I noticed is that if I disable the monitor/mwait
>>> instructions on my CPU (Intel Xeon E5-2699 v4 ES), the stalls seem to
>>> occur later into the boot. With the instructions enabled, the system
>>> usually stalls less than a few minutes after boot; disabled, it can
>>> last for tens of minutes.
>>>
>>>     Further disabling the HPET or forcing the kernel to use PIT causes
>>> it to be somewhat usable. The stalls still occur tens of minutes in
>>> but somehow everything seems to continue chugging along fine?
>>
>> By "the kernel" do you really mean the kernel, or Xen?
> 
> Sorry, I mean Xen. Too used to thinking that Xen isn't there when I'm
> talking about dom0.
> 
>>
>>>     I've also verified that the stalls do not occur in any of the above
>>> cases if I just boot into the kernel without Xen.
>>>
>>>     When the stalls happen, I get the "rcu: INFO: rcu_sched detected
>>> stalls on CPUs/tasks" backtraces printed on the console periodically,
>>> but keystrokes don't do anything on the console, and I can't spawn new
>>> SSH sessions even though pinging the system produces a reply. The last
>>> item in the call trace is usually "xen_safe_halt", but I've seen it
>>> occur for other functions related to btrfs and the network adapter as
>>> well.
>>
>> The kernel log may not be the only relevant thing here - the hypervisor
>> log may also need looking at (with full verbosity enabled and
>> preferably a debug build in use).
> 
> I will build a debug version and get back to you on that. Do I just
> have loglvl and guest_loglvl set to full, console to ring, and get the
> entire serial spew? I recall that you wanted to see the I, q and r
> outputs as well.

Yes. The debug keys are kind of optional at a first step, but it won't
hurt if you include them unless your box is a really big one.

Jan
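
For readers following along, the logging setup discussed above might look
roughly like the fragment below on a Debian system. This is only a sketch:
the GRUB_CMDLINE_XEN_DEFAULT variable, the serial settings, and the xl
commands in the comments are assumptions that do not appear in this thread.
Note that the Xen value for full verbosity is spelled "all".

```shell
# /etc/default/grub fragment (assumed Debian layout; run update-grub afterwards).
# loglvl=all / guest_loglvl=all : full hypervisor and guest log verbosity
# console_to_ring               : copy guest console output into Xen's console ring
# com1=... console=com1,vga     : example serial settings, adjust to the hardware
GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all console_to_ring com1=115200,8n1 console=com1,vga"

# Once booted, the debug keys mentioned above could be triggered from dom0 with:
#   xl debug-keys Iqr
#   xl dmesg > xen-dmesg.txt
```

The xl commands are left as comments because they only make sense on a
running Xen host; the serial capture on the other end of com1 would then
hold both the boot log and the debug-key output.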
