Re: [Xen-devel] Host freezing after "fixing" recursive fault starting in multicalls.c



On 18.01.2020 19:59, Peter.Kurfer@xxxxxxxx wrote:
> Hi,
> 
> I was advised to also post this to the devel mailing list, because the 
> mentioned error message was apparently added in kernel 4.20 and this 
> kernel version is not yet broadly adopted, so it is unlikely that another 
> user has already encountered a similar problem. 
> 
> Original message (see also here: 
> https://lists.xenproject.org/archives/html/xen-users/2020-01/msg00013.html )
> 
> I'm running Xen 4.11.2 on Fedora 30 with Kernel versions 5.4.7 and 5.4.10 on 
> multiple HP servers.
> 
> The workflow I'm trying to achieve looks like the following:
> 
> - a VM is resumed from a snapshot with a Python script using the libvirt API
> - it is running for a few minutes,
> - it gets paused and finally destroyed for testing purposes
> 
> At some point - it doesn't seem to be deterministic, because sometimes it 
> happens directly after boot and sometimes after multiple hours - a huge 
> stack trace starting with an error in `arch/x86/xen/multicalls.c` can be 
> found in the kernel logs, ending with the message 'Fixing recursive fault 
> but reboot is needed!'.
> 
> After some time the system freezes completely and needs a hard reset, 
> because it is no longer possible to log in via SSH.
> The freeze is not deterministic either, but there are no other critical 
> errors in the logs, so the two seem to be related.
> 
> Because the full stack trace is roughly 370 lines long, I attached it as a 
> GitHub Gist:
> 
> https://gist.github.com/baez90/135c3985cbb6fd4b4204269fb384221a
> 
> I'm a little confused as to what else to try and I have no idea what the 
> problem might be.
> 
> Any hints/ideas/proposals?
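
For readers trying to reproduce this, the workflow quoted above might look
roughly like the sketch below. It is not the reporter's script: the domain
name, snapshot name, run time and connection URI are placeholders, and it
assumes "resumed from a snapshot" means a libvirt snapshot revert (it could
equally be a restore from a saved image). The function takes an already
opened connection so it can be exercised without a hypervisor.

```python
import time


def run_cycle(conn, domain_name, snapshot_name, run_seconds, sleep=time.sleep):
    """Revert a domain to a snapshot, let it run, then pause and destroy it.

    `conn` is expected to behave like a libvirt.virConnect object.
    All names passed in are placeholders chosen by the caller.
    """
    dom = conn.lookupByName(domain_name)
    snap = dom.snapshotLookupByName(snapshot_name, 0)
    dom.revertToSnapshot(snap, 0)  # "resumed from a snapshot" (assumed meaning)
    sleep(run_seconds)             # "running for a few minutes"
    dom.suspend()                  # "gets paused"
    dom.destroy()                  # "and finally destroyed"


# Hypothetical usage with the libvirt-python bindings installed:
#
#   import libvirt
#   conn = libvirt.open("xen:///system")   # assumed connection URI
#   try:
#       run_cycle(conn, "test-vm", "clean-state", run_seconds=180)
#   finally:
#       conn.close()
```

Running such a cycle in a loop while watching the kernel log may help narrow
down which step precedes the multicall error.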

A debug hypervisor would most likely spit out a log message for every
individual failure. Seeing these messages may help diagnose what's
wrong. Knowing more about what exactly triggers this may also help, but
judging from your report that may be difficult to isolate. Of course all
of this applies only if no-one has already found an explanation
(and then perhaps also a fix) for this.

Jan
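
Since the failure is intermittent, it may help to scan saved kernel logs
for the two markers named in the report. The sketch below only looks for
the strings quoted in the original message (the `multicalls.c` path and the
"Fixing recursive fault" line); it makes no assumption about the format of
any additional messages a debug hypervisor would print.

```python
FAULT_MARKER = "Fixing recursive fault but reboot is needed!"
MULTICALL_PATH = "arch/x86/xen/multicalls.c"


def scan_kernel_log(lines):
    """Return (multicall_hits, fault_hits) as lists of 1-based line numbers.

    `lines` can be any iterable of strings, e.g. an open log file.
    """
    multicall_hits = []
    fault_hits = []
    for number, line in enumerate(lines, start=1):
        if MULTICALL_PATH in line:
            multicall_hits.append(number)
        if FAULT_MARKER in line:
            fault_hits.append(number)
    return multicall_hits, fault_hits
```

For example, `scan_kernel_log(open("kernel.log"))` on a saved log (the file
name is a placeholder) returns the line numbers where each marker occurs, so
the distance between the first multicall error and the final fault message
can be compared across hosts.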


 

