
Re: [Xen-devel] Help commissioning x86 boxes intended for builds [himrod[012]]

>>> On 18.02.19 at 17:27, <ian.jackson@xxxxxxxxxx> wrote:
> The symptom is that, occasionally, the network stops working for a
> while.  It then comes back, spontaneously.  There are no log messages
> recorded on the box itself in /var/log for this; no messages on the
> serial console.
> The failure probability is about 10% for any one individual test job.
> It seems to do it only under Xen with our own kernels (4.14.x).
> For initial installation and for builds we use stock Debian
> kernels (currently, jessie, so 3.16.56-1 for the installer and
> 3.16.57-2 for the installed system); and I haven't seen failures
> there.  I have not tried other combinations (yet).

So one obvious step would be to determine whether this depends on the
kernel version or on Xen. Of course, as long as the problem is only
observed under Xen, it remains unclear which of the two is at fault; but
if it turns out to depend on the kernel version, that at least points a
little more clearly in one direction.
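With a failure rate of only about 10% per job, a combination that passes
a handful of jobs has not necessarily been cleared. The sketch below
estimates how many consecutive clean jobs are needed before a combination
can reasonably be ruled out. It assumes, without this being established
anywhere in the thread, that failures are independent across jobs and
occur at a constant rate. The function name runs_needed is made up for
illustration.

```python
import math

def runs_needed(p_fail, confidence):
    """Smallest n such that a configuration that still fails with
    probability p_fail per job would pass n jobs in a row with
    probability at most 1 - confidence.

    Assumes independent jobs with a constant failure rate (hypothetical).
    """
    # P(n consecutive passes | configuration still broken) = (1 - p_fail) ** n
    # Require (1 - p_fail) ** n <= 1 - confidence and solve for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))

# With the ~10% rate from the report above:
print(runs_needed(0.1, 0.95))  # 29 clean jobs for 95% confidence
print(runs_needed(0.1, 0.99))  # 44 clean jobs for 99% confidence
```

In other words, each kernel/Xen combination would need a few dozen clean
jobs, not just a few, before concluding that it is unaffected.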

The symptom reminds me of behavior I've been observing on one of my
systems, except there it affects USB (i.e. keyboard and mouse). I see it
both with and without Xen. This is a rather old box, so until now I
haven't really invested time in figuring out what the cause is (and of
course, at least initially, I was also hoping that others would observe
something similar and that it would get fixed without me looking into it).

Other, more general troubleshooting I'd suggest would be to check whether
turning off the IOMMU helps, or whether things work any better when
booting fewer than all 56 CPUs.
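A minimal sketch of the resulting test matrix, assuming the two knobs are
set on the Xen hypervisor command line: "iommu=0" disables the IOMMU and
"maxcpus=<n>" limits the number of CPUs Xen brings up. The helper name,
the base-options parameter and the reduced CPU count of 8 are
hypothetical choices for illustration; how the resulting string reaches
the bootloader is left out.

```python
from itertools import product

TOTAL_CPUS = 56  # CPU count of the boxes, as stated in this thread

def xen_cmdline_variants(base, cpu_counts):
    """Hypothetical helper: return one Xen command line per combination
    of (IOMMU disabled or not) x (number of CPUs to bring up)."""
    variants = []
    for iommu_off, ncpus in product((False, True), cpu_counts):
        opts = [base] if base else []  # keep any existing options first
        if iommu_off:
            opts.append("iommu=0")  # turn the IOMMU off
        if ncpus < TOTAL_CPUS:
            opts.append(f"maxcpus={ncpus}")  # bring up fewer CPUs
        variants.append(" ".join(opts))
    return variants

for cmdline in xen_cmdline_variants("", [TOTAL_CPUS, 8]):
    print(repr(cmdline))
```

This yields four variants: the unchanged baseline, CPUs limited only,
IOMMU off only, and both together. Each would then need a run of clean
jobs long enough to be meaningful given the low failure rate.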

That there is no indication at all in either the kernel or the
hypervisor logs is, of course, rather unexpected.


Xen-devel mailing list


