
Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - FAIL


On 13/12/2019 11:40, Ian Jackson wrote:
Julien Grall writes ("Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - 
AMD Seattle boards (laxton*) are known to fail to boot from time to time
because of a PCI training issue. We have a workaround for it (involving
a longer power cycle), but it is not 100% reliable.

This wasn't a power cycle.  It was a software-initiated reboot.  It
does appear to hang in the firmware somewhere.  Do we expect the PCI
training issue to occur in this case?

The PCI training happens at every reset (including software-initiated ones), so I may have confused the workaround for firmware corruption with the PCI training issue. We definitely have a workaround for the former.

For the latter, I can't remember whether we used a new firmware or just hoped it would not happen often.

I think we had a thread on infra@ about the workaround some time last year. Sadly, it was sent to my Arm e-mail address and I didn't archive it before leaving :(. Could you check whether you can find the thread?

   test-armhf-armhf-xl-vhd      18 leak-check/check         fail REGR. vs. 144673

That one is strange. A qemu process seems to have died producing
a core file, but I couldn't find any log containing any other indication
of a crashed program.

I haven't found anything interesting in the log. @Ian, could you set up
a repro for this?

There is some heisenbug where qemu crashes with very low probability.
(I forget whether only on Arm or on x86 too.)  This has been around
for a little while.  I doubt this particular failure will be

I can't remember such a bug being reported on Arm before. Anyway, I managed to get the stack trace from gdb:

Core was generated by `/usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
531     /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c: No such file or directory.
[Current thread is 1 (LWP 1987)]
(gdb) bt
#0  0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
#1  0x0063447c in xen_block_dataplane_event (opaque=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:626
#2  0x008d005c in xen_device_poll (opaque=0x107a3b0) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/xen/xen-bus.c:1077
#3  0x00a4175c in run_poll_handlers_once (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:520
#4  0x00a41826 in run_poll_handlers (ctx=0x1079708, max_ns=8000, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:562
#5  0x00a41956 in try_poll_mode (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:597
#6  0x00a41a2c in aio_poll (ctx=0x1079708, blocking=true) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:639
#7  0x0071dc16 in iothread_run (opaque=0x107d328) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/iothread.c:75
#8  0x00a44c80 in qemu_thread_start (args=0x1079538) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/qemu-thread-posix.c:502
#9  0xb67ae5d8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

This looks like a race condition between the init/free code and the handler. Anthony, does this ring any bells?


Julien Grall

Xen-devel mailing list


