
Re: [Xen-devel] [libvirt test] 58119: regressions - FAIL



On Tue, Jun 23, 2015 at 02:32:23PM +0100, Anthony PERARD wrote:
> On Tue, Jun 23, 2015 at 01:57:18PM +0100, Ian Campbell wrote:
> > On Tue, 2015-06-23 at 12:15 +0100, Anthony PERARD wrote:
> > > On Mon, Jun 08, 2015 at 10:22:28AM +0100, Ian Campbell wrote:
> > > > On Mon, 2015-06-08 at 04:37 +0000, osstest service user wrote:
> > > > > flight 58119 libvirt real [real]
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/58119/
> > > > > 
> > > > > Regressions :-(
> > > > > 
> > > > > Tests which did not succeed and are blocking,
> > > > > including tests which could not be run:
> > > > 
> > > > This has been failing for a while now, sorry for not bringing it to your
> > > > attention sooner.
> > > 
> > > > libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch 
> > > > w=0x7f805c25b248 wpath=/local/domain/0/device-model/1/state token=3/0: 
> > > > deregister slotnum=3
> > > > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device 
> > > > model: startup timed out
> > > > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch 
> > > > w=0x7f805c25b248: deregister unregistered
> > > > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch 
> > > > w=0x7f805c25b248: deregister unregistered
> > > > libxl: error: libxl_dm.c:1564:device_model_spawn_outcome: domain 1 
> > > > device model: spawn failed (rc=-3)
> > > > libxl: error: libxl_create.c:1373:domcreate_devmodel_started: device 
> > > > model did not start: -3
> > > 
> > > Hi,
> > > 
> > > I've tried to debug this "device model: startup timed out" issue that
> > > I'm seeing on OpenStack. What I've done is strace every single QEMU. It
> > > appears that QEMU takes more than 10s to load...
> > 
> > FWIW I've started running some adhoc osstest jobs on the Cambridge
> > instance too, first time everything passed. The second attempt I forced
> > onto the *-frog machines which are "AMD Opteron(tm) Processor 6168"
> > processors which is as close as I can get to the "AMD Opteron(tm)
> > Processor 6376" ones in merlot* and they also passed. That's not enough
> > data to really be going on though.
> > 
> > Do you happen to know what h/w the openstack tests run on? It is using
> > nested virt, is that right?
> 
> The straces I've sent come from a local machine that is running Xen on
> bare metal. It's an "AMD Opteron(tm) Processor 4284".
> Out of about 4100 domains created, there were only 16 device model startup
> timeouts. I gathered the data while running Tempest, which I asked to run
> 4 concurrent tests.

FYI, I have looked at how long it takes for QEMU to start, both from libxl's
point of view and from strace's point of view.

For libxl, I have looked at the time difference between a call to
libxl__ev_xswatch_register('device-model/$domid/path') and
libxl__qmp_initialize():
cat deltatime | sort | uniq -c
  2754 0:00:00
  1309 0:00:01
    12 0:00:02
     8 0:00:03
     5 0:00:04
     1 0:00:05
     4 0:00:06
     7 0:00:07
     6 0:00:08
     1 0:00:09
     2 0:00:10
    16 timeout: 0:00:10
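
For anyone who wants to reproduce this kind of table from their own logs, here is a minimal sketch of the bucketing step. It is not the script I used: the function name, the input format (pairs of start/ready timestamps in seconds, with None for a startup that never became ready) and the sample values are all made up for illustration, and truncating to whole seconds is an assumption about how the deltas were rounded.

```python
from collections import Counter
from datetime import timedelta


def bucket_deltas(pairs, timeout=timedelta(seconds=10)):
    """Count startups per whole-second delta, similar to `sort | uniq -c`.

    `pairs` is a list of (start, ready) timestamps in seconds; `ready` is
    None when the device model never answered (hypothetical input format).
    """
    counts = Counter()
    for start, ready in pairs:
        if ready is None:
            # Startup never completed: file it under the timeout bucket.
            counts[f"timeout: {timeout}"] += 1
        else:
            # Assumption: deltas are truncated to whole seconds.
            counts[str(timedelta(seconds=int(ready - start)))] += 1
    return counts


# Made-up sample data, not taken from the real logs.
sample = [(0.0, 0.4), (1.0, 2.3), (5.0, 5.2), (7.0, None)]
histogram = bucket_deltas(sample)
for label in sorted(histogram):
    print(f"{histogram[label]:7d} {label}")
```

The real timestamps would have to be extracted from the libxl debug output first; that parsing step is left out here because it depends on the log format.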

From the straces, I measured the time between the execve() call and the
moment QEMU responds to a QMP connection. The average is 0.316729s, and the
standard deviation is 0.460369s (the average and standard deviation do not
take into account the QEMUs that timed out). However, out of the 3386 QEMU
startups, 26 runs took between 2s and 10s, and 14 more QEMU runs timed
out.
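
As a rough illustration of how such a summary can be computed while leaving the timed-out runs out of the statistics, here is a small sketch. The function name, the input format (one duration in seconds per run, None for a timeout) and the sample values are hypothetical, and the choice of the population standard deviation is an assumption; a sample standard deviation (statistics.stdev) would give a slightly different number.

```python
import statistics


def summarize(durations, slow_threshold=2.0):
    """Summarize startup durations in seconds; None marks a timed-out run."""
    # Timed-out runs have no duration, so they are excluded from the stats.
    completed = [d for d in durations if d is not None]
    return {
        "mean": statistics.mean(completed),
        # Assumption: population standard deviation.
        "stdev": statistics.pstdev(completed),
        # Runs that completed but took at least `slow_threshold` seconds.
        "slow": sum(1 for d in completed if d >= slow_threshold),
        "timed_out": sum(1 for d in durations if d is None),
    }


# Made-up sample data, not taken from the real straces.
sample = [0.2, 0.3, 0.25, 4.0, None]
summary = summarize(sample)
print(summary)
```

Note how a single slow run pulls the mean well above the typical startup time, which is why the count of slow runs is reported separately from the average.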

I'm going to send a patch proposing to increase the timeout.

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel