
Re: [Xen-devel] [libvirt test] 58119: regressions - FAIL



On Tue, Jun 23, 2015 at 01:57:18PM +0100, Ian Campbell wrote:
> On Tue, 2015-06-23 at 12:15 +0100, Anthony PERARD wrote:
> > On Mon, Jun 08, 2015 at 10:22:28AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-06-08 at 04:37 +0000, osstest service user wrote:
> > > > flight 58119 libvirt real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/58119/
> > > > 
> > > > Regressions :-(
> > > > 
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > 
> > > This has been failing for a while now, sorry for not bringing it to your
> > > attention sooner.
> > 
> > > libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch 
> > > w=0x7f805c25b248 wpath=/local/domain/0/device-model/1/state token=3/0: 
> > > deregister slotnum=3
> > > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: 
> > > startup timed out
> > > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch 
> > > w=0x7f805c25b248: deregister unregistered
> > > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch 
> > > w=0x7f805c25b248: deregister unregistered
> > > libxl: error: libxl_dm.c:1564:device_model_spawn_outcome: domain 1 device 
> > > model: spawn failed (rc=-3)
> > > libxl: error: libxl_create.c:1373:domcreate_devmodel_started: device 
> > > model did not start: -3
> > 
> > Hi,
> > 
> > I've tried to debug this "device model: startup timed out" issue that I'm
> > seeing on OpenStack. What I've done is strace every single QEMU. It appears
> > that QEMU takes more than 10s to load...
> 
> FWIW I've started running some ad hoc osstest jobs on the Cambridge
> instance too; the first time, everything passed. For the second attempt I
> forced the jobs onto the *-frog machines, which have "AMD Opteron(tm)
> Processor 6168" processors. That is as close as I can get to the "AMD
> Opteron(tm) Processor 6376" ones in merlot*, and they also passed. That's
> not enough data to really be going on though.
> 
> Do you happen to know what h/w the OpenStack tests run on? They are using
> nested virt, is that right?

The straces I've sent come from a local machine running Xen on bare metal.
It's an "AMD Opteron(tm) Processor 4284".
Out of about 4100 domains created, only 16 hit the device model startup
timeout. I gathered the data while running Tempest, which I asked to run
4 concurrent tests.
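
To put those numbers in perspective, here is a rough sketch (not the script
used for the runs above) of how the timeouts could be counted from a libxl
log and turned into a failure rate. The function names and the regular
expression are assumptions made for illustration; only the error wording is
taken from the log quoted earlier in this thread.

```python
import re

# Error wording copied from the libxl log quoted in this thread.
# The regex itself is an illustrative assumption, not part of libxl.
TIMEOUT_RE = re.compile(r"domain (\d+) device model:\s*startup timed out")


def count_dm_timeouts(log_text):
    """Return the sorted domain IDs whose device model startup timed out."""
    # The log lines are wrapped in the mail, so collapse all whitespace
    # before matching to rejoin messages split across lines.
    joined = " ".join(log_text.split())
    return sorted({int(m.group(1)) for m in TIMEOUT_RE.finditer(joined)})


def timeout_rate(timeouts, domains_created):
    """Fraction of created domains that hit the startup timeout."""
    return timeouts / domains_created


sample = """libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: 
startup timed out"""

print(count_dm_timeouts(sample))          # domain IDs found in the sample
print(f"{timeout_rate(16, 4100):.2%}")    # rate for 16 timeouts in 4100 domains
```

With 16 timeouts out of roughly 4100 domains, that works out to about 0.39%
of domain creations.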

> Given that merlot* seems to have some sort of barking NUMA configuration
> SNAFU I wouldn't necessarily rule out "it's just really slow".
> 
> 10s does seem _very_ slow though, on an essentially idle system, no
> matter how badly its NUMA-ness is set up...

There is probably something specific about those merlot* machines. I think
the timeout issue I have with OpenStack is just a machine that has too much
to do.

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

