
Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL



On Fri, Aug 24, 2018 at 09:58:02AM +0100, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> > On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > > On 21.08.18 at 03:11, <osstest-admin@xxxxxxxxxxxxxx> wrote:
> > > > flight 126201 xen-4.9-testing real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > > > 
> > > > Regressions :-(
> > > > 
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > >   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> > > 
> > > Something needs to be done about this, as this continued failure is
> > > blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> > > for flight 125710, I've got back from Wei:
> > > 
> > > > This is libvirtd's error message.
> > > > 
> > > > The remote host can't obtain the state change lock because it is
> > > > already held by another task/thread. It could be a libvirt / libxl bug.
> > > > 
> > > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
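The warning above is libvirt's libxl driver declining to start a second "modify" job while another is still active. As a rough illustration only, the Python sketch below models a per-domain job lock in which a caller waits a bounded time for the current job to end and then gives up with a similar message. It assumes the acquiring side waits for a limited time before failing; the class, its methods, `demo`, and the `wait_timeout` parameter are hypothetical. libvirt's real implementation is in C, and its wait time is not stated in this thread.

```python
import threading
import time


class DomainJobLock:
    """Hypothetical per-domain job lock (illustration, not libvirt's code)."""

    def __init__(self, wait_timeout: float):
        self._cond = threading.Condition()
        self._job = None      # name of the job currently running, if any
        self._owner = None    # identifier of whoever holds the job
        self.wait_timeout = wait_timeout  # hypothetical bounded wait, seconds

    def begin_job(self, job: str, owner: int) -> None:
        with self._cond:
            # wait_for() releases the lock while waiting and returns False
            # if the job is still held when the timeout expires.
            free = self._cond.wait_for(lambda: self._job is None,
                                       timeout=self.wait_timeout)
            if not free:
                raise RuntimeError(
                    f"Cannot start job ({job}); current job is "
                    f"({self._job}) owned by ({self._owner})")
            self._job, self._owner = job, owner

    def end_job(self) -> None:
        with self._cond:
            self._job, self._owner = None, None
            self._cond.notify_all()  # wake any caller waiting in begin_job


def demo(hold_seconds: float, wait_timeout: float) -> dict:
    """Hypothetical 'perform' phase holds the job while a 'finish' phase waits."""
    lock = DomainJobLock(wait_timeout)
    lock.begin_job("modify", owner=1)          # perform phase takes the job
    result = {}

    def finish() -> None:
        try:
            lock.begin_job("modify", owner=2)  # finish phase wants the job too
            result["ok"] = True
            lock.end_job()
        except RuntimeError as err:
            result["ok"] = False
            result["err"] = str(err)

    t = threading.Thread(target=finish)
    t.start()
    time.sleep(hold_seconds)  # stands in for e.g. waiting on a slow device model
    lock.end_job()            # perform phase releases the job
    t.join()
    return result


if __name__ == "__main__":
    print(demo(hold_seconds=0.3, wait_timeout=0.1))   # holder outlasts the wait
    print(demo(hold_seconds=0.05, wait_timeout=1.0))  # waiter gets the job
```

In this model the outcome depends only on whether the holder releases the job before the waiter's deadline, which is why a slow step inside the holding phase can make the other phase fail even if that slow step itself eventually succeeds.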
> > 
> > I took a closer look at the logs, and it appears the finish phase of
> > migration fails to acquire the domain job lock because it is already held by
> > the perform phase. In the perform phase, after the VM has been transferred
> > to the destination, the QEMU process associated with the VM is started. For
> > whatever reason, that takes a long time on this host:
> > 
> > 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm: Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments: ...
> > 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=(null)
> 
> This is a spurious event after the watch has been set up.
> 
> > ...
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback: watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event epath=/local/domain/0/device-model/1/state
> > 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=running
> 
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).
> 
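The 32s figure can be checked directly from the two libxl timestamps quoted above (device-model spawn vs. the "running" state event). This Python sketch assumes the timestamp format shown in those log lines; `LIBXL_DM_TIMEOUT` is a hypothetical name for the 60s libxl limit mentioned in the thread, not an identifier taken from libxl.

```python
from datetime import datetime

# Timestamp format used by the libxl log lines quoted above.
LOG_FMT = "%Y-%m-%d %H:%M:%S.%f%z"

spawned = datetime.strptime("2018-08-19 17:05:19.182+0000", LOG_FMT)  # spawn
running = datetime.strptime("2018-08-19 17:05:51.529+0000", LOG_FMT)  # p=running

elapsed = (running - spawned).total_seconds()
LIBXL_DM_TIMEOUT = 60  # hypothetical name for the 60s libxl limit, in seconds

print(f"QEMU took {elapsed:.3f}s to reach 'running'")
print("within libxl timeout:", elapsed < LIBXL_DM_TIMEOUT)
```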

I haven't been able to reliably reproduce the timeout.

One thing I observe is that libvirt picks the qdisk backend while xl picks
the phys backend.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

