
Re: [Xen-devel] [xen-unstable test] 142973: regressions - FAIL



Jürgen Groß writes ("Re: [Xen-devel] [xen-unstable test] 142973: regressions - FAIL"):
> On 21.10.19 10:23, osstest service owner wrote:
> > flight 142973 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/142973/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >   test-amd64-amd64-xl-pvshim   18 guest-localmigrate/x10   fail REGR. vs. 
> > 142750
> 
> Roger, I believe you have looked into that one?
> 
> I guess the conversation via IRC with Ian regarding the race between
> blkback and OSStest was related to the issue?

I think this failure is something else.

What happens here is this:

2019-10-21 02:58:32 Z executing ssh ... -v root@172.16.145.205 date 
[bunch of output from ssh]
status (timed out) at Osstest/TestSupport.pm line 550.
2019-10-21 02:58:42 Z exit status 4

172.16.145.205 is the guest here.  I.e., running `date' on the guest
via ssh took longer than 10s.
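The pattern osstest applies here can be sketched as follows.  This is
a hedged illustration only: the real check is Perl, in
Osstest/TestSupport.pm, and `run_with_budget' is an invented name.

```python
import subprocess

# Hedged sketch of "run a command under a hard time budget and treat
# an overrun as failure" -- which is what produced the "status (timed
# out)" line in the log above.  Not osstest's actual implementation.
def run_with_budget(cmd, budget_s):
    """Return True iff cmd exits 0 within budget_s seconds."""
    try:
        subprocess.run(cmd, check=True, timeout=budget_s)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False

# The failing step was equivalent to:
#   run_with_budget(["ssh", "root@172.16.145.205", "date"], 10)
```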

We can see that the guest networking is working soon after the
migration because we got most of the way through the ssh protocol
exchange.  On the previous repetition the next message from ssh was
   debug1: SSH2_MSG_SERVICE_ACCEPT received

Looking at
  http://logs.test-lab.xenproject.org/osstest/logs/142973/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
which is, I think, the log of the "new" instance of guest, after
migration, there are messages about killing various services.  Eg
  [1918064738.820550] systemd[1]: systemd-udevd.service: Main process
  exited, code=killed, status=6/ABRT
They don't seem to be normal.  For example:
  http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
is the previous xen-unstable flight and it doesn't have them.  I
looked in
  http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest.log.gz
too and that has some alarming messages from the kernel like
 [  686.692660] rcu_sched kthread starved for 1918092123128 jiffies!
 g18446744073709551359 c18446744073709551358 f0x0 RCU_GP_WAIT_FQS(3)
 ->state=0x0 ->cpu=0
and accompanying stack traces.  But the test passed there.  I think
that is probably something else?

ABRT suggests guest memory corruption.
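For reference, the status=6 in the systemd message quoted above is
the number of the signal that killed the service's main process:

```python
import signal

# systemd's "code=killed, status=6/ABRT" reports the killing signal;
# signal number 6 is SIGABRT.
print(signal.Signals(6).name)
```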

> If this is the case, could you, Ian, please add the workaround you were
> thinking of to OSStest (unconditional for now, maybe make it conditional
> later)?

I can add the block race workaround but I don't think it will help
with migration anyway.  The case where things go wrong is destroy.

Roger, am I right that a normal guest shutdown is race-free?  I think
we tear things down in a slower manner and will therefore end up
waiting for blkback?  Or is that not true?

Maybe the right workaround is to disable the code in osstest which
tries to clean up a previous failed run.  I think the kernel doesn't
mind multiple blkfronts (or indeed multiple other tasks) using the
same device at once.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
