
Re: [Xen-devel] [PATCH V3] libxl: Increase device model startup timeout to 1min.



On Tue, 2015-07-14 at 10:37 +0100, Ian Campbell wrote:
> On Tue, 2015-07-14 at 11:25 +0200, Dario Faggioli wrote:
> > On Tue, 2015-07-14 at 08:55 +0100, Ian Campbell wrote:

> > > It'll be hard to say until this change gets through the Xen push gate
> > > and that version gets used for other branches (linux testing, libvirt,
> > > ovmf, osstest's own gate etc).
> > > 
> > Indeed. My opinion is that no, it is not.
> > 
> > My understanding of the data Anthony provided is that, under some
> > (difficult to track/analyze/reproduce/etc) load conditions, the Linux IO
> > and VM subsystem suffer from high latency, delaying QEMU startup.
> > 
> > In the merlot* cases, the system is completely idle, apart from the
> > failing creation/migration operation.
> > 
> > So, no, I don't think that would be the fix we need for that
> > situation.
> 
> Even if it is not the correct fix it seems like in some situations the
> increase in timeout has improved things, hence it is an "answer" as Jan
> asked (his quote marks).
> 
Sure! And that's why I find this weird/interesting...

> > > At the moment it looks like it has helped with some but not all of the
> > > issues.
> > > 
> > > These:
> > > 
> > > http://logs.test-lab.xenproject.org/osstest/results/host/merlot0.html
> > > http://logs.test-lab.xenproject.org/osstest/results/host/merlot1.html
> > > 
> > Can I ask why (I mean, e.g., comparing what with what) you're saying it
> > seems to have helped?
> 
> There seemed (unscientifically) to be fewer of the libvirt related
> guest-start failures.
> 
And you mean by looking only at the xen-unstable lines, right?

If yes, looking at merlot0, I've found the below.

Old timeout, failing:
http://logs.test-lab.xenproject.org/osstest/logs/59105/test-amd64-amd64-libvirt-xsm/info.html

New timeout, success:
http://logs.test-lab.xenproject.org/osstest/logs/59311/test-amd64-amd64-libvirt/info.html

And, looking at how long QEMU took to start up, that would be:

  13:44:32 - 13:43:42

i.e., 50 seconds, just a bit less than 1min!

So, yes, it looks like this change is actually going to help in this
case.
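The arithmetic behind that conclusion is easy to check mechanically. Below is a minimal Python sketch, assuming the two timestamps above are the QEMU spawn time and the QEMU-ready time on the same day. The constant names are hypothetical, and the old timeout value (10 s) is an assumption: all the logs tell us is that it was shorter than the ~50 s QEMU needed on this run.

```python
from datetime import datetime

# Timeouts in seconds (hypothetical names). NEW is the "1min" from the patch
# subject; OLD is an ASSUMED value for the previous default, which the logs
# only tell us was shorter than the ~50 s QEMU needed here.
OLD_DM_START_TIMEOUT = 10
NEW_DM_START_TIMEOUT = 60


def startup_seconds(start: str, ready: str) -> int:
    """Elapsed seconds between two HH:MM:SS log timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(ready, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def fits(elapsed: int, timeout: int) -> bool:
    """True if a device model that took `elapsed` seconds beats `timeout`."""
    return elapsed <= timeout


elapsed = startup_seconds("13:43:42", "13:44:32")
print(f"QEMU startup took {elapsed} s")
print("old timeout:", "ok" if fits(elapsed, OLD_DM_START_TIMEOUT) else "timed out")
print("new timeout:", "ok" if fits(elapsed, NEW_DM_START_TIMEOUT) else "timed out")
```

A 50 s startup leaves only about 10 s of headroom under the new 1min limit, so a slightly slower run on the same host could still time out.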

What I'm missing is how it is possible that, on an idle system, DM
spawning takes that long. As said, in Anthony's OpenStack case, the
system was quite busy... not that it can't be a bug (somewhere, perhaps
in Linux) in that case too, but here, it looks even more weird to me.

Could it be the NUMA misconfiguration? Well, if yes, I'm not sure how...

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

