[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test



On Mon, 25 Oct 2021, Anthony PERARD wrote:
> On Fri, Oct 22, 2021 at 01:05:35PM -0700, Stefano Stabellini wrote:
> > On Fri, 22 Oct 2021, Anthony PERARD wrote:
> > > On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > > > diff --git a/automation/scripts/qemu-alpine-x86_64.sh 
> > > > b/automation/scripts/qemu-alpine-x86_64.sh
> > > > new file mode 100644
> > > > index 0000000000..41b05210d6
> > > > --- /dev/null
> > > > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > > > @@ -0,0 +1,92 @@
> > > > +#!/bin/bash
> > > > +
> > > > +set -ex
> > > > +
> > > > +apt-get -qy update
> > > > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > > > +                                            cpio \
> > > > +                                            curl \
> > > > +                                            busybox-static
> > > 
> > > Please, don't install packages during the CI job. If you need new
> > > packages, update the container.
> > 
> > The container used to run this script is the one used for the Xen build:
> > automation/build/debian/unstable.dockerfile, AKA
> > registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
> > specific container for the sole purpose of running tests.
> 
> I've added qemu to our debian:stretch container recently, in order to
> run the "qemu-smoke-*" tests without running apt commands. Unless more
> recent software are needed, you could base the "qemu-alpine-x86*" test
> on debian:stretch which might only then missing cpio and busybox. Update
> of the stretch container should go smoothly as it has been done
> recently.

I am happy to use debian:stretch in this patch series. I'll make the
change. That only leaves cpio and busybox-static to apt-get which are
small. In general, I am not too happy about increasing the size of build
containers with testing binaries so I would rather not increase it any
further, see below.


> > Thus, I could add qemu-system-x86 to
> > automation/build/debian/unstable.dockerfile, but then we increase the
> > size of the debian unstable build container needlessly for all related
> > build jobs.
> 
> There is something I'm missing, how is it a problem to have a container
> that is a bit bigger? What sort of problem could we have to deal with?

It takes time to clone the container in the gitlab-ci, the bigger the
container the more time it takes. It is fetched over the network. Now we
are fetching qemu (as part of the container) 10 times during the build
although it is not needed.

There is a compromise to be made. Let's say the QEMU package is 100MB.
If we add QEMU to debian:stretch we are going to increase the overall
data usage by about 1GB, which means we are waiting to fetch 1GB of
additional data during the pipeline.

If we needed QEMU only once, then we could just apt-get it and we would
only fetch 100MB once. Although using apt-get will be a bit slower
because the deb needs to be unpacked, etc. I think that if we used QEMU
only in one test then it would be faster to apt-get it.

However in our case we are using QEMU 4 times during the tests, so
adding QEMU to debian:stretch probably made things a bit faster or at
least not slower.

Does it make sense?

This is why we should have specialized containers for the tests instead
of adding test packages to the build containers. However, today all
containers are updated manually so it would be best to make containers
automatically updatable first.  Until then I think it is OK to apt-get
stuff.


> On the other hand, doing more task, installing software, downloading
> anything from the internet, makes the job much less reliable. It
> increase the risk of a failure do to external factors and it takes more
> time to run the test.

That is fair but it is more a theoretical problem than a real issue at
the moment. In reality I haven't seen Debian's apt-get failing even a
single time in our gitlab-ci (it failed because our debian:unstable
container went out of date but that's our fault).

But we do have a severe problem at the moment with external sources: our
"git clones" keep failing during the build on x86. That is definitely
something worth improving (see my other email thread on the subject) and
it is the main problem affecting gitlab-ci at the moment, I keep having
to restart jobs almost daily to get the overall pipeline to "pass".

If you have any ideas on how to stop fetching things using "git" from
external repositories in gitlab-ci that would be fantastic :-)
The only thing I could think of to "fix it" is moving all external repos
to gitlab repositories mirrors.


> > Or we could add one more special container just for running tests, but
> > then it is one more container to maintain and keep up-to-date.
> > 
> > This is why I chose as a compromise to keep the underlying container as
> > is, and just apt-get the extra 3-4 packages here. It is the same thing
> > we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
> > that this job is run in the "test" step where we have far fewer jobs at
> > the moment and the runners are not busy. (It would be different in the
> > "build" step where we have many jobs.)
> 
> I don't really see any difference between a "test" job and a "build"
> jobs, both kind use the same resource/runner.
> 
> On that note, they're is something I've learned recently: "test" job
> don't even have to wait for all "build" job to complete, they can run in
> parallel. We would just need to replace "dependencies" by "needs":
>     https://docs.gitlab.com/ee/ci/yaml/index.html#needs
> But that could be an improvement for an other time and only a side note
> for the patch.

That is really cool! I didn't know about it. Yes, currently there is a
big distinction between build and test jobs because build jobs are the
bottleneck and test jobs don't start before all the build jobs finish.


> > I am not entirely sure what is the best solution overall, but for this
> > series at this stage I would prefer to keep the same strategy used for
> > the ARM tests (i.e. reuse the debian unstable build container and
> > apt-get the few missing packages.) If we do change the way we do it, I
> > would rather change both x86 and ARM at the same time.
> 
> I'm pretty sure the best strategy would be to do as little as possible
> during a job, download as little as possible and possibly cache as much
> as possible and do as much as possible ahead of time. Feel free to
> change the Arm test, but I don't think it is necessary to change the Arm
> test at the same time as introducing an x86 test.

I agree.

At the same time it would be nice to follow the same strategy between
x86 and ARM going forward: if one optimization is made for one, it is
also made for the other.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.