
Re: [Xen-devel] QEMU bumping memory bug analysis



On Fri, 5 Jun 2015, Wei Liu wrote:
> Hi all
> 
> This bug is now considered a blocker for 4.6 release.
> 
> The premises of the problem remain the same (George's translated
> version):
> 
> 1. QEMU may need extra pages from Xen to implement option ROMs, and so at
>    the moment it calls set_max_mem() to increase max_pages so that it can
>    allocate more pages to the guest.  libxl doesn't know what max_pages a
>    domain needs prior to qemu start-up.
> 
> 2. Libxl doesn't know max_pages even after qemu start-up, because there
>    is no mechanism to communicate between qemu and libxl.

I might not know what the right design for the overall solution is, but
I do know that libxl shouldn't have its own state tracking for
max_pages, because max_pages is kept, maintained and enforced by Xen.

Ian might still remember, but at the beginning of the xl/libxl project,
we had a few simple design principles. One of them was that we should
not have two places where we keep track of the same thing. If Xen keeps
track of something, libxl should avoid duplicating it.

In this specific case, libxl can ask Xen at any time what max_pages is
for the domain, so I don't think that libxl should store it or have its
own tracking for it.

Even if QEMU called into libxl to change max_pages, I don't think that
libxl should store max_pages anywhere. It is already stored in Xen and
can be retrieved at any time.


> 3. QEMU calls xc_domain_setmaxmem to increase max_pages by N pages.
>    Those pages are only accounted for in the hypervisor.  Libxl
>    (currently) does not extract that value from the hypervisor.
> 
> Several solutions were proposed:
> 
> 1. Add a new record type in libxc migration stream and call setmaxmem
>    in the middle of xc migration stream.
> 
> The main objection is that calling xc_domain_setmaxmem in the middle
> of the xc migration stream is a layering violation. It also prevents
> us from disaggregating domain construction to a less privileged
> domain.

It seems to me that max_pages is one of the memory properties of the
domain, so it should be saved and restored together with the rest of
memory.


> 2. Use the libxl toolstack save/restore blob to transmit max pages
>    information to the remote end.
> 
> This is considered a bodge and has been proven not to work because
> toolstack blob restore happens after xc_domain_restore.

Saving and restoring max_pages in libxl seems to me like a layering
violation. I would avoid options 2, 3, 4 and 5.


> 3. Add a libxl layer that wraps the necessary information, taking
>    over Andrew's work on libxl migration v2. Having a libxl layer
>    that's not part of migration v2 would be a waste of effort.
> 
> There are several obstacles for libxl migration v2 at the moment. Libxl
> layer in migration v2 still has unresolved issues. It has
> inter-dependency with Remus / COLO.
> 
> Most importantly, it doesn't inherently solve the problem. It still
> requires the current libxl JSON blob to contain information about max
> pages (or information used to derive max pages).
> 
> Andrew, correct me if I'm wrong.
> 
> 4. Add a non-user-configurable field to the current libxl JSON
>    structure to record max pages information.
> 
> This is not desirable. All fields in libxl JSON should be user
> configurable.
> 
> 5. Add a user-configurable field to the current libxl JSON structure
>    to record how much more memory this domain needs. The admin is
>    required to fill in that value manually. In the meantime we revert
>    the change in QEMU and declare QEMU with that change buggy.

QEMU 2.3.0 was released with that change in it, so it is not quite
possible to revert it. Also I think it is the right change for QEMU.
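
For concreteness, solution 5 would presumably surface as something like
the guest config fragment below. The key name "max_extra_pages" is
invented purely for illustration; the proposal only says "a user
configurable field" and does not name one.

```
# Hypothetical xl guest config fragment.  The key name
# "max_extra_pages" is made up for illustration; no such key
# exists today.
name   = "guest1"
memory = 1024          # normal allocation, in MiB
# Extra headroom the admin grants on top of "memory", e.g. for
# QEMU option ROMs; would be enforced via max_pages at domain
# build time.
max_extra_pages = 16
```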


> No response to this so far. But in fact I consider this the most viable
> solution.
> 
> It's a simple enough solution that is achievable within 4.6 time frame.
> It doesn't prevent us from doing useful work in the future
> (disaggregated architecture with stricter security policy). It provides
> a way to work around buggy QEMU (admin sets that value to prevent QEMU
> from bumping the memory limit). It's orthogonal to migration v2, which
> means it won't be blocked by migration v2 or block migration v2.
> 
> I tend to go with solution 5. Speak up if you don't agree with my
> analysis or think I have missed some aspects.

I disagree.


> For long term we need to:
> 
> 1. Establish libxl as the arbitrator of how many pages a domain can
>    have. Anything else doing stuff behind the arbitrator's back is
>    considered buggy. This principle probably applies to other aspects
>    of a domain as well.

I disagree that libxl should be the arbitrator of a property that is
stored, maintained and enforced by Xen. Xen should be the arbitrator.


> 2. Work out a way for QEMU and libxl to communicate. This can be
>    expanded to cover other components in a Xen setup, but currently we
>    only have QEMU.

Even if QEMU called into libxl to change maxmem, I don't think that
libxl should store maxmem anywhere. It is already stored in Xen.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

