
Re: [Xen-devel] [PATCH 09/29] [HACK] tools/libxc: save/restore v2 framework




On 15/09/2014 19:58, Konrad Rzeszutek Wilk wrote:
On Mon, Sep 15, 2014 at 04:09:51PM +0100, Andrew Cooper wrote:
On 14/09/2014 11:23, Shriram Rajagopalan wrote:
On Sep 11, 2014 4:08 AM, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
On 11/09/14 12:01, Ian Campbell wrote:
On Thu, 2014-09-11 at 11:37 +0100, Andrew Cooper wrote:
On 11/09/14 11:34, Ian Campbell wrote:
On Wed, 2014-09-10 at 18:10 +0100, Andrew Cooper wrote:
For testing purposes, the environment variable "XG_MIGRATION_V2" allows the
two save/restore codepaths to coexist, and have a runtime switch.

It is intended that once this series is less RFC, the v2 framework will
completely replace v1.
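
For readers following the thread, the runtime switch described above can be sketched roughly as below. This is only an illustration in Python, not the libxc code from the patch: the function names save_v1, save_v2 and domain_save are hypothetical, and the assumption that the mere presence of XG_MIGRATION_V2 in the environment selects the v2 path is mine, not something stated in the patch description.

```python
import os


def save_v1(domid):
    # Stand-in for the legacy save path (hypothetical name).
    return ("v1", domid)


def save_v2(domid):
    # Stand-in for the new v2 save path (hypothetical name).
    return ("v2", domid)


def domain_save(domid, env=None):
    """Dispatch to the v1 or v2 path based on an environment variable.

    Assumption: the variable only needs to be set; its value is ignored.
    """
    env = os.environ if env is None else env
    if env.get("XG_MIGRATION_V2") is not None:
        return save_v2(domid)
    return save_v1(domid)
```

With a switch of this shape the legacy path stays the default, so only someone who deliberately sets the variable exercises the new code.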
I think we are now at the point where this hack needs to be
dropped from
the series.
One problem is remus.  My plan when dropping this patch was to
The other is 'tmem'. But 'tmem' has not yet been declared 'baked' so
not making it work from a release perspective is OK.

With the 'tmem' maintainer hat on, however, I would like it to work without
having to do anything :-) Which reminds me I need to follow up
on double-checking the migration hasn't bitrotted!

While reverse engineering the existing protocol is not too difficult, I think the TMEM migration needs redesigning. From memory, there is a huge quantity of metadata which is sent redundantly (tmem pool uuid with every frame). It would also benefit massively from some batching to help reduce the quantity of hypercalls made (5 per frame iirc).



drop all
of xc_domain_{save/restore}.c as well, but without remus migration-v2
support available, this will break existing set-ups.
And by 'set-ups' you mean Xen 4.5 using the v1 migration tools and then
out of tree patches on top of that. In other words, users of the libxc
"API" (which we do not guarantee between releases - it is an internal
API).

Hrm, how is that going wrt 4.5 freeze?
I haven't heard or seen anything since v5 of this series (for which I did
some quick bugfixes and released v6).

FYI, that's not entirely true. Yang did post a set of RFC patches for remus
support in migration v2, based on your V6 series (back in July)
http://lists.xenproject.org/archives/html/xen-devel/2014-07/msg01163.html
My apologies - it was v6 to v6.1


It would actually be helpful if you could cc me on the patches relevant to
Remus, or if there is anything specific to Remus that needs to be done. There
are 100s of posts on Xen devel every day and it's hard to keep track of
everything posted to Xen devel.
I've found that putting filters for the right keywords helps with that.
That is how I can subscribe to lkml without drinking from the
firehose.


And looking at your patch sets in
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/saverestore2-v6.3

I see that there is no support for Remus currently. Nor can I differentiate
which parts of the code correspond to these "quick bug fixes" that you
mentioned above. From the discussion over the remus rfc patches, I only recall
a bug related to vcpu context caching. But I cannot delineate that specific
part from the patches in the repo. So, if these bug fixes you are referring to
are something else, please explain.
The bugfixes were referring to the vcpu context caching, but far more
bits needed caching than the remus series fixed.  The fixes were
necessary even in the non-remus case and there were also improvements to
the receive-side state machine to avoid vm corruption caused by an incorrect
send order.

I did not integrate the remus specific patches as there were outstanding
review concerns/comments.
<nods> My recollection as well.

I don't know, which probably means not good.

One option might be to have legacy and v2 sitting properly
side-by-side
in libxc for the transition period.
How long do you mean? Until 4.6?
fwiw, I don't plan to work on remus migration v2 support until the
remus netbuffer patches get in.
I have been at this for almost two release cycles. It's frustrating to
iterate on feedback for patch 4/11 of a series for two months and then get a
bunch of first-pass review for patch 6/10 at the eleventh hour before a
feature freeze, while the rest of the series has still not been reviewed at
all for the past 3 months.
What is the dependency on "full remus" support? Is there a list of
all the different patchsets that need to be reviewed?

As with TMEM, remus support needs redesigning, as it needs coordinated additions to both the libxc and libxl stream formats to support checkpoints without the current layer violations.


I can appreciate your frustration on this point, and do not envy your
position.

The concern I have is that XenServer 6.5 is shipping with migration v2 as
we absolutely need it, given the 32->64bit upgrade.  We were hoping to
get the new format committed in 4.5 to guarantee stability, but that is
looking increasingly unlikely to happen.  As a result, it will probably
have to go in early in 4.6, with extra care taken to ensure that no
incompatible changes are made as a result of further review.
Could you tell me what are the benefits of having a v1 to v2 runtime
switch for developers/users besides the obvious (faster migration,
easier to understand code)?

Users should not notice a difference, other than it being faster.

From a developer point of view,

* It actually has some header information now
* It is independent of the bitness of the toolstack (which is the key reason we needed to do it for XenServer's switch from 32 to 64bit dom0)
* The old format (little that it was) was basically inextensible for PV guests (See the PV MSRs thread)
* It has allowed for dropping 2-level PV guest support, as well as other 32bit Xen bits.
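
To make the bitness point concrete, here is a small sketch of a stream header whose fields have explicit widths and an explicit byte order, so a 32bit and a 64bit writer produce identical bytes. The layout is hypothetical and chosen purely for illustration (a 4-byte magic, a 32-bit version and 32-bit flags, all big-endian); it is not the actual migration v2 stream format, and the names HEADER, MAGIC, pack_header and unpack_header are made up for this example.

```python
import struct

# Hypothetical layout, NOT the real v2 stream header:
# 4-byte magic, 32-bit version, 32-bit flags, all big-endian.
# The ">" prefix selects standard sizes and no padding, so the
# encoding does not depend on the word size of the machine.
HEADER = struct.Struct(">4sII")
MAGIC = b"DEMO"


def pack_header(version, flags=0):
    """Serialise the header to a fixed 12-byte representation."""
    return HEADER.pack(MAGIC, version, flags)


def unpack_header(data):
    """Parse the header, rejecting streams without the expected magic."""
    magic, version, flags = HEADER.unpack(data[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("not a recognised stream")
    return version, flags
```

A format that instead wrote native-width integers (for example an unsigned long) would differ between 32bit and 64bit toolstacks, which is the kind of problem the second bullet refers to.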


For me it sounded that this would allow the community to also
test it and report bugs - which would be invaluable. And better
yet there is a env flag to swap between a baseline and new
code to ease the testing.

That was only supposed to be for development, and removed when committed upstream.

~Andrew


The risks seem quite contained - if something goes awry, folks can
use the v1 version - which should have the same number of bugs
that it had in previous releases. And since v1 is on by default,
only dedicated users would turn v2 on.

From a maintenance perspective, it does add more code but then once
feature freeze hits we do not pay attention to features anymore,
but rather to bug-fixes.

Hm, Ians - what is your take on it?



 

