
Re: [Xen-devel] Buggy interaction of live migration and p2m updates



On Thu, 2014-11-20 at 18:28 +0000, Andrew Cooper wrote:
> Realistically, this means no updates to the
> p2m at all, due to several potential race conditions.

From the rest of the mail it seems as if you are talking primarily about
changes to the p2m *structure*, i.e. which guest frames contain the p2m
pages, rather than changes to the p2m entries themselves. Is that
correct?

I don't see any (explicit) mention of the pfn_to_mfn_frame_list_list
here. Where does that fit in?

> As far as these issues are concerned, there are two distinct p2m
> modifications which we care about:
> 1) p2m structure changes (rearranging the layout of the p2m)
> 2) p2m content changes (altering entries in the p2m)
> 
> There is no possible way for the toolstack to prevent a domain from
> altering its p2m.  At the moment, ballooning typically only occurs when
> requested by the toolstack, but the underlying operations
> (increase/decrease_reservation, mem_exchange, etc) can be used by the
> guest at any point.  This includes Wei's guest memory fragmentation
> changes.  Changes to the content of the p2m also occur for grant map and
> unmap operations.
> 
> 
> Currently in PV guests, the p2m is implemented using a 3-level tree,
> with its root in the guest's shared_info page.  It provides a hard VM
> memory limit of 4TB for 32bit PV guests (which is far higher than the
> 128GB limit from the compat p2m mappings), or 512GB for 64bit PV guests.
> 
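
As a sanity check on those limits, the arithmetic can be sketched in a
few lines of Python. This assumes 4KiB pages, 4-byte p2m entries for
32bit guests and 8-byte entries for 64bit guests; the function name is
made up for illustration and is not part of any Xen interface.

```python
PAGE_SIZE = 4096  # assumption: 4KiB x86 pages


def p2m_tree_limit_bytes(entry_bytes, levels=3):
    """Guest memory addressable by a p2m tree with the given entry size."""
    entries_per_page = PAGE_SIZE // entry_bytes
    # Each level multiplies the number of reachable pfns by entries_per_page.
    max_pfns = entries_per_page ** levels
    return max_pfns * PAGE_SIZE


TIB = 1 << 40
GIB = 1 << 30

limit_32bit = p2m_tree_limit_bytes(4)  # 1024 entries per page
limit_64bit = p2m_tree_limit_bytes(8)  # 512 entries per page
```

With those assumptions limit_32bit is 4 * TIB and limit_64bit is
512 * GIB, which matches the figures quoted above.
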
> Juergen has a proposed new p2m interface using a virtual linear
> mapping.  This is conceptually similar to the previous implementation
> (which is fine from the toolstack's point of view), but far less
> complicated from the guest's point of view, and removes the memory limits
> imposed by the p2m structure.
> 
> The new virtual linear mapping suffers from the same interaction issues
> as the old 3-level tree did, but the introduction of the new interface
> affords us an opportunity to make all API modifications at once to
> reduce churn.
> 
> 
> During live migration, the toolstack maps the guest's p2m into a linear
> mapping in the toolstack's virtual address space.  This is done once at
> the start of migration, and never subsequently altered.  During live
> migration, the p2m is cross-verified with the m2p, and frames are sent
> using pfns as a reference, as they will be located in different frames
> on the receiving side.
> 
> Should the guest change the p2m structure during live migration, the
> toolstack ends up with a stale p2m with a non-p2m frame in the middle,
> resulting in bogus cross-referencing.  Should the guest change an entry
> in the p2m, the p2m frame itself will be resent as it would be marked as
> dirty in the logdirty bitmap, but the target pfn will remain unsent and
> probably stale on the receiving side.
> 
> 
> Another factor which needs to be taken into account is Remus/COLO, which
> run the domains under live migration conditions for the duration of
> their lifetime.
> 
> During the live part of migration, the toolstack already has to be able
> to tolerate failures to normalise the pagetables, which result from the
> pagetables being in active use.  These failures are fatal
> on the final iteration after the guest has been paused, but the same
> logic could be extended to p2m/m2p issues, if needed.
> 
> 
> There are several potential solutions to these problems.
> 
> 1) Freeze the guest's p2m during live migration
> 
> This is the simplest sounding option, but is quite problematic from the
> point of view of the guest.  It is essentially a shared spinlock between
> the toolstack and the guest kernel.  It would prevent any grant
> map/unmap operations from occurring, and might interact badly with
> certain p2m updates in the guest which would previously be expected to
> unconditionally succeed.
> 
> Pros) (Can't think of any)
> Cons) Not easy to implement (even conceptually), requires invasive guest
> changes, will cripple Remus/COLO
> 
> 
> 2) Deep p2m dirty tracking
> 
> In the case that a p2m frame is discovered dirty in the logdirty bitmap,
> we can be certain that a write has occurred to it, which in the common
> case means that a mapping has changed.  The toolstack could maintain
> a non-live copy of the p2m which is updated as new frames are sent.
> When a dirty p2m frame is found, the live and non-live copies can be
> consulted to find which pfn mappings have changed, and locally mark all
> the altered pfns for retransmit.
> 
> Pros) No guest changes required
> Cons) Toolstack needs to keep an additional copy of the guest's p2m on
> the sending side
> 
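
To make sure I understand option 2, here is a rough Python sketch of
the comparison step. It is only an illustration under assumptions: the
p2m is modelled as a flat list of mfns indexed by pfn, dirty p2m frames
are given by index, and all of the names are hypothetical rather than
anything in libxc.

```python
def changed_pfns(live_frame, sent_frame, first_pfn):
    """Return the pfns whose mfn differs between two copies of one p2m frame."""
    return [first_pfn + i
            for i, (live, sent) in enumerate(zip(live_frame, sent_frame))
            if live != sent]


def mark_for_retransmit(live_p2m, sent_p2m, dirty_frames, entries_per_frame,
                        to_send):
    """Add every pfn whose mapping changed in a dirty p2m frame to to_send."""
    for frame in dirty_frames:
        lo = frame * entries_per_frame
        hi = lo + entries_per_frame
        to_send.update(changed_pfns(live_p2m[lo:hi], sent_p2m[lo:hi], lo))


def record_sent(live_p2m, sent_p2m, pfn):
    """Remember which mfn backed pfn at the moment its contents were sent."""
    sent_p2m[pfn] = live_p2m[pfn]
```

The non-live copy is only updated in record_sent(), so it always
reflects the mapping at the time each pfn was last sent, not the time
the p2m frame was last inspected.
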
> 3) Eagerly check for p2m structure changes.
> 
> p2m structure changes are rare after boot, but not impossible.  Each
> iteration of live migration, the toolstack can check for dirty
> higher-level p2m frames in the dirty bitmap.  In the case that a
> structure update occurs, the toolstack can use information it already
> has to calculate a subset of pfns affected by the update, and mark them
> for resending.  (This can currently be done to the frame granularity
> given the p2m frame list, but in combination with 2), could result in
> fewer pfns needing resending.)
> 
> Pros) No guest changes required.
> Cons) Moderately high toolstack overhead.  Possibility of resending far
> more pfns than strictly required.
> 
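
For option 3, the "subset of pfns affected" could be computed roughly
as below. This is a sketch which assumes the usual 3-level layout: one
top page listing mid-level frames, each mid-level page listing leaf
frames, and each leaf holding entries_per_page p2m entries. The helper
names are invented for illustration.

```python
def pfns_under_mid_frame(top_index, entries_per_page):
    """pfns covered by the mid-level frame referenced from slot top_index."""
    span = entries_per_page * entries_per_page
    start = top_index * span
    return range(start, start + span)


def pfns_under_leaf_frame(top_index, mid_index, entries_per_page):
    """pfns covered by the leaf frame at (top_index, mid_index)."""
    start = (top_index * entries_per_page + mid_index) * entries_per_page
    return range(start, start + entries_per_page)
```

A changed slot in the top page invalidates a whole mid-level span
(512 * 512 pfns for a 64bit guest), whereas a changed slot in a
mid-level page only invalidates one leaf's worth (512 pfns), which is
where the "far more pfns than strictly required" concern comes from.
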
> 4) Request p2m structure change updates from the guest
> 
> The guest could provide a "p2m generation count" to allow the toolstack
> to evaluate whether the structure had changed.  This would allow the
> live part of migration to periodically re-evaluate whether it should
> remap the p2m to avoid stale mappings.
> 
> Pros) Easy to implement alongside the virtual linear mapping support. 
> Easy for toolstack and guest
> Cons) Only works with new virtual linear guests.
> 
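
On the toolstack side I would imagine option 4 looking something like
the following Python sketch. Everything here is an assumption: the
read_generation and map_p2m callables stand in for whatever interface
ends up being defined, and the class name is invented.

```python
class P2MView:
    """Toolstack-side view of a guest p2m, tagged with a generation count."""

    def __init__(self, read_generation, map_p2m):
        self._read_generation = read_generation
        self._map_p2m = map_p2m
        # Read the count before mapping: if the guest changes the structure
        # in between, the stored count is stale and the next check remaps.
        self.generation = read_generation()
        self.mapping = map_p2m()

    def refresh_if_stale(self):
        """Remap the p2m if the guest has bumped its generation count."""
        current = self._read_generation()
        if current == self.generation:
            return False
        self.mapping = self._map_p2m()
        self.generation = current
        return True
```

Calling refresh_if_stale() once per iteration would give the periodic
re-evaluation described above. Reading the count before remapping errs
on the side of an unnecessary remap rather than a missed one.
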
> 
> Proposed solution:  A combination of 2, 3 and 4.
> 
> For legacy 3-level p2m guests, the toolstack can detect p2m structure
> updates by tracking the p2m top and mid levels in the logdirty bitmap,
> and invalidating the modified subset of pfns.  It has to eagerly check
> the p2m frame list list mfn entry in the shared info to see whether the
> guest has swapped onto a completely new p2m.
> 
> For a virtual linear map, the intermediate levels are not available to
> track, but we can require that the guest increment a p2m generation count
> in the shared info.  When the structure changes, the toolstack can remap
> the p2m, calculate the altered subset of pfns, and mark them for resend.
> 
> The toolstack must also track changes in the p2m itself, and compare to
> a local copy showing the mapping at the time at which the pfn was last
> sent.  This can be used to work out which p2m mappings have changed, and
> also be used to confirm whether the pfns on the receiving side are stale
> or not.
> 
> I believe this covers all cases and race conditions.  In the case that
> the p2m is updated before the m2p, the p2m frame will be marked dirty in
> the bitmap, and discoverable on the next iteration.  At that point, if
> the p2m and m2p are inconsistent, the pfn will be deferred until the
> final iteration.  If not, the frame is sent and all is well.
> In the case that the p2m is updated after the m2p, the p2m/m2p will be
> consistent when the dirty bitmap is acted on.
> 
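
The send-or-defer decision described above might be sketched like this
in Python. It is a simplification under assumptions: the p2m and m2p
are modelled as plain dicts and the function name is hypothetical.

```python
def classify_dirty_pfns(dirty_pfns, p2m, m2p):
    """Split dirty pfns into those safe to send now and those to defer."""
    send, deferred = [], []
    for pfn in sorted(dirty_pfns):
        mfn = p2m.get(pfn)
        # Consistent means the m2p entry for the mfn points back at the pfn.
        if mfn is not None and m2p.get(mfn) == pfn:
            send.append(pfn)
        else:
            # p2m and m2p disagree, e.g. the guest is mid-update.
            deferred.append(pfn)
    return send, deferred
```

Anything left in the deferred list would be retried on later
iterations, and an inconsistency would only be fatal on the final
iteration once the guest has been paused.
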
> 
> Thoughts? (for anyone who has made it this far :)  I think I have
> covered everything.)
> 
> ~Andrew
> 





 

