Re: [Xen-devel] [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters



On Wed, Apr 12, 2017 at 04:37:16PM +0100, Wei Liu wrote:
> On Thu, Mar 30, 2017 at 02:03:29AM -0400, Joshua Otto wrote:
> > I guess the way I had imagined an administrator using them would be in a
> > non-production/test environment - if they could run workloads
> > representative of their production application in this environment, they
> > could experiment with different --precopy-iterations and
> > --precopy-threshold values (having just a high-level understanding of
> > what they control) and choose the ones that result in the best outcome
> > for later use in production.
> > 
> 
> Running in a test environment isn't always an option -- think about
> public cloud providers who don't have control over the VMs or the
> workload.

Sure, it definitely won't always be an option, but sometimes it might.
The question is whether the benefit in the cases where it can be used
justifies the added complexity in the interface.  I think so, but
that's just my intuition.

> > > 
> > > The plan, following migration v2, was always to come back to this and
> > > see about doing something better than the current hard coded parameters,
> > > but I am still working on fixing migration in other areas (not having
> > > VMs crash when moving, because they observe important differences in the
> > > hardware).
> > 
> > I think a good strategy would be to solicit three parameters from the
> > user:
> > - the precopy duration they're willing to tolerate
> > - the downtime duration they're willing to tolerate
> > - the bandwidth of the link between the hosts (we could try and estimate
> >   it for them but I'd rather just make them run iperf)
> > 
> > Then, after applying this patch, alter the policy so that precopy simply
> > runs for the duration that the user is willing to wait.  After that,
> > using the bandwidth estimate, compute the approximate downtime required
> > to transfer the final set of dirty-pages.  If this is less than what the
> > user indicated is acceptable, proceed with the stop-and-copy - otherwise
> > abort.
> > 
> > This still requires the user to figure out for themselves how long their
> > workload can really wait, but hopefully they already had some idea
> > before deciding to attempt live migration in the first place.
> > 
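
The duration-based policy quoted above can be sketched roughly as
follows.  This is only an illustration, not code from the patch series:
the names (MigrationBudget, estimated_downtime, should_stop_and_copy)
are hypothetical, a 4 KiB page size is assumed, and the bandwidth is
taken as a single user-supplied figure (e.g. from iperf) rather than
measured during the migration.

```python
from dataclasses import dataclass

# Assumption: guest pages are 4 KiB.
PAGE_SIZE_BYTES = 4096


@dataclass
class MigrationBudget:
    """The three user-supplied parameters (hypothetical names)."""
    precopy_seconds: float             # how long precopy may run
    max_downtime_seconds: float        # tolerable stop-and-copy downtime
    link_bandwidth_bytes_per_s: float  # user-measured link bandwidth


def precopy_time_exhausted(elapsed_seconds: float, budget: MigrationBudget) -> bool:
    """Precopy simply runs until the user's time budget is used up."""
    return elapsed_seconds >= budget.precopy_seconds


def estimated_downtime(dirty_pages: int, budget: MigrationBudget) -> float:
    """Approximate time to send the final dirty set at the stated bandwidth."""
    if budget.link_bandwidth_bytes_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return dirty_pages * PAGE_SIZE_BYTES / budget.link_bandwidth_bytes_per_s


def should_stop_and_copy(dirty_pages: int, budget: MigrationBudget) -> bool:
    """True: proceed with stop-and-copy.  False: abort the migration."""
    return estimated_downtime(dirty_pages, budget) < budget.max_downtime_seconds
```

For example, with a 0.5 s downtime budget on a link of about 125 MB/s
(1 Gbit/s), 10,000 remaining dirty pages give an estimate of roughly
0.33 s, so the migration would proceed; 20,000 pages give roughly
0.66 s, so it would abort.
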
> 
> I am not entirely sure what to make of this. I'm not convinced using
> durations would cover all cases, but I can't come up with a counter
> example that doesn't sound contrived.
> 
> Given this series is already complex enough, I think we should set this
> aside for another day.
> 
> How hard would it be to _not_ include all the knobs in this series?

Fair enough.  It wouldn't be much trouble, so I'll drop the knobs from
this series for now.

As a general comment on the patch series for anyone following: I've just
finished with the last of my academic commitments and now have time to
pick this back up.  I'll follow up in the next few weeks with the
suggested revisions, the design document and the quantitative
performance evaluation.

Thanks!

Josh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel