RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!

> I wonder if we couldn't do something when we know that we're scheduling
> a VCPU onto a different CPU to ensure time can't go backwards.

Again, no guarantees, but I think we are now under the magic
threshold where the inter-CPU skew is smaller than the time
required to schedule a VCPU onto a different CPU.  If so,
consecutive gethrtime() calls by the same thread in a domain
should always be monotonic.
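To make the threshold argument concrete, here is a toy model in Python (an illustration, not Xen or Solaris code). It assumes each CPU's clock reads true time plus a fixed per-CPU offset with no drift, and that moving a VCPU takes at least some fixed latency; the function names and the numbers are made up for the example.

```python
# Toy model (illustrative assumption, not Xen/Solaris code): each CPU's
# clock reads true_time + a fixed per-CPU offset, with no drift, and a
# migration between CPUs takes at least migrate_ns of true time.

def reads_across_migration(t_ns, off_src_ns, off_dst_ns, migrate_ns):
    """Return (first, second): a read on the source CPU at true time t_ns,
    then a read on the destination CPU right after the migration."""
    first = t_ns + off_src_ns
    second = t_ns + migrate_ns + off_dst_ns
    return first, second

def is_monotonic(off_src_ns, off_dst_ns, migrate_ns, t_ns=1_000_000):
    """True if the second read is not earlier than the first.

    Algebraically this is migrate_ns >= off_src_ns - off_dst_ns: time can
    only step backwards when the source CPU's clock is ahead of the
    destination's by more than the migration latency.
    """
    first, second = reads_across_migration(t_ns, off_src_ns, off_dst_ns, migrate_ns)
    return second >= first

print(is_monotonic(500, 0, 2000))   # True: 500 ns skew < 2000 ns migration
print(is_monotonic(5000, 0, 2000))  # False: 5000 ns skew > 2000 ns migration
```

In this model drift is ignored; if the offsets change over time, the skew bound has to hold at every instant, not just once after calibration.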

The overhead of measuring the inter-CPU stime skew is
too large to do at every cross-PCPU schedule, so any
kind of adjustment would be difficult.  But it might
make sense for the Xen scheduler to call get_s_time()
before and after a cross-PCPU schedule to detect the
problem and printk if it occurs (possibly rate-limited,
in case it happens a lot on some badly behaved machine).
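A rough sketch of that scheduler check, again as a Python model rather than hypervisor code. The per-CPU clock callable stands in for get_s_time() on each PCPU, and the class name, the hook name on_cross_pcpu_schedule, and the fixed-window rate limiter are all hypothetical choices for illustration; Xen's own printk rate limiting may work differently.

```python
import itertools

# Hypothetical model of the suggested check, not Xen code: clock_on(cpu)
# stands in for get_s_time() executed on that PCPU, and self.log stands in
# for printk output.
class MigrationTimeCheck:
    def __init__(self, max_msgs_per_window, window_ns):
        self.max_msgs = max_msgs_per_window
        self.window_ns = window_ns
        self.window_start = None
        self.msgs_in_window = 0
        self.suppressed = 0
        self.log = []

    def _ratelimited_log(self, now_ns, msg):
        # Open a new window once the current one has expired.
        if self.window_start is None or now_ns - self.window_start >= self.window_ns:
            self.window_start = now_ns
            self.msgs_in_window = 0
        if self.msgs_in_window < self.max_msgs:
            self.msgs_in_window += 1
            self.log.append(msg)
        else:
            self.suppressed += 1  # dropped to avoid flooding the console

    def on_cross_pcpu_schedule(self, vcpu, old_cpu, new_cpu, clock_on):
        before = clock_on(old_cpu)  # sampled on the old PCPU before the switch
        after = clock_on(new_cpu)   # sampled on the new PCPU after the switch
        if after < before:
            self._ratelimited_log(
                before,
                f"time went backwards by {before - after} ns moving "
                f"vcpu {vcpu} from cpu {old_cpu} to cpu {new_cpu}")
        return after - before

# Illustrative run: CPU 0's clock is 3000 ns ahead of CPU 1's, and each
# sample advances true time by 1000 ns (made-up numbers).
ticks = itertools.count(start=1000, step=1000)
offsets = {0: 3000, 1: 0}

def clock_on(cpu):
    return next(ticks) + offsets[cpu]

chk = MigrationTimeCheck(max_msgs_per_window=1, window_ns=10**9)
for _ in range(3):
    chk.on_cross_pcpu_schedule(vcpu=7, old_cpu=0, new_cpu=1, clock_on=clock_on)
print(len(chk.log), chk.suppressed)  # 1 2
```

The check only detects and reports; it does not try to correct the skew, which matches the point that per-migration adjustment would be too expensive.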

> -----Original Message-----
> From: John Levon [mailto:levon@xxxxxxxxxxxxxxxxx]
> Sent: Wednesday, August 06, 2008 7:38 AM
> To: Dan Magenheimer
> Cc: Ian Pratt; Xen-Devel (E-mail); Dave Winchell; Keir Fraser
> Subject: Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> On Wed, Aug 06, 2008 at 07:25:50AM -0600, Dan Magenheimer wrote:
> > > > I'm not sure it's possible to guarantee monotonicity in
> > > > PV domains (without a global lock) except by doing a trap
> > > > or hypercall at each "get time".
> > >
> > > That's a shame.
> >
> > Further followup on this...
> >
> > I'd encourage you to put some test code in your lock to
> > see if time ever measurably goes backwards.  It may never,
> > or it may do so only on some ill-behaved-TSC machines or when
> > cpufreq changes occur... needs testing.  Even if it
> > does, it may be by a smaller delta than all but the
> > most sophisticated SMP applications can detect.
> I believe the normal (bare-metal) Solaris algorithm expects any inter-CPU
> TSC differences to remain static (that is, no drift), so any machine that
> breaks that is problematic:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/
>
> The presumption is that gethrtimef() is monotonically increasing, which
> at least Xen 3.0.4 regularly broke. If the hypervisor has been fixed to
> give as many guarantees as we already had, then great.
>
> A monotonic gethrtime() is part of the ABI, so I'm not sure we can avoid
> a lock even on well-behaved machines if Xen isn't correct.
>
> I wonder if we couldn't do something when we know that we're scheduling
> a VCPU onto a different CPU to ensure time can't go backwards.
>
> Anyway, some more testing sounds like it would be interesting.
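One way to combine test code inside the lock with the monotonicity the ABI requires is a wrapper that serializes reads under a single lock, counts any backwards steps and their size, and clamps the result so callers see non-decreasing time. The Python sketch below assumes a hypothetical raw_read callable standing in for the guest's unlocked per-CPU time read; it is not the Solaris gethrtime() implementation.

```python
import threading

class MonotonicClock:
    """Serialize reads of a raw time source so callers never see time go
    backwards, while recording how often and how far the raw source regressed.

    Sketch only: raw_read is a hypothetical stand-in for an unlocked
    per-CPU time read in a PV guest.
    """
    def __init__(self, raw_read):
        self._raw_read = raw_read
        self._lock = threading.Lock()  # the single global lock
        self._last = None
        self.backwards_count = 0       # test instrumentation: how often...
        self.max_backwards_ns = 0      # ...and by how much time regressed

    def read(self):
        with self._lock:
            now = self._raw_read()
            if self._last is not None and now < self._last:
                # Raw time went backwards: record the regression, then clamp.
                delta = self._last - now
                self.backwards_count += 1
                self.max_backwards_ns = max(self.max_backwards_ns, delta)
                now = self._last
            self._last = now
            return now

# Illustrative raw readings with two small regressions (made-up values).
samples = iter([100, 250, 240, 300, 290, 400])
clk = MonotonicClock(lambda: next(samples))
out = [clk.read() for _ in range(6)]
print(out)                                        # [100, 250, 250, 300, 300, 400]
print(clk.backwards_count, clk.max_backwards_ns)  # 2 10
```

The lock is what makes the clamp safe across threads, and it is also the cost under discussion: every reader serializes on one shared value. A lock-free variant could use an atomic compare-and-swap on the last returned value instead, but readers would still contend on that one location.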


Xen-devel mailing list


