xen-devel

Re: [Xen-devel] Time skew on HP DL785 (and possibly other boxes)

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Subject: Re: [Xen-devel] Time skew on HP DL785 (and possibly other boxes)
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 27 Mar 2009 15:36:59 -0700
Cc: john.v.morris@xxxxxx, "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 27 Mar 2009 15:37:30 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <055de860-7f5f-496c-81ae-df1bf383d4bc@default>
References: <055de860-7f5f-496c-81ae-df1bf383d4bc@default>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.21 (X11/20090320)
Dan Magenheimer wrote:
However, I'm told that it's not possible to route a clocksource
over hypertransport, so TSCs on processors on different
motherboards may be VERY different and apparently the
mechanisms for synchronizing Xen system time across
motherboards may not be up to the challenge.  As a result,
OSes and apps sensitive to time that are running on PV
domains may be in for a rough ride on systems like this.
(HVM domains may run into other problems because time will
apparently stop for a "long time".)

I don't see what the problem is. If each individual cpu has well-known tsc parameters (rate and offset), then a PV client will get those timing parameters and use them to compute its time. It doesn't matter whether the tscs are synchronized between cpus or nodes.
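As a rough illustration of that point, here is a minimal sketch of how a guest could turn a raw tsc reading into system time using only the parameters published for the cpu it is running on. The struct and function names are hypothetical and chosen for this example; the mul/shift fixed-point scaling is one common way to express a tsc rate and is an assumption here, not a statement of the exact Xen interface.

```c
#include <stdint.h>

/* Hypothetical per-cpu timing record; field names are illustrative only. */
struct cpu_time_params {
    uint64_t tsc_at_snapshot;  /* this cpu's tsc when the record was written */
    uint64_t ns_at_snapshot;   /* system time in ns at that same instant */
    uint32_t tsc_to_ns_mul;    /* ns per tick as a 0.32 fixed-point fraction */
    int8_t   tsc_shift;        /* power-of-two pre-scale applied to the delta */
};

/* Convert a tick delta to ns: ((delta << shift) * mul) >> 32. */
static uint64_t ticks_to_ns(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* Split the multiply so the low-half product fits in 64 bits. */
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* System time as seen from one cpu, using only that cpu's own parameters. */
static uint64_t guest_time_ns(const struct cpu_time_params *p, uint64_t tsc_now)
{
    return p->ns_at_snapshot +
           ticks_to_ns(tsc_now - p->tsc_at_snapshot,
                       p->tsc_to_ns_mul, p->tsc_shift);
}
```

Two cpus with completely different tsc offsets and rates (say one record with a 2 GHz rate and another with a 1 GHz rate and a much larger tsc base) yield the same system time as long as each record is accurate for its own cpu, which is why cross-cpu tsc synchronization is not required in this model.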

Xen will need to calibrate each of them against a good reference (hpet?), but that's no different from now. I guess it's possible that this system has more variation and latency for hpet access, which may mean that the calibration algorithm needs tweaking.
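To make the calibration step concrete, the following is a hedged sketch of deriving a cpu's tsc rate from two measurements taken over the same interval: the tsc delta on that cpu and the elapsed nanoseconds according to the reference clock. The function name, the mul/shift output format, and the 2^31 input limit are assumptions made for this example; it is not Xen's actual calibration code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive mul/shift such that
 *   ns = ((ticks << shift) * mul) >> 32
 * from a tsc delta and the reference-clock ns elapsed over the same window.
 * Returns 0 on success, -1 if an input is zero or too large for this sketch.
 */
static int calibrate_tsc(uint64_t tsc_delta, uint64_t ref_ns,
                         uint32_t *mul, int8_t *shift)
{
    int8_t s = 0;

    if (tsc_delta == 0 || ref_ns == 0 ||
        tsc_delta >= (1ull << 31) || ref_ns >= (1ull << 31))
        return -1;

    /* Normalize ref_ns / tsc_delta into [0.5, 1), tracking powers of two. */
    while (ref_ns >= tsc_delta) {
        tsc_delta <<= 1;
        s++;
    }
    while (ref_ns * 2 < tsc_delta) {
        ref_ns <<= 1;
        s--;
    }

    /* ref_ns < tsc_delta < 2^32 here, so the quotient fits in 32 bits. */
    *mul = (uint32_t)((ref_ns << 32) / tsc_delta);
    *shift = s;
    return 0;
}
```

Any latency or jitter in reading the reference clock shows up directly as error in `tsc_delta` or `ref_ns`, so a platform with slow or variable hpet access would likely need a longer measurement window or repeated samples to get a stable result.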

Of course, if the tsc rate on each cpu is changing in some unpredictable way then that's a whole other barrel of problems. Guests rely on Xen maintaining accurate tsc timing parameters.

Since systems like this are targeted for consolidation
and virtualization, I see this as a potentially big problem
as it may appear to real Xen customers as bizarre
non-reproducible problems, such as "make" failing,
leading to questions about the stability and viability
of using Xen.

Comments?

In Linux there's this function:

/*
 * apic_is_clustered_box() -- Check if we can expect good TSC
 *
 * Thus far, the major user of this is IBM's Summit2 series:
 *
 * Clustered boxes may have unsynced TSC problems if they are
 * multi-chassis. Use available data to take a good guess.
 * If in doubt, go HPET.
 */
__cpuinit int apic_is_clustered_box(void)
{...}


This deals with Summit2 and ScaleMP vSMP systems, which also have unsynchronized tscs across nodes. At the moment it assumes that no non-vSMP AMD system has unsynchronized tscs; it sounds like it will need updating for this system.

   J
