WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

RE: [Xen-devel] Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Mark Adams <mark@xxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Thu, 7 Oct 2010 07:04:18 -0700 (PDT)
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 07 Oct 2010 07:05:22 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4CACA26A.10007@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20101006111618.GA31233@xxxxxxxxxxxxxxxxxx> <4CAC98BF.9010902@xxxxxxxx> <20101006161529.GA3635@xxxxxxxxxxxxxxxxxx> <4CACA26A.10007@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

Hi Jeremy and Mark --

Oddly, I saw that "clocksource tsc unstable" message myself
on a busy 2.6.36-rc5 PV domain yesterday.  While it is possible
that this reflects a hardware problem, the fact that you
saw it on a Nehalem+ Intel processor makes it very unlikely.
The "s" and "t" debug keys (the output of which can be seen via
"xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
the problem if it is indeed a hardware problem or BIOS
problem or the result of a CPU hot-add... all unlikely.

It IS possible that the code that emulates tsc is broken
somewhere, but I don't think tsc should be emulated by
default for dom0 on a Nehalem+ box... and even if it is,
it is directly based on Xen system time which, if it went
awry, would probably cause major problems.

Looking through the Linux code that prints that message (in
kernel/time/clocksource.c), it seems the message is printed
when the tsc deviates from the "watchdog clocksource",
which in PV domains is "xen" (or more precisely pvclock,
I think).  So most likely, this is a symptom of a problem
with pvclock or the watchdog code in the pvops kernel, not
an indicator that the tsc is actually unstable.
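
As a rough illustration of the watchdog comparison described above,
here is a simplified sketch in Python. It is not the code in
kernel/time/clocksource.c: the function names and the 62.5 ms
threshold are assumptions made for this example. Only the message
format and the delta value are taken from the kernel log quoted
later in this thread.

```python
# Simplified illustration of a clocksource watchdog check.
# This is NOT the kernel implementation; names and threshold are assumed.

NSEC_PER_SEC = 1_000_000_000

# Assumed tolerance for this sketch (62.5 ms); the real kernel value may differ.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4

# Delta reported in the dom0 kernel log quoted in this thread.
REPORTED_DELTA_NS = -2999660303788


def watchdog_delta(cs_prev, cs_now, wd_prev, wd_now):
    """Elapsed time on the checked clocksource minus elapsed time on the
    watchdog clocksource, both in nanoseconds."""
    return (cs_now - cs_prev) - (wd_now - wd_prev)


def is_unstable(delta_ns, threshold_ns=WATCHDOG_THRESHOLD_NS):
    """Flag the clocksource when the two intervals disagree by more than
    the tolerance, in either direction."""
    return abs(delta_ns) > threshold_ns


def format_report(name, delta_ns):
    """Build a message in the same shape as the one seen in the log."""
    return f"Clocksource {name} unstable (delta = {delta_ns} ns)"


def delta_to_minutes(delta_ns):
    """Convert a nanosecond delta to minutes."""
    return delta_ns / NSEC_PER_SEC / 60


if __name__ == "__main__":
    # One second passes on the tsc while the watchdog advances by one
    # second plus the size of the reported jump.
    delta = watchdog_delta(0, NSEC_PER_SEC,
                           0, NSEC_PER_SEC - REPORTED_DELTA_NS)
    if is_unstable(delta):
        print(format_report("tsc", delta))
    print(f"{delta_to_minutes(delta):.2f} minutes")
```

Note that the reported delta of -2999660303788 ns works out to
roughly -50 minutes, the same size as the jump Mark saw in dom0.
In this sketch a negative delta means the watchdog advanced further
than the tsc over the same interval, which would fit the watchdog
side having jumped rather than the tsc itself misbehaving.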

Dan

> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> Sent: Wednesday, October 06, 2010 10:23 AM
> To: Mark Adams
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Dan Magenheimer
> Subject: Re: [Xen-devel] Clock jumped 50 minutes in dom0 caused
> incorrect 2008 R2 domU time
> 
>  On 10/06/2010 09:15 AM, Mark Adams wrote:
> > On Wed, Oct 06, 2010 at 08:41:51AM -0700, Jeremy Fitzhardinge wrote:
> >>  On 10/06/2010 04:16 AM, Mark Adams wrote:
> >>> Hi Xen-Devel's
> >>>
> >>> Please see my note below regarding a serious issue where my clock
> >>> jumped in dom0. I'm sending this through to the devel list as I
> >>> haven't managed to glean any clear help from xen-users and the
> >>> debian bug team are unsure what could have caused this.
> >>>
> >>> Can you confirm if the kernel or xen controls the clock in dom0? I
> >>> also understand that this could be an underlying hardware issue but
> >>> I have another system on exactly the same hardware which hasn't had
> >>> this occur.
> >> The kernel manages its own time, but it uses the Xen system clock as
> >> its timebase.  If the Xen system clock is unstable for some reason,
> >> then it will affect the kernel's timekeeping.
> >>
> >> Nothing should be using the tsc clocksource, so I'm not sure why it's
> >> reporting any kind of message.  No PV Xen domain can expect the raw
> >> tsc to be stable.
> > The message was reported in dom0, not domU.
> 
> Dom0 is a normal PV domain.  It just has a few more privileges than a
> regular domU.
> 
> >> But the tsc is the basis for the Xen clocksource, and if the tsc is
> >> unstable in unexpected ways then it can affect Xen timekeeping.
> >> This can be caused by certain power management modes.
> >>
> >>> Any advice on how to investigate further or ensure better clock
> >>> stability across dom0 and domU would be appreciated.
> >> What type of system is it?  How many CPUs?  What CPU vendor?
> > It is a Tyan S7010AGM2NRF with 2 Intel quad-core Xeon E5620 CPUs.
> 
> I forget all the magic options that can affect timekeeping (cc:d Dan,
> since this stuff is close to his heart).
> 
>     J
> 
> > Thanks,
> > Mark
> >
> >>> Also is it correct behaviour for Xen to reboot a 2008 R2 HVM domU
> >>> if the time moves this much? My guess is that the domU crashed when
> >>> the time changed, and was thus rebooted automatically. Strangely the
> >>> Windows 2003 server didn't get rebooted.
> >> I don't think there would be any direct connection between the dom0
> >> time jump and Windows dying, but if the CPU's tsc and/or Xen's
> >> timekeeping is unstable, then Windows might also see a similar time
> >> jump and react badly.
> >>
> >>     J
> >>
> >>> If you need any more info to help please let me know.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>> On Mon, Oct 04, 2010 at 01:00:51PM +0100, Mark Adams wrote:
> >>>> On Mon, Oct 04, 2010 at 11:01:10AM +0100, Mark Adams wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> I'm running Xen 4.0.1-rc6 Debian squeeze with pvops 2.6.32-21
> >>>>> kernel. Today I noticed (when kerberos to the domain controllers
> >>>>> stopped working..) that the clock was 50 minutes out in dom0 --
> >>>>> This caused the HVM windows domain controllers to have the wrong
> >>>>> time.
> >>>>>
> >>>>> I'm not sure if this is a kernel issue or a xen issue, but the
> >>>>> only thing related is I can see the following in the kernel log:
> >>>>>
> >>>>> Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)
> >>>>>
> >>>>> But I also see in the dmesg log that xen is using its own clock.
> >>>>>
> >>>>> [    7.676563] Switching to clocksource xen
> >>>>>
> >>>>> I can't identify anything else in the logs to indicate when the
> >>>>> time might have changed. I have a few other dom0 at the same level
> >>>>> that haven't decided to change the time.
> >>>>>
> >>>>> Can anyone confirm whether xen controls the time or the kernel?
> >>>>> Also when I corrected the time in dom0 it was still wrong in HVM
> >>>>> domU -- How long does it take for this to propagate? (I rebooted
> >>>>> the VMs to correct it immediately).
> >>>>>
> >>>>> Any other pointers on how to ensure stability of clocks from dom0
> >>>>> to domU HVM hosts (and pv for that matter..) would be appreciated.
> >>>> Some further info on this: it appears the HVM domU (windows server
> >>>> 2008) unexpectedly shut down at 18:51, after the unstable
> >>>> clocksource error. qemu-dm logs show a reset "reset requested in
> >>>> cpu_handle_ioreq." and xend.log shows a reboot:
> >>>>
> >>>> [2010-10-02 18:51:03 1759] INFO (XendDomainInfo:2088) Domain has shutdown: name=ha-dc1 id=2 reason=reboot.
> >>>>
> >>>> This is like someone issuing "xm reboot domain" is it not? Is it
> >>>> possible that xen could have issued this reboot itself due to a
> >>>> crash? I can't see any crash logs.
> >>>>
> >>>> Cheers,
> >>>> Mark
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>> http://lists.xensource.com/xen-devel
> >>>
> 
