This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB

To: Jan Beulich <JBeulich@xxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, mukesh.rathor@xxxxxxxxxx
Subject: RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Tue, 22 Dec 2009 09:27:01 -0800 (PST)
Cc: kurt.hackel@xxxxxxxxxx, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Tue, 22 Dec 2009 09:28:08 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B31051D02000078000273CB@xxxxxxxxxxxxxxxxxx>
So, checking my understanding: the underlying problem is that
shadow->tsc_timestamp has essentially stopped while the hardware TSC
has continued moving forward?  If so, in timer_interrupt() (in
time-xen.c) shadow->system_timestamp will be stale and
get_nsec_offset() will return a large number, resulting in a large
delta, which in turn causes jiffies to be incremented by a large
amount.  If that interrupt happens to arrive in the middle of the
first while loop in calibrate_delay_direct() (in init/calibrate.c),
and the jiffies increment is large enough to wrap, the while loop
will run for weeks.
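
To make the chain of reasoning above concrete, here is a minimal
sketch in Python that models the arithmetic only. It is not the code
from time-xen.c or init/calibrate.c. The 32-bit jiffies width, HZ=100,
the 2 GHz TSC rate and all function names are assumptions chosen for
illustration. It shows a stale TSC timestamp turning into a large
nanosecond offset, that offset turning into thousands of ticks, and a
plain unsigned "<=" test staying true after the counter wraps, whereas
a signed-difference test (in the style of the kernel's time_after()
macro) does not.

```python
MASK32 = 0xFFFFFFFF        # assumption: model jiffies as a 32-bit unsigned counter
NS_PER_TICK = 10_000_000   # assumption: HZ = 100, i.e. 10 ms per tick


def nsec_offset(tsc_now, tsc_timestamp, tsc_hz):
    """Nanoseconds elapsed since the (possibly stale) TSC timestamp."""
    return (tsc_now - tsc_timestamp) * 1_000_000_000 // tsc_hz


def ticks_to_account(system_timestamp, offset_ns, processed_system_time):
    """Whole ticks a timer interrupt would add to jiffies for this delta."""
    delta = system_timestamp + offset_ns - processed_system_time
    return max(delta, 0) // NS_PER_TICK


def naive_still_waiting(jiffies, start_jiffies):
    """Plain unsigned comparison: 'keep looping until the next tick'."""
    return jiffies <= ((start_jiffies + 1) & MASK32)


def wrap_safe_still_waiting(jiffies, start_jiffies):
    """Same test done on the signed 32-bit difference, so a wrap is harmless."""
    diff = (jiffies - ((start_jiffies + 1) & MASK32)) & MASK32
    signed = diff - (1 << 32) if diff & 0x80000000 else diff
    return signed <= 0


if __name__ == "__main__":
    tsc_hz = 2_000_000_000                 # hypothetical 2 GHz TSC
    stale = 100 * tsc_hz                   # timestamp is 100 s out of date
    ticks = ticks_to_account(0, nsec_offset(stale, 0, tsc_hz), 0)

    start = MASK32 - 5000                  # loop began shortly before the wrap point
    jiffies = (start + ticks) & MASK32     # counter after the one big increment

    print("ticks added in one interrupt:", ticks)
    print("naive loop still waiting:    ", naive_still_waiting(jiffies, start))
    print("wrap-safe loop still waiting:", wrap_safe_still_waiting(jiffies, start))
```

In this model the naive loop keeps spinning until the counter climbs
all the way back up to start_jiffies + 2, which is the "runs for weeks"
behaviour described above; the signed-difference form exits at once.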

If this is right, I'm still not clear on how it can be fixed
in Xen.

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
> Sent: Tuesday, December 22, 2009 9:43 AM
> To: Keir Fraser; Dan Magenheimer; Mukesh Rathor
> Cc: Jeremy Fitzhardinge; Xen-devel@xxxxxxxxxxxxxxxxxxx; Kurt Hackel
> Subject: Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> >>> "Jan Beulich" <JBeulich@xxxxxxxxxx> 22.12.09 17:33 >>>
> >One other thing that looks irregular at first glance is that the
> >mentioned very first run through time_calibration() does not seem to
> >result in running local_time_calibration() on CPU0. One invocation
> >(apparently independent of time_calibration()) happens right before
> >Dom0 starts executing.
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap of calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.
> Jan
