This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


RE: [Xen-devel] RE: rdtsc strangeness on upstream kernel

> On 07/19/2010 05:42 PM, Dan Magenheimer wrote:
> > Yes, CONFIG_XEN_DEBUG_FS is on.  I dug up some instrumentation
> > I used in the hypervisor last fall which should identify the
> > kernel locations of the rdtsc's.  I'll plug that in and
> > post the results tomorrow.
> They're probably all in the same place, so you'll need to get some
> calling stack to make sense of it.
>     J

Sorry this took awhile.  I've been hacking and instrumenting and
measuring and the results are not very good.

I'm cc'ing Andrew Jones since things look worse on stock RHEL6b2.

First, I've measured on both PV 2.6.35-rc4 ("upstream") and PV
2.6.32-37 ("rhel6b2").  I've attached the patch for rhel6b2 that
I used for measurement; the same code fragments were used for

Rhel6b2 had HZ=1000 and had vcpus=2.  Upstream had HZ=250 and vcpus=1.

Both had the "xen_sched_clock" call changed to xen_clocksource_read.

The workload is simply "main() {while(1); }" run in background.
On vcpus=2, I ran two in background.

Both had CONFIG_XEN_FS_DEBUG set, but the workload should encounter
few or no kernel spinlocks and the measurements confirmed this.
Rhel6b2 had HIGH_RES_TIMERS and SCHED_HRTICK set; they were not
set on upstream... don't know if this matters.

BOTTOM LINE: A cpu-loaded vcpus=2 HZ=1000 rhel6b2 guest had close to
20000 rdtsc/sec.  Unloaded, the same guest had about 600 rdtsc/sec.
A cpu-loaded HZ=250 upstream guest had about 4000 rdtsc/sec,
regardless of whether it was loaded or not.

Breakdown: I instrumented several of the primary sources of calls
to native_read_tsc.  No rdtsc instructions occurred, except the
one in native_read_tsc.  (Some small number are binary translated
to call native_read_tsc.)

The most frequent users of rdtsc:
(A) sched_clock via xen_clocksource_read->pvclock_clocksource_read->
(B) clock->read via xen_clocksource_read
(C) set_next_event via xen_timerop_clockevent

I measured (running the workload overnight) how frequently each
of these paths resulted in a call to native_read_tsc.  Each
of these functions go through this path exactly once so this
is also a frequency (per second) count for the callers.

ALL 4672..18689
(A) 2149...8428 (~8*HZ)
(B) 1350...9456 (~(4*HZ+?)*vcpus)
(C)  524...2102 (~2*HZ)

The measurement code outputs frequency every 2**20 samples
and I did a cursory review to verify that the frequency
samples were consistent. There is some small variance,
which accounts for the fact that the numbers above (which
were from one sample) don't quite add up.  So please
consider them "trend indicators" rather than exact measurements.
I also did verify the total rdtsc usage counts against the
"xm debug-key s" output from the Xen console.

1) Frequency of rdtsc use is much higher than expected, especially
   when HZ=1000.  Sadly, the divider= kernel parameter may still
   be necessary for RHEL6b2 on older machines where Xen must do
   rdtsc emulation... and for any guest that has migrated (or will
   migrate?)... unless this high frequency identifies a fixable bug.
2) There is some fix in rhel6b2 that hides this when vcpus are inactive.
   Should this go upstream so idle guests don't consume so much CPU?
3) Why is sched_clock called at 8*HZ?!? Why is clock->read called at
   4*HZ (on every processor)? !? Why is set_next_event called
   at 2*HZ?!?  Even with hardware rdtsc, this seems like a potential
   (though admittedly small) performance issue.

Attachment: linux-2.6.32-37-tsc.patch
Description: Binary data

Xen-devel mailing list