On 09/18/09 13:27, Dan Magenheimer wrote:
> If guest vm.cfg has vrdtscp=0 (default):
>     rdtscp is emulated and returns nsec since guest
>     boot (same as emulated rdtsc), value returned
>     for TSC_AUX is -1
>
> If guest vm.cfg has vrdtscp=1:
>     If underlying hardware has rdtscp support:
>         rdtscp is directly executed by hardware,
>         value returned for TSC_AUX is non-zero
>         (see below)
>     Else: (no hardware rdtscp support)
>         rdtscp is emulated and returns nsec since
>         guest boot, value returned for TSC_AUX is 0
>
Why do you need to distinguish between the two emulated rdtscp cases?
Special-casing a version of '0' is awkward because it would arise
naturally from version wraparound (after 2^31 time parameter updates,
but still).
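To make the wraparound concern concrete: if the version is a 32-bit value that advances by 2 per parameter update (the net step of the existing pvclock `version` field, which is odd while an update is in progress), it comes back to 0 after 2^31 updates. The C sketch below uses hypothetical helpers `next_version` and `next_version_skip_reserved`; the second shows one possible way to keep 0 and -1 reserved, not something the proposal specifies.

```c
#include <stdint.h>

/* Hypothetical helper: advance a 32-bit version by 2 per completed
 * update, mirroring the net step of the pvclock version field. */
static uint32_t next_version(uint32_t v)
{
    return v + 2u; /* unsigned arithmetic wraps modulo 2^32 */
}

/* Hypothetical variant: if 0 and -1 (0xFFFFFFFF) stay reserved as
 * TSC_AUX markers for the emulated cases, skip over them. */
static uint32_t next_version_skip_reserved(uint32_t v)
{
    uint32_t n = v + 2u;
    while (n == 0u || n == UINT32_MAX)
        n += 2u;
    return n;
}
```

Starting from an even value, `next_version(0xFFFFFFFE)` yields 0, which is exactly the collision with the special-cased '0'.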
If the hardware doesn't support rdtscp, how should an app know whether
or not to use it? Should it just try executing rdtscp and be prepared to
handle a SIGILL?
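For reference, the conventional alternative to probing for SIGILL is CPUID: RDTSCP support is reported in leaf 0x80000001, EDX bit 27. A minimal sketch using `__get_cpuid` from GCC/Clang's `<cpuid.h>`; the helper names are hypothetical. Under Xen the guest sees whatever CPUID view the hypervisor presents, so this shows only whether rdtscp is advertised, not whether it would be executed natively or emulated.

```c
#include <stdint.h>
#include <stdbool.h>

/* RDTSCP support: CPUID leaf 0x80000001, EDX bit 27. */
#define CPUID_EXT_FEATURES 0x80000001u
#define EDX_RDTSCP_BIT     (1u << 27)

/* Pure check on a raw EDX value from CPUID leaf 0x80000001. */
static bool edx_has_rdtscp(uint32_t edx)
{
    return (edx & EDX_RDTSCP_BIT) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Ask the (possibly virtualized) CPU whether rdtscp is advertised. */
static bool cpu_has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the extended leaf is not supported. */
    if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_rdtscp(edx);
}
#endif
```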
> How it works from the app point-of-view:
>
> The guest app must have some way of getting 64-bit
> pvclock parameters directly from Xen without OS changes,
> e.g. emulated userland wrmsr, userland hypercall,
> or userland mapped shared page. (This will be done
> rarely so need not be fast! But it does create
> a new userland<->Xen ABI that must be kept compatible.)
>
> On first rdtscp, app records returned TSC_AUX value,
> verifies that it is neither 0 nor -1,
> fetches pvclock parameters from Xen, executes
> another rdtscp. If TSC_AUX matches previous value,
> app applies pvclock algorithm to tsc value to
> obtain nsec since guest boot. If TSC_AUX is
> zero or -1, tsc value IS nsec since guest boot.
> If TSC_AUX differs from last recorded value,
> fetch pvclock parameters from Xen again.
>
> On subsequent rdtscp's, app compares
> returned TSC_AUX against the previous one,
> and fetches pvclock parameters from Xen only
> if it differs (which should be rare).
>
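The per-rdtscp decision described above, plus the pvclock conversion it refers to, can be sketched in C. The parameter struct is modeled on the fields of Linux's `struct pvclock_vcpu_time_info`, but the userland ABI for fetching it is undecided, so `struct pv_params`, `struct pv_cache` and `pv_classify` are all hypothetical. The I/O (executing rdtscp, fetching parameters) is left to the caller so the policy stays testable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parameter block modeled on pvclock_vcpu_time_info. */
struct pv_params {
    uint32_t version;
    uint64_t tsc_timestamp;     /* TSC value when params were written */
    uint64_t system_time;       /* ns since guest boot at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC->ns multiplier */
    int8_t   tsc_shift;         /* pre-scale shift applied to the delta */
};

/* pvclock: ns = system_time + (((tsc - tsc_timestamp) << shift) * mul) >> 32 */
static uint64_t pvclock_ns(const struct pv_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;
    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;
    /* (delta * mul) >> 32 without a 128-bit type: split delta in halves */
    uint64_t hi = (delta >> 32) * p->tsc_to_system_mul;
    uint64_t lo = ((delta & 0xffffffffu) * p->tsc_to_system_mul) >> 32;
    return p->system_time + hi + lo;
}

/* TSC_AUX values meaning "Xen emulated rdtscp; value is already ns". */
#define AUX_EMULATED_VRDTSCP0 UINT32_MAX /* -1: vrdtscp=0 */
#define AUX_EMULATED_NO_HW    0u         /* 0: vrdtscp=1, no hw rdtscp */

struct pv_cache {
    bool valid;
    uint32_t aux;              /* TSC_AUX seen when params were fetched */
    struct pv_params params;
};

enum pv_action { PV_VALUE_IS_NS, PV_USE_CACHED, PV_REFETCH };

/* Decide what to do with the TSC_AUX returned by one rdtscp. */
static enum pv_action pv_classify(const struct pv_cache *c, uint32_t aux)
{
    if (aux == AUX_EMULATED_VRDTSCP0 || aux == AUX_EMULATED_NO_HW)
        return PV_VALUE_IS_NS;
    if (c->valid && c->aux == aux)
        return PV_USE_CACHED;  /* same TSC domain as last time */
    return PV_REFETCH;         /* first call, or TSC_AUX changed */
}
```

On `PV_USE_CACHED` the caller returns `pvclock_ns(&c->params, tsc)`; on `PV_REFETCH` it fetches parameters, records `aux`, and executes rdtscp again before trusting the result.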
Presumably the pvclock parameters would contain the same version number,
which must match TSC_AUX; if not, the app keeps iterating (rdtscp,
get-timing-parameters) until they do.
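That retry loop could look like the C sketch below. Everything here is hypothetical: `struct pv_ops` stands in for executing rdtscp and for the undecided userland<->Xen call, and the `sim_*` functions simulate a migration landing between a parameter fetch and the next rdtscp. The 0 and -1 emulated cases are not handled in this sketch.

```c
#include <stdint.h>

/* Hypothetical parameter block; only the version matters here. */
struct pv_params {
    uint32_t version;       /* 0: nothing fetched yet (sketch only) */
    uint64_t tsc_timestamp;
    uint64_t system_time;
};

/* Hypothetical hooks standing in for rdtscp and the Xen ABI. */
struct pv_ops {
    uint64_t (*rdtscp)(void *ctx, uint32_t *aux);
    void (*get_params)(void *ctx, struct pv_params *out);
    void *ctx;
};

/* Iterate (rdtscp, get-timing-parameters) until TSC_AUX matches the
 * version in the parameters; returns the TSC that pairs with *p. */
static uint64_t pv_read_matched(const struct pv_ops *ops,
                                struct pv_params *p, unsigned *fetches)
{
    for (;;) {
        uint32_t aux;
        uint64_t tsc = ops->rdtscp(ops->ctx, &aux);
        if (aux == p->version)
            return tsc;               /* params and TSC agree */
        ops->get_params(ops->ctx, p); /* stale: refetch and retry */
        if (fetches)
            (*fetches)++;
    }
}

/* --- Simulated Xen, for demonstration only. --- */
struct sim_xen {
    uint32_t version;       /* current TSC_AUX / parameter version */
    uint64_t tsc;           /* fake TSC counter */
    unsigned fetches;       /* parameter fetches so far */
    unsigned migrate_after; /* bump version right after this fetch */
};

static uint64_t sim_rdtscp(void *ctx, uint32_t *aux)
{
    struct sim_xen *x = ctx;
    *aux = x->version;
    return x->tsc++;
}

static void sim_get_params(void *ctx, struct pv_params *out)
{
    struct sim_xen *x = ctx;
    out->version = x->version;
    out->tsc_timestamp = x->tsc;
    out->system_time = 0;
    if (++x->fetches == x->migrate_after)
        x->version += 2; /* "migration" between fetch and rdtscp */
}
```

With a migration right after the first fetch, the loop fetches twice and returns only once the version it holds matches TSC_AUX.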
> What Xen needs to do:
>
> Xen must record the setting for each guest's vrdtscp
> config variable and ensure that it persists across
> save/restore and migration. If the guest has
> vrdtscp=1, a vrdtscp "version" number is also
> part of the guest's state and must persist
> across save/restore/migration.
>
> Xen must know whether or not it is running on a
> machine where TSC is reliable. If TSC is NOT
> reliable AND rdtscp is supported by hardware,
> Xen must ensure that TSC_AUX is -1 on all pcpu's
> that are running a guest with vrdtscp=0, and 0
> on all pcpu's that are running a guest where
> vrdtscp=1 (and must enable CR4.TSD on those
> pcpus if it wasn't already).
If the TSC is not reliable but Xen has accurate TSC parameter info, then
the algorithm above will still work efficiently.
> If TSC is NOT
> reliable AND rdtscp is NOT supported by hardware,
> Xen must emulate rdtscp (e.g.
> return Xen system time) and emulate the
> same behavior for TSC_AUX. If TSC IS reliable,
> Xen sets TSC_AUX to the guest's vrdtscp version
> number on all pcpu's that are running the guest.
> Finally, when a guest transitions from one
> "TSC domain" to another (restore/migrate/NUMA)
> it increments the vrdtscp version number.
>
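Read as a decision table together with the vrdtscp=0/1 cases at the top of the message, the Xen-side rules might be sketched as below. `choose_rdtscp_setup` and its types are hypothetical, and the mapping is one reading of the proposal rather than a settled design. "Emulate" means rdtscp traps to Xen: via CR4.TSD, which makes rdtsc/rdtscp fault outside CPL 0, or as an invalid opcode where the hardware lacks rdtscp.

```c
#include <stdint.h>
#include <stdbool.h>

enum rdtscp_mode {
    RDTSCP_EMULATE, /* rdtscp traps to Xen, which emulates it */
    RDTSCP_NATIVE   /* hardware executes rdtscp directly */
};

struct rdtscp_setup {
    enum rdtscp_mode mode;
    uint32_t tsc_aux;  /* TSC_AUX value the guest observes */
};

/* Hypothetical per-pcpu decision for a guest about to run there. */
static struct rdtscp_setup choose_rdtscp_setup(bool tsc_reliable,
                                               bool hw_rdtscp,
                                               bool vrdtscp,
                                               uint32_t version)
{
    struct rdtscp_setup s;
    if (!vrdtscp) {
        /* vrdtscp=0 (default): emulated, returns ns, TSC_AUX = -1 */
        s.mode = RDTSCP_EMULATE;
        s.tsc_aux = UINT32_MAX;
    } else if (tsc_reliable && hw_rdtscp) {
        /* fast path: native rdtscp, TSC_AUX carries the version */
        s.mode = RDTSCP_NATIVE;
        s.tsc_aux = version;
    } else {
        /* unreliable TSC or no hw rdtscp: emulated, TSC_AUX = 0 */
        s.mode = RDTSCP_EMULATE;
        s.tsc_aux = 0u;
    }
    return s;
}
```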
Well, it just needs to increment it whenever Xen knows the TSC has
changed, as the current pvclock code does. That could happen more often
than restore/migrate if the TSC changes on power events.
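In other words, the version bump is keyed to "the parameters the guest relies on changed", whatever the cause. A minimal sketch with a hypothetical `struct guest_tsc_state` and `tsc_params_update`; tracking only frequency and offset is an assumption made for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-guest TSC state kept by the hypervisor. */
struct guest_tsc_state {
    uint32_t version;    /* exposed to the guest via TSC_AUX */
    uint32_t tsc_khz;    /* TSC frequency the guest's params assume */
    int64_t  tsc_offset; /* offset applied to host TSC for this guest */
};

/* Call on any event that may change the TSC: restore, migration,
 * NUMA move, or a power/frequency transition. Bumps the version
 * only if the guest-visible parameters actually changed. */
static bool tsc_params_update(struct guest_tsc_state *s,
                              uint32_t new_khz, int64_t new_offset)
{
    if (s->tsc_khz == new_khz && s->tsc_offset == new_offset)
        return false;     /* nothing the guest can observe */
    s->tsc_khz = new_khz;
    s->tsc_offset = new_offset;
    s->version += 2;      /* same net step as a pvclock update */
    return true;
}
```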
> The only problem I can see is that when
> vrdtscp==1, other apps that are running on that guest
> that use rdtsc (no p) directly (i.e. haven't been
> modified to use pv-rdtscp) will continue to
> have the same kinds of failure on save/restore/
> migration. But this is true of all the solutions
> proposed so far: Xen can only turn on emulation
> guest-wide, not per-app.
>
Linux already reserves rdtscp for use as part of vsyscall, where TSC_AUX
contains the NUMA node and the CPU number, so there should be no "naked"
users of rdtscp.
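Concretely, the x86-64 vsyscall/vgetcpu code of this era writes TSC_AUX as `(node << 12) | cpu`, so the low 12 bits hold the CPU number and the remaining bits the NUMA node. The helper names below are hypothetical; the packing follows that Linux convention as recalled here and should be checked against the kernel source.

```c
#include <stdint.h>

/* Pack TSC_AUX the way Linux vsyscall setup does: (node << 12) | cpu. */
static uint32_t linux_tsc_aux_encode(unsigned cpu, unsigned node)
{
    return ((uint32_t)node << 12) | (cpu & 0xfffu);
}

/* Unpack as vgetcpu does: cpu = low 12 bits, node = the rest. */
static void linux_tsc_aux_decode(uint32_t aux, unsigned *cpu, unsigned *node)
{
    *cpu = aux & 0xfffu;
    *node = aux >> 12;
}
```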
> Also even on machines where TSC is reliable,
> there is a small chance that consecutive
> TSC values read will be from different
> processors and so TSC might appear to go
> backwards by some small amount. So apps
> must still put raw TSC values through
> a "monotonicity filter". (Xen already
> does this for emulated reads of TSC.)
>
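A "monotonicity filter" can be as small as remembering the largest value returned so far and clamping to it. The sketch uses C11 atomics so concurrent threads agree on that maximum; `monotonic_filter` is a hypothetical name, and this is one common technique, not necessarily what Xen's rdtsc emulation does internally.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Largest value handed out so far, shared by all threads. */
static _Atomic uint64_t last_returned;

/* Return max(raw, last value returned), so callers never see time go
 * backwards even if consecutive reads land on CPUs whose TSCs differ
 * slightly. */
static uint64_t monotonic_filter(uint64_t raw)
{
    uint64_t last = atomic_load(&last_returned);
    for (;;) {
        if (raw <= last)
            return last;  /* clamp: never go backwards */
        if (atomic_compare_exchange_weak(&last_returned, &last, raw))
            return raw;   /* published a new maximum */
        /* CAS failed: 'last' now holds the newer value; retry */
    }
}
```

The cost is that a reading from a slightly-behind CPU is reported as the previous maximum, so time can appear to stand still briefly instead of going backwards.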
Why? I thought "reliable" tscs were supposed to be synced between cores?
J
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel