xen-devel
[Xen-devel] rdtsc hypercall, from userland?!? (was: rdtsc: correctness v
(Although Jeremy and others are still discussing how to
implement vsyscall+pvclock for upstream Linux, I am still
looking for a way to allow apps to use rdtsc without
suffering the performance loss from rdtsc emulation,
so I've begun a new thread.)
To recap: In order to properly implement the required
semantics of the rdtsc instruction in a virtual environment,
the current Xen method of allowing the rdtsc instruction
to execute natively is insufficient and may lead to
random failure, possibly resulting in data loss.
Upstream Xen now has a boot option to force rdtsc
to be emulated for both hvm and pv guests. Soon
this will be controlled by a per-guest vm.cfg option.
The default will likely be emulation.
However, some apps do tens-to-hundreds of thousands
of rdtsc's per core per second. On my dual-core Conroe
box, an rdtsc instruction takes about 22ns in hardware
and about 360ns to emulate. So emulation may slow
performance in the worst case by as much as 5-10%.
Vsyscall+pvclock in upstream 64-bit Linux may be
the right answer at some point in the future. BUT
(IMPORTANT NEW POINT!!!) the pvclock algorithm requires
an rdtsc instruction, and there is no way to
emulate some guest rdtsc instructions (e.g. only
those in apps) and not others (e.g. only those in
the kernel). Thus, for guests that have rdtsc emulation
enabled, vsyscall+pvclock will be SLOWER than plain rdtsc
emulation, since each pvclock read pays for an emulated rdtsc
plus its own arithmetic, so it is still not a palatable alternative.
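To make the dependency concrete, here is a simplified sketch of the scaling step a pvclock-style read performs. It is modelled loosely on the pvclock time-info layout; the struct and function names are hypothetical, the fields are pared down, and the version-check retry loop a real reader needs is left out. The point is only that tsc_now has to come from an rdtsc executed in the guest, which is exactly the instruction that traps when emulation is on.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the per-vcpu time info. */
struct pvclock_sketch {
    uint64_t tsc_timestamp;      /* TSC value when system_time_ns was sampled */
    uint64_t system_time_ns;     /* system time at that sample, in ns */
    uint32_t tsc_to_system_mul;  /* 32-bit fixed-point fraction: ns per (shifted) tick */
    int8_t   tsc_shift;          /* pre-scaling shift applied to the TSC delta */
};

/* Convert a TSC reading to nanoseconds of system time.
 * In a real guest, tsc_now is the result of an rdtsc instruction. */
uint64_t pvclock_read_ns(const struct pvclock_sketch *info, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - info->tsc_timestamp;

    if (info->tsc_shift >= 0)
        delta <<= info->tsc_shift;
    else
        delta >>= -info->tsc_shift;

    /* (delta * mul) >> 32, split into halves to stay within 64 bits. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * info->tsc_to_system_mul
                    + ((lo * info->tsc_to_system_mul) >> 32);

    return info->system_time_ns + scaled;
}
```

With emulation enabled, the cost of one such read is therefore the emulated rdtsc plus the shift, multiply and add shown here.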
I'm looking for something that provides correctness
TODAY with less of a performance hit AND does not
require guest operating systems to change. (App
changes and Xen changes are allowed.)
Previous attempts have run into insurmountable x86
architecture barriers (see the previous thread).
But it recently occurred to me to compare the
performance of a hypercall vs rdtsc emulation.
The results are promising, at least on 64-bit guests:
rdtsc native: 22ns
rdtsc emulated: 360ns
nearly-NULL hypercall (32b guest): 260ns
nearly-NULL hypercall (64b guest): 125ns
(Note these measurements are normal kernel-land
hypercalls.) Currently all hypercalls from userland
are illegal, but this need not be the case for ALL
hypercalls. Is it possible
for Xen to implement a "rdtsc hypercall" that
is executable from userland, without requiring
OS changes? Early discussions look promising.
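As a back-of-the-envelope check on the figures listed above, the extra time per second that each mechanism costs over native rdtsc can be worked out as below. This is a first-order estimate only: the function names are invented for illustration, the call rate is an assumed input, and it ignores any cost a real userland hypercall would add beyond the kernel-land numbers measured here.

```c
#include <stdio.h>

/* Fraction of one core's time spent on overhead beyond native rdtsc. */
double extra_core_fraction(double cost_ns, double native_ns, double calls_per_sec)
{
    return (cost_ns - native_ns) * calls_per_sec / 1e9;
}

/* Print the estimated overhead for each mechanism at a given call rate,
 * using the per-call costs quoted in this message. */
void print_overhead_table(double calls_per_sec)
{
    const double native_ns = 22.0;
    const struct { const char *name; double cost_ns; } rows[] = {
        { "rdtsc emulated",                    360.0 },
        { "nearly-NULL hypercall (32b guest)", 260.0 },
        { "nearly-NULL hypercall (64b guest)", 125.0 },
    };

    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        double f = extra_core_fraction(rows[i].cost_ns, native_ns, calls_per_sec);
        printf("%-36s %5.1f%% of a core\n", rows[i].name, f * 100.0);
    }
}
```

At an assumed 200,000 calls per core per second, this works out to about 6.8% for emulation, 4.8% for a 32-bit hypercall and 2.1% for a 64-bit hypercall. The 5-10% worst case quoted for emulation corresponds to roughly 150,000 to 300,000 calls per second.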
Certainly, it makes sense to implement a normal
kernel-callable rdtsc hypercall so that
vsyscall+pvclock can execute more quickly.
I'll be taking a look at that, but I'd be grateful
for assistance in architecting a userland hypercall
mechanism that will work for "hyper-rdtsc".
(While implementing a userland "hyper-rdtsc" is
highest priority, I'd also be interested in whether
the mechanism can be more generic... I'd like
to explore the use of tmem from apps, Ian
Pratt has suggested that userland hypercalls
might be interesting for blktap, and there are
probably other OS-independent ideas to explore
assuming security issues can be handled.)
Thanks,
Dan
- [Xen-devel] rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen),
Dan Magenheimer <=