
[Xen-devel] RE: rdtsc: correctness vs performance on Xen (and KVM?)



Hi Jeremy --

Thanks for the feedback!

> Making vsyscall work...

While I highly respect your opinion, and while vsyscall
may be a fine choice in the future, it doesn't solve
the problem today and will never solve it for
currently shipping PV OSes.  If you can figure out a
way to install vsyscall as a loadable module and still
achieve its performance, it might be a possible
solution; otherwise we have to go around the OS to
solve this problem.

The rdtsc instruction will be fully emulated by default
in Xen 4.0, and before that is released I need to find
a fast alternative for those apps that depend on BOTH
its correct functionality AND its high performance.

> > work both on Xen and bare metal, and works properly
> > across: vcpu-to-pcpu rescheduling even on NUMA
> > machines; system sleep/hibernation; and 
> > save/restore/migration between machines with
> > dissimilar clock rates. 
> 
> But it will only do this when running under Xen.  If running on bare
> metal, there will be nothing providing the correction info to the app,
> and it will be no better than using raw rdtsc with all its limitations.
> In practice this means that the app will have to have some other code
> path anyway.

Yes, that's true.  I'm not trying to legislate whether
an app can use rdtsc or not on a physical machine, just
trying to provide the same guarantees for an rdtsc executed
in a virtual environment as are already provided in a
physical environment, but without significant performance
cost.
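On Linux, the "other code path" Jeremy mentions for bare metal could be as simple as asking the kernel for a monotonic clock. The sketch below assumes only POSIX clock_gettime() with CLOCK_MONOTONIC; on 64-bit Linux this call may be served from the vsyscall/vDSO page without entering the kernel, which is exactly the mechanism being debated in this thread. The helper name monotonic_ns() is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds,
 * or 0 if the clock cannot be read.
 */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

An app would call monotonic_ns() twice and subtract, in place of subtracting two raw rdtsc readings; the result is already in nanoseconds, so no TSC frequency calibration is needed on this path.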

> > 3)  App executes a special rdmsr instruction or
> >     hypercall.
> 
> No way to do direct hypercalls from usermode, so it would 
> need to be an illegal instruction (like cpuid).
> ...and I don't think we should
> start making fake rdmsrs start working in usermode.

I'm told (by Keir) that it might be possible to allow certain
hypercalls to be executed from userland.  I haven't
investigated yet.  But a "fake rdmsr" might be a better
answer anyway; enlightened Windows and Hyper-V already use
a fake rdmsr, correct?

But I'm not keen on it either and am open to alternatives.

> But really it should be a system-wide kernel setting, set via sysctl or
> something.

I'm not sure what you are suggesting here.

> > 4a) If SIGILL results, not running on Xen at all,
> >     or on old Xen; app uses rdtsc at own risk. Done.
> > 4b) Else, rdmsr/hypercall returns virtual address of
> >     special pvclock page ("pvclock_va").
> >   
> This can't be done without changing the kernel; Xen can't just start
> sticking stuff into usermode mappings (how does Xen even know where a
> given OS's usermode is?).

It doesn't have to be a usermode mapping, it just needs
to be a "magic" address; it can (for example) be in the
virtual address space Xen has reserved for itself.
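Step 4a depends on the app surviving a trap when it probes for the new interface. A minimal sketch of such a probe, assuming POSIX sigaction() and sigsetjmp()/siglongjmp(); probe_instruction() and the two demo stand-ins are hypothetical names, and the stand-in that calls raise(SIGILL) only simulates an unsupported instruction rather than executing a real rdmsr or hypercall.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Jump back into probe_instruction() instead of letting the signal kill us. */
static void probe_trap_handler(int sig)
{
    siglongjmp(probe_env, sig);
}

/*
 * Hypothetical probe for step 4a: run `attempt` (which would execute the
 * special instruction) and report whether it completed without trapping.
 * SIGSEGV is caught as well as SIGILL because on Linux a privileged
 * instruction such as rdmsr executed in user mode normally raises #GP,
 * which is delivered as SIGSEGV rather than SIGILL.
 */
static bool probe_instruction(void (*attempt)(void))
{
    struct sigaction sa, old_ill, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_trap_handler;
    sigemptyset(&sa.sa_mask);

    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    bool ok;
    /* savemask = 1 so siglongjmp() also restores the blocked-signal mask. */
    if (sigsetjmp(probe_env, 1) == 0) {
        attempt();
        ok = true;      /* no trap: the interface appears to exist */
    } else {
        ok = false;     /* trapped: old Xen, or not Xen at all */
    }

    /* Put back whatever handlers the app had before the probe. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return ok;
}

/* Demo stand-ins only; a real probe would execute the special instruction. */
static void demo_supported(void) { }
static void demo_unsupported(void) { raise(SIGILL); }
```

Restoring the previous handlers keeps the probe from interfering with any signal handling the app already has. One caveat of this design: a genuine bug inside attempt() that faults would also be reported as "not supported".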

> > 5)  App executes another special rdmsr instruction/
> >     hypercall to disable rdtsc emulation.  This
> >     affects ALL execution for all processes in this VM.
> 
> Once enabled, it should just stay enabled.  System-wide is very coarse
> anyway (since there's no guarantee that all apps will use the mechanism).

Yes, this is an ugly potential issue.  Fortunately, many
enterprise-class apps essentially are the machine, and this
may be even more true in a virtualized world.

Again, I'm not keen on this either but I don't see an
alternative.

> > 6)  Xen maintains mapping of pvclock_va to a
> >     different physical page for each processor
> >     and transparently handles TLB misses for
> >     pvclock_va
> 
> If you mean that a given VA has a per-cpu mapping, it requires percpu
> pagetables.  That's not possible in Linux with PV pagetables (since two
> tasks/threads on different cpus sharing the same mm will use the same
> pagetable).

What the OS can do is completely irrelevant.  The mapping
is handled entirely by Xen so the OS will never even
see a page fault for this address.

Note also that one-page-per-cpu is not needed.  The page
is read-only and there is no sensitive information in
a pvclock data structure, so many per-cpu pvclock structs
could be on the same page.
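To make the size argument concrete: the per-vcpu time record Xen already gives PV guests is 32 bytes, so one 4 KiB page could hold 128 of them. The sketch below assumes the layout of Xen's public struct vcpu_time_info (mirrored in Linux as pvclock_vcpu_time_info) under the hypothetical name pvclock_sketch; verify it against the headers before relying on it. It shows how an app might turn a TSC value into nanoseconds, using the version field as a seqlock (odd means an update is in progress). The TSC value is passed in to keep the sketch portable; a real reader would execute rdtsc inside the retry loop, on the CPU whose record it is reading.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed to mirror Xen's vcpu_time_info layout; name is hypothetical. */
struct pvclock_sketch {
    uint32_t version;            /* odd while the hypervisor is updating */
    uint32_t pad0;
    uint64_t tsc_timestamp;      /* TSC value when system_time was sampled */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point TSC-to-ns multiplier */
    int8_t   tsc_shift;          /* shift applied to the TSC delta first */
    uint8_t  flags;
    uint8_t  pad[2];
};

_Static_assert(sizeof(struct pvclock_sketch) == 32, "expected 32-byte record");

/* How many per-cpu records fit on one 4 KiB read-only page (128). */
enum { PVCLOCK_RECORDS_PER_PAGE = 4096 / sizeof(struct pvclock_sketch) };

/* ((delta shifted by `shift`) * mul) >> 32, without a 128-bit type. */
static uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t lo = (uint32_t)delta;   /* low 32 bits of the delta */
    uint64_t hi = delta >> 32;       /* high 32 bits of the delta */
    return ((lo * mul) >> 32) + hi * mul;
}

/* Seqlock-style read: retry while an update is in progress or raced us. */
static uint64_t pvclock_read_ns(const volatile struct pvclock_sketch *p,
                                uint64_t tsc)
{
    uint32_t v1, v2;
    uint64_t ns;
    do {
        v1 = p->version;
        atomic_thread_fence(memory_order_acquire);
        ns = p->system_time +
             pvclock_scale_delta(tsc - p->tsc_timestamp,
                                 p->tsc_to_system_mul, p->tsc_shift);
        atomic_thread_fence(memory_order_acquire);
        v2 = p->version;
    } while ((v1 & 1) || v1 != v2);
    return ns;
}
```

Nothing in the record is secret (it is a snapshot of time plus a scale factor), which is why sharing one read-only page among all vcpus is plausible.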

> In general even Linux's specialised APIs are entirely unused (sendfile,
> vmsplice, etc).  Something as esoteric as this will be pretty much unused.

If apps are happy with the performance of emulated
rdtsc, there's no reason for them to use it, so I would
be happy if this pvtsc ABI never gets used.  However,
most enterprise apps are sensitive to a performance hit
of several percent and will be eager to try alternatives.

> This can be entirely done within the vsyscall mechanism without any app
> changes.  There's no reason not to.

Performance with app portability is the reason.
 
> > P.S. While it would be nice if we could just tell
> > apps to use a fast vgettimeofday equivalent, this
> > does not exist today and, even if it did, would not
> > be widely available for years in the kernel running under
> > most enterprise app deployments (and, even then,
> > only on 64-bit Linux.)
> 
> These rationales are very unconvincing:
> 
> Making vsyscall work on 32bit is just a matter of doing it; apparently
> nobody has put the effort into it, but there's no fundamental reason why
> it wouldn't work.  Besides, who runs enterprise apps on 32-bit these
> days?  Anything requiring even moderate amounts of memory is better run
> on 64-bit.

Many people run enterprise apps on 32-bit these days, and
I'm not planning on forcing them to switch.  But 32-bit
vs 64-bit is a small parenthetical objection, not
particularly relevant to the main issue.
 
> Your mechanism will require kernel changes anyway, so there's no getting
> around that.

I think that's exactly what the proposal does: it gets
around requiring kernel changes.  If kernel changes are
required (other than bolting on a loadable kernel module),
then pvtsc is also not an acceptable solution.

> Once vsyscall does Xen/KVM properly, then every app will automatically
> do the right thing without modification.  There's no need for
> specialized APIs that nobody will end up using anyway.

I fully agree that vsyscall is the right long-term answer,
but telling the app providers to switch to something that
is non-existent in 100% of their deployments today, has not
yet been implemented sufficiently to be measured, and
probably won't exceed 50% of their deployments within
five years... well, I don't expect them to be convinced.

> It only makes sense to go to this kind of effort if it ends up making a
> plain "rdtsc" have the properties you want it to have.

Intel and AMD are responsible for making a plain rdtsc have
the properties you want it to have in a physical environment
and apparently they've done a good enough job that apps are
using it today (albeit with an added layer of glue to handle
certain SMP systems).
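The "added layer of glue" is not spelled out here. One common form, shown purely as an illustrative assumption, clamps each reading against a shared last-returned value so time never appears to run backwards when successive reads land on CPUs whose TSCs are slightly out of sync. A minimal C11 sketch; monotonic_clamp() and last_tsc are hypothetical names, and the raw TSC value is passed in rather than read with rdtsc.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest value handed out so far, shared by all threads (hypothetical). */
static _Atomic uint64_t last_tsc;

/*
 * Return max(raw, largest value previously returned), publishing a new
 * maximum atomically so concurrent callers never observe a smaller value
 * than one already handed out.
 */
static uint64_t monotonic_clamp(uint64_t raw)
{
    uint64_t prev = atomic_load(&last_tsc);
    while (raw > prev) {
        if (atomic_compare_exchange_weak(&last_tsc, &prev, raw))
            return raw;
        /* On failure, prev was reloaded with the current maximum; retry. */
    }
    return prev;
}
```

The price of this design is a shared cache line that every advancing read writes to, so it trades some of rdtsc's speed for monotonicity.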

Emulating rdtsc provides the same properties in a virtual
environment but at a significant performance cost.

pvtsc is only intended to retrieve some of that performance.

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

