Xen project Mailing List

[Xen-devel] RE: rdtsc: correctness vs performance on Xen (and KVM?)

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

Date: Tue, 1 Sep 2009 06:54:21 -0700 (PDT)

Cc: "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 01 Sep 2009 06:55:30 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Jeremy --

Thanks for the feedback!

> Making vsyscall work...

While I highly respect your opinion, and while vsyscall may be
a fine choice in the future, it just doesn't solve the problem
today and won't solve it ever for currently shipping PV OS's.
If you can figure out a way to allow vsyscall to be installed
as a module and still achieve its performance, it might be
a possible solution, but otherwise we have to go around the
OS to solve this problem. The rdtsc instruction will be fully
emulated by default in Xen 4.0, and before that release I need
to find a fast alternative for those apps that are dependent on
BOTH its correct functionality AND high performance.

> > work both on Xen and bare metal, and works properly
> > across: vcpu-to-pcpu rescheduling even on NUMA
> > machines; system sleep/hibernation; and
> > save/restore/migration between machines with
> > dissimilar clock rates.
>
> But it will only do this when running under Xen. If running on bare
> metal, there will be nothing providing the correction info to the app,
> and it will be no better than using raw rdtsc with all its
> limitations.
> In practice this means that the app will have to have some other code
> path anyway.

Yes, that's true. I'm not trying to legislate whether an app
can use rdtsc or not on a physical machine, just trying to
provide the same guarantees for a rdtsc executed in a virtual
environment as already provided for a physical environment,
but without significant performance cost.

> > 3) App executes a special rdmsr instruction or
> > hypercall.
>
> No way to do direct hypercalls from usermode, so it would
> need to be an illegal instruction (like cpuid).
> ...and I don't think we should
> start making fake rdmsrs start working in usermode.

I'm told (by Keir) that it might be possible to allow certain
hypercalls to be executed from userland. I haven't investigated yet.
But a "fake rdmsr" might be a better answer anyway; enlightened
Windows and Hyper-V already use a fake rdmsr, correct? But I'm
not keen on it either and am open to alternatives.

> But really it should be a system-wide kernel setting, set via
> sysctl or something.

I'm not sure what you are suggesting here.

> > 4a) If SIGILL results, not running on Xen at all,
> > or on old Xen; app uses rdtsc at own risk. Done.
> > 4b) Else, rdmsr/hypercall returns virtual address of
> > special pvclock page ("pvclock_va").
>
> This can't be done without changing the kernel; Xen can't just start
> sticking stuff into usermode mappings (how does Xen even know where a
> given OS's usermode is?).

It doesn't have to be a usermode mapping, it just needs to be
a "magic" address; it can (for example) be in the virtual
address space Xen has reserved for itself.

> > 5) App executes another special rdmsr instruction/
> > hypercall to disable rdtsc emulation. This
> > affects ALL execution for all processes in this VM.
>
> Once enabled, it should just stay enabled. System-wide is very coarse
> anyway (since there's no guarantee that all apps will use the
> mechanism).

Yes, this is an ugly potential issue. Fortunately, many enterprise
class apps essentially are the machine; and this may be even
more true in a virtualized world. Again, I'm not keen on this
either but I don't see an alternative.

> > 6) Xen maintains mapping of pvclock_va to a
> > different physical page for each processor
> > and transparently handles TLB misses for
> > pvclock_va
>
> If you mean that a given VA has a per-cpu mapping, it requires percpu
> pagetables. That's not possible in Linux with PV pagetables (since two
> tasks/threads on different cpus sharing the same mm will use the same
> pagetable).

What the OS can do is completely irrelevant. The mapping is
handled entirely by Xen so the OS will never even see a page
fault for this address. Note also that one-page-per-cpu is not needed.
The page is read-only and there is no sensitive information in a
pvclock data structure, so many per-cpu pvclock structs could be
on the same page.

> In general even Linux's specialised APIs are entirely unused
> (sendfile, vmsplice, etc). Something as esoteric as this will be
> pretty much unused.

If apps are happy with the performance of emulated rdtsc,
there's no reason for them to use it, so I would be happy
if this pvtsc ABI never gets used. However, most enterprise
apps are sensitive to a performance hit of several percent
and will be eager to try alternatives.

> This can be entirely done within the vsyscall mechanism
> without any app changes. There's no reason not to.

Performance with app portability is the reason.

> > P.S. While it would be nice if we could just tell
> > apps to use a fast vgettimeofday equivalent, this
> > does not exist today and, even if it did, would not
> > be widely available for years in the kernel running under
> > most enterprise app deployments (and, even then,
> > only on 64-bit Linux.)
>
> These rationales are very unconvincing:
>
> Making vsyscall work on 32bit is just a matter of doing it; apparently
> nobody has put the effort into it, but there's no fundamental reason
> why it wouldn't work. Besides, who runs enterprise apps on 32-bit
> these days? Anything requiring even moderate amounts of memory is
> better run on 64-bit.

Many people run enterprise apps on 32-bit these days, and I'm
not planning on forcing them to switch. But 32-bit vs 64-bit
is a small parenthetical objection, not particularly relevant
to the main issue.

> Your mechanism will require kernel changes anyway, so there's
> no getting around that.

I think that's exactly what the proposal does: gets around
requiring kernel changes. If kernel changes are required
(other than bolting on a kernel loadable module), pvtsc is
also not an acceptable solution.

> Once vsyscall does Xen/KVM properly, then every app will automatically
> do the right thing without modification. There's no need for
> specialized APIs that nobody will end up using anyway.

I fully agree that vsyscall is the right long-term answer, but
telling the app providers to switch to something that is
non-existent in 100% of their deployments today, has not yet
been implemented sufficiently to be measured, and probably
won't exceed 50% of their deployments within five years...
well, I don't expect them to be convinced.

> It only makes sense to go to this kind of effort if it ends up
> making a plain "rdtsc" have the properties you want it to have.

Intel and AMD are responsible for making a plain rdtsc have
the properties you want it to have in a physical environment
and apparently they've done a good enough job that apps are
using it today (albeit with an added layer of glue to handle
certain SMP systems). Emulating rdtsc provides the same
properties in a virtual environment but at a significant
performance cost. pvtsc is only intended to retrieve some
of that performance.

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
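
Illustrative sketch (not part of the original message): steps 4b through 6
quoted in the message leave the app with a read-only per-cpu pvclock record
that it would combine with a raw rdtsc reading. Assuming that record
resembles Xen's vcpu_time_info (a version counter, a TSC timestamp, a system
time in nanoseconds, a 32.32 fixed-point multiplier, and a signed shift),
the conversion might be modeled as follows. The names PvclockSnapshot,
tsc_to_ns, read_clock, read_snapshot, and read_tsc are hypothetical, and a
real app would presumably do this in C against the mapped page rather than
in Python.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


@dataclass(frozen=True)
class PvclockSnapshot:
    """Hypothetical copy of one per-cpu record, modeled on vcpu_time_info."""
    version: int            # assumed odd while the hypervisor is mid-update
    tsc_timestamp: int      # TSC value at which system_time was sampled
    system_time: int        # nanoseconds at tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point nanoseconds per shifted tick
    tsc_shift: int          # signed shift applied to the TSC delta first


def tsc_to_ns(snap, tsc):
    """Scale a raw TSC reading to nanoseconds using one snapshot."""
    delta = (tsc - snap.tsc_timestamp) & MASK64
    if snap.tsc_shift >= 0:
        delta = (delta << snap.tsc_shift) & MASK64
    else:
        delta >>= -snap.tsc_shift
    return snap.system_time + ((delta * snap.tsc_to_system_mul) >> 32)


def read_clock(read_snapshot, read_tsc):
    """Retry until the record is stable (even version, unchanged across the read)."""
    while True:
        before = read_snapshot()
        if before.version & 1:
            continue  # update in progress, try again
        tsc = read_tsc()
        after = read_snapshot()
        if after.version == before.version:
            return tsc_to_ns(before, tsc)
```

For example, with a 2 GHz TSC the multiplier would be 1 << 31 (0.5 ns per
tick) with a shift of 0, so a reading 2000 ticks past tsc_timestamp adds
1000 ns to system_time. The retry loop is what would let the record be
refreshed across vcpu-to-pcpu rescheduling or migration without the app
seeing a torn update.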
