Xen project Mailing List

Re: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

Date: Mon, 21 Sep 2009 15:50:29 -0700

Cc: kurt.hackel@xxxxxxxxxx, "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>

Delivery-date: Mon, 21 Sep 2009 15:51:02 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 09/21/09 15:20, Dan Magenheimer wrote:
>>> However, I do need one special case to indicate
>>> emulation vs non-emulation, so wraparound is
>>> still a problem.
>>>
>> I was assuming you'd just repurpose the existing version number scheme
>> which is always even, and therefore can never equal -1.
>>
> That wasn't my plan but if it can be made to work (see
> below), it probably saves code in Xen.
>
>> What's the full algorithm for detecting this feature?  Usermode has to
>> establish:
>>
>>    1. It is running under Xen (or not, if you expect this to be
>>       implemented on multiple hypervisors)
>>    2. rdtscp is available
>>    3. the ABI is actually being implemented, ie:
>>          1. the tsc_aux value actually has the correct meaning
>>          2. it has a working mechanism for getting the tsc scaling
>>             parameters
>>          3. (accommodate ways to evolve the ABI in a
>>             back-compatible way)
>>
>> before it can do anything else.
>>
> Yes, that's what I was thinking.  I was planning on prototyping
> these checks with "userland-rdmsr" but userland-hypercall or
> userland-shared-page could work also.
>
>> If nothing else, it's probably worth removing the rdtscp feature
>> from the logical guest cpuid, so that nothing else tries to use it
>> for its own purposes; in other words, you're exclusively claiming
>> rdtscp for this ABI.  Or you could disable this ABI if a guest
>> kernel tries to set TSC_AUX.
>>
> I was thinking that setting pvrdtscp=1 would override
> any kernel use of rdtscp/TSC_AUX, but disabling the
> cpuid has_rdtscp flag and using a different userland
> detection mechanism (than checking cpuid for has_rdtscp)
> would be a better way to avoid possible conflict.
>
>>> I've restricted the scheme to constant_tsc as I think
>>> it breaks down due to nasty races if running on a
>>> machine where the pvclock parameters differ across
>>> different pcpus.  I think the races can only be
>>> avoided if Xen sets the TSC_AUX for all of the
>>> pcpus running a pvrdtscp domain while all are idle.
>>>
>>> Is there a scheme that avoids the races?
>>>
>> rdtscp makes it quite easy to avoid races because you get the tsc and
>> metadata about the tsc atomically.  You just need to encode enough
>> info in the metadata to do the conversion.
>>
> Yes but I don't think there are enough bits for encoding
> it all (32-bits in TSC_AUX, right?).
>
>> The obvious thing to do is to pack a version number and pcpu number
>> into TSC_AUX.  Usermode would maintain an array of pv_clock
>> parameters, one for each pcpu.  If the version number matches, then
>> it uses the parameters it has; if not it fetches new parameters and
>> repeats the rdtscp.  There's no need to worry about either thread or
>> vcpu context switches because you get the (tsc,params) tuple
>> atomically, which is the tricky bit without rdtscp.
>>
>> (The version number would be truncated wrt the normal pvclock version
>> number, but it just needs to be large enough to avoid aliasing from
>> wrapping; I'm assuming something like 24 bits version and 8 bits cpu
>> number.)
>>
> I think a race occurs if the vcpu switches pcpu TWICE
> from pcpu-A to pcpu-B and back to pcpu-A and does rdtscp
> each time on pcpu-A but reads one or more pvclock parameters
> (that are too big to be encoded in TSC_AUX) on pcpu-B.
>

That shouldn't matter.  Once the process has (tsc,cpu,version) it can
use its own local copy of that cpu's pvclock parameters to compute the
tsc->ns conversion.  Once it has that triple, it doesn't matter if it
gets context-switched; the time computation doesn't depend on what CPU
is currently running.  It only needs to iterate if it gets a version
mismatch.

You can potentially get a livelock if the version is constantly
changing between the rdtscp and the get-pvclock-params, which is
exacerbated if the process keeps bouncing between cpus between the
two.  But given that the rdtsc+get-params should take no more than a
couple of microseconds, it seems very unlikely the process is
sustaining a megahertz CPU migration rate.  And even if it fails, the
process always has to be prepared to go to some other time source.

> If Xen can atomically bump/change
> TSC_AUX on *all* pcpus running a guest vcpu, the race
> can be avoided.  But I suspect that is too expensive (some
> kind of rendezvous required for each bump on any processor).
>

Right.  Any synchronized cross-cpu call is going to be very expensive,
and can't be done atomically without some kind of stop-the-world, which
is even worse.

> Even if my assumption of the race (above) is incorrect,
> 32-bits is not very much time at 100Hz.  But the version
> bump needs to occur synchronously with every P/C-state
> transition for pvclock to work on non_constant_tsc machines
> doesn't it?  How frequent can those transitions occur?
>

24 bits at 100Hz is 46ish hours.  So there's a potential alias problem
if the program reads the tsc at precisely 46.603 (ish) hours after its
previous read.  One workaround would be to force a re-read of the
timing parameters every X secs/mins/hours to guarantee that there's no
wrap for some expected rate of param updates.

That said, the standard pvclock algorithm is only 128 times better than
that, and I don't think it has ever been considered to be a problem.
I've never seen an update rate of more than once every few seconds.

Also, Xen need only update the version number if something has actually
read that version; if nobody has read the current parameters, there's
no need to update the version when updating them to a new value.  That
would help mitigate the case of rapid param updates and a low rate of
reading.

> I guess this all depends on what Xen is capable of
> guaranteeing.  If Xen can provide a "cacheline
> bounce guarantee", the app shouldn't have to care.
>

It can't, in principle, sync the tscs at a finer grain than the app can
measure.  It only has the same resources to play with, and there'll
always be some error margin.

> Linux now seems to provide a cacheline bounce guarantee for
> itself, but afaik has no way to communicate that to an app
> using raw rdtsc{,p} and all the relevant syscalls have a
> monotonicity option and/or have insufficient resolution
> to matter.
>

It's a detail that a usermode app can't rely on anyway.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
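
The retry loop and the wrap arithmetic discussed in the message can be sketched roughly as follows. This is an illustration only, not code from the thread or from Xen. It assumes the 24-bit version / 8-bit pcpu split of TSC_AUX that the message floats, and it borrows the usual pvclock field names for the scaling parameters. Every function and type name is hypothetical. Since Python cannot issue rdtscp, the instruction and the parameter fetch are passed in as callables.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple

# Assumed TSC_AUX layout: high 24 bits = truncated version, low 8 bits = pcpu.
VERSION_BITS = 24
CPU_BITS = 8
VERSION_MASK = (1 << VERSION_BITS) - 1
CPU_MASK = (1 << CPU_BITS) - 1


class PvclockParams(NamedTuple):
    """Per-pcpu scaling parameters (hypothetical usermode copy)."""
    version: int            # truncated to VERSION_BITS
    tsc_timestamp: int      # TSC value when system_time_ns was sampled
    system_time_ns: int     # system time at tsc_timestamp
    tsc_to_system_mul: int  # 32-bit fixed-point fraction
    tsc_shift: int          # pre-shift applied to the TSC delta


def pack_aux(version: int, cpu: int) -> int:
    """Pack (version, pcpu) into a 32-bit TSC_AUX value."""
    return ((version & VERSION_MASK) << CPU_BITS) | (cpu & CPU_MASK)


def unpack_aux(aux: int) -> Tuple[int, int]:
    """Split a TSC_AUX value back into (version, pcpu)."""
    return (aux >> CPU_BITS) & VERSION_MASK, aux & CPU_MASK


def tsc_to_ns(tsc: int, p: PvclockParams) -> int:
    """Scale a raw TSC reading to nanoseconds using one pcpu's parameters."""
    delta = tsc - p.tsc_timestamp
    delta = delta << p.tsc_shift if p.tsc_shift >= 0 else delta >> -p.tsc_shift
    return p.system_time_ns + ((delta * p.tsc_to_system_mul) >> 32)


def read_time_ns(
    rdtscp: Callable[[], Tuple[int, int]],
    fetch_params: Callable[[int], PvclockParams],
    cache: Dict[int, PvclockParams],
    max_retries: int = 8,
) -> Optional[int]:
    """Return time in ns, or None so the caller can use another time source."""
    for _ in range(max_retries):
        tsc, aux = rdtscp()  # (tsc, version, cpu) arrive atomically
        version, cpu = unpack_aux(aux)
        params = cache.get(cpu)
        if params is not None and params.version == version:
            # The triple is self-consistent; later migration does not matter.
            return tsc_to_ns(tsc, params)
        # Version mismatch (or first use): refresh and repeat the rdtscp.
        cache[cpu] = fetch_params(cpu)
    return None


def wrap_hours(version_bits: int, updates_per_sec: float = 100.0) -> float:
    """Hours until a version counter of the given width wraps."""
    return (1 << version_bits) / updates_per_sec / 3600.0
```

With these definitions, wrap_hours(24) comes out at roughly 46.6 hours, and wrap_hours(31) / wrap_hours(24) gives the factor of 128, taking the standard pvclock version as 31 effective bits because it is always even. The bounded retry count reflects the point that the process must always be prepared to fall back to some other time source.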
