
Re: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)



On 09/21/09 15:20, Dan Magenheimer wrote:
>>> However, I do need one special case to indicate
>>> emulation vs non-emulation, so wraparound is
>>> still a problem.
>>>       
>> I was assuming you'd just repurpose the existing version number scheme
>> which is always even, and therefore can never equal -1.
>>     
> That wasn't my plan but if it can be made to work (see
> below), it probably saves code in Xen.
>
>   
>> What's the full algorithm for detecting this feature?  Usermode has to
>> establish:
>>
>>    1. It is running under Xen (or not, if you expect this to be
>>       implemented on multiple hypervisors)
>>    2. rdtscp is available
>>    3. the ABI is actually being implemented, ie:
>>          1. the tsc_aux value actually has the correct meaning
>>          2. it has a working mechanism for getting the tsc scaling
>>             parameters
>>          3. (accommodate ways to evolve the ABI in a back-compatible
>>             way)
>> before it can do anything else.
>>     
> Yes, that's what I was thinking.  I was planning on prototyping
> these checks with "userland-rdmsr" but userland-hypercall or
> userland-shared-page could work also.
>
>   
>> If nothing else, it's probably worth removing the rdtscp feature from
>> the logical guest cpuid, so that nothing else tries to use it for its
>> own purposes; in other words, you're exclusively claiming rdtscp for
>> this ABI.  Or you could disable this ABI if a guest kernel tries to set
>> TSC_AUX.
>>     
> I was thinking that setting pvrdtscp=1 would override
> any kernel use of rdtscp/TSC_AUX, but disabling the
> cpuid has_rdtscp flag and using a different userland
> detection mechanism (than checking cpuid for has_rdtscp)
> would be a better way to avoid possible conflict.
>
>   
>>> I've restricted the scheme to constant_tsc as I think
>>> it breaks down due to nasty races if running on a
>>> machine where the pvclock parameters differ across
>>> different pcpus.  I think the races can only be
>>> avoided if Xen sets the TSC_AUX for all of the
>>> pcpus running a pvrdtscp domain while all are idle.
>>>
>>> Is there a scheme that avoids the races? 
>>>       
>> rdtscp makes it quite easy to avoid races because you get the tsc and
>> metadata about the tsc atomically.  You just need to encode enough info
>> in the metadata to do the conversion.
>>     
> Yes, but I don't think there are enough bits for encoding
> it all (32 bits in TSC_AUX, right?).
>
>   
>> The obvious thing to do is to pack a version number and pcpu number
>> into TSC_AUX.  Usermode would maintain an array of pv_clock parameters,
>> one for each pcpu.  If the version number matches, then it uses the
>> parameters it has; if not it fetches new parameters and repeats the
>> rdtscp.  There's no need to worry about either thread or vcpu context
>> switches because you get the (tsc,params) tuple atomically, which is
>> the tricky bit without rdtscp.
>>
>> (The version number would be truncated wrt the normal pvclock version
>> number, but it just needs to be large enough to avoid aliasing from
>> wrapping; I'm assuming something like 24 bits version and 8 bits cpu
>> number.)
>>     
> I think a race occurs if the vcpu switches pcpu TWICE
> from pcpu-A to pcpu-B and back to pcpu-A and does rdtscp
> each time on pcpu-A but reads one or more pvclock parameters
> (that are too big to be encoded in TSC_AUX) on pcpu-B.
>   

That shouldn't matter.  Once the process has (tsc,cpu,version) it can
use its own local copy of cpu's pvclock parameters to compute the
tsc->ns conversion.  Once it has that triple, it doesn't matter if it
gets context-switched; the time computation doesn't depend on what CPU
is currently running. 
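
To make that concrete, here is a minimal sketch of the userland side of
the conversion.  The TSC_AUX layout (low 8 bits pcpu number, high 24 bits
truncated version) is only the split suggested in this thread, not a real
ABI, and the names aux_pack, aux_cpu, aux_version, pv_params and
pv_tsc_to_ns are made up for illustration.  The parameter fields and the
scaling step are modeled on the usual pvclock formula.

```c
#include <stdint.h>

/* Assumed TSC_AUX layout (hypothetical, from the 24/8 split above):
 * low 8 bits = pcpu number, high 24 bits = truncated pvclock version. */
#define AUX_CPU_BITS 8u
#define AUX_CPU_MASK ((1u << AUX_CPU_BITS) - 1u)

/* Local per-pcpu copy of the scaling parameters; fields modeled on
 * Xen's vcpu_time_info. */
struct pv_params {
    uint32_t version;
    uint64_t tsc_timestamp;     /* tsc value when system_time was sampled */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point ns-per-tick factor */
    int8_t   tsc_shift;         /* pre-scale shift, may be negative */
};

static inline uint32_t aux_pack(uint32_t version, unsigned cpu)
{
    /* The shift drops the top 8 bits of version, leaving 24. */
    return (version << AUX_CPU_BITS) | (cpu & AUX_CPU_MASK);
}

static inline unsigned aux_cpu(uint32_t aux)
{
    return aux & AUX_CPU_MASK;
}

static inline uint32_t aux_version(uint32_t aux)
{
    return aux >> AUX_CPU_BITS;
}

/* Convert a tsc value to nanoseconds using only the local copy of the
 * parameters; nothing here depends on which cpu is running now. */
static uint64_t pv_tsc_to_ns(const struct pv_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift >= 0)
        delta <<= p->tsc_shift;
    else
        delta >>= -p->tsc_shift;

    /* (delta * mul) >> 32 without needing a 128-bit intermediate. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * p->tsc_to_system_mul
                    + ((lo * p->tsc_to_system_mul) >> 32);

    return p->system_time + scaled;
}
```

With tsc_to_system_mul = 0x80000000 (0.5 ns per tick) and tsc_shift = 0, a
delta of 2000 ticks comes out as 1000 ns past system_time.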

It only needs to iterate if it gets a version mismatch.  You can
potentially get a livelock if the version is constantly changing between
the rdtscp and the get-pvclock-params, and that gets worse if the process
keeps bouncing between cpus between the two.  But given that the
rdtscp+get-params should take no more than a couple of microseconds, it
seems very unlikely the process is sustaining a megahertz CPU migration
rate.

And even if it fails, the process always has to be prepared to go to
some other time source.
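
A rough sketch of that loop, with a bounded retry count so a livelock
turns into a fallback.  Everything here is hypothetical: the pv_source
hooks stand in for the real rdtscp instruction and for whatever
get-pvclock-params mechanism ends up existing (rdmsr, hypercall, shared
page), MAX_TRIES is arbitrary, tsc_shift is omitted for brevity, and the
fake_* functions only simulate an environment so the loop can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS  256   /* 8-bit cpu field in the assumed TSC_AUX layout */
#define MAX_TRIES 4     /* arbitrary bound before giving up */

struct tsc_sample { uint64_t tsc; uint32_t aux; };

struct pv_entry {       /* cached parameters for one pcpu */
    bool     valid;
    uint32_t version;   /* 24-bit truncated version */
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t mul;       /* 32.32 fixed-point ns per tick */
};

/* Hypothetical hooks for the two primitives discussed above. */
struct pv_source {
    struct tsc_sample (*rdtscp)(void *ctx);
    bool (*fetch)(void *ctx, unsigned cpu, struct pv_entry *out);
    void *ctx;
};

static uint64_t scale(uint64_t delta, uint32_t mul)
{
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Returns false if no consistent (tsc,cpu,version) match was found;
 * the caller must then use some other time source. */
static bool pv_read_ns(const struct pv_source *src,
                       struct pv_entry cache[MAX_CPUS], uint64_t *ns)
{
    for (int i = 0; i < MAX_TRIES; i++) {
        struct tsc_sample s = src->rdtscp(src->ctx);
        struct pv_entry *e = &cache[s.aux & 0xff];

        if (e->valid && e->version == (s.aux >> 8)) {
            *ns = e->system_time + scale(s.tsc - e->tsc_timestamp, e->mul);
            return true;
        }
        /* Stale or missing: refresh this cpu's slot, repeat the rdtscp. */
        e->valid = src->fetch(src->ctx, s.aux & 0xff, e);
    }
    return false;
}

/* Simulated environment, for illustration only. */
struct fake_env { uint32_t version; unsigned cpu; uint64_t tsc;
                  int fetches; bool churn; };

static struct tsc_sample fake_rdtscp(void *ctx)
{
    struct fake_env *env = ctx;
    if (env->churn)
        env->version++;         /* parameters change before every read */
    struct tsc_sample s = { env->tsc, (env->version << 8) | env->cpu };
    return s;
}

static bool fake_fetch(void *ctx, unsigned cpu, struct pv_entry *out)
{
    struct fake_env *env = ctx;
    (void)cpu;
    env->fetches++;
    out->version = env->version & 0xffffff;
    out->tsc_timestamp = 1000;
    out->system_time = 5000;
    out->mul = 0x80000000u;
    return true;
}
```

In the simulated environment the first read costs one fetch, later reads
with an unchanged version cost none, and a version that changes on every
rdtscp makes pv_read_ns give up after MAX_TRIES attempts.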

> If Xen can atomically bump/change
> TSC_AUX on *all* pcpus running a guest vcpu, the race
> can be avoided.  But I suspect that is too expensive (some
> kind of rendezvous required for each bump on any processor).
>   

Right.  Any synchronized cross-cpu call is going to be very expensive,
and can't be done atomically without some kind of stop-the-world which
is even worse.

> Even if my assumption of the race (above) is incorrect,
> 32-bits is not very much time at 100Hz.  But the version
> bump needs to occur synchronously with every P/C-state
> transition for pvclock to work on non_constant_tsc machines
> doesn't it?  How frequent can those transitions occur?
>   

24 bits at 100Hz is 46ish hours.  So there's a potential alias problem
if the program reads the tsc at precisely 46.603 (ish) hours after its
previous read.  One workaround would be to force a re-read of the timing
parameters every X secs/mins/hours to guarantee that there's no wrap for
some expected rate of param updates.

That said, the standard pvclock algorithm is only 128 times better than
that, and I don't think it has ever been considered a problem.  I've
never seen an update rate of more than once every few seconds.
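
For reference, the arithmetic behind those numbers, plus a sketch of the
forced re-read workaround.  The helper names wrap_seconds and
params_expired are hypothetical.  The factor of 128 assumes the standard
pvclock version has 31 usable bits (a 32-bit counter that is always even),
compared with the 24 bits assumed for TSC_AUX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Seconds until a counter of `bits` bits (bits < 64) wraps when it is
 * bumped `hz` times per second. */
static uint64_t wrap_seconds(unsigned bits, uint64_t hz)
{
    return (UINT64_C(1) << bits) / hz;
}

/* Forced re-read: treat cached parameters as stale once they are older
 * than max_age_s, chosen well below the wrap time for the worst update
 * rate you expect. */
static bool params_expired(uint64_t now_s, uint64_t fetched_s,
                           uint64_t max_age_s)
{
    return now_s - fetched_s >= max_age_s;
}
```

wrap_seconds(24, 100) is 167772 seconds, i.e. about 46.6 hours, and the
ratio between a 31-bit and a 24-bit version space is 2^7 = 128.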

Also Xen need only update the version number if something has actually
read that version; if nobody had read the current parameters, there's no
need to update the version when updating them to a new value.  That
would help mitigate the case of rapid param updates and a low rate of
reading.
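
A sketch of that lazy bump, seen from the hypervisor side.  This is not
Xen code: the struct and function names are invented, the single mul field
stands in for the whole parameter set, and it assumes every parameter
fetch goes through a path Xen can observe (which would not hold for a
plain shared page).  Locking and memory ordering are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define VERSION_MASK 0xffffffu  /* 24-bit version, as assumed for TSC_AUX */

struct pcpu_clock {
    uint32_t version;   /* value published in TSC_AUX */
    bool     observed;  /* has anyone fetched the current parameters? */
    uint32_t mul;       /* stand-in for the full parameter set */
};

/* Reader path: hand out the parameters and remember that this version
 * is now in somebody's cache. */
static uint32_t clock_fetch(struct pcpu_clock *c, uint32_t *mul_out)
{
    c->observed = true;
    *mul_out = c->mul;
    return c->version;
}

/* Update path: only consume a version number if the old parameters
 * could be cached somewhere. */
static void clock_update(struct pcpu_clock *c, uint32_t new_mul)
{
    if (c->observed) {
        c->version = (c->version + 1) & VERSION_MASK;
        c->observed = false;
    }
    c->mul = new_mul;
}
```

Back-to-back updates with no reader in between then share one version
number, so the 24-bit space is consumed at the read rate rather than the
update rate.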

> I guess this all depends on what Xen is capable of
> guaranteeing.  If Xen can provide a "cacheline
> bounce guarantee", the app shouldn't have to care.
>   

It can't, in principle, sync the tscs at a finer grain than the app can
measure.  It only has the same resources to play with, and there'll
always be some error margin.

> Linux now seems to provide a cacheline bounce guarantee for
> itself, but afaik has no way to communicate that to an app
> using raw rdtsc{,p} and all the relevant syscalls have a
> monotonicity option and/or have insufficient resolution
> to matter.
>   

It's a detail that a usermode app can't rely on anyway.

    J

