
Re: [Xen-devel] BUG: NOW() seems to (sometimes) go backwards!



On 06/08/2016 02:43 PM, Dario Faggioli wrote:
> On Wed, 2016-06-08 at 04:42 -0600, Jan Beulich wrote:
>>>>> On 07.06.16 at 17:54, <dario.faggioli@xxxxxxxxxx> wrote:
>>> So, it looks to me that the TSC is actually ok, and it could be the
>>> local_tsc_stamp and scale_delta() magic done in get_s_time_fixed()
>>> (which, AFAICUI, is there to convert cycles to time) that does not
>>> guarantee the result to be monotonic, even if the input is...
>>> Thoughts?
>> Indeed this smells like an issue in the scaling. However, the scaling
>> values vary only when !CONSTANT_TSC. Can you check that this
>> flag is actually set on that system? 
>>
> Checked. I do have it. I instrumented the code to print stuff if it
> finds it, and it prints.
> 
> Also:
> root@Zhaman:~# xl debug-keys s
> (XEN) [  406.719464] TSC marked as reliable, warp = 0 (count=3)
> (XEN) [  406.719467] dom1(hvm): mode=0,ofs=0xffffd9279716c276,khz=2394069,inc=4,vtsc count: 195367 kernel, 0 user
> 
>> (I hope you're not running a
>> strange Dom0 setup with FREQCTL_dom0_kernel in effect.) 
>>
> I've no idea what this is. I've been running 4.1.0, built myself, and
> stock Debian unstable 4.5.0, and I'm seeing the issue in both cases.
> 
> Looking FREQCTL_dom0_kernel up, I guess you mean what happens when one
> passes cpufreq="dom0-kernel" to xen on boot command line. In which
> case, no, I'm not doing that.
> 
>> And
>> at the same time that it's time_calibration_tsc_rendezvous() that
>> is in use?
>>
> The code you're referring to should be this:
> 
>     /* If we have constant-rate TSCs then scale factor can be shared. */
>     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>     {
>         /* If TSCs are not marked as 'reliable', re-sync during rendezvous. */
>         if ( !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
>             time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
>     }
> 
> And I have both X86_FEATURE_CONSTANT_TSC and X86_FEATURE_TSC_RELIABLE.
> 
> I've again instrumented the code in order to check whether it is
> time_calibration_tsc_rendezvous() or time_calibration_std_rendezvous()
> that is being used, and it's the latter:
> 
> (XEN) [    1.795916] TSC reliable. Yay!! Using ffff82d080198362 for rendevousez
> 
> [dario@Solace xen.git] $ objdump -D xen/xen-syms |grep ffff82d080198362
> ffff82d080198362 <time_calibration_std_rendezvous>:
> 
> which looks correct to me.
Probably another codepath you would be interested in is local_time_calibration(): try commenting out the chunk under this conditional:

if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
{
    ...
}

With that you will rarely see time going backwards, since the scaling gets adjusted for each CPU's calibration error - but the result is only a (very, very close) software approximation, not something guaranteed the way the hardware TSC is. Trying this out would be just for experimentation - it doesn't look like a correct way of addressing this.
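
To illustrate what "adjusted to each CPU's calibration error" means here - this is only a conceptual sketch, not the Xen code, and the function names and numbers below are invented - the idea is to re-derive the per-CPU scale factor from how many nanoseconds the master/platform clock actually saw elapse over the cycles counted locally in the last calibration epoch:

    /* Conceptual sketch only (not Xen code): recompute a per-CPU
     * (ns << 32)-per-cycle scale factor from one calibration epoch. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t recalibrate_scale(uint64_t tsc_cycles, uint64_t master_ns)
    {
        /* Choose scale so that (cycles * scale) >> 32 == master_ns,
         * i.e. local extrapolation tracks the master clock rather than
         * a fixed boot-time frequency. */
        return (uint64_t)((((__uint128_t)master_ns) << 32) / tsc_cycles);
    }

    int main(void)
    {
        /* Nominal 2.4 GHz: 2400000 cycles should correspond to 1000000 ns, */
        uint64_t nominal   = recalibrate_scale(2400000, 1000000);
        /* but the master clock saw 1000020 ns elapse, so the next epoch's
         * scale absorbs that CPU's 20 ns calibration error. */
        uint64_t corrected = recalibrate_scale(2400000, 1000020);

        printf("nominal=%llu corrected=%llu\n",
               (unsigned long long)nominal, (unsigned long long)corrected);
        return 0;
    }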

> 
>> Yet when the scaling values get set only once at boot, monotonic
>> (cross-CPU) TSC means monotonic (cross-CPU) returns from NOW().
>>
> Yep. And at this point, this is what needs to be verified, I guess...
I think get_s_time_fixed() doesn't guarantee monotonicity across CPUs, be they on different sockets or (SMT) siblings. local_tsc_stamp is seeded differently on each CPU, i.e. with an rdtsc() right after the platform time read in the calibration rendezvous. Plus, stime_local_stamp is seeded with values taken from the platform timer (HPET, ACPI, PIT) in local_time_calibration(), which means that get_s_time() isn't solely based on the TSC and that there will always be a gap between stime_local_stamp and local_tsc_stamp.
The TSC is indeed monotonic on a TSC-invariant box, but the delta that is computed differs from CPU to CPU according to when time calibration happens on each CPU - thus the desired monotonicity property isn't guaranteed. Having stime_local_stamp be based on the same timestamp as local_tsc_stamp, plus having a single local_tsc_stamp as reference, would address this behaviour - see also below.
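
Just to make that concrete, here is a small standalone sketch (not Xen code; the struct, scale factor and stamp values are made up for illustration) of the get_s_time_fixed()-style computation, now = stime_local_stamp + scale(tsc - local_tsc_stamp), showing how two CPUs whose stamps were seeded a few cycles apart can return non-monotonic values from one and the same monotonic TSC:

    /* Standalone illustration, not Xen code: anchor values and the scale
     * factor are invented; only the shape of the computation mirrors the
     * per-CPU cycles-to-time conversion discussed above. */
    #include <stdint.h>
    #include <stdio.h>

    struct cpu_time {
        uint64_t local_tsc_stamp;   /* TSC read at calibration            */
        uint64_t stime_local_stamp; /* system time recorded at that read  */
        uint64_t ns_per_cyc_frac;   /* scale factor: (ns << 32) per cycle */
    };

    static uint64_t cycles_to_ns(uint64_t delta, uint64_t ns_per_cyc_frac)
    {
        /* Simplified stand-in for the 64x64->128 scaling. */
        return (uint64_t)(((__uint128_t)delta * ns_per_cyc_frac) >> 32);
    }

    static uint64_t now_on(const struct cpu_time *t, uint64_t tsc)
    {
        return t->stime_local_stamp +
               cycles_to_ns(tsc - t->local_tsc_stamp, t->ns_per_cyc_frac);
    }

    int main(void)
    {
        /* ~2.4 GHz => ~0.4175 ns per cycle, expressed as ns << 32. */
        uint64_t scale = (uint64_t)(0.4175 * 4294967296.0);

        /* CPU1 calibrated 500 cycles (~209 ns) after CPU0, but its system
         * time stamp, taken from the platform timer at a slightly
         * different moment, only moved on by 100 ns. */
        struct cpu_time cpu0 = { 1000000000ULL, 417500000ULL, scale };
        struct cpu_time cpu1 = { 1000000500ULL, 417500100ULL, scale };

        /* One global, monotonic TSC: the second read is 10 cycles later. */
        printf("cpu0: %llu\n", (unsigned long long)now_on(&cpu0, 1000100000ULL));
        printf("cpu1: %llu\n", (unsigned long long)now_on(&cpu1, 1000100010ULL));
        /* cpu1 prints the smaller value: time appears to go backwards. */
        return 0;
    }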

> 
>> In any event - could you try to exclude C- and P-state effects? Of
>> course that makes sense only if you can reasonably repro the
>> problem situation (and hence can tell from its absence over a certain
>> period of time that whatever change was done did have some
>> positive effect).
>>
> It's actually quite hard *NOT* to reproduce the problem... it happens
> all the time, and if the load is high enough, I see the "Time went
> backwards?" printk several times per second!
> 
> So, trying to do what you suggest in an online fashion, i.e., issuing:
> 
>  # xenpm set-max-cstate 0
>  # xenpm set-scaling-governor all performance
> 
> did not change the situation (I still see the printks).
> 
> I've then tried passing both cpufreq="none" and max_cstate=0 to xen at
> boot, but they made no difference at all either.
> 
> Most of the time, we're speaking of very small skews, e.g.:
> 
> (XEN) [   59.999959] WARNING: __update_runq_load: Time went backwards? now 59999946079 llu 59999946085
> (XEN) [  117.595508] WARNING: __update_runq_load: Time went backwards? now 117595495295 llu 117595495310
> 
> i.e., 6 nanoseconds or 15 nanoseconds!
> 
> Then there are instances where the difference is bigger (microseconds
> time scale, like in the first email of the thread).
It does sound similar to the cross-CPU calibration error I was observing, just in a place other than the guests' PV time info updates. My pvclock TSC stable bit series could address your issue - but only with the clocksource=tsc added there, which would be quite limiting for the other cases, i.e. clocksource=hpet|acpi|etc.

One patch in the RFC of that series could fix your issue, but it was more of a hacky patch for discussion [0]. It takes a value returned from the platform timer (which monotonically increases) and scales it to the equivalent in TSC, assuming the TSC is always ahead of the other clocksources. But perhaps the correct fix would be to do what Linux does with pvclock? Linux keeps an accumulator representing the latest system time across CPUs and atomically updates it if now > last, thus always returning the most up-to-date value. But then NOW() would take a hit in performance.
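
For what it's worth, here is a minimal standalone sketch of that "atomically keep the latest value" idea (C11 atomics; the names and the fake per-CPU clock are made up, and this is just the clamp, not Linux's actual pvclock code). The compare-and-swap loop on every read is where the NOW() performance hit would come from:

    /* Standalone sketch of a monotonic clamp over a shared "last time
     * returned" accumulator; raw_local_now_ns() fakes a per-CPU clock
     * whose third sample dips backwards. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Atomic uint64_t last_ns;

    static uint64_t raw_local_now_ns(void)
    {
        static const uint64_t sample[] = { 1000, 1015, 1009, 1020 };
        static unsigned int i;
        return sample[i++ % 4];
    }

    static uint64_t monotonic_now_ns(void)
    {
        uint64_t now  = raw_local_now_ns();
        uint64_t last = atomic_load_explicit(&last_ns, memory_order_relaxed);

        for ( ; ; )
        {
            if ( now <= last )
                return last;   /* someone already handed out a later time */
            /* Publish our newer value; on failure 'last' is reloaded and
             * we retry (or give up if it is now ahead of us). */
            if ( atomic_compare_exchange_weak_explicit(&last_ns, &last, now,
                                                       memory_order_relaxed,
                                                       memory_order_relaxed) )
                return now;
        }
    }

    int main(void)
    {
        for ( int i = 0; i < 4; i++ )
            printf("%llu\n", (unsigned long long)monotonic_now_ns());
        /* Prints 1000, 1015, 1015, 1020: the dip to 1009 is clamped. */
        return 0;
    }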

[0] http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02832.html

Cheers,
Joao
