Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()

On 08.02.2021 10:38, Roger Pau Monné wrote:
> On Mon, Feb 01, 2021 at 01:43:28PM +0100, Jan Beulich wrote:
>> ---
>> Since CPU0 reads its TSC last on the first iteration, if TSCs were
>> perfectly sync-ed there shouldn't ever be a need to update. However,
>> even on the TSC-reliable system I first tested this on (using
>> "tsc=skewed" to get this rendezvous function into use in the first
>> place) updates by up to several thousand clocks did happen. I wonder
>> whether this points at some problem with the approach that I'm not (yet)
>> seeing.
> I'm confused by this, so on a system that had reliable TSCs, which
> you forced to remove the reliable flag, and then you saw big
> differences when doing the rendezvous?
> That would seem to indicate that such system doesn't really have
> reliable TSCs?

I don't think so, no. This can easily be a timing effect from the
heavy cache line bouncing involved here.

What worries me, seeing these updates, is that I might still
be moving TSCs backwards in ways observable to the rest of the
system (i.e. beyond the inherent property of the approach), with
this then getting corrected by a subsequent rendezvous. But as
said - I can't see what this could result from, and hence I'm
inclined to assume these are merely effects I've not yet found a
good explanation for.

>> Considering the sufficiently modern CPU it's using, I suspect the
>> reporter's system wouldn't even need to turn off TSC_RELIABLE, if only
>> there wasn't the boot time skew. Hence another approach might be to fix
>> this boot time skew. Of course to recognize whether the TSCs then still
>> aren't in sync we'd need to run tsc_check_reliability() sufficiently
>> long after that adjustment. Which is besides the need to have this
>> "fixing" be precise enough for the TSCs to not look skewed anymore
>> afterwards.
> Maybe it would make sense to do a TSC counter sync after APs are up
> and then disable the rendezvous if the next calibration rendezvous
> shows no skew?

Yes, that's what I was hinting at with the above. For the next
rendezvous to not observe any skew, our adjustment would need to
be far more precise than it is today, though.

> I also wonder, we test for skew just after the APs have been booted,
> and decide at that point whether we need a calibration rendezvous.
> Maybe we could do a TSC sync just after APs are up (to hopefully bring
> them in sync), and then do the tsc_check_reliability just before Xen
> ends booting (ie: before handing control to dom0?)
> What we do right now (ie: do the tsc_check_reliability so early) is
> also likely to miss small skews that will only show up after APs have
> been running for a while?

The APs' TSCs will have been running for about as long as the
BSP's, as INIT does not affect them (and in fact they ought to
be running for _exactly_ as long, or else tsc_check_reliability()
would end up turning off TSC_RELIABLE). So I expect skews to be
large enough at this point to be recognizable.

>> @@ -1712,6 +1720,16 @@ static void time_calibration_tsc_rendezv
>>              while ( atomic_read(&r->semaphore) < total_cpus )
>>                  cpu_relax();
>> +            if ( tsc == 0 )
>> +            {
>> +                uint64_t cur;
>> +
>> +                tsc = rdtsc_ordered();
>> +                while ( tsc > (cur = r->max_tsc_stamp) )
>> +                    if ( cmpxchg(&r->max_tsc_stamp, cur, tsc) == cur )
>> +                        break;
> I think you could avoid reading cur explicitly for each loop and
> instead do?
> cur = ACCESS_ONCE(r->max_tsc_stamp)
> while ( tsc > cur )
>     cur = cmpxchg(&r->max_tsc_stamp, cur, tsc);

Ah yes. I tried something similar, but not quite the same,
and it looked wrong, so I gave up re-arranging.

>> @@ -1719,9 +1737,12 @@ static void time_calibration_tsc_rendezv
>>              while ( atomic_read(&r->semaphore) > total_cpus )
>>                  cpu_relax();
>>          }
>> +
>> +        /* Just in case a read above ended up reading zero. */
>> +        tsc += !tsc;
> Won't that be worthy of an ASSERT_UNREACHABLE? I'm not sure I see how
> tsc could be 0 on a healthy system after the loop above.

It's not forbidden for the firmware to set the TSCs to some
huge negative value. Considering the effect TSC_ADJUST has on
the actual value read by RDTSC, I think I did actually observe
a system coming up this way, because of (not very helpful)
TSC_ADJUST settings by firmware. So no, no ASSERT_UNREACHABLE().
