Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()

On 08.02.2021 10:38, Roger Pau Monné wrote:
> On Mon, Feb 01, 2021 at 01:43:28PM +0100, Jan Beulich wrote:
>> ---
>> Since CPU0 reads its TSC last on the first iteration, if TSCs were
>> perfectly sync-ed there shouldn't ever be a need to update. However,
>> even on the TSC-reliable system I first tested this on (using
>> "tsc=skewed" to get this rendezvous function into use in the first
>> place) updates by up to several thousand clocks did happen. I wonder
>> whether this points at some problem with the approach that I'm not (yet)
>> seeing.
> I'm confused by this, so on a system that had reliable TSCs, which
> you forced to remove the reliable flag, and then you saw big
> differences when doing the rendezvous?
> That would seem to indicate that such system doesn't really have
> reliable TSCs?

I don't think so, no. This can easily be a timing effect from the
heavy cache line bouncing involved here.

What worries me, seeing these updates, is that I might still
be moving TSCs backwards in ways observable to the rest of the
system (i.e. beyond the inherent property of the approach), with
this then getting corrected by a subsequent rendezvous. But as
said - I can't see what this could result from, and hence I'm
inclined to assume these are merely effects I've not yet found a
good explanation for.

>> Considering the sufficiently modern CPU it's using, I suspect the
>> reporter's system wouldn't even need to turn off TSC_RELIABLE, if only
>> there wasn't the boot time skew. Hence another approach might be to fix
>> this boot time skew. Of course to recognize whether the TSCs then still
>> aren't in sync we'd need to run tsc_check_reliability() sufficiently
>> long after that adjustment. Which is besides the need to have this
>> "fixing" be precise enough for the TSCs to not look skewed anymore
>> afterwards.
> Maybe it would make sense to do a TSC counter sync after APs are up
> and then disable the rendezvous if the next calibration rendezvous
> shows no skew?

Yes, that's what I was hinting at with the above. For the next
rendezvous to not observe any skew, our adjustment would need to
be far more precise than it is today, though.

> I also wonder, we test for skew just after the APs have been booted,
> and decide at that point whether we need a calibration rendezvous.
> Maybe we could do a TSC sync just after APs are up (to hopefully bring
> them in sync), and then do the tsc_check_reliability just before Xen
> ends booting (ie: before handing control to dom0?)
> What we do right now (ie: do the tsc_check_reliability so early) is
> also likely to miss small skews that will only show up after APs have
> been running for a while?

The APs' TSCs will have been running for about as long as the
BSP's, as INIT does not affect them (and in fact they ought to
be running for _exactly_ as long, or else tsc_check_reliability()
would end up turning off TSC_RELIABLE). So I expect skews to be
large enough at this point to be recognizable.

>> @@ -1712,6 +1720,16 @@ static void time_calibration_tsc_rendezv
>>              while ( atomic_read(&r->semaphore) < total_cpus )
>>                  cpu_relax();
>> +            if ( tsc == 0 )
>> +            {
>> +                uint64_t cur;
>> +
>> +                tsc = rdtsc_ordered();
>> +                while ( tsc > (cur = r->max_tsc_stamp) )
>> +                    if ( cmpxchg(&r->max_tsc_stamp, cur, tsc) == cur )
>> +                        break;
> I think you could avoid reading cur explicitly for each loop and
> instead do?
> cur = ACCESS_ONCE(r->max_tsc_stamp)
> while ( tsc > cur )
>     cur = cmpxchg(&r->max_tsc_stamp, cur, tsc);

Ah yes. I tried something similar, but not quite the same,
and it looked wrong, so I gave up re-arranging.

>> @@ -1719,9 +1737,12 @@ static void time_calibration_tsc_rendezv
>>              while ( atomic_read(&r->semaphore) > total_cpus )
>>                  cpu_relax();
>>          }
>> +
>> +        /* Just in case a read above ended up reading zero. */
>> +        tsc += !tsc;
> Won't that be worthy of an ASSERT_UNREACHABLE? I'm not sure I see how
> tsc could be 0 on a healthy system after the loop above.

It's not forbidden for the firmware to set the TSCs to some
huge negative value. Considering the effect TSC_ADJUST has on
the actual value read by RDTSC, I think I did actually observe
a system coming up this way, because of (not very helpful)
TSC_ADJUST settings by firmware. So no, no ASSERT_UNREACHABLE().
