Xen project Mailing List

Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 8 Feb 2021 14:59:55 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Claudemir Todo Bom <claudemir@xxxxxxxxxxx>

Delivery-date: Mon, 08 Feb 2021 14:00:00 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.02.2021 14:19, Roger Pau Monné wrote: > On Mon, Feb 08, 2021 at 12:22:25PM +0100, Jan Beulich wrote: >> On 08.02.2021 10:38, Roger Pau Monné wrote: >>> On Mon, Feb 01, 2021 at 01:43:28PM +0100, Jan Beulich wrote: >>>> --- >>>> Since CPU0 reads its TSC last on the first iteration, if TSCs were >>>> perfectly sync-ed there shouldn't ever be a need to update. However, >>>> even on the TSC-reliable system I first tested this on (using >>>> "tsc=skewed" to get this rendezvous function into use in the first >>>> place) updates by up to several thousand clocks did happen. I wonder >>>> whether this points at some problem with the approach that I'm not (yet) >>>> seeing. >>> >>> I'm confused by this, so on a system that had reliable TSCs, which >>> you forced to remove the reliable flag, and then you saw big >>> differences when doing the rendezvous? >>> >>> That would seem to indicate that such system doesn't really have >>> reliable TSCs? >> >> I don't think so, no. This can easily be a timing effect from the >> heavy cache line bouncing involved here. >> >> What I'm worried here seeing these updates is that I might still >> be moving TSCs backwards in ways observable to the rest of the >> system (i.e. beyond the inherent property of the approach), and >> this then getting corrected by a subsequent rendezvous. But as >> said - I can't see what this could result from, and hence I'm >> inclined to assume these are merely effects I've not found a >> good explanation for so far. > > I'm slightly worried by this, maybe because I'm misunderstanding part > of the TSC sync stuff. > > So you forced a system that Xen would otherwise consider to have a > reliable TSC (one that doesn't need a calibration rendezvous) into > doing the calibration rendezvous, and then the skew seen is quite big. > I would expect such skew to be minimal? As we would otherwise consider > the system to not need calibration at all. > > This makes me wonder if the system does indeed need such calibration > (which I don't think so), or if the calibration that we actually try > to do is quite bogus? I wouldn't call it bogus, but it's not very precise. Hence me saying that if we wanted to make the problematic system seen as TSC_RELIABLE (and hence be able to switch from "tsc" to "std" rendezvous), we'd first need to improve accuracy here quite a bit. (I suspect sufficient accuracy can only be achieved by making use of TSC_ADJUST, but that's not available on the reporter's hardware, so of no immediate interest here.) >>>> @@ -1719,9 +1737,12 @@ static void time_calibration_tsc_rendezv >>>> while ( atomic_read(&r->semaphore) > total_cpus ) >>>> cpu_relax(); >>>> } >>>> + >>>> + /* Just in case a read above ended up reading zero. */ >>>> + tsc += !tsc; >>> >>> Won't that be worthy of an ASSERT_UNREACHABLE? I'm not sure I see how >>> tsc could be 0 on a healthy system after the loop above. >> >> It's not forbidden for the firmware to set the TSCs to some >> huge negative value. Considering the effect TSC_ADJUST has on >> the actual value read by RDTSC, I think I did actually observe >> a system coming up this way, because of (not very helpful) >> TSC_ADJUST setting by firmware. So no, no ASSERT_UNREACHABLE() >> here. > > But then the code here will loop 5 times, and it's not possible for > those 5 loops to read a TSC value of 0? I could see it reading 0 on > one of the iterations but not in all of them. Sure, we can read zero at most once here. Yet the "if ( tsc == 0 )" conditionals get executed on every iteration, while they must yield "true" only on the first (from the variable's initializer); we absolutely need to go the "else if()" path on CPU0 on the 2nd iteration, and we also need to skip that part on later iterations on the other CPUs (for CPU0 to then take the 2nd "else if()" path from no later than the 3rd iteration onwards; the body of this of course will only be executed on the last iteration). The arrangement of all of this could be changed of course, but I'd prefer to retain the property of there not being any dependency on the exact number of iterations the loop header specifies, as long as it's no less than 2 (before this series) / 3 (after this series). I.e. I wouldn't want to identify e.g. the 1st iteration by "i == 4". Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.