
Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 8 Feb 2021 17:33:53 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Claudemir Todo Bom <claudemir@xxxxxxxxxxx>
  • Delivery-date: Mon, 08 Feb 2021 16:34:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Feb 08, 2021 at 02:59:55PM +0100, Jan Beulich wrote:
> On 08.02.2021 14:19, Roger Pau Monné wrote:
> > On Mon, Feb 08, 2021 at 12:22:25PM +0100, Jan Beulich wrote:
> >> On 08.02.2021 10:38, Roger Pau Monné wrote:
> >>> On Mon, Feb 01, 2021 at 01:43:28PM +0100, Jan Beulich wrote:
> >>>> ---
> >>>> Since CPU0 reads its TSC last on the first iteration, if TSCs were
> >>>> perfectly sync-ed there shouldn't ever be a need to update. However,
> >>>> even on the TSC-reliable system I first tested this on (using
> >>>> "tsc=skewed" to get this rendezvous function into use in the first
> >>>> place) updates by up to several thousand clocks did happen. I wonder
> >>>> whether this points at some problem with the approach that I'm not (yet)
> >>>> seeing.
> >>>
> >>> I'm confused by this: on a system that has reliable TSCs, where you
> >>> forced the reliable flag off, you then saw big differences when
> >>> doing the rendezvous?
> >>>
> >>> That would seem to indicate that such system doesn't really have
> >>> reliable TSCs?
> >>
> >> I don't think so, no. This can easily be a timing effect from the
> >> heavy cache line bouncing involved here.
> >>
> >> What worries me, seeing these updates, is that I might still be
> >> moving TSCs backwards in ways observable to the rest of the system
> >> (i.e. beyond the inherent property of the approach), with this then
> >> getting corrected by a subsequent rendezvous. But as said - I can't
> >> see what this could result from, and hence I'm inclined to assume
> >> these are merely effects I've not yet found a good explanation for.
> > 
> > I'm slightly worried by this, maybe because I'm misunderstanding part
> > of the TSC sync stuff.
> > 
> > So you forced a system that Xen would otherwise consider to have a
> > reliable TSC (one that doesn't need a calibration rendezvous) into
> > doing the calibration rendezvous, and then the skew seen is quite big.
> > I would expect such skew to be minimal, as we would otherwise
> > consider the system to not need calibration at all.
> > 
> > This makes me wonder whether the system does indeed need such
> > calibration (which I don't think it does), or whether the calibration
> > we actually try to do is quite bogus?
> 
> I wouldn't call it bogus, but it's not very precise. Hence me
> saying that if we wanted to make the problematic system seen as
> TSC_RELIABLE (and hence be able to switch from "tsc" to "std"
> rendezvous), we'd first need to improve accuracy here quite a bit.
> (I suspect sufficient accuracy can only be achieved by making use
> of TSC_ADJUST, but that's not available on the reporter's hardware,
> so of no immediate interest here.)

Right, TSC_ADJUST does indeed seem to be a better way to adjust TSCs,
and to notice whether firmware has skewed them.
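
As an aside, here is a minimal standalone sketch of what "noticing"
could look like. The MSR number and its semantics (RDTSC returns the
internal counter plus IA32_TSC_ADJUST) are from the SDM; the stub in
place of the privileged MSR read is made up, just so the sketch
compiles and runs on its own:

#include <stdint.h>
#include <stdio.h>

#define MSR_IA32_TSC_ADJUST 0x3b   /* architectural MSR, SDM vol. 4 */

/*
 * Stand-in for a privileged MSR read (rdmsrl() in Xen); real code has
 * to run in ring 0. Here it returns a canned value, as if firmware
 * had written a large negative adjust to zero the TSC at some point.
 */
static uint64_t rdmsr_stub(uint32_t msr)
{
    (void)msr;
    return (uint64_t)-123456789LL;
}

int main(void)
{
    int64_t adjust = (int64_t)rdmsr_stub(MSR_IA32_TSC_ADJUST);

    /*
     * Nonzero means firmware moved this CPU's TSC away from the raw
     * internal counter; values differing across CPUs mean skew, which
     * is exactly what a rendezvous would otherwise have to paper over.
     */
    if ( adjust != 0 )
        printf("TSC moved by firmware: adjust = %lld\n",
               (long long)adjust);

    return 0;
}

A large negative value here also explains how an early RDTSC can read
as a tiny number - or exactly zero, as discussed below.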

> 
> >>>> @@ -1719,9 +1737,12 @@ static void time_calibration_tsc_rendezv
> >>>>              while ( atomic_read(&r->semaphore) > total_cpus )
> >>>>                  cpu_relax();
> >>>>          }
> >>>> +
> >>>> +        /* Just in case a read above ended up reading zero. */
> >>>> +        tsc += !tsc;
> >>>
> >>> Won't that be worthy of an ASSERT_UNREACHABLE? I'm not sure I see how
> >>> tsc could be 0 on a healthy system after the loop above.
> >>
> >> It's not forbidden for the firmware to set the TSCs to some
> >> huge negative value. Considering the effect TSC_ADJUST has on
> >> the actual value read by RDTSC, I think I did actually observe
> >> a system coming up this way, because of (not very helpful)
> >> TSC_ADJUST setting by firmware. So no, no ASSERT_UNREACHABLE()
> >> here.
> > 
> > But then the code here loops 5 times, and surely it's not possible
> > for all 5 iterations to read a TSC value of 0? I could see it reading
> > 0 on one of the iterations, but not on all of them.
> 
> Sure, we can read zero at most once here. Yet the "if ( tsc == 0 )"
> conditionals get executed on every iteration, while they must yield
> "true" only on the first (from the variable's initializer); we
> absolutely need to go the "else if()" path on CPU0 on the 2nd
> iteration, and we also need to skip that part on later iterations
> on the other CPUs (for CPU0 to then take the 2nd "else if()" path
> from no later than the 3rd iteration onwards; the body of this of
> course will only be executed on the last iteration).

Oh, I see. Then I think I have no further comments. If you agree to
adjust the cmpxchg, please add my R-b.
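
(For the archives, here is how I picture the sentinel logic now - a
minimal single-CPU sketch rather than the actual patch: the real code
runs this concurrently across CPUs and updates the shared stamp with a
cmpxchg, and the names below are made up.)

#include <inttypes.h>
#include <stdio.h>

/* Stand-in for rdtsc_ordered(); just returns a growing fake count. */
static uint64_t fake_rdtsc(void)
{
    static uint64_t t = 1000;
    return t += 100;
}

int main(void)
{
    uint64_t tsc = 0, master_tsc_stamp = 0;

    /* Five iterations, as in the real rendezvous loop. */
    for ( int i = 4; i >= 0; i-- )
    {
        if ( tsc == 0 )
            /* First iteration only: record this CPU's reading. */
            tsc = fake_rdtsc();
        else if ( master_tsc_stamp < tsc )
            /* Later iterations: propagate the largest reading seen. */
            master_tsc_stamp = tsc;

        /*
         * 0 is reserved as the "not read yet" sentinel, so if the
         * read above happened to return exactly 0, bump it to 1;
         * otherwise the first branch would wrongly be taken again
         * on the next iteration.
         */
        tsc += !tsc;

        printf("i=%d tsc=%" PRIu64 " master=%" PRIu64 "\n",
               i, tsc, master_tsc_stamp);
    }

    return 0;
}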

Thanks, Roger.
