[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/time: further improve TSC / CPU freq calibration accuracy


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 19 Jan 2022 09:46:26 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=JeLSh4Yg6b3fmZr3jFmOSIw8LngkfrgTdBlog1LrqeM=; b=JrbkuqnmMX/BtHBdHF0T5QMmVCdkuAdEB585WeOCK97wfzojo50qEBLi2hj8u/xUfv7k3I7svYrzpyYbrK9JVB3Cc34CE+SfnYEMdXwneuWdAJb13gHBqg86F4ViUy+UIHwUPrdqWLoWElescXB2yxsxfFhRqoZXZaxy5/i0qnBPCW9F9/IszWIqBQ/RNs6Muf+XOg1+nbHZNxJVGOE8lv1LGhoebhGnNi6XbM1E20NPj9pbJ/nrY8E/h+owHnsdHxZZbvO5iolH/owkLLsUpn493/Bam8w5EtQRK2LdJ4sXc0ANBcubvhKpea4xnxeP7UWrdfbdz7Qx6BzCrYa8kA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IWdBsWd8fmZov+220hplsvyJSLqQrkvQVuYj5zWnCcQsvOHgiJUou4Rv2/WE6bQvJzvcOQh6FGVbg33D9oWrp0drnDoqH14gB1ySwSsExjU/Ce/thfHfkRnJ6jy/ZLZhA/mQelA9vzFC/rrEVqhnQbKjZQvTmIJLPJkSUR/wVwt90tJVEFQBTRp4CAYyE8WLPIIauFg/b0Ja1tdqqMjIEDCd67lNPVv/UTgUBBaJUZLBH7X+bBHZL6weRsS9Z3ltv04k8Yne8RYhAPNGFWMQa9zMkKnZl0jpqw4qtU4bTdtDMivVCkUZ4bOqrWhL9wvNPdtCgGKDljSf1qirbfp+nw==
  • Authentication-results: esa2.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 19 Jan 2022 08:46:42 +0000
  • Ironport-data: A9a23:1BF9zKrXnqQrtQDKUwBK+tO4AXBeBmL5YhIvgKrLsJaIsI4StFCzt garIBmFM/iOMDDyedAiPIq0oUICsMCAx4M1TlA/qy83EX8S8JuZCYyVIHmrMnLJJKUvbq7GA +byyDXkBJppJpMJjk71atANlZT4vE2xbuKU5NTsY0idfic5Dndx4f5fs7Rh2NQw2IHoW1rlV e7a+KUzBnf0g1aYDUpMg06zgEsHUCPa4W5wUvQWPJinjXeG/5UnJMt3yZKZdhMUdrJ8DO+iL 9sv+Znilo/vE7XBPfv++lrzWhVirrc/pmFigFIOM0SpqkAqSiDfTs/XnRfTAKtao2zhojx/9 DlCnZWreyMRDK3eousiYT9hFQNUOIYb6rCSdBBTseTLp6HHW37lwvEoB0AqJ4wIvO1wBAmi9 9RBdmpLNErawbvrnvTrEYGAhex6RCXvFJkYtXx6iynQEN4tQIzZQrWM7thdtNs1rp4UR6eCO ZtGAdZpRBjAPidFJVtOM4AZgNWMm2e8LzAEsWvA8MLb5ECMlVcsgdABKuH9ZdiiVchT2EGCq Qru72n/Rx0XKtGb4T6E6W63wP/CmzvhX4AfH6H+8eRl6HWRzGEODBwdVXOgvOK0zEW5Xrpix 1c8o3R06/JorQryE4e7D0bQTGO4UgA0UsFiN9UI6BO3xZXmxT+fJEIBZWNTZ4lz3CMpfgAC2 liMltLvIDVgtryJVH6QnoupQSOO1Ts9djFbO3JdJecRy5y6+dxo0EqTJjp2OPft1oWdJN3m/ 9ydQMHSbZ03hNVD6ai09Euvb9mE9smQFV5dCuk6swuYAuJFiGyNO93ABbvzt68owGOlor+p5 ihsdy+2trFmMH11vHbRKNjh5Znwjxp/DBXSgER0A74q/Cm39niocOh4uW8ifhoya5peJmGzO Sc/XD+9ArcJbBNGioctMuqM5zkCl/C8RbwJqNiJBjaxXnSBXFDep3w/DaJh92vsjFItgckC1 WSzKq6R4YIhIf0/llKeHr5FuZdyn3xW7T6NGfjTkkr2uZLDNC/9YepUazOmM7FmhJ5oVS2Iq b6zwePQlUUGOAA/CwGKmbMuwacidilkVcuo+p0OJoZu4GNOQQkcNhMY+pt4E6RNlKVJjObYu Ha7X05T0l3kgnPbbw6NbxhehHnHB/6TdFo3Yn4hO0iGwX8mbdr95asTbcJvL7Im6PZi3bh/S PxcI5eMBfFGSzLm/TUBbMai8Nw+JUrz3Q/ebTC4ZDUffoJ7Q1Cb8NHTYQaypjIFCTC6tJVir uT4hB/bW5cKWy9rEN3SNKC011q0sHVEwLByUkLEL8N9YkLp9IQ2eSX9guVue5MHKAnZxyvc3 AGTWE9Kqe7Iqo4z0d/ImaHb8Nv5T7ogRhJXRjCJ46y3OC/W+nuY7bVBCOvYLyrAUG7U+bm5Y bkHxf/LL/Bazk1BtJBxEug3wPtmtcfvvbJT0i9tAG7PMwawErplL3SLgZtPu6lKyuMLsAe6Q BvSqNxTOLHPM8L5Cl8BYgEia73bh/0TnzDT69UzIVn7u3ArrObWDx0KMknekjFZIZt0LJghk LUot8Mh4gCijgYnb4SdhSdO+mXQdnENXs3LbH3B7FMHXubz9mx/XA==
  • Ironport-hdrordr: A9a23:4bzBvKuTQm1v2MY0/aAMq1G07skC7oMji2hC6mlwRA09TyXGra +TdaUguSMc1gx9ZJhBo7G90KnpewK6yXdQ2/hqAV7EZniahILIFvAY0WKG+VPd8kLFh4xgPM tbAs1D4ZjLfCRHZKXBkXiF+rQbsaC6GcmT7I+0pRcdLj2CKZsQlzuRYjzrbHGeLzM2Y6bReq Dsgvau8FGbCAsqh4mAdzI4dtmGg+eOuIPtYBYACRJiwA6SjQmw4Lq/NxSDxB8RXx5G3L9nqA H+4kHEz5Tml8v+5g7X1mfV4ZgTsNz9yuFbDMjJrsQOMD3jhiuheYwkcbyfuzIepv2p9T8R4Z PxiiZlG/42x2Laf2mzrxeo8w780Aw243un8lOciWuLm72OeBsKT+56wa5JeBrQ7EQt+Ptm1r hQ4m6fv51LSTvdgSXU/bHzJl9Xv3vxhUBnvf8YjnRZX4dbQqRWt5Yj8ERcF4pFND7m6bogDP JlAKjnlblrmGuhHjDkV1RUsZ+RtixZJGbFfqFCgL3Y79FupgE586NCr/Zv20vp9/oGOu55Dq r/Q+BVfYp1P7wrhJRGdZM8qPuMexzwqC33QRCvyHTcZeg60iH22tbKCItc3pDeRHVP9up0pK j8
  • Ironport-sdr: 1zgF++htwVn4p8NMbXFpxH/6FLFHfETlW3zL4sbV9ffkDY8WddiBR7UqYdjWR7aoRqzYX5gCV/ zEn0ZedqjE7bSFN6dxbVLFwEQ1jpj7usOAJfBhnX2pSVYpZ10juRKsA15S3IoLYZtoy+7LqKAG jvWnZgjPCeqvGfzVpDxn+rNkaLagQ4ReKzr+8i0fmUe9hjEhZcv2nVwhlF9nt3ncgWIQcTj1i+ CDMquik7meD7KaGsxpWu0ptn7jvAtGZdYOegbgpYw8i8kNejmP7YKZctK7usj0C/ST5rvbV4ow PORbKSUsIc79CiI4Z+XzCWQZ
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Jan 18, 2022 at 03:26:36PM +0100, Jan Beulich wrote:
> On 18.01.2022 15:05, Roger Pau Monné wrote:
> > On Tue, Jan 18, 2022 at 02:39:03PM +0100, Jan Beulich wrote:
> >> On 18.01.2022 13:45, Roger Pau Monné wrote:
> >>> On Thu, Jan 13, 2022 at 02:41:31PM +0100, Jan Beulich wrote:
> >>>> Calibration logic assumes that the platform timer (HPET or ACPI PM
> >>>> timer) and the TSC are read at about the same time. This assumption may
> >>>> not hold when a long latency event (e.g. SMI or NMI) occurs between the
> >>>> two reads. Reduce the risk of reading uncorrelated values by doing at
> >>>> least four pairs of reads, using the tuple where the delta between the
> >>>> enclosing TSC reads was smallest. From the fourth iteration onwards bail
> >>>> if the new TSC delta isn't better (smaller) than the best earlier one.
> >>>
> >>> Do you have some measurements as to whether this makes the frequency
> >>> detection any more accurate?
> >>
> >> In the normal case (no SMI or alike) I haven't observed any further
> >> improvement on top of the earlier patch.
> >>
> >>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> ---
> >>>> Obviously (I think) instead of having both read_{hpet,pmtmr}_and_tsc()
> >>>> the calibration logic could be folded between HPET and PMTMR, at the
> >>>
> >>> As an intermediate solution you could have a read_counter_and_tsc I
> >>> would think?
> >>
> >> Not sure in which way you view this as "intermediate".
> > 
> > As in not unifying both calibration logic into a single function, but
> > rather just use a generic function to read the counter and TSC.
> > 
> > With your original remark I assumed that you wanted to unify all the
> > calibration code in init_hpet and init_pmtimer into a generic
> > function, but maybe I've misunderstood.
> 
> Oh, I see. I have to admit that I see little value (at this point) to
> break out just the pair-read logic. While I did say we have similar
> issues elsewhere, my initial analysis has left me with the impression
> that those other cases are sufficiently different for such a helper to
> be of no use there.
> 
> >>>> expense of a couple more indirect calls (which may not be that much of a
> >>>> problem as this is all boot-time only). Really such folding would have
> >>>> been possible already before, just that the amount of duplicate code
> >>>> hasn't been this large so far. IOW if so desired, then perhaps the
> >>>> folding should be a separate prereq patch.
> >>>
> >>> You could even pass a platform_timesource as a parameter to that
> >>> generic read counter and TSC helper.
> >>
> >> This was the plan, yes, in case I was asked to follow that route (or
> >> if I decided myself that I'd prefer that significantly over the
> >> redundancy).
> > 
> > I think having a generic read_counter_and_tsc would be better than
> > having read_{hpet,pmtmr}_and_tsc.
> > 
> >>
> >>>> --- a/xen/arch/x86/time.c
> >>>> +++ b/xen/arch/x86/time.c
> >>>> @@ -392,9 +392,36 @@ static u64 read_hpet_count(void)
> >>>>      return hpet_read32(HPET_COUNTER);
> >>>>  }
> >>>>  
> >>>> +static uint32_t __init read_hpet_and_tsc(uint64_t *tsc)
> >>>> +{
> >>>> +    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
> >>>> +    uint32_t best = best;
> >>>> +    unsigned int i;
> >>>> +
> >>>> +    for ( i = 0; ; ++i )
> >>>> +    {
> >>>> +        uint32_t hpet = hpet_read32(HPET_COUNTER);
> >>>> +        uint64_t tsc_cur = rdtsc_ordered();
> >>>> +        uint64_t tsc_delta = tsc_cur - tsc_prev;
> >>>> +
> >>>> +        if ( tsc_delta < tsc_min )
> >>>> +        {
> >>>> +            tsc_min = tsc_delta;
> >>>> +            *tsc = tsc_cur;
> >>>> +            best = hpet;
> >>>> +        }
> >>>> +        else if ( i > 2 )
> >>>> +            break;
> >>>> +
> >>>> +        tsc_prev = tsc_cur;
> >>>
> >>> Would it be better to set tsc_prev to the current TSC value here in
> >>> order to minimize the window you are measuring? Or else tsc_delta will
> >>> also account for the time in the loop code, and there's more
> >>> likeliness from being interrupted?
> >>
> >> I did consider this (or some variant thereof), but did for the moment
> >> conclude that it wouldn't make enough of a difference. If you think
> >> it would be meaningfully better that way, I'll switch.
> >>
> >> Both here and for the aspect above, please express you preference (i.e.
> >> not in the form of a question).
> > 
> > Let's leave it as-is (just one TSC read per loop iteration).
> > 
> >>> I guess being interrupted four times so that all loops are bad is very
> >>> unlikely.
> >>>
> >>> Since this loop could in theory run multiple times, do we need to
> >>> check that the counter doesn't overflow? Or else the elapsed counter
> >>> value would be miscalculated.
> >>
> >> I don't think I see any issue with overflow here. It's the callers
> >> who need to care about that. And I don't think this function will run
> >> for long enough such that a counter would not only wrap, but also
> >> reach its initial count again.
> > 
> > Right, I haven't expressed myself correctly. It's not overflowing the
> > issue, but rather wrapping to the start count value. Limiting the
> > iterations to some fixed value (10?) in order to remove that
> > possibility completely would be OK for me.
> 
> Hmm, I was in fact hoping (and trying) to get away without any arbitrarily
> chosen numbers. The loop cannot be infinite in any event ... Please let me
> know how strong you feel about putting in place such an arbitrary limit.

It was mostly for safety reasons, I wouldn't expect that loop to need
more than 4 iterations really (which is also an arbitrary chosen
number). Let's leave it without any limit then for the time being.

Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.