
Re: statistical time calibration



Hi everyone,

On 3/11/22 07:29, Roger Pau Monné wrote:
> On Tue, Jan 18, 2022 at 04:03:56PM +0100, Jan Beulich wrote:
>> 1) When deciding whether to increment "passes", both variance values have
>> an arbitrary value of 4 added to them. There's a sentence about this in
>> the earlier (big) comment, but it lacks any justification as to the chosen
>> value. What's worse, variance is not a plain number, but a quantity in the
>> same units as the base values. Since typically both clocks will run at
>> very different frequencies, using the same (constant) value here has much
>> more of an effect on the lower frequency clock's value than on the higher
>> frequency one's.

This additional variance arises from the quantization, and so it scales with
the timing quantum.  It makes sense that it has a larger effect on a lower
frequency clock -- if you imagine trying to calibrate against a clock which
runs at 1 Hz, without this term you would read several identical values from
that clock and conclude that your clock runs at infinity Hz.
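
To make that concrete, here's a quick Python sketch (hypothetical numbers, not the FreeBSD code) of why a purely empirical variance goes degenerate against a slow clock:

```python
# Hypothetical illustration: reading a 1 Hz clock roughly every microsecond
# yields long runs of identical values, so the raw sample variance of the
# readings is zero over a short calibration window.
readings = [int(t * 1e-6) for t in range(1000)]  # 1 Hz clock, 1 us loop
n = len(readings)
mean = sum(readings) / n
raw_var = sum((r - mean) ** 2 for r in readings) / n
print(raw_var)                # 0.0 -- a naive fit would infer an infinite ratio

quantum_var = 4               # constant quantization term, as discussed above
print(raw_var + quantum_var)  # stays positive, preventing the degenerate case
```

The added term keeps the denominator of the frequency-ratio uncertainty nonzero even when every sample landed in the same timing quantum.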

>> 2) The second of the "important formulas" is nothing I could recall or was
>> able to look up. All I could find are somewhat similar, but still
>> sufficiently different ones. Perhaps my "introductory statistics" have
>> meanwhile been too long ago ... (In this context I'd like to also mention
>> that it took me quite a while to prove to myself that the degenerate case
>> of, in particular, the first iteration wouldn't lead to an early exit
>> from the function.)

Most statistics courses present a formula for the absolute uncertainty in the
slope rather than the relative uncertainty.  But it's easy to derive one from
the other.
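
For anyone who wants to check the algebra, here's a small sketch using the textbook ordinary-least-squares formulas (generic made-up data, not the kernel's code): the absolute uncertainty in the slope converts to a relative uncertainty simply by dividing by the slope estimate itself.

```python
# Generic least-squares sketch: compute the slope, its textbook absolute
# standard error, and the relative uncertainty derived from it.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x, with noise

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
slope = sxy / sxx

resid_ss = sum((y - (ybar + slope * (x - xbar))) ** 2 for x, y in zip(xs, ys))
s2 = resid_ss / (n - 2)            # residual variance
abs_unc = math.sqrt(s2 / sxx)      # absolute uncertainty in the slope
rel_unc = abs_unc / slope          # relative uncertainty, e.g. for a PPM target
print(slope, rel_unc)
```

The relative form is what you want when the exit criterion is "1 PPM", since a PPM is by definition a relative quantity.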

>> 3) At the bottom of the loop there is some delaying logic, leading to
>> later data points coming in closer succession than earlier ones. I'm
>> afraid I don't understand the "theoretical risk of aliasing", and hence
>> I'm seeing more risks than benefits from this construct.

Suppose it takes exactly 1 us to run through the loop but one of the clocks
runs at exactly 1000001 Hz.  Without the extra delay, we'll probably observe
the clock incrementing by 1 every time through the loop (since it would only
increment by 2 once a second) and end up computing the wrong frequency.  The
"noise" introduced by adding small (variable) delays eliminates any chance
of this scenario and makes the data points behave like the *random* data
points which the statistical analysis needs.
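
A toy simulation of that scenario (assumed numbers, Python rather than the kernel's C):

```python
# Toy simulation: a 1000001 Hz counter sampled on a perfectly regular 1 us
# loop almost always appears to advance by exactly 1 per sample, so a short
# calibration run would conclude the clock runs at 1 MHz.
import math
import random

FREQ = 1_000_001          # true counter frequency, Hz
LOOP = 1e-6               # loop period without added delay, seconds

def counts(times):
    return [math.floor(t * FREQ) for t in times]

regular = [k * LOOP for k in range(1000)]
deltas = {c2 - c1 for c1, c2 in zip(counts(regular), counts(regular)[1:])}
print(deltas)             # only {1}: the extra tick is hidden by aliasing

random.seed(0)
t, jittered = 0.0, []
for _ in range(1000):
    jittered.append(t)
    t += LOOP * (1 + random.random())   # variable delay breaks the lock-step
jdeltas = {c2 - c1 for c1, c2 in zip(counts(jittered), counts(jittered)[1:])}
print(jdeltas)            # a mix of increments, as the statistics assume
```

With the regular loop you'd need to sample for a full second before seeing a single "2"; with jitter, the mixture of increments shows up immediately.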

> Might be easier to just add Colin, he did the original commit and can
> likely answer those questions much better than me. He has also done a
> bunch of work for FreeBSD/Xen.

You're too generous... I think the only real Xen work I did was adding support
for indirect segment I/Os to blkfront.  Mostly I was just packaging things up
for EC2 (back when EC2 used Xen).

>> My main concern is with the goal of reaching accuracy of 1PPM, and the
>> loop ending only after a full second (if I got that right) if that
>> accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
>> reachable. My recent observations suggest that with HPET that's
>> feasible (but only barely), but with PMTMR it might be more like 3PPM
>> or more.

The "give up after 1 second" thing is just "fall back to historical FreeBSD
behaviour".  In my experiments I found that calibrating against the i8254
we would get 1PPM in about 50 ms while HPET was 2-3 ms.
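
The overall shape of the loop is roughly this (a sketch with made-up names, not the FreeBSD source):

```python
# Sketch of the calibration loop: keep sampling until the estimated relative
# error of the frequency ratio drops below 1 PPM, but never run longer than
# 1 second overall (the historical-behaviour fallback).
import time

PPM_TARGET = 1e-6
DEADLINE = 1.0            # seconds before falling back

def calibrate(read_ref, read_cal, estimate):
    """read_ref/read_cal read the two counters; estimate() returns
    (ratio, relative_uncertainty) from the samples gathered so far."""
    samples = []
    ratio, rel_unc = 0.0, float("inf")
    start = time.monotonic()
    while rel_unc > PPM_TARGET and time.monotonic() - start < DEADLINE:
        samples.append((read_ref(), read_cal()))
        if len(samples) >= 2:      # avoid the degenerate first iteration
            ratio, rel_unc = estimate(samples)
    return ratio
```

In practice the uncertainty test succeeds within a few milliseconds on good hardware, so the 1-second deadline is only ever hit when the clocks are genuinely noisy.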

>> The other slight concern I have, as previously voiced on IRC, is the use
>> of floating point here.

FWIW, my first version of this code (about 5 years ago) used fixed-point
arithmetic.  It was far uglier so I was happy when the FreeBSD kernel became
able to use the FPU this early in the boot process.

--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
