Xen project Mailing List

Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift

From: Edwin Török <edvin.torok@xxxxxxxxxx>

Date: Tue, 3 Sep 2019 10:26:37 +0100

Authentication-results: esa1.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=edvin.torok@xxxxxxxxxx; spf=Pass smtp.mailfrom=edvin.torok@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Tue, 03 Sep 2019 09:26:43 +0000

Ironport-sdr: C3vTfgIC/H+WoQZEGfguOrNO66N4lZRbeWgTZ80d0EQE6f9BP3PQKluKn2bup0ntxJws48cgw4 AHibdhRUcnd7Y6zSj/vHey9BELejBCOrCpUArYlXpInK6DA/Qf/zbh4BQrvm1yeqbnwRoL1kbC m7iZg0yL14lfmLAPWhVijIIpACykEbzicwK9ePr9UKdtt3TRc/E7blYiDU3ZYEEjY9K07O7oNS 0+Xjb10PxJvcXdAAlZ80y6NdYeubA7hK8GglWHSefowyVdkWkn1GRLwBPxz02hAY3f0T5DjVgF 5kk=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Openpgp: preference=signencrypt

On 03/09/2019 08:54, Jan Beulich wrote: > On 02.09.2019 20:27, Edwin Török wrote: >> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts: >> ``` >> (XEN) [ 6.607693] Detected 2600.004 MHz processor. >> (XEN) [ 2674.213081] dom1(hvm): >> mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3 >> (XEN) [ 2674.213087] dom2(hvm): >> mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2 >> ``` >> >> The 2 domains were suspended prior to rebooting the host and applying a >> xen/microcode patch. After the reboot the frequency of the host was deemed to >> be slightly different, and therefore switching on RDTSC emulation for the >> Linux >> HVM guest, even though the difference was only 5 ppm. This CPU doesn't >> support >> TSC scaling. >> >> Therefore we should either measure the standard deviation of our calibration >> and have a range of acceptable frequencies as "same", or have a static >> tolerance value. >> >> The platform timer's clock frequency accuracy is: >> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better >> * ACPI PM timer, and PIT timer do not have defined accuracies >> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better >> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature >> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET >> and >> thus 500 ppm as an upper bound, "the real drift is usually smaller than >> 30ppm" >> >> For simplicity and determinism opted for a static tolerance value of 100 ppm >> here, such that the any error would be well within the error you would get >> with >> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm. >> Most importantly this should stop Xen from claiming that the clock frequency >> on >> the same host is different across reboots. Specifications do not currently >> mandate an accuracy higher than 100 ppm, therefore OSes should already be >> able >> to cope with such drift on real hardware. Any improvements in accuracy from >> future specifications/motherboards wouldn't be applicable, because they would >> also come with newer CPUs that support TSC scaling. >> >> If the CPU does support TSC scaling Xen will of course still attempt to match >> the exact frequency value it thinks the guest had when it was suspended. >> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not >> visible >> in patch context). >> >> llabs() doesn't appear to be available when building xen, hence the 2 >> comparisons. >> >> After this patch when suspending a VM, and rebooting the host I get this >> output: >> ``` >> (XEN) [ 6.614703] Detected 2600.010 MHz processor. >> (XEN) [ 138.924342] TSC marked as reliable, warp = 0 (count=2) >> (XEN) [ 138.924346] dom1(hvm): >> mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2 >> ``` >> >> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx> > > First of all - are you aware that there had been multiple iterations > of a patch (by Olaf Hering) making this a command line and/or guest > config controllable setting? No, I'll take a look at them and the associated discussion. Found a '[PATCH v10] new config option vtsc_tolerance_khz to avoid TSC emulation' in the archives from 9 months ago. > If so, it would have been nice if at > least the cover letter identified the correlation. If not, please > take a look. After all it hasn't gone in so far because of objections > by Andrew. > > Using a hardcoded tolerance value in any event raises the question > of how you know whether a particular guest OS can actually cope. > >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t >> *tsc_mode, >> *elapsed_nsec = 0; >> } >> >> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2) > > The return type wants to be bool. And there wants to be an explaining > comment ahead of the function, (re-)stating some of what you have in > the description. > >> +{ >> + int64_t ppm = (khz2 - khz1) * 1000000 / khz2; >> + return -100 < ppm && ppm < 100; > > While we have no llabs(), you should imo use either ABS() or > __builtin_labs() / __builtin_llabs(). > > Furthermore, could we limit this behavior to the case of there not > being TSC scaling available (due to running on old hardware, or due > to it being a PV guest)? Yes, that'd make sense. Best regards, --Edwin _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.