[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift



On 03/09/2019 08:54, Jan Beulich wrote:
> On 02.09.2019 20:27, Edwin Török  wrote:
>> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts:
>> ```
>> (XEN) [    6.607693] Detected 2600.004 MHz processor.
>> (XEN) [ 2674.213081] dom1(hvm): 
>> mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3
>> (XEN) [ 2674.213087] dom2(hvm): 
>> mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2
>> ```
>>
>> The 2 domains were suspended prior to rebooting the host and applying a
>> xen/microcode patch. After the reboot the frequency of the host was deemed to
>> be slightly different, and therefore switching on RDTSC emulation for the 
>> Linux
>> HVM guest, even though the difference was only 5 ppm. This CPU doesn't 
>> support
>> TSC scaling.
>>
>> Therefore we should either measure the standard deviation of our calibration
>> and have a range of acceptable frequencies as "same", or have a static
>> tolerance value.
>>
>> The platform timer's clock frequency accuracy is:
>> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better
>> * ACPI PM timer, and PIT timer do not have defined accuracies
>> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better
>> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature
>> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET 
>> and
>> thus 500 ppm as an upper bound, "the real drift is usually smaller than 
>> 30ppm"
>>
>> For simplicity and determinism opted for a static tolerance value of 100 ppm
>> here, such that the any error would be well within the error you would get 
>> with
>> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm.
>> Most importantly this should stop Xen from claiming that the clock frequency 
>> on
>> the same host is different across reboots. Specifications do not currently
>> mandate an accuracy higher than 100 ppm, therefore OSes should already be 
>> able
>> to cope with such drift on real hardware. Any improvements in accuracy from
>> future specifications/motherboards wouldn't be applicable, because they would
>> also come with newer CPUs that support TSC scaling.
>>
>> If the CPU does support TSC scaling Xen will of course still attempt to match
>> the exact frequency value it thinks the guest had when it was suspended.
>> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not 
>> visible
>> in patch context).
>>
>> llabs() doesn't appear to be available when building xen, hence the 2 
>> comparisons.
>>
>> After this patch when suspending a VM, and rebooting the host I get this 
>> output:
>> ```
>> (XEN) [    6.614703] Detected 2600.010 MHz processor.
>> (XEN) [  138.924342] TSC marked as reliable, warp = 0 (count=2)
>> (XEN) [  138.924346] dom1(hvm): 
>> mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2
>> ```
>>
>> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
> 
> First of all - are you aware that there had been multiple iterations
> of a patch (by Olaf Hering) making this a command line and/or guest
> config controllable setting?

No, I'll take a look at them and the associated discussion.
Found a '[PATCH v10] new config option vtsc_tolerance_khz to avoid TSC 
emulation' in the archives from 9 months ago.

> If so, it would have been nice if at
> least the cover letter identified the correlation. If not, please
> take a look. After all it hasn't gone in so far because of objections
> by Andrew.
> 
> Using a hardcoded tolerance value in any event raises the question
> of how you know whether a particular guest OS can actually cope.
> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t 
>> *tsc_mode,
>>          *elapsed_nsec = 0;
>>  }
>>  
>> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2)
> 
> The return type wants to be bool. And there wants to be an explaining
> comment ahead of the function, (re-)stating some of what you have in
> the description.
> 
>> +{
>> +    int64_t ppm = (khz2 - khz1) * 1000000 / khz2;
>> +    return -100 < ppm && ppm < 100;
> 
> While we have no llabs(), you should imo use either ABS() or
> __builtin_labs() / __builtin_llabs().
> 
> Furthermore, could we limit this behavior to the case of there not
> being TSC scaling available (due to running on old hardware, or due
> to it being a PV guest)?

Yes, that'd make sense.


Best regards,
--Edwin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.