
Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

>>> On 01.10.18 at 17:17, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 01/10/18 14:29, Jan Beulich wrote:
>>>>> On 01.10.18 at 14:39, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 07/06/18 14:08, Olaf Hering wrote:
>>>> Add an option to control when vTSC emulation will be activated for a
>>>> domU with tsc_mode=default. Without such an option each TSC access from
>>>> domU will be emulated, which causes a significant performance drop for
>>>> workloads that make use of rdtsc.
>>>> One option to avoid the TSC emulation is to run domUs with tsc_mode=native.
>>>> This has the drawback that migrating a domU from a "2.3GHz" class host
>>>> to a "2.4GHz" class host may change the rate at which the TSC counter
>>>> increases, and the domU may not be prepared for that.
>>>> With the new option the host admin can decide how a domU should behave
>>>> when it is migrated across systems of the same class. Since there is
>>>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>>>> the same class will most likely have slightly different values. As a
>>>> result vTSC emulation is unavoidable. Data collected during the incident
>>>> which triggered this change showed a jitter of up to 200 kHz across
>>>> systems of the same class.
>>> Do you have any further details of the systems involved?  If they are
>>> identical systems, they should all have the same real TSC frequency, and
>>> it's a known issue that Xen isn't very good at working out the
>>> frequency.  TBH, fixing that would be far better overall.
>> Are you convinced all parts match their nominal frequency without
>> _any_ deviation?
> That is the intent of publishing the numbers, yes.
>> If that was the case, we could indeed use CPUID
>> leaves 0x15 / 0x16 output, if available.
> We very much should be doing this.  There are also model-specific ways
> of getting the same data on older processors.
>> But I very much doubt this.
>> As an example, here's what bare metal Linux says on my newest
>> system:
>> tsc: Detected 2600.000 MHz processor
>> tsc: Refined TSC clocksource calibration: 2591.990 MHz
>> Xen figures:
>> (XEN) Detected 2592.107 MHz processor.
>> And then after another reboot, bare metal Linux again says:
>> tsc: Refined TSC clocksource calibration: 2592.008 MHz
> What is surprising here?  The calibration loop is not 100% accurate and
> cannot be made to be perfect.
> The fact that Linux and Xen agree is because they basically share the
> same calibration algorithm - not that the processor is really running at
> 2592MHz.

And I'm not claiming it is. I'm merely voicing my doubt that the
processor is running at exactly the announced 2600.000 MHz.
In which case it is simply unknown whether calibrated or
nominal values come closer to the truth.
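
To put numbers on the figures quoted above, here is a rough sketch in
Python. The helper names (spread_khz, within_tolerance) are made up for
illustration and are not taken from the patch or from Xen; the
tolerance check only assumes that two frequencies count as "the same"
when they differ by no more than a given number of kHz.

```python
# Figures from the boot logs quoted above, converted to integer kHz.
NOMINAL_KHZ = 2600000                 # "Detected 2600.000 MHz processor"
CALIBRATED_KHZ = {
    "linux_boot_1": 2591990,          # Refined TSC clocksource calibration
    "xen":          2592107,          # (XEN) Detected 2592.107 MHz processor.
    "linux_boot_2": 2592008,          # Refined calibration after a reboot
}


def spread_khz(values):
    """Difference between the largest and smallest calibrated value."""
    return max(values) - min(values)


def within_tolerance(a_khz, b_khz, tol_khz):
    """Hypothetical check: treat two frequencies as equal within tol_khz."""
    return abs(a_khz - b_khz) <= tol_khz


# Run-to-run spread on this one machine.
spread = spread_khz(CALIBRATED_KHZ.values())

# Distance of each calibrated value from the nominal frequency.
offsets = {name: NOMINAL_KHZ - khz for name, khz in CALIBRATED_KHZ.items()}

print("spread between calibrations:", spread, "kHz")
for name, off in offsets.items():
    print(f"{name}: {off} kHz below nominal")
```

The calibrated values agree with each other to within about a hundred
kHz, yet all of them sit several MHz below the nominal value. Neither
observation says which of the two is closer to the real frequency.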

>  For one, all calibration options will read slow by the amount
> of time it takes an interrupt to propagate through the system fabric,
> and there is basically nothing software can do to account for this.

Well, if you measure, under otherwise identical conditions, the
arrival of two instances of the same signal, then the time the signal
takes to travel through the fabric doesn't matter for the interval
between the two arrivals. But of course this is as idealized as the
assumption that humans are able to manufacture clocks ticking at
exactly their nominal frequency.
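
The cancellation argument can be sketched with a toy model in Python.
Everything here is an assumption made for illustration: the function
name estimate_khz, the 50 ms reference period, and the latency values
are invented, and the model is not how Xen or Linux actually calibrate.

```python
from fractions import Fraction


def estimate_khz(true_khz, period_ms, latency0_us, latency1_us):
    """Estimate the TSC frequency from two arrivals of a periodic signal.

    The reference signal fires at t = 0 and t = period_ms. Each arrival
    is delayed by its own fabric latency, and the TSC is read on arrival.
    Software knows only the nominal period, not the latencies.
    """
    ticks_per_us = Fraction(true_khz, 1000)   # kHz == ticks per ms
    arrival0_us = latency0_us
    arrival1_us = period_ms * 1000 + latency1_us
    tsc0 = ticks_per_us * arrival0_us
    tsc1 = ticks_per_us * arrival1_us
    return (tsc1 - tsc0) / period_ms          # ticks per ms == kHz


TRUE_KHZ = 2592000

# Identical latency on both arrivals: the delay cancels out exactly.
same = estimate_khz(TRUE_KHZ, 50, 5, 5)

# Latency differs by 2 us between the arrivals: the difference does not
# cancel and shows up as a frequency error.
skewed = estimate_khz(TRUE_KHZ, 50, 5, 7)

print("equal latency: ", float(same), "kHz")
print("2 us of skew:  ", float(skewed), "kHz")
```

In this model only the difference between the two latencies matters,
not their absolute size, which is the idealized case described above.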

