[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC handling



On 04.03.2020 19:40, Andrew Cooper wrote:
> On 04/03/2020 10:25, Jan Beulich wrote:
>> On 03.03.2020 19:24, Andrew Cooper wrote:
>>> ITSC being visible to the guest is currently implicit with the toolstack
>>> unconditionally asking for it, and Xen clipping it based on the vTSC and/or
>>> XEN_DOMCTL_disable_migrate settings.
>>>
>>> This is problematic for several reasons.
>>>
>>> First, the implicit vTSC behaviour manifests as a real bug on migration to a
>>> host with a different frequency, with ITSC but without TSC scaling
>>> capabilities, whereby the ITSC feature becomes advertised to the guest.  
>>> ITSC
>>> will disappear again if the guest migrates to server with the same frequency
>>> as the original, or to one with TSC scaling support.
>>>
>>> Secondly, disallowing ITSC unless the guest doesn't migrate is conceptually
>>> wrong.  It is common to have migration pools of identical hardware, at which
>>> point the TSC frequency is the same,
>> This statement is too broad: Pools of identical hardware may have the same
>> nominal frequencies, but two distinct systems are hardly ever going to have
>> the exact same actual (measured or even real) frequencies.
> 
> There is no such thing as truly invariant TSC.  Even with the best
> hardware in the world, the reference frequency will change based on
> physical properties of the surroundings, including things like ambient
> temperature.  i.e. even a single server, sitting in a datacenter is
> likely to see a fractional change in frequency across a 24h period.
> 
> What matters is the error margins, and how long until it manifests as a
> noticeable difference.
> 
>> Recall Olaf's vTSC-tolerance patch that still hasn't landed anywhere?
> 
> This is a different problem.  Even on the same system, errors in Xen's
> frequency calculations can differ by several hundred kHz (iirc), boot to
> boot, making it quite useless for answering the question "am I running
> at the frequency the guest saw before?", which is how we just whether to
> intercept TSC accesses or not.

But that's why I've said "too broad": Right now pools of identical
hardware will not look to us as if they all had the same freq.

> There are things which can be done about this, such as using frequency
> data provided by the CPU directly (rather than correlating it with a
> separate timesource).  At that point, the only difference between two
> identical systems will be the variability in the reference clock, and
> PLL circuitry which ultimately multiplies it up from 19.2/25/100 MHz to
> the 1-3.5GHz typically encountered for core frequencies.

Right. The question just is how large the error margin is from the
nominal frequency reported via CPUID leaves 15/16 and the actual
frequency. If it's no worse than the differences we observe from
our "measurement", then yes, we could and perhaps should use that
data if available.

>>> and more modern hardware has TSC scaling
>>> support anyway.  In both cases, it is safe to advertise ITSC and migrate the
>>> guest.
>>>
>>> Remove all implicit logic logic in Xen, and make ITSC part of the max CPUID
>>> policies for guests.  Plumb an itsc parameter into xc_cpuid_apply_policy() 
>>> and
>>> have libxl__cpuid_legacy() fill in the two cases where it can reasonably
>>> expect ITSC to be safe for the guest to see.
>>>
>>> This is a behaviour change for TSC_MODE_NATIVE, where the ITSC will now
>>> reliably not appear, and for the case where the user explicitly requests 
>>> ITSC,
>>> in which case it will appear even if the guest isn't marked as nomigrate.
>> How sensible is it to allow the user to request something like ITSC with
>> no respective support underneath?
> 
> Right now, Xen will ignore ITSC if the hardware isn't capable, just like
> any other missing feature flag.
> 
> When we get the policy auditing logic in better shape, I intend to
> reject requests which can't be fulfilled.

Okay, good to know. I wonder though how well we'll be able to
express in the eventual user visible error message which of
the settings was actually refused.

>> Shouldn't we translate such a request
>> into enabling vTSC if there's no ITSC on the platform?
> 
> No, because a) doing things implicitly like this is the root of far too
> many bugs, this patch included, and b) it probably isn't what the user
> wants.
> 
> The reason to play around with TSC settings will ultimately to be try
> and avoid intercepting RDTSC, because the performance hit from
> interception dominates most other factors.
> 
>> Actually looking
>> at the change to libxl__cpuid_legacy() I wonder whether you don't instead
>> mean "requests vTSC" here.
> 
> I don't see how you come to that conclusion.  It is two separate cases
> where the toolstack can reasonably expect the guest-observed frequency
> not to differ.

Looking at this hunk

@@ -432,7 +433,22 @@ void libxl__cpuid_legacy(libxl_ctx *ctx, uint32_t domid,
     if (info->type == LIBXL_DOMAIN_TYPE_HVM)
         pae = libxl_defbool_val(info->u.hvm.pae);
 
-    xc_cpuid_apply_policy(ctx->xch, domid, NULL, 0, pae);
+    /*
+     * Advertising Invariant TSC to a guest means that the TSC frequency won't
+     * change at any point in the future.
+     *
+     * We do not have enough information about potential migration
+     * destinations to know whether advertising ITSC is safe, but if the guest
+     * isn't going to migrate, then the current hardware is all that matters.
+     *
+     * Alternatively, an internal property of vTSC is that the values read are
+     * invariant.  Advertise ITSC when we know the domain will have emualted
+     * TSC everywhere it goes.
+     */
+    itsc = (libxl_defbool_val(info->disable_migrate) ||
+            info->tsc_mode == LIBXL_TSC_MODE_ALWAYS_EMULATE);
+
+    xc_cpuid_apply_policy(ctx->xch, domid, NULL, 0, pae, itsc);

I see the check of ->tsc_mode, which aiui is a request to enable
vTSC unconditionally. This plus "no-migrate" get translated to
enabling of ITSC.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.