
Re: [Xen-devel] CPUID improvements (phase 2) Design Doc



On 08/11/16 16:32, Jan Beulich wrote:
>>>> On 08.11.16 at 16:35, <andrew.cooper3@xxxxxxxxxx> wrote:
>> Please find inline the design doc for further CPUID improvements, planned for
>> development during the 4.9 timeframe.
> Looks good, just a couple of minor remarks.
>
>> ## Changes in hypercall behaviour
>>
>> During domain construction, some pieces of information critical to the
>> determination of the domain's maximum acceptable CPUID policy are available
>> right from the very start (most notably, the HVM and HAP flags from
>> `XEN_DOMCTL_createdomain`).
>>
>> However, some other parameters are not available at convenient points.
>>
>> 1.  The disable flag from `XEN_DOMCTL_disable_migrate` is used to set
>>     `d->disable_migrate`, whose only purpose is to avoid the unconditional
>>     clobbering of the Invariant TSC flag.  This flag cannot even be queried
>>     by the toolstack once set.
>>
>>     There are other facilities which should be restricted based on whether a
>>     VM might migrate or not.  (e.g. The use of LBR, whose record format is
>>     hardware specific.)
> Not really - the LBR format only limits the set of hosts the VM can
> migrate to. I.e. this is just like a CPUID flag which needs to be set
> on the target host in order for the VM to be permitted to migrate
> there.

It is more complicated than that.  The LBR format also depends on
whether TSX is enabled or not, which on Haswell-WS CPUs depends on
whether hyperthreading is enabled.

ITSC itself is complicated.  If the toolstack can guarantee it only
migrates to hosts which support full TSC scaling, ITSC is still safe to
expose to the guest.

For situations like this, Xen should default safe (i.e. disable those
features), but still permit the toolstack to enable them on migrateable
VMs.  By explicitly opting in to enabling these unsafe features for
migrateable VMs, the toolstack takes on the added responsibility of
ensuring destination compatibility.

From an implementation point of view, this would work exactly like
choosing to use experimental features.  All we do in Xen is audit
against the maximum allowable featureset for a domain, not the default.

>
>> 2.  The use of `XEN_DOMCTL_set_address_size` switches a PV guest between
>>     native (64bit) and compat (32bit) mode.  The current behaviour for 32bit
>>     PV guests is to hide all 64bit-only features.
>>
>>     Using this hypercall once to switch from native to compat is fairly easy
>>     to cope with, feature wise, but using the hypercall a second time to
>>     switch back causes the same ordering problems with respect to
>>     `XEN_DOMCTL_set_address_size`.
>>
>>     The preferred option here is to avoid hiding the 64bit-only features.
>>     This is more architecturally correct, as a 32bit kernel running on
>>     64bit-capable hardware will see 64bit-only features.
> But the upside of hiding them is that the guest won't even try to play
> any long / 64-bit mode games (which wouldn't work anyway).

A PV guest already understands that it is running under Xen and cannot
change its mode.  As such, not hiding the long mode features won't
affect the guest (all the more so because we were never hiding them
consistently before).

>
>>  Other options would be to modify
>>     the API to make `XEN_DOMCTL_set_address_size` a single-shot hypercall
>>     (which, given the existing restrictions, shouldn't impact any usecase),
> There must have been a reason why we had made it bi-directional,
> but I don't recall what it was. As long as no existing functionality is
> impacted, I think making this single-shot would be fine.

I am still hesitant about this route, because it is far harder to be
confident that it is a safe change to make than to revert to
architectural feature behaviour.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

