Re: [Xen-devel] [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels

On 23/05/12 08:34, Jan Beulich wrote:
>>>> On 22.05.12 at 18:07, Andre Przywara <andre.przywara@xxxxxxx> wrote:
>> While testing some APERF/MPERF semantics, I discovered that this feature
>> is enabled in Xen Dom0 but is not reliable.
>> The Linux kernel's scheduler uses this feature if it sees the CPUID bit,
>> leading to costly RDMSR traps (a few hundred thousand during a kernel
>> compile) and bogus values due to VCPU migration during the measurement.
>> The attached patch explicitly disables this CPU capability inside the
>> Linux kernel; with the patch applied, I could no longer measure any
>> APERF/MPERF reads.
>> I am not sure if the PVOPS code is the right place to fix this; we could
>> just as well do it in the HV's xen/arch/x86/traps.c:pv_cpuid().
>> Also, when the Dom0 VCPUs are pinned, we could allow this, but I am not
>> sure if it's worth doing so.
>> Awaiting your comments.
> First of all, I'm of the opinion that this indeed should not be
> masked in the hypervisor: there's no reason to prevent the
> guest from reading these registers (but we should of course deny
> writes as long as Xen is controlling P-states, which we do).

I am sorry, but I am going to have to disagree with you on this point.

We should not be advertising this feature to any guest at all if we
can't provide an implementation which behaves as it would on native
hardware.  Otherwise we are failing at our job of virtualisation.

There is the 'dom0_vcpus_pin' command line option[1], which identity-pins
dom0's vcpus, prevents updates to their affinity masks, and appears to
conditionally allow access to certain MSRs.  I think it would be fine to
expose this feature iff dom0's vcpus are pinned in this fashion.  That
way, the measurement should succeed even if dom0 only has read access to
the MSRs.
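A minimal sketch of that policy, for illustration only: it is not the actual pv_cpuid() code in Xen, and the helper name guest_cpuid6_ecx() and its vcpus_pinned parameter are hypothetical.  It relies on the documented fact that CPUID leaf 6, ECX bit 0 advertises the APERF/MPERF MSR pair (the bit Linux reports as X86_FEATURE_APERFMPERF), and assumes the caller already knows whether the domain's vcpus are identity-pinned.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:ECX bit 0 advertises the APERF/MPERF MSR pair. */
#define CPUID6_ECX_APERFMPERF (UINT32_C(1) << 0)

/*
 * Hypothetical policy helper: given the host's CPUID leaf 6 ECX value,
 * return the value a PV guest should see.  The APERF/MPERF bit is passed
 * through only when the guest's vcpus are identity-pinned (e.g. dom0
 * booted with dom0_vcpus_pin); otherwise it is cleared, so the guest
 * never starts a measurement that a vcpu migration could corrupt.
 * All other bits are left untouched.
 */
static uint32_t guest_cpuid6_ecx(uint32_t host_ecx, bool vcpus_pinned)
{
    if (!vcpus_pinned)
        host_ecx &= ~CPUID6_ECX_APERFMPERF;
    return host_ecx;
}
```

For example, with a host value of 0x9 (bits 0 and 3 set), an unpinned guest would see 0x8 while a pinned guest would see 0x9 unchanged.  Masking at CPUID level, rather than trapping the RDMSRs, means an unpinned guest never issues the reads in the first place.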

The same logic applies to domU guests, although I can currently see no
way to get domU domains into a state where advertising it would be safe.


[1] I am, however, about to submit a patch (for inclusion after the
feature freeze) which adds more semantics to this command line option.
There is currently no way to ask Xen to pin dom0's vcpus at domain
creation time but allow their affinity masks to be updated later.  The
XenServer performance team has been experimenting and has found
performance benefits from being able to do this.

> Next I'd like to note that in our kernels we simply don't build
> arch/x86/kernel/cpu/sched.o. Together with CPU_FREQ being
> suppressed, there's no consumer of the feature flag in our
> kernels.
> So I would think that your suggested change is appropriate,
> but I'm adding Konrad to Cc as these days he's the one to pick
> this up.
> Jan
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
