Re: [Xen-devel] [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
On Wed, May 23, 2012 at 12:44:07AM +0200, Andre Przywara wrote:
> On 05/22/2012 11:00 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
> >>On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
> >>>>Hi,
> >>>>
> >>>>while testing some APERF/MPERF semantics I discovered that this
> >>>>feature is enabled in Xen Dom0, but is not reliable.
> >>>>The Linux kernel's scheduler uses this feature if it sees the CPUID
> >>>>bit, leading to costly RDMSR traps (a few 100,000s during a kernel
> >>>>compile) and bogus values due to VCPU migration during the
> >>>
> >>>Can you point me to the Linux scheduler code that does this? Thanks.
> >>
> >>arch/x86/kernel/cpu/sched.c contains code to read out and compute
> >>APERF/MPERF registers. I added a Xen debug-key to dump a usage
> >>counter added in traps.c and thus could prove that it is actually
> >>the kernel that accesses these registers.
> >>As far as I understood this the idea is to learn about boosting and
> >>down-clocking (P-states) to get a fairer view on the actual
> >>computing time a process consumed.
> >
> >Looks like it's looking for this:
> >
> >X86_FEATURE_APERFMPERF
> >
> >Perhaps masking that should do it? Something along this in enlighten.c:
> >
> >	cpuid_leaf1_edx_mask =
> >		~((1<< X86_FEATURE_MCE) |  /* disable MCE */
> >		  (1<< X86_FEATURE_MCA) |  /* disable MCA */
> >		  (1<< X86_FEATURE_MTRR) | /* disable MTRR */
> >		  (1<< X86_FEATURE_ACC));  /* thermal monitoring */
> >
> >would be more appropriate?
> >
> >Or is that attribute on a different leaf?
>
> Right, it is bit 0 on level 6. That's why I couldn't use any of the
> predefined masks and I didn't feel like inventing a new one just for
> this single bit.
> We could as well explicitly use clear_cpu_cap somewhere, but I
> didn't find any code place in the Xen tree already doing this,
> instead it looks like it belongs to where I put it (we handle leaf 5
> in a special way already here)

OK, can you resend the patch please, looking similar to what you sent
earlier, but do use a #define if possible (you can have the #define in
that file) and a comment explaining why this is necessary - and point
to the Linux source code that uses this.

Thanks!

.. snip..
> >>>>P.S. Of course this doesn't fix pure userland software like
> >>>>cpupower, but I would consider this in the user's responsibility to
> >>>
> >>>Which would not work anymore as the cpufreq support is disabled
> >>>when it boots under Xen.
> >>
> >>Do you mean with "anymore" in a future kernel? I tested this on
> >>3.4.0 and cpupower monitor worked fine. Right, cpufreq is not
> >>enabled, but cpupower uses the /dev/cpu/<n>/msr device file to
> >>directly read the MSRs. So I get this output if run on an idle Dom0:
> >
> >Ahh. Neat. Will have to play with that.
>
> Bad news is we cannot forbid cpupower querying the feature directly
> using the CPUID instruction in PV guests. Only we could patch it to
> use /proc/cpuinfo readout instead, as this reflects the kernel view
> of available features. With my patch aperfmperf is no longer there.

Looks like a patch to cpupower should be cooked up too?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
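
A minimal sketch of the /proc/cpuinfo approach mentioned at the end of the thread: instead of executing CPUID directly, a userland tool could look for the aperfmperf flag in the kernel's view of the CPU features. The helper names cpuinfo_text_has_flag and kernel_reports_cpu_flag are hypothetical and are not taken from cpupower. The only assumption about the file format is that feature names appear whitespace-separated on lines beginning with "flags", as on x86 Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole word on a "flags" line of the
 * given /proc/cpuinfo-style text, 0 otherwise.
 * Hypothetical helper for illustration; not part of cpupower.
 */
int cpuinfo_text_has_flag(const char *text, const char *flag)
{
    assert(text != NULL && flag != NULL);

    size_t flag_len = strlen(flag);
    if (flag_len == 0)
        return 0;

    const char *line = text;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);
        const char *end = line + line_len;

        /* Only lines whose key starts with "flags" list CPU features. */
        if (line_len >= 5 && strncmp(line, "flags", 5) == 0) {
            const char *colon = memchr(line, ':', line_len);
            const char *p = colon ? colon + 1 : end;

            /* Walk the whitespace-separated tokens after the colon. */
            while (p < end) {
                while (p < end && (*p == ' ' || *p == '\t'))
                    p++;
                const char *tok = p;
                while (p < end && *p != ' ' && *p != '\t')
                    p++;
                if ((size_t)(p - tok) == flag_len &&
                    memcmp(tok, flag, flag_len) == 0)
                    return 1;
            }
        }
        line = eol ? eol + 1 : end;
    }
    return 0;
}

/*
 * Read /proc/cpuinfo (Linux only) line by line and test for `flag`.
 * Returns 1 if found, 0 if not, -1 if the file cannot be opened.
 * Assumes a single "flags" line fits into the 8 KiB buffer.
 */
int kernel_reports_cpu_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char buf[8192];
    int found = 0;
    while (!found && fgets(buf, sizeof buf, f) != NULL)
        found = cpuinfo_text_has_flag(buf, flag);

    fclose(f);
    return found;
}
```

A tool would call kernel_reports_cpu_flag("aperfmperf") and skip the APERF/MPERF MSR reads unless it returns 1. Matching whole tokens rather than substrings avoids false positives from flags that merely contain the name.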