Re: [Xen-devel] Workings/effectiveness of the xen-acpi-processor driver
On Tue, May 01, 2012 at 06:35:45PM -0400, Boris Ostrovsky wrote:
> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:
> >On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
> >>On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
> >>>>Since there have been requests about that driver to get backported into 3.2, I
> >>>>was interested to find out what or how much would be gained by that.
> >>>>
> >>>>The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which
> >>>>was not very successful as the drivers bail out of the init function because the
> >>>>first call to acpi_processor_register_performance() returns -ENODEV. There is
> >>>>some frequency scaling when running without Xen, so I need to do some more
> >>>>debugging there.
>
> I believe this is caused by the somewhat under-enlightened xen_apic_read():
>
> static u32 xen_apic_read(u32 reg)
> {
>         return 0;
> }
>
> This results in some data, most importantly
> boot_cpu_physical_apicid, not being set correctly and, in turn,
> causes x86_cpu_to_apicid to be broken.

What is the involvement of x86_cpu_to_apicid to acpi_processor_register_performance?
Or is this more of a stab in the dark?

Stefan, one way to debug this is to make the driver be a module and then
configure the /sys/../acpi/debug_level and debug_layer to be 0xffffffff and try
loading the module. It should print out tons of data (and the reason it
returned -Exxx).

>
> On larger AMD systems boot processor is typically APICID=0x20 (I
> don't have Intel system handy to see how it looks there).
>
> As a quick and dirty test you can try:
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index edc2448..1f78998 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address)
>         }
>         if (boot_cpu_physical_apicid == -1U) {
>                 boot_cpu_physical_apicid  = read_apic_id();
> +               boot_cpu_physical_apicid = 32;
>                 apic_version[boot_cpu_physical_apicid] =
>                          GET_APIC_VERSION(apic_read(APIC_LVR));
>         }
>
>
> (Set it to whatever APICID on core0 is, I suspect it won't be zero).
>
> -boris
>
> >>>
> >>>Did you back-port the other components - the ones that turn off the native
> >>>frequency scaling?
> >>>
> >>>      provide disable_cpufreq() function to disable the API.
> >>>      xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
> >>>      xen/cpufreq: Disable the cpu frequency scaling drivers from loading
> >>>>
> >>
> >>Yes, here is the full set for reference:
> >>
> >>* xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
> >>* xen/acpi: Remove the WARN's as they just create noise.
> >>* xen/acpi: Fix Kconfig dependency on CPU_FREQ
> >>* xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
> >>* xen/acpi-processor: C and P-state driver that uploads said data to hyper
> >>* provide disable_cpufreq() function to disable the API.
> >
> >And (Linus just pulled it), you also need this one:
> > df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable
> > MWAIT_LEAF so that acpi-pad won't be loaded.)
> >
> >>
> >>>>The second system was an Intel one (4 core i7 920@xxxxxxx) which was
> >>>>successfully loading the driver. Via xenpm I can see the various frequencies and
> >>>>also see them being changed.
> >>>>However the cpuidle data out of xenpm looks a bit odd:
> >>>>
> >>>>#> xenpm get-cpuidle-states 0
> >>>>Max C-state: C7
> >>>>
> >>>>cpu id               : 0
> >>>>total C-states       : 2
> >>>>idle time(ms)        : 10819311
> >>>>C0                   : transition [00000000000000000001]
> >>>>                       residency  [00000000000000005398 ms]
> >>>>C1                   : transition [00000000000000000001]
> >>>>                       residency  [00000000000010819311 ms]
> >>>>pc3                  : [00000000000000000000 ms]
> >>>>pc6                  : [00000000000000000000 ms]
> >>>>pc7                  : [00000000000000000000 ms]
> >>>>cc3                  : [00000000000000000000 ms]
> >>>>cc6                  : [00000000000000000000 ms]
> >>>>
> >>>>Also gathering samples over 30s does look like only C0 and C1 are used. This
> >>>
> >>>Yes.
> >>>>might be because C1E support is enabled in BIOS but when looking at the
> >>>>intel_idle data in sysfs when running without a hypervisor will show C3 and C6
> >>>>for the cores. That could have been just a wrong output, so I plugged in a power
> >>>>meter and compared a kernel running natively and running as dom0 (with and
> >>>>without the acpi-processor driver).
> >>>>
> >>>>Native: 175W
> >>>>dom0:   183W (with only marginal difference between with or without the
> >>>>        processor driver)
> >>>>[yes, the system has a somewhat high base consumption which I attribute to a
> >>>>ridiculously dimensioned graphics subsystem to be running a text console]
> >>>>
> >>>>This I would take as C3 and C6 really not being used and the frequency scaling
> >
> >So the other thing I forgot to note is that C3->C6 have a detrimental
> >effect on some Intel boxes with Xen. We haven't figured out exactly which ones
> >and the bug is definitely in the hypervisor. The bug is that when the CPU goes in
> >those states the NIC ends up being unresponsive. It's like the interrupts stopped
> >being ACKed. If I run 'xenpm set-max-cstate 2' the issue disappears.
> >
> >>>
> >>>To go in deeper modes there is also a need to backport a Xen unstable
> >>>hypercall which will allow the kernel to detect the other states besides
> >>>C0-C2.
> >>>
> >>>"XEN_SET_PDC query was implemented in c/s 23783:
> >>>    "ACPI: add _PDC input override mechanism".
> >>>
> >>
> >>I see. There is a kernel patch about enabling MWAIT that refers to that...
> >
> >Were there any special things you ran when checking the output? Just plugging
> >and looking at the results?
> >>
> >>>
> >>>>having no impact on the idle system is not that much surprising. But if that was
> >>>>true it would also limit the usefulness of the turbo mode which I understand
> >>>>would also be limited by the c-state of the other cores.
> >>>
> >>>Hm, I should double-check that - but somehow I thought that Xen independently
> >>>checks for TurboMode and if the P-states are in, then they are activated.
> >
> >I did a bit of checking around and it does seem that is the case. From what
> >I have gathered the TurboMode kicks in when the CPU is C0 mode (which should
> >be obvious), and when the other cores are in anything but C0 mode. And sure
> >enough that seems to be the case. But I can't get the concrete details whether
> >the "but C0 mode" means that TurboMode will work better if the C mode is legacy
> >C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from
> >Len Brown more details..
> >>>
> >>Turbo mode should be enabled. I had been only looking at a generic overview
> >>about it on Intel site which sounded like it would make more of a difference on
> >>how much one core could get overclocked related to how many cores are active
> >>(and I translated active or not into deeper c-states or not).
> >>Looking at the verbose output of turbostat it seems not to make that much
> >>difference whether 2-4 cores are running. A single core alone could get one more
> >>increment in clock stepping. That does not immediately sound a lot.
> >>And of course how much or long the higher clock is used depends on other factors as
> >>well and is not under OS control.
> >>
> >>In the end it is probably quite dynamic and hard to come up with hard facts to
> >>prove its value. Though if I can lower the idle power usage by reaching a bit
> >>further, that would greatly help to justify the effort and potential risk of
> >>backporting...
> >
> >I understand. I wish I could give you the exact percentage points by which
> >the power usage will drop. But I think the more substantial benefit of
> >these patches is performance gains. The ones that Ian Campbell ran and were
> >posted on Phoronix site paint that they are beneficial.
> >
> >>
> >>>>
> >>>>Do I misread the data I see? Or maybe it's a known limitation? In case it is
> >>>>worth doing more research I'll gladly try things and gather more data.
> >>>
> >>>Just missing some patches.
> >>>
> >>>Oh, and this one:
> >>>  xen/acpi: Fix Kconfig dependency on CPU_FREQ
> >>>
> >>>Hmm.. I think a patch disappeared somewhere.
> >
> >That was the one I referenced at the beginning of this email.
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@xxxxxxxxxxxxx
> >http://lists.xen.org/xen-devel
>
>
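
Editorial sketch of Boris's point about xen_apic_read(): if every APIC register read returns 0, the APIC ID decoded from it is 0 too, whatever the hardware ID is. This is a user-space illustration and not kernel code. The helper names (stub_apic_read, native_apic_read, apic_id_from_reg, boot_apic_id) are hypothetical. The register offset 0x20 and the xAPIC layout (ID in bits 31:24) are assumptions taken from the usual xAPIC definitions, and 0x20 as the boot CPU's APIC ID is only the example value mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed offset of the APIC ID register in the xAPIC register space. */
#define APIC_ID_REG 0x20u

typedef uint32_t (*apic_read_fn)(uint32_t reg);

/* Hypothetical stand-in for the stubbed PV read: every register reads as 0. */
static uint32_t stub_apic_read(uint32_t reg)
{
    (void)reg;
    return 0;
}

/*
 * Hypothetical stand-in for a native read on a machine whose boot CPU
 * has APIC ID 0x20 (the example value from the thread).
 */
static uint32_t native_apic_read(uint32_t reg)
{
    return reg == APIC_ID_REG ? (0x20u << 24) : 0;
}

/* Assumed xAPIC layout: the ID sits in bits 31:24 of the register value. */
static uint32_t apic_id_from_reg(uint32_t value)
{
    return (value >> 24) & 0xFFu;
}

/* Decode the boot CPU's APIC ID using whichever read routine is supplied. */
static uint32_t boot_apic_id(apic_read_fn read)
{
    assert(read != NULL);
    return apic_id_from_reg(read(APIC_ID_REG));
}
```

Calling boot_apic_id(stub_apic_read) yields 0, while boot_apic_id(native_apic_read) yields 0x20. Under these assumptions, anything keyed on the boot CPU's APIC ID would see the wrong value with the stubbed read. Whether that is what makes acpi_processor_register_performance() return -ENODEV is the open question raised above.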