Re: [Xen-devel] Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 00:35, Boris Ostrovsky wrote:
> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:
>> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
>>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
>>>>> Since there have been requests to get that driver backported into 3.2,
>>>>> I was interested to find out what or how much would be gained by that.
>>>>>
>>>>> The first system I tried was an AMD based one (8 core Opteron
>>>>> 6128@2GHz), which was not very successful as the driver bails out of
>>>>> the init function because the first call to
>>>>> acpi_processor_register_performance() returns -ENODEV. There is some
>>>>> frequency scaling when running without Xen, so I need to do some more
>>>>> debugging there.
>
> I believe this is caused by the somewhat under-enlightened xen_apic_read():
>
> static u32 xen_apic_read(u32 reg)
> {
>         return 0;
> }
>
> This results in some data, most importantly boot_cpu_physical_apicid, not
> being set correctly and, in turn, causes x86_cpu_to_apicid to be broken.

Ah, OK. I will check what my box says, try the change below, and gather more
data as suggested in the follow-ups (including turning on the ACPI debugging
and the debugging in the xen acpi processor driver). The latter I had already
done, but that would only print "max acpi id: 16" (or so) before the failure.
No wonder, with the ACPI debugging missing.

>
> On larger AMD systems the boot processor is typically APICID=0x20 (I don't
> have an Intel system handy to see how it looks there).
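To make the failure mode described above concrete, here is a minimal
user-space sketch. It is not kernel code: hw_apic_read(), stub_apic_read()
and apic_id_from() are hypothetical names made up for illustration, and the
"hardware" value is simply assumed to be the APICID=0x20 mentioned above. The
only facts it relies on are that the xAPIC ID register sits at offset 0x20 and
keeps the ID in bits 31:24.

```c
#include <stdint.h>

/* Offset of the local APIC ID register (not to be confused with the
 * APIC ID value 0x20 assumed below). */
#define APIC_ID_REG 0x20

/* Hypothetical stand-in for real hardware on a box whose boot CPU has
 * APIC ID 0x20. This value is an assumption for the example only. */
static uint32_t hw_apic_read(uint32_t reg)
{
    return reg == APIC_ID_REG ? (0x20u << 24) : 0;
}

/* Mirrors the stub quoted above: every register reads back as 0. */
static uint32_t stub_apic_read(uint32_t reg)
{
    (void)reg;
    return 0;
}

/* xAPIC keeps the ID in bits 31:24 of the ID register. */
static unsigned int apic_id_from(uint32_t (*read_fn)(uint32_t))
{
    return (read_fn(APIC_ID_REG) >> 24) & 0xff;
}
```

With the stub, apic_id_from() yields 0 for the boot CPU, while the assumed
hardware value decodes to 0x20. Anything later keyed on the real APIC ID
would then not find the boot CPU, which would fit the -ENODEV seen on the
AMD box.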
>
> As a quick and dirty test you can try:
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index edc2448..1f78998 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address)
>          }
>          if (boot_cpu_physical_apicid == -1U) {
>                  boot_cpu_physical_apicid = read_apic_id();
> +                boot_cpu_physical_apicid = 32;
>                  apic_version[boot_cpu_physical_apicid] =
>                           GET_APIC_VERSION(apic_read(APIC_LVR));
>          }
>
> (Set it to whatever APICID on core0 is, I suspect it won't be zero).
>
> -boris
>
>>>>
>>>> Did you back-port the other components - the ones that turn off the
>>>> native frequency scaling?
>>>>
>>>>   provide disable_cpufreq() function to disable the API.
>>>>   xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>>>   xen/cpufreq: Disable the cpu frequency scaling drivers from loading
>>>
>>> Yes, here is the full set for reference:
>>>
>>> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
>>> * xen/acpi: Remove the WARN's as they just create noise.
>>> * xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>> * xen/acpi-processor: C and P-state driver that uploads said data to hyper
>>> * provide disable_cpufreq() function to disable the API.
>>
>> And (Linus just pulled it), you also need this one:
>> df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF
>> so that acpi-pad won't be loaded.)
>>
>>>>> The second system was an Intel one (4 core i7 920@xxxxxxx) which was
>>>>> successfully loading the driver. Via xenpm I can see the various
>>>>> frequencies and also see them being changed.
>>>>> However the cpuidle data out of xenpm looks a bit odd:
>>>>>
>>>>> #> xenpm get-cpuidle-states 0
>>>>> Max C-state: C7
>>>>>
>>>>> cpu id               : 0
>>>>> total C-states       : 2
>>>>> idle time(ms)        : 10819311
>>>>> C0                   : transition [00000000000000000001]
>>>>>                        residency  [00000000000000005398 ms]
>>>>> C1                   : transition [00000000000000000001]
>>>>>                        residency  [00000000000010819311 ms]
>>>>> pc3                  : [00000000000000000000 ms]
>>>>> pc6                  : [00000000000000000000 ms]
>>>>> pc7                  : [00000000000000000000 ms]
>>>>> cc3                  : [00000000000000000000 ms]
>>>>> cc6                  : [00000000000000000000 ms]
>>>>>
>>>>> Also gathering samples over 30s does look like only C0 and C1 are used.
>>>>
>>>> Yes.
>>>>
>>>>> This might be because C1E support is enabled in BIOS, but looking at
>>>>> the intel_idle data in sysfs when running without a hypervisor will
>>>>> show C3 and C6 for the cores. That could have been just a wrong output,
>>>>> so I plugged in a power meter and compared a kernel running natively
>>>>> and running as dom0 (with and without the acpi-processor driver).
>>>>>
>>>>> Native: 175W
>>>>> dom0:   183W (with only marginal difference between with or without
>>>>>         the processor driver)
>>>>> [yes, the system has a somewhat high base consumption which I attribute
>>>>> to a ridiculously dimensioned graphics subsystem to be running a text
>>>>> console]
>>>>>
>>>>> This I would take as C3 and C6 really not being used and the frequency
>>>>> scaling
>>
>> So the other thing I forgot to note is that C3->C6 have a detrimental
>> effect on some Intel boxes with Xen. We haven't figured out exactly which
>> ones and the bug is definitely in the hypervisor. The bug is that when the
>> CPU goes in those states the NIC ends up being unresponsive. It's like the
>> interrupts stopped being ACKed. If I run 'xenpm set-max-cstate 2' the issue
>> disappears.
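To put the xenpm residencies and the power meter readings quoted above into
proportion, here is a small sketch of the arithmetic. The helper names
residency_share() and overhead_percent() are made up for this example and are
not part of xenpm or any Xen tool; the inputs are just the numbers from the
output above.

```c
/* Fraction of the sampled time spent in one C-state.
 * state_ms: residency of that state, total_ms: sum over all states.
 * Returns 0.0 if there is no sampled time at all. */
static double residency_share(unsigned long long state_ms,
                              unsigned long long total_ms)
{
    return total_ms ? (double)state_ms / (double)total_ms : 0.0;
}

/* Extra power draw of a measurement relative to a baseline, in percent. */
static double overhead_percent(double baseline_w, double measured_w)
{
    return (measured_w - baseline_w) / baseline_w * 100.0;
}
```

Feeding in the C1 residency (10819311 ms) against the C0 + C1 total gives a
C1 share of roughly 99.95%, and overhead_percent(175.0, 183.0) comes out at
about 4.6%. So the box is nearly always idle, yet never deeper than C1, and
dom0 still costs a few percent more than native at idle.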
>>
>>>>
>>>> To go in deeper modes there is also a need to backport a Xen unstable
>>>> hypercall which will allow the kernel to detect the other states besides
>>>> C0-C2.
>>>>
>>>> "XEN_SET_PDC query was implemented in c/s 23783:
>>>> "ACPI: add _PDC input override mechanism".
>>>>
>>>
>>> I see. There is a kernel patch about enabling MWAIT that refers to that...
>>
>> Were there any special things you ran when checking the output? Just
>> plugging and looking at the results?
>>>
>>>>
>>>>> having no impact on the idle system is not that much surprising. But if
>>>>> that was true it would also limit the usefulness of the turbo mode,
>>>>> which I understand would also be limited by the c-state of the other
>>>>> cores.
>>>>
>>>> Hm, I should double-check that - but somehow I thought that Xen
>>>> independently checks for TurboMode and if the P-states are in, then they
>>>> are activated.
>>
>> I did a bit of checking around and it does seem that is the case. From what
>> I have gathered the TurboMode kicks in when the CPU is in C0 mode (which
>> should be obvious), and when the other cores are in anything but C0 mode.
>> And sure enough that seems to be the case. But I can't get the concrete
>> details whether the "but C0 mode" means that TurboMode will work better if
>> the C mode is legacy C1, C2, C3 or the CPU C-states (so MWAIT enabled).
>> Trying to find out more details from Len Brown..
>>>>
>>> Turbo mode should be enabled. I had been only looking at a generic overview
>>> about it on the Intel site which sounded like it would make more of a
>>> difference on how much one core could get overclocked relative to how many
>>> cores are active (and I translated active or not into deeper c-states or
>>> not).
>>> Looking at the verbose output of turbostat it seems not to make that much
>>> difference whether 2-4 cores are running. A single core alone could get
>>> one more increment in clock stepping. That does not immediately sound a
>>> lot.
>>> And of course how much or how long the higher clock is used depends on
>>> other factors as well and is not under OS control.
>>>
>>> In the end it is probably quite dynamic and hard to come up with hard
>>> facts to prove its value. Though if I can lower the idle power usage by
>>> reaching a bit further, that would greatly help to justify the effort and
>>> potential risk of backporting...
>>
>> I understand. I wish I could give you the exact percentage points by which
>> the power usage will drop. But I think the more substantial benefit of
>> these patches is performance gains. The benchmarks that Ian Campbell ran,
>> which were posted on the Phoronix site, suggest that they are beneficial.
>>
>>>
>>>>>
>>>>> Do I misread the data I see? Or maybe it's a known limitation? In case
>>>>> it is worth doing more research I'll gladly try things and gather more
>>>>> data.
>>>>
>>>> Just missing some patches.
>>>>
>>>> Oh, and this one:
>>>> xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>>>
>>>> Hmm.. I think a patch disappeared somewhere.
>>
>> That was the one I referenced at the beginning of this email.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
>>

Attachment:
signature.asc