Re: [Xen-devel] Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 00:35, Boris Ostrovsky wrote:
> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:
>> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
>>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
>>>>> Since there have been requests to get that driver backported into 3.2,
>>>>> I was interested to find out what or how much would be gained by that.
>>>>>
>>>>> The first system I tried was an AMD based one (8 core Opteron
>>>>> 6128@2GHz). That was not very successful, as the driver bails out of
>>>>> the init function because the first call to
>>>>> acpi_processor_register_performance() returns -ENODEV. There is some
>>>>> frequency scaling when running without Xen, so I need to do some more
>>>>> debugging there.
>
> I believe this is caused by the somewhat under-enlightened xen_apic_read():
>
> static u32 xen_apic_read(u32 reg)
> {
> 	return 0;
> }
>
> This results in some data, most importantly boot_cpu_physical_apicid, not
> being set correctly and, in turn, causes x86_cpu_to_apicid to be broken.
Ah, ok. I will check what my box says, try the change below, and gather more data
as suggested in the follow-ups (including turning on the ACPI debugging and the
debugging in the xen-acpi-processor driver). The latter I had already done, but it
would only print "max acpi id: 16" (or so) before the failure. No wonder, with the
ACPI debugging missing.
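
To convince myself of the mechanism, here is a minimal user-space sketch (not
kernel code). It assumes the xAPIC layout, where the APIC ID sits in bits 31:24
of the ID register. The function names are made up for illustration, and 0x20
is simply the boot-core APICID mentioned for larger AMD systems in this thread.

```c
#include <stdint.h>

/* xAPIC layout: the APIC ID occupies bits 31:24 of the ID register. */
static unsigned int decode_apic_id(uint32_t id_reg)
{
	return (id_reg >> 24) & 0xFFu;
}

/* Stand-in for the current xen_apic_read(): every register reads as 0. */
static uint32_t stub_apic_read(uint32_t reg)
{
	(void)reg;
	return 0;
}

/* Hypothetical bare-metal read on a box whose boot core has APICID 0x20. */
static uint32_t example_hw_apic_read(uint32_t reg)
{
	(void)reg;
	return (uint32_t)0x20 << 24;
}
```

With the stub, decode_apic_id() can only ever yield 0, so a boot core that
really is APICID 0x20 would be recorded under the wrong ID.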
>
> On larger AMD systems the boot processor is typically APICID=0x20 (I don't
> have an Intel system handy to see how it looks there).
>
> As a quick and dirty test you can try:
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index edc2448..1f78998 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address)
>  	}
>  	if (boot_cpu_physical_apicid == -1U) {
>  		boot_cpu_physical_apicid = read_apic_id();
> +		boot_cpu_physical_apicid = 32;
>  		apic_version[boot_cpu_physical_apicid] =
>  			 GET_APIC_VERSION(apic_read(APIC_LVR));
>  	}
>
>
> (Set it to whatever APICID on core0 is, I suspect it won't be zero).
>
> -boris
>
>
>>>>
>>>> Did you back-port the other components - the ones that turn off the native
>>>> frequency scaling?
>>>>
>>>> provide disable_cpufreq() function to disable the API.
>>>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading
>>>>>
>>>
>>> Yes, here is the full set for reference:
>>>
>>> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
>>> * xen/acpi: Remove the WARN's as they just create noise.
>>> * xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>> * xen/acpi-processor: C and P-state driver that uploads said data to hyper
>>> * provide disable_cpufreq() function to disable the API.
>>
>> And (Linus just pulled it), you also need this one:
>> df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF
>> so that acpi-pad won't be loaded.)
>>
>>>
>>>>> The second system was an Intel one (4 core i7 920@xxxxxxx) which was
>>>>> successfully loading the driver. Via xenpm I can see the various
>>>>> frequencies and also see them being changed. However the cpuidle data
>>>>> out of xenpm looks a bit odd:
>>>>>
>>>>> #> xenpm get-cpuidle-states 0
>>>>> Max C-state: C7
>>>>>
>>>>> cpu id : 0
>>>>> total C-states : 2
>>>>> idle time(ms) : 10819311
>>>>> C0 : transition [00000000000000000001]
>>>>> residency [00000000000000005398 ms]
>>>>> C1 : transition [00000000000000000001]
>>>>> residency [00000000000010819311 ms]
>>>>> pc3 : [00000000000000000000 ms]
>>>>> pc6 : [00000000000000000000 ms]
>>>>> pc7 : [00000000000000000000 ms]
>>>>> cc3 : [00000000000000000000 ms]
>>>>> cc6 : [00000000000000000000 ms]
>>>>>
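
For reference, the residency figures in that output can be turned into shares
of total time with a small helper. This is only a sketch; the function name is
made up, and the input is simply the residency values in milliseconds as xenpm
prints them.

```c
#include <stddef.h>
#include <stdint.h>

/* Share of the summed residency (in percent) spent in state `idx`. */
static double residency_percent(const uint64_t *residency_ms, size_t n,
				size_t idx)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += residency_ms[i];

	/* Avoid dividing by zero and reading past the array. */
	if (total == 0 || idx >= n)
		return 0.0;

	return 100.0 * (double)residency_ms[idx] / (double)total;
}
```

Fed with { 5398, 10819311 } for C0 and C1, this puts C1 at roughly 99.95% of
the accounted time, which matches the impression that the core is either
running or sitting in C1 and nothing deeper.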
>>>>> Also gathering samples over 30s does look like only C0 and C1 are used.
>>>>
>>>> Yes.
>>>>> This might be because C1E support is enabled in the BIOS, but the
>>>>> intel_idle data in sysfs, when running without a hypervisor, shows C3
>>>>> and C6 for the cores. That could have been just wrong output, so I
>>>>> plugged in a power meter and compared a kernel running natively and
>>>>> running as dom0 (with and without the acpi-processor driver).
>>>>>
>>>>> Native: 175W
>>>>> dom0: 183W (with only marginal difference between with or without the
>>>>> processor driver)
>>>>> [yes, the system has a somewhat high base consumption which I attribute
>>>>> to a ridiculously dimensioned graphics subsystem to be running a text
>>>>> console]
>>>>>
>>>>> This I would take as C3 and C6 really not being used and the frequency
>>>>> scaling
>>
>> So the other thing I forgot to note is that C3->C6 have a detrimental
>> effect on some Intel boxes with Xen. We haven't figured out exactly which
>> ones, and the bug is definitely in the hypervisor. The bug is that when the
>> CPU goes into those states the NIC ends up being unresponsive. It's like
>> the interrupts stopped being ACKed. If I run 'xenpm set-max-cstate 2' the
>> issue disappears.
>>
>>>>
>>>> To go in deeper modes there is also a need to backport a Xen unstable
>>>> hypercall which will allow the kernel to detect the other states besides
>>>> C0-C2.
>>>>
>>>> "XEN_SET_PDC query was implemented in c/s 23783:
>>>> "ACPI: add _PDC input override mechanism".
>>>>
>>>
>>> I see. There is a kernel patch about enabling MWAIT that refers to that...
>>
>> Were there any special things you ran when checking the output? Just plugging
>> and looking at the results?
>>>
>>>>
>>>>> having no impact on the idle system is not that much surprising. But if
>>>>> that was true it would also limit the usefulness of the turbo mode which
>>>>> I understand would also be limited by the c-state of the other cores.
>>>>
>>>> Hm, I should double-check that - but somehow I thought that Xen
>>>> independently checks for TurboMode and if the P-states are in, then they
>>>> are activated.
>>
>> I did a bit of checking around and it does seem that is the case. From what
>> I have gathered, TurboMode kicks in when the CPU is in C0 mode (which
>> should be obvious) and the other cores are in anything but C0 mode. And
>> sure enough that seems to be the case. But I can't get the concrete details
>> on whether "anything but C0 mode" means that TurboMode will work better if
>> the C mode is legacy C1, C2, C3 or the CPU C-states (so MWAIT enabled).
>> Trying to find out more details from Len Brown..
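
To make sure I read the condition the way you describe it, here it is as a
small predicate. This is only a sketch of the wording above, not of what the
hardware or Xen actually implements; the function name and the representation
of C-states as plain integers are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Literal reading of the description: a core may enter turbo when it is
 * itself in C0 and every other core is in some state other than C0.
 * cstate[i] holds the C-state index of core i (0 means C0, i.e. running).
 */
static bool turbo_possible(const int *cstate, size_t ncores, size_t self)
{
	if (self >= ncores || cstate[self] != 0)
		return false;

	for (size_t i = 0; i < ncores; i++) {
		if (i != self && cstate[i] == 0)
			return false;
	}
	return true;
}
```

Under this reading, a dom0 where the idle cores never leave C0/C1 still
qualifies as long as they reach C1; whether deeper states give turbo more
headroom is exactly the open question.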
>>>>
>>> Turbo mode should be enabled. I had been only looking at a generic overview
>>> about it on the Intel site, which sounded like how much one core could get
>>> overclocked depends on how many cores are active (and I translated
>>> active or not into deeper c-states or not).
>>> Looking at the verbose output of turbostat it seems not to make that much
>>> difference whether 2-4 cores are running. A single core alone could get
>>> one more increment in clock stepping. That does not immediately sound like
>>> a lot. And of course how much or how long the higher clock is used depends
>>> on other factors as well and is not under OS control.
>>>
>>> In the end it is probably quite dynamic and hard to come up with hard facts
>>> to prove its value. Though if I can lower the idle power usage by reaching
>>> a bit further, that would greatly help to justify the effort and potential
>>> risk of backporting...
>>
>> I understand. I wish I could give you the exact percentage points by which
>> the power usage will drop. But I think the more substantial benefit of
>> these patches is performance gains. The benchmarks that Ian Campbell ran,
>> which were posted on the Phoronix site, show that they are beneficial.
>>
>>>
>>>>>
>>>>> Do I misread the data I see? Or maybe it's a known limitation? In case
>>>>> it is worth doing more research I'll gladly try things and gather more
>>>>> data.
>>>>
>>>> Just missing some patches.
>>>>
>>>> Oh, and this one:
>>>> xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>>>
>>>> Hmm.. I think a patch disappeared somewhere.
>>
>> That was the one I referenced at the beginning of this email.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
>>
>
>
>