Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
On Mon, Apr 15, 2013 at 11:09 PM, Marek Marczykowski
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02.04.2013 03:13, Marek Marczykowski wrote:
>> On 01.04.2013 15:53, Ben Guthro wrote:
>>> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> (XEN) Restoring affinity for d2v3
>>>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>>>> sched_credit.c:481
>>>
>>> I think the "fix-suspend-scheduler-*" patches posted here are applicable
>>> here:
>>> http://markmail.org/message/llj3oyhgjzvw3t23
>>>
>>> Specifically, I think you need this bit:
>>>
>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>> index 630881e..e20868c 100644
>>> --- a/xen/common/cpu.c
>>> +++ b/xen/common/cpu.c
>>> @@ -5,6 +5,7 @@
>>>  #include <xen/init.h>
>>>  #include <xen/sched.h>
>>>  #include <xen/stop_machine.h>
>>> +#include <xen/sched-if.h>
>>>
>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>  #ifndef nr_cpumask_bits
>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>              BUG_ON(error == -EBUSY);
>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>          }
>>> +        if (system_state == SYS_STATE_resume)
>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>      }
>>>
>>>      cpumask_clear(&frozen_cpus);
>>>
>>
>> Indeed, this makes things better, but still not ideal.
>> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is much
>> more preferred than others (xl vcpu-list).
>> For example, if I start 4 busy loops in dom0, I get (even after some time):
>>
>> [user@dom0 ~]$ xl vcpu-list
>> Name          ID  VCPU   CPU State   Time(s) CPU Affinity
>> dom0           0     0     0   r--       98.5  any cpu
>> dom0           0     1     0   ---      181.3  any cpu
>> dom0           0     2     2   r--      262.4  any cpu
>> dom0           0     3     3   r--      230.8  any cpu
>> netvm          1     0     0   -b-       18.4  any cpu
>> netvm          1     1     0   -b-        9.1  any cpu
>> netvm          1     2     0   -b-        7.1  any cpu
>> netvm          1     3     0   -b-        5.4  any cpu
>> firewallvm     2     0     0   -b-       10.7  any cpu
>> firewallvm     2     1     0   -b-        3.0  any cpu
>> firewallvm     2     2     0   -b-        2.5  any cpu
>> firewallvm     2     3     3   -b-        3.6  any cpu
>>
>> If I remove some CPU from Pool-0 and re-add it, things go back to normal
>> for this particular CPU (so I get two equally used CPUs) - to fully restore
>> the system I must remove all CPUs but CPU0 from Pool-0 and add them again.
>>
>> Also, still only CPU0 has all C-states (C0-C3); all others have only C0-C1.
>> This probably could be fixed by your "xen: Re-upload processor PM data to
>> hypervisor after S3 resume" patch (reloading the xen-acpi-processor module
>> helps here). But I don't think that is the right way - it isn't necessary
>> on other systems (with somewhat older hardware). Something must be missing
>> on the resume path. The question is what...
>>
>> Perhaps someone needs to go through enable_nonboot_cpus() (__cpu_up?) and
>> check whether it restores everything disabled in disable_nonboot_cpus()
>> (__cpu_disable?). Unfortunately I don't know the x86 details well enough
>> to follow that code...
>
> To summarize the ACPI S3 issues:
>
> I. Fixed issues:
>
> 1. IRQ problem, fixed by the "x86: irq_move_cleanup_interrupt() must ignore
>    legacy vectors" commit
> 2. Assertion failure on resume when vcpu affinity is used, fixed by the
>    "x86/S3: Restore broken vcpu affinity on resume" commit
>
> II. Not (fully) fixed issues:
>
> 1. CPU Pool-0 contains only CPU0 after resume - the patch quoted above fixes
>    the issue, but it isn't applied to xen-unstable
> 2. After resume the scheduler chooses (almost) only CPU0 (listing quoted
>    above).
>    Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps
>    some timers are not restarted after resume?

Marek,

Please try the patch from this thread to see if it solves your two issues
above:
http://markmail.org/thread/35ecqimv7bwq3k6d

This patch was NAK'ed due to cpupool breakage... but in my testing, it solved
both of these problems. I don't know how to properly solve it in a
cpupool-compatible way... but I also haven't put much additional effort into
doing so.

> 3. ACPI C-states are only present for CPU0 (after resume, of course), fixed
>    by the "xen: Re-upload processor PM data to hypervisor after S3" patch by
>    Ben, but it isn't in upstream Linux (nor in Konrad's acpi-s3 branches).

I don't recall seeing any ACK / NAK from Konrad on this.

Original post:
https://patchwork.kernel.org/patch/2033981/

Konrad - do you have any thoughts about incorporating this into a future
merge window?

Ben

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
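[Editorial note: the two manual workarounds discussed in the thread (removing
and re-adding CPUs to Pool-0, and reloading xen-acpi-processor) can be
sketched as the shell dry run below. The 4-CPU host and the default "Pool-0"
pool name are illustrative assumptions; each command is printed rather than
executed so it can be reviewed before running it on a real Xen dom0.]

```shell
#!/bin/sh
# Dry-run sketch of the post-resume workarounds from this thread.
# Assumptions: 4 physical CPUs (0-3) and the default cpupool "Pool-0".

POOL=Pool-0
PLAN=""

plan() {                        # record and print one command (dry run)
    PLAN="${PLAN}$*
"
    echo "$*"
}

# 1. Remove and re-add every non-boot CPU, which Marek reports restores
#    normal scheduler behaviour on each CPU after S3 resume.
for cpu in 1 2 3; do
    plan xl cpupool-cpu-remove "$POOL" "$cpu"
    plan xl cpupool-cpu-add "$POOL" "$cpu"
done

# 2. Reload xen-acpi-processor so dom0 re-uploads ACPI power-management
#    data to the hypervisor, bringing back C-states on non-boot CPUs
#    (the step Ben's "Re-upload processor PM data" patch automates).
plan rmmod xen-acpi-processor
plan modprobe xen-acpi-processor
```

To actually run the commands on a dom0 console, replace `plan` with direct
invocation (or pipe the printed plan through `sh`).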