Re: [Xen-devel] [PATCH] x86/S3: Fix cpu pool scheduling after suspend/resume
On Tue, Apr 9, 2013 at 10:52 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 09/04/2013 14:17, "Ben Guthro" <benjamin.guthro@xxxxxxxxxx> wrote:
>
>> On Tue, Apr 9, 2013 at 9:03 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 09.04.13 at 14:46, Ben Guthro <benjamin.guthro@xxxxxxxxxx> wrote:
>>>> This resolves another S3 scheduler problem with the system_state
>>>> variable introduced with the following changeset:
>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=269f543ea750ed567d18f2e819e5d5ce58eda5c5
>>>>
>>>> Specifically, it concerns the cpu_callback function that takes each CPU
>>>> down during suspend, and back up during resume. We were seeing
>>>> situations where, after S3, only CPU0 was in cpupool0. Guest
>>>> performance suffered greatly, since all vcpus ran on a single pcpu.
>>>> Guests under high CPU load showed the problem much more quickly than
>>>> an idle guest.
>>>>
>>>> Removing this if condition forces the CPUs to go through the expected
>>>> online/offline states, and to be properly scheduled after S3.
>>>
>>> But this doesn't explain _why_ the code block you remove was
>>> wrong. And that would be vital to understand, so we can be
>>> reasonably sure this change won't lead to regressions elsewhere
>>> again.
>>
>> I would argue that there have been so many problems with the original
>> changeset that the argument should run in the other direction - since
>> the changeset that introduced the system_state variable, nobody has
>> been able to successfully suspend, as has been discussed in multiple
>> threads over the past year.
>
> We could revert all the system_state patches and try Juergen's original
> patch?

Apologies for my delay in responding to this - I have been traveling.

I think that reversion of the original patch should be considered, if we
want to have S3 work.
IMO - severe regression of a long-standing platform feature (S3) for the
introduction of a new one (cpupools) is a problem that should be addressed
by the author of the new feature, or the new feature should be reverted.

Unfortunately, since Konrad's acpi-s3 patches are not upstream, it is
difficult to add this scenario to the automated tests to catch such
regressions early.

Ben

> -- Keir
>
>> What is the reason that this particular callback gets bailed out of,
>> but not others?
>> Previously, the code worked, and went through this code path.
>> Why this one, in particular?
>>
>> We have been systematically removing parts of the system_state
>> changeset, in regard to the S3 path. This is just another one that
>> puts it back to the way it was prior to that changeset (at least the
>> second hunk of it).
>>
>> I'm open to other suggestions, but this was the only path that
>> explained the fact that all of the vcpus would end up on cpu0.
>>
>> Ben
>>
>>>
>>> Jan
>>>
>>>> This also includes a necessary partial change proposed earlier by Tomasz
>>>> Wroblewski here:
>>>> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02206.html
>>>>
>>>> It should also resolve the issues discussed in this thread:
>>>> http://lists.xen.org/archives/html/xen-devel/2012-11/msg01801.html
>>>>
>>>> Signed-off-by: Ben Guthro <benjamin.guthro@xxxxxxxxxx>
>>>> ---
>>>>  xen/common/cpu.c     |    3 +++
>>>>  xen/common/cpupool.c |    5 -----
>>>>  2 files changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>>> index 630881e..e20868c 100644
>>>> --- a/xen/common/cpu.c
>>>> +++ b/xen/common/cpu.c
>>>> @@ -5,6 +5,7 @@
>>>>  #include <xen/init.h>
>>>>  #include <xen/sched.h>
>>>>  #include <xen/stop_machine.h>
>>>> +#include <xen/sched-if.h>
>>>>
>>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>>  #ifndef nr_cpumask_bits
>>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>>              BUG_ON(error == -EBUSY);
>>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>>          }
>>>> +        if ( system_state == SYS_STATE_resume )
>>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>>      }
>>>>
>>>>      cpumask_clear(&frozen_cpus);
>>>> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
>>>> index 10b10f8..a9653a8 100644
>>>> --- a/xen/common/cpupool.c
>>>> +++ b/xen/common/cpupool.c
>>>> @@ -633,10 +633,6 @@ static int cpu_callback(
>>>>      unsigned int cpu = (unsigned long)hcpu;
>>>>      int rc = 0;
>>>>
>>>> -    if ( (system_state == SYS_STATE_suspend) ||
>>>> -         (system_state == SYS_STATE_resume) )
>>>> -        goto out;
>>>> -
>>>>      switch ( action )
>>>>      {
>>>>      case CPU_DOWN_FAILED:
>>>> @@ -650,7 +646,6 @@ static int cpu_callback(
>>>>          break;
>>>>      }
>>>>
>>>> -out:
>>>>      return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
>>>>  }
>>>>
>>>> --
>>>> 1.7.9.5

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel