
Re: [Xen-devel] [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs



>>> On 08.05.15 at 12:20, <JGross@xxxxxxxx> wrote:
> On 05/06/2015 05:10 PM, Dario Faggioli wrote:
>> in fact, before this change, shutting down or suspending the
>> system with some CPUs not assigned to any cpupool would
>> crash as follows:
>>
>>    (XEN) Xen call trace:
>>    (XEN)    [<ffff82d080101757>] disable_nonboot_cpus+0xb5/0x138
>>    (XEN)    [<ffff82d0801a8824>] enter_state_helper+0xbd/0x369
>>    (XEN)    [<ffff82d08010614a>] continue_hypercall_tasklet_handler+0x4a/0xb1
>>    (XEN)    [<ffff82d0801320bd>] do_tasklet_work+0x78/0xab
>>    (XEN)    [<ffff82d0801323f3>] do_tasklet+0x5e/0x8a
>>    (XEN)    [<ffff82d080163cb6>] idle_loop+0x56/0x6b
>>    (XEN)
>>    (XEN)
>>    (XEN) ****************************************
>>    (XEN) Panic on CPU 0:
>>    (XEN) Xen BUG at cpu.c:191
>>    (XEN) ****************************************
>>
>> This is because, for free CPUs, -EBUSY was being returned
>> when trying to tear them down, making cpu_down() unhappy.
>>
>> It is certainly impractical to forbid shutting down or
>> suspending if there are unassigned CPUs, so this change
>> fixes the above by just avoiding returning -EBUSY for those
>> CPUs. If shutting off, that does not matter much anyway. If
>> suspending, we make sure that the CPUs remain unassigned
>> when resuming.
>>
>> While there, take the chance to:
>>   - fix the doc comment of cpupool_cpu_remove() (it was
>>     wrong);
>>   - improve comments in general around and in cpupool_cpu_remove()
>>     and cpupool_cpu_add();
>>   - add a couple of ASSERT()-s for checking consistency.
> 
> I did a test with the patches applied.
> 
> # xl cpupool-cpu-remove Pool-0 2
> # echo mem >/sys/power/state
> 
> When resuming this resulted in:
> 
> (XEN) mce_intel.c:735: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) Xen BUG at cpu.c:149
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080101531>] cpu_up+0xaf/0xfe
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
> (XEN) rax: 0000000000008016   rbx: 0000000000000000   rcx: 0000000000000000
[...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080101531>] cpu_up+0xaf/0xfe
> (XEN)    [<ffff82d080101733>] enable_nonboot_cpus+0x4f/0xfc
> (XEN)    [<ffff82d0801a6a8d>] enter_state_helper+0x2cb/0x370
> (XEN)    [<ffff82d08010615f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [<ffff82d08013101d>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d08013134c>] do_tasklet+0x5e/0x8a
> (XEN)    [<ffff82d080161bcb>] idle_loop+0x56/0x70
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at cpu.c:149
> (XEN) ****************************************

That seems more likely to be a result of patch 2. Having
taken a closer look - is setting ret to -EINVAL at the top of
cpupool_cpu_add() really correct? I.e. is it guaranteed that
at least one of the two places altering ret will always be
reached? If it is, then I'd still suspect one of the two
cpupool_assign_cpu_locked() invocations of failing.
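
To make the question concrete, here is a minimal C sketch of the pattern
being asked about. It is not the actual cpupool_cpu_add() code: the
struct, its fields, and every sketch_* name are hypothetical stand-ins,
and the two conditions are assumptions chosen only to show what happens
when neither assignment site is reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not Xen's real data structures. */
struct sketch_cpu {
    bool restore_pending; /* assumed: CPU should go back to its old pool */
    bool still_free;      /* assumed: CPU currently belongs to no pool   */
    int  assign_result;   /* what the stand-in assign step will return   */
};

/* Hypothetical stand-in for an assign step such as
 * cpupool_assign_cpu_locked(); it just reports a preset result. */
static int sketch_assign(const struct sketch_cpu *cpu)
{
    return cpu->assign_result;
}

/* Error-code pattern under discussion: ret starts out as -EINVAL and is
 * overwritten only if one of the two conditional sites below runs. */
int sketch_cpu_add(const struct sketch_cpu *cpu)
{
    int ret = -EINVAL;

    assert(cpu != NULL);

    if (cpu->restore_pending) {
        ret = sketch_assign(cpu);   /* first site altering ret */
        if (ret)
            return ret;             /* propagate a real failure */
    }

    if (cpu->still_free)
        ret = sketch_assign(cpu);   /* second site altering ret */

    /* If neither condition held, ret is still the initial -EINVAL. */
    return ret;
}
```

In this sketch, a CPU with both flags false yields -EINVAL even though
no step actually failed, so a caller treating any nonzero result as
fatal would crash. Initialising ret to 0 instead would avoid that, at
the cost of hiding a path that was never expected to be taken.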

In any event, unless confirmed otherwise we may need to
revert patch 2 for the time being.

Jan

