
Re: [Xen-devel] [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown



On 25/06/15 16:04, Dario Faggioli wrote:
> On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
>> On 25/06/15 13:15, Dario Faggioli wrote:
>>> In fact, if a pCPU belonging to some other pool than
>>> cpupool0 goes down, we want to clear the relevant bit
>>> from its actual pool, rather than always from cpupool0.
>> This sentence is a little hard to parse.
>>
>> I presume you mean "use the correct cpupools valid mask, rather than
>> cpupool0's".
>>
> Yes, that's a better way to say what I meant.
>
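
For the record, from that description I take it the change boils down to
something like this in the teardown path (sketch only, exact call site
aside), i.e. looking up the pCPU's own pool instead of hardcoding
cpupool0:

    struct cpupool *c = per_cpu(cpupool, cpu);

    if ( c != NULL )
        cpumask_clear_cpu(cpu, c->cpu_valid);  /* not cpupool0->cpu_valid */
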
>>> # xl cpupool-cpu-remove Pool-0 8-15
>>> # xl cpupool-create name=\"Pool-1\"
>>> # xl cpupool-cpu-add Pool-1 8-15
>>> --> suspend
>>> --> resume
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    8
>>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
>>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
>>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
>>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
>>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
>>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
>>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
>>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) ... ... ...
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
>>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
>>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
>>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 8:
>>> (XEN) GENERAL PROTECTION FAULT
>>> (XEN) [error_code=0000]
>>> (XEN) ****************************************
>> What is the actual cause of the #GP fault?  There are no obviously
>> poisoned registers.
>>
> IIRC, CPU 8 has just been brought up and is scheduling. No other CPU
> from Pool-1 is online yet. We are on CPU 8, in csched_load_balance(),
> more specifically here:
>
>     ...
>     BUG_ON( cpu != snext->vcpu->processor );
>     online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
>     ...
>     for_each_csched_balance_step( bstep )
>     {
>         /*
>          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
>          * it is more likely that we find some affine work on our same
>          * node, not to mention that migrating vcpus within the same node
>          * could well expected to be cheaper than across-nodes (memory
>          * stays local, there might be some node-wide cache[s], etc.).
>          */
>         peer_node = node;
>         do
>         {
>             /* Find out what the !idle are in this node */
>             cpumask_andnot(&workers, online, prv->idlers);
>             cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
>             __cpumask_clear_cpu(cpu, &workers);
>
>             peer_cpu = cpumask_first(&workers);
>             if ( peer_cpu >= nr_cpu_ids )
>                 goto next_node;
>             do
>             {
>                 /*
>                  * Get ahold of the scheduler lock for this peer CPU.
>                  *
>                  * Note: We don't spin on this lock but simply try it. Spinning
>                  * could cause a deadlock if the peer CPU is also load
>                  * balancing and trying to lock this CPU.
>                  */
>                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
>
> Because we did not clear Pool-1->cpu_valid, online is 8-15. Also, since
> we _did_ clear bits 8-15 in prv->idlers when tearing those CPUs down
> during suspend, they all (or all but 8) look like workers, as far as
> the code above can tell.
>
> We therefore enter the inner do{}while with, for instance (that's what
> I've seen in my debugging), peer_cpu=9, but we've not yet done
> cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
> shortly after) the end of the code snippet shown above.
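
So, if I am following correctly, by the time we reach the trylock we
have effectively computed (sketching the values from your description
above):

    online       = { 8..15 }  /* Pool-1->cpu_valid, never cleared      */
    prv->idlers  = { }        /* bits 8-15 cleared during suspend      */
    workers      = { 9..15 }  /* online &~ idlers, minus cpu 8 itself  */
    peer_cpu     = 9          /* no alloc_pdata() done for it yet      */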

Aah - it is a dereference with %rax as the pointer, which comes from

#define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)

That explains the #GP fault, which is due to the address being
non-canonical.

It might be better to use 0xDEAD000000000000L as the constant to make it
slightly easier to spot as a poisoned pointer.
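
i.e. something like the below (sketch of the suggestion only; the value
is still non-canonical, so any dereference still faults, but it is far
more recognisable in a register dump):

    #define INVALID_PERCPU_AREA (0xDEAD000000000000L - (long)__per_cpu_start)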

>
>> Is it something we should modify to be a BUG or ASSERT?
>>
> Not sure how/where. Note that some more fixing of similar situations
> happens in other patches in the series, and that also includes adding
> ASSERTs (although, no, they probably won't cover this case).
>
> I can try to think about it and come up with something if you think
> it's important...

Not to worry. I was more concerned about working out why it was dying
with an otherwise unqualified #GP fault.
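
(If we did want a safety net, I suppose something along the lines of

    ASSERT(cpumask_subset(online, &cpu_online_map));

right after 'online' is computed in csched_load_balance() would have
tripped in this particular scenario, but with the valid masks fixed it
should not really be necessary.)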

~Andrew
