[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Hit ASSERT in credit2 code with NR_CPUS=1 build



On 10.03.2021 12:13, Dario Faggioli wrote:
> On Tue, 2021-03-09 at 17:24 +0100, Roger Pau Monné wrote:
>> Hello,
>>
> Hey,
> 
>> While looking at the NR_CPUS == 1 build I realized I could reliable
>> trigger the following ASSERT by creating a guest (note that dom0
>> seems
>> to be fine):
>>
> Yes, I'm (somewhat, not sure if exactly though) able to reproduce.
> 
>> (XEN) Assertion 'i != cpu' failed at credit2.c:1725
>> (XEN) ----[ Xen-4.15.0-rc  x86_64  debug=y  Tainted:   C   ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82d040249399>]
>> common/sched/credit2.c#runq_tickle+0x469/0x571
>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d4v0)
>> (XEN) rax: ffffffffffffffff   rbx: 0000000000000000   rcx:
>> 0000000000000000
>> (XEN) rdx: ffff83086c62feb0   rsi: 0000012774fba66c   rdi:
>> ffff8307e11d5d40
>> (XEN) rbp: ffff83008c8c7cf8   rsp: ffff83008c8c7c68   r8: 
>> ffff83086c66d6c0
>> (XEN) r9:  ffff82d0405d1218   r10: 0000000000000000   r11:
>> ffff83086c631000
>> (XEN) r12: ffff83086c6437c0   r13: 0000000000000000   r14:
>> ffff83086c62fe20
>> (XEN) r15: ffff82d0405d0320   cr0: 0000000080050033   cr4:
>> 00000000003526e0
>> (XEN) cr3: 00000007e130d000   cr2: ffff88826910cb38
>> (XEN) fsb: 00007efee038b780   gsb: ffff888273400000   gss:
>> 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen code around <ffff82d040249399>
>> (common/sched/credit2.c#runq_tickle+0x469/0x571):
>> (XEN)  ac ff 75 3d 0f 0b 0f 0b <0f> 0b c7 45 ac 00 00 00 00 48 8d 05
>> 6f 7e 38 00
>> (XEN) Xen stack trace from rsp=ffff83008c8c7c68:
>> [...]
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'i != cpu' failed at credit2.c:1725
>> (XEN) ****************************************
>>
> Interesting... So, how do cpumasks look like/work, with NR_CPUS=1
> (sorry, I couldn't follow all the aspects of it too closely) ?
> 
> I'm asking because, what we're doing here is the following. First of
> all we put together a cpumask (in `mask`) out of the intersection of
> the CPUs that are in the vcpu's hard/soft affinity, are part of this
> runqueue, are idle and have not been tickled (where tickled == they've
> been poked and will go through schedule() soon):
> 
>     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
>     cpumask_andnot(&mask, &mask, &rqd->tickled);
>     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
> 
> Now, I would very much expect for `mask` to have at most one bit set
> (i.e., the one of our only CPU). Actually, considering how unlikely it
> would be that our only CPU is both idle and not-tickled, I expect mask
> to be empty most of the times.
> 
> Anyway, let's say the cpumask has 1 bit set (in which case, it must be
> the one associated to CPU 0, I presume?). What we do now is this:
> 
> if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
> {
>     ...
> }
> 
> Which I think means that, no matter whether or not we enter the loop,
> we clear the bit. Of course, which bit depends on the value of `cpu`...
> But with NR_CPUS=1, I don't see how `cpu` can have a value different
> than the ID of the one and only CPU we have.
> 
> So, in my mind, now `mask` is empty. Therefore, I'm currently clueless
> about why we enter this loop...
> 
>>     for_each_cpu(i, &mask)
>>     {
>>         s_time_t score;
>>
>>         /* Already looked at this one above */
>>         ASSERT(i != cpu); <====
>>
> ... and we reach this point.

from xen/cpumask.h:

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

I'm struggling though why this is this way.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.