Re: [Xen-devel] [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

On Mon, Oct 7, 2013 at 1:35 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> On Mon, Oct 07, 2013 at 01:12:17AM -0700, Linus Torvalds wrote:
> My pleasure! Here are 100 randomly selected call traces. Also attached
> several full dmesgs and the kconfig.

Ok, they may be randomly selected, but they are all the same. Which is
good, I guess: we're only talking about one bug.

Anyway, they all have RIP: run_timer_softirq+0x12c/0x1b8, and the code is:

   0:  8b 65 c8              mov    -0x38(%rbp),%esp
   3:  4d 39 ec              cmp    %r13,%r12
   6:  0f 84 2f ff ff ff     je     0xffffffffffffff3b
   c:  41 8b 4c 24 18        mov    0x18(%r12),%ecx
  11:  4d 8b 74 24 20        mov    0x20(%r12),%r14
  16:  4d 8b 7c 24 28        mov    0x28(%r12),%r15
  1b:  4c 89 63 38           mov    %r12,0x38(%rbx)
  1f:  49 8b 44 24 08        mov    0x8(%r12),%rax
  24:  49 8b 14 24           mov    (%r12),%rdx
  28:  83 e1 02              and    $0x2,%ecx
  2b:* 48 89 42 08           mov    %rax,0x8(%rdx)    <-- trapping instruction
  2f:  48 89 10              mov    %rdx,(%rax)
  32:  48 b8 00 02 20 00 00  movabs $0xdead000000200200,%rax

where that constant is LIST_POISON2, and the "and $0x2" seems to be
TIMER_IRQSAFE. So the trapping instruction *looks* like it's doing
__list_del() on the timer, and timer->entry.next (loaded from (%r12)
into %rdx) is NULL.
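To make the trapping store concrete, here is a minimal userspace sketch. It assumes a simplified copy of the kernel's struct list_head on x86-64 (LP64). list_unlink() performs the same two stores as __list_del(); list_del_entry() and entry_looks_valid() are hypothetical helpers for illustration only, the latter loosely in the spirit of the CONFIG_DEBUG_LIST checks. An entry whose memory was cleared has next == NULL, so the first store in list_unlink() is the one that would fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copy of the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* On x86-64, ->prev sits at offset 8, so "0x8(%rdx)" is next->prev. */
static_assert(offsetof(struct list_head, prev) == 8, "sketch assumes LP64 (x86-64)");

/* x86-64 poison values; LIST_POISON2 is the movabs constant in the oops. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 ((struct list_head *)(uintptr_t)(0x00100100ULL + POISON_POINTER_DELTA))
#define LIST_POISON2 ((struct list_head *)(uintptr_t)(0x00200200ULL + POISON_POINTER_DELTA))

/* Same two stores as the kernel's __list_del(prev, next). */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;	/* mov %rax,0x8(%rdx): faults if next is NULL */
	prev->next = next;	/* mov %rdx,(%rax) */
}

/* Hypothetical helper: unlink an entry, then poison it like list_del(). */
static void list_del_entry(struct list_head *entry)
{
	list_unlink(entry->prev, entry->next);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical check: a linked entry has non-NULL links that point back at it. */
static bool entry_looks_valid(const struct list_head *entry)
{
	return entry->next != NULL && entry->prev != NULL &&
	       entry->next->prev == entry && entry->prev->next == entry;
}
```

For a cleared entry (say `struct list_head cleared = {0};`), entry_looks_valid() returns false; passing it to list_unlink() would instead store through address 0x8, the same kind of store that traps in the oops.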

So somebody added a timer, and then deallocated/cleared the structure
before it triggered. The problem is, I can't see a way to figure out
_who_ did that.

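I *think* r14 contains the timer function we were about to call when we
oopsed (it's loaded from 0x20(%r12), right before the data argument goes
into r15), and that could be interesting to know. But the register dump
doesn't decode it, so you'd have to match it up against a symbol map...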
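The r14 guess can be cross-checked against the structure layout. This sketch assumes the leading fields of struct timer_list as they looked around v3.12 on x86-64 (debug-only fields after ->slack are omitted, and struct tvec_base is left opaque). The static_asserts tie each load in the listing to a field, and timer_irqsafe() is a hypothetical helper mirroring the `and $0x2` test on the flag bits kept in the low bits of ->base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

struct tvec_base;	/* opaque in this sketch */

/* Assumed: leading fields of struct timer_list around v3.12 (x86-64). */
struct timer_list {
	struct list_head entry;
	unsigned long expires;
	struct tvec_base *base;		/* low bits double as TIMER_* flags */
	void (*function)(unsigned long);
	unsigned long data;
	int slack;
};

#define TIMER_IRQSAFE 0x2UL

/* Each load in the listing lines up with one field. */
static_assert(offsetof(struct timer_list, entry) == 0x00, "(%r12) is entry.next");
static_assert(offsetof(struct timer_list, base) == 0x18, "0x18(%r12) -> %ecx");
static_assert(offsetof(struct timer_list, function) == 0x20, "0x20(%r12) -> %r14");
static_assert(offsetof(struct timer_list, data) == 0x28, "0x28(%r12) -> %r15");

/* Mirrors the "and $0x2,%ecx" applied to the low 32 bits of ->base. */
static unsigned int timer_irqsafe(const struct timer_list *timer)
{
	return (unsigned int)((uintptr_t)timer->base & TIMER_IRQSAFE);
}
```

If that layout holds, the r14 value in the register dump is the stale timer->function pointer. Resolving it still means finding the nearest preceding symbol in the System.map (or running addr2line -e vmlinux on it) for the exact kernel build that oopsed.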


