
Re: [Xen-devel] xenstored crashes with SIGSEGV



On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> Hello Ian,
> 
> On 15.12.2014 14:17, Ian Campbell wrote:
> > On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> >>  On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> >>> On 12.12.2014 17:56, Ian Campbell wrote:
> >>>> On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> >>>>> On 12.12.2014 17:32, Ian Campbell wrote:
> >>>>>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> ...
> >>> The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.
> >>>
> >>> (gdb) bt full
> >>> #0  talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> >>>         tc = <value optimized out>
> >>> #1  0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> >>>         tc = <value optimized out>
> >>> #2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> >>> "/var/lib/xenstored/tdb.0x1935bb0",
> 
> I just noticed something strange:
> 
> > #3  0x000000000040a684 in tdb_open (name=0xff00000000 <Address
> > 0xff00000000 out of bounds>, hash_size=0,
> >     tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> > #4  0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0")
> 
> Why does gdb-7.0.1 print "name=0xff00000000" here for frame 3, but for
> frame 2 and 4 the pointers are correct again?
> Verifying the values with an explicit "print" shows them as correct.

I had just noticed that and was wondering about the same thing. I'm
starting to worry that 0xff00000000 might just be a gdb thing, similar
to <value optimized out>, but infinitely more misleading.

I've also noticed in
https://forge.univention.org/bugzilla/show_bug.cgi?id=35104 that the
constant can be either 0xff000000, 0xff00000000 or 0xff0000000000 (6, 8
or 10 zeroes).
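
Put differently, all three constants are a single 0xff byte with every
other byte zero, sitting at byte offset 3, 4 or 5 (a shift of 24, 32 or
40 bits). A minimal sketch of that check follows; the helper name
single_byte_offset is hypothetical and is not part of gdb, tdb or
xenstored:

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * If exactly one byte of v is non-zero, return its byte index
 * (0 = least significant byte) and store that byte in *byte_out.
 * Otherwise return -1; *byte_out is only meaningful when the
 * return value is >= 0.
 */
static int single_byte_offset(uint64_t v, uint8_t *byte_out)
{
    int found = -1;

    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));

        if (b == 0)
            continue;
        if (found != -1)
            return -1;      /* more than one non-zero byte */
        found = i;
        *byte_out = b;
    }
    return found;
}
```

Called on 0xff000000, 0xff00000000 and 0xff0000000000 this returns 3, 4
and 5 respectively, each time with 0xff as the byte.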

> >>>     hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
> >>> optimized out>, mode=<value optimized out>,
> >>>     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
> >>> tdb.c:1958
> > 
> > Please can you confirm what is at line 1958 of your copy of tdb.c. I
> > think it will be tdb->locked, but I'd like to be sure.
> 
> Yes, that's the line:
> # sed -ne 1958p tdb.c
>         SAFE_FREE(tdb->locked);

Good, thanks.

> > You are running a 64-bit dom0, correct?
> 
> yes: x86_64

Thanks for confirming. I'm resurrecting the 64-bit root partition on my
test box (which it turns out was still Debian Squeeze!)

> 
> > I've only just noticed that
> > 0xff00000000 is >32bits. My testing so far was 32-bit, I don't think it
> > should matter wrt use of uninitialised data etc.
> > 
> > I can't help feeling that 0xff00000000 must be some sort of magic
> > sentinel value to someone. I can't figure out what though.
> 
> 0xff is too much for bit flip errors, and two crashes on different
> machines in the same location very much rule out any HW error for me.
> 
> My 2nd idea was that someone decremented 0 one time too many, but then that
> would have to be an 8 bit value - reading the code I didn't see anything
> like that.

I was wondering if it was an overflow or sign-extension thing, but it
doesn't seem likely, not enough high bits set for one thing.
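
To make both ideas concrete, here is a small sketch (not taken from
xenstored or tdb; the helper names are hypothetical). Decrementing an
8-bit zero does give 0xff, but sign-extending a negative 8-bit or 32-bit
value to 64 bits sets all of the upper bits, which is not what
0xff00000000 looks like:

```c
#include <stdint.h>

/* Decrement an 8-bit unsigned value; 0 wraps around to 0xff. */
static uint8_t dec_u8(uint8_t x)
{
    return (uint8_t)(x - 1);
}

/*
 * Sign-extend an 8-bit pattern to 64 bits. The conversion to int8_t
 * is implementation-defined for values above 0x7f, but on the usual
 * two's-complement compilers 0xff becomes -1.
 */
static uint64_t sign_extend_8(uint8_t x)
{
    return (uint64_t)(int64_t)(int8_t)x;
}

/* Sign-extend a 32-bit pattern to 64 bits (same caveat as above). */
static uint64_t sign_extend_32(uint32_t x)
{
    return (uint64_t)(int64_t)(int32_t)x;
}
```

With these, dec_u8(0) is 0xff, sign_extend_8(0xff) is
0xffffffffffffffff and sign_extend_32(0xff000000) is
0xffffffffff000000. None of them has the single isolated 0xff byte seen
in the backtraces.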

> One more thing we noticed: /var/lib/xenstored/ contained the tdb file
> and two bit-identical copies after the crash, so I would read that as two
> transactions being in progress at the time of the crash. Might be that
> this is important.

It's certainly worth noting, thanks.

Ian.

