WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] xend segfaults when starting

On Wednesday 18 August 2010 16:59:30 Ian Campbell wrote:
> On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> > On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> > > > In unlock_pages, the address and length passed to munlock() are:
> > > >
> > > >  laddr 0x7f7ffdfe7000, llen 0x2000
> > > >
> > > > The reason why munlock() fails is that mlock() hasn't been called
> > > > before. The hcall_buf_prep() is not called at all before the first
> > > > call to _xc_clean_hcall_buf().
> > >
> > > If hcall_buf_prep() has never been called then
> > > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > > _xc_clean_hcall_buf also ignores NULL values itself.
> >
> > Who calls hcall_buf_prep() in your case ?
> >
> > Only hypercalls call hcall_buf_prep().
> > What if no hypercalls are called during xend startup ?
>
> Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
> NULL (because _xc_init_hcall_buf was never called) and therefore for
> xc_clean_hcall_buf to not do any unlocking.
>
> However I think my expectation was wrong. If _xc_init_hcall_buf is never
> called then hcall_buf_pkey is undefined but not necessarily invalid --
> and it seems to be the case on your system that it turns out to be valid
> (perhaps pthread_key_t is valid on NetBSD and invalid on Linux or
> something like that) and therefore we try to unlock some random address.

To make it even more mysterious, the "random" address is always the same
even across machine reboots.

>
> My updated patch ensured that hcall_buf_pkey is always initialised
> before use.
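
For illustration, a minimal sketch of one way to guarantee that a
thread-specific key is created before any pthread_getspecific() call,
using pthread_once(). This is not the patch discussed above; the names
buf_key, get_buf, prep_buf and clean_buf are hypothetical and only
mirror the roles of hcall_buf_pkey, hcall_buf_prep() and
xc_clean_hcall_buf().

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names; they mirror the thread but are not the libxc code. */
static pthread_key_t buf_key;
static pthread_once_t buf_key_once = PTHREAD_ONCE_INIT;
static int buf_key_ok;              /* set once the key really exists */

/* Runs at thread exit for threads that still hold a buffer. */
static void buf_destructor(void *p)
{
    free(p);
}

/* Executed exactly once per process, on the first get_buf() call. */
static void buf_key_init(void)
{
    buf_key_ok = (pthread_key_create(&buf_key, buf_destructor) == 0);
}

/* Return this thread's buffer, or NULL if none has been set up yet. */
void *get_buf(void)
{
    pthread_once(&buf_key_once, buf_key_init);
    return buf_key_ok ? pthread_getspecific(buf_key) : NULL;
}

/* Allocate and register a buffer for this thread (the "prep" side). */
void *prep_buf(size_t size)
{
    void *p = get_buf();

    if (p != NULL)
        return p;
    if (!buf_key_ok)
        return NULL;
    p = calloc(1, size);
    if (p != NULL && pthread_setspecific(buf_key, p) != 0) {
        free(p);
        p = NULL;
    }
    return p;
}

/* Cleanup that is safe to call even if prep_buf() never ran. */
void clean_buf(void)
{
    void *p = get_buf();

    if (p == NULL)
        return;
    pthread_setspecific(buf_key, NULL);
    free(p);
}
```

Because every access goes through get_buf(), the key is never read
while uninitialised, so a cleanup on close without a prior prep simply
sees NULL and does nothing.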

Yes, but we also need to figure out why hcall_buf_prep is never called.
Who calls hcall_buf_prep() on your machine ?
Can you please provide a call trace from the first time hcall_buf_prep()
is called ?

> > If you call xc_clean_hcall_buf() from xc_interface_close()
> > then you should also call hcall_buf_prep() from xc_interface_open().
> >
> > > However you say that hcall_buf_pkey is not NULL, but rather contains a
> > > valid hcall_buf containing 0x7f7ffdfe7040.
> >
> > hcall_buf itself has the address 0x7f7ffdfe7000.
> >
> > hcall_buf->buf has the address 0x7f7ffdfe7040.
>
> That's very odd -- hcall_buf->buf is allocated with xc_memalign and
> therefore should be page aligned. Are you sure the addresses aren't the
> other way round?

Yes, I am.
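
As a sanity check on the alignment question, a small sketch assuming
4 KiB pages: 0x7f7ffdfe7000 is a multiple of 0x1000, while
0x7f7ffdfe7040 has an offset of 0x40 into its page. The helper names
below are hypothetical, and posix_memalign() merely stands in for
whatever xc_memalign does internally.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Return 1 if addr is a multiple of page_size (must be a power of two). */
int is_page_aligned(uintptr_t addr, uintptr_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}

/* Allocate size bytes aligned to the system page size, or return NULL.
 * Assumption: posix_memalign() is used here only as a stand-in. */
void *alloc_page_aligned(size_t size)
{
    void *p = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || posix_memalign(&p, (size_t)page, size) != 0)
        return NULL;
    return p;
}
```

A pointer returned by alloc_page_aligned() always passes
is_page_aligned(), so a value ending in 0x040 cannot have come from
such an allocation.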

>
> > > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a
> > > non-NULL value is in hcall_buf_prep(), so it must have been called at
> > > some point.
> >
> > In that case, I am puzzled why I don't get the trace.
> > Something really fishy is going on.
> >
> > > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > > return NULL in this case and we certainly rely on that.
> >
> > _xc_init_hcall_buf() is not called.  pthread_getspecific() should return
> > NULL but doesn't.
> >
> > I am starting to ask myself "How did libxc ever work?". It feels like we
> > are hunting down a long-term hidden bug.
>
> Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
> been called. My patch changed this to also be called on close (even if
> hcall_buf_prep was never called) and could therefore access an
> uninitialised hcall_buf_pkey.

Calling _xc_clean_hcall_buf() unconditionally and hcall_buf_prep()
conditionally sounds to me like calling free() unconditionally
and malloc() conditionally.
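
To illustrate the point, a rough sketch of how an unconditional cleanup
can be made safe by recording whether the lock was ever taken. The
struct and function names are hypothetical and are not taken from
libxc.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper; the struct and function names are illustrative. */
struct locked_buf {
    void  *addr;    /* start of the region */
    size_t len;     /* length in bytes */
    int    locked;  /* nonzero only after a successful mlock() */
};

/* Lock the region and remember that we did. Returns 0 on success. */
int buf_lock(struct locked_buf *b)
{
    if (b->locked)
        return 0;
    if (mlock(b->addr, b->len) != 0)
        return -1;
    b->locked = 1;
    return 0;
}

/* Unlock only if buf_lock() succeeded earlier; otherwise do nothing. */
int buf_unlock(struct locked_buf *b)
{
    if (!b->locked)
        return 0;
    if (munlock(b->addr, b->len) != 0)
        return -1;
    b->locked = 0;
    return 0;
}
```

With this pairing, calling buf_unlock() on a buffer that was never
locked is a no-op instead of a munlock() on an arbitrary address.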

I will give calling hcall_buf_prep() from xc_interface_open() a try with your
patch tomorrow.

> I am reasonably confident that before my patch libxc was OK.

And is ok again after it has been backed out. :)

> > > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on
> > > error, however hcall_buf_pkey is uninitialised until
> > > _xc_init_hcall_buf, perhaps on NetBSD the uninitialised value somehow
> > > looks valid? It's not clear what value a pthread_key_t should be
> > > initialised to so that it appears invalid until it is properly set
> > > up, but I suppose we should be initialising it before use. Please
> > > can you try this patch:
> >
> > I tried the replacement patch from the other mail.
> > With it, xend does not crash, hcall_buf is NULL,
> > pthread_getspecific() returns NULL,
>
> OK, I think that suggests that my updated patch does the right thing
> here.

Is it possible that xend calls xc_interface_close() during startup
and hcall_buf_prep() later, when xend interacts with xm ?

> > and I am not able to start a guest with 'xm'
> >
> > Xend has probably crashed!  Invalid or missing HTTP status code.
>
> There was another HTTP (XML/RPC) related mail on the list this morning

I saw this mail. No, I don't think it is related to this.

> -- is this related to that? Are you sure it is related to the libxc
> patch?

Yes.

> (did you by any chance update to python2.7 recently?)

No, I am on python 2.5.

> > > If that doesn't work perhaps you can reduce the issue to a simple test
> > > case like the attached? (which doesn't reproduce the issue for me on
> > > Linux) If you can do that then please run it with the attached libxc
> > > patch and post the output.
> >
> > xc_interface is 0x7f7ffdb03800
> > before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> > after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > xc interface close returned 0
> >
> > No crash. Is this the expected output ?
>
> It looks correct but didn't reproduce the crash so is of limited
> utility.
>
> Ian.

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

