Excellent work isolating this!
It's not clear to me why your fix works, but as long as
it gets you going, Matt and I will try to figure out
the real problem... we are trying to get a hardware
debugger working which will make finding this kind
of problem easier.
In the meantime, in case anyone else is tracking this,
here's some more info gathered from the simulator:
The address being cmpxchg'd is 0xf0ffffffffff0000,
which is the start of the local_cpu_data page
(a per-cpu page, with the symbol per_cpu__cpu_init).
The first 4 bytes of that page hold the softirq_pending flags.
If this is getting trashed somehow, that would certainly
explain the behavior.
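
For anyone following along, here is a rough sketch of what that
call chain boils down to, written in portable C rather than taken
from the real ia64 headers. All of the names here (softirq_pending
as a plain global, cmpxchg4_acq, test_and_set_bit32,
raise_softirq_sketch) are made up for illustration, and the GCC
__atomic builtins are only a stand-in for the cmpxchg4.acq
instruction; the real macros differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4-byte softirq_pending word that
 * sits at the start of the per-cpu page. */
static uint32_t softirq_pending;

/* Stand-in for cmpxchg4.acq (assumes GCC/Clang __atomic builtins):
 * if *ptr == old, store new_val. Either way, return the value that
 * was in *ptr before the operation. */
static uint32_t cmpxchg4_acq(uint32_t *ptr, uint32_t old, uint32_t new_val)
{
    uint32_t expected = old;

    /* On failure the builtin writes the current value into 'expected';
     * on success 'expected' still equals the prior value. */
    __atomic_compare_exchange_n(ptr, &expected, new_val, 0,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    return expected;
}

/* Atomically set bit 'nr' in *addr; return nonzero if it was already set. */
static int test_and_set_bit32(unsigned nr, uint32_t *addr)
{
    uint32_t mask, old, seen;

    assert(nr < 32);
    mask = UINT32_C(1) << nr;
    seen = __atomic_load_n(addr, __ATOMIC_RELAXED);
    do {
        old = seen;
        /* Retry until nobody changed the word between read and swap. */
        seen = cmpxchg4_acq(addr, old, old | mask);
    } while (seen != old);

    return (old & mask) != 0;
}

/* Sketch of the flag-setting half of raising a softirq: mark softirq
 * 'nr' pending. Anything else the real routine does is left out. */
static int raise_softirq_sketch(unsigned nr)
{
    return test_and_set_bit32(nr, &softirq_pending);
}
```

The point of the sketch is only that the loop reads and rewrites
the whole 4-byte word, so if that word (or the pointer to it) is
corrupted, every softirq flag in it is affected, not just the bit
being raised.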
Dan
> -----Original Message-----
> From: Haavard Bjerke [mailto:havard.bjerke@xxxxxxx]
> Sent: Friday, May 20, 2005 9:26 AM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: No scheduling after domU launch
>
> Here's an update on this bug. The problem seems to be the
>
> asm volatile ("cmpxchg4.acq %0=[%1],%2,ar.ccv":
> "=r"(ia64_intri_res) : "r"(ptr),
> "r"(new) : "memory");
>
> line in gcc_intrin.h:ia64_cmpxchg4_acq(). When I comment it
> out when launching domU, things are back to normal.
>
> The call sequence leading up to this instruction is:
> sched_bvt.c:bvt_wake()
> softirq.h:cpu_raise_softirq()
> bitops.h:test_and_set_bit()
> intrinsics.h:cmpxchg_acq()
> gcc_intrin.h:ia64_cmpxchg()
> gcc_intrin.h:ia64_cmpxchg4_acq()
>
> Håvard
>
> On Thu, May 19, 2005 at 07:32:54PM +0200, Haavard Bjerke wrote:
> > I think I've found some leads to the most recent bug (dom0
> > freezes immediately, as opposed to after a short while). The
> > problem seems to be somewhere within the cpu_raise_softirq()
> > routine, which is called from bvt_wake() in sched_bvt.c. By
> > not calling that routine when launching domU, I've managed to
> > get control back to dom0 for a short while, after which it
> > freezes as before. I'll look more into it tomorrow.
> >
> > Håvard
> >
> > On Wed, May 18, 2005 at 08:32:20AM -0700, Magenheimer, Dan (HP Labs Fort Collins) wrote:
> > > Given the previous discussion around this (last month?), I suspect
> > > that there is a bug somewhere that is overwriting some random
> > > memory related to the scheduler. As Mark W pointed out, your
> > > previous workaround fixed a problem that should never happen.
> > > And I don't think any recent changes in xeno-unstable-ia64 have
> > > had anything to do with the scheduler, so I suspect the
> > > "random memory" moved to a different random spot which
> > > is causing your current problem.
> > >
> > > This is just a theory... you are probably as familiar with this
> > > part of the code as anybody on this list right now. Try
> > > adding some more printf's to see if any clues arise.
> > > I'll try to take a look but probably not today, so reply
> > > to this thread if you learn anything new or interesting.
> > >
> > > Dan
> > >
> > > > -----Original Message-----
> > > > From: Haavard Bjerke [mailto:havard.bjerke@xxxxxxx]
> > > > Sent: Wednesday, May 18, 2005 9:23 AM
> > > > To: Magenheimer, Dan (HP Labs Fort Collins)
> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > > > Subject: No scheduling after domU launch
> > > >
> > > > Since pulling the latest xeno-unstable-ia64 a few days ago,
> > > > scheduling seems to stop immediately after launching domU,
> > > > that is, domU continues to load, while dom0 stops because the
> > > > scheduler is never entered again. This looks like the same
> > > > problem as before, with dom0 freezing after domU launch;
> > > > only, now it seems to freeze earlier. Before, I was able to
> > > > run a hypercall right after launch. This is kind of critical,
> > > > since a user-space app running in dom0 is supposed to
> > > > > establish a ctrl-channel right after launch, while domU
> > > > > is booting.
> > > >
> > > > > So I'm quite stuck and wondering why it stops scheduling, and
> > > > > I could use some input. So far I've found out that
> > > > > __enter_scheduler() is never called after domU launch, while
> > > > > the routine that's supposed to call that routine,
> > > > > ac_timer_softirq_action(), continues to be called. I think
> > > > > the __enter_scheduler() routine should be in a heap, but I
> > > > > don't understand why it would suddenly be removed from
> > > > > the heap.
> > > >
> > > > Thanks,
> > > > Håvard
> > > >
>
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel