On Sun, Aug 13, 2006 at 01:08:53AM +0100, Keir Fraser wrote:
> On 12/8/06 7:48 pm, "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > It's possible this was a cset before the alignment fix which would have
> > exercised skb copying more heavily, but that's no excuse for it
> > crashing.
> If the problem has only appeared with recent changesets then it might be
> worth working backwards to find which one introduced the problem. The
> network driver changes for GSO would be an obvious candidate.
So far -testing tip (changeset 9762) looks like it avoids both the
soft lockups that I was getting earlier in -testing, and the various
crashes I've been seeing in -unstable 10868.
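
A minimal sketch of the "working backwards" search Keir suggests, written as a plain bisection over an ordered list of changesets. It assumes the oldest entry is known good, the newest is known bad, and the bug stays present once introduced. The function name, the is_bad callback and the revision numbers in the demo are hypothetical and not taken from this thread. Because the crashes reported here are intermittent, a real is_bad check would probably need several runs per changeset before calling one good.

```python
def first_bad_changeset(changesets, is_bad):
    """Return the earliest changeset for which is_bad() is true.

    Assumptions (not from the thread): changesets is ordered oldest to
    newest, the first entry is known good, the last entry is known bad,
    and the bug never disappears once it has been introduced.
    """
    lo, hi = 0, len(changesets) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid  # the bug is already present at mid
        else:
            lo = mid  # mid is still good, so look later
    return changesets[hi]


if __name__ == "__main__":
    # Made-up revision range and made-up culprit, purely for illustration.
    revs = list(range(10800, 10869))
    print(first_bad_changeset(revs, lambda r: r >= 10850))
```

With roughly 70 candidate changesets this needs about six or seven build-and-test cycles rather than one per changeset.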
(In addition to the dom0 oops we're talking about in this thread,
10868 also randomly crashes domUs when xendomains restores them during
boot; I haven't captured data on that since it hasn't been as critical
and can be worked around. It should be easy enough for folks to
duplicate if anyone wants to chase it down -- get a few domUs running
on a dual-CPU box, then reboot dom0, then check the console of the
domUs after everything's back up. About half of the domUs wound up
oopsed and hung in my case, possibly the odd-numbered ones, but I'm not
sure it was that consistent. If you can't duplicate it, let me
know and I'll move some boxes back to 10868 and have another go.)
Overall, I'm *really* wishing I had time to set up a stress test suite
that exercises DRBD, aoe, heavy disk and net I/O, etc., and run daily
or weekly changesets across it on a dedicated set of hardware, posting
the results here. Maybe after the dust settles on this rollout I'm in
the middle of right now...
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@xxxxxxxxxxxxx -- http://www.t7a.org
Xen-devel mailing list