
Re: [Xen-devel] 100% reliable Oops on xen 4.0.1



>>> On 14.08.12 at 17:55, Peter Moody <pmoody@xxxxxxxxxx> wrote:
> On Tue, Aug 14, 2012 at 7:47 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>> On 14.08.12 at 16:42, Peter Moody <pmoody@xxxxxxxxxx> wrote:
>>> Hi Ian, here's the trace in question. I'm perfectly happy with this
>>> not being a xen issue, if for no other reason than that it means I
>>> have one less thing to look at. The python script in question was
>>> essentially doing the same thing as crasher.c, though in the middle
>>> of other, more productive activities.
>>> ...
>>> Call Trace:
>>>  [<ffffffff81654cd1>] ? down_read+0x11/0x30
>>>  [<ffffffff811c9294>] ? ext3_xattr_get+0xf4/0x2b0
>>>  [<ffffffff811baf88>] ext3_clear_blocks+0x128/0x190
>>>  [<ffffffff811bb104>] ext3_free_data+0x114/0x160
>>>  [<ffffffff811bbc0a>] ext3_truncate+0x87a/0x950
>>>  [<ffffffff812133f5>] ? journal_start+0xb5/0x100
>>>  [<ffffffff811bc840>] ext3_evict_inode+0x180/0x1a0
>>>  [<ffffffff8114065f>] evict+0x1f/0xb0
>>>  [<ffffffff81006d52>] ? check_events+0x12/0x20
>>>  [<ffffffff81140c14>] iput+0x1a4/0x290
>>>  [<ffffffff8113ed05>] dput+0x265/0x310
>>>  [<ffffffff81132435>] path_put+0x15/0x30
>>>  [<ffffffff810a5d31>] audit_syscall_exit+0x171/0x260
>>>  [<ffffffff8103ed9a>] sysexit_audit+0x21/0x5f
>>>  [<ffffffff810065ad>] ? xen_force_evtchn_callback+0xd/0x10
>>>  [<ffffffff81006d52>] ? check_events+0x12/0x20
>>
>> This is obviously just a leftover on the stack: one can see clearly
>> that we're in the middle of a syscall, which would never have
>> xen_force_evtchn_callback that deep into the stack (i.e. at the
>> point where we just came from user mode).
> 
> Interesting, thanks. Do you have any idea why something like this
> would only be reproducible (thus far anyway, still trying to get my
> hands on some other test systems) on xen? And not just xen, but on
> this particular xen configuration (huge memory, lots of cpus, etc)? Is
> this likely a race condition with the audit subsystem or some other
> part of the kernel that this configuration somehow tickles?

From the above, as well as from your indication that the traces
are highly variable between instances, I'd suppose this is memory
corruption of some sort, which can easily be hidden by all sorts
of factors.

Until you can find a pattern, I don't think much can be done by
anyone who doesn't have an affected system available for
debugging.

Jan
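
For reference, crasher.c itself was not posted in this thread. A minimal
sketch of the kind of create/write/truncate/unlink loop Peter describes
(the file name, sizes and iteration counts below are assumptions, not
taken from the original reproducer) might look like this in C:

/* Hypothetical sketch only -- not the original crasher.c.
 * Repeatedly creates, extends, truncates and unlinks a scratch file,
 * which exercises the ext3_truncate()/evict()/iput() path seen in the
 * trace above. Runs until interrupted.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[1 << 20];            /* 1 MiB of filler data */
    memset(buf, 0xaa, sizeof(buf));

    for (;;) {
        int fd = open("scratch.tmp", O_CREAT | O_TRUNC | O_RDWR, 0600);
        if (fd < 0) {
            perror("open");
            exit(1);
        }
        for (int i = 0; i < 64; i++) {   /* grow the file to ~64 MiB */
            if (write(fd, buf, sizeof(buf)) < 0) {
                perror("write");
                exit(1);
            }
        }
        if (ftruncate(fd, 0) < 0)        /* frees the data blocks */
            perror("ftruncate");
        close(fd);
        unlink("scratch.tmp");           /* inode eviction on final iput() */
    }
}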


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

