On 09/03/2010 01:02 AM, Scott Garron wrote:
> On 8/31/2010 2:06 PM, Scott Garron wrote:
>> I'm going to try to reproduce it on another, less critical machine
>> today, so I can poke at it a little more. I'll let you know what I
> To try to replicate my server environment as close as possible, I
> installed, onto my desktop machine, the same version of Xen, the same
> version of the Linux paravirt dom0 kernel, and four virtual machines: 1
> 64bit HVM, 1 32bit HVM, 1 64bit paravirt, and 1 32bit paravirt.
> My desktop machine has "similar" architecture in that it's AMD (but
> it's Athlon64 X2 5000+, not Opteron 1212) and I have not yet been able
> to trigger the bug. I ran into a different problem in which both the
> dom0 console and HVM domUs would periodically hang for several seconds
> and then return as if nothing was wrong. That happened every minute or
> so and was really annoying, but I ended up fixing it by unsetting
> CONFIG_NO_HZ in the kernel, and everything ran pretty smoothly after
What kernel is this? This sounds like a symptom of the sched_clock
problem I fixed a few weeks ago.
> I went ahead and unset some other kernel options, too - mostly
> things that were listed as "Experimental" or "If you don't know what
> this is, say N" and such. It ran the entire day, and I set up a while
> true; do lvcreate ; sleep 2 ; lvremove ; sleep 2 ; done kind of script
> to just sit there and peg lvm/dm & udev for about 15-20 minutes
> straight, without incident. I'm not sure what to make of that in terms
> of a conclusion, though. It could just be slightly different
> architecture or the fact that the machine has overall less RAM (4G
> instead of 8G). The distribution is the same, and all of the versions
> of software are the same. They're both dual core AMD 64bit CPUs.
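For anyone wanting to repeat the test, the `while true; do lvcreate ; sleep 2 ; lvremove ; sleep 2 ; done` loop described above can be sketched in Python. This is a minimal sketch, not the actual script that was used. The volume group `vg0`, origin LV `origin`, snapshot name and 1G snapshot size are all hypothetical. It creates a snapshot because snapshots are what produced the errors on the server. Like the `;`-separated shell loop, it keeps going when a command fails and only counts the failure.

```python
import subprocess
import time


def snapshot_cycle_cmds(vg, origin, snap, size="1G"):
    """Return the argv lists for one create/remove cycle of a snapshot LV."""
    create = ["lvcreate", "--snapshot", "--size", size,
              "--name", snap, f"{vg}/{origin}"]
    # -f suppresses lvremove's confirmation prompt.
    remove = ["lvremove", "-f", f"{vg}/{snap}"]
    return create, remove


def stress(vg, origin, snap, duration_s=15 * 60, pause_s=2.0,
           run=subprocess.run):
    """Create and remove a snapshot repeatedly until duration_s elapses.

    Returns (cycles, failures). A non-zero exit status is counted rather
    than raised, so the loop continues just as the shell version does.
    """
    create, remove = snapshot_cycle_cmds(vg, origin, snap)
    deadline = time.monotonic() + duration_s
    cycles = failures = 0
    while time.monotonic() < deadline:
        for argv in (create, remove):
            if run(argv).returncode != 0:
                failures += 1
            time.sleep(pause_s)
        cycles += 1
    return cycles, failures
```

Run it as root on a scratch box, e.g. `stress("vg0", "origin", "stress-snap")`, with the hypothetical names replaced by a real volume group and LV.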
The RAM difference could be a significant factor. If you have less than
4G then all pages are guaranteed to be directly accessible with 32-bit
pointers and 32-bit devices, whereas with more than 4G you need to deal
with the case where the kernel thinks a page is below 4G (i.e. DMA-accessible
by a 32-bit device) but it is actually physically resident above.
I don't know if that's a specific factor in this case, but the error you
got suggested something very strange going on with unusual memory mappings.
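A concrete example helps with the "thinks it is below 4G but isn't" case. Under Xen, dom0's pseudo-physical frame numbers (pfns) are translated to machine frame numbers (mfns) through a p2m table. A pfn that looks safe for 32-bit DMA can therefore be backed by a machine frame above 4G. The toy check below assumes 4 KiB pages and uses a made-up p2m dictionary. It is only an illustration, not how the kernel actually tracks this.

```python
PAGE_SHIFT = 12          # 4 KiB pages
DMA32_LIMIT = 1 << 32    # first byte a 32-bit device cannot address


def frame_below_4g(frame):
    """True if the whole 4 KiB frame lies below the 4 GiB boundary."""
    last_byte = ((frame + 1) << PAGE_SHIFT) - 1
    return last_byte < DMA32_LIMIT


def misleading_dma32(pfn, p2m):
    """True when the pfn looks 32-bit-DMA-safe but its machine frame is not."""
    return frame_below_4g(pfn) and not frame_below_4g(p2m[pfn])


# Hypothetical p2m entries: pfn -> mfn.
p2m = {
    0x00400: 0x00400,    # both below 4G: nothing surprising
    0x20000: 0x180000,   # pfn at 512M, but machine frame at 6G
}

for pfn in sorted(p2m):
    print(hex(pfn), "->", hex(p2m[pfn]), misleading_dma32(pfn, p2m))
```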
> On a hunch, I copied the kernel config from my desktop to the
> server, recompiled with those options, booted into it, and tried
> triggering the bug. It took more than two tries this time around, but
> it became apparent pretty quickly that things weren't quite right.
> Creations and removals of snapshot volumes started causing lvm to return
> "/dev/dm-63: open failed: no such device or address" and something along
> the lines of (paraphrasing here) "unable to remove active logical
> volume" when the snapshot wasn't mounted or active anywhere, but a few
> seconds later, without changing anything, you could remove it. udev
> didn't seem to be removing the dm-?? devices from /dev, though.
What happens if you boot that system with "mem=4G" on the Xen command line?
> And the oops looks different this time around as well:
> [ 6791.053986] ------------[ cut here ]------------
> [ 6791.054160] kernel BUG at arch/x86/xen/mmu.c:1649!
So it has just allocated a new page to include in a pagetable, but it is
failing to pin it. That suggests that there's another mapping of that
page somewhere which is preventing the pin.
This means that something is leaving stray mappings of pages around
somewhere. We already deal with the standard mechanisms for doing this,
but perhaps LVM is keeping a private cache of mappings off to one side.
But I'm surprised we haven't seen anything like this before, given the
widespread use of LVM.
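A deliberately simplified model can illustrate the pinning constraint. Xen will only accept a frame as a page table if no writable mapping of it exists, so one leftover writable mapping anywhere is enough to make the pin fail. The `Frame` class and helper functions are hypothetical. They ignore most of Xen's real page-type and reference-count bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Toy stand-in for one machine frame's type state (hypothetical)."""
    writable_mappings: int = 0
    pinned: bool = False


def map_writable(frame):
    """Add a writable mapping; refused once the frame is a pinned page table."""
    if frame.pinned:
        raise PermissionError("pinned page-table frames are read-only")
    frame.writable_mappings += 1


def unmap_writable(frame):
    """Drop one writable mapping of the frame."""
    if frame.writable_mappings == 0:
        raise ValueError("no writable mapping to remove")
    frame.writable_mappings -= 1


def pin_pagetable(frame):
    """Pin succeeds only if no writable mapping of the frame remains."""
    if frame.writable_mappings > 0:
        return False
    frame.pinned = True
    return True


f = Frame()
map_writable(f)            # a stray mapping left behind somewhere
print(pin_pagetable(f))    # False: the pin is refused
unmap_writable(f)          # once the stray mapping is torn down...
print(pin_pagetable(f))    # True: ...the pin goes through
```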
> [ 6791.054418] invalid opcode: 0000 [#1] SMP
> [ 6791.054592] last sysfs file: /sys/devices/virtual/block/dm-1/removable
> [ 6791.054761] CPU 0
> [ 6791.054923] Modules linked in: dm_snapshot tun fuse xt_multiport
> nf_nat_tftp nf_conntrack_tftp nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre ntfs parport_pc parport k8temp
> floppy forcedeth [last unloaded: scsi_wait_scan]
> [ 6791.055653] Pid: 8696, comm: udevd Tainted: G W 22.214.171.124 #2
> [ 6791.055828] RIP: e030:[<ffffffff8100cc33>] [<ffffffff8100cc33>]
> [ 6791.056010] RSP: e02b:ffff88001242fdb8 EFLAGS: 00010282
> [ 6791.056010] RAX: 00000000ffffffea RBX: 000000000002af28 RCX:
> [ 6791.056010] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> [ 6791.056010] RBP: ffff88001242fdd8 R08: 00003ffffffff000 R09:
> [ 6791.056010] R10: 0000000000007ff0 R11: 000000000001b4fe R12:
> [ 6791.056010] R13: ffff880001d03010 R14: ffff88001a8e88f0 R15:
> [ 6791.056010] FS: 00007fdb8bfd57a0(0000) GS:ffff880002d6e000(0000)
> [ 6791.056010] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6791.056010] CR2: 0000000000413e41 CR3: 000000001a84c000 CR4:
> [ 6791.056010] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> [ 6791.056010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> [ 6791.056010] Process udevd (pid: 8696, threadinfo ffff88001242e000,
> task ffff880027f50000)
> [ 6791.056010] Stack:
> [ 6791.056010] ffff880000000000 000000000016f22f 0000000000000010
> [ 6791.056010] <0> ffff88001242fdf8 ffffffff8100e515 ffff8800125a6680
> [ 6791.056010] <0> ffff88001242fe08 ffffffff8100e548 ffff88001242fe48
> [ 6791.056010] Call Trace:
> [ 6791.056010] [<ffffffff8100e515>] xen_alloc_ptpage+0x66/0x6b
> [ 6791.056010] [<ffffffff8100e548>] xen_alloc_pte+0xe/0x10
> [ 6791.056010] [<ffffffff810c8ab2>] __pte_alloc+0x7e/0xf8
> [ 6791.056010] [<ffffffff810cae78>] handle_mm_fault+0xbb/0x7cb
> [ 6791.056010] [<ffffffff81582f75>] ? page_fault+0x25/0x30
> [ 6791.056010] [<ffffffff810381d1>] do_page_fault+0x273/0x28b
> [ 6791.056010] [<ffffffff81582f75>] page_fault+0x25/0x30
> [ 6791.056010] Code: ec 20 89 7d e0 48 89 f7 e8 c9 ff ff ff 48 8d 7d e0
> 48 89 45 e8 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 11 c7 ff ff 85 c0
> 74 04 <0f> 0b eb fe c9 c3 55 48 89 f8 a8 01 48 89 e5 53 74 21 48 bb ff
> [ 6791.056010] RIP [<ffffffff8100cc33>] pin_pagetable_pfn+0x31/0x37
> [ 6791.056010] RSP <ffff88001242fdb8>
> [ 6791.056010] ---[ end trace 4eaa2a86a8e2da24 ]---
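Decoding the tail of the `Code:` line is consistent with the failed pin described above. The bytes `85 c0 74 04 <0f> 0b` decode to `test %eax,%eax; je +4; ud2`. The trapping instruction is `ud2`, which is how `BUG()` is emitted on x86 (hence "invalid opcode"). It is reached because the call just before it returned non-zero. R10 = 0x7ff0 matches Xen's `DOMID_SELF`, which suggests that call was the mmuext_op hypercall. The snippet below is a small sketch that assumes RAX still holds that return value. It reads the low 32 bits of RAX as a C `int`:

```python
import errno


def as_c_int(reg):
    """Reinterpret the low 32 bits of a register value as a signed 32-bit int."""
    low = reg & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


rax = 0x00000000FFFFFFEA  # RAX from the oops above
rc = as_c_int(rax)
print(rc, errno.errorcode[-rc])  # -22 EINVAL on Linux
```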
> Some other things that I noticed... During boot, there were
> several messages that looked like this:
> udevd: worker did not accept message -1 (Connection refused) kill it
Are they atypical?
> (I may be slightly paraphrasing that)
> and this "WARNING" also appears:
> [ 0.004000] CPU: Physical Processor ID: 0
> [ 0.004000] CPU: Processor Core ID: 0
> [ 0.004015] mce: CPU supports 5 MCE banks
> [ 0.004231] Performance Events: AMD PMU driver.
> [ 0.004450] ------------[ cut here ]------------
> [ 0.004644] WARNING: at arch/x86/xen/enlighten.c:742
That's not a big concern. It's the AMD perf counter driver trying to
access the registers which Xen doesn't allow it to access.
> Any ideas, or does this look more like a bug with LVM/DM?
Possibly some unexpected Xen/LVM interaction rather than an outright bug.
> ( I've also tacked this new information, including the new kernel
> configuration onto the text file at:
> http://www.pridelands.org/~simba/hurricane-server.txt )
> I haven't tried disabling udev yet, but to be honest, I'm not even
> sure how to pull that off without really breaking things. Can I create
> and remove snapshots and logical volumes without udev on a system that's
> already kinda reliant on udev?
I think udev is the victim here, not the culprit.
> This post (and subsequent thread), made today, seems to be eerily
> similar to the problem I'm experiencing. I'm wondering if they're
Aside from udev being involved, the symptom looks quite different.
Xen-devel mailing list