Xen project Mailing List

Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg

To: Mark Hurenkamp <mark.hurenkamp@xxxxxxxxx>

From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

Date: Tue, 01 Jun 2010 09:42:16 -0700

Delivery-date: Tue, 01 Jun 2010 10:52:54 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 05/29/2010 02:43 PM, Mark Hurenkamp wrote:
>> That appears to mean that you're getting single packets which are larger
>> than 18 pages long (72k). I'm not quite sure how that's possible, since
>> I thought the datagram limit is 64k..
>>
>> Are you using nfs over udp or tcp? (I think tcp, from your stack
>> trace.)
>>
>> Does turning off tso/gso with ethtool make a difference?
>>
> Ok, i tried this on the running system, and it did seem to improve
> things, but still i'd see some (other) messages.
> After a reboot, with the new xen/stable-2.6.32.13.x based kernel
> and switching tso and gso off with ethtool, these messages are
> now completely gone (have the system up for about a day now).

Hm. I don't think disabling them should be necessary, but the only
downside in doing so is slightly higher per-packet processing cost.

>
> I do notice something else though (might have been there before,
> but now it is the only message in domU dmesg), just after starting
> nfs during boot of the domU:
>
> BUG: unable to handle kernel paging request at 00000002dcf32198
> IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
> PGD a777067 PUD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus

What device is 0000:08:02.0?

> CPU 0
> Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss
> autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types
> tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit
> cx2341x v4l2_common videodev v4l1_compat xen_fbfront
> v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core
> sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
> async_tx raid1 raid0 multipath linear
> Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1
> RIP: e030:[<ffffffff811cf09a>] [<ffffffff811cf09a>]
> bitmap_scnprintf+0x5c/0xb6
> RSP: e02b:ffff88001cbd9e18 EFLAGS: 00010246
> RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
> RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
> R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
> R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
> FS: 00007fc142b6d720(0000) GS:ffff8800046e0000(0000)
> knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task
> ffff88001ded2920)
> Stack:
> 0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
> <0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
> <0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
> Call Trace:
> [<ffffffff811dd333>] local_cpus_show+0x44/0x57
> [<ffffffff81273574>] dev_attr_show+0x22/0x49
> [<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
> [<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
> [<ffffffff810da927>] vfs_read+0xa6/0x103
> [<ffffffff810daa3a>] sys_read+0x45/0x69
> [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
> Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89
> e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49>
> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49
> RIP [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
> RSP <ffff88001cbd9e18>
> CR2: 00000002dcf32198
> ---[ end trace 5f520ed1e48e5394 ]---
>
>
> During boot of dom0 i see the following when it is starting my domU
> (seems to be more of a warning):
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.

Interesting. That looks like a bug in the core kernel's mmu notifier
machinery that we're using, but the only side-effect is that it will
disable lockdep checking.

> Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
> Call Trace:
> [<ffffffff8106a625>] __lock_acquire+0x431/0x459
> [<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
> [<ffffffff8106a6b1>] lock_acquire+0x64/0x81
> [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
> [<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
> [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
> [<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
> [<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
> [<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
> [<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
> [<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
> [<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
> [<ffffffff81257dc2>] misc_open+0x188/0x21e
> [<ffffffff810dd1f6>] chrdev_open+0x164/0x185
> [<ffffffff810dd092>] ? chrdev_open+0x0/0x185
> [<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
> [<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
> [<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
> [<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff8100eff2>] ? check_events+0x12/0x20
> [<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
> [<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
> [<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
> [<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
> [<ffffffff810d8a78>] sys_open+0x1b/0x1d
> [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
>
>
> Probably not related, i see the following message in my dom0 from time
> to time, and if it appears at the 'wrong' moment, it causes my system
> to become completely unusable as soon as a process needs disk access.
>
> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata4.00: BMDMA stat 0x64
> ata4.00: failed command: READ DMA
> ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
>          res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
> ata4.00: status: { DRDY ERR }
> ata4.00: error: { UNC }
> ata4.00: configured for UDMA/133
> ata4.01: configured for UDMA/133
> ata4: EH complete
>
> Not sure if this is related though, it could be just a bad disk (it
> seems to be always related to the same disk), i'm going to replace the
> disk, and see if that makes a difference.

That looks like a real disk error - it's getting uncorrectable read
errors.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
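
A rough sketch of the page arithmetic behind the "18 pages (72k)" versus "64k datagram limit" remark quoted at the top of this message. It assumes 4 KiB pages, and `pages_spanned` is a hypothetical helper written for illustration, not code from netfront. It only shows that 18 pages is 72 KiB, and that a 64 KiB buffer can touch 16 or 17 pages depending on where it starts within a page.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
MAX_DATAGRAM = 64 * 1024  # the 64k limit mentioned in the thread


def pages_spanned(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Count the pages touched by `length` bytes starting `offset` bytes into a page."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


# 18 pages expressed in KiB
print(18 * PAGE_SIZE // 1024)           # 72

# A page-aligned 64 KiB buffer
print(pages_spanned(0, MAX_DATAGRAM))   # 16

# The same buffer starting one byte into a page
print(pages_spanned(1, MAX_DATAGRAM))   # 17
```

Under these assumptions a single 64 KiB payload does not reach 18 pages on its own, which is consistent with the surprise expressed in the quoted text.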
