
Re: [Xen-devel] [Xen-users] kernel 3.9.2 - xen 4.2.2/4.3rc1 => BUG unable to handle kernel paging request netif_poll+0x49c/0xe8



Dropping Xen-users, CCing Jan. This should be discussed on Xen-devel.

On Thu, Jul 04, 2013 at 03:43:59PM +0200, Dion Kant wrote:
> Hello Wei and all other interested people,
> 
> I saw this thread from around May. It went quiet after
> your post on May 31.
> 
> Is there any progress on this problem?
> 

Sorry, no. I haven't been able to find the time, and as this is an
openSUSE kernel with forward-ported patches, it takes a bit more
time to follow the code.

Last time I asked the reporter to run a test, but he has not reported
back so far.

> I am running into this issue as well with the openSUSE 12.3
> distribution. This is with their 3.7.10-1.16-xen kernel and Xen version
> 4.2.1_12-1.12.10. On the net I see some discussion of people hitting
> this issue but not that much.  E.g., one of the symptoms is that a guest
> crashes when running zypper install or zypper update when the Internet
> connection is fast enough.
> 

Do you have references to other reports?

> openSUSE 3.4.x kernels run fine as guests on top of the openSUSE
> 12.3 Xen distribution, but apparently from 3.7.10 onwards there is
> this issue.
> 
> I have already spent quite some time trying to get a grip on the issue.
> I filed a bug at bugzilla.novell.com but got no response. See
> https://bugzilla.novell.com/show_bug.cgi?id=826374 for details.
> Apparently, hitting this bug (i.e. making it all the way to the crash)
> requires hardware that is not too slow. With
> this I mean it is easy to find hardware which is unable to reproduce the
> issue.
>

I'm not quite sure what you mean ("able" rather than "unable"?). Do you
mean this bug can only be triggered when a real hardware NIC is involved
in your receive path?

Reading your test case below, though, it doesn't seem so: per your
example, Dom0-to-DomU transmission crashes the guest.

> In one of my recent experiments I changed the SLAB allocator to SLUB,
> which provides more detailed kernel logging. Here is the log output
> after the first detected issue regarding xennet:
> 

But the log below is not about SLUB. I cannot understand why SLAB vs.
SLUB makes a difference.

> 2013-07-03T23:51:16.560229+02:00 domUA kernel: [   97.562370] netfront: Too many frags
> 2013-07-03T23:51:17.228143+02:00 domUA kernel: [   98.230466] netfront: Too many frags
> 2013-07-03T23:51:17.596074+02:00 domUA kernel: [   98.597300] netfront: Too many frags
> 2013-07-03T23:51:18.740215+02:00 domUA kernel: [   99.743080] net_ratelimit: 2 callbacks suppressed
> 2013-07-03T23:51:18.740242+02:00 domUA kernel: [   99.743084] netfront: Too many frags
> 2013-07-03T23:51:19.104100+02:00 domUA kernel: [  100.104281] netfront: Too many frags
> 2013-07-03T23:51:19.760134+02:00 domUA kernel: [  100.760594] netfront: Too many frags
> 2013-07-03T23:51:21.820154+02:00 domUA kernel: [  102.821202] netfront: Too many frags
> 2013-07-03T23:51:22.192188+02:00 domUA kernel: [  103.192655] netfront: Too many frags
> 2013-07-03T23:51:26.060144+02:00 domUA kernel: [  107.062447] netfront: Too many frags
> 2013-07-03T23:51:26.412116+02:00 domUA kernel: [  107.415165] netfront: Too many frags
> 2013-07-03T23:51:27.092147+02:00 domUA kernel: [  108.094615] netfront: Too many frags
> 2013-07-03T23:51:27.492112+02:00 domUA kernel: [  108.494255] netfront: Too many frags
> 2013-07-03T23:51:27.520194+02:00 domUA kernel: [  108.522445]

"Too many frags" means your frontend is generating malformed packets.
This is not normal. Also, you are apparently not using the latest kernel
in the tree, because in the latest openSUSE kernel the log message
should be "Too many slots".
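The idea behind this warning can be sketched roughly as follows: one packet may occupy several consecutive ring slots, and the receiver refuses a packet that needs more slots than a single skb can hold as fragments. This is a simplified user-space model, not the netfront code: `struct rx_response`, its `more_data` field and `count_packet_slots()` are hypothetical, `MAX_SKB_FRAGS` is assumed to be 17, and allowing one extra slot for the linear part is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value (17 on x86 with 4 KiB pages at the time). */
#define MAX_SKB_FRAGS 17
/* Assumption of this model: one slot for the linear part plus one per frag. */
#define MAX_SLOTS_PER_PACKET (MAX_SKB_FRAGS + 1)

/* Hypothetical, simplified RX response: only the "packet continues in
 * the next slot" flag is modelled. */
struct rx_response {
    bool more_data;
};

/* Count the ring slots used by the packet that starts at ring[start].
 * Returns the slot count, or -1 if the packet needs more slots than one
 * skb can hold (what a "Too many frags"/"Too many slots" style warning
 * reports) or has no terminating slot before the end of the ring. */
static int count_packet_slots(const struct rx_response *ring, size_t len,
                              size_t start)
{
    int slots = 0;

    assert(ring != NULL);
    for (size_t i = start; i < len; i++) {
        if (++slots > MAX_SLOTS_PER_PACKET)
            return -1;      /* too many slots for a single skb */
        if (!ring[i].more_data)
            return slots;   /* last slot of this packet */
    }
    return -1;              /* ran out of responses: malformed */
}
```

A packet of exactly `MAX_SLOTS_PER_PACKET` slots is accepted; one more slot makes the function return -1, which is the point at which the real driver would warn and drop the packet.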

> =============================================================================
> 2013-07-03T23:51:27.520206+02:00 domUA kernel: [  108.522448] BUG kmalloc-1024 (Tainted: G        W   ): Redzone overwritten
> 2013-07-03T23:51:27.520209+02:00 domUA kernel: [  108.522450] -----------------------------------------------------------------------------
> 2013-07-03T23:51:27.520212+02:00 domUA kernel: [  108.522450]
> 2013-07-03T23:51:27.520215+02:00 domUA kernel: [  108.522452] Disabling lock debugging due to kernel taint
> 2013-07-03T23:51:27.520217+02:00 domUA kernel: [  108.522454] INFO: 0xffff8800f66068f8-0xffff8800f66068ff. First byte 0x0 instead of 0xcc
> 2013-07-03T23:51:27.520220+02:00 domUA kernel: [  108.522461] INFO: Allocated in __alloc_skb+0x88/0x260 age=11 cpu=0 pid=1325
> 2013-07-03T23:51:27.520223+02:00 domUA kernel: [  108.522466]  set_track+0x6c/0x190
> 2013-07-03T23:51:27.520225+02:00 domUA kernel: [  108.522470]  alloc_debug_processing+0x83/0x109
> 2013-07-03T23:51:27.520228+02:00 domUA kernel: [  108.522472]  __slab_alloc.constprop.48+0x523/0x593
> 2013-07-03T23:51:27.520231+02:00 domUA kernel: [  108.522474]  __kmalloc_track_caller+0xb4/0x200
> 2013-07-03T23:51:27.520233+02:00 domUA kernel: [  108.522477]  __kmalloc_reserve+0x3c/0xa0
> 2013-07-03T23:51:27.520236+02:00 domUA kernel: [  108.522478]  __alloc_skb+0x88/0x260
> 2013-07-03T23:51:27.520239+02:00 domUA kernel: [  108.522483]  network_alloc_rx_buffers+0x76/0x5f0 [xennet]
> 2013-07-03T23:51:27.520241+02:00 domUA kernel: [  108.522486]  netif_poll+0xcf4/0xf30 [xennet]
> 2013-07-03T23:51:27.520243+02:00 domUA kernel: [  108.522489]  net_rx_action+0xf0/0x2e0
> 2013-07-03T23:51:27.520246+02:00 domUA kernel: [  108.522493]  __do_softirq+0x127/0x280
> 2013-07-03T23:51:27.520248+02:00 domUA kernel: [  108.522496]  call_softirq+0x1c/0x30
> 2013-07-03T23:51:27.520251+02:00 domUA kernel: [  108.522499]  do_softirq+0x56/0xd0
> 2013-07-03T23:51:27.520253+02:00 domUA kernel: [  108.522501]  irq_exit+0x52/0xd0
> 2013-07-03T23:51:27.520256+02:00 domUA kernel: [  108.522503]  evtchn_do_upcall+0x281/0x2e7
> 2013-07-03T23:51:27.520258+02:00 domUA kernel: [  108.522505]  do_hypervisor_callback+0x1e/0x30
> 2013-07-03T23:51:27.520261+02:00 domUA kernel: [  108.522507]  0x7f45f0a2f1e0
> 2013-07-03T23:51:27.520263+02:00 domUA kernel: [  108.522509] INFO: Freed in skb_free_head+0x5c/0x70 age=14 cpu=0 pid=1325
> 2013-07-03T23:51:27.520266+02:00 domUA kernel: [  108.522512]  set_track+0x6c/0x190
> 2013-07-03T23:51:27.520269+02:00 domUA kernel: [  108.522513]  free_debug_processing+0x151/0x201
> 2013-07-03T23:51:27.520271+02:00 domUA kernel: [  108.522515]  __slab_free+0x47/0x499
> 2013-07-03T23:51:27.520274+02:00 domUA kernel: [  108.522517]  kfree+0x1df/0x230
> 2013-07-03T23:51:27.520276+02:00 domUA kernel: [  108.522519]  skb_free_head+0x5c/0x70
> 2013-07-03T23:51:27.520279+02:00 domUA kernel: [  108.522521]  skb_release_data+0xea/0xf0
> 2013-07-03T23:51:27.520281+02:00 domUA kernel: [  108.522522]  __kfree_skb+0x1e/0xb0
> 2013-07-03T23:51:27.520284+02:00 domUA kernel: [  108.522524]  kfree_skb+0x80/0xc0
> 2013-07-03T23:51:27.520286+02:00 domUA kernel: [  108.522527]  netif_poll+0x824/0xf30 [xennet]
> 2013-07-03T23:51:27.520289+02:00 domUA kernel: [  108.522529]  net_rx_action+0xf0/0x2e0
> 2013-07-03T23:51:27.520291+02:00 domUA kernel: [  108.522530]  __do_softirq+0x127/0x280
> 2013-07-03T23:51:27.520294+02:00 domUA kernel: [  108.522532]  call_softirq+0x1c/0x30
> 2013-07-03T23:51:27.520296+02:00 domUA kernel: [  108.522534]  do_softirq+0x56/0xd0
> 2013-07-03T23:51:27.520299+02:00 domUA kernel: [  108.522536]  irq_exit+0x52/0xd0
> 2013-07-03T23:51:27.520302+02:00 domUA kernel: [  108.522538]  evtchn_do_upcall+0x281/0x2e7
> 2013-07-03T23:51:27.520304+02:00 domUA kernel: [  108.522539]  do_hypervisor_callback+0x1e/0x30
> 2013-07-03T23:51:27.520307+02:00 domUA kernel: [  108.522541] INFO: Slab 0xffff8800ffd78100 objects=12 used=7 fp=0xffff8800f66074d0 flags=0x400000000000408
> 2013-07-03T23:51:27.520310+02:00 domUA kernel: [  108.522543] INFO: Object 0xffff8800f66064f8 @offset=9464 fp=0x0000018800000000
> 2013-07-03T23:51:27.520312+02:00 domUA kernel: [  108.522543]
> 2013-07-03T23:51:27.520315+02:00 domUA kernel: [  108.522546] Bytes b4 ffff8800f66064e8: 4a 40 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  J@......ZZZZZZZZ
> 2013-07-03T23:51:27.520318+02:00 domUA kernel: [  108.522548] Object ffff8800f66064f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 2013-07-03T23:51:27.520320+02:00 domUA kernel: [  108.522549] Object ffff8800f6606508: 00 16 3e 29 7e 3c 00 25 90 69 ea 4e 08 00 45 08  ..>)~<.%.i.N..E.
> 2013-07-03T23:51:27.520323+02:00 domUA kernel: [  108.522551] Object ffff8800f6606518: fe bc 46 d7 40 00 40 06 d3 69 0a 57 06 91 0a 57  ..F.@.@..i.W...W
> 2013-07-03T23:51:27.520326+02:00 domUA kernel: [  108.522553] Object ffff8800f6606528: 06 b4 9b 86 00 16 57 4d 5e bd 89 4c 40 ad 80 10  ......WM^..L@...
> 2013-07-03T23:51:27.520329+02:00 domUA kernel: [  108.522554] Object ffff8800f6606538: 00 a6 20 a2 00 00 01 01 08 0a 01 eb 40 a7 ff ff  .. .........@...
> 2013-07-03T23:51:27.520332+02:00 domUA kernel: [  108.522556] Object ffff8800f6606548: 44 fa 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  D.kkkkkkkkkkkkkk
> 2013-07-03T23:51:27.520335+02:00 domUA kernel: [  108.522557] Object ffff8800f6606558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 2013-07-03T23:51:27.520337+02:00 domUA kernel: [  108.522559] Object ffff8800f6606568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 
> Skipping some of the object dumping.......
> 
> 2013-07-03T23:51:27.520583+02:00 domUA kernel: [  108.522644] Object ffff8800f66068d8: 00 d7 e4 ff 00 88 ff ff 00 00 00 00 00 10 00 00  ................
> 2013-07-03T23:51:27.520586+02:00 domUA kernel: [  108.522646] Object ffff8800f66068e8: 00 92 dd ff 00 88 ff ff 00 00 00 00 88 01 00 00  ................
> 2013-07-03T23:51:27.520588+02:00 domUA kernel: [  108.522647] Redzone ffff8800f66068f8: 00 92 dd ff 00 88 ff ff                          ........
> 2013-07-03T23:51:27.520591+02:00 domUA kernel: [  108.522649] Padding ffff8800f6606a38: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
> 2013-07-03T23:51:27.520594+02:00 domUA kernel: [  108.522651] Pid: 1325, comm: sshd Tainted: G    B   W    3.7.10-1.16-dbg-xen #3
> 2013-07-03T23:51:27.520597+02:00 domUA kernel: [  108.522652] Call Trace:
> 2013-07-03T23:51:27.520599+02:00 domUA kernel: [  108.522658] [<ffffffff8000b097>] try_stack_unwind+0x87/0x1c0
> 2013-07-03T23:51:27.520602+02:00 domUA kernel: [  108.522662] [<ffffffff80008fa5>] dump_trace+0xd5/0x250
> 2013-07-03T23:51:27.520605+02:00 domUA kernel: [  108.522665] [<ffffffff8000b22c>] show_trace_log_lvl+0x5c/0x80
> 2013-07-03T23:51:27.520608+02:00 domUA kernel: [  108.522668] [<ffffffff8000b265>] show_trace+0x15/0x20
> 2013-07-03T23:51:27.520610+02:00 domUA kernel: [  108.522672] [<ffffffff80553a69>] dump_stack+0x77/0x80
> 2013-07-03T23:51:27.520612+02:00 domUA kernel: [  108.522676] [<ffffffff801491b1>] print_trailer+0x131/0x140
> 2013-07-03T23:51:27.520615+02:00 domUA kernel: [  108.522680] [<ffffffff80149709>] check_bytes_and_report+0xc9/0x120
> 2013-07-03T23:51:27.520617+02:00 domUA kernel: [  108.522683] [<ffffffff8014a7f6>] check_object+0x56/0x240
> 2013-07-03T23:51:27.520620+02:00 domUA kernel: [  108.522687] [<ffffffff805575b6>] free_debug_processing+0xc4/0x201
> 2013-07-03T23:51:27.520622+02:00 domUA kernel: [  108.522690] [<ffffffff8055773a>] __slab_free+0x47/0x499
> 2013-07-03T23:51:27.520625+02:00 domUA kernel: [  108.522694] [<ffffffff8014beff>] kfree+0x1df/0x230
> 2013-07-03T23:51:27.520627+02:00 domUA kernel: [  108.522697] [<ffffffff8044a8cc>] skb_free_head+0x5c/0x70
> 2013-07-03T23:51:27.520630+02:00 domUA kernel: [  108.522701] [<ffffffff8044a9ca>] skb_release_data+0xea/0xf0
> 2013-07-03T23:51:27.520632+02:00 domUA kernel: [  108.522704] [<ffffffff8044a9ee>] __kfree_skb+0x1e/0xb0
> 2013-07-03T23:51:27.520635+02:00 domUA kernel: [  108.522709] [<ffffffff8049fa2a>] tcp_recvmsg+0x99a/0xd50
> 2013-07-03T23:51:27.520637+02:00 domUA kernel: [  108.522714] [<ffffffff804c796d>] inet_recvmsg+0xed/0x110
> 2013-07-03T23:51:27.520640+02:00 domUA kernel: [  108.522718] [<ffffffff80440be8>] sock_aio_read+0x158/0x190
> 2013-07-03T23:51:27.520642+02:00 domUA kernel: [  108.522722] [<ffffffff8015cb68>] do_sync_read+0x98/0xf0
> 2013-07-03T23:51:27.520645+02:00 domUA kernel: [  108.522726] [<ffffffff8015d32d>] vfs_read+0xbd/0x180
> 2013-07-03T23:51:27.520647+02:00 domUA kernel: [  108.522729] [<ffffffff8015d442>] sys_read+0x52/0xa0
> 2013-07-03T23:51:27.520650+02:00 domUA kernel: [  108.522733] [<ffffffff8056ab3b>] system_call_fastpath+0x1a/0x1f
> 2013-07-03T23:51:27.520652+02:00 domUA kernel: [  108.522736] [<00007f45ef74c960>] 0x7f45ef74c95f
> 2013-07-03T23:51:27.520655+02:00 domUA kernel: [  108.522738] FIX kmalloc-1024: Restoring 0xffff8800f66068f8-0xffff8800f66068ff=0xcc
> 2013-07-03T23:51:27.520657+02:00 domUA kernel: [  108.522738]
> 2013-07-03T23:51:27.679444+02:00 domUA kernel: [  108.671750] =============================================================================
> 2013-07-03T23:51:27.679454+02:00 domUA kernel: [  108.671753] BUG kmalloc-1024 (Tainted: G    B   W   ): Redzone overwritten
> 2013-07-03T23:51:27.679456+02:00 domUA kernel: [  108.671754] -----------------------------------------------------------------------------
> 2013-07-03T23:51:27.679458+02:00 domUA kernel: [  108.671754]
> 2013-07-03T23:51:27.679460+02:00 domUA kernel: [  108.671757] INFO: 0xffff8800f66068f8-0xffff8800f66068ff. First byte 0xcc instead of 0xbb
> 2013-07-03T23:51:27.679462+02:00 domUA kernel: [  108.671762] INFO: Allocated in __alloc_skb+0x88/0x260 age=48 cpu=0 pid=1325
> 2013-07-03T23:51:27.679464+02:00 domUA kernel: [  108.671765]  set_track+0x6c/0x190
> 2013-07-03T23:51:27.679466+02:00 domUA kernel: [  108.671767]  alloc_debug_processing+0x83/0x109
> 2013-07-03T23:51:27.679468+02:00 domUA kernel: [  108.671769]  __slab_alloc.constprop.48+0x523/0x593
> 2013-07-03T23:51:27.679469+02:00 domUA kernel: [  108.671771]  __kmalloc_track_caller+0xb4/0x200
> 2013-07-03T23:51:27.679471+02:00 domUA kernel: [  108.671773]  __kmalloc_reserve+0x3c/0xa0
> 2013-07-03T23:51:27.679473+02:00 domUA kernel: [  108.671775]  __alloc_skb+0x88/0x260
> 2013-07-03T23:51:27.679475+02:00 domUA kernel: [  108.671778]  network_alloc_rx_buffers+0x76/0x5f0 [xennet]
> 2013-07-03T23:51:27.679476+02:00 domUA kernel: [  108.671781]  netif_poll+0xcf4/0xf30 [xennet]
> 2013-07-03T23:51:27.679478+02:00 domUA kernel: [  108.671783]  net_rx_action+0xf0/0x2e0
> 

Seems like there's memory corruption in guest RX path.
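The SLUB report above comes from its red-zone check: each object is followed by guard bytes holding a known value (0xcc while the object is allocated, 0xbb while it is free; SLUB_RED_ACTIVE and SLUB_RED_INACTIVE in the kernel), and the allocator verifies them, here on the kfree() path. In this report the object starts at 0xffff8800f66064f8 and the red zone at 0xffff8800f66068f8, exactly 1024 bytes later, so something wrote 8 bytes just past the end of the kmalloc-1024 buffer. A minimal user-space sketch of the mechanism follows; `struct dbg_obj`, `dbg_alloc()`, `dbg_check_redzone()` and the 8-byte red-zone length are assumptions chosen to mirror the report, not SLUB's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RED_ACTIVE  0xcc  /* SLUB's red-zone byte for an allocated object */
#define REDZONE_LEN 8     /* assumption: 8 guard bytes, as in the report */

/* Hypothetical debug object: `size` payload bytes followed by the red zone. */
struct dbg_obj {
    size_t size;
    unsigned char *mem;
};

/* Allocate the payload plus red zone and paint the red zone. */
static int dbg_alloc(struct dbg_obj *o, size_t size)
{
    o->mem = malloc(size + REDZONE_LEN);
    if (o->mem == NULL)
        return -1;
    o->size = size;
    memset(o->mem + size, RED_ACTIVE, REDZONE_LEN);
    return 0;
}

/* Return -1 if the red zone is intact, otherwise the index of the first
 * red-zone byte that no longer holds RED_ACTIVE (compare SLUB's
 * "First byte 0x0 instead of 0xcc"). */
static long dbg_check_redzone(const struct dbg_obj *o)
{
    assert(o->mem != NULL);
    for (size_t i = 0; i < REDZONE_LEN; i++)
        if (o->mem[o->size + i] != RED_ACTIVE)
            return (long)i;
    return -1;
}

static void dbg_free(struct dbg_obj *o)
{
    free(o->mem);
    o->mem = NULL;
}
```

Filling the 1024-byte payload leaves the red zone intact; writing eight zero bytes at offset `size` makes the check report index 0, the same symptom as "First byte 0x0 instead of 0xcc" above.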

> I noticed that after turning on all this debugging stuff, a real panic
> does not appear any more.
> 
> This happens while copying a file with scp from dom0 to the guest (scp
> bigfile domu:/dev/null).
> 
> In my lab, I am currently experimenting with a SuperMicro based system
> with Xen showing the following characteristics:
> 
> __  __            _  _    ____    _     _ ____     _   _ ____    _  ___ 
>  \ \/ /___ _ __   | || |  |___ \  / |   / |___ \   / | / |___ \  / |/ _ \
>   \  // _ \ '_ \  | || |_   __) | | |   | | __) |__| | | | __) | | | | | |
>   /  \  __/ | | | |__   _| / __/ _| |   | |/ __/|__| |_| |/ __/ _| | |_| |
>  /_/\_\___|_| |_|    |_|(_)_____(_)_|___|_|_____|  |_(_)_|_____(_)_|\___/
>                                    |_____|                               
> (XEN) Xen version 4.2.1_12-1.12.10 (abuild@) (gcc (SUSE Linux) 4.7.2 20130108 [gcc-4_7-branch revision 195012]) Wed May 29 20:31:49 UTC 2013
> (XEN) Latest ChangeSet: 25952
> (XEN) Bootloader: GNU GRUB 0.97
> (XEN) Command line: dom0_mem=2048M,max:2048M loglvl=all guest_loglvl=all
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
> (XEN) Disc information:
> (XEN)  Found 4 MBR signatures
> (XEN)  Found 4 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)  0000000000000000 - 0000000000096400 (usable)
> (XEN)  0000000000096400 - 00000000000a0000 (reserved)
> (XEN)  00000000000e0000 - 0000000000100000 (reserved)
> (XEN)  0000000000100000 - 00000000bf780000 (usable)
> (XEN)  00000000bf78e000 - 00000000bf790000 type 9
> (XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
> (XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
> (XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
> (XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
> (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
> (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
> (XEN)  00000000ffc00000 - 0000000100000000 (reserved)
> (XEN)  0000000100000000 - 0000000340000000 (usable)
> 
> Skipping ACPI and SRAT
> 
> (XEN) System RAM: 12279MB (12573784kB)
> 
> (XEN) NUMA: Allocated memnodemap from 33e38a000 - 33e38e000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Domain heap initialised DMA width 30 bits
> (XEN) found SMP MP-table at 000ff780
> (XEN) DMI present.
> 
> (XEN) Enabling APIC mode:  Phys.  Using 2 I/O APICs
> (XEN) ACPI: HPET id: 0x8086a301 base: 0xfed00000
> (XEN) Failed to get Error Log Address Range.
> (XEN) Using ACPI (MADT) for SMP configuration information
> (XEN) SMP: Allowing 24 CPUs (8 hotplug CPUs)
> (XEN) IRQ limits: 48 GSI, 3040 MSI/MSI-X
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Detected 2400.115 MHz processor.
> (XEN) Initing memory sharing.
> (XEN) mce_intel.c:1238: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
> (XEN) Intel machine check reporting enabled
> (XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
> (XEN) PCI: MCFG area at e0000000 reserved in E820
> (XEN) PCI: Using MCFG for segment 0000 bus 00-ff
> (XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
> (XEN) Intel VT-d Snoop Control enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping enabled.
> (XEN) Intel VT-d Shared EPT tables not enabled.
> (XEN) I/O virtualisation enabled
> (XEN)  - Dom0 mode: Relaxed
> (XEN) Interrupt remapping enabled
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) ENABLING IO-APIC IRQs
> (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
> (XEN) Platform timer is 14.318MHz HPET
> (XEN) Allocated console ring of 128 KiB.
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Extended Page Tables (EPT)
> (XEN)  - Virtual-Processor Identifiers (VPID)
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN)  - Unrestricted Guest
> (XEN) HVM: ASIDs enabled.
> (XEN) HVM: VMX enabled
> (XEN) HVM: Hardware Assisted Paging (HAP) detected
> (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
> (XEN) Brought up 16 CPUs
> (XEN) ACPI sleep modes: S3
> (XEN) mcheck_poll: Machine check polling timer started.
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN)  Xen  kernel: 64-bit, lsb, compat32
> (XEN)  Dom0 kernel: 64-bit, lsb, paddr 0x2000 -> 0xa65000
> (XEN) PHYSICAL MEMORY ARRANGEMENT:
> (XEN)  Dom0 alloc.:   0000000336000000->0000000337000000 (516915 pages to be allocated)
> (XEN)  Init. ramdisk: 000000033f333000->0000000340000000
> (XEN) VIRTUAL MEMORY ARRANGEMENT:
> (XEN)  Loaded kernel: ffffffff80002000->ffffffff80a65000
> (XEN)  Init. ramdisk: 0000000000000000->0000000000000000
> (XEN)  Phys-Mach map: ffffea0000000000->ffffea0000400000
> (XEN)  Start info:    ffffffff80a65000->ffffffff80a654b4
> (XEN)  Page tables:   ffffffff80a66000->ffffffff80a6f000
> (XEN)  Boot stack:    ffffffff80a6f000->ffffffff80a70000
> (XEN)  TOTAL:         ffffffff80000000->ffffffff80c00000
> (XEN)  ENTRY ADDRESS: ffffffff80002000
> (XEN) Dom0 has maximum 16 VCPUs
> (XEN) Scrubbing Free RAM:
> .....................................................................................................done.
> (XEN) Initial low memory virq threshold set at 0x4000 pages.
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) Xen is relinquishing VGA console.
> 
> 
> 
> (XEN) ACPI: RSDP 000FACE0, 0024 (r2 ACPIAM)
> (XEN) ACPI: XSDT BF790100, 008C (r1 SMCI            20110827 MSFT       97)
> (XEN) ACPI: FACP BF790290, 00F4 (r4 082711 FACP1638 20110827 MSFT       97)
> (XEN) ACPI: DSDT BF7906A0, 6563 (r2  10600 10600000        0 INTL 20051117)
> (XEN) ACPI: FACS BF79E000, 0040
> (XEN) ACPI: APIC BF790390, 011E (r2 082711 APIC1638 20110827 MSFT       97)
> (XEN) ACPI: MCFG BF7904B0, 003C (r1 082711 OEMMCFG  20110827 MSFT       97)
> (XEN) ACPI: SLIT BF7904F0, 0030 (r1 082711 OEMSLIT  20110827 MSFT       97)
> (XEN) ACPI: OEMB BF79E040, 0085 (r1 082711 OEMB1638 20110827 MSFT       97)
> (XEN) ACPI: SRAT BF79A6A0, 01D0 (r2 082711 OEMSRAT         1 INTL        1)
> (XEN) ACPI: HPET BF79A870, 0038 (r1 082711 OEMHPET  20110827 MSFT       97)
> (XEN) ACPI: DMAR BF79E0D0, 0130 (r1    AMI  OEMDMAR        1 MSFT       97)
> (XEN) ACPI: SSDT BF7A1B30, 0363 (r1 DpgPmm    CpuPm       12 INTL 20051117)
> (XEN) ACPI: EINJ BF79A8B0, 0130 (r1  AMIER AMI_EINJ 20110827 MSFT       97)
> (XEN) ACPI: BERT BF79AA40, 0030 (r1  AMIER AMI_BERT 20110827 MSFT       97)
> (XEN) ACPI: ERST BF79AA70, 01B0 (r1  AMIER AMI_ERST 20110827 MSFT       97)
> (XEN) ACPI: HEST BF79AC20, 00A8 (r1  AMIER ABC_HEST 20110827 MSFT       97)
> (XEN) System RAM: 12279MB (12573784kB)
> (XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 2 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 18 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 20 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 1 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 3 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 19 -> Node 0
> (XEN) SRAT: PXM 0 -> APIC 21 -> Node 0
> 
> I am happy to assist in more kernel probing. It is even possible for me
> to setup access for someone to this machine.
> 

Excellent. Last time, Jan suspected that we might be overrunning the frag
list of an skb (which would corrupt memory), but this has not been verified.
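To make the suspicion concrete: an skb's fragment descriptors live in a fixed-size array, `frags[MAX_SKB_FRAGS]`, which is the last member of the shared-info area placed at the end of the kmalloc'ed data buffer, so writing one descriptor too many runs off the end of that buffer and can land in the SLUB red zone that follows it. (The last bytes of the object dump and the overwritten red zone above do look like page-pointer/offset/size descriptors, but that is an inference, not a verified diagnosis.) The sketch below is a user-space model with simplified, hypothetical types (`struct frag`, `struct shared_info`, `fill_frag()`) and an assumed `MAX_SKB_FRAGS` of 17; it shows the kind of bounds check that turns such an overrun into an immediate failure instead of silent corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SKB_FRAGS 17  /* assumed value (x86, 4 KiB pages, at the time) */

/* Simplified stand-in for skb_frag_t: page pointer, offset, size. */
struct frag {
    void *page;
    uint32_t offset;
    uint32_t size;
};

/* Simplified stand-in for the shared info at the end of the skb data
 * buffer. As in the kernel, frags[] is the last member, so a write to
 * frags[MAX_SKB_FRAGS] would go past the end of the structure. */
struct shared_info {
    unsigned char nr_frags;
    struct frag frags[MAX_SKB_FRAGS];
};

/* Append one fragment descriptor. Returns 0 on success, or -1 (writing
 * nothing) when the array is already full; the patch below uses
 * BUG_ON() on the same condition to stop the kernel immediately. */
static int fill_frag(struct shared_info *si, void *page,
                     uint32_t offset, uint32_t size)
{
    assert(si != NULL);
    if (si->nr_frags >= MAX_SKB_FRAGS)
        return -1;
    si->frags[si->nr_frags] = (struct frag){ page, offset, size };
    si->nr_frags++;
    return 0;
}
```

Returning an error suits a user-space model; for hunting the corruption, crashing at the first offending packet (as BUG_ON() does) gives a backtrace at the exact point of the overrun rather than at a later kfree().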

I also skimmed your bug report on Novell Bugzilla, which does suggest
memory corruption.

I wrote a patch that crashes the kernel immediately if we would overrun
the frag list while looping over it; perhaps we could start from there?
(You might need to adjust the context, but it is only a one-liner, so
that should be easy.)


Wei.

======
diff --git a/drivers/xen/netfront/netfront.c b/drivers/xen/netfront/netfront.c
index 6e5d233..9583011 100644
--- a/drivers/xen/netfront/netfront.c
+++ b/drivers/xen/netfront/netfront.c
@@ -1306,6 +1306,7 @@ static RING_IDX xennet_fill_frags(struct netfront_info *np,
        struct sk_buff *nskb;

        while ((nskb = __skb_dequeue(list))) {
                struct netif_rx_response *rx =
                        RING_GET_RESPONSE(&np->rx, ++cons);
+               BUG_ON(nr_frags >= MAX_SKB_FRAGS);


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

