   
 
 
 
   
 

xen-users

[Xen-users] repeated kernel crashes with PCI passthru

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] repeated kernel crashes with PCI passthru
From: Csillag Kristof <csillag.kristof@xxxxxxxxx>
Date: Sun, 11 Jul 2010 00:35:08 +0200
Hi all,

I have recently upgraded one of my Debian servers from XEN 3.2 / Kernel
2.6.26 to XEN 4.0 / Kernel 2.6.32.

I have configured PCI passthru for a NIC.

Since the current Debian pvops kernel does not have the xen pci frontend
driver required for PCI passthru, I am running a XEN kernel in both dom0
and domU, so actual kernel versions are:

dom0:       2.6.32-5-xen-amd64 #1 SMP Tue Jun 1
domU:       2.6.32-5-xen-686 #1 SMP Tue Jul 6
hypervisor: 4.0.1-rc3

(Random notes:
 1. The dom0 is 64-bit, this domU is 32-bit.
 2. The dom0 kernel is not the latest (-16), but the one before (-15),
because the current one won't boot up; see #588509 and #588426.
)

   * * *

So, the system boots up as it should, but sometimes the domU crashes, with 
messages like these:

---------------------

[27047.101954] BUG: unable to handle kernel paging request at 00d90200
[27047.101979] IP: [<c11f01aa>] skb_release_data+0x71/0x90
[27047.102000] *pdpt = 0000000001c21027 *pde = 0000000000000000 
[27047.102019] Thread overran stack, or stack corrupted
[27047.102031] Oops: 0000 [#1] SMP 
[27047.102047] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[27047.102060] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp ipt_LOG 
ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state 
xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox ppp_generic slhc sundance mii 
iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle 
iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_region_hash dm_log 
dm_mod loop evdev snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore 
snd_page_alloc ext3 jbd mbcache thermal_sys xen_blkfront
[27047.102275] 
[27047.102285] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1) 
[27047.102298] EIP: 0061:[<c11f01aa>] EFLAGS: 00010206 CPU: 0
[27047.102310] EIP is at skb_release_data+0x71/0x90
[27047.102321] EAX: 00d90200 EBX: 00000000 ECX: c2939c10 EDX: cec6b500
[27047.102333] ESI: cf8f0a80 EDI: cf8f09c0 EBP: c13919c8 ESP: c1383eec
[27047.102346]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[27047.102358] Process swapper (pid: 0, ti=c1382000 task=c13c2ba0 task.ti=c13820
[27047.102371] Stack:
[27047.102379]  cf8f0a80 c293a700 c11efdfb cf8f09c0 c11f4c35 00000011 c1380000 
00000002
[27047.102415] <0> 00000008 c13919c8 c103c1ec c14594b0 00000001 0000000a 
00000000 00000100
[27047.102455] <0> c1380000 00000000 c13c5d18 00000000 c103c2c4 00000000 
c1383f5c c103c39a
[27047.102499] Call Trace:
[27047.102512]  [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.102527]  [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.102542]  [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.102557]  [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.102570]  [<c103c39a>] ? irq_exit+0x26/0x58
[27047.102586]  [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.102604]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.102630]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.102647]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.102661]  [<c10042bf>] ? xen_idle+0x23/0x30
[27047.102676]  [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.102691]  [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.102706]  [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.102721]  [<c1409045>] ? print_local_APIC+0x61/0x380
[27047.102732] Code: 8b 44 02 30 e8 9a 4f ea ff 8b 96 a4 00 00 00 0f b7 42 04 
39 c3 7c e5 8b 96 a4 00 00 00 8b 42 1c 85 c0 74 16 c7 42 1c 00 00 00 00 <8b> 18 
e8 d2 fc ff ff 85 db 74 04 89 d8 eb f1 8b 86 a8 00 00 00 
[27047.102981] EIP: [<c11f01aa>] skb_release_data+0x71/0x90 SS:ESP 0069:c1383eec
[27047.103003] CR2: 0000000000d90200
[27047.103018] ---[ end trace a577dfc0e629cd07 ]---
[27047.103028] Kernel panic - not syncing: Fatal exception in interrupt
[27047.103042] Pid: 0, comm: swapper Tainted: G      D    2.6.32-5-xen-686 #1
[27047.103053] Call Trace:
[27047.103065]  [<c128ae0d>] ? panic+0x38/0xe4
[27047.103078]  [<c128d419>] ? oops_end+0x91/0x9d
[27047.103092]  [<c1021b5a>] ? no_context+0x134/0x13d
[27047.103106]  [<c1021c78>] ? __bad_area_nosemaphore+0x115/0x11d
[27047.103121]  [<c10067f0>] ? check_events+0x8/0xc
[27047.103135]  [<c10067e7>] ? xen_restore_fl_direct_end+0x0/0x1
[27047.103155]  [<d0823fdb>] ? xennet_poll+0xaeb/0xb04 [xen_netfront]
[27047.103170]  [<c10211df>] ? pvclock_clocksource_read+0xf9/0x10f
[27047.103185]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103200]  [<c114a00f>] ? xen_swiotlb_unmap_page+0x0/0x7
[27047.103214]  [<c10067f0>] ? check_events+0x8/0xc
[27047.103227]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103242]  [<c128e3f4>] ? do_page_fault+0x115/0x307
[27047.103255]  [<c128e2df>] ? do_page_fault+0x0/0x307
[27047.103268]  [<c1021c8a>] ? bad_area_nosemaphore+0xa/0xc
[27047.103282]  [<c128cb0b>] ? error_code+0x73/0x78
[27047.103295]  [<c11f01aa>] ? skb_release_data+0x71/0x90
[27047.103308]  [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.103321]  [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.103335]  [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.103348]  [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.103361]  [<c103c39a>] ? irq_exit+0x26/0x58
[27047.103374]  [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.103388]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.103401]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.103415]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.103428]  [<c10042bf>] ? xen_idle+0x23/0x30
[27047.103440]  [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.103454]  [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.103467]  [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.103481]  [<c1409045>] ? print_local_APIC+0x61/0x380
------------------------------------------------------------------------------------
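
For anyone triaging a trace like the one above, here is a minimal sketch
that pulls the symbol+offset/size entries out of the log text. The regular
expression and the function name parse_call_trace are my own assumptions,
not part of any kernel tool. The sample lines are copied from the oops
above.

```python
import re

# Matches entries such as "[<c11efdfb>] ? __kfree_skb+0xf/0x6e".
# The leading "? " is optional; the kernel prints it for entries it
# found by scanning the stack and does not consider fully reliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?:\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_call_trace(log_text):
    """Return a list of (symbol, offset, size) tuples found in log_text.

    offset is how many bytes into the function the address lies,
    size is the total length of that function in bytes.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            symbol, offset, size = match.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames

# Three lines taken verbatim from the oops in this mail.
SAMPLE = """\
[27047.101979] IP: [<c11f01aa>] skb_release_data+0x71/0x90
[27047.102512]  [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.102527]  [<c11f4c35>] ? net_tx_action+0x58/0xf9
"""

for symbol, offset, size in parse_call_trace(SAMPLE):
    print(f"{symbol}: offset {offset:#x} of {size:#x} bytes")
```

The first tuple is the faulting instruction (skb_release_data), and the
rest is the path that led there through the TX softirq.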

Because the IRQ of the passed-through card is shared with the SATA
controller, a domU crash like this takes down the whole host, which
then needs a hardware reset.

(Sometimes this second problem also occurs when I reboot the domU
normally; see
http://lists.xensource.com/archives/html/xen-devel/2009-07/msg00224.html
for the thread about the shared IRQ problem.)
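
To check which devices share an interrupt line, one option is to scan the
contents of /proc/interrupts in dom0. The sketch below is only a
heuristic: it assumes that devices sharing a line are listed separated by
", " at the end of the row, which is how Linux usually prints them. The
function name find_shared_irqs and the sample device names (ahci, eth0)
are hypothetical and not taken from my machine.

```python
def find_shared_irqs(interrupts_text):
    """Return {irq_number: [device, ...]} for rows listing several devices.

    Heuristic: assumes shared devices appear as a ", "-separated list
    at the end of each numbered row of /proc/interrupts.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and named rows such as NMI
        parts = rest.split(", ")
        if len(parts) < 2:
            continue  # only one device on this line
        head = parts[0].split()
        if not head:
            continue  # malformed row, nothing before the first comma
        # The last token before the first comma is the first device name.
        devices = [head[-1]] + [p.strip() for p in parts[1:]]
        shared[int(label)] = devices
    return shared

# Hypothetical sample in the usual /proc/interrupts layout.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:       1234          0   IO-APIC-fasteoi   ahci, eth0
 17:         10          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0   Non-maskable interrupts
"""

print(find_shared_irqs(SAMPLE))
```

On a real system the text would come from reading /proc/interrupts
instead of the SAMPLE string. Under Xen the interrupt controller column
may look different, but the device list at the end of the row should
still be what the heuristic keys on.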

This happens once every few days, sometimes within a few hours, which
makes the whole system basically unusable.

   * * *

Does anybody have any idea what could be happening here? How can I fix this?

Thank you for your help,

    Kristof Csillag






