xen-devel
RE: [Xen-devel] Re: VM hung after running sometime
To: <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] Re: VM hung after running sometime
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Mon, 27 Sep 2010 19:56:39 +0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, keir.fraser@xxxxxxxxxxxxx
Delivery-date: Mon, 27 Sep 2010 05:08:49 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <BAY121-W448D51FD3B4758F954EED9DA630@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C8BE230D.239BA%keir.fraser@xxxxxxxxxxxxx>, <4C98EB42.4020808@xxxxxxxx>, <BAY121-W10DFCBC2F3B78D89381527DA600@xxxxxxx>, <4C994B08.7050509@xxxxxxxx>, <BAY121-W688EF9F79369127219FB3DA600@xxxxxxx>, <4C9A4B7A.3010308@xxxxxxxx>, <BAY121-W31394CA05B46D94F7DC8F8DA610@xxxxxxx>, <4C9BE0A9.40709@xxxxxxxx>, <BAY121-W448D51FD3B4758F954EED9DA630@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Jeremy:

About the NIC crash: it turned out to be a problem in our NIC driver.
The crash no longer shows up after the driver was upgraded.
The test with irqbalance disabled is running smoothly so far.

Meanwhile, we merged your patch into our current kernel (2.6.31) and started the test.
Unfortunately, one of the VMs hung a few minutes after it started.
But this time an abnormal kernel backtrace was logged in /var/log/messages.
I wonder whether the patch is compatible with our current kernel, or whether some extra modifications are needed.
Considering the good result of the irqbalance-disabled test, I'm afraid I may have made some mistakes in the patch
merge, since I'm new to git (-_-!!).
So I attached the merged patch (only events.c); could you help review it? :-)

Thanks for your time.

Kernel backtrace below:
---------------------------------------------------------------------------------------------------------------------
Sep 27 18:36:10 pc1 kernel: ------------[ cut here ]------------
WARNING: at net/core/skbuff.c:475 skb_release_head_state+0x71/0xf8()
Hardware name: PowerEdge R710
Pid: 0, comm: swapper Tainted: G W 2.6.31.13xen #4
Call Trace:
<IRQ> [<ffffffff8136c751>] ? skb_release_head_state+0x71/0xf8
[<ffffffff810535ba>] warn_slowpath_common+0x7c/0x94
[<ffffffff810535e6>] warn_slowpath_null+0x14/0x16
[<ffffffff8136c751>] skb_release_head_state+0x71/0xf8
[<ffffffff8136c7ee>] skb_release_all+0x16/0x22
[<ffffffff8136c837>] __kfree_skb+0x16/0x84
[<ffffffff8136c8d2>] consume_skb+0x2d/0x2f
[<ffffffffa0069aab>] bnx2_poll_work+0x1b7/0xa0f [bnx2]
[<ffffffff81260f00>] ? HYPERVISOR_event_channel_op+0x1a/0x4d
[<ffffffff8126102a>] ? unmask_evtchn+0x4f/0xa3
[<ffffffff8100eb71>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8100f292>] ? check_events+0x12/0x20
[<ffffffff81414226>] ? _spin_lock_irqsave+0x1e/0x37
[<ffffffff8100f27f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffffa006c8a7>] bnx2_poll_msix+0x38/0x92 [bnx2]
[<ffffffff81382eaf>] netpoll_poll+0xa3/0x38f
[<ffffffff810ec08b>] ? __kmalloc_track_caller+0x11a/0x12c
[<ffffffff8100f27f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff813832b4>] netpoll_send_skb+0x119/0x1f7
[<ffffffff8138360d>] netpoll_send_udp+0x1e4/0x1f1
[<ffffffffa021d18f>] write_msg+0x8d/0xd2 [netconsole]
[<ffffffff8100f27f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff81053937>] __call_console_drivers+0x6c/0x7e
[<ffffffff810539a9>] _call_console_drivers+0x60/0x64
[<ffffffff81414226>] ? _spin_lock_irqsave+0x1e/0x37
[<ffffffff81053df2>] release_console_sem+0x11a/0x19c
[<ffffffff810543b9>] vprintk+0x2e1/0x31a
[<ffffffff8100f1a5>] ? xen_clocksource_get_cycles+0x9/0x1c
[<ffffffff8100f0d6>] ? xen_clocksource_read+0x21/0x23
[<ffffffff8100eb71>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff81054499>] printk+0xa7/0xa9
[<ffffffff8100f27f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff8100f0d6>] ? xen_clocksource_read+0x21/0x23
[<ffffffff8100f1a5>] ? xen_clocksource_get_cycles+0x9/0x1c
[<ffffffff81070fcf>] ? clocksource_read+0xf/0x11
[<ffffffff81071695>] ? getnstimeofday+0x5b/0xbb
[<ffffffff8126125d>] ? cpumask_next+0x1e/0x20
[<ffffffff812627c1>] xen_debug_interrupt+0x256/0x289
[<ffffffff81098276>] handle_IRQ_event+0x66/0x120
[<ffffffff81099947>] handle_percpu_irq+0x41/0x6e
[<ffffffff812624dd>] xen_evtchn_do_upcall+0x102/0x190
[<ffffffff81014fbe>] xen_do_hypervisor_callback+0x1e/0x30
<EOI> [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
[<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
[<ffffffff8100ebb7>] ? xen_safe_halt+0x10/0x1a
[<ffffffff8100c0f5>] ? xen_idle+0x3b/0x52
[<ffffffff81012c9d>] ? cpu_idle+0x5d/0x8c
[<ffffffff8140aaa3>] ? cpu_bringup_and_idle+0x13/0x15
---[ end trace d83eb1ebe87fed96 ]---
From: tinnycloud@xxxxxxxxxxx
To: jeremy@xxxxxxxx
CC: xen-devel@xxxxxxxxxxxxxxxxxxx; keir.fraser@xxxxxxxxxxxxx
Subject: RE: [Xen-devel] Re: VM hung after running sometime
Date: Sat, 25 Sep 2010 17:33:23 +0800

Hi Jeremy:

The test with irqbalance disabled is running. Currently one server has crashed on the NIC. Trace.jpg in the attachments is the screenshot from the serial port, and trace.txt is from /var/log/messages. Do you think this is connected with irqbalance being disabled, or are there other possibilities?

In addition, I find in /proc/interrupts that all interrupts happen on cpu0 (please refer to the attached interrputs.txt). Could that be a possible cause of the server crash, and is there a way I can configure the system manually to distribute those interrupts evenly?

Meanwhile, I will start the new test with the patched kernel soon. Thanks.
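For the question above about spreading interrupts by hand: with irqbalance off, the usual mechanism is writing a hex CPU bitmask to /proc/irq/<N>/smp_affinity. A minimal sketch follows; the IRQ number 64 is a placeholder (look up the real bnx2 vector numbers in /proc/interrupts first), and the /proc write is shown commented out since it needs root on a live box.

```shell
#!/bin/sh
# cpu_mask: print the hex smp_affinity bitmask that selects a single CPU.
# Bit 0 = CPU0, bit 1 = CPU1, etc., so the mask for CPU n is 1 << n.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

cpu_mask 0   # CPU0 -> mask 1
cpu_mask 1   # CPU1 -> mask 2
cpu_mask 3   # CPU3 -> mask 8

# With root, pinning (hypothetical) IRQ 64 to CPU1 would look like:
# cpu_mask 1 > /proc/irq/64/smp_affinity
# cat /proc/irq/64/smp_affinity   # verify the new mask took effect
```

Repeating this per MSI-X vector with different CPUs spreads the load the way irqbalance would.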
> Date: Thu, 23 Sep 2010 16:20:09 -0700
> From: jeremy@xxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; keir.fraser@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: VM hung after running sometime
>
> On 09/22/2010 05:55 PM, MaoXiaoyun wrote:
> > The interrputs file is attached. The server has 24 HVM domains
> > running for about 40 hours.
> >
> > Well, we may upgrade to the new kernel in the future, but currently
> > we prefer the fix with the least impact on our present servers.
> > So it would be really nice if you could offer the set of patches;
> > that would be our first choice.
>
> Try cherry-picking:
> 8401e9b96f80f9c0128e7c8fc5a01abfabbfa021 xen: use percpu interrupts for IPIs and VIRQs
> 66fd3052fec7e7c21a9d88ba1a03bc062f5fb53d xen: handle events as edge-triggered
> 29a2e2a7bd19233c62461b104c69233f15ce99ec xen/apic: use handle_edge_irq for pirq events
> f61692642a2a2b83a52dd7e64619ba3bb29998af xen/pirq: do EOI properly for pirq events
> 0672fb44a111dfb6386022071725c5b15c9de584 xen/events: change to using fasteoi
> 2789ef00cbe2cdb38deb30ee4085b88befadb1b0 xen: make pirq interrupts use fasteoi
> d0936845a856816af2af48ddf019366be68e96ba xen/evtchn: rename enable/disable_dynirq -> unmask/mask_irq
> c6a16a778f86699b339585ba5b9197035d77c40f xen/evtchn: rename retrigger_dynirq -> irq
> f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd xen/evtchn: make pirq enable/disable unmask/mask
> 43d8a5030a502074f3c4aafed4d6095ebd76067c xen/evtchn: pirq_eoi does unmask
> cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd xen/evtchn: correction, pirq hypercall does not unmask
> 2390c371ecd32d9f06e22871636185382bf70ab7 xen/events: use PHYSDEVOP_pirq_eoi_gmfn to get pirq need-EOI info
> 158d6550716687486000a828c601706b55322ad0 xen/pirq: use eoi as enable
> d2ea486300ca6e207ba178a425fbd023b8621bb1 xen/pirq: use fasteoi for MSI too
> f0d4a0552f03b52027fb2c7958a1cbbe210cf418 xen/apic: fix pirq_eoi_gmfn resume
>
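The cherry-pick workflow suggested above can be illustrated with a self-contained scratch repository (so nothing here touches a real kernel tree): a fix committed on one branch is carried to another by hash, exactly as the listed commits would be carried onto a 2.6.31 branch. The branch names, file name, and commit message below are all made up for the demo.

```shell
#!/bin/sh
# Demo of git cherry-pick: carry one commit from "dev" to "stable" by hash.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name demo

# Base commit, shared by both branches.
echo base > events.c
git add events.c
git commit -qm "base"
git branch stable

# A fix lands on the current (dev) branch.
echo fix >> events.c
git commit -qam "xen: example fix"
fix_hash=$(git rev-parse HEAD)

# Carry that one commit to "stable" by hash -- the cherry-pick step.
git checkout -q stable
git cherry-pick "$fix_hash" >/dev/null
```

For the real series, you would run `git cherry-pick <hash>` once per listed commit, in the order given; on an older tree such as 2.6.31 some picks may stop with conflicts (likely in drivers/xen/events.c) that have to be resolved by hand before continuing with `git cherry-pick --continue`.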
events.c
Description: Text document
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel