WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

Re: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)

To: Ian Tobin <itobin@xxxxxxxxxxxxx>
Subject: Re: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)
From: Mark Brown <mbrown@xxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 02 Sep 2011 08:35:33 -0400
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 02 Sep 2011 05:37:26 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <039E0B4AA9103344A80DA55DDDC76A933B30E6@xxxxxxxxxxxxxxxxxxxxxx>
References: <4E5E8089.40801@xxxxxxxxxxxxxxxxxxxxxxxxx> <4E602D73.4020407@xxxxxxxxxxx> <039E0B4AA9103344A80DA55DDDC76A933B30E6@xxxxxxxxxxxxxxxxxxxxxx>
Reply-to: mbrown@xxxxxxxxxxxxxxxxxxxxxxxxx
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1
Ian,

Yes, it does. Usually the DomU would crash after about 4-20 GB of heavy
IO. After the changed configuration (see below) I was able to transfer
more than 1TB of data and it has yet to crash.

My guess is that the clock time somehow gets thrown off by some
(marginal?) value, and that this causes the lockup.
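
One way to check that a clocksource change actually took effect inside a DomU is to read the value back from sysfs. The following is only a minimal sketch, not part of the original fix. It assumes the usual Linux sysfs clocksource directory is present; the function name read_clocksource and its base parameter are made up for this example.

```python
from pathlib import Path

# Usual Linux sysfs location of the first clocksource device (assumed present).
DEFAULT_BASE = "/sys/devices/system/clocksource/clocksource0"


def read_clocksource(base=DEFAULT_BASE):
    """Return (current, available) clocksources as reported by the kernel."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


# Example use on a running guest:
#   current, available = read_clocksource()
#   print(current, available)
```

If the first value is not "jiffies" after a reboot of the DomU, the extra= line was probably not picked up.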

Thanks a lot to Marco Marongiu for the detailed and well written post.

Marc

On 9/2/2011 5:57 AM, Ian Tobin wrote:
> Hi,
> 
> Are you saying this one worked?
> 
> # in /etc/xen/*.conf
> extra="clocksource=jiffies"
> 
> we have the same issue with one of our DomUs (CentOS)
> 
> thanks
> 
> Ian
> 
> 
> 
> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Matthias
> Bannach
> Sent: 02 September 2011 02:12
> To: mbrown@xxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)
> 
> All,
> 
> Ha - finally - solved. Guess Google is not the answer; searching the
> mailing list is. After much frustration I found the following:
> 
> http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27
> 
> based on a post by Marco Marongiu
> 
> http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last
> 
> For me lockup solution #2 worked:
> 
> # DomU and Dom0
> # in /etc/sysctl.conf
> clocksource=jiffies
> independent_wallclock=0
> # then sysctl -p
> 
> # in /etc/xen/*.conf
> extra="clocksource=jiffies"
> 
> And voila - no more lockups, and nothing wrong with the motherboards
> (which I did not think were the cause, given the success with non-xen
> configurations).
> 
> Not sure if this is a kernel or XEN problem though.
> 
> Hope this helps others
> 
> On 8/31/2011 2:42 PM, Mark Brown wrote:
>> Hello,
>>
>> Like others, I am seeing freezes on my system, consistently under
>> high IO load. If the system just runs (even with multiple XenUs) it
>> does not happen, but I can consistently force the situation to occur.
>>
>> With 4 dd processes dumping 20GB each onto an LVM/mdadm soft RAID5
>> volume, it consistently crashes in a DomU. Running without XEN I do not
>> see the problem at all (e.g. after about 3TB of read/write nothing
>> happened).
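
The load described above could be reproduced with something like the sketch below. The original post does not give the dd arguments, so the /dev/zero source, the 1M block size, the output file names, and both function names are assumptions chosen for illustration only.

```python
import subprocess
from pathlib import Path


def build_dd_commands(target_dir, writers=4, gib_each=20):
    """Build one dd command line per writer; nothing is executed here."""
    commands = []
    for i in range(writers):
        out_file = Path(target_dir) / f"ddtest{i}.bin"  # hypothetical file name
        commands.append([
            "dd", "if=/dev/zero", f"of={out_file}",
            "bs=1M", f"count={gib_each * 1024}",  # 1 MiB blocks, gib_each GiB total
        ])
    return commands


def run_writers(commands):
    """Start all writers at once, wait for them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

For example, run_writers(build_dd_commands("/mnt/raidtest")) would start four parallel 20GB writers, where /mnt/raidtest stands in for wherever the RAID5 volume is mounted.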
>>
>> Any suggestion would be very welcome.
>>
>> Marc
>>
>> [ .. more .. ]
>> It appears to be very unpredictable when it actually occurs; here
>> are a few examples. Kind of odd that on Aug 29th it always happened on
>> the same second ;-{.
>>
>>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]
>>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.561016] BUG: soft lockup - CPU#1 stuck for 146s! [rsyslogd:2024]
>>> syslog.2:Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]
>>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.336110] BUG: soft lockup - CPU#0 stuck for 101s! [xend:2583]
>>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.337007] BUG: soft lockup - CPU#1 stuck for 101s! [bdi-default:19]
>>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.304013] BUG: soft lockup - CPU#0 stuck for 136s! [blkback.5.xvdd1:7226]
>>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.305010] BUG: soft lockup - CPU#1 stuck for 136s! [sh:7262]
>>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.596016] BUG: soft lockup - CPU#0 stuck for 73s! [xend:2506]
>>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.597555] BUG: soft lockup - CPU#1 stuck for 73s! [md0_raid5:598]
>>> syslog.6:Aug 17 12:17:08 nwsc-xen-Q45 kernel: [ 3598.534068] BUG: soft lockup - CPU#1 stuck for 150s! [xend:2506]
>>
>> It does not appear to relate to a specific process. (Those above are 
>> from Xen 4.0.1 with Debian 2.6.32-5-xen-amd64).
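
To back up the observation that the lockups are not tied to one process, the reports can be tallied by command name. This is a rough sketch that assumes one log entry per line in the "BUG: soft lockup" format quoted above; the regular expression and the function names are illustrative and not taken from any existing tool.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return (cpu, seconds, command, pid) for each soft lockup report."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            events.append(
                (int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
            )
    return events


def count_by_command(events):
    """Count how many lockup reports name each command."""
    return Counter(comm for _cpu, _secs, comm, _pid in events)
```

Feeding it the lines of a syslog file (for instance via open(path) as the lines argument) gives a per-command count that makes the spread across processes easy to see.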
>>
>> This one is with Xen 4.1.2-rc2-pre/Debian 2.6.32-5-xen-amd64. Both are
>> on an Intel DQ45CB board with 4GB RAM.
>>
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348062] BUG: soft lockup - CPU#0 stuck for 79s! [xend:2767]
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348073] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348219] CPU 0:
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348222] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348318] Pid: 2767, comm: xend Not tainted 2.6.32-5-xen-amd64 #1
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348322] RIP: e033:[<00007fa4064c0289>]  [<00007fa4064c0289>] 0x7fa4064c0289
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348330] RSP: e02b:00007fa402ee54a0  EFLAGS: 00000206
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348334] RAX: 0000000001c3a320 RBX: 0000000001f8ace0 RCX: 00007fa40650f844
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348338] RDX: ffffffffffffffe0 RSI: 0000000000000000 RDI: 00007fa4067a9e40
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348341] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000001
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348345] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4067a9e40
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348349] R13: 00007fa402ee555c R14: 00007fa402ee5548 R15: 00000000ffffffff
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348356] FS: 00007fa402ee6700(0000) GS:ffff880002995000(0000) knlGS:0000000000000000
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348360] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348363] CR2: 00007fb2ed832e28 CR3: 00000000bba8e000 CR4: 0000000000002660
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348375] Call Trace:
>>>
>>> Aug 31 13:07:51 nwsc-xen-Q45 init: Id "T1" respawning too fast: disabled for 5 minutes
>>
> 
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users
> 
> 
> 

