
Re: [Xen-devel] Prepping a bugfix push



On 12/03/09 16:24, Brendan Cully wrote:
> On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
>> On 12/03/09 11:35, Brendan Cully wrote:
>>> Not a patch, but I've just tried out xm save -c again with the latest
>>> xen changes, and while I no longer see the grant table version panic,
>>> the guest's devices (aside from the console) appear to be wedged on
>>> resume. Is anyone else seeing this?
>>>
>>> After a while on the console I see messages like this:
>>>
>>> INFO: task syslogd:2219 blocked for more than 120 seconds.
>>>
>>> which I assume is trouble with the block device.
>> Is there any backtrace or other info associated with this?
> I take it you couldn't reproduce? Let me know what you want
> captured. In the meantime, here's the kernel log:

Nope, works for me :) Ian has been doing testing with hundreds or thousands of suspend/resumes, so it *can* be reliable.
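A stress loop along those lines is easy to script. This is only a sketch: the domain name and checkpoint path are placeholders, and the xm command is echoed rather than executed (a dry run) so the sketch runs anywhere; drop the echo to run it for real.

```shell
#!/bin/sh
# Sketch of a suspend/resume stress loop. "xm save -c" checkpoints the
# domain and leaves it running, so repeating it exercises the same
# suspend/resume path many times. DOM and the save path are placeholders.
DOM=guest1
i=1
while [ "$i" -le 3 ]; do
    # Dry run: echo the command instead of executing it.
    echo "xm save -c $DOM /var/lib/xen/save/$DOM.chk"
    i=$((i + 1))
done
```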

That said, I am seeing some odd hangs after suspend/resume with the bugfix branch merged into Linux mainline...

Can you ping the domain when it's in that state?

    J
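A quick triage along those lines might look like the following sketch (dry run: commands are echoed, not executed; GUEST and GUEST_IP are placeholders for the wedged domain):

```shell
#!/bin/sh
# Post-resume triage sketch. GUEST and GUEST_IP are placeholders.
GUEST=guest1
GUEST_IP=192.168.1.10

# 1. Network path: if ping answers, event channels and netfront survived
#    the resume, pointing at block I/O rather than interrupt delivery.
echo "ping -c 3 -W 2 $GUEST_IP"

# 2. Blocked tasks: SysRq-w from dom0 lists D-state tasks with backtraces
#    without needing a responsive guest console.
echo "xm sysrq $GUEST w"

# 3. Active CPUs: SysRq-l shows whether vCPUs are spinning or idle.
echo "xm sysrq $GUEST l"
```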

Dec  4 00:16:12 p32 kernel: suspending xenstore...
Dec  4 00:16:12 p32 kernel: Grant tables using version 2 layout.

Here's a backtrace after the block message:

INFO: task kjournald:880 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D 00000000     0   880      2 0x00000000
  cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
  cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
  c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
  [<c1006f04>] ? check_events+0x8/0xc
  [<c137b0a8>] io_schedule+0x1e/0x28
  [<c10c2555>] sync_buffer+0x33/0x37
  [<c137b3a1>] __wait_on_bit+0x45/0x62
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c10c24e5>] __wait_on_buffer+0x23/0x25
  [<c1102940>] journal_commit_transaction+0x50b/0xdeb
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
  [<c1105b66>] kjournald+0xab/0x1f0
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1105abb>] ? kjournald+0x0/0x1f0
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10

and here's the result of xm sysrq <dom> l:
SysRq : Show backtrace of all active CPUs

and xm sysrq <dom> w:

SysRq : Show Blocked State
   task                PC stack   pid father
pdflush       D cf40297c     0   150      2 0x00000000
  cf95dec8 00000246 00000000 cf40297c c1550280 c1553680 c10067df c1550280
  00000000 c152f380 c152f380 c152f380 c1550280 cf8b3130 cf8b32bc d0808380
  c1553680 ffff995d cf95dec8 c1006f04 c1550280 c1553a6c c1550280 c1006efb
Call Trace:
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c1006f04>] ? check_events+0x8/0xc
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137b1cb>] schedule_timeout+0xd6/0x13e
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c103fc07>] ? process_timeout+0x0/0xa
  [<c137a8ea>] io_schedule_timeout+0x1e/0x28
  [<c108e9a8>] congestion_wait+0x51/0x66
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1083ef3>] wb_kupdate+0xb3/0xfa
  [<c10842d5>] ? pdflush+0x0/0x205
  [<c10843d5>] pdflush+0x100/0x205
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c1083e40>] ? wb_kupdate+0x0/0xfa
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10
kjournald     D 00000000     0   880      2 0x00000000
  cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
  cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
  c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
  [<c1006f04>] ? check_events+0x8/0xc
  [<c137b0a8>] io_schedule+0x1e/0x28
  [<c10c2555>] sync_buffer+0x33/0x37
  [<c137b3a1>] __wait_on_bit+0x45/0x62
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c10c24e5>] __wait_on_buffer+0x23/0x25
  [<c1102940>] journal_commit_transaction+0x50b/0xdeb
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
  [<c1105b66>] kjournald+0xab/0x1f0
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1105abb>] ? kjournald+0x0/0x1f0
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10
syslogd       D 00000000     0  2219      1 0x00000000
  cfb33e40 00000286 00000000 00000000 0000aa70 00000000 c10067df cfb7026c
  00000001 c152f380 c152f380 c152f380 00000000 cfa2a130 cfa2a2bc d0808380
  cfb33e64 cfb70254 c1006efb c137c4f4 c10282ff cfb33e58 cfb33e40 c1049d05
Call Trace:
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c10282ff>] ? __wake_up+0x3a/0x42
  [<c1049d05>] ? prepare_to_wait+0x41/0x46
  [<c110535f>] log_wait_commit+0x87/0xef
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1100ca8>] journal_stop+0x1dc/0x29f
  [<c110156d>] ? journal_start+0x88/0xb2
  [<c11015b4>] journal_force_commit+0x1d/0x1f
  [<c10f2d8a>] ext3_force_commit+0x1c/0x22
  [<c10ed096>] ext3_write_inode+0x2d/0x3b
  [<c10bd21f>] writeback_single_inode+0x1f5/0x2be
  [<c108326f>] ? do_writepages+0x37/0x39
  [<c10bdaa1>] sync_inode+0x1f/0x37
  [<c10e9409>] ext3_sync_file+0x99/0xb0
  [<c10c06fc>] vfs_fsync+0x73/0xb1
  [<c10c075c>] do_fsync+0x22/0x32
  [<c10c078b>] sys_fsync+0xd/0xf
  [<c1009058>] sysenter_do_call+0x12/0x2c
getty         D cf402840     0  2330      1 0x00000004
  cfb23cc0 00000286 00000000 cf402840 00000000 00000000 00000000 00000000
  00000000 c152f380 c152f380 c152f380 00000000 cf8740f0 cf87427c d0808380
  cfb23d14 c19202c8 c1006efb c137c4f4 00000002 cfb23d08 cfb23cc0 c1049d05
Call Trace:
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c1049d05>] ? prepare_to_wait+0x41/0x46
  [<c1101016>] do_get_write_access+0x2ab/0x3cd
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c1101153>] journal_get_write_access+0x1b/0x2a
  [<c10f72c2>] __ext3_journal_get_write_access+0x19/0x3f
  [<c10eab31>] ext3_reserve_inode_write+0x34/0x6a
  [<c10eab7e>] ext3_mark_inode_dirty+0x17/0x2e
  [<c10eda05>] ext3_dirty_inode+0x6b/0x6d
  [<c107ce9c>] ? file_remove_suid+0x1b/0x59
  [<c10bd91d>] __mark_inode_dirty+0x23/0xd2
  [<c10b6618>] file_update_time+0x76/0xb3
  [<c107e0b9>] __generic_file_aio_write_nolock+0x235/0x4b3
  [<c107f0e5>] ? filemap_fault+0x76/0x3b1
  [<c107e5a5>] generic_file_aio_write+0x5d/0xd1
  [<c10e9267>] ext3_file_write+0x27/0xa5
  [<c10a527a>] do_sync_write+0xcd/0x103
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c10d2c2c>] ? locks_free_lock+0x31/0x40
  [<c10d4103>] ? fcntl_setlk+0x3e/0x185
  [<c10a596e>] vfs_write+0x8b/0x13f
  [<c137e474>] ? do_page_fault+0x1b4/0x3a0
  [<c10a51ad>] ? do_sync_write+0x0/0x103
  [<c10a5f1d>] sys_write+0x3d/0x64
  [<c1009058>] sysenter_do_call+0x12/0x2c
Sched Debug Version: v0.09, 2.6.31.6 #8
now at 194118.700060 msecs
   .jiffies                                 : 4294940775
   .sysctl_sched_latency                    : 20.000000
   .sysctl_sched_min_granularity            : 4.000000
   .sysctl_sched_wakeup_granularity         : 5.000000
   .sysctl_sched_child_runs_first           : 0.000001
   .sysctl_sched_features                   : 113917

cpu#0, 2500.152 MHz
   .nr_running                    : 1
   .load                          : 3121
   .nr_switches                   : 17143
   .nr_load_updates               : 48354
   .nr_uninterruptible            : 4
   .next_balance                  : 4294.940896
   .curr->pid                     : 14
   .clock                         : 193919.954084
   .cpu_load[0]                   : 0
   .cpu_load[1]                   : 0
   .cpu_load[2]                   : 0
   .cpu_load[3]                   : 0
   .cpu_load[4]                   : 0

cfs_rq[0]:
   .exec_clock                    : 0.000000
   .MIN_vruntime                  : 0.000001
   .min_vruntime                  : 13296.631519
   .max_vruntime                  : 0.000001
   .spread                        : 0.000000
   .spread0                       : 0.000000
   .nr_running                    : 1
   .load                          : 3121
   .nr_spread_over                : 0

rt_rq[0]:
   .rt_nr_running                 : 0
   .rt_throttled                  : 0
   .rt_time                       : 0.000000
   .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R       xenwatch    14     13276.942024       146   115               0               0         0.000000        0.000000        0.000000



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

