xen-devel

[Xen-devel] hang in pvfb resulting from save/restore?

To: Markus Armbruster <armbru@xxxxxxxxxx>
Subject: [Xen-devel] hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Sat, 24 May 2008 08:08:00 +0100
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.

If I run X on pvfb with a couple of instances of something busy like glxgears, and then do a few rounds of save/restore, one of my "events" kernel threads goes into a 100% CPU spin and X stops responding. I'm not sure what it's doing, but after a while the softlockup detector triggers:

INFO: task X:3408 blocked for more than 240 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
X             D cf5f7da0     0  3408   3403
cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
Call Trace:
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149d2c>] ? lock_contended+0x15a/0x16f
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c045e9de>] mutex_lock_nested+0x17d/0x296
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
[<c0164bb9>] do_wp_page+0xdc/0x6bc
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149ebc>] ? lock_acquired+0x17b/0x194
[<c016826b>] handle_mm_fault+0xa2b/0xb36
[<c010c9a4>] ? restore_i387+0xeb/0x138
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c014bb2b>] ? lock_acquire+0x99/0xa6
[<c011d1eb>] ? do_page_fault+0x433/0x934
[<c0142089>] ? down_read_trylock+0x37/0x41
[<c011d29f>] do_page_fault+0x4e7/0x934
[<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c011cdb8>] ? do_page_fault+0x0/0x934
[<c04607c2>] error_code+0x72/0x78
=======================
INFO: lockdep is turned off.
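
For reading a trace like the one above: entries marked with "?" are return addresses that were found on the stack but could not be confirmed as part of the frame chain, so they are best treated as hints. A minimal Python sketch that keeps only the unmarked frames is shown here; the function name reliable_frames and the regular expression are illustrative assumptions, and the sample input is an excerpt of the trace above rather than the full dump.

```python
import re

# Matches one frame of a kernel call trace, for example:
#   [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
# The optional "? " marker flags a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def reliable_frames(trace_text):
    """Return the symbols of frames not marked with '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("guess"):
            frames.append(match.group("sym"))
    return frames


# Excerpt of the trace from this report (not the complete dump).
TRACE = """\
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149d2c>] ? lock_contended+0x15a/0x16f
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c045e9de>] mutex_lock_nested+0x17d/0x296
[<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
[<c0164bb9>] do_wp_page+0xdc/0x6bc
[<c016826b>] handle_mm_fault+0xa2b/0xb36
[<c011d1eb>] ? do_page_fault+0x433/0x934
[<c011d29f>] do_page_fault+0x4e7/0x934
[<c04607c2>] error_code+0x72/0x78
"""

if __name__ == "__main__":
    for symbol in reliable_frames(TRACE):
        print(symbol)
```

Read innermost first, the confirmed frames say that X took a page fault, went through do_wp_page into fb_deferred_io_mkwrite, and is blocked there in mutex_lock_nested.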

The rest of the system is working OK (though I expect things are getting queued up on events/1).

I haven't dug in far enough to see what the problem is, but given that I only just implemented save/restore, it seems likely that pvfb's save/restore handling is a bit untested ;)

Other info:

X's wchan is fb_deferred_io_mkwrite

I can't work out where events/1 is spinning. xenctx shows the eip is in xen_irq_disable, apparently unchanging; xenctx doesn't seem to be able to read the stack, so I don't have any more context.

...and now the whole vm has locked up, so I can't investigate more.

   J
