
RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug


  • To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
  • Date: Thu, 1 Feb 2007 17:40:49 -0500
  • Delivery-date: Thu, 01 Feb 2007 14:40:46 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcdESFqDCWsISfq5RGeHgxcxVzRqmQACelaDAAAZiDAAAQnwRAATVWmwABVdeuAAQqkN4AAKff/fAAh92KA=
  • Thread-topic: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

>
> No, the patch that Kevin provided cannot work because it touches the
> watchdog before jiffies has been updated. Since both jiffy update and
> watchdog check happens inside do_timer(), this is a hard problem to fix
> for Linux 2.6.16. You could push the watchdog touch inside the following
> loop that calls do_timer(): I think that would work!
>

OK, I spent some time today trying to really understand this
(hopefully!) and I think I now know why none of the patches to date (for
2.6.16 anyway) work: they touch the watchdog only once, BUT
timer_interrupt() in time-xen.c has a loop that repeatedly calls
do_timer() to advance jiffies and check for a timeout until the entire
delta since the last call is accounted for. Any single one of those
do_timer() calls might result in a watchdog timer expiration.
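
To make the failure concrete, here is a small user-space model of that catch-up loop. It is only a sketch: HZ = 100, the 10s limit, and every sim_* name are assumptions made for illustration, not the real kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modelling assumptions, not the real kernel values. */
#define HZ              100ULL
#define NS_PER_TICK     (1000000000ULL / HZ)
#define LOCKUP_JIFFIES  (10ULL * HZ)   /* assumed 10s soft lockup limit */

struct sim_cpu {
    uint64_t jiffies;         /* simulated tick counter */
    uint64_t wd_timestamp;    /* jiffies value at the last watchdog touch */
    bool     lockup_reported; /* set once the modelled check trips */
};

/* Model of a watchdog touch: copy jiffies into the timestamp. */
static void sim_touch_watchdog(struct sim_cpu *c)
{
    c->wd_timestamp = c->jiffies;
}

/* Model of do_timer(): advance one jiffy, then run the watchdog check. */
static void sim_do_timer(struct sim_cpu *c)
{
    c->jiffies++;
    if (c->jiffies > c->wd_timestamp + LOCKUP_JIFFIES)
        c->lockup_reported = true;
}

/* Touch once up front, then account for the whole delta tick by tick. */
static bool touch_once_then_catch_up(uint64_t delta_ns)
{
    struct sim_cpu c = { 0, 0, false };

    sim_touch_watchdog(&c);          /* jiffies has not advanced yet */
    while (delta_ns >= NS_PER_TICK) {
        delta_ns -= NS_PER_TICK;
        sim_do_timer(&c);            /* any one of these may trip the check */
    }
    return c.lockup_reported;
}
```

In this model a 5s delta is absorbed without a report, while a 30s delta trips the check partway through the loop even though the watchdog was touched just before it.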

It's also not really correct to only touch the watchdog if the stolen
time is > 5s -- you might be currently sitting at 8s since the watchdog
was last updated and get called after 2s of stolen time and that will
cause a timeout.
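
The arithmetic behind that objection can be sketched as follows. The function names and the whole-second granularity are hypothetical, the 10s limit is an assumption, and a touch is optimistically assumed to prevent the timeout entirely.

```c
#include <stdbool.h>

#define LOCKUP_LIMIT_S     10u  /* assumed soft lockup limit, in seconds */
#define TOUCH_THRESHOLD_S   5u  /* the "stolen time > 5s" condition */

/* Hypothetical policy: touch only when the stolen time exceeds the threshold. */
static bool threshold_policy_touches(unsigned stolen_s)
{
    return stolen_s > TOUCH_THRESHOLD_S;
}

/*
 * Would the watchdog fire? since_touch_s is the time already elapsed since
 * the last touch when the stolen time gets accounted. If the policy touches
 * the watchdog, this model assumes the timeout is avoided.
 */
static bool watchdog_fires(unsigned since_touch_s, unsigned stolen_s)
{
    if (threshold_policy_touches(stolen_s))
        return false;
    return since_touch_s + stolen_s > LOCKUP_LIMIT_S;
}
```

With 8s already on the clock, a 3s steal stays under the 5s threshold, so nothing is touched and the limit is still exceeded. The sketch uses a strict comparison, so the exact boundary case depends on how the real check compares.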

What's more, if you get called with more than 20s of stolen time (e.g.
after save/restore or pause/unpause), you really need to tickle the
watchdog timer multiple times (at least once for every 10s worth of
jiffies in the total stolen time).

So -- my proposal (patch attached for 2.6.16) is to touch the watchdog
inside the loop that calls do_timer(), right after the call, IF the
remaining amount of stolen time is greater than NS_PER_TICK. Since each
call to do_timer() advances jiffies by one, this could only go wrong if,
on entry, there was only a single jiffy left until the watchdog timer
expires, and I think that's OK!
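
A user-space sketch of that idea follows. It is not the attached patch: HZ = 100, the 10s limit, the sim_* names, and the use of a single remaining_ns counter to stand in for the remaining stolen time are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modelling assumptions, not the real kernel values. */
#define HZ              100ULL
#define NS_PER_SEC      1000000000ULL
#define NS_PER_TICK     (NS_PER_SEC / HZ)
#define LOCKUP_JIFFIES  (10ULL * HZ)   /* assumed 10s soft lockup limit */

struct sim_state {
    uint64_t jiffies;         /* simulated tick counter */
    uint64_t wd_timestamp;    /* jiffies value at the last watchdog touch */
    bool     lockup_reported; /* set once the modelled check trips */
};

/* Model of a watchdog touch: copy jiffies into the timestamp. */
static void sim_touch(struct sim_state *s)
{
    s->wd_timestamp = s->jiffies;
}

/* Model of do_timer(): advance one jiffy, then run the watchdog check. */
static void sim_tick(struct sim_state *s)
{
    s->jiffies++;
    if (s->jiffies > s->wd_timestamp + LOCKUP_JIFFIES)
        s->lockup_reported = true;
}

/*
 * Catch up on remaining_ns, touching the watchdog after each tick while
 * more than one tick is still outstanding. stale_jiffies is how long ago
 * (in jiffies) the watchdog was last touched on entry.
 */
static bool catch_up_with_touch(uint64_t remaining_ns, uint64_t stale_jiffies)
{
    struct sim_state s = { stale_jiffies, 0, false };

    while (remaining_ns >= NS_PER_TICK) {
        remaining_ns -= NS_PER_TICK;
        sim_tick(&s);
        if (remaining_ns > NS_PER_TICK)
            sim_touch(&s);
    }
    return s.lockup_reported;
}
```

In this model a 60s catch-up completes without a report when the watchdog was fresh on entry, and also when it was 999 jiffies stale. It only trips when the timestamp was already 1000 jiffies old on entry, which is the single-jiffy edge case mentioned above.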

I also considered only touching the watchdog timer every 5s or so, but I
think the code to do that would have more overhead than simply touching
it for every do_timer() call (since it's just a call that copies jiffies
to the per-cpu watchdog timer value).

Take a look and let me know what you think (the printk could be removed
-- I just put it in so I could tell the code was running).

Simon

Attachment: softlockup.patch
Description: softlockup.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

