WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [PATCH][RESEND]RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Wed, 31 Jan 2007 14:17:45 +0800
In-reply-to: <C1E5053F.812F%Keir.Fraser@xxxxxxxxxxxx>
>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: 30 January 2007 22:23
>
>On 30/1/07 2:11 pm, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> BTW, do you think it's worth removing a vcpu from the
>> scheduler when it goes down, and re-initializing it in the scheduler
>> when it comes back up? I don't know whether this would affect the
>> scheduler's accounting. Domain save/restore doesn't show
>> this bug, and one obvious difference from vcpu hotplug is that the
>> domain is restored in a new context...
>
>I wouldn't expect this to make any significant difference to scheduling
>accounting, certainly over a multi-second time period.
>
>Does the length of time for which you hot-unplug the vcpu make a
>difference to how often you see this problem? Did you try reproducing
>with a 2.6.16 kernel?
>
> -- Keir

Hi, Keir,
        I verified that the attached patch fixes the issue by restricting the
maximum timeout to 1s. Both vcpu unplug/plug and suspend-cancel now work
fine, and the domain has been running well for several hours under
intensive testing.

        I also tried 2.6.16, and it is immune to this issue. I added some debug
output to both 2.6.16 and 2.6.18 to print the delta value whenever
delta > 1s. The results further confirm our analysis.

        In 2.6.16, every line printed is:
                Delta 101 > HZ for cpuN
                Delta 101 > HZ for cpuN
                Delta 101 > HZ for cpuN
                ...

        Whereas in 2.6.18, they look like:
                Delta 199 > HZ for cpuN
                Delta 156 > HZ for cpuN
                Delta 192 > HZ for cpuN
                Delta 102 > HZ for cpuN
                ...
        After unplugging and re-plugging a cpu:
                Delta 951 > HZ for cpuN
                ...
        And then the softlockup warning appears.
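
The deltas above are in jiffies. The print condition is delta > HZ and a delta of 101 was printed, so HZ is at most 100; assuming the common value HZ = 100 (an inference, not stated in the output), the deltas convert to wall-clock time as:

```latex
\begin{aligned}
\Delta t &= \Delta_{\mathrm{jiffies}} / \mathrm{HZ}, \quad \mathrm{HZ} = 100 \\
101 / 100 &= 1.01\ \mathrm{s} \quad \text{(2.6.16, every print)} \\
199 / 100 &= 1.99\ \mathrm{s} \quad \text{(2.6.18, largest delta shown before hotplug)} \\
951 / 100 &= 9.51\ \mathrm{s} \quad \text{(2.6.18, after unplug/plug)}
\end{aligned}
```

So a single idle gap after hotplug is already close to the 10 s delay that, to my knowledge, the 2.6.18 softlockup detector uses as its warning limit.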

        So in 2.6.16 the watchdog thread itself bounds the maximum timeout
to about 1s by setting up its own timer, while in 2.6.18 the maximum
timeout value varies unpredictably.

        So I'm inclined to treat this as the fix, since there's no easy way
to deduce an appropriate timeout without explicit, hard-coded knowledge
of requirements such as the watchdog thread's. What do you think? :-)

P.S. The warning reported by Simon on 2.6.16, which was due to the late
check, may already be fixed by my previous patch.

Thanks,
Kevin

Attachment: fix_softlockup_2618.patch
Description: fix_softlockup_2618.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel