xen-users

Re: [Xen-users] "xm save" only works once...

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] "xm save" only works once...
From: Ralph Passgang <ralph@xxxxxxxxxxxxx>
Date: Thu, 25 Aug 2005 11:26:16 +0200
Cc: Steven Hand <Steven.Hand@xxxxxxxxxxxx>
Delivery-date: Thu, 25 Aug 2005 09:24:34 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <E1E7IEv-0003EE-00@xxxxxxxxxxxxxxxxx>
References: <E1E7IEv-0003EE-00@xxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.8.1
On Monday, 22 August 2005 21:45, Steven Hand wrote:
> >On Friday, 19 August 2005 04:14, Steven Hand wrote:
> >> >On Monday, 15 August 2005 23:29, Anthony Liguori wrote:
> >> >> Steven Hand wrote:
> >> >> >>I am using Xen-2.0.7 on a Dual Intel Xeon 2.8GHz system with 4GB of
> >> >> >> RAM. I am using 2.6.11 as the kernel for my domain 0. Domain 0 uses
> >> >> >> Debian Sarge with a backported Xen 2.0.7 package (only minor
> >> >> >> changes to the Debian 2.0.6 package, nothing important enough to
> >> >> >> get mentioned). All kernels were compiled against vanilla kernels
> >> >> >> with xen-patch. The domain U's are using 2.6.11 or 2.4.30 (Debian,
> >> >> >> SUSE).
> >> >> >>
> >> >> >>I have no problems within domains and everything is running very
> >> >> >> smoothly, except one thing (which was also not working correctly
> >> >> >> in xen-2.0.6 for me): I can save a domain with "xm save
> >> >> >> <domainname> <suspendfile>" once and I can restore this domain
> >> >> >> again, but if I try a second "xm save ..." it simply seems to
> >> >> >> hang. Nothing happens and the last thing in the logs are these
> >> >> >> lines:
> >> >> >
> >> >> >Is this the same with both 2.4 and 2.6 domUs? I've noticed something
> >> >> > similar with 2.0.7 but only with 2.4 domUs ... it would be useful
> >> >> > to know if it affects 2.6 also - I'm trying to track it down.
> >> >
> >> >yes, it's the same with 2.4 and 2.6 domUs...
> >> >
> >> >> There's a very similar problem in 3.0 that has to do with a race
> >> >> condition with the xc_save/Xend interaction.  xc_save thinks it has
> >> >> sent the "suspend" command over the pipe and Xend is waiting for it
> >> >> to arrive.
> >> >
> >> >... but after some more testing I noticed another interesting thing.
> >> > "xm save" hangs if the suspend file doesn't exist. The first time
> >> > after a dom0 reboot it's normally no problem, but if I delete the file
> >> > and try an "xm save" again, it fails about 95% of the time.
> >> >
> >> >If I keep the save file and then do an "xm save" and an "xm restore",
> >> > there seems to be no problem. I ran 10 tests and all worked.
> >>
> >> Fix attached below - it's actually nothing to do with whether the file
> >> exists or not. Rather the problem is that on occasion xfrd sends a
> >> response and a request in the same 'message', and Xend only deals with
> >> the first.
> >>
> >> The below fixes this for me - please let me know if it works for you,
> >
> >I can't test it right now, because the server is in production use now. I
> > have to schedule a maintenance window to reboot the system (and that is
> > needed if the problem is not fixed and an "xm save" crashes).
>
> Ok (although I'm confident the fix is a strict stability improvement - I
> stress tested over 15,000 save/restore cycles at a variety of frequencies
> without a single problem).
>
> But then again, it's your server :-)
>
> The problem was a race condition, so timing (and concurrency at the
> hardware level) is likely to affect the probability of it occurring.
> So e.g. SMP versus not, or a slow versus a fast machine, or anything
> like this could increase the chance you'd see it.
>
> >I'll let you know if I can test the patch on the production system (or
> > another SMP/HT system), but that can take a few more days... sorry.
>
> No probs - the fix is in 2.0-testing but that also includes a bunch of
> other stuff, so probably best to just apply that patch locally.
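
To illustrate the kind of bug described above (xfrd occasionally sending a response and a request in the same 'message', with Xend handling only the first), here is a minimal sketch in Python. It is not the actual patch and not Xend's code: the newline framing and the names split_messages and Receiver are assumptions made purely for the example. The idea is only that a receiver should process every complete message in a chunk and keep any partial tail for the next read.

```python
def split_messages(buffer: bytes, delimiter: bytes = b"\n"):
    """Split a received chunk into complete messages plus a leftover tail.

    The newline delimiter is a hypothetical framing chosen for this sketch;
    it is not the wire format used by xfrd.
    """
    *complete, remainder = buffer.split(delimiter)
    return complete, remainder


class Receiver:
    """Hypothetical receiver that handles every message in a chunk."""

    def __init__(self, handle):
        self._handle = handle      # callback invoked once per complete message
        self._pending = b""        # bytes of a message that has not ended yet

    def feed(self, chunk: bytes) -> None:
        # Prepend any partial message left over from the previous read.
        messages, self._pending = split_messages(self._pending + chunk)
        # Handle all complete messages, not just the first one.
        for message in messages:
            self._handle(message)
```

A receiver that stopped after the first message in a chunk would silently drop the second one, which matches the symptom of one side waiting forever for a command the other side believes it has already sent.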

Hi Steven,

I tried your patch last night after announcing a short downtime to our 
customers.

After applying the patch and rebuilding xen the problems were gone.

I tried 200 saves & restores on different domUs and had no problem at all. Now 
it doesn't matter whether the save file exists before the "xm save" command or not. 
I know that this was not the bug itself, but whether the file existed is what 
triggered the race condition on our system before.
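
For anyone who wants to repeat this kind of test, a repeated save/restore loop could look roughly like the sketch below. It only uses the "xm save <domainname> <suspendfile>" and "xm restore <suspendfile>" commands mentioned in this thread; the function name, the injectable run parameter, and the 300-second timeout are assumptions for illustration and not the script that was actually used. The timeout only detects a hung "xm save"; it does not recover from one.

```python
import subprocess


def save_restore_cycles(domain, save_file, cycles,
                        run=subprocess.run, timeout=300):
    """Run `xm save` followed by `xm restore` up to `cycles` times.

    Returns the number of cycles that completed. Stops at the first
    command that fails or exceeds `timeout` seconds (an assumed value).
    """
    completed = 0
    for _ in range(cycles):
        try:
            run(["xm", "save", domain, save_file], check=True, timeout=timeout)
            run(["xm", "restore", save_file], check=True, timeout=timeout)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            break  # a failed or hung command ends the test run
        completed += 1
    return completed
```

Called as save_restore_cycles("mydomain", "/path/to/mydomain.save", 200) (both arguments are placeholders), a return value equal to the requested cycle count means no save or restore failed or timed out.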

Thanks for your help and work on Xen...

regards,
 Ralph

>
> cheers,
>
> S.
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users

