WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

Re: [Xen-users] Live migration: 2500ms downtime

To: "Tim Wood" <twwood@xxxxxxxxx>
Subject: Re: [Xen-users] Live migration: 2500ms downtime
From: "Marconi Rivello" <marconirivello@xxxxxxxxx>
Date: Tue, 21 Aug 2007 15:29:33 -0300
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Hi there,

Following suggestions, I installed SLES 10 SP1, with Xen 3.0.4. Although the migration downtime diminished, it is still an order of magnitude higher than what it should be. It is now around 1.2s.

That would still be pretty impressive, were it not for the fact that there is still no ARP announcement after the migration, so the switch doesn't update its tables with the port the VM is now behind, isolating it from the outside world.

I measured the downtime by pinging from within the VM to an outside host, at 100ms intervals. With the constant pinging, the VM advertises itself to the switch, and communication resumes after 12 lost packets. But if there is no activity from the VM, I can only bring it back to life by pinging it from the new dom0. That generates an ICMP reply, which goes out through the physical Ethernet interface and advertises the VM's new location.
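For what it's worth, the numbers line up: 12 lost pings at 100ms intervals matches the ~1.2s downtime I measured. A quick sanity check (plain shell arithmetic, nothing Xen-specific):

```shell
# Rough downtime estimate from ping loss: lost packets x ping interval.
lost=12          # consecutive pings that got no reply
interval_ms=100  # ping interval in milliseconds
echo "$((lost * interval_ms)) ms"   # prints "1200 ms", i.e. ~1.2s
```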

I was looking at the external-migration-tool option in xend-config.sxp, but I can't figure out how to use it, or whether it would even be useful for automatically pinging the VM after the migration. I can't find any documentation or examples for it.
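In the meantime, the workaround I'm considering is just to wrap the migration in a script and ping the domU from the destination dom0 afterwards, so its ICMP reply re-teaches the switch. A sketch only -- the function name and the ssh-from-the-source approach are mine, not anything from the Xen docs:

```shell
# Sketch: migrate a domU, then ping it from the destination dom0 so the
# reply traffic updates the switch's forwarding table. Set DRY_RUN=1 to
# just print the commands instead of running them.
# migrate_and_ping is a hypothetical name, not a Xen tool.
migrate_and_ping() {
    domu=$1; dest=$2; domu_ip=$3
    run=${DRY_RUN:+echo}
    $run xm migrate --live "$domu" "$dest"
    # The ICMP reply from the domU leaves through the destination host's
    # NIC, advertising the VM's new location to the switch.
    $run ssh "root@$dest" "ping -c 3 -W 1 $domu_ip"
}
```

Usage would be something like: migrate_and_ping myvm otherhost 192.168.0.42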

Any ideas?

Thanks.



On 8/15/07, Tim Wood <twwood@xxxxxxxxx> wrote:
I have also noticed the problem that domains take much longer to
migrate if they are set to have memory < maxmem.  I know this was a
problem back in the Xen 3.0.1 days, and I thought I had heard that it
was fixed in newer versions.

At that time, if you looked at one of the Xen debug logs, there were
millions of lines of errors every time you attempted to migrate a
domain whose memory image had been shrunk. I would suggest that you
check whether your current setup is also producing this kind of
output, and search for that error message -- I know there were some
messages about it on this mailing list in the past, but I can't find
them right now.
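Checking for that is a one-liner; a sketch (the helper name is mine, and the usual xend log locations are /var/log/xen/xend.log and /var/log/xen/xend-debug.log, though they may differ on SLES):

```shell
# Count case-insensitive "error" lines in a xend log, to spot the flood
# of errors that used to accompany migrating a shrunk domain.
count_log_errors() {
    grep -ci "error" "$1"
}
```

e.g. count_log_errors /var/log/xen/xend-debug.log before and after a migration attempt.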

good luck!

On 8/10/07, Marconi Rivello <marconirivello@xxxxxxxxx> wrote:
>
>
> On 8/10/07, Luciano Rocha <strange@xxxxxxxxxxxxx> wrote:
> > On Fri, Aug 10, 2007 at 02:18:53PM -0300, Marconi Rivello wrote:
> > > Hi, Luciano.
> > >
> > > On 8/10/07, Luciano Rocha <strange@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Aug 10, 2007 at 12:42:08PM -0300, Marconi Rivello wrote:
> > > > > Another issue that I described on a previous email (which,
> > > > unfortunately,
> > > > > didn't get any replies) is that this downtime increases to more than
> 20
> > > > > seconds if I set the domU's memory to 512MB (the maxmem set is
> 1024MB).
> > > > I
> > > > > repeated the test successively, from one side to the other, with mem
> set
> > > > to
> > > > > 512 and 1024, and the result was always the same. Around 3s with mem
> =
> > > > > maxmem, and around 24s with mem=512 and maxmem=1024.
> > > > >
> > > >
> > > > You are using the option --live to migrate, aren't you?
> > >
> > >
> > > Yes, I am. :)
> >
> > Oh. Well, then, could you try without? :)
>
> I could, but what I'm whining :) about is having a period of
> unresponsiveness of a couple of seconds instead of a tenth of a second. If
> I do a stop-copy-restart migration, it will be even longer.
>
> > Also, try the reverse. Ping an outside host in the domU.
>
>  I will. In fact, I will try all the monitoring suggestions (from you and
> the others). Inside domU, outside, third machine, ICMP, ARP...
>
> > > Even if I weren't, it would make sense to expect a lower downtime (or
> the
> > > same downtime) by reducing the domU memory. But it takes longer if I
> reduce
> > > the domU's memory.
> >
> > That is odd. Is the Dom0 memory the same (i.e., fixed)?
> >
> > > Would you happen to have any ideas on why it behaves like that?
> >
> > No idea. I might expect a longer migration time for a machine with a
> > very active working set, but not a much longer downtime. That should be
> > only a freeze, final sync, and resume on the other side.
> >
> > --
> > lfr
> > 0/0
> >
> >
> >
>
>
> I would like to thank everyone who contributed with ideas. It was very
> helpful. Unfortunately, I will be gone for the next week on a training, and
> will only be able to further investigate when I get back to work. When I do,
> I will do some more tests and post what I find out or not.
>
> Thanks again,
> Marconi.
>
>

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users