xen-devel

Re: [Xen-devel] Is continuous replication of state possible?

To: Ian Pratt <Ian.Pratt@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] Is continuous replication of state possible?
From: Jacob Gorm Hansen <jacobg@xxxxxxx>
Date: Sat, 08 Jan 2005 18:21:47 -0800
Cc: Per Buer <perbu@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxxx

Ian Pratt wrote:
>> I've been playing around with Xen for a couple of months now and I am very much impressed. I am particularly fond of the "live migration" feature. I was wondering if it is possible to make an instance continuously replicate the state of another instance, and then have the replica take over if the original instance fails.
>
> Software-implemented hardware fault-tolerance is on the Xen
> research roadmap.
>
> It basically just requires deterministic execution and event
> injection. Doing this for uniprocessor guests is fairly
> straightforward. Doing it for SMP guests (with decent performance) is
> going to be a huge challenge, as determinism is hard to achieve. We're
> looking into it...

I did a little reading on this subject a couple of years back, and it seems that on Pentiums deterministic execution is impossible even for uniprocessor (UP) guests, as long as you allow preemptive multitasking. Because (according to the Intel manuals) the precision of the Pentium performance counters cannot be relied on, an interrupt cannot be replayed at exactly the same instruction on the replica, so the timer and other interrupts essentially act as a random number generator. Naturally you can do periodic checkpointing, but correctness cannot be guaranteed unless you coordinate all outgoing traffic between the replicas before making it visible to the outside world.
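To make the last point concrete, here is a minimal sketch of holding back output until the replica has the matching checkpoint. It is only an illustration under my own assumptions: the `Primary` and `Replica` classes, their methods, and the in-memory "network" are all hypothetical and have nothing to do with any actual Xen interface.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical backup that stores and acknowledges each checkpoint."""
    last_epoch: int = -1
    state: dict = field(default_factory=dict)

    def receive_checkpoint(self, epoch: int, state: dict) -> int:
        self.state = dict(state)   # copy, so later changes on the primary do not leak in
        self.last_epoch = epoch
        return epoch               # acknowledgement


@dataclass
class Primary:
    """Hypothetical primary that buffers all outgoing traffic."""
    backup: Replica
    epoch: int = 0
    state: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)    # output held back
    released: list = field(default_factory=list)   # output visible to the outside world

    def send(self, packet: str) -> None:
        # Nothing leaves the machine yet; the packet depends on uncheckpointed state.
        self.pending.append(packet)

    def checkpoint(self) -> None:
        acked = self.backup.receive_checkpoint(self.epoch, self.state)
        if acked == self.epoch:
            # The replica can now reproduce the state that generated this output,
            # so it is safe to let the outside world see it.
            self.released.extend(self.pending)
            self.pending.clear()
            self.epoch += 1
```

The cost is that every outgoing packet is delayed by up to one checkpoint interval, which is why the checkpoint frequency matters so much for this approach.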

There is a paper by Bressoud and Schneider about hypervisor-based fault tolerance on the PA-RISC (which had precise performance counters) that is worth reading. I found a copy online at http://roc.cs.berkeley.edu/294fall01/readings/bressoud.pdf .
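As a rough illustration of why a precise instruction count helps, the sketch below buffers interrupts and delivers them only at fixed instruction-count boundaries ("epochs"), so the exact arrival time no longer influences the guest. This is my own toy model and not the paper's implementation: `EPOCH_LENGTH`, `run_epoch`, `run`, and the arithmetic standing in for guest instructions are all made up for the example, and it assumes both machines see the interrupts in the same order.

```python
EPOCH_LENGTH = 1000   # instructions per epoch; arbitrary illustrative value
MODULUS = 1_000_003   # keeps the toy state small


def run_epoch(state: int, interrupts: list) -> int:
    """Run one epoch of 'guest instructions', then deliver buffered interrupts."""
    for _ in range(EPOCH_LENGTH):
        state = (state * 31 + 7) % MODULUS     # stand-in for deterministic guest code
    for irq in interrupts:                      # delivered only at the epoch boundary
        state = (state ^ irq) % MODULUS
    return state


def run(arrivals: list, epochs: int) -> int:
    """arrivals holds (instruction_count_at_arrival, irq) pairs.

    Only the epoch an interrupt arrives in matters; the exact count is discarded.
    """
    state = 1
    for e in range(epochs):
        lo, hi = e * EPOCH_LENGTH, (e + 1) * EPOCH_LENGTH
        buffered = [irq for t, irq in arrivals if lo <= t < hi]
        state = run_epoch(state, buffered)
    return state


# Same interrupts, different arrival times within each epoch.
primary_state = run([(120, 5), (1730, 9)], epochs=2)
backup_state = run([(480, 5), (1999, 9)], epochs=2)
print(primary_state == backup_state)
```

Without a counter you can trust, you cannot stop both machines at the same instruction to form the boundary, which is exactly the problem on the Pentium.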

I think this is more likely to work at a higher level, where you know the semantics of your application. Dmitrii Zagorodnov did some work on that and reported good results; see for instance http://ieeexplore.ieee.org/iel5/8589/27228/01209950.pdf .

Best regards,
Jacob

