WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] DomU crash during migration when suspending source domain

To: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>, "Keir Fraser" <keir@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] DomU crash during migration when suspending source domain
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Wed, 14 Feb 2007 16:15:34 +0100
Delivery-date: Wed, 14 Feb 2007 07:15:10 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <342BAC0A5467384983B586A6B0B3767104A6A851@xxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcdP6h4+HveIAzruQ3+gt7NQNapEGwANqzaeAADJUVAAAHIl2wAGcwGgAAF4ck4AAAWwQAAAtdeQ
Thread-topic: [Xen-devel] DomU crash during migration when suspending source domain
Simon,  

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Graham, Simon
> Sent: 14 February 2007 14:43
> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] DomU crash during migration when
> suspending source domain
> 
> 
> > In general we *cannot* expect to support CPUs with different features
> > in CPUID. We plan to fix this in two ways:
> >  1. Allow a guest to be given a restricted CPUID view (e.g., with
> > features masked out, or cacheinfo leaves missing).
> 
> Do you plan to do this for PV domains as well as HVM?

PV guests already have an "emulated" CPUID instruction (the regular CPUID
is prefixed so that it turns into "illegal opcode", "CPUID", which Xen
traps and emulates).

At present, there's not much filtering going on in that code, but it's
capable of filtering any and all CPUID functionality used by a PV guest.
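To make the prefix idea concrete, here is a rough sketch of how an emulator
could recognise the prefixed sequence in a guest's instruction stream. The
byte values are an assumption on my part (ud2a followed by the ASCII string
"xen", then the ordinary CPUID opcode, as I understand the public Xen
headers), and is_emulated_cpuid is a made-up helper name, not a Xen function.

```python
# Assumed encoding of the forced-emulation prefix: ud2a ; .ascii "xen"
XEN_EMULATE_PREFIX = bytes([0x0F, 0x0B]) + b"xen"
# Standard x86 encoding of the CPUID instruction
CPUID_OPCODE = bytes([0x0F, 0xA2])


def is_emulated_cpuid(code: bytes, offset: int = 0) -> bool:
    """Return True if the bytes at `offset` are the prefix followed by CPUID.

    Hypothetical helper for illustration only.
    """
    seq = XEN_EMULATE_PREFIX + CPUID_OPCODE
    return code[offset:offset + len(seq)] == seq
```

A bare CPUID (just 0x0F 0xA2) would not match, so it would run natively and
see the unfiltered hardware values.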


HVM also has the same capability of filtering. 

Would it make sense to have a sparse array of CPUID leaves and masks for
the respective entries, or did you have some better idea?
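Something along these lines is what I have in mind for the sparse table. It
is only a sketch: Regs, CPUID_MASKS and filter_cpuid are names invented for
the example, and the choice to hide SSE/SSE2 is arbitrary. The bit positions
(leaf 1, EDX bit 25 for SSE and bit 26 for SSE2) are the architectural ones.

```python
from typing import Dict, NamedTuple


class Regs(NamedTuple):
    """The four registers returned by one CPUID leaf."""
    eax: int
    ebx: int
    ecx: int
    edx: int


ALL = 0xFFFFFFFF
SSE_BIT = 1 << 25   # CPUID leaf 1, EDX bit 25
SSE2_BIT = 1 << 26  # CPUID leaf 1, EDX bit 26

# Sparse table: only leaves that need filtering get an entry.
# The contents here are illustrative, not a proposed policy.
CPUID_MASKS: Dict[int, Regs] = {
    0x00000001: Regs(ALL, ALL, ALL, ALL & ~(SSE_BIT | SSE2_BIT)),
}


def filter_cpuid(leaf: int, raw: Regs,
                 masks: Dict[int, Regs] = CPUID_MASKS) -> Regs:
    """AND each raw register with its mask; unlisted leaves pass through."""
    mask = masks.get(leaf)
    if mask is None:
        return raw
    return Regs(*(r & m for r, m in zip(raw, mask)))
```

A dict keyed by leaf number keeps the table small, since most leaves would
need no entry at all.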

> 
> >  2. Where a guest has been exposed to extended features and leaves,
> > prevent it from being migrated to a less-capable CPU.
> > 
> 
> I guess I'm not quite sure I fully understand -- since we hot remove all
> the processors (but one - I guess that is an issue) and then hot add
> them again after migration, you would think it would be OK to hot add a
> completely different processor -- of course there will be issues with
> the Linux code given that you can't actually test this on a
> non-virtualized system.

The real problem with migrating to a "lesser" platform is this kind of
thing: Linux starts by determining which method is best for calculating
TCP checksums, copying disk blocks, etc, etc. Let's say that the kernel
decides to use SSE registers for this purpose. You then migrate the guest
to a processor that doesn't have SSE instructions... Fail Fail Fail.

Other features of the same sort would be large pages (not supported by
Xen at present), PAE support, etc, etc. 

Or indeed, just knowing how many sets of cache information are available
to a particular CPU type. 
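
As a sketch of the check this implies before a migration: every feature bit
the guest has been shown must also be present on the destination. The
function names are invented for the example and only the leaf 1 EDX word is
compared; a real check would have to cover every leaf the guest can see.

```python
PAE_BIT = 1 << 6    # CPUID leaf 1, EDX bit 6
SSE_BIT = 1 << 25   # CPUID leaf 1, EDX bit 25


def missing_features(exposed_edx: int, dest_edx: int) -> int:
    """Bits the guest was shown that the destination CPU lacks."""
    return exposed_edx & ~dest_edx & 0xFFFFFFFF


def can_migrate(exposed_edx: int, dest_edx: int) -> bool:
    """Safe only if the destination offers every exposed feature."""
    return missing_features(exposed_edx, dest_edx) == 0
```

Moving to a CPU with more features passes this test; moving to one without
SSE after the guest has seen SSE does not.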
> 
> > A further option (3) for cache info might be to fake out the leaves
> > for CPUs that do not support them. But I'm not sure whether, for
> > example, this would be compatible with AMD's CPUID instruction.
> > 

I don't see anything wrong with this in general. AMD has cache info in
the 80000xxx range of CPUID (for recent CPUs; older ones don't have
any cache info in the processor). So selectively faking that into
something saying "not available" (such as setting it to zero) would be
fine for those.
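
A rough sketch of the "set it to zero" approach, assuming the AMD leaves in
question are 0x80000005 (L1 cache and TLB) and 0x80000006 (L2/L3 cache).
hide_cache_info is a made-up name, and note that zeroing 0x80000005 hides
the TLB details in that leaf as well.

```python
from typing import Tuple

Regs = Tuple[int, int, int, int]  # (eax, ebx, ecx, edx)

# Assumed AMD cache-info leaves in the extended CPUID range
AMD_CACHE_LEAVES = (0x80000005, 0x80000006)


def hide_cache_info(leaf: int, raw: Regs) -> Regs:
    """Return all zeros ("not available") for cache leaves, else raw."""
    if leaf in AMD_CACHE_LEAVES:
        return (0, 0, 0, 0)
    return raw
```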

It gets interesting of course when moving from AMD to Intel or the other
way around, as the code may "remember" that it's on one or t'other, and
not look in the right place for the info.

--
Mats
> 
> Agreed.
> 
> > This issue is hardly specific to HA/FT. You can safely build yourself
> > a HA/FT cluster out of homogeneous hardware. Building it out of odds
> > and ends you have already is going to be hard or impossible to
> > guarantee safety of in general. I don't believe anyone sells or
> > supports software to allow you to do this, and there's a reason for
> > that.
> 
> You misunderstand my point -- in an FT environment, you MUST be able to
> upgrade and repair hardware without taking the domain down -- clearly
> this would normally be to an equivalent or higher functionality system
> but we can't guarantee that there won't be a new spiffy processor that
> causes this same issue to arise or that we won't run into some similar
> issue when replacing faulty hardware (the original system might no
> longer be available for example).
> 
> Simon
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel