RE: [Xen-devel] Questioning the Xen Design of the VMM

To:	"Steven Rostedt" <srostedt@xxxxxxxxxx>
Subject:	RE: [Xen-devel] Questioning the Xen Design of the VMM
From:	"Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date:	Tue, 8 Aug 2006 19:14:31 +0200
Cc:	Al Boldi <a1426z@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Tue, 08 Aug 2006 10:15:21 -0700
In-reply-to:	<44D8BE41.9010709@xxxxxxxxxx>

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Steven Rostedt
> Sent: 08 August 2006 17:39
> To: Petersson, Mats
> Cc: Al Boldi; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Questioning the Xen Design of the VMM
> 
> Mats, thanks for the examples of where the hypervisor needs to know,
> otherwise the x86 guest doesn't do what it expects to be done.
> 
> I've just recently started working with Xen, but my background has
> been more with other architectures than x86.  I understand all that
> you explained, but one: see below. (I'm posting to the list so that
> others can learn too ;)
> 
> Petersson, Mats wrote:
> >  
> 
> [ snipped a lot of good info ]
> 
> > 
> > Another problem is "hidden bits" in registers. 
> > 
> > Let's say this:
> > 
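> >     mov     %cr0, %eax          # read CR0
> >     mov     %eax, %ecx          # keep the original value
> >     or      $1, %eax            # set PE (bit 0)
> >     mov     %eax, %cr0          # enter protected mode
> >     mov     $0x10, %ax          # selector 0x10 in the GDT
> >     mov     %ax, %fs            # loads FS, including its hidden limit
> >     mov     %ecx, %cr0          # back to real mode; hidden part kept
> >     
> >     mov     $0xF000000, %eax
> >     mov     $10000, %ecx
> > 1:
> >     movl    $0, %fs:(%eax)      # offset far above 64 KB
> >     add     $4, %eax
> >     dec     %ecx
> >     jnz     1b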
> > 
> > Let's now say that we have an interrupt that the hypervisor would
> > handle in the loop in the above code. The hypervisor itself uses FS
> > for some special purpose, and thus needs to save/restore the FS
> > register. When it returns, the system will crash (GP fault) because
> > the FS register limit is 0xFFFF (64KB) and eax is greater than the
> > limit - but the limit of FS was set to 0xFFFFFFFF before we took the
> > interrupt... Incorrect behaviour like this is terribly difficult to
> > deal with, and there really isn't any good way to solve these issues
> > [other than not allowing the code to run when it does "funny" things
> > like this - or to perform the necessary code in "translation mode" -
> > i.e. emulate each instruction -> slow(ish)].
> > 
> 
> The above I'm confused on.  In x86, the hypervisor can't store the fs
> register fully before returning from the interrupt??  You stated that
> the fs register limit was 0xffffffff before the interrupt, but ends up
> being 0xffff afterwards.  As I mentioned, I'm just learning the
> internals of x86, so my full comprehension on segment registers of x86
> is still a little fuzzy.
> 
> Could you explain further here?

Sure. This code snippet enters protected mode (by setting bit 0 of CR0,
the PE bit) and loads FS from the Global Descriptor Table. The visible
part of FS (the 16-bit selector) is set to 0x10, and the hidden limit is
set to whatever happens to be in that descriptor. I didn't actually
specify that value, but rather implied that the limit is
(0xfffff << 12 | 0xFFF) = 0xFFFFFFFF (i.e. the raw limit is 2^20 - 1 and
the granularity bit is set to 1, so the CPU multiplies it by 4096 and
sets the lower 12 bits to one).

As we leave protected mode, the contents of FS are still maintained,
including the 80 bits of hidden information (limit, base and
attributes), so real-mode code can keep using FS with the 4 GB limit.

However, if we then take an interrupt (or otherwise need to save/restore
FS), we'd lose all the hidden bits, because software can only read back
the 16-bit selector. Restoring FS later would need to figure out "how it
got loaded" to make sure its hidden parts are re-loaded.

It's unlikely that you'd see this scenario in Xen, since Xen works on
para-virtual kernels [unless we've got virtualization hardware, in which
case the hypervisor CAN SEE the internal parts of FS (or any other
segment register)]. 

Another tricky situation is:

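        GDT[5] = { base = 0x1000, limit = 0x1000, attr = <something> };
        FS = GDT[5];        // hidden part of FS now caches base 0x1000
        CLI();
        GDT[5] = { base = 0x2000, limit = 0x1000, attr = <something> };
        ...                 // code here still relies on FS base 0x1000
        ...
        ...
        FS = GDT[5];        // only now does FS pick up base 0x2000
        STI();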

Now, whilst this tricky code is unreliable on real hardware too (if
interrupts were enabled), if you have a situation where the guest
cannot accept interrupts but the hypervisor can, it would break if an
interrupt arrived during the "..." code. Assuming the hypervisor saves
and restores FS, it would reload FS from the NEW contents of GDT[5] at
the end of the interrupt, so we'd have lost the value of FS the guest
actually loaded (base 0x2000 instead of 0x1000).

Hidden parts of segment registers are one of the "security features" of
the 286 architecture, but they also create some pretty interesting
scenarios for us programmers... 

--
Mats

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
