Petersson, Mats wrote:
> > > Al Boldi wrote:
> > > > I may be missing something, but why should the Xen-design
> > > > require the guest to be patched?
> The main reason to use a para-virtual kernel is that it performs better
> than the fully virtualized version.
> > So HVM solves the problem, but why can't this layer be implemented in
> > software?
> It CAN, and has been done.
You mean full virtualization using binary translation in software?
My understanding was that HVM implies full virtualization without the need
for binary translation in software.
> It is, however, a little bit difficult to
> cover some of the "strange" corner cases, as the x86 processor wasn't
> really designed to handle virtualization natively [until these
> extensions were added].
You mean the AMD-V/Intel VT extensions?
If so, then these extensions don't actively participate in the act of
virtualization, but rather fix some x86-arch shortcomings. That makes it
easier for software (i.e. Xen) to virtualize, and so removes the need to
do binary translation. Is this a correct reading?
> This is why you end up with binary translation
> in VMware, for example. Let's say that we use the method of
> "ring compression" (which is when the guest-OS is moved from Ring 0
> [full privileges] to Ring 1 [less than full privileges]), and the
> hypervisor wants to have full control of interrupt flags:
>     pushf    // Save interrupt flag.
>     cli      // Disable interrupts.
>     popf     // Restore interrupt flag.
> In Ring 0, all this works just fine - but of course, we don't know that
> the guest-OS tried to disable interrupts, so we have to change
> something. In Ring 1, the guest can't disable interrupts, so the CLI
> instruction can be intercepted. Great. But pushf and popf are valid
> instructions in all four rings - popf just doesn't change the interrupt
> enable flag in the flags register if you're not allowed to use the
> CLI/STI instructions! So the VMM never sees the restore, and as far as
> it knows interrupts stay disabled forever after [until an STI
> instruction gets found by chance, at least].
> And if the next bit of code is:
>     mov someaddress, eax    // someaddress is updated by an interrupt!
> $1:
>     cmp someaddress, eax    // Check it...
>     jz  $1
> Then we'd very likely never get out of there, since the actual interrupt
> causing someaddress to change is believed by the VMM to be disabled.
> There is no real way to make popf trap [other than supplying it with
> invalid arguments in virtual 8086 mode, which isn't really a practical
> thing to do here!]
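The popf behaviour described above can be sketched as a small C model. This is only an illustration under simplifying assumptions: the names model_popf and vmm_view_after_sequence are made up for this sketch, only the IF and IOPL bits of EFLAGS are modelled, and the guest is assumed to run in Ring 1 with IOPL 0.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF    (1u << 9)     /* interrupt enable flag */
#define EFLAGS_IOPL  (3u << 12)    /* I/O privilege level field */

/* Hypothetical model of protected-mode POPF (IF and IOPL only).
 * It never faults: bits the caller may not change are silently kept. */
uint32_t model_popf(uint32_t eflags, uint32_t popped, int cpl)
{
    uint32_t iopl = (eflags & EFLAGS_IOPL) >> 12;
    uint32_t keep = 0;                 /* bits POPF leaves untouched */

    assert(cpl >= 0 && cpl <= 3);
    if (cpl != 0)
        keep |= EFLAGS_IOPL;           /* only Ring 0 may change IOPL */
    if ((uint32_t)cpl > iopl)
        keep |= EFLAGS_IF;             /* IF is preserved, no trap */
    return (eflags & keep) | (popped & ~keep);
}

/* Replays pushf / cli / popf for a Ring 1 guest and returns what the
 * VMM believes the guest's interrupt flag to be afterwards. */
int vmm_view_after_sequence(void)
{
    uint32_t eflags = EFLAGS_IF;       /* IOPL 0, interrupts enabled */
    int vmm_thinks_if = 1;
    uint32_t saved;

    saved = eflags;                          /* pushf: no trap */
    vmm_thinks_if = 0;                       /* cli: traps, VMM records it */
    eflags = model_popf(eflags, saved, 1);   /* popf: no trap, VMM not told */
    assert(eflags & EFLAGS_IF);              /* the real flag never changed */
    return vmm_thinks_if;
}
```

In this model vmm_view_after_sequence() returns 0: the guest believes it has re-enabled interrupts, but the VMM was only ever told about the cli.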
> Another problem is "hidden bits" in registers.
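> Let's say we have this:
>     mov  cr0, eax          // Read CR0.
>     mov  eax, ecx          // Keep a copy of the original value.
>     or   $1, eax           // Set bit 0 (PE).
>     mov  eax, cr0          // Write it back with PE set.
>     mov  $0x10, eax
>     mov  eax, fs           // Load FS; its hidden limit is now 0xFFFFFFFF.
>     mov  ecx, cr0          // Restore the original CR0.
>     mov  $0xF000000, eax
>     mov  $10000, ecx
> $1:
>     mov  $0, fs:eax        // Store through FS, beyond 64KB.
>     add  $4, eax
>     dec  ecx
>     jnz  $1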
> Let's now say that an interrupt arrives while the loop in the above
> code is running, and the hypervisor handles it. The hypervisor itself
> uses FS for some special purpose, and thus needs to save/restore the FS
> register. When it returns, the system will crash (GP fault) because the
> restored FS register limit is 0xFFFF (64KB) and eax is greater than the
> limit - but the limit of FS was set to 0xFFFFFFFF before we took the
> interrupt... Incorrect
> behaviour like this is terribly difficult to deal with, and there really
> isn't any good way to solve these issues [other than not allowing the
> code to run when it does "funny" things like this - or to perform the
> necessary code in "translation mode" - i.e. emulate each instruction ->
Or introduce the AMD-V/Intel VT extensions?
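The "hidden bits" problem can be illustrated with a rough C model of a segment register. This is a sketch, not a description of real hardware or of Xen: struct seg_reg, restore_from_selector and access_faults are hypothetical names, and the 64KB limit after the reload is simply taken from the scenario quoted above.

```c
#include <stdint.h>

/* Simplified segment register: software can read back the selector,
 * but base and limit live in a hidden part cached at load time. */
struct seg_reg {
    uint16_t selector;   /* visible */
    uint32_t base;       /* hidden */
    uint32_t limit;      /* hidden */
};

/* A save/restore that only knows the selector has to rebuild the
 * hidden part. Assumption from the quoted scenario: the reload yields
 * a 64KB limit, since the 4GB limit was never stored in the selector. */
struct seg_reg restore_from_selector(uint16_t selector)
{
    struct seg_reg r;

    r.selector = selector;
    r.base = (uint32_t)selector << 4;
    r.limit = 0xFFFFu;
    return r;
}

/* Returns 1 if an access of `size` bytes at `offset` exceeds the limit. */
int access_faults(const struct seg_reg *r, uint32_t offset, uint32_t size)
{
    return offset > r->limit || size - 1 > r->limit - offset;
}
```

With FS holding a 0xFFFFFFFF limit, a 4-byte store at offset 0xF000000 passes the check. After a save of fs.selector followed by restore_from_selector(), the same store fails it, even though the selector value is identical.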
> > I'm sure there can't be a performance issue, as this virtualization
> > doesn't occur on the physical resource level, but is (should be)
> > rather implemented as some sort of a multiplexed routing algorithm,
> > I think :)
> I'm not entirely sure what this statement is trying to say, but as I
> understand the situation, performance is entirely the reason why the Xen
> paravirtual model was implemented - all other VMMs are slower [although
> it's often hard to prove that, since for example VMware has the rule
> that it has to give permission before anyone publishes benchmarks of its
> product, and of course that permission would only be given in cases
> where there is some benefit to them].
> One of the obvious reasons for para-virtual being better than full
> virtualization is that it can be used in a "batched" mode. Let's say we
> have some code that does this:
>     p = malloc(2000 * 4096);
> Let's then say that the guts of malloc end up in something like this:
>     for (v = random_virtual_address, p = start_page; p < end_page;
>          p++, v += 4096)
>         map_one_page_to_user(p, v);
> In full virtualization, we have no way to understand that someone is
> mapping 2000 pages to the same user-process in one guest; we'd just see
> writes to the page-table one page at a time.
> In the para-virtual case, we could do something like:
>     hypervisor_map_pages_to_user(current_process, start_page,
> Now, the hypervisor knows "the full story" and can map all those pages
> in one go - much quicker, I would say. There's still more work than in
> the native case, but it's much closer to the native case.
Sure, but wouldn't this come at the price of losing guest-OS transparency?
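To make the batching argument concrete, here is a toy C model that only counts guest-to-hypervisor transitions. All names in it (trap_count, map_pages_unbatched, map_pages_batched) are invented for this sketch and are not Xen's hypercall interface. It assumes that every intercepted page-table write costs one transition and that a batched request costs one transition in total.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

unsigned long trap_count;     /* guest-to-hypervisor transitions */
unsigned long pages_mapped;   /* mappings actually installed */

/* Hypervisor-side work for one mapping (details omitted in this model). */
static void install_mapping(unsigned long page, unsigned long vaddr)
{
    (void)page;
    (void)vaddr;
    pages_mapped++;
}

/* Full virtualization, as modelled here: each page-table write is
 * intercepted on its own, so there is one transition per page. */
void map_pages_unbatched(unsigned long start_page, unsigned long vaddr,
                         size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        trap_count++;
        install_mapping(start_page + i, vaddr + i * PAGE_SIZE);
    }
}

/* Para-virtual, as modelled here: the guest hands over the whole range
 * in one request, so there is a single transition. */
void map_pages_batched(unsigned long start_page, unsigned long vaddr,
                       size_t n)
{
    size_t i;

    trap_count++;
    for (i = 0; i < n; i++)
        install_mapping(start_page + i, vaddr + i * PAGE_SIZE);
}
```

For the 2000-page example, the unbatched path leaves trap_count at 2000 and the batched path leaves it at 1, while both install the same 2000 mappings. The batched path only works because the guest kernel calls the hypervisor explicitly, which is exactly the transparency cost raised above.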
Xen-devel mailing list