Re: [Xen-devel] Could Xen hypervisor be able to invoke Linux systemcalls?

On Tue, 2015-08-18 at 01:18 +0000, Kun Cheng wrote:

> On Tue, Aug 18, 2015 at 3:25 AM Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
>         On Mon, 2015-08-17 at 00:55 +0000, Kun Cheng wrote:
>         >
>         >
>         > On Mon, Aug 17, 2015 at 12:16 AM Frediano Ziglio
>         <freddy77@xxxxxxxxx>
>         >
>         > What I'm planning is adding page migration support for
>         > NUMA-aware scheduling. In that case I'll mostly be dealing
>         > with Xen's memory management & scheduling parts, to make the
>         > relevant pages migrate to another node along with their VCPU.
>         > However, the Linux kernel has already implemented some basic
>         > mechanisms, so the whole work would be better off leveraging
>         > the kernel's existing code or functions.
>         >
>         No, not at all. As you figured (or at least had an intuition
>         about) yourself, Xen does run below Linux. Actually, it runs
>         below any guest, including Dom0, which is a special guest but
>         still a guest, and may not even be a Linux guest.
>         So there's no code sharing, and no mechanism to invoke Linux
>         code and have it affect Xen's scheduling or memory management
>         (and there never will be :-P).

> Not being able to share the existing kernel mechanisms is kind of
> frustrating...
You think? Well, I guess I see what you mean. However, being able to do
custom things, specifically tailored to the kind of workload that Xen
focuses on (i.e., virtualization, of course), instead of having to rely
on tweaking a general purpose operating system, trying to bend it as
much as possible to some specific needs (i.e., basically, what KVM is
doing), is one of Xen's strengths.

Then, whether or not we always manage to take proper advantage of that
is another matter.

> But just as you said it's the point of virtualization. And now I gain
> a better understanding why you said it would be tough ;)   (I start to
> envy KVM guys, LOL)
Yeah, sometimes it happens that they get something sort of "for free",
but I really believe what I just said above, so no envy. :-)

>         So, in summary, what you're after should be achieved entirely
>         inside Xen. It is possible that, in the PV guest case, you'd
>         need some help from the guest. However, that would be in the
>         form of "Xen asking/forcing the guest to do something on the
>         *guest* *itself*", not in the form of "Xen asking dom0 to do
>         something on Xen's own memory/scheduling or (directly) on
>         other guests' memory".
>         Hope this helps clear things up for you. :-)

> At this point I still have other plans.  But 'asking the guest to do
> something on the guest itself' sounds like exposing the virtual NUMA
> topology to the guest (vNUMA). 
How so? We already have it, although it's not yet fully usable (for PV
guests in particular) due to other issues. But I don't see what that has
to do with what we're talking about.

In the PV case, what virtual NUMA topology takes is:
 - the tools and the hypervisor being able to allocate memory for the
   guest in a specific way (matching the topology we want the guest to
   have)
 - the hypervisor to store the virtual topology somewhere, in order to
   be able to provide it to the guest
 - the guest to ask about its own NUMA topology via a PV path
   (hypercalls), rather than via ACPI (which basically doesn't exist in
   PV guests)
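
To make the three points above a bit more concrete, here is a toy
sketch in Python. It is only a model under my own assumptions: the
names (VNode, ToyHypervisor, get_vnuma_info) are made up for
illustration and are not Xen or libxl interfaces.

```python
# Toy model only: none of these names are Xen or libxl APIs.
from dataclasses import dataclass, field


@dataclass
class VNode:
    vnode_id: int
    pnode_id: int          # host NUMA node backing this virtual node
    mem_mb: int            # guest memory allocated from that host node
    vcpus: list = field(default_factory=list)


class ToyHypervisor:
    def __init__(self):
        self._topology = {}    # domid -> list of VNode

    def build_domain(self, domid, vnodes):
        """Points 1 and 2: place memory per vnode and store the layout."""
        self._topology[domid] = list(vnodes)

    def get_vnuma_info(self, domid):
        """Point 3: stand-in for the PV path (a hypercall) the guest uses.

        Only the virtual view is returned; the host node is left out.
        """
        return [
            {"vnode": v.vnode_id, "mem_mb": v.mem_mb, "vcpus": list(v.vcpus)}
            for v in self._topology[domid]
        ]


hv = ToyHypervisor()
hv.build_domain(1, [VNode(0, 2, 1024, [0, 1]), VNode(1, 3, 1024, [2, 3])])
guest_view = hv.get_vnuma_info(1)
```

Note that the guest's view deliberately carries no host node
information; that is a choice of this sketch.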

Again, what does this have to do with memory migration?

> I wrote this email because the hypervisor is responsible for allocating
> machine memory for each guest. Then, in the PV case there are P2M and
> M2P to help with address translation (and shadow page tables for HVM
> guests). So what first came to my mind was that the hypervisor should
> move the pages for guests and then the P2M should be updated somehow.
> However, inside a guest domain, the OS can only manage guest physical
> memory, which I don't think it can move to another node by itself.
A PV guest knows that it is a PV guest (that's the point of
paravirtualization) and, in fact, it performs hypercalls and
everything. However, such knowledge does not go as far as being aware
of the host NUMA layout, or being able to move its own memory to a
different NUMA node in the host.

What I recommend is that you have a look at the migration code. It's
kind of a beast, I know, but it's been rewritten almost from scratch
very recently, and I'm sure it's now a lot better and easier to
understand than before.

The reason I'm suggesting this is that, particularly for PV, moving the
guest's RAM from under its feet is going to be possible only with
something similar to performing a local migration. The main differences
are that we may want to be able to do it more 'lively' (i.e., without
stopping the world, even for a small amount of time, as happens in
migration), and that we may want to be able to move specific chunks of
memory, rather than all of it.
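
Just to illustrate the "move specific chunks" idea, here is a toy
sketch in Python. Plain dicts stand in for the P2M and M2P, and all the
names (ToyHost, ToyDomain, move_pages) are hypothetical. It is not how
the migration code works, and it ignores the hard parts: a real PV
guest also has machine frame numbers in its own page tables, and the
guest keeps running and dirtying pages while we copy.

```python
# Toy model only: plain dicts stand in for Xen's real P2M/M2P structures.

class ToyHost:
    def __init__(self, frames_per_node, nodes):
        self.node_of = {}                       # MFN -> host NUMA node
        self.free = {n: [] for n in range(nodes)}
        mfn = 0
        for n in range(nodes):
            for _ in range(frames_per_node):
                self.node_of[mfn] = n
                self.free[n].append(mfn)
                mfn += 1
        self.ram = {}                           # MFN -> page contents
        self.m2p = {}                           # MFN -> (domid, PFN)

    def alloc(self, node):
        return self.free[node].pop()


class ToyDomain:
    def __init__(self, domid):
        self.domid = domid
        self.p2m = {}                           # PFN -> MFN


def populate(host, dom, pfn, node, data):
    mfn = host.alloc(node)
    host.ram[mfn] = data
    dom.p2m[pfn] = mfn
    host.m2p[mfn] = (dom.domid, pfn)


def move_pages(host, dom, pfns, dst_node):
    """Move only the given guest frames to dst_node, one page at a time."""
    for pfn in pfns:
        old = dom.p2m[pfn]
        if host.node_of[old] == dst_node:
            continue                            # already where we want it
        new = host.alloc(dst_node)
        host.ram[new] = host.ram.pop(old)       # move contents to new frame
        dom.p2m[pfn] = new                      # guest frame -> new machine frame
        host.m2p[new] = host.m2p.pop(old)       # keep the reverse map in sync
        host.free[host.node_of[old]].append(old)


host = ToyHost(frames_per_node=4, nodes=2)
dom = ToyDomain(domid=1)
for pfn in range(3):
    populate(host, dom, pfn, node=0, data=f"page{pfn}")
move_pages(host, dom, [0, 2], dst_node=1)
```

After the call, guest frames 0 and 2 are backed by node 1 and frame 1
is left on node 0, with the guest-visible frame numbers unchanged.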

These are not small differences, and the migration code probably
wouldn't be reusable as it is, but it's the closest thing I can imagine
to what you're saying you're trying to achieve.

> Maybe I misunderstood your words... 'asking the guest to do something
> on the guest itself' confuses me a bit, could you explain your thinking
> in more detail, if that's convenient for you?
Yeah, my bad. Perhaps, for now, it's better if you forget about this.
Very quickly, what I was hinting at is some mechanism that we could
come up with (but that will be one of the last steps) for putting the PV
guest in some kind of quiescent state, i.e., a state where it does
not change its page tables --as we're fiddling with them-- without being
completely suspended. If we ever get there, I think that this could
only be done with some cooperation from the guest, e.g., having it go
through a protocol that we'd need to define, upon request from the
hypervisor. But that's just speculation at this time, and we really
shouldn't think about it until we get there... It's not like there
aren't super difficult problems to solve already! :-P
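
If it helps to picture it, the shape of such a handshake could be
something like the Python toy below. The protocol and every name in it
(ToyGuest, on_quiesce_request, with_quiescent_guest) are made up for
this example; nothing here is an agreed or existing interface.

```python
# Purely hypothetical protocol, made up for illustration.
from enum import Enum


class GuestState(Enum):
    RUNNING = "running"
    QUIESCENT = "quiescent"


class ToyGuest:
    def __init__(self):
        self.state = GuestState.RUNNING
        self.page_tables = {}                  # virtual page -> frame

    def map_page(self, vpage, frame):
        if self.state is GuestState.QUIESCENT:
            raise RuntimeError("page tables are frozen while quiescent")
        self.page_tables[vpage] = frame

    def on_quiesce_request(self):
        """Guest agrees to stop touching its page tables, but keeps running."""
        self.state = GuestState.QUIESCENT
        return True                            # acknowledgement

    def on_resume_request(self):
        self.state = GuestState.RUNNING


def with_quiescent_guest(guest, fiddle):
    """Hypervisor side: request, check the ack, fiddle, release."""
    if not guest.on_quiesce_request():
        raise RuntimeError("guest refused to quiesce")
    try:
        fiddle(guest.page_tables)
    finally:
        guest.on_resume_request()              # always let the guest go on


guest = ToyGuest()
guest.map_page(0x10, 100)


def retarget(tables):
    tables[0x10] = 200                         # e.g. frame moved to another node


with_quiescent_guest(guest, retarget)
```

The only point of the sketch is the ordering: the guest acknowledges
first, the page tables are touched only in between, and the guest is
never fully suspended.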


<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Xen-devel mailing list