Xen project Mailing List

On Tue, Aug 18, 2015 at 5:16 PM Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:

On Tue, 2015-08-18 at 01:18 +0000, Kun Cheng wrote:

> On Tue, Aug 18, 2015 at 3:25 AM Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
>
>     On Mon, 2015-08-17 at 00:55 +0000, Kun Cheng wrote:
>     >
>     >
>     > On Mon, Aug 17, 2015 at 12:16 AM Frediano Ziglio
>     > <freddy77@xxxxxxxxx>
>     >
>     > What I'm planning is adding page migration support for NUMA aware
>     > scheduling. In such a case, most of the time I'll be dealing with
>     > Xen's memory management & scheduling parts, to make the relevant
>     > pages migrate to another node together with their VCPU. However,
>     > the Linux kernel has already implemented some basic mechanisms, so
>     > the whole work would go better by leveraging the kernel's existing
>     > code or functions.
>     >
>     No, not at all. As you figured (or at least had an intuition about)
>     yourself, Xen does run below Linux. Actually, it runs below any
>     guest, including Dom0, which is a special guest but still a guest,
>     and may not even be a Linux guest.
>
>     So there's no code sharing, and no mechanism to invoke Linux code
>     and have it affect Xen's scheduling or memory management (and there
>     never will be :-P).

>
>
> Not being able to share the existing kernel mechanism is kind of
> frustrating...
>
You think? Well, I guess I see what you mean. However, being able to do
custom things, specifically tailored to the kind of workload that Xen
focuses on (i.e., virtualization, of course), instead of having to rely
on tweaking a general purpose operating system, trying to bend it as
much as possible to some specific needs (i.e., basically, what KVM is
doing), is one of Xen's strengths.

Agreed. "Decoupling the hypervisor from a specific OS kernel", that's what Xen does.

Then, whether or not we always manage to take proper advantage of that
is another matter.

> But just as you said, it's the point of virtualization. And now I have
> a better understanding of why you said it would be tough ;) (I start to
> envy KVM guys, LOL)
>
Yeah, sometimes it happens that they get something sort of "for free",
but I really believe what I just said above, so no envy. :-)

>     So, in summary, what you're after should be achieved entirely
>     inside Xen. It is possible that, in the PV guest case, you'd need
>     some help from the guest. However, that would be in the form of
>     "Xen asking/forcing the guest to do something on the *guest*
>     *itself*", not in the form of "Xen asking dom0 to do something on
>     Xen's own memory/scheduling or (directly) on other guests' memory".
>
>     Hope this helps clear things up for you. :-)
>

> At this point I still have other plans. But 'asking the guest to do
> something on the guest itself' sounds like exposing the virtual NUMA
> topology to the guest (vNUMA).
>
How so? We already have it, although it's not yet fully usable (right
for PV guests) due to other issues. But I don't see what that has to do
with what we're talking about.

In the PV case, what virtual NUMA topology takes is:
 - the tools and the hypervisor being able to allocate memory for the
   guest in a specific way (matching the topology we want the guest to
   have)
 - the hypervisor to store the virtual topology somewhere, in order to
   be able to provide it to the guest
 - the guest to ask about its own NUMA topology via a PV path
   (hypercalls), rather than via ACPI (which basically doesn't exist in
   PV)

Again, what does this have to do with memory migration?

No, vNUMA is not involved, I agree with you. The truth is that vNUMA was the first thing that came to my mind when you mentioned "to do something on the guest itself". That's all, never mind.

> I wrote this email because the hypervisor is responsible for
> allocating machine memory for each guest. Then, in the PV case there
> are the P2M and M2P tables to help address translation (and shadow
> page tables in HVM guests). So what first came to my mind was that the
> hypervisor should move the pages for guests, and then the P2M mappings
> should be updated somehow. However, inside a guest domain, the OS can
> only manage guest physical memory, which I don't think it is able to
> move to another node by itself.
>
A PV guest knows about the fact that it is a PV guest (that's the point
of paravirtualization), and in fact, it performs hypercalls and
everything. However, such knowledge does not go as far as being aware
of the host NUMA layout, and being able to move its own memory to a
different NUMA node in the host.

What I recommend is to have a look at the migration code. It's kind
of a beast, I know, but it's been rewritten almost from scratch just very
recently, and I'm sure now it's a lot better and easier to understand
than before.

Reason I'm suggesting this is that, particularly for PV, moving the
guest's RAM under its own feet is going to be possible only with
something similar to performing a local migration. The main difference
is that we may want to be able to do it more 'lively' (i.e., without
stopping the world, even for a small amount of time, as it happens in
migration), as well as that we may want to be able to move specific
chunks of memory, rather than all of it.

These are not small differences, and the migration code probably
wouldn't be reusable as it is, but it's the closest thing to what you're
saying you're trying to achieve that I can imagine.
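
As a rough illustration of the bookkeeping discussed above (moving one
guest page to another node and then updating the P2M and M2P mappings),
here is a toy model in Python. It is only a sketch under stated
assumptions: ToyHost, alloc_on and migrate_page are hypothetical names
made up for this example, not Xen functions, and the model ignores
locking, concurrent guest writes and guest page tables entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHost:
    """Hypothetical host model: machine frames are split evenly across nodes."""
    frames_per_node: int
    nodes: int
    memory: dict = field(default_factory=dict)  # mfn -> page contents
    m2p: dict = field(default_factory=dict)     # mfn -> pfn (machine to physical)

    def node_of(self, mfn):
        # In this toy layout, node N owns frames [N * frames_per_node, ...).
        return mfn // self.frames_per_node

    def alloc_on(self, node):
        # Return the first machine frame on `node` with no M2P entry.
        if not 0 <= node < self.nodes:
            raise ValueError("no such node")
        start = node * self.frames_per_node
        for mfn in range(start, start + self.frames_per_node):
            if mfn not in self.m2p:
                return mfn
        raise MemoryError("no free frame on node %d" % node)


def migrate_page(host, p2m, pfn, target_node):
    """Move the machine frame backing `pfn` to `target_node` (toy version)."""
    old_mfn = p2m[pfn]
    if host.node_of(old_mfn) == target_node:
        return old_mfn  # already on the right node, nothing to do
    new_mfn = host.alloc_on(target_node)
    # Copy the contents, then switch both translation tables to the new frame.
    host.memory[new_mfn] = host.memory.pop(old_mfn, b"")
    host.m2p[new_mfn] = pfn
    del host.m2p[old_mfn]
    p2m[pfn] = new_mfn
    return new_mfn


# Usage: one guest page (pfn 0) backed by mfn 1 on node 0, moved to node 1.
host = ToyHost(frames_per_node=4, nodes=2)
p2m = {0: 1}
host.m2p[1] = 0
host.memory[1] = b"data"
new_mfn = migrate_page(host, p2m, 0, target_node=1)
```

What the toy leaves out is exactly the hard part: a PV guest's page
tables refer to machine frames directly, so every entry pointing at the
old frame would also have to be rewritten while the guest is not
changing them.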

Live migration between nodes is perhaps the easiest way. But it also has drawbacks, mainly because that migration is coarse-grained. Suppose a VM has multiple VCPUs and only some of them are moved to another node (or to several other nodes): then it will be tough to decide which node should be the target of the live migration. However, I also think live migration is the best 'first step', with fine-grained memory migration as the final destination. By the way, I am currently digging into the migration code. ;)

>
> Maybe I misunderstood your words... 'asking the guest to do something
> on the guest itself' confuses me a bit, could you explain your thinking
> in more detail if it's convenient for you?
>
Yeah, my bad. Perhaps, for now, it's better if you forget about this.
Very quickly, what I was hinting at is some mechanisms that we could
come up with (but that will be one of the last steps) for putting the PV
guest under some kind of quiescent state, i.e., a state where it does
not change its page tables --as we're fiddling with them-- without being
completely suspended. If we ever get there, I think that this could
only be done with some cooperation from the guest, e.g., having it go
through a protocol that we'd need to define, upon request from the
hypervisor.

I see. So the basic idea is to keep a VM alive but have it not access the pages during the migration. Yes, that will help speed up the whole process, e.g. by not spending too much time waiting for the lock.

But that's just speculation at this time, and we really
shouldn't think about it until we get there... It's not like there aren't
super difficult problems to solve already! :-P

Yeah, thinking too much will only aggravate the difficulty.

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Thank you so much Dario!

Best regards,

Kenneth

Re: [Xen-devel] Could Xen hyperviosr be able to invoke Linux systemcalls?