Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
On 18/12/12 22:17, Konrad Rzeszutek Wilk wrote:
>> Hi Dan, an issue with your reasoning throughout has been the constant invocation of the multi-host environment as a justification for your proposal. But this argument is not used in your proposal below beyond this mention in passing. Further, there is no relation between what you are changing (the hypervisor) and what you are claiming it is needed for (multi-host VM management).
> Heh. I hadn't realized that the emails need to conform to the way legal briefs are written in the US :-) Meaning that each topic must be addressed.

Every time we try to suggest alternatives, Dan goes on some rant about how we're on different planets, how we're all old-guard stuck in static-land thinking, and how we're focused on single-server use cases while multi-server use cases are so different. That's not a one-off: Dan has brought up the multi-server case several times as a reason that a user-space version won't work. But when it comes down to it, he has (apparently) barely mentioned it. If it's such a key point, why does he not bring it up here? It turns out we were right all along -- the whole multi-server thing has nothing to do with it. That's the point Andres is getting at, I think.

(FYI, I'm not wasting my time reading mail from Dan on this subject anymore. As far as I can tell, in this entire discussion he has never changed his mind or his core argument in response to anything anyone has said, nor has he come to understand our ideas or where we are coming from any better. He has only responded by generating more verbiage than anyone has the time to read and understand, much less respond to. That's why I suggested to Dan that he ask someone else to take over the conversation.)

> Anyhow, the multi-host env or a single-host env has the same issue - you try to launch multiple guests and some of them might not launch. The changes that Dan is proposing (the claim hypercall) would provide the functionality to fix this problem.
>> A fairly bizarre limitation of a balloon-based approach to memory management. Why on earth should the guest be allowed to change the size of its balloon, and therefore its footprint on the host? This may be justified with arguments pertaining to the stability of the in-guest workload. What they really reveal are limitations of ballooning. But the inadequacy of the balloon in itself doesn't automatically translate into justifying the need for a new hypercall.
> Why is this a limitation? Why shouldn't the guest be allowed to change its memory usage? It can go up and down as it sees fit. And if it goes down and it gets better performance - well, why shouldn't it do it? I concur it is odd - but it has been like that for decades.

Well, it shouldn't be allowed to do it because it causes this problem you're having with creating guests in parallel. Ultimately, that is the core of your problem. So if you want us to solve the problem by implementing something in the hypervisor, then you need to justify why "Just don't have guests balloon down" is an unacceptable option. Saying "why shouldn't it" and "it's been that way for decades"* isn't a good enough reason.

* Xen is only just 10, so "decades" is a bit of hyperbole. :-)

>>> [...] the hypervisor, which adjusts the domain memory footprint, which changes the number of free pages _without_ the toolstack's knowledge. The toolstack controls constraints (essentially a minimum and maximum) which the hypervisor enforces.
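(For concreteness: from the toolstack side, that "maximum" constraint is a single hypercall. The sketch below, against the libxc interface, shows roughly what pinning a domain's allocation looks like; the domid and size are made up and error handling is elided. The "minimum" side -- the balloon target -- is normally driven separately, e.g. via the guest's xenstore memory/target key.)

#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    uint32_t domid = 5;                  /* hypothetical domain */
    unsigned long max_kb = 512 * 1024;   /* cap the footprint at 512 MiB */

    if (!xch)
        return 1;

    /*
     * The hypervisor enforces this ceiling: any attempt by the domain to
     * allocate beyond max_kb (e.g. ballooning back up) simply fails.
     */
    if (xc_domain_setmaxmem(xch, domid, max_kb))
        fprintf(stderr, "xc_domain_setmaxmem failed\n");

    xc_interface_close(xch);
    return 0;
}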
>>> The toolstack can ensure that the minimum and maximum are identical to essentially disallow Linux from using this functionality. Indeed, this is precisely what Citrix's Dynamic Memory Controller (DMC) does: enforce min==max so that DMC always has complete control and, so, knowledge of any domain memory footprint changes. But DMC is not prescribed by the toolstack,
>> Neither is enforcing min==max. This was my argument when previously commenting on this thread. The fact that you have enforcement of a maximum domain allocation gives you an excellent tool to keep a domain's unsupervised growth at bay. The toolstack can choose how fine-grained, how often to be alerted, and when to stall the domain.
> There is a down-call (so, events) to the toolstack from the hypervisor when the guest tries to balloon in/out? So the need for this arose, but the mechanism to deal with it has been shifted to user-space then? What to do when the guest does this in/out ballooning at frequent intervals? I am actually missing the reasoning behind wanting to stall the domain. Is that to compress/swap the pages that the guest requests? Meaning a user-space daemon that does "things" and has ownership of the pages?
>>> and some real Oracle Linux customers use and depend on the flexibility provided by in-guest ballooning. So guest-privileged-user-driven ballooning is a potential issue for toolstack-based capacity allocation. [IIGT: This is why I have brought up DMC several times and have called this the "Citrix model"... I'm not trying to be snippy or impugn your morals as maintainers.]
>>>
>>> B) Xen's page sharing feature has slowly been completed over a number of recent Xen releases. It takes advantage of the fact that many pages often contain identical data; the hypervisor merges them to save
>> Great care has been taken for this statement to not be exactly true. The hypervisor discards one of two pages that the toolstack tells it to (and patches the physmap of the VM previously pointing to the discarded page). It doesn't merge, nor does it look into contents. The hypervisor doesn't care about the page contents. This is deliberate, so as to avoid spurious claims of "you are using technique X!"
> Is the toolstack (or a daemon in userspace) doing this? I would have thought that there would be some optimization to do this somewhere.
>>> physical RAM. When any "shared" page is written, the hypervisor "splits" the page (aka copy-on-write) by allocating a new physical page. There is a long history of this feature in other virtualization products, and it is known to be possible that, under many circumstances, thousands of splits may occur in any fraction of a second. The hypervisor does not notify or ask permission of the toolstack. So page-splitting is an issue for toolstack-based capacity allocation, at least as currently coded in Xen. [Andre: Please hold your objection here until you read further.]
>> Name is Andres. And please cc me if you'll be addressing me directly! Note that I don't disagree with your previous statement in itself. Although "page-splitting" is fairly unique terminology, and confusing (at least to me); CoW works.
> <nods>

No, the xapi code makes no such assumptions. After it tells a guest to balloon down, it watches to see what actually happens, and it has heuristics to deal with "non-cooperative guests". It does assume that if it sets max_pages lower than or equal to the current amount of used memory, the hypervisor will not allow the guest to balloon up -- but that's a pretty safe assumption.
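(To make that last point concrete: "clamping to current usage" is nothing more than reading the domain's current page count back from the hypervisor and writing it out again as the new maximum. A sketch only -- the helper name is made up, error handling is minimal, and note that libxc exposes the hypervisor's tot_pages as nr_pages:)

#include <xenctrl.h>

/*
 * Freeze a domain at its current footprint: after this, the hypervisor
 * will refuse any attempt by the guest to balloon up past what it is
 * using right now.  It can still balloon down.
 */
static int freeze_domain(xc_interface *xch, uint32_t domid)
{
    xc_dominfo_t info;

    /* Ask for exactly this one domain's info. */
    if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
        return -1;

    /* nr_pages is in pages; xc_domain_setmaxmem() takes KiB. */
    return xc_domain_setmaxmem(xch, domid,
                               info.nr_pages * (XC_PAGE_SIZE / 1024));
}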
A guest can balloon down if it wants to, but as xapi does not consider that memory free, it will never use it. BTW, I don't know if you realize this: originally Xen would return an error if you tried to set max_pages below tot_pages. But as a result of the DMC work, it was seen as useful to allow the toolstack to tell the hypervisor once, "Once the VM has ballooned down to X, don't let it balloon up above X anymore."

> This goes back to the problem statement - if we try to parallelize this, we run into the problem that the amount of memory we thought was free is not true anymore. The start of this email has a good description of some of the issues. In essence, max_pages does work - _if_ one does these operations in serial. We are trying to make this work in parallel and without any failures - for that, one way that is quite simplistic is the claim hypercall. It sets up a 'stake' on the amount of memory that the hypervisor should reserve. This way other guest creations/ballooning do not infringe on the 'claimed' amount.

I'm not sure what you mean by "do these operations in serial" in this context. Each of your "reservation hypercalls" has to happen in serial. If we had a user-space daemon that was in charge of freeing up or reserving memory, each request to that daemon would happen in serial as well. But once the allocation / reservation has happened, the domain builds could happen in parallel.

> I believe with this hypercall xapi can be made to do its operations in parallel as well.

xapi can already boot guests in parallel when there's enough memory to do so -- what operations did you have in mind?

I haven't followed all of the discussion (for reasons mentioned above), but I think the alternative to Dan's solution is something like the below. Maybe you can tell me why it's not suitable:

Have one place in user-space -- either in the toolstack, or a separate daemon -- that is responsible for knowing all the places where memory might be in use. Memory can be in use either by Xen, by one of several VMs, or in a tmem pool. In your case, when not creating VMs, it can remove all limitations -- allow the guests or tmem to grow or shrink as much as they want.

When a request comes in for a certain amount of memory, it will go and set each VM's max_pages and the maximum tmem pool size. It can then check whether there is enough free memory to complete the allocation or not (since there's a race between checking how much memory a guest is using and setting max_pages). If that succeeds, it can return "success". If, while that VM is being built, another request comes in, it can again go around and set the max sizes lower. It has to know how much of the memory is "reserved" for the first guest being built, but if there's enough left after that, it can return "success" and allow the second VM to start being built. After the VMs are built, the toolstack can remove the limits again if it wants, again allowing the free flow of memory.

Do you see any problems with this scheme? All it requires is for the toolstack to be able to temporarily set limits both on guests ballooning up and on tmem allocating more than a certain amount of memory. We already have mechanisms for the first, so if we had a "max_pages" for tmem, you'd have all the tools you need to implement it.

This is the point at which Dan says something about giant multi-host deployments, which has absolutely no bearing on the issue -- the reservation happens at a host level, whether it's in userspace or the hypervisor.
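In code form, the host-level, user-space version of that reservation step boils down to something like the following. This is only a sketch of the idea, not anything that exists: the per-domain clamping uses real libxc calls, but cap_tmem_pool_kb() -- the "max_pages for tmem" that doesn't exist yet -- and the reserved_kb bookkeeping are hypothetical placeholders, and a real implementation would skip dom0, take a lock around the whole thing, and handle errors.

#include <stdbool.h>
#include <stdint.h>
#include <xenctrl.h>

/* Memory already promised to domain builds still in flight (hypothetical
 * bookkeeping kept by the user-space daemon). */
static uint64_t reserved_kb;

/* Hypothetical stand-in for a "max_pages for tmem" knob, which does not
 * exist today. */
static int cap_tmem_pool_kb(xc_interface *xch, uint64_t max_kb)
{
    (void)xch; (void)max_kb;
    return 0;
}

/*
 * One reservation request, handled serially by the daemon:
 *   1. clamp every domain to its current footprint,
 *   2. stop tmem from growing,
 *   3. check that enough memory is (still) free.
 * Once this returns true, the actual domain build can proceed in
 * parallel with other builds.
 */
static bool reserve_memory_kb(xc_interface *xch, uint64_t want_kb)
{
    xc_dominfo_t info;
    xc_physinfo_t phys;
    uint32_t next = 0;
    uint64_t free_kb;

    /* 1. Stop every domain from ballooning up past what it uses now. */
    while (xc_domain_getinfo(xch, next, 1, &info) == 1) {
        xc_domain_setmaxmem(xch, info.domid,
                            info.nr_pages * (XC_PAGE_SIZE / 1024));
        next = info.domid + 1;
    }

    /* 2. Stop tmem from growing (hypothetical knob, see above). */
    cap_tmem_pool_kb(xch, 0);

    /*
     * 3. The limits are in place before we look at free memory, so this
     *    check cannot be raced away by guests or tmem (this is the "race
     *    between checking and setting max_pages" mentioned above).
     */
    if (xc_physinfo(xch, &phys))
        return false;
    free_kb = (uint64_t)phys.free_pages * (XC_PAGE_SIZE / 1024);

    if (free_kb < reserved_kb + want_kb)
        return false;

    reserved_kb += want_kb;   /* released again once the build completes */
    return true;
}

Lifting the max_pages clamps and dropping the reservation once the builds finish would simply be the mirror image of the above.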
It's also where he goes on about how we're stuck in an old stodgy static world and he lives in a magical dynamic hippie world of peace and free love... er, free memory. Which is also not true -- in the scenario I describe above, tmem is actively being used, and guests can actively balloon down and up, while the VM builds are happening. In Dan's proposal, tmem and guests are prevented from allocating "reserved" memory by some complicated scheme inside the allocator; in the above proposal, tmem and guests are prevented from allocating "reserved" memory by simple hypervisor-enforced max_pages settings. The end result looks the same to me.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel