Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions

Hi George --

I trust we have gotten past the recent unpleasantness?  I do value your
technical input to this debate (even when we disagree), so thank you for
continuing the discussion below.

> On 09/01/13 14:44, Dan Magenheimer wrote:
> >> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> >> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
> >>
> >> On Tue, 2013-01-08 at 19:41 +0000, Dan Magenheimer wrote:
> >>> [1] A clarification: In the Oracle model, there is only maxmem;
> >>> i.e. current_maxmem is always the same as lifetime_maxmem;
> >>
> >> This is exactly what I am proposing that you change in order to
> >> implement something like the claim mechanism in the toolstack.
> >>
> >> If your model is fixed in stone and cannot accommodate changes of this
> >> type then there isn't much point in continuing this conversation.
> >>
> >> I think we need to agree on this before we consider the rest of your
> >> mail in detail, so I have snipped all that for the time being.
> >
> > Agreed that it is not fixed in stone.  I should have said
> > "In the _current_ Oracle model", and that footnote was only for
> > comparison purposes.  So, please, do proceed in commenting on the
> > two premises I outlined.
> >
> >>> i.e. d->max_pages is fixed for the life of the domain and
> >>> only d->tot_pages varies; i.e. no intelligence is required
> >>> in the toolstack.  AFAIK, the distinction between current_maxmem
> >>> and lifetime_maxmem was added for Citrix DMC support.
> >>
> >> I don't believe Xen itself has any such concept; the distinction is
> >> purely internal to the toolstack, which chooses what value to push
> >> down to d->max_pages.
> >
> > Actually I believe a change was committed to the hypervisor
> > specifically to accommodate this.
> > George mentioned it earlier in this thread...  I'll have to dig to
> > find the specific changeset, but the change allows the toolstack to
> > reduce d->max_pages so that it is (temporarily) less than
> > d->tot_pages.  Such a change would clearly be unnecessary if
> > current_maxmem was always the same as lifetime_maxmem.
>
> Not exactly.  You could always change d->max_pages; and so there was
> never a concept of "lifetime_maxmem" inside of Xen.

(Well, not exactly "always", but since Aug 2006... changeset 11257.
There being no documentation, it is not clear whether the addition of a
domctl to modify d->max_pages was intended to be used frequently by the
toolstack, as opposed to used only rarely and only by a responsible host
system administrator.)

> The change I think you're talking about is this.  While you could always
> change d->max_pages, it used to be the case that if you tried to set
> d->max_pages to a value less than d->tot_pages, it would return
> -EINVAL*.  What this meant was that if you wanted to use d->max_pages
> to enforce a ballooning request, you had to do the following:
>  1. Issue a balloon request to the guest
>  2. Wait for the guest to successfully balloon down to the new target
>  3. Set d->max_pages to the new target
>
> The waiting made the logic more complicated, and also introduced a race
> between steps 2 and 3.  So the change was made so that Xen would
> tolerate setting max_pages to less than tot_pages.  Then things looked
> like this:
>  1. Set d->max_pages to the new target
>  2. Issue a balloon request to the guest
>
> The new semantics guaranteed that the guest would not be able to "change
> its mind" and ask for memory back after freeing it, without the
> toolstack needing to closely monitor the actual current usage.
>
> But even before the change, it was still possible to change max_pages;
> so the change doesn't have any bearing on the discussion here.
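The two toolstack orderings George describes can be sketched as a small
simulation.  This is a hedged illustration only: `Domain`,
`set_max_pages_strict`, `set_max_pages_relaxed`, `guest_balloon_to`, and
`guest_try_alloc` are invented names standing in for the real Xen
domctl/balloon machinery, not the actual API.

```python
# Illustrative sketch of the old (-EINVAL) vs. relaxed max_pages
# semantics discussed above.  All names are hypothetical stand-ins
# for the real Xen toolstack/hypervisor interfaces.

class Domain:
    def __init__(self, tot_pages, max_pages):
        self.tot_pages = tot_pages   # pages currently allocated
        self.max_pages = max_pages   # ceiling enforced by "Xen"

def set_max_pages_strict(d, new_max):
    """Old semantics: refuse to set max_pages below tot_pages."""
    if new_max < d.tot_pages:
        raise ValueError("EINVAL: new max below current allocation")
    d.max_pages = new_max

def set_max_pages_relaxed(d, new_max):
    """Post-change semantics: tolerate max_pages < tot_pages."""
    d.max_pages = new_max

def guest_balloon_to(d, target):
    """Guest cooperatively releases pages down to target (may be slow)."""
    d.tot_pages = min(d.tot_pages, target)

def guest_try_alloc(d, pages):
    """Guest attempts to grow; the max_pages ceiling is enforced."""
    if d.tot_pages + pages <= d.max_pages:
        d.tot_pages += pages
        return True
    return False

# Relaxed ordering: clamp first, then ask the guest to balloon.  The
# guest cannot "change its mind" and re-allocate above the new target,
# and no wait/monitor step is needed between clamping and ballooning.
d = Domain(tot_pages=1000, max_pages=1000)
set_max_pages_relaxed(d, 600)        # step 1: enforce the new ceiling
guest_balloon_to(d, 600)             # step 2: guest complies eventually
assert not guest_try_alloc(d, 100)   # growth above 600 is refused
```

With the strict variant, step 1 would have to come last, which is
exactly the wait-then-race sequence the changeset removed.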
> -George
>
> * I may have some of the details incorrect (e.g., maybe it was
> d->tot_pages+something else, maybe it didn't return -EINVAL but failed
> in some other way), but the general idea is correct.

Yes, understood.  Ian, please correct me if I am wrong, but I believe
your proposal (at least as last stated) does indeed, in some cases, set
d->max_pages less than or equal to d->tot_pages.  So AFAICT the change
does very much have a bearing on the discussion here.

> The new semantics guaranteed that the guest would not be able to "change
> its mind" and ask for memory back after freeing it without the toolstack
> needing to closely monitor the actual current usage.

Exactly.  So, in your/Ian's model, you are artificially constraining a
guest's memory growth, including any dynamic allocations*.  If, by bad
luck, you do that at a moment when the guest is growing and very much in
need of that additional memory, the guest may now swapstorm or OOM, and
the toolstack has seriously impacted a running guest.  Oracle considers
this both unacceptable and unnecessary.

In the Oracle model, d->max_pages never gets changed, except possibly on
explicit, rare demand by a host administrator.  In the Oracle model, the
toolstack has no business arbitrarily changing a constraint for a guest
when doing so can have a serious impact on the guest.  In the Oracle
model, each guest shrinks and grows its memory needs self-adaptively,
constrained only by the vm.cfg at the launch of the guest and the
physical limits of the machine (max-of-sums, because the accounting is
done in the hypervisor, not sum-of-maxes).  All of this uses working,
shipping code upstream in Xen and Linux... except that you are blocking
from open source the proposed XENMEM_claim_pages hypercall.
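The max-of-sums vs. sum-of-maxes distinction above can be made concrete
with a toy capacity calculation.  The numbers and the `claim` admission
check are purely illustrative assumptions, not the actual
XENMEM_claim_pages accounting.

```python
# Illustrative sketch: why hypervisor-level accounting ("max-of-sums")
# admits workloads that per-guest reservation ("sum-of-maxes") rejects.
# All figures are made up for the example.

host_pages = 1000
guests = [
    {"max": 500, "current": 200},
    {"max": 500, "current": 300},
    {"max": 400, "current": 100},
]

# Sum-of-maxes: reserve every guest's lifetime maximum up front.
sum_of_maxes = sum(g["max"] for g in guests)       # 1400 > host capacity

# Max-of-sums: the hypervisor enforces only that the current total
# allocation (plus any outstanding claim) fits in host RAM.
current_total = sum(g["current"] for g in guests)  # 600 fits easily

# A claim in the spirit of XENMEM_claim_pages succeeds if the claimed
# pages fit alongside what is already allocated:
claim = 300
claim_ok = current_total + claim <= host_pages

assert sum_of_maxes > host_pages   # sum-of-maxes would refuse this mix
assert claim_ok                    # hypervisor accounting admits it
```

Under sum-of-maxes admission control the third guest could not even be
created, even though the three guests together never come close to
filling the host.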
So I think it is very fair (not snide) to point out that a change was
made to the hypervisor to accommodate your/Ian's memory-management
model: a change that Oracle considers unnecessary, a change explicitly
supporting a model that has not been implemented in open source and has
no clear (let alone proven) policy to guide it.  Yet you wish to block a
minor hypervisor change which is needed to accommodate Oracle's
shipping memory-management model?  Please reconsider.

Thanks,
Dan

* To repeat my definition of that term, "dynamic allocations" means any
increase to d->tot_pages that is unbeknownst to the toolstack, including
specifically in-guest ballooning and certain tmem calls.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel