
Re: [Xen-devel] QEMU bumping memory bug analysis



On 08/06/15 12:39, Stefano Stabellini wrote:
> On Fri, 5 Jun 2015, Wei Liu wrote:
>> On Fri, Jun 05, 2015 at 06:10:17PM +0100, Stefano Stabellini wrote:
>>> On Fri, 5 Jun 2015, Wei Liu wrote:
>>>> Hi all
>>>>
>>>> This bug is now considered a blocker for 4.6 release.
>>>>
>>>> The premises of the problem remain the same (George's translated
>>>> version):
>>>>
>>>> 1. QEMU may need extra pages from Xen to implement option ROMs, and so at
>>>>    the moment it calls set_max_mem() to increase max_pages so that it can
>>>>    allocate more pages to the guest.  libxl doesn't know what max_pages a
>>>>    domain needs prior to qemu start-up.
>>>>
>>>> 2. Libxl doesn't know max_pages even after qemu start-up, because there
>>>>    is no mechanism to communicate between qemu and libxl.
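
For concreteness, the bump described in premise 1 boils down to a single
libxc call. The sketch below is illustrative rather than QEMU's actual
code; how the caller learned the current limit is deliberately left out:

    /* Illustrative sketch, not QEMU's actual code: raise a domain's
     * max_pages by nr_pages.  cur_max_kb is however the caller learned
     * the current limit. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int bump_maxmem(xc_interface *xch, uint32_t domid,
                           uint64_t cur_max_kb, unsigned int nr_pages)
    {
        /* xc_domain_setmaxmem() takes the limit in kB. */
        uint64_t new_max_kb = cur_max_kb +
                              (uint64_t)nr_pages * (XC_PAGE_SIZE / 1024);

        return xc_domain_setmaxmem(xch, domid, new_max_kb);
    }
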
>>> I might not know what is the right design for the overall solution, but
>>> I do know that libxl shouldn't have its own state tracking for
>>> max_pages, because max_pages is kept, maintained and enforced by Xen.
>>>
>>> Ian might still remember, but at the beginning of the xl/libxl project,
>>> we had a few simple design principles. One of them was that we should not
>>> have two places where we keep track of the same thing: if Xen keeps
>>> track of something, libxl should avoid duplicating it.
>>>
>>> In this specific case, libxl can ask Xen at any time what max_pages is
>>> for the domain, so I don't think that libxl should store it or have its
>>> own tracking for it.
>>>
>>> Even if QEMU called into libxl to change max_pages, I don't think that
>>> libxl should store max_pages anywhere. It is already stored in Xen and
>>> can be retrieved at any time.
>>>
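
As a rough illustration of "ask Xen at any time": the limit is already
visible through the domain info libxc exposes, so a minimal sketch
(assuming direct libxc access) needs no state in libxl at all:

    /* Minimal sketch, assuming direct libxc access: read the current
     * limit (max_memkb) straight from the hypervisor. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int get_max_memkb(xc_interface *xch, uint32_t domid,
                             uint64_t *max_memkb)
    {
        xc_dominfo_t info;

        /* xc_domain_getinfo() returns info for the first existing domain
         * at or after 'domid', so check we got the one we asked for. */
        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
            info.domid != domid)
            return -1;

        *max_memkb = info.max_memkb;
        return 0;
    }
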
>> I think you're talking about keeping track of that record permanently. I
>> only care about getting the right value at the right time and transferring
>> it to the other end. Getting the value whenever it is needed is OK.
>>
>>>> 3. QEMU calls xc_domain_setmaxmem to increase max_pages by N pages.
>>>>    Those pages are only accounted for in the hypervisor.  Libxl
>>>>    (currently) does not extract that value from the hypervisor.
>>>>
>>>> Several solutions were proposed:
>>>>
>>>> 1. Add a new record type to the libxc migration stream and call setmaxmem
>>>>    in the middle of the stream.
>>>>
>>>> The main objection is that calling xc_domain_setmaxmem in the middle of
>>>> the xc migration stream is a layering violation. It also prevents us from
>>>> disaggregating domain construction into a less privileged domain.
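
For illustration, such a record might look like the sketch below. The type
number and field names are invented here, not part of the actual migration
v2 format, and the receiving side would have to apply it via
xc_domain_setmaxmem before populating guest RAM:

    /* Hypothetical record, for illustration only; the real migration v2
     * stream defines its own record header and type numbers. */
    #include <stdint.h>

    #define REC_TYPE_MAX_PAGES 0x000f0001u   /* invented type number */

    struct rec_max_pages {
        uint32_t type;       /* REC_TYPE_MAX_PAGES */
        uint32_t length;     /* sizeof(uint64_t) */
        uint64_t max_pages;  /* limit to apply via xc_domain_setmaxmem()
                              * before populating guest RAM */
    };
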
>>> It seems to me that max_pages is one of the memory properties of the
>>> domain, so it should be saved and restored together with the rest of
>>> memory.
>>>
>> #1 was actually referring to Don's patch. When processing a libxc record
>> in the middle of the stream, we should not alter the size of memory. It's
>> not safe, because we don't know whether that record arrives early enough,
>> i.e. before we exceed the limit.
>>
>> The only safe way of doing it is to mandate that this specific record appear
>> at the beginning of the libxc stream, which might have other implications.
> I see, that makes sense
>
>
>>>> 2. Use the libxl toolstack save/restore blob to transmit max pages
>>>>    information to the remote end.
>>>>
>>>> This is considered a bodge, and it has been proven not to work because the
>>>> toolstack blob is restored after xc_domain_restore.
>>> Saving and restoring max_pages in libxl seems to me like a layering
>>> violation. I would avoid 2. 3. 4. and 5.
>>>
>> No, not really. If we follow the principle that "libxl is the arbitrator",
>> then libxl needs to have thorough information on every aspect of the domain
>> and set up limits accordingly.
> I am not sure I buy this "libxl is the arbitrator" concept. I am not
> seeing libxl as adding much value in this context.
>
>  
>>>> 3. Add a libxl layer that wraps the necessary information, taking over
>>>>    Andrew's work on libxl migration v2.  Having a libxl layer that's not
>>>>    part of migration v2 is a waste of effort.
>>>>
>>>> There are several obstacles for libxl migration v2 at the moment. The
>>>> libxl layer in migration v2 still has unresolved issues, and it has
>>>> inter-dependencies with Remus / COLO.
>>>>
>>>> Most importantly, it doesn't inherently solve the problem. It still
>>>> requires the current libxl JSON blob to contain information about max
>>>> pages (or information used to derive max pages).
>>>>
>>>> Andrew, correct me if I'm wrong.
>>>>
>>>> 4. Add a non-user-configurable field to the current libxl JSON structure
>>>>    to record max pages information.
>>>>
>>>> This is not desirable. All fields in libxl JSON should be user
>>>> configurable.
>>>>
>>>> 5. Add a user-configurable field to the current libxl JSON structure to
>>>>    record how much more memory this domain needs. The admin is required to
>>>>    fill in that value manually. In the meantime we revert the change in
>>>>    QEMU and declare QEMU with that change buggy.
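
As a sketch of what option 5 would mean for the toolstack: the admin-supplied
slack would simply be folded into the hard limit at domain build time. The
extra_memkb parameter below is hypothetical (it does not exist in libxl
today), and the base calculation is only schematic:

    /* Sketch only: 'extra_memkb' is a hypothetical admin-supplied value,
     * not an existing libxl field; the base calculation is schematic. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int set_hard_limit(xc_interface *xch, uint32_t domid,
                              uint64_t target_memkb,
                              uint64_t video_memkb,
                              uint64_t extra_memkb /* hypothetical knob */)
    {
        /* Fold the admin-supplied slack into the hard limit up front, so
         * the device model never needs to raise it itself. */
        return xc_domain_setmaxmem(xch, domid,
                                   target_memkb + video_memkb + extra_memkb);
    }
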
>>> QEMU 2.3.0 was released with that change in it, so it is not quite
>>> possible to revert it. Also I think it is the right change for QEMU.
>>>
>> It has security implications. Here is my reply copied from my mail to
>> Ian:
>>
>> I'm considering removing the need for xc_domain_setmaxmem regardless of this
>> bug, because it is going to cause problems for QEMU upstream stubdom with a
>> strict XSM policy and for deprivileged QEMU (which may not have the privilege
>> to call setmaxmem).
> QEMU running in the stubdom should be able to set the maxmem for its
> target domain, but not for the others.

At the moment, set_max_mem is the only method the toolstack has of
putting a hard upper bound on a domain's memory usage (along with a few
others, like the shadow memory size and the PoD cache).
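
Roughly speaking, those bounds are all established by toolstack-side libxc
calls; a sketch with placeholder values and no error handling:

    /* Rough sketch of the toolstack-side limits; values are placeholders
     * and error handling is omitted. */
    #include <stdint.h>
    #include <xenctrl.h>

    static void apply_limits(xc_interface *xch, uint32_t domid,
                             uint64_t max_memkb, uint64_t pod_target_pages)
    {
        /* Hard upper bound on the domain's memory allocation. */
        xc_domain_setmaxmem(xch, domid, max_memkb);

        /* PoD cache target; only relevant for domains built with
         * populate-on-demand. */
        xc_domain_set_pod_target(xch, domid, pod_target_pages,
                                 NULL, NULL, NULL);

        /* The shadow/HAP pool is sized separately (xc_shadow_control());
         * omitted here. */
    }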

In a disaggregated case, no deprivileged entity should be able to play
with this limit.  Being able to do so renders the security moot, as a
compromised stubdom could force a host-wide OOM.

~Andrew
