Xen project Mailing List

Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory

To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>

From: "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx>

Date: Wed, 24 Dec 2008 14:42:56 +0000

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 24 Dec 2008 06:43:21 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Wed, Dec 24, 2008 at 1:46 AM, Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
>> * When the balloon driver loads, it inflates the balloon size to
>> (maxmem - target), giving the memory back to Xen. When this is
>> accomplished, the "populate-on-demand" portion of boot is effectively
>> finished.
>>
>
> Another tricky point could be with VT-d. If one guest page is used as
> DMA target before balloon driver is installed, and no early access on
> that page (like start-of-day scrubber), then PoD action will not be
> triggered...
> Not sure the possibility of such condition, but you may need to have
> some thought or guard on that. em... after more thinking, actually PoD
> pages may be alive even after balloon driver is installed. I guess before
> coming up a solution you may add a check on whether target domain
> has passthrough device to decide whether this feature is on on-the-fly.

Hmm, I haven't looked at VT-d integration; it at least requires some
examination. How are gfns translated to mfns for the VT-d hardware?
Does it use the hardware EPT tables? Is the transaction re-startable if
we get an EPT fault and then fix the EPT table?

Any time gfn_to_mfn() is called, unless it's specifically called with the
"query" type, the gfn is populated. That's why qemu, the domain builder,
etc. currently work without any modifications. But if VT-d uses the EPT
tables to translate requests for a guest in hardware, and the device
requests can't easily be re-started after an EPT fault, then this won't
work.

A second issue is with the emergency sweep: if a page which happens to
be zero ends up being the target of a DMA, we may get:
* Device request to write to gfn X, which translates to mfn Y.
* Demand-fault on gfn Z, with no pages in the cache.
* Emergency sweep scans through gfn space, finds that mfn Y is empty.
  It replaces gfn X with a PoD entry, and puts mfn Y behind gfn Z.
* The request finishes.
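To make the race concrete, here is a minimal single-threaded toy model, assuming a four-page guest and four machine pages. Everything in it is hypothetical (the arrays, `toy_p2m_t`, `LOOKUP_QUERY`/`LOOKUP_ALLOC`, `dma_race_demo()`); its `gfn_to_mfn()` only mimics the behaviour described above, where any non-query lookup populates a PoD entry, and is not Xen's real signature. `dma_race_demo()` replays the four steps with gfn X = 0 and gfn Z = 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: sizes, names and layout are hypothetical, not Xen's. */
#define NR_GFNS     4
#define NR_MFNS     4
#define PAGE_WORDS  4
#define INVALID_MFN (-1)
enum { GFN_X = 0, GFN_Z = 1 };

typedef enum { TOY_RAM, TOY_POD } toy_p2m_t;
typedef enum { LOOKUP_QUERY, LOOKUP_ALLOC } lookup_t;

static int       mem[NR_MFNS][PAGE_WORDS]; /* machine page contents        */
static int       p2m_mfn[NR_GFNS];         /* gfn -> mfn when TOY_RAM      */
static toy_p2m_t p2m_type[NR_GFNS];
static int       pod_cache[NR_MFNS];       /* zeroed mfns reserved for PoD */
static int       pod_cache_count;

static bool page_is_zero(int mfn)
{
    for (int i = 0; i < PAGE_WORDS; i++)
        if (mem[mfn][i] != 0)
            return false;
    return true;
}

/* Emergency sweep: reclaim the first all-zero populated page and turn its
 * gfn back into a PoD entry. It has no idea whether DMA is in flight. */
static int emergency_sweep(void)
{
    for (int g = 0; g < NR_GFNS; g++) {
        if (p2m_type[g] == TOY_RAM && page_is_zero(p2m_mfn[g])) {
            int mfn = p2m_mfn[g];
            p2m_type[g] = TOY_POD;
            p2m_mfn[g] = INVALID_MFN;
            return mfn;
        }
    }
    return INVALID_MFN; /* nothing left: the real domain would crash here */
}

/* Back a PoD gfn with a cached page, falling back to the sweep. */
static int pod_populate(int gfn)
{
    int mfn = pod_cache_count > 0 ? pod_cache[--pod_cache_count]
                                  : emergency_sweep();
    if (mfn != INVALID_MFN) {
        p2m_type[gfn] = TOY_RAM;
        p2m_mfn[gfn] = mfn;
    }
    return mfn;
}

/* Any lookup other than a query populates a PoD entry as a side effect. */
static int gfn_to_mfn(int gfn, lookup_t t)
{
    assert(gfn >= 0 && gfn < NR_GFNS);
    if (p2m_type[gfn] == TOY_POD)
        return t == LOOKUP_QUERY ? INVALID_MFN : pod_populate(gfn);
    return p2m_mfn[gfn];
}

/* Replays the four steps: X is a zero page, Z is PoD, the cache is empty. */
static void dma_race_demo(int *y, int *z_mfn, int *z_word)
{
    memset(mem, 0, sizeof mem);
    pod_cache_count = 0;
    p2m_type[GFN_X] = TOY_RAM; p2m_mfn[GFN_X] = 0;
    p2m_type[GFN_Z] = TOY_POD; p2m_mfn[GFN_Z] = INVALID_MFN;
    p2m_type[2] = TOY_RAM; p2m_mfn[2] = 1; mem[1][0] = 7; /* dirty page */
    p2m_type[3] = TOY_RAM; p2m_mfn[3] = 2; mem[2][0] = 7; /* dirty page */

    *y = gfn_to_mfn(GFN_X, LOOKUP_ALLOC);     /* 1: device gets X -> Y   */
    *z_mfn = gfn_to_mfn(GFN_Z, LOOKUP_ALLOC); /* 2-3: sweep hands Y to Z */
    assert(*z_mfn != INVALID_MFN);
    mem[*y][0] = 0xd1;                        /* 4: DMA lands in mfn Y   */
    *z_word = mem[*z_mfn][0];                 /* guest reads Z: DMA data */
}
```

After the demo, a query of gfn X returns `INVALID_MFN` (it is a PoD entry again), while gfn Z is backed by mfn Y and holds the device's data. The toy models the device by writing straight to the mfn it captured in step 1, i.e. the "silently succeeds" outcome; it does not model the EPT-fault outcome.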
Either the request then fails (because the EPT translation for gfn X is
no longer valid), or it silently succeeds in writing to mfn Y, which is
now behind gfn Z instead of gfn X.

If we can't tell that there's an outstanding I/O on the page, then we
can't do an emergency sweep. If we have some way of knowing that there's
*some* outstanding I/O to *some* page, we could pause the guest until the
I/O completes, then do the sweep.

At any rate, until we have that worked out, we should probably add some
"seatbelt" code to make sure that people don't use PoD for a VT-d enabled
domain. I know absolutely nothing about the VT-d code; could you either
write a patch to do this check, or give me an idea of the simplest thing
to check?

>> NB that this code is designed to work only in conjunction with a
>> balloon driver. If the balloon driver is not loaded, eventually all
>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>> there will be no memory to back outstanding PoD pages. When this
>> happens, the domain will crash.
>
> In that case, is it better to increase PoD target to configured max mem?
> It looks uncomfortable to crash a domain just because some optimization
> doesn't apply. :-)

If this happened, it wouldn't be because an optimization didn't apply,
but because we purposely tried to use a feature for which a key component
failed or wasn't properly in place. If we set up a domain with VT-d
access on a box with no VT-d hardware, it would fail as well; it would
just fail during boot rather than 5 minutes later. :-)

We could try to allocate a new page at that point, but it's likely that
the allocation will fail unless there happens to be memory lying around
somewhere, not used by dom0 or any other domain. And if that were the
case, why not just start the domain with that much memory to begin with?
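As a starting point for the seatbelt, here is a minimal sketch under loudly stated assumptions: `struct toy_domain`, its fields, and `pod_enable()`, `assign_device()` and `sweep_allowed()` are hypothetical names, not Xen's VT-d or p2m code, and `inflight_dma` assumes outstanding device I/O could be counted at all, which is exactly the open question above. The check is applied in both directions, in case a device is assigned after the domain was built with PoD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain state; these fields are illustrative only. */
struct toy_domain {
    bool pod_active;          /* domain built with PoD (memory < maxmem) */
    int  nr_passthrough_devs; /* devices assigned through VT-d           */
    int  inflight_dma;        /* outstanding device I/O, if it were known */
    bool paused;
};

/* Domain build: refuse PoD if any device is already passed through. */
static bool pod_enable(struct toy_domain *d)
{
    if (d->nr_passthrough_devs > 0)
        return false;
    d->pod_active = true;
    return true;
}

/* Device assignment: refuse to pass a device through to a PoD domain. */
static bool assign_device(struct toy_domain *d)
{
    if (d->pod_active)
        return false;
    d->nr_passthrough_devs++;
    return true;
}

/* Emergency sweep gate: keep the guest paused while any I/O is
 * outstanding and report that the sweep must wait; otherwise allow it. */
static bool sweep_allowed(struct toy_domain *d)
{
    assert(d->inflight_dma >= 0);
    d->paused = d->inflight_dma > 0;
    return !d->paused;
}
```

`sweep_allowed()` follows the pause-until-the-I/O-completes idea: a caller would retry it after the outstanding requests finish, then sweep. If I/O cannot be tracked, only the two seatbelt checks remain usable.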
The only way to make this more robust would be to pause the domain, send
a message back to xend, have it try to balloon down domain 0 (or possibly
other domains), increase the PoD cache size, and then unpause the domain
again. This is not only a lot of work, but many of the failure modes will
be really hard to handle; e.g., if qemu makes a hypercall that ends up
doing a gfn_to_mfn() translation which fails, we would need to make that
whole operation re-startable. I did look at this, but it's a ton of
work, and a lot of code changes (including interface changes between Xen
and dom0 components), for a situation which really should never happen
in a properly configured system. With a balloon driver that loads during
boot and a properly configured target (i.e., not unreasonably small),
there's no reason the driver shouldn't be able to reach its target
quickly.

> Last, do you have any performance data on how this patch may impact
> the boot process, or even some workload after login?

I do not have any solid numbers. Perceptually, I haven't noticed
anything too slow. I'll do some simple benchmarks.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.