Xen project Mailing List

Re: [Xen-devel] [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

Date: Tue, 24 Nov 2015 16:51:32 +0000

Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 24 Nov 2015 16:51:39 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, Nov 2, 2015 at 2:32 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > On 02/11/15 14:07, George Dunlap wrote: >> On 30/10/15 18:33, Andrew Cooper wrote: >>> p2m_pod_demand_populate() can be entered repeatedly during a single path >>> through the hypervisor, e.g. on a toolstack batch map operation. >>> >>> The domain might be crashed, but the interface currently lacks a way of >>> passing an error back through the generic p2m layer. >>> >>> Longterm the p2m layer needs reworking to allow errors to be returned, but >>> in >>> the short term, avoid repeatedly re-sweeping the domain after it has already >>> been crashed from PoD exhaustion. >>> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >>> --- >>> CC: Jan Beulich <JBeulich@xxxxxxxx> >>> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx> >>> --- >>> xen/arch/x86/mm/p2m-pod.c | 3 ++- >>> xen/include/asm-x86/p2m.h | 2 ++ >>> 2 files changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c >>> index be15cf3..6fb054f 100644 >>> --- a/xen/arch/x86/mm/p2m-pod.c >>> +++ b/xen/arch/x86/mm/p2m-pod.c >>> @@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, >>> unsigned long gfn, >>> /* This check is done with the pod lock held. This will make sure that >>> * even if d->is_dying changes under our feet, p2m_pod_empty_cache() >>> * won't start until we're done. */ >>> - if ( unlikely(d->is_dying) ) >>> + if ( unlikely(d->is_dying) || p2m->pod.dead ) >> So after getting lost in a maze of twisty passages, it looks like >> "d->is_dying" might be the wrong thing to check here. d->is_dying is >> *only* set, AFAICT, in two places: >> - in domain_kill(), which is only called for XEN_DOMCTL_destroydomain >> - in domain_create(), if the creation failed for some reason. > > d->is_dying indicates that the domains resources are starting to be torn > down. It is the correct check here. > >> >> Would it make more sense to check d->is_shutting_down instead? > > A domain resume hypercall can clear d->is_shutting_down, making it an > inappropriate check. Well if domain_resume() cleared it, then the sweep would be run one more time and then crashed again, which wouldn't hurt too much. The real reason d->is_shutting_down is an inappropriate check is that there are lots of *other* reasons why a domain might shut down -- or even crash -- which should not prevent p2m sweeps from happening on the domain. The only legitimate reason to do no more sweeps, other than the final tear-down, is a failed PoD sweep. So I think I've convinced myself that something like this is (unfortunately) necessary. I do think it should have a more descriptive name -- "out_of_memory" or "sweep_failed" would be better than "dead". In some imaginary future where p2m functions have proper error handling, one could imagine the toolstack deciding to repopulate the PoD cache and clear this flag, allowing the domain to continue rather than crashing it. Thanks, -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.