Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
>
> On Feb 16, 2012, at 10:20 AM, Tim Deegan wrote:
>
>> Hi,
>>
>> As promised, a revised version of the waitq-on-missing-pages patch.
>> I've cut it back a fair bit; I think the resulting patch is maybe a bit
>> less efficient, but easier to understand.
>>
>> I've smoke-tested it a bit and haven't seen anything catch fire but I
>> suspect the VMs aren't very happy. I'll debug more throughly when I'm
>> back in the office next week, but would really like some feedback from
>> the rest of you in the meantime.
Well, thanks for taking a stab at it. Looks fairly good. My main
observation is that we do not want to unconditionally go on a wait queue
everywhere. For example, the grant-table code pretty reliably unwinds
itself and correctly tells the grant mapper to retry in the presence of a
paged page. The same could be said of emulation (X86EMUL_RETRY). And
certainly the balloon code should not sleep waiting for populate!
Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
wait-queue code were invoked only there. Then we could call get_gfn_sleep
in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy; a sketch
follows below.
I understand that it is cleaner not to have to make that distinction, but
I don't want to fix things that aren't broken :)
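For concreteness, here is a minimal, untested sketch of what I have in
mind -- the name, signature and wait-queue plumbing are my assumptions,
not anything in this patch:

static inline mfn_t get_gfn_sleep(struct domain *d, unsigned long gfn,
                                  p2m_type_t *t)
{
    mfn_t mfn = get_gfn(d, gfn, t);

    while ( p2m_is_paging(*t) )
    {
        put_gfn(d, gfn);
        /* Ping the pager and sleep until the page-in completes */
        p2m_mem_paging_populate(d, gfn);
        current->arch.mem_paging_gfn = gfn;
        wait_event(current->arch.mem_paging_wq,
                   ({ mfn = get_gfn_query(d, gfn, t);
                      p2m_is_paging(*t) ? (put_gfn(d, gfn), 0) : 1; }));
    }
    /* Exits with the gfn ref held and *t no longer a paging type */
    return mfn;
}

Callers that are known-safe to sleep switch to this; everybody else keeps
the existing unwind-and-retry behaviour.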
More comments inline
>>
>> Cheers,
>>
>> Tim.
>>
>> # HG changeset patch
>> # User Tim Deegan <tim@xxxxxxx>
>> # Parent 01c1bcbc62224833304ede44187400f65e8a6b4c
>> [RFC] x86/mm: use wait queues for mem_paging
>>
>> Use a wait queue to put a guest vcpu to sleep while the requested gfn is
>> in paging state. This adds missing p2m_mem_paging_populate() calls to
>> some callers of the new get_gfn* variants, which would crash now
>> because they get an invalid mfn. It also fixes guest crashes due to
>> unexpected returns from do_memory_op because copy_to/from_guest ran into
>> a paged gfn. Now those places will always get a valid mfn.
>>
>> This is based on an earlier RFC patch by Olaf Hering, but heavily
>> simplified (removing a per-gfn queue of waiting vcpus in favour of
>> a scan of all vcpus on page-in).
>>
>> Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
>> Signed-off-by: Tim Deegan <tim@xxxxxxx>
>>
>> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/arch/x86/mm/p2m.c Thu Feb 16 14:39:13 2012 +0000
>> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
>> }
>>
>> /* For now only perform locking on hap domains */
>> - if ( locked && (hap_enabled(p2m->domain)) )
>> + locked = locked && hap_enabled(p2m->domain);
>> +
>> +#ifdef __x86_64__
When are we getting rid of the 32-bit hypervisor? (Half kidding. Only half.)
>> +again:
>> +#endif
>> + if ( locked )
>> /* Grab the lock here, don't release until put_gfn */
>> gfn_lock(p2m, gfn, 0);
>>
>> mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
>>
>> #ifdef __x86_64__
>> + if ( p2m_is_paging(*t) && q != p2m_query
>> + && p2m->domain == current->domain )
>> + {
>> + if ( locked )
>> + gfn_unlock(p2m, gfn, 0);
>> +
>> + /* Ping the pager */
>> + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
>> + p2m_mem_paging_populate(p2m->domain, gfn);
You want to make sure the mfn is not valid before pinging the pager: for
p2m_ram_paging_out we still have the mfn in place.
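In other words, something like this guard -- my reading of the situation,
so treat it as a sketch:

    /* Only ping the pager when there is no usable mfn left behind */
    if ( *t == p2m_ram_paged ||
         (*t == p2m_ram_paging_out && !mfn_valid(mfn)) )
        p2m_mem_paging_populate(p2m->domain, gfn);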
>> +
>> + /* Safety catch: it _should_ be safe to wait here but if it's not,
>> + * crash the VM, not the host */
>> + if ( in_atomic() )
>> + {
>> + WARN();
>> + domain_crash(p2m->domain);
>> + }
>> + else
>> + {
>> + current->arch.mem_paging_gfn = gfn;
>> + wait_event(current->arch.mem_paging_wq, ({
>> + mfn = p2m->get_entry(p2m, gfn, t, a,
>> + p2m_query, page_order);
>> + (*t != p2m_ram_paging_in);
Well, first of all, p2m->get_entry will not happen in a locked context, so
you will exit the wait loop not holding the p2m lock and crash later. So
you want __get_gfn_type_access with q = p2m_query, and then put_gfn if the
condition fails. Probably not gonna fit in a ({}) block ;)
I think checking for (!p2m_is_paging(*t)) is more appropriate, or even
(!p2m_is_paging(*t) && mfn_valid(mfn)). Although I'm mildly hesitant about
the latter, because conceivably another vcpu may beat us to grabbing the
p2m lock after the gfn is populated, and do something to its mfn.
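One way to write that, as an untested sketch: pull the condition out into
a helper that re-takes the lock and keeps it only on success
(check_paged_in is a made-up name):

static int check_paged_in(struct p2m_domain *p2m, unsigned long gfn,
                          mfn_t *mfn, p2m_type_t *t, p2m_access_t *a)
{
    gfn_lock(p2m, gfn, 0);
    *mfn = p2m->get_entry(p2m, gfn, t, a, p2m_query, NULL);
    if ( !p2m_is_paging(*t) )
        return 1;               /* wake up still holding the gfn lock */
    gfn_unlock(p2m, gfn, 0);    /* still paging: drop it and sleep on */
    return 0;
}

and then in __get_gfn_type_access:

    current->arch.mem_paging_gfn = gfn;
    wait_event(current->arch.mem_paging_wq,
               check_paged_in(p2m, gfn, &mfn, t, a));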
>> + }));
>> + goto again;
>> + }
>> + }
>> +#endif
>> +
>> +#ifdef __x86_64__
>> if ( q == p2m_unshare && p2m_is_shared(*t) )
>> {
>> ASSERT(!p2m_is_nestedp2m(p2m));
>> @@ -942,25 +978,25 @@ void p2m_mem_paging_drop_page(struct dom
>> * This function needs to be called whenever gfn_to_mfn() returns any of the p2m
>> * paging types because the gfn may not be backed by a mfn.
>> *
>> - * The gfn can be in any of the paging states, but the pager needs only be
>> - * notified when the gfn is in the paging-out path (paging_out or paged). This
>> - * function may be called more than once from several vcpus. If the vcpu belongs
>> - * to the guest, the vcpu must be stopped and the pager notified that the vcpu
>> - * was stopped. The pager needs to handle several requests for the same gfn.
>> + * The gfn can be in any of the paging states, but the pager needs only
>> + * be notified when the gfn is in the paging-out path (paging_out or
>> + * paged). This function may be called more than once from several
>> + * vcpus. The pager needs to handle several requests for the same gfn.
>> *
>> - * If the gfn is not in the paging-out path and the vcpu does not belong to the
>> - * guest, nothing needs to be done and the function assumes that a request was
>> - * already sent to the pager. In this case the caller has to try again until the
>> - * gfn is fully paged in again.
>> + * If the gfn is not in the paging-out path nothing needs to be done and
>> + * the function assumes that a request was already sent to the pager.
>> + * In this case the caller has to try again until the gfn is fully paged
>> + * in again.
>> */
>> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>> {
>> struct vcpu *v = current;
>> - mem_event_request_t req;
>> + mem_event_request_t req = {0};
>> p2m_type_t p2mt;
>> p2m_access_t a;
>> mfn_t mfn;
>> struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> + int send_request = 0;
>>
>> /* We're paging. There should be a ring */
>> int rc = mem_event_claim_slot(d, &d->mem_event->paging);
Given all the optimizations around the send_request flag, it might be
worth moving the claim_slot call into an if ( send_request ) block at the
bottom, and getting rid of cancel_slot altogether. claim_slot could
potentially go (or cause someone else to go) onto a wait queue, and it's
best not to be that rude.
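Roughly like this -- an untested sketch against your patch, with the
no-ring (-ENOSYS) domain_crash check assumed to move down alongside the
claim:

void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
{
    struct vcpu *v = current;
    mem_event_request_t req = {0};
    p2m_type_t p2mt;
    p2m_access_t a;
    mfn_t mfn;
    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    int send_request = 0;

    /* Fix the p2m mapping first, as in your patch */
    gfn_lock(p2m, gfn, 0);
    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
    if ( p2mt == p2m_ram_paged || p2mt == p2m_ram_paging_out )
    {
        /* Evict will fail now, tag this request for pager */
        if ( p2mt == p2m_ram_paging_out )
            req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
        if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
                          paging_mode_log_dirty(d) ? p2m_ram_logdirty
                                                   : p2m_ram_rw, a);
        else
        {
            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
            send_request = 1;
        }
    }
    gfn_unlock(p2m, gfn, 0);

    /* Nothing to tell the pager: no slot ever claimed, no cancel needed */
    if ( !send_request )
        return;

    /* Claim the ring slot only now; it may sleep, but never pointlessly */
    if ( mem_event_claim_slot(d, &d->mem_event->paging) < 0 )
        return;

    req.type = MEM_EVENT_TYPE_PAGING;
    req.gfn = gfn;
    req.p2mt = p2mt;
    req.vcpu_id = v->vcpu_id;
    mem_event_put_request(d, &d->mem_event->paging, &req);
}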
>> @@ -974,9 +1010,6 @@ void p2m_mem_paging_populate(struct doma
>> else if ( rc < 0 )
>> return;
>>
>> - memset(&req, 0, sizeof(req));
>> - req.type = MEM_EVENT_TYPE_PAGING;
>> -
>> /* Fix p2m mapping */
>> gfn_lock(p2m, gfn, 0);
>> mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
>> @@ -986,26 +1019,29 @@ void p2m_mem_paging_populate(struct doma
>> /* Evict will fail now, tag this request for pager */
>> if ( p2mt == p2m_ram_paging_out )
>> req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
>> -
>> - set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> + if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
>> + /* Short-cut back to paged-in state (but not for foreign
>> + * mappings, or the pager couldn't map it to page it out) */
>> + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
>> + paging_mode_log_dirty(d)
>> + ? p2m_ram_logdirty : p2m_ram_rw, a);
>> + else
>> + {
>> + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> + send_request = 1;
>> + }
>> }
>> gfn_unlock(p2m, gfn, 0);
>>
>> - /* Pause domain if request came from guest and gfn has paging type */
>> - if ( p2m_is_paging(p2mt) && v->domain == d )
>> + /* No need to inform pager if the gfn is not in the page-out path */
>> + if ( !send_request )
>> {
>> - vcpu_pause_nosync(v);
I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
(hopefully) adopt get_gfn_sleep.
>> - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
>> - }
>> - /* No need to inform pager if the gfn is not in the page-out path */
>> - else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
>> - {
>> - /* gfn is already on its way back and vcpu is not paused */
>> mem_event_cancel_slot(d, &d->mem_event->paging);
>> return;
>> }
>>
>> /* Send request to pager */
>> + req.type = MEM_EVENT_TYPE_PAGING;
>> req.gfn = gfn;
>> req.p2mt = p2mt;
>> req.vcpu_id = v->vcpu_id;
>> @@ -1122,6 +1158,7 @@ void p2m_mem_paging_resume(struct domain
>> {
>> struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> mem_event_response_t rsp;
>> + struct vcpu *v;
>> p2m_type_t p2mt;
>> p2m_access_t a;
>> mfn_t mfn;
>> @@ -1147,9 +1184,10 @@ void p2m_mem_paging_resume(struct domain
>> }
>> gfn_unlock(p2m, rsp.gfn, 0);
>> }
>> - /* Unpause domain */
>> - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>> - vcpu_unpause(d->vcpu[rsp.vcpu_id]);
I see no sin in stacking vcpu pauses, redux...
That's all I have. Thanks!
Andres
>> + /* Wake any vcpus that were waiting for this GFN */
>> + for_each_vcpu ( d, v )
>> + if ( v->arch.mem_paging_gfn == rsp.gfn )
>> + wake_up_all(&v->arch.mem_paging_wq);
>> }
>> }
>>
>> diff -r 01c1bcbc6222 xen/include/asm-x86/domain.h
>> --- a/xen/include/asm-x86/domain.h Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/include/asm-x86/domain.h Thu Feb 16 14:39:13 2012 +0000
>> @@ -4,6 +4,7 @@
>> #include <xen/config.h>
>> #include <xen/mm.h>
>> #include <xen/radix-tree.h>
>> +#include <xen/wait.h>
>> #include <asm/hvm/vcpu.h>
>> #include <asm/hvm/domain.h>
>> #include <asm/e820.h>
>> @@ -491,6 +492,12 @@ struct arch_vcpu
>>
>> struct paging_vcpu paging;
>>
>> +#ifdef CONFIG_X86_64
>> + /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
>> + struct waitqueue_head mem_paging_wq;
>> + unsigned long mem_paging_gfn;
>> +#endif
>> +
>> #ifdef CONFIG_X86_32
>> /* map_domain_page() mapping cache. */
>> struct mapcache_vcpu mapcache;
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel