Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
>
> On Feb 16, 2012, at 10:20 AM, Tim Deegan wrote:
>
>> Hi,
>>
>> As promised, a revised version of the waitq-on-missing-pages patch.
>> I've cut it back a fair bit; I think the resulting patch is maybe a bit
>> less efficient, but easier to understand.
>>
>> I've smoke-tested it a bit and haven't seen anything catch fire but I
>> suspect the VMs aren't very happy. I'll debug more throughly when I'm
>> back in the office next week, but would really like some feedback from
>> the rest of you in the meantime.
Well, thanks for taking a stab at it. Looks fairly good. My main
observation is that we do not want to unconditionally go on a wait queue
everywhere. For example, the grant-table code pretty reliably unwinds
itself and correctly tells the grant mapper to retry in the presence of a
paged page. The same could be said of emulation (X86EMUL_RETRY). And
certainly the balloon code should not sleep waiting for populate!
Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
wait-queue code were invoked only there. Then we could call get_gfn_sleep
in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy; a sketch
follows below.
I understand that it is cleaner not to have to make that distinction, but
I don't want to fix things that aren't broken :)
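For concreteness, here is a minimal, untested sketch of what I have in
mind -- the name, signature and wait-queue plumbing are my assumptions,
not anything in this patch:

static inline mfn_t get_gfn_sleep(struct domain *d, unsigned long gfn,
                                  p2m_type_t *t)
{
    mfn_t mfn = get_gfn(d, gfn, t);

    while ( p2m_is_paging(*t) )
    {
        put_gfn(d, gfn);
        /* Ping the pager and sleep until the page-in completes */
        p2m_mem_paging_populate(d, gfn);
        current->arch.mem_paging_gfn = gfn;
        wait_event(current->arch.mem_paging_wq,
                   ({ mfn = get_gfn_query(d, gfn, t);
                      p2m_is_paging(*t) ? (put_gfn(d, gfn), 0) : 1; }));
    }
    /* Exits with the gfn ref held and *t no longer a paging type */
    return mfn;
}

Callers that are known-safe to sleep switch to this; everybody else keeps
the existing unwind-and-retry behaviour.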
More comments inline
>>
>> Cheers,
>>
>> Tim.
>>
>> # HG changeset patch
>> # User Tim Deegan <tim@xxxxxxx>
>> # Parent 01c1bcbc62224833304ede44187400f65e8a6b4c
>> [RFC] x86/mm: use wait queues for mem_paging
>>
>> Use a wait queue to put a guest vcpu to sleep while the requested gfn is
>> in paging state. This adds missing p2m_mem_paging_populate() calls to
>> some callers of the new get_gfn* variants, which would crash now
>> because they get an invalid mfn. It also fixes guest crashes due to
>> unexpected returns from do_memory_op because copy_to/from_guest ran into
>> a paged gfn. Now those places will always get a valid mfn.
>>
>> This is based on an earlier RFC patch by Olaf Hering, but heavily
>> simplified (removing a per-gfn queue of waiting vcpus in favour of
>> a scan of all vcpus on page-in).
>>
>> Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
>> Signed-off-by: Tim Deegan <tim@xxxxxxx>
>>
>> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/arch/x86/mm/p2m.c Thu Feb 16 14:39:13 2012 +0000
>> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
>> }
>>
>> /* For now only perform locking on hap domains */
>> - if ( locked && (hap_enabled(p2m->domain)) )
>> + locked = locked && hap_enabled(p2m->domain);
>> +
>> +#ifdef __x86_64__
When are we getting rid of the 32-bit hypervisor? (Half kidding. Only half.)
>> +again:
>> +#endif
>> + if ( locked )
>> /* Grab the lock here, don't release until put_gfn */
>> gfn_lock(p2m, gfn, 0);
>>
>> mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
>>
>> #ifdef __x86_64__
>> + if ( p2m_is_paging(*t) && q != p2m_query
>> + && p2m->domain == current->domain )
>> + {
>> + if ( locked )
>> + gfn_unlock(p2m, gfn, 0);
>> +
>> + /* Ping the pager */
>> + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
>> + p2m_mem_paging_populate(p2m->domain, gfn);
You want to make sure the mfn is not valid before pinging the pager: for
p2m_ram_paging_out we still have the mfn in place.
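In other words, something like this guard -- my reading of the situation,
so treat it as a sketch:

    /* Only ping the pager when there is no usable mfn left behind */
    if ( *t == p2m_ram_paged ||
         (*t == p2m_ram_paging_out && !mfn_valid(mfn)) )
        p2m_mem_paging_populate(p2m->domain, gfn);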
>> +
>> + /* Safety catch: it _should_ be safe to wait here but if it's not,
>> + * crash the VM, not the host */
>> + if ( in_atomic() )
>> + {
>> + WARN();
>> + domain_crash(p2m->domain);
>> + }
>> + else
>> + {
>> + current->arch.mem_paging_gfn = gfn;
>> + wait_event(current->arch.mem_paging_wq, ({
>> + mfn = p2m->get_entry(p2m, gfn, t, a,
>> + p2m_query, page_order);
>> + (*t != p2m_ram_paging_in);
Well, first of all, p2m->get_entry will not happen in a locked context, so
you will exit the wait loop not holding the p2m lock and crash later. So
you want __get_gfn_type_access with q = p2m_query, and then put_gfn if the
condition fails. Probably not gonna fit in a ({}) block ;)
I think checking for (!p2m_is_paging(*t)) is more appropriate, or even
(!p2m_is_paging(*t) && mfn_valid(mfn)). Although I'm mildly hesitant about
the latter, because conceivably another vcpu may beat us to grabbing the
p2m lock after the gfn is populated, and do something to its mfn.
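One way to write that, as an untested sketch: pull the condition out into
a helper that re-takes the lock and keeps it only on success
(check_paged_in is a made-up name):

static int check_paged_in(struct p2m_domain *p2m, unsigned long gfn,
                          mfn_t *mfn, p2m_type_t *t, p2m_access_t *a)
{
    gfn_lock(p2m, gfn, 0);
    *mfn = p2m->get_entry(p2m, gfn, t, a, p2m_query, NULL);
    if ( !p2m_is_paging(*t) )
        return 1;               /* wake up still holding the gfn lock */
    gfn_unlock(p2m, gfn, 0);    /* still paging: drop it and sleep on */
    return 0;
}

and then in __get_gfn_type_access:

    current->arch.mem_paging_gfn = gfn;
    wait_event(current->arch.mem_paging_wq,
               check_paged_in(p2m, gfn, &mfn, t, a));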
>> + }));
>> + goto again;
>> + }
>> + }
>> +#endif
>> +
>> +#ifdef __x86_64__
>> if ( q == p2m_unshare && p2m_is_shared(*t) )
>> {
>> ASSERT(!p2m_is_nestedp2m(p2m));
>> @@ -942,25 +978,25 @@ void p2m_mem_paging_drop_page(struct dom
>> * This function needs to be called whenever gfn_to_mfn() returns any of the p2m
>> * paging types because the gfn may not be backed by a mfn.
>> *
>> - * The gfn can be in any of the paging states, but the pager needs only be
>> - * notified when the gfn is in the paging-out path (paging_out or paged). This
>> - * function may be called more than once from several vcpus. If the vcpu belongs
>> - * to the guest, the vcpu must be stopped and the pager notified that the vcpu
>> - * was stopped. The pager needs to handle several requests for the same gfn.
>> + * The gfn can be in any of the paging states, but the pager needs only
>> + * be notified when the gfn is in the paging-out path (paging_out or
>> + * paged). This function may be called more than once from several
>> + * vcpus. The pager needs to handle several requests for the same gfn.
>> *
>> - * If the gfn is not in the paging-out path and the vcpu does not belong to the
>> - * guest, nothing needs to be done and the function assumes that a request was
>> - * already sent to the pager. In this case the caller has to try again until the
>> - * gfn is fully paged in again.
>> + * If the gfn is not in the paging-out path nothing needs to be done and
>> + * the function assumes that a request was already sent to the pager.
>> + * In this case the caller has to try again until the gfn is fully paged
>> + * in again.
>> */
>> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>> {
>> struct vcpu *v = current;
>> - mem_event_request_t req;
>> + mem_event_request_t req = {0};
>> p2m_type_t p2mt;
>> p2m_access_t a;
>> mfn_t mfn;
>> struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> + int send_request = 0;
>>
>> /* We're paging. There should be a ring */
>> int rc = mem_event_claim_slot(d, &d->mem_event->paging);
Given all the optimizations around the send_request flag, it might be
worth moving the claim_slot call into an if ( send_request ) block at the
bottom, and getting rid of cancel_slot altogether. claim_slot could
potentially go (or cause someone else to go) onto a wait queue, and it's
best not to be that rude.
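Roughly like this -- an untested sketch against your patch, with the
no-ring (-ENOSYS) domain_crash check assumed to move down alongside the
claim:

void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
{
    struct vcpu *v = current;
    mem_event_request_t req = {0};
    p2m_type_t p2mt;
    p2m_access_t a;
    mfn_t mfn;
    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    int send_request = 0;

    /* Fix the p2m mapping first, as in your patch */
    gfn_lock(p2m, gfn, 0);
    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
    if ( p2mt == p2m_ram_paged || p2mt == p2m_ram_paging_out )
    {
        /* Evict will fail now, tag this request for pager */
        if ( p2mt == p2m_ram_paging_out )
            req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
        if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
                          paging_mode_log_dirty(d) ? p2m_ram_logdirty
                                                   : p2m_ram_rw, a);
        else
        {
            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
            send_request = 1;
        }
    }
    gfn_unlock(p2m, gfn, 0);

    /* Nothing to tell the pager: no slot ever claimed, no cancel needed */
    if ( !send_request )
        return;

    /* Claim the ring slot only now; it may sleep, but never pointlessly */
    if ( mem_event_claim_slot(d, &d->mem_event->paging) < 0 )
        return;

    req.type = MEM_EVENT_TYPE_PAGING;
    req.gfn = gfn;
    req.p2mt = p2mt;
    req.vcpu_id = v->vcpu_id;
    mem_event_put_request(d, &d->mem_event->paging, &req);
}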
>> @@ -974,9 +1010,6 @@ void p2m_mem_paging_populate(struct doma
>> else if ( rc < 0 )
>> return;
>>
>> - memset(&req, 0, sizeof(req));
>> - req.type = MEM_EVENT_TYPE_PAGING;
>> -
>> /* Fix p2m mapping */
>> gfn_lock(p2m, gfn, 0);
>> mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
>> @@ -986,26 +1019,29 @@ void p2m_mem_paging_populate(struct doma
>> /* Evict will fail now, tag this request for pager */
>> if ( p2mt == p2m_ram_paging_out )
>> req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
>> -
>> - set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> + if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
>> + /* Short-cut back to paged-in state (but not for foreign
>> + * mappings, or the pager couldn't map it to page it out) */
>> + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
>> + paging_mode_log_dirty(d)
>> + ? p2m_ram_logdirty : p2m_ram_rw, a);
>> + else
>> + {
>> + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> + send_request = 1;
>> + }
>> }
>> gfn_unlock(p2m, gfn, 0);
>>
>> - /* Pause domain if request came from guest and gfn has paging type */
>> - if ( p2m_is_paging(p2mt) && v->domain == d )
>> + /* No need to inform pager if the gfn is not in the page-out path */
>> + if ( !send_request )
>> {
>> - vcpu_pause_nosync(v);
I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
(hopefully) adopt get_gfn_sleep.
>> - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
>> - }
>> - /* No need to inform pager if the gfn is not in the page-out path */
>> - else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
>> - {
>> - /* gfn is already on its way back and vcpu is not paused */
>> mem_event_cancel_slot(d, &d->mem_event->paging);
>> return;
>> }
>>
>> /* Send request to pager */
>> + req.type = MEM_EVENT_TYPE_PAGING;
>> req.gfn = gfn;
>> req.p2mt = p2mt;
>> req.vcpu_id = v->vcpu_id;
>> @@ -1122,6 +1158,7 @@ void p2m_mem_paging_resume(struct domain
>> {
>> struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> mem_event_response_t rsp;
>> + struct vcpu *v;
>> p2m_type_t p2mt;
>> p2m_access_t a;
>> mfn_t mfn;
>> @@ -1147,9 +1184,10 @@ void p2m_mem_paging_resume(struct domain
>> }
>> gfn_unlock(p2m, rsp.gfn, 0);
>> }
>> - /* Unpause domain */
>> - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>> - vcpu_unpause(d->vcpu[rsp.vcpu_id]);
I see no sin in stacking vcpu pauses, redux...
That's all I have. Thanks!
Andres
>> + /* Wake any vcpus that were waiting for this GFN */
>> + for_each_vcpu ( d, v )
>> + if ( v->arch.mem_paging_gfn == rsp.gfn )
>> + wake_up_all(&v->arch.mem_paging_wq);
>> }
>> }
>>
>> diff -r 01c1bcbc6222 xen/include/asm-x86/domain.h
>> --- a/xen/include/asm-x86/domain.h Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/include/asm-x86/domain.h Thu Feb 16 14:39:13 2012 +0000
>> @@ -4,6 +4,7 @@
>> #include <xen/config.h>
>> #include <xen/mm.h>
>> #include <xen/radix-tree.h>
>> +#include <xen/wait.h>
>> #include <asm/hvm/vcpu.h>
>> #include <asm/hvm/domain.h>
>> #include <asm/e820.h>
>> @@ -491,6 +492,12 @@ struct arch_vcpu
>>
>> struct paging_vcpu paging;
>>
>> +#ifdef CONFIG_X86_64
>> + /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
>> + struct waitqueue_head mem_paging_wq;
>> + unsigned long mem_paging_gfn;
>> +#endif
>> +
>> #ifdef CONFIG_X86_32
>> /* map_domain_page() mapping cache. */
>> struct mapcache_vcpu mapcache;
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel