
Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging



Hi,

At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> Well, thanks for taking a stab at it. Looks fairly well. My main
> observation is that we do not want to unconditionally go on a wait queue
> everywhere. For example, the grant table code pretty reliably unwinds
> itself and correctly tells the grant mapper to retry, in the presence of a
> paged page.

The grant table code is unlikely to hit a paged-out gfn in the caller's
address space.

> The same could be said of emulation (X86_EMUL_RETRY).

Fair enough.  But the emulator uses hvm_copy() in its callbacks anyway,
so we'd need another flag to say 'special handling for p2m fixups plz'
in all the hvm_copy() interfaces.  Unpleasant!

> And certainly the balloon code should not sleep waiting for populate!

The balloon code should not be trying to populate at all!

> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> wait queue code is invoked only there. And then we can call get_gfn_sleep
> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.

I see.  In the longer term I definitely do not want get_gfn()'s callers
to have to deal with p2m_is_paged(), and PoD, and sharing, and every
other p2m hack we come up with.  I want all that stuff hidden behind
get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.

I don't like get_gfn_sleep() very much but I'll see if I can find a way
to do the equivalent with a flag to get_gfn_*().  But I suspect it will
end up with a tri-state argument of:
 a) just give me the current state;
 b) try (without sleeping) to fix up paging, PoD and sharing but 
    still maybe make me handle it myself by error propagation; and
 c) just fix it up.

'b' is a pretty weird interface.  Especially given that in corner cases,
'make me handle it' might involve hitting a wait queue anyway.  If we
intend in the fullness of time to get rid of it (as I hope we would)
then probably we shouldn't introduce it.  (And if that means pushing
this whole change out past 4.2, we should consider doing that.)

> >> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
> >> --- a/xen/arch/x86/mm/p2m.c        Thu Feb 16 08:48:23 2012 +0100
> >> +++ b/xen/arch/x86/mm/p2m.c        Thu Feb 16 14:39:13 2012 +0000
> >> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
> >>     }
> >>
> >>     /* For now only perform locking on hap domains */
> >> -    if ( locked && (hap_enabled(p2m->domain)) )
> >> +    locked = locked && hap_enabled(p2m->domain);
> >> +
> >> +#ifdef __x86_64__
> When are we getting rid of 32 bit hypervisor? (half kidding. Only half)

Can't be soon enough for me.  I've been sharpening the axe for that one
for years now. :)

> >> +again:
> >> +#endif
> >> +    if ( locked )
> >>         /* Grab the lock here, don't release until put_gfn */
> >>         gfn_lock(p2m, gfn, 0);
> >>
> >>     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
> >>
> >> #ifdef __x86_64__
> >> +    if ( p2m_is_paging(*t) && q != p2m_query
> >> +         && p2m->domain == current->domain )
> >> +    {
> >> +        if ( locked )
> >> +            gfn_unlock(p2m, gfn, 0);
> >> +
> >> +        /* Ping the pager */
> >> +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> >> +            p2m_mem_paging_populate(p2m->domain, gfn);
> 
> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> still have the mfn in place.

Doesn't p2m_mem_paging_populate already DTRT in all these cases?  If
not, it really should.

> >> +
> >> +        /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> +         * crash the VM, not the host */
> >> +        if ( in_atomic() )
> >> +        {
> >> +            WARN();
> >> +            domain_crash(p2m->domain);
> >> +        }
> >> +        else
> >> +        {
> >> +            current->arch.mem_paging_gfn = gfn;
> >> +            wait_event(current->arch.mem_paging_wq, ({
> >> +                        mfn = p2m->get_entry(p2m, gfn, t, a,
> >> +                                             p2m_query, page_order);
> >> +                        (*t != p2m_ram_paging_in);
> 
> Well, first of all p2m->get_entry will not happen in a locked context, so
> you will exit the wait loop not holding the p2m lock and crash later.

Yes - we always exit the loop without the lock; then we 'goto again' and
do things properly.  I realise it's a bit inefficient, but on the
page-in path the p2m lookups shouldn't be the bottleneck. :)  I did
write the version that handled the locking in this loop but it got a bit
twisty, plus we need the 'goto again' for other reasons (see below).

As a follow-up I ought to make the page-sharing fixup code that's just
below this use 'goto again' as well; we shouldn't really leave this
function until we get a lookup where all these cases are dealt with.

> So you want __get_gfn_type_access with q = p2m_query, and then put_gfn if the
> condition fails. Probably not gonna fit in a ({}) block ;)
> 
> I think checking for (!p2m_is_paging(*t)) is more appropriate.

No, that was deliberate, and IIRC a change from Olaf's patch.  If the
page gets paged in and back out while we're asleep, the type goes back
to p2m_ram_paged, and in that case we _must_ break out and retry or this
vcpu might stall forever.

> >> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> {
> >>     struct vcpu *v = current;
> >> -    mem_event_request_t req;
> >> +    mem_event_request_t req = {0};
> >>     p2m_type_t p2mt;
> >>     p2m_access_t a;
> >>     mfn_t mfn;
> >>     struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    int send_request = 0;
> >>
> >>     /* We're paging. There should be a ring */
> >>     int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> 
> Given all the optimizations around the send_request flag, it might be
> worth moving the claim_slot call to an if(send_request) block at the
> bottom, and get rid of cancel_slot altogether. Claim_slot could
> potentially go (or cause someone else to go) to a wait queue, and it's
> best to not be that rude.

Sure.  It might be tricky to make sure we unwind properly in the failure
case, but I'll have a go at a follow-up patch that looks at that.

(This is not really a regression in this patch AFAICS - in the existing code
the get_gfn() callers end up calling this function anyway.)

> >> -    /* Pause domain if request came from guest and gfn has paging type */
> >> -    if ( p2m_is_paging(p2mt) && v->domain == d )
> >> +    /* No need to inform pager if the gfn is not in the page-out path */
> >> +    if ( !send_request )
> >>     {
> >> -        vcpu_pause_nosync(v);
> 
> I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
> (hopefully) adopt get_gfn_sleep.

Yes, if we do it that way, this pause definitely needs to stay.  But if
not then the waitqueue takes care of this case entirely, and this code
is just redundant and confusing.

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

