Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
Hi,

I'm about to repost this patch, with a few other changes and fixups.
Further comments below.

At 12:57 -0800 on 17 Feb (1329483430), Andres Lagar-Cavilla wrote:
> > At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> >> Well, thanks for taking a stab at it. Looks fairly good. My main
> >> observation is that we do not want to unconditionally go on a wait
> >> queue everywhere. For example, the grant table code pretty reliably
> >> unwinds itself and correctly tells the grant mapper to retry, in the
> >> presence of a paged page.
> >
> > The grant table code is unlikely to hit a paged-out gfn in the
> > caller's address space.
>
> Oh yeah they do. At least for the target gfns behind the grants. That's
> why we need the retry loops in the PV backends in dom0.

The target gfns aren't in the caller's p2m, though.

> > The balloon code should not be trying to populate at all!
>
> Agreed. get_gfn_query is needed there. Nothing bad happens because we
> immediately send drop_page. But it's wrong-ish.

Fixed.

> >> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if
> >> the wait queue code is invoked only there. And then we can call
> >> get_gfn_sleep in vmx_load_pdptrs, in guest page table walks, and in
> >> __hvm_copy.
> >
> > I see.  In the longer term I definitely do not want get_gfn()'s callers
> > to have to deal with p2m_is_paged(), and PoD, and sharing, and every
> > other p2m hack we come up with.  I want all that stuff hidden behind
> > get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
> >
> > I don't like get_gfn_sleep() very much but I'll see if I can find a way
> > to do the equivalent with a flag to get_gfn_*().  But I suspect it will
> > end up with a tri-state argument of:
> >  a) just give me the current state;
> >  b) try (without sleeping) to fix up paging, PoD and sharing but
> >     still maybe make me handle it myself by error propagation; and
> >  c) just fix it up.
> >
> > 'b' is a pretty weird interface.  Especially given that in corner
> > cases, 'make me handle it' might involve hitting a wait queue anyway.
> > If we intend in the fullness of time to get rid of it (as I hope we
> > would) then probably we shouldn't introduce it.  (And if that means
> > pushing this whole change out past 4.2, we should consider doing that.)
>
> It's weird. b) is an evolutionary side-effect. It should disappear. I
> guess what I'm arguing for is "it should disappear in the long term, not
> on a patch so close to 4.2".

Well, I tried making the 'but don't go to a waitq' flag; the
implementation is trivial on top of the patches I'm about to post, but
the question of when to set it was so ugly that I gave up.

I have hit another stumbling block, though: get_two_gfns() can't safely
ask get_gfn() to populate or unshare if that might involve a wait queue,
since by definition one of its two calls is made with a lock held.  I
can't see a nice way of having it retry (not one that's guaranteed to
make progress) without pulling the fixup-and-retry up into that
function.  I might do that in the next revision.
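Very roughly, the shape I have in mind is something like this.  (A
sketch only -- the helper name and the signature are made up for
illustration, not what's in the tree.)

    /* Sketch: do both lookups without side effects, and if either gfn
     * needs paging in, drop both refs before doing anything that could
     * sleep, then start again from the top. */
    static void get_two_gfns_fixup(struct domain *da, unsigned long gfna,
                                   struct domain *db, unsigned long gfnb,
                                   p2m_type_t *ta, p2m_type_t *tb,
                                   mfn_t *mfna, mfn_t *mfnb)
    {
     retry:
        *mfna = get_gfn_query(da, gfna, ta);
        *mfnb = get_gfn_query(db, gfnb, tb);

        if ( p2m_is_paging(*ta) || p2m_is_paging(*tb) )
        {
            /* Drop both refs before anything that might hit a waitq */
            put_gfn(db, gfnb);
            put_gfn(da, gfna);
            if ( p2m_is_paging(*ta) )
                p2m_mem_paging_populate(da, gfna);
            if ( p2m_is_paging(*tb) )
                p2m_mem_paging_populate(db, gfnb);
            /* Not guaranteed to make progress if the pager is slow,
             * which is exactly the problem described above. */
            goto retry;
        }
        /* Both entries are now usable and both refs are held. */
    }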
> > Can't be soon enough for me.  I've been sharpening the axe for that
> > one for years now. :)
>
> What's stopping the death blow?

People said they still need it.

> >> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> >> still have the mfn in place.
> >
> > Doesn't p2m_mem_paging_populate already DTRT in all these cases?  If
> > not, it really should.
>
> It's not about populate, it's about killing the domain unnecessarily in
> the in_atomic check.

Oh I see, thanks.  Fixed.

> >> >> +
> >> >> +        /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> >> +         * crash the VM, not the host */
> >> >> +        if ( in_atomic() )
> >> >> +        {
> >> >> +            WARN();
> >> >> +            domain_crash(p2m->domain);
> >> >> +        }
> >> >> +        else
> >> >> +        {
> >> >> +            current->arch.mem_paging_gfn = gfn;
> >> >> +            wait_event(current->arch.mem_paging_wq, ({
> >> >> +                        mfn = p2m->get_entry(p2m, gfn, t, a,
> >> >> +                                             p2m_query, page_order);
> >> >> +                        (*t != p2m_ram_paging_in);
> >>
> >> Well, first of all mfn->get_entry will not happen in a locked context,
> >> so you will exit the wait loop not holding the p2m lock and crash
> >> later.
> >
> > Yes - we always exit the loop without the lock; then we 'goto again'
> > and do things properly.  I realise it's a bit inefficient, but on the
> > page-in path the p2m lookups shouldn't be the bottleneck. :)  I did
> > write the version that handled the locking in this loop but it got a
> > bit twisty, plus we need the 'goto again' for other reasons (see
> > below).
>
> Yes, but, you'll be reading the p2m in a racy manner.

I don't think there are any real races here.  Either we see
p2m_ram_paging_in (and someone will eventually wake us when it pages in)
or we don't (and we go back and do the lookup safely).

> > As a follow-up I ought to make the page-sharing fixup code that's just
> > below this use 'goto again' as well; we shouldn't really leave this
> > function until we get a lookup where all these cases are dealt with.

On closer inspection the page-sharing version is OK because it doesn't
release the lock, and the unshare function won't leave us with a
paged-out page.
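To make the 'goto again' shape concrete, the lookup path looks roughly
like this (a heavily trimmed sketch, not the actual patch -- the
in_atomic() safety catch, locking details and error handling are all
simplified):

    /* Sketch of the lookup path with the wait queue, trimmed down. */
 again:
    p2m_lock(p2m);
    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);

    if ( q != p2m_query && p2m_is_paging(*t) )
    {
        p2m_unlock(p2m);
        p2m_mem_paging_populate(p2m->domain, gfn);

        /* Sleep until the pager has brought the page back in.  The
         * lookup inside wait_event() runs without the p2m lock, so its
         * result is only used to decide when to wake up... */
        current->arch.mem_paging_gfn = gfn;
        wait_event(current->arch.mem_paging_wq, ({
            mfn = p2m->get_entry(p2m, gfn, t, a, p2m_query, page_order);
            (*t != p2m_ram_paging_in);
        }));

        /* ...and then we redo the whole lookup with the lock held
         * before handing anything back to the caller. */
        goto again;
    }
    /* ... PoD and sharing fixups, then return mfn with the lock held */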
> I think we won't be able to have a magic
> get_gfn_solve_everything_silently() call until we find a solution to
> the fact that get_gfn happens in locked contexts.

Yes, I agree, and I'm coming round to the opinion that we might be too
late to fix that in 4.2. :|

> I've thought about a wacky idea that
> 1. schedules a sleep-n-retry softirq
> 2. fails the call
> 3. avoids crashing the domain on the unwind path
> 4. when executing softirqs pauses the vcpu as per 1.
> 5. when woken up retries the last failed hypercall/vmexit/whatever
>
> I shouldn't have said that out loud...

The first time I approached the idea of paging guest RAM (before
Patrick started on the current system) I prototyped something not a
million miles from this (basically, a sort of setjmp() on hypervisor
entry), but the question of making all hypercalls and VMEXITs
idempotent scared me off. :)

> >> >>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> >>  {
> >> >>      struct vcpu *v = current;
> >> >> -    mem_event_request_t req;
> >> >> +    mem_event_request_t req = {0};
> >> >>      p2m_type_t p2mt;
> >> >>      p2m_access_t a;
> >> >>      mfn_t mfn;
> >> >>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> >> +    int send_request = 0;
> >> >>
> >> >>      /* We're paging. There should be a ring */
> >> >>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> >>
> >> Given all the optimizations around the send_request flag, it might be
> >> worth moving the claim_slot call to an if(send_request) block at the
> >> bottom, and get rid of cancel_slot altogether. claim_slot could
> >> potentially go (or cause someone else to go) to a wait queue, and
> >> it's best not to be that rude.
> >
> > Sure.  It might be tricky to make sure we unwind properly in the
> > failure case, but I'll have a go at a follow-up patch that looks at
> > that.
> >
> > (This is not really a regression in this patch AFAICS - in the
> > existing code the get_gfn() callers end up calling this function
> > anyway.)
>
> *After* they've put the gfn. Not a regression, just cleaner code imo.

This is also called after putting the gfn - but in any case I've
reshuffled the mem_event_claim_slot call in the new version.
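For the record, the reshuffle looks roughly like this (a sketch, not
the final code -- the logic that fills in req and decides send_request
is elided):

    void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
    {
        mem_event_request_t req = {0};
        int send_request = 0;

        /* ... look up the p2m entry, fill in req, and set send_request
         * only if this vcpu actually needs to notify the pager ... */

        if ( !send_request )
            return;

        /* Only now claim a ring slot (which may itself sleep), so the
         * common paths never need mem_event_cancel_slot(). */
        if ( mem_event_claim_slot(d, &d->mem_event->paging) < 0 )
            return;

        mem_event_put_request(d, &d->mem_event->paging, &req);
    }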
Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel