[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot



On Tue, 2020-02-04 at 11:00 +0000, George Dunlap wrote:
> On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> > 
> > On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > > Hi David,
> > > 
> > > On 01/02/2020 00:33, David Woodhouse wrote:
> > > > From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> > > 
> > > I am a bit concerned with this change, particularly the consequence this
> > > have for the page-tables. There is an assumption that intermediate
> > > page-tables allocated via the boot allocator will never be freed.
> > > 
> > > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > > call to vmap() may free it depending on the mapping size. So we would
> > > call free_domheap_pages() rather than init_heap_pages().
> > > 
> > > I am not entirely sure what is the full consequence, but I think this is
> > > a call for investigation and write it down a summary in the commit 
> > > message.
> > 
> > This isn't just about page tables, right? It's about *any* allocation
> > given out by the boot allocator, being freed with free_heap_pages() ?
> > 
> > Given the amount of code that has conditionals in both alloc and free
> > paths along the lines of…
> > 
> >   if (system_state > SYS_STATE_boot)
> >       use xenheap
> >   else
> >       use boot allocator
> > 
> > … I'm not sure I'd really trust the assumption that such a thing never
> > happens; that no pages are ever allocated from the boot allocator and
> > then freed into the heap.
> > 
> > In fact it does work fine except for some esoteric corner cases,
> > because init_heap_pages() is mostly just a trivial loop over
> > free_heap_pages().
> > 
> > The corner cases are if you call free_heap_pages() on boot-allocated
> > memory which matches one or more of the following criteria:
> > 
> >  • Includes MFN #0,
> > 
> >  • Includes the first page the heap has seen on a given node, so
> >    init_node_heap() has to be called, or
> > 
> >  • High-order allocations crossing from one node to another.
> 
> I was asked to forward a message relating to MFN 0 and allocations
> crossing zones from a private discussion on the security list:
> 
> 8<---
> 
> > I am having difficulty seeing how invalidating MFN0 would solve the issue 
> > here.
> > The zone number for a specific page is calculated from the most significant 
> > bit
> > position set in it's MFN. As a result, each successive zone contains an 
> > order of
> > magnitude more pages. You would need to invalidate the first or last MFN in 
> > each
> > zone.
> 
> Because (unless Jan and I are reading the code wrong):
> 
> * Chunks can only be merged such that they end up on order-boundaries.
> * Chunks can only be merged if they are the same order.
> * Zone boundaries are on order boundaries.
> 
> So say you're freeing mfn 0x100, and mfn 0xff is free.  In that loop, (1
> << order) & mfn will always be 0, so it will always only look "forward"
> fro things to merge, not backwards.
> 
> Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
> through 0x100 are free.  The loop will look "backwards" and merge with
> 0x100; but then it will look "forwards" again.
> 
> Now suppose you've merged 0x100-0x1ff, and the order moves up to size
> 0x100.  Now the mask becomes 0x1ff; so it can't merge with 0x200-0x2ff
> (which would cross zones); instead it looks backwards to 0x0-0xff.
> 
> We don't think it's possible for things to be merged across zones unless
> it can (say) start at 0xff, and merge all the way back to 0x0; which
> can't be done if 0x0 is never on the free list.
> 
> That's the idea anyway.  That would explain why we've never seen it on
> x86 -- due to the way the architecture is, mfn 0 is never on the free list.
> 
> --->8

Thanks.

I still don't really get it. What if the zone boundary is at MFN 0x300?

What prevents the buddy allocator from merging a range a 0x200-0x2FF
with another from 0x300-0x3FF, creating a single range 0x200-0x400
which crosses nodes?

The MFN0 trick only works if all zone boundaries must be at an address
which is 2ⁿ, doesn't it? Is that always true?

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.