
Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143



>>> On 11.06.13 at 23:06, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On 6/11/2013 2:52 PM, konrad wilk wrote:
>> And that does paint the picture that we have exhausted the full 32 
>> entries of mapcache.
>>
>> Now off to find out who is holding them and why. Aren't these 
>> operations (map/unmap domain_page) supposed to be short-lived?

Yes, they are.
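
For reference, the intended pattern is map, use, unmap within the same
code path, along the lines of the sketch below (zero_domheap_page() is a
made-up example here, and exact headers/signatures vary between trees):

    #include <xen/domain_page.h> /* map_domain_page(), unmap_domain_page() */
    #include <xen/mm.h>          /* struct page_info */
    #include <asm/page.h>        /* clear_page() */

    /* Hypothetical example: touch a domheap page briefly, then release the slot. */
    static void zero_domheap_page(struct page_info *pg)
    {
        void *va = __map_domain_page(pg); /* takes a mapcache slot */

        clear_page(va);                   /* do the short-lived work ... */
        unmap_domain_page(va);            /* ... and hand the slot back immediately */
    }

Anything holding such a mapping across returns to the guest (as tmem
does here) will sooner or later exhaust the 32 mapcache slots.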

> And found the culprit. With some EIP logging:
> [...]
> (XEN) domain_page.c:216:d1 [31] mfn=1eb4ed, 
> [tmh_persistent_pool_page_get+0x26d/0x2d8]
> 
> And from a brief look at the code, it looks as if any call into the
> xmalloc_pool code ends up calling map_domain_page. Since most of the
> tmem code uses the pool to store guest pages (looking briefly at
> tmem_malloc), this would explain why we ran out of the 32 slots,
> especially as we don't free them until the guest puts the persistent
> pages back.

Indeed, and this is not (and never was) a valid use model for
map_domain_page(). What's really odd is the difference
between tmh_mempool_page_get() (using page_to_virt() on the
result of tmh_alloc_page()) and tmh_persistent_pool_page_get()
(using __map_domain_page() on what _tmh_alloc_page_thispool()
returned), while both allocation functions end up calling
alloc_domheap_page(). With Dan no longer around, it may be
hard to understand the reasons behind this brokenness.
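
To make the asymmetry concrete, the two helpers boil down to roughly the
following shape (condensed from memory of 4.3's xen/include/xen/tmem_xen.h;
ASSERTs, the size check, and the exact argument lists are omitted or
approximated, so treat this as a sketch rather than a quote):

    static void *tmh_mempool_page_get(unsigned long size)
    {
        struct page_info *pi = tmh_alloc_page(NULL, 0);

        /* Direct-map address: no mapcache slot is consumed. */
        return pi ? page_to_virt(pi) : NULL;
    }

    static void *tmh_persistent_pool_page_get(unsigned long size)
    {
        struct page_info *pi = _tmh_alloc_page_thispool(current->domain);

        /* Holds one of the 32 mapcache slots until the page gets put back. */
        return pi ? __map_domain_page(pi) : NULL;
    }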

As tmem gets disabled anyway when there is memory not covered
by the 1:1 mapping, switching tmh_persistent_pool_page_get() to
use page_to_virt() would appear to be the obvious immediate
solution. Re-enabling it on such huge memory systems is going to
require a re-design anyway afaict.
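
I.e. something along these lines (untested sketch; the matching put side
would need the inverse change, using virt_to_page() on the passed address
rather than unmapping it):

    static void *tmh_persistent_pool_page_get(unsigned long size)
    {
        struct page_info *pi = _tmh_alloc_page_thispool(current->domain);

        /* Use the 1:1 mapping rather than holding a mapcache slot long term. */
        return pi ? page_to_virt(pi) : NULL;
    }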

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
