[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143



>>> On 12.06.13 at 13:00, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> create ^
> title it map_domain_page second-stage emergency fallback path never taken
> thanks
> 
> On Tue, Jun 11, 2013 at 7:52 PM, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:
>>
>>> The BUG_ON() here is definitely valid - a few lines down, after the
>>> enclosing if(), we use it in ways that requires this to not have
>>> triggered. It basically tells you whether an in range idx was found,
>>> which apparently isn't the case here.
>>>
>>> As I think George already pointed out - printing accum here would
>>> be quite useful: It should have at least one of the low 32 bits set,
>>> given that dcache->entries must be at most 32 according to the
>>> data you already got logged.
>>
>>
>> With extra debugging (see attached patch)
>>
>> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00,
>> idx: 9 garbage: 40000e00, inuse: ffffffff
>> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000,
>> idx: 22 garbage: 2400000, inuse: ffffffff
>> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800,
>> idx: 11 garbage: 800, inuse: ffffffff
>> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000,
>> idx: 32 garbage: 0, inuse: ffffffff
> 
> So regardless of the fact that tmem is obviously holding what are
> supposed to be short-term references for so long, there is something
> that seems not quite right about this failure path.
> 
> It looks like the algorithm is:
> 1. Clean the garbage map and update the inuse list
> 2. If anything has been cleaned up, use the first not-inuse entry
> 3. Otherwise, do something else ("replace a hash entry" -- not sure
> exactly what that means).
> 
> What we see above is that this failure path succeeds three times, but
> fails the fourth time: there are, in fact, no zero entries after the
> garbage clean-up; however, because "inuse" is 32-bit (effectively) and
> "accum" is 64-bit, ~inuse always has bits 32-63 set, and so will
> always return true and never fall back to the "something else"

Right, that's what occurred to me too yesterday, but the again
I knew I had seen this code path executed. Now that I look again,
I think I understand why: All of my Dom0-s and typical DomU-s
have a vCPU count divisible by 4, and with MAPCACHE_VCPU_ENTRIES
being 16, the full unsigned long would always be used.

> This is probably not something we need to fix for 4.3, but we should
> put it on our to-do list.

Actually I think we should fix this right away.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.