
Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143

>>> On 12.06.13 at 15:16, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
> On 12/06/13 13:12, Jan Beulich wrote:
>>>>> On 12.06.13 at 13:00, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>> create ^
>>> title it map_domain_page second-stage emergency fallback path never taken
>>> thanks
>>> On Tue, Jun 11, 2013 at 7:52 PM, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:
>>>>> The BUG_ON() here is definitely valid - a few lines down, after the
>>>>> enclosing if(), we use it in ways that require this to not have
>>>>> triggered. It basically tells you whether an in range idx was found,
>>>>> which apparently isn't the case here.
>>>>> As I think George already pointed out - printing accum here would
>>>>> be quite useful: It should have at least one of the low 32 bits set,
>>>>> given that dcache->entries must be at most 32 according to the
>>>>> data you already got logged.
>>>> With extra debugging (see attached patch)
>>>> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00,
>>>> idx: 9 garbage: 40000e00, inuse: ffffffff
>>>> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000,
>>>> idx: 22 garbage: 2400000, inuse: ffffffff
>>>> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800,
>>>> idx: 11 garbage: 800, inuse: ffffffff
>>>> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000,
>>>> idx: 32 garbage: 0, inuse: ffffffff
>>> So regardless of the fact that tmem is obviously holding what are
>>> supposed to be short-term references for so long, there is something
>>> that seems not quite right about this failure path.
>>> It looks like the algorithm is:
>>> 1. Clean the garbage map and update the inuse list
>>> 2. If anything has been cleaned up, use the first not-inuse entry
>>> 3. Otherwise, do something else ("replace a hash entry" -- not sure
>>> exactly what that means).
>>> What we see above is that this failure path succeeds three times, but
>>> fails the fourth time: there are, in fact, no zero entries after the
>>> garbage clean-up; however, because "inuse" is 32-bit (effectively) and
>>> "accum" is 64-bit, ~inuse always has bits 32-63 set, and so will
>>> always return true and never fall back to the "something else".
>> Right, that's what occurred to me too yesterday, but then again
>> I knew I had seen this code path executed. Now that I look again,
>> I think I understand why: All of my Dom0-s and typical DomU-s
>> have a vCPU count divisible by 4, and with MAPCACHE_VCPU_ENTRIES
>> being 16, the full unsigned long would always be used.
>>> This is probably not something we need to fix for 4.3, but we should
>>> put it on our to-do list.
>> Actually I think we should fix this right away.
> How often is the second path taken in practice?

On non-debug builds, not at all except on systems with more than
5TB (as is true of all of the map_domain_page() code).

Once domain page mappings are needed, this depends on the use
pattern of the function. In any case this is going to be way more
frequent than the one time per-vCPU setup.

> And, you said this doesn't happen with debug=n builds -- why not exactly?

That was with the user (tmem) in mind: For <= 5TB systems, as
said above, map_domain_page() has a shortcut. And for > 5TB
systems tmem gets turned off.

But any other users of the function could still run into this on
huge memory systems, and with this being one of the listed new
features of 4.3 I think we should fix it.
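
To make the failure mode discussed above concrete, here is a minimal
standalone sketch. It is not the domain_page.c code, only a model of the
check as described in this thread. All helper names (mapcache_entries(),
last_word_mask(), free_slot_unmasked(), free_slot_masked()) are
hypothetical, uint64_t stands in for unsigned long on a 64-bit build,
and "entries = vCPUs * MAPCACHE_VCPU_ENTRIES" is an assumption inferred
from the remark about vCPU counts divisible by 4. The masked variant is
just one possible way to ignore the bits beyond dcache->entries, not
necessarily the fix that will be committed.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD         64u
#define MAPCACHE_VCPU_ENTRIES 16u  /* value quoted in the thread */

/* Assumption: the per-domain cache has 16 entries per vCPU. */
static unsigned int mapcache_entries(unsigned int vcpus)
{
    return vcpus * MAPCACHE_VCPU_ENTRIES;
}

/* Hypothetical helper: valid bits in the last word of the bitmap. */
static uint64_t last_word_mask(unsigned int entries)
{
    unsigned int rem = entries % BITS_PER_WORD;

    return rem ? (UINT64_C(1) << rem) - 1 : ~UINT64_C(0);
}

/*
 * Model of the check as described: OR together the complement of every
 * inuse word. Bits at or beyond 'entries' are never set in inuse, so
 * their complement is always 1 and the result is always non-zero
 * unless 'entries' is a multiple of 64.
 */
static int free_slot_unmasked(const uint64_t *inuse, unsigned int entries)
{
    unsigned int i, words = (entries + BITS_PER_WORD - 1) / BITS_PER_WORD;
    uint64_t accum = 0;

    assert(entries > 0);
    for ( i = 0; i < words; i++ )
        accum |= ~inuse[i];

    return accum != 0;
}

/* Same check, but the last word is masked down to the valid bits. */
static int free_slot_masked(const uint64_t *inuse, unsigned int entries)
{
    unsigned int i, words = (entries + BITS_PER_WORD - 1) / BITS_PER_WORD;
    uint64_t accum = 0;

    assert(entries > 0);
    for ( i = 0; i + 1 < words; i++ )
        accum |= ~inuse[i];
    accum |= ~inuse[words - 1] & last_word_mask(entries);

    return accum != 0;
}
```

With the values from the fourth logged line (32 entries, inuse ==
ffffffff after the garbage clean-up), the unmasked check still reports
a free slot, which is how idx ends up as 32 and the BUG_ON() fires,
while the masked check reports none and would let the fallback path
run. With 4 vCPUs (64 entries under the assumption above) both variants
agree, which matches the observation that the path does get exercised
on guests whose vCPU count is divisible by 4.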

