[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143
On 12/06/13 13:12, Jan Beulich wrote: On 12.06.13 at 13:00, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:create ^ title it map_domain_page second-stage emergency fallback path never taken thanks On Tue, Jun 11, 2013 at 7:52 PM, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:The BUG_ON() here is definitely valid - a few lines down, after the enclosing if(), we use it in ways that requires this to not have triggered. It basically tells you whether an in range idx was found, which apparently isn't the case here. As I think George already pointed out - printing accum here would be quite useful: It should have at least one of the low 32 bits set, given that dcache->entries must be at most 32 according to the data you already got logged.With extra debugging (see attached patch) (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00, idx: 9 garbage: 40000e00, inuse: ffffffff (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000, idx: 22 garbage: 2400000, inuse: ffffffff (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800, idx: 11 garbage: 800, inuse: ffffffff (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000, idx: 32 garbage: 0, inuse: ffffffffSo regardless of the fact that tmem is obviously holding what are supposed to be short-term references for so long, there is something that seems not quite right about this failure path. It looks like the algorithm is: 1. Clean the garbage map and update the inuse list 2. If anything has been cleaned up, use the first not-inuse entry 3. Otherwise, do something else ("replace a hash entry" -- not sure exactly what that means). What we see above is that this failure path succeeds three times, but fails the fourth time: there are, in fact, no zero entries after the garbage clean-up; however, because "inuse" is 32-bit (effectively) and "accum" is 64-bit, ~inuse always has bits 32-63 set, and so will always return true and never fall back to the "something else"Right, that's what occurred to me too yesterday, but the again I knew I had seen this code path executed. Now that I look again, I think I understand why: All of my Dom0-s and typical DomU-s have a vCPU count divisible by 4, and with MAPCACHE_VCPU_ENTRIES being 16, the full unsigned long would always be used.This is probably not something we need to fix for 4.3, but we should put it on our to-do list.Actually I think we should fix this right away. How often is the second path taken in practice? And, you said this doesn't happen with debug=n builds -- why not exactly?I'm trying to assess the actual risk of not fixing it, vs the risk of fixing it. -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |