Re: [Xen-devel] 4.11.0 RC1 panic
>>> On 22.05.18 at 13:01, <bouyer@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote:
>> - reduce the test environment (ideally to a simple [XTF?] test), or
>> - at least narrow the conditions, or
>
> Now that I know where to find the domU number in the panic message,
> I can say that, so far, only 32bit domUs have caused this assert failure.
>
>> - at the very least summarize the relevant actions NetBSD takes in
>> terms of page table management, to hopefully reduce the sets of
>> code paths potentially involved (for example, across a larger set of
>> crashes knowing whether UNPIN is always involved would be
>> helpful; I've been blindly assuming it would be short of having
>> further data)
>
> So far I've seen 2 stack traces with 4.11:
> (XEN) Xen call trace:
> (XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
> (XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
> (XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
> (XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
> (XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
> (XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
> (XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
> (XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
> (XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
> (XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
> (XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
> (XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
> (XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120

That's interesting: so far I've been working with the assumption that
there would be a race of put_page_from_l2e() with some other piece of
code. The issue happening out of domain_relinquish_resources() pretty
much excludes this, and instead suggests that such a race (if there is
one in the first place, but your seeing this only sporadically strongly
suggests so) would sit somewhere earlier, perhaps when the page gets
established as a recursive L2 one. Unless someone else gets to this
before me, I'll have to go through the related code once more with this
property in mind.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
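A toy model may help illustrate the reasoning above. The sketch below is hypothetical and is not Xen's implementation: struct model_page, its linear_pt_count field and all model_* helpers are illustrative names only. It assumes that each recursive (linear) L2 reference is counted when established and uncounted when dropped, with the drop path checking that the old count was positive (assumed here to be roughly the invariant the dec_linear_entries assertion guards). It shows that a single-threaded teardown, which cannot race with anything, still trips that check if an increment was lost earlier, for example through a non-atomic read-modify-write at establishment time.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a page's linear (recursive) L2
 * reference counter. Not Xen code; all names are illustrative. */
struct model_page {
    atomic_int linear_pt_count;
};

static void model_page_init(struct model_page *pg)
{
    atomic_init(&pg->linear_pt_count, 0);
}

/* Correct establishment path: atomic increment. */
static void model_inc_linear_entries(struct model_page *pg)
{
    atomic_fetch_add(&pg->linear_pt_count, 1);
}

/* Drop path. Returns false where a real implementation would assert,
 * i.e. when the count was not positive before the decrement. */
static bool model_dec_linear_entries(struct model_page *pg)
{
    int old = atomic_fetch_sub(&pg->linear_pt_count, 1);
    return old > 0;
}

/* Deterministically replays one hypothetical buggy interleaving: two CPUs
 * establish a linear reference with a non-atomic load/store pair, and both
 * load before either stores. Two references exist, but the count ends at 1. */
static void model_racy_double_inc(struct model_page *pg)
{
    int seen_by_cpu0 = atomic_load(&pg->linear_pt_count);
    int seen_by_cpu1 = atomic_load(&pg->linear_pt_count);
    atomic_store(&pg->linear_pt_count, seen_by_cpu0 + 1);
    atomic_store(&pg->linear_pt_count, seen_by_cpu1 + 1);
}

/* Single-threaded teardown (the analogue of relinquishing a dead domain's
 * memory): drops every reference that actually exists. Returns false as
 * soon as a drop finds no count left to release. */
static bool model_teardown(struct model_page *pg, int refs_held)
{
    for (int i = 0; i < refs_held; i++)
        if (!model_dec_linear_entries(pg))
            return false;
    return true;
}
```

In this model the failure only surfaces at teardown, while the actual defect sits in the establishment path, which is the shape of the hypothesis above: the put side is merely where an earlier imbalance becomes visible.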