
Re: [Xen-devel] 4.11.0 RC1 panic



>>> On 22.05.18 at 13:01, <bouyer@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote:
>> - reduce the test environment (ideally to a simple [XTF?] test), or
>> - at least narrow the conditions, or
> 
> Now that I know where to find the domU number in the panic message,
> I can say that, so far, only 32bit domUs have caused this assert failure.
> 
>> - at the very least summarize the relevant actions NetBSD takes in
>>   terms of page table management, to hopefully reduce the sets of
>>   code paths potentially involved (for example, across a larger set of
>>   crashes knowing whether UNPIN is always involved would be
>>   helpful; I've been blindly assuming it would be short of having
>>   further data)
> 
> So far I've seen 2 stack traces with 4.11:
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
> (XEN)    [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
> (XEN)    [<ffff82d08023a00d>] _spin_lock+0xd/0x50
> (XEN)    [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
> (XEN)    [<ffff82d080288c59>] free_page_type+0x2f9/0x790
> (XEN)    [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
> (XEN)    [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
> (XEN)    [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
> (XEN)    [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
> (XEN)    [<ffff82d0802068bd>] domain_kill+0xbd/0x150
> (XEN)    [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
> (XEN)    [<ffff82d080203210>] do_domctl+0/0x1a90
> (XEN)    [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
> (XEN)    [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN)    [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN)    [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN)    [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN)    [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN)    [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN)    [<ffff82d08036e48c>] lstar_enter+0x10c/0x120

That's interesting. So far I've been working on the assumption that
put_page_from_l2e() races with some other piece of code. The failure
occurring from within domain_relinquish_resources() pretty much rules
this out. Instead it suggests that such a race, if there is one in the
first place, would sit somewhere earlier, perhaps when the page gets
established as a recursive L2 page table. The fact that you see this
only sporadically strongly suggests that there is a race. Unless
someone else gets to it before me, I'll have to go through the related
code again with this in mind.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
