[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] arm/p2m: Fix regression during domain shutdown with active mem_access



On Tue, Jan 24, 2017 at 3:32 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>
>
> On 24/01/2017 22:19, Tamas K Lengyel wrote:
>>
>> On Tue, Jan 24, 2017 at 3:13 PM, Julien Grall <julien.grall@xxxxxxx>
>> wrote:
>>>
>>> Hi Tamas,
>>>
>>> On 24/01/2017 22:10, Tamas K Lengyel wrote:
>>>>
>>>>
>>>> The change in commit 438c5fe4f0c introduced a regression for domains
>>>> where
>>>> mem_acces is or was active. When relinquish_p2m_mapping attempts to
>>>> clear
>>>> a page where the order is not 0 the following ASSERT is triggered:
>>>>
>>>>     ASSERT(!p2m->mem_access_enabled || page_order == 0);
>>>>
>>>> This regression was unfortunately not caught during testing in
>>>> preparation
>>>> for the 4.8 release.
>>>>
>>>> As at this point during domain shutdown it is safe to skip mem_access
>>>> paths
>>>> altogether (pages are being relinquished), this patch flips the
>>>> mem_access_enabled flag to forgo any radix-tree lookups and to avoid
>>>> tripping the ASSERT.
>>>>
>>>> Ideally this fix would be part of Xen 4.8.1.
>>>
>>>
>>>
>>> How about fixing the ASSERT rather than turning-off memaccess crudely?
>>>
>>> For instance by whether whether the domain is dying.
>>>
>>
>> We can do that too if preferred. This way though we also shortcut all
>> calls to p2m_mem_access_radix_set, so shutdown would be faster.
>
>
> Well yes and no. You will have to free the radix tree afterwards anyway. So
> if you disable mem_access, the radix tree will fully free in p2m_teardown.
>
> This may be faster to destroy a domain. But you may notice a lag on CPU for
> a bit as p2m_teardown is not preemptible (radix_tree_destroy can take some
> times depending on the size of the tree). On the other side, p2m_relinquish
> is preemptible so by removing element one by one it will be slower but don't
> introduce the potential lag in p2m_teardown as the function will be
> preempted if there is work to do (such as an interrupt need to be injected
> in the guest issuing the destroy hypercall).
>
> I have to chose I would go on free element one by one as it potentially
> avoids lag. But Stefano may disagree here.
>

Preemption makes sense. OTOH the teardown is getting called for all
GFNs, while the radix tree is likely just a subset (potentially only a
couple gfns). So instead of checking all possible gfns if they are
present in the tree and then removing them, just to be followed by
destroying the tree, we can just destroy the tree once and be done
with it. Since the most expensive operation with the tree are adding
and removing nodes, IMHO we should skip that if possible.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.