Re: [Xen-devel] [PATCH V5 2/3] x86/mm: allocate logdirty_ranges for altp2ms

On 11/14/18 1:58 PM, George Dunlap wrote:
> On 11/13/18 6:43 PM, Razvan Cojocaru wrote:
>> On 11/13/18 7:57 PM, George Dunlap wrote:
>>> On 11/11/18 2:07 PM, Razvan Cojocaru wrote:
>>>> @@ -2341,6 +2380,7 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx)
>>>>          {
>>>>              p2m_flush_table(d->arch.altp2m_p2m[idx]);
>>>>              /* Uninit and reinit ept to force TLB shootdown */
>>>> +            p2m_free_logdirty(d->arch.altp2m_p2m[idx]);
>>>>              ept_p2m_uninit(d->arch.altp2m_p2m[idx]);
>>>>              ept_p2m_init(d->arch.altp2m_p2m[idx]);
>>>>              d->arch.altp2m_eptp[idx] = mfn_x(INVALID_MFN);
>>> (In case I forget: Also, this is called without holding the appropriate
>>> p2m lock.)
>> Could you please provide more details? I have assumed that at the point
>> of calling a function called p2m_destroy_altp2m_by_id() it should be
>> safe to tear the altp2m down without further precaution.
> Are you absolutely positive that at this point there's no way anything
> else in Xen might be doing something with this p2m struct?
> If so, then 1) there should be a comment there explaining why that's the
> case, and 2) ideally we should refactor p2m_flush_table such that we can
> call what's now p2m_flush_table_locked() without the lock.

AFAICT the only place p2m_destroy_altp2m_by_id() is ever called is in
arch/x86/hvm/hvm.c's do_altp2m_op() (on HVMOP_altp2m_destroy_p2m), which
is done under domain lock. Is that insufficient?

>> I think you're saying that I should p2m_lock(d->arch.altp2m_p2m[idx])
>> just for the duration of the p2m_free_logdirty() call?
> The same argument really goes for ept_p2m_uninit/init -- uninit actually
> frees a data structure; if anyone else may be using that, you'll run
> into a use-after-free bug.  (Although that really needs to be changed as
> well -- freeing and re-allocating a structure just to set all the bits
> is ridiculous.)
> If we need locking, then I'd grab the p2m lock before p2m_flush_table()
> (calling p2m_flush_table_locked() instead), and release it after the
> ept_p2m_init().
> I realize you didn't write this code, and so I'm not holding you
> responsible for all the changes I mentioned above.  But if we're going
> to add the p2m_free_logdirty() call, we do need to either grab the lock
> or add a comment explaining why it's not necessary; we might as well fix
> it properly at the same time.
> p2m_flush_table() already grabs and releases the lock; so grabbing the
> lock over all four calls won't add any more overhead (or risk of
> deadlock) than what we already have.

Of course, I'll use p2m_flush_table_locked().
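
A rough sketch of the shape this takes, with the lock held across all four steps. Everything here is a stand-in: `struct mock_p2m` and the `mock_*` functions are hypothetical and only model the call order and the locking discipline, not the real Xen structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct p2m_domain; fields are illustrative only. */
struct mock_p2m {
    bool locked;           /* models the p2m lock being held */
    bool has_logdirty;     /* models an allocated logdirty rangeset */
    bool ept_initialised;  /* models the EPT state set up by ept_p2m_init() */
    unsigned int entries;  /* models populated table entries */
};

static void mock_p2m_lock(struct mock_p2m *p2m)
{
    assert(!p2m->locked);  /* the mock lock is not recursive */
    p2m->locked = true;
}

static void mock_p2m_unlock(struct mock_p2m *p2m)
{
    assert(p2m->locked);
    p2m->locked = false;
}

/* Each step asserts that the caller holds the lock. */
static void mock_flush_table_locked(struct mock_p2m *p2m)
{
    assert(p2m->locked);
    p2m->entries = 0;
}

static void mock_free_logdirty(struct mock_p2m *p2m)
{
    assert(p2m->locked);
    p2m->has_logdirty = false;
}

static void mock_ept_uninit(struct mock_p2m *p2m)
{
    assert(p2m->locked);
    p2m->ept_initialised = false;
}

static void mock_ept_init(struct mock_p2m *p2m)
{
    assert(p2m->locked);
    p2m->ept_initialised = true;
}

/* Take the lock once, run all four steps, then release it. */
static void mock_destroy_altp2m(struct mock_p2m *p2m)
{
    mock_p2m_lock(p2m);
    mock_flush_table_locked(p2m);
    mock_free_logdirty(p2m);
    mock_ept_uninit(p2m);
    mock_ept_init(p2m);
    mock_p2m_unlock(p2m);
}
```

Because the flush already takes and drops the lock today, widening the critical section this way adds no new lock acquisitions.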

>>> I'm a bit suspicious of long strings of these sorts of functions in the
>>> middle of another function.  It turns out that there are three copies of
>>> this sequence of function calls (p2m_flush_table -> ept_p2m_uninit ->
>>> ept_p2m_init):
>>> * Here (p2m_destroy_altp2m_by_id), when the user asks for the altp2m index
>>> to be destroyed
>>> * In p2m_flush_altp2m(), which is called when altp2m is disabled for a
>>> domain
>>> * In p2m_reset_altp2m(), which is called when an entry in the hostp2m is
>>> set to INVALID_MFN.
>>> Presumably in p2m_reset_altp2m() we don't want to call
>>> p2m_free_logdirty(), as the altp2m is still active and we want to keep
>>> the logdirty ranges around.  But in p2m_flush_altp2m(), I'm pretty sure
>>> we do want to discard them: when altp2m is enabled again,
>>> p2m_init_logdirty() will return early, leaving the old rangesets in
>>> place; if the hostp2m rangesets have changed between the time altp2m was
>>> disabled and enabled again, the rangeset_merge() may have incorrect results.
>> I'll call p2m_free_logdirty() in p2m_flush_altp2m() as well.
> I was more thinking of refactoring those two to share the same code, and
> potentially having p2m_reset_altp2m() share the same code as well.  The
> reason you missed the p2m_flush_altp2m() there was because of the code
> duplication.

Right, I'll do my best to refactor that then. TBH I'm not a big fan of
that extra verbosity either but thought the least code churn would be
good for reviewing.
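
One possible shape for that refactoring, as a sketch only: a single common teardown helper that takes a flag saying whether the logdirty ranges should be dropped. The `altp2m_stub` type and `stub_*` names are hypothetical, and the bodies only model the effects of the real calls named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STUB_INVALID_EPTP (~0UL)   /* stand-in for mfn_x(INVALID_MFN) */

/* Hypothetical model of one altp2m slot. */
struct altp2m_stub {
    unsigned long eptp;
    bool has_logdirty;
    unsigned int entries;
    unsigned int ept_reinit_count;
};

/* The sequence currently duplicated in three places, written once. */
static void stub_teardown_common(struct altp2m_stub *p2m, bool free_logdirty)
{
    assert(p2m != NULL);
    p2m->entries = 0;               /* models p2m_flush_table() */
    if ( free_logdirty )
        p2m->has_logdirty = false;  /* models p2m_free_logdirty() */
    p2m->ept_reinit_count++;        /* models ept_p2m_uninit() + ept_p2m_init() */
}

/* User asked for this index to be destroyed: drop the ranges too. */
static void stub_destroy_by_id(struct altp2m_stub *p2m)
{
    stub_teardown_common(p2m, true);
    p2m->eptp = STUB_INVALID_EPTP;
}

/* altp2m disabled for the domain: same teardown for every slot. */
static void stub_flush_all(struct altp2m_stub *p2ms, unsigned int nr)
{
    for ( unsigned int i = 0; i < nr; i++ )
        stub_destroy_by_id(&p2ms[i]);
}

/* Reset while altp2m stays active: keep the logdirty ranges. */
static void stub_reset(struct altp2m_stub *p2m)
{
    stub_teardown_common(p2m, false);
}
```

With one helper, a new step such as freeing the logdirty ranges is added in a single place, and each caller has to state explicitly whether it wants it.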

> Or alternately...
>>>> Is there any particular reason we allocate the p2m structures on domain
>>>> creation, but not logdirty range structures?  It seems like allocating
>>>> altp2m structures on-demand, rather than at domain creation time, might
>>>> make a lot of the reasoning here simpler.
>>> I assume that this question is not addressed to me, since I'm not able
>>> to answer it - I can only assume that having less heap used has been
>>> preferred.
> I'm asking you because you've recently been going through this code, and
> probably have at least as much familiarity with it as I do.  I can't
> immediately see any reason to allocate them at domain creation time.
> Maybe you can and maybe you can't, but I won't know until I ask. :-)

I've looked at the code more closely today, and there's no reason as far
as I can tell why we shouldn't allocate altp2ms on-demand. However, changing
the code is somewhat involved at this point, since there's a lot of:

    if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
    {
        p2m = d->arch.altp2m_p2m[idx];

        if ( !_atomic_read(p2m->active_vcpus) )
        {
            p2m_flush_table(d->arch.altp2m_p2m[idx]);
            /* Uninit and reinit ept to force TLB shootdown */
            ept_p2m_uninit(d->arch.altp2m_p2m[idx]);
            ept_p2m_init(d->arch.altp2m_p2m[idx]);
            d->arch.altp2m_eptp[idx] = mfn_x(INVALID_MFN);
            rc = 0;
        }
    }

going on. That is, code checking that d->arch.altp2m_eptp[idx] !=
mfn_x(INVALID_MFN), and then blindly assuming that p2m will not be NULL
and is usable.
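
To illustrate what on-demand allocation would require of those call sites, here is a sketch in which every access goes through one lookup that can return NULL. The `sketch_*` types and functions are hypothetical, use plain calloc()/free() rather than Xen's allocators, and leave out locking and the EPT work entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ALTP2M   10          /* the thread cites MAX_ALTP2M as 10 */
#define SKETCH_INVALID_EPTP UINT64_MAX  /* stand-in for mfn_x(INVALID_MFN) */

struct sketch_p2m { unsigned int active_vcpus; };

struct sketch_domain {
    uint64_t eptp[SKETCH_MAX_ALTP2M];
    struct sketch_p2m *p2m[SKETCH_MAX_ALTP2M];  /* NULL until first use */
};

static void sketch_domain_init(struct sketch_domain *d)
{
    assert(d != NULL);
    for ( unsigned int i = 0; i < SKETCH_MAX_ALTP2M; i++ )
    {
        d->eptp[i] = SKETCH_INVALID_EPTP;
        d->p2m[i] = NULL;
    }
}

/* Single lookup point: NULL for a bad index, an unused slot, or no struct. */
static struct sketch_p2m *sketch_get_altp2m(struct sketch_domain *d,
                                            unsigned int idx)
{
    if ( idx >= SKETCH_MAX_ALTP2M || d->eptp[idx] == SKETCH_INVALID_EPTP )
        return NULL;
    return d->p2m[idx];
}

/* Allocate the slot on first use; returns 0 on success, -1 otherwise. */
static int sketch_init_altp2m(struct sketch_domain *d, unsigned int idx)
{
    if ( idx >= SKETCH_MAX_ALTP2M || d->p2m[idx] != NULL )
        return -1;
    d->p2m[idx] = calloc(1, sizeof(*d->p2m[idx]));
    if ( d->p2m[idx] == NULL )
        return -1;
    d->eptp[idx] = idx + 1;  /* arbitrary valid value for the sketch */
    return 0;
}

/* Free the slot; refuses if it is unused or still has active vCPUs. */
static int sketch_destroy_altp2m(struct sketch_domain *d, unsigned int idx)
{
    struct sketch_p2m *p2m = sketch_get_altp2m(d, idx);

    if ( p2m == NULL || p2m->active_vcpus != 0 )
        return -1;
    d->eptp[idx] = SKETCH_INVALID_EPTP;
    d->p2m[idx] = NULL;
    free(p2m);
    return 0;
}
```

The point is only that the eptp check and the pointer check would have to live together in one place, so that no caller dereferences a slot that was never allocated.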

>> Actually I now realize that you're asking why the hostp2m rangeset is
>> created via paging_domain_init() in arch_domain_create() (so immediately
>> on domain creation) while I'm allocating the altp2m rangesets on altp2m
>> init.
>> I'm doing that to save memory, since we can have MAX_ALTP2M altp2ms
>> (which is currently 10), and only two active altp2ms - that means that I
>> would allocate 10 rangesets and only use two. In fact we're currently
>> only using 2 altp2ms and the hostp2m for our #VE work. That saves the
>> space required for 8 rangesets. If that's not much, or if you think that
>> the benefits of allocating them early outweigh the costs we can switch
>> to allocating them on domain creation, like the hostp2m, and perhaps
>> always keeping them in sync.
> On the contrary, I was thinking of leaving the altp2m_p2m[N] NULL until
> it becomes used; and at that point allocating both the p2m structure and
> the logdirty rangesets; and when deactivating altp2m_p2m[N], freeing
> both the logdirty rangesets and the p2m structure.
> One advantage of that is that we'd reduce the amount of memory used --
> not just for you, but for the vast majority of people who aren't using
> the altp2m functionality; the other advantage is that it simplifies the
> disable / enable logic: Everything that needs to be done is done in one
> place, rather than half in one place and half in another.
> I don't necessarily expect you to do that refactoring, but as you're
> familiar with the code, and have the most investment in its future, it
> makes sense to discuss the possibilities with you. :-)

I agree that that's a valid optimization, and it looks worth doing.
However, the top priority now is to get the display working, since this
is completely crippling altp2m use (so quite urgent, both for Tamas and
us). In the interest of getting things to work, I propose that for the
time being we get this series in as soon as it is acceptable (that is,
with the current altp2m allocation strategy), and come back later for
the allocation optimizations.

Does that sound reasonable?

Thank you,
