
Re: [Xen-devel] [PATCH v11 2/3] x86/mem_sharing: reset a fork



On Wed, Mar 18, 2020 at 5:36 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 28.02.2020 19:40, Tamas K Lengyel wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1775,6 +1775,91 @@ static int fork(struct domain *cd, struct domain *d)
> >       return rc;
> >   }
> >
> > +/*
> > + * The fork reset operation is intended to be used on short-lived forks only.
> > + */
> > +static int fork_reset(struct domain *d, struct domain *cd,
>
> Could I talk you into using pd instead of d, to even more
> clearly distinguish which of the two domain's is meant? Also
> in principle this might be possible to be a pointer to const,
> albeit I realize this may need changes you likely don't want
> to do in a prereq patch (and maybe there's actually a reason
> why it can't be).

The names d and cd are used across the mem_sharing codebase, so for
consistency I'm keeping them.

>
> > +                      struct mem_sharing_op_fork_reset *fr)
> > +{
> > +    int rc = 0;
> > +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
>
> Star and blank want to switch places here.
>
> > +    struct page_info *page, *tmp;
> > +    unsigned long list_position = 0, preempt_count = 0, restart = fr->opaque;
> > +
> > +    domain_pause(cd);
> > +
> > +    page_list_for_each_safe(page, tmp, &cd->page_list)
>
> You may not iterate a domain's page list without holding its
> page-alloc lock. Even if the domain is paused, other entities
> (like the controlling domain) may cause the list to be altered.
> With this the question then of course becomes whether holding
> that lock for this long is acceptable. I guess you need to
> somehow mark the pages you've processed, either by a flag or
> by moving between separate lists. Domain cleanup does something
> along these lines.
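
A rough sketch of the kind of locked iteration being suggested (illustrative
only, untested, and not part of this patch):

    struct page_list_head processed;
    struct page_info *page;

    INIT_PAGE_LIST_HEAD(&processed);

    spin_lock(&cd->page_alloc_lock);

    /*
     * Move pages to a side list as they are handled, so concurrent list
     * changes (or a later continuation) can't skip or repeat pages.
     */
    while ( (page = page_list_remove_head(&cd->page_list)) != NULL )
    {
        /* ... drop the fork's p2m entry / references for this page ... */
        page_list_add_tail(page, &processed);
    }

    /* Put the processed pages back onto the domain's list. */
    page_list_splice(&processed, &cd->page_list);

    spin_unlock(&cd->page_alloc_lock);
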
>
> > +    {
> > +        p2m_type_t p2mt;
> > +        p2m_access_t p2ma;
> > +        gfn_t gfn;
> > +        mfn_t mfn;
> > +        bool shared = false;
> > +
> > +        list_position++;
> > +
> > +        /* Resume where we left off before preemption */
> > +        if ( restart && list_position < restart )
> > +            continue;
>
> This assumes the list to not have been changed across a continuation,
> which isn't going to fly.


OK, I'm going to drop the continuation here completely. I was reluctant to
add it to begin with, since this hypercall should only be called when the
number of pages is low, so there wouldn't be a continuation anyway. This is
work I'm unable to assign more time to; if someone in the future really
needs continuation, they are welcome to figure it out.
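
FWIW, with the 0x10-vs-1 weighting the 0x2000 threshold works out to a
preemption check roughly every 512 shared pages (2MiB) or 8192 unshared
pages (32MiB), which matches the comment in the patch. Had the continuation
stayed, the memop side would presumably have followed roughly the same
pattern the range op already uses -- sketch only, this is not going into
the patch:

    if ( rc > 0 )
    {
        if ( __copy_to_guest(arg, &mso, 1) )
            rc = -EFAULT;
        else
            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
                                               "lh", XENMEM_sharing_op,
                                               arg);
    }
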

>
> > +        mfn = page_to_mfn(page);
> > +        if ( mfn_valid(mfn) )
>
> All pages on a domain's list should have a valid MFN - what are you
> trying to protect against here?

I saw no documentation stating that. If that's the case, the check can be
dropped.

>
> > +        {
> > +
> > +            gfn = mfn_to_gfn(cd, mfn);
>
> Stray blank line above here?
>
> > +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > +                                        0, NULL, false);
> > +
> > +            if ( p2m_is_ram(p2mt) && !p2m_is_shared(p2mt) )
> > +            {
> > +                /* take an extra reference, must work for a shared page */
>
> The comment (and also the next one further down) looks contradictory
> to the if() immediately ahead, at least to me. Could you clarify the
> situation, please?

I don't understand your question. The comment explains exactly what
happens: taking an extra reference must work, and if it doesn't we hit
ASSERT_UNREACHABLE(). Which part is confusing?

>
> > +                if( !get_page(page, cd) )
> > +                {
> > +                    ASSERT_UNREACHABLE();
> > +                    return -EINVAL;
> > +                }
> > +
> > +                shared = true;
> > +                preempt_count += 0x10;
> > +
> > +                /*
> > +                 * Must succeed, it's a shared page that exists and
> > +                 * thus its size is guaranteed to be 4k so we are not splitting
> > +                 * large pages.
> > +                 */
> > +                rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> > +                                    p2m_invalid, p2m_access_rwx, -1);
> > +                ASSERT(!rc);
> > +
> > +                put_page_alloc_ref(page);
> > +                put_page(page);
> > +            }
> > +        }
> > +
> > +        if ( !shared )
> > +            preempt_count++;
> > +
> > +        /* Preempt every 2MiB (shared) or 32MiB (unshared) - arbitrary. */
> > +        if ( preempt_count >= 0x2000 )
> > +        {
> > +            if ( hypercall_preempt_check() )
> > +            {
> > +                rc = -ERESTART;
>
> You use a negative return value here, but ...
>
> > +                break;
> > +            }
> > +            preempt_count = 0;
> > +        }
> > +    }
> > +
> > +    if ( rc )
> > +        fr->opaque = list_position;
> > +    else
> > +        rc = copy_settings(cd, d);
> > +
> > +    domain_unpause(cd);
> > +    return rc;
> > +}
> > +
> >   int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >   {
> >       int rc;
> > @@ -2066,6 +2151,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >           break;
> >       }
> >
> > +    case XENMEM_sharing_op_fork_reset:
> > +    {
> > +        struct domain *pd;
> > +
> > +        rc = -ENOSYS;
> > +        if ( !mem_sharing_is_fork(d) )
> > +            goto out;
> > +
> > +        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
> > +        if ( rc )
> > +            goto out;
> > +
> > +        rc = fork_reset(pd, d, &mso.u.fork_reset);
> > +
> > +        rcu_unlock_domain(pd);
> > +
> > +        if ( rc > 0 )
>
> ... you check for a positive value here. I didn't get around to
> look at earlier versions, so I can only guess the -ERESTART above
> was changed to later on.

I'm dropping continuation for the next revision so this no longer matters.
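
For completeness, the caller side ends up being a plain memop. Roughly, as
an illustrative sketch only (do_memory_op() is libxc's internal memory_op
helper, and the actual tools-side wiring is presumably handled elsewhere in
this series):

    xen_mem_sharing_op_t mso = {
        .op     = XENMEM_sharing_op_fork_reset,
        .domain = fork_domid,            /* the fork, not the parent */
    };

    rc = do_memory_op(xch, XENMEM_sharing_op, &mso, sizeof(mso));
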

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
