
Re: [Xen-devel] [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt



On Thu, Jan 10, 2019 at 2:19 AM Roger Pau Monné <royger@xxxxxxxxxxx> wrote:
>
>  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@xxxxxxxxx> wrote:
> >
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.

> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 6f782f7..86195d3 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -17,7 +17,177 @@
> >   */
> >
> >  #include <xen/errno.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/argo.h>
> > +#include <xen/event.h>
> > +#include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/time.h>
> > +#include <public/argo.h>
>
> We usually try to sort header includes alphabetically, and I would add
> a newline between the xen/* and the public/* header includes.

ack, thanks, done.

>
> > +
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +
> > +/* Xen command line option to enable argo */
> > +static bool __read_mostly opt_argo_enabled;
> > +boolean_param("argo", opt_argo_enabled);
>
> I would drop the opt_* prefix, new options added recently don't
> include the prefix already.

After the later in-thread discussion with yourself and Jan, I've
retained this prefix.

> > +/* Data about a domain's own ring that it has registered */
> > +struct argo_ring_info
> > +{
> > +    /* next node in the hash, protected by L2 */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by L2 */
> > +    struct argo_ring_id id;
> > +    /* L3 */
> > +    spinlock_t lock;
> > +    /* length of the ring, protected by L3 */
> > +    uint32_t len;
> > +    /* number of pages in the ring, protected by L3 */
> > +    uint32_t npage;
>
> Can you infer number of pages form the length of the ring, or the
> other way around?
>
> I'm not sure why both need to be stored here.

Yes, you're right. I've removed the npage struct member; the value is now
calculated where needed from len.
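For illustration, the derivation is just a round-up to page granularity (a standalone sketch: the function name and the PAGE_SHIFT value are assumptions for this example, and Xen's own PFN_UP helper does essentially this):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/*
 * Number of pages needed to hold a ring of 'len' bytes: round the byte
 * length up to the next page boundary, then shift down to a page count.
 */
static inline uint32_t ring_npages(uint32_t len)
{
    return (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

So a 4097-byte ring occupies two pages, and the separate npage field carries no information that len does not.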

> > +    /* number of pages translated into mfns, protected by L3 */
> > +    uint32_t nmfns;
> > +    /* cached tx pointer location, protected by L3 */
> > +    uint32_t tx_ptr;
>
> All these fields are not part of any public structure, so I wonder if
> it would be better to simply use unsigned int for those, or size_t.

ack, done.

> > +    /* mapped ring pages protected by L3 */
> > +    uint8_t **mfn_mapping;
>
> Why 'uint8_t *', wouldn't it be better to just use 'void *' if it's a mapping?

Yes. I've switched it to void *.

> > +/* A space-available notification that is awaiting sufficient space */
> > +struct pending_ent
> > +{
> > +    /* List node within argo_ring_info's pending list */
> > +    struct hlist_node node;
> > +    /*
> > +     * List node within argo_domain's wildcard_pend_list. Only used if the
> > +     * ring is one with a wildcard partner (ie. that any domain may send to)
> > +     * to enable cancelling signals on wildcard rings on domain destroy.
> > +     */
> > +    struct hlist_node wildcard_node;
> > +    /*
> > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > +     * ring_info->npending is decremented when ents for wildcard rings are
> > +     * cancelled for domain destroy.
> > +     * Caution: Must hold the correct locks before accessing ring_info via this.
> > +     */
> > +    struct argo_ring_info *ring_info;
> > +    /* domain to be notified when space is available */
> > +    domid_t domain_id;
> > +    uint16_t pad;
>
> No need for the pad in internal structures.

ack - removed pad.

> > +/*
> > + * The value of the argo element in a struct domain is
> > + * protected by the global lock argo_lock: L1
> > + */
> > +#define ARGO_HTABLE_SIZE 32
> > +struct argo_domain
> > +{
> > +    /* L2 */
> > +    rwlock_t lock;
> > +    /*
> > +     * Hash table of argo_ring_info about rings this domain has registered.
> > +     * Protected by L2.
> > +     */
> > +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> > +    /* Counter of rings registered by this domain. Protected by L2. */
> > +    uint32_t ring_count;
> > +
> > +    /* Lsend */
> > +    spinlock_t send_lock;
> > +    /*
> > +     * Hash table of argo_send_info about rings other domains have registered
> > +     * for this domain to send to. Single partner, non-wildcard rings.
> > +     * Protected by Lsend.
> > +     */
> > +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> > +
> > +    /* Lwildcard */
> > +    spinlock_t wildcard_lock;
> > +    /*
> > +     * List of pending space-available signals for this domain about wildcard
> > +     * rings registered by other domains. Protected by Lwildcard.
> > +     */
> > +    struct hlist_head wildcard_pend_list;
> > +};
> > +
> > +/*
> > + * Locking is organized as follows:
> > + *
> > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > + *              W(<lock>) means taking a write lock on it.
> > + *
> > + * L1 : The global lock: argo_lock
> > + * Protects the argo elements of all struct domain *d in the system.
> > + * It does not protect any of the elements of d->argo, only their
> > + * addresses.
> > + *
> > + * By extension since the destruction of a domain with a non-NULL
> > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > + * guarantees that no domain pointers that argo is interested in
> > + * become invalid whilst this lock is held.
> > + */
> > +
> > +static DEFINE_RWLOCK(argo_lock); /* L1 */
>
> You also add an argo_lock to each domain struct which doesn't seem to
> be mentioned here at all.

You're right! Thanks - that's a nice find. That lock is not used at all.
I'd missed it since it's just not referenced anywhere in the argo.c file.
I've removed it.

> Shouldn't that lock be the one that protects d->argo? (instead of this global lock?)

According to the design that is in place at the moment, no; but
I need to study that option a bit before I can comment further on
whether it would make sense to add it.
I imagine not, though, because we're not looking to add any more locks.

> > +/*
> > + * L2 : The per-domain ring hash lock: d->argo->lock
> > + * Holding a read lock on L2 protects the ring hash table and
> > + * the elements in the hash_table d->argo->ring_hash, and
> > + * the node and id fields in struct argo_ring_info in the
> > + * hash table.
> > + * Holding a write lock on L2 protects all of the elements of
> > + * struct argo_ring_info.
> > + *
> > + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> > + *
> > + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> > + * Protects all the fields within the argo_ring_info, aside from the ones that
> > + * L2 already protects: node, id, lock.
> > + *
> > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> > + *
> > + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> > + * Protects the per-domain send hash table : d->argo->send_hash
> > + * and the elements in the hash table, and the node and id fields
> > + * in struct argo_send_info in the hash table.
> > + *
> > + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> > + * Do not attempt to acquire a L2 on any domain after taking and while
> > + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> > + *
> > + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> > + * Protects the per-domain list of outstanding signals for space availability
> > + * on wildcard rings.
> > + *
> > + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> > + * No other locks are acquired after obtaining Lwildcard.
> > + */
>
> IMO I think the locking is overly complicated, and there's no
> reasoning why so many locks are needed. Wouldn't it be enough to start
> with a single lock that protects the whole d->argo existence and
> contents?
>
> I would start with a very simple (as simple as possible) locking
> structure and go improving from there if there are performance
> bottlenecks.

It definitely doesn't help when there's an extra lock lying around
just to be confusing. Sorry.

The locking discipline in this code is challenging, and you are right that
no explanation has been given as to _why_ the locks are the ones that they
are. I will fix that. I can also review the placement of the ASSERTs that
check (and document) the locks within the code, if that helps.

The current locking comments describe the how, but the why hasn't been
covered so far and it is needed. The unreasonably-short version is: this
code is *hot* when the communication paths are in use -- it operates the
data path -- and there needs to be isolation for paths using rings from the
potentially malicious or disruptive activities of other domains, or even
other vcpus of the same domain operating other rings.

I am confident that the locking that is actually exercised is correct and
justified, though, and I hope that adding some clear new documentation for it
can address this.

> >  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
> >  #undef ARGO_DEBUG
> > @@ -28,10 +198,299 @@
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +static void
> > +ring_unmap(struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +        return;
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +    {
> > +        if ( !ring_info->mfn_mapping[i] )
> > +            continue;
> > +        if ( ring_info->mfns )
> > +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> > +                         mfn_x(ring_info->mfns[i]),
> > +                         ring_info->mfn_mapping[i]);
> > +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> > +        ring_info->mfn_mapping[i] = NULL;
> > +    }
>
> As noted in another patch, I would consider mapping this in contiguous
> virtual address space using vmap, but I'm not going to insist.

Thanks - I will look at what is involved to do it, but I may well have
to leave it as-is for now.

>
> > +}
> > +
> > +static void
> > +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> > +{
> > +    struct domain *d = get_domain_by_id(domain_id);
>
> Newline.

ack

>
> > +    if ( !d )
> > +        return;
> > +
> > +    if ( d->argo )
>
> Don't you need to pick d->argo_lock here to prevent d->argo from being
> removed under your feet?

No, because wildcard_pending_list_remove is called from:

* pending_find and pending_cancel:
  with both R(L2) and L3 held (of a different domain, but
  that's ok), which means R(L1) is held,
  so d->argo is safe in wildcard_pending_list_remove.

* pending_remove_all:
  which is only called from ring_remove_info,
  which has ASSERTS that either R(L1) or R(L2) is held,
  and R(L2) means R(L1) must already be held (following protocol).
  so d->argo is safe in wildcard_pending_list_remove.

>
> > +    {
> > +        spin_lock(&d->argo->wildcard_lock);
> > +        hlist_del(&ent->wildcard_node);
> > +        spin_unlock(&d->argo->wildcard_lock);
> > +    }
> > +    put_domain(d);
> > +}
> > +
> > +static void
> > +pending_remove_all(struct argo_ring_info *ring_info)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
>
> As a side note, it might be interesting to introduce a helper like
> hlist_first_entry_or_null, that would remove the need to have an extra
> *next element, and would be the more natural way to drain a hlist
> (seeing that you have the same pattern in
> wildcard_rings_pending_remove).
>
> > +    {
> > +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> > +            wildcard_pending_list_remove(ent->domain_id, ent);
> > +        hlist_del(&ent->node);
> > +        xfree(ent);
> > +    }
> > +    ring_info->npending = 0;
> > +}
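For what it's worth, a drain helper along the lines suggested could look roughly like this. This is a standalone sketch: the minimal hlist types and the helper name are stand-ins (the real API lives in Xen's xen/list.h, and hlist_first_entry_or_null is the hypothetical name from the review comment):

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-ins for Xen's hlist head/node. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

struct pending_ent {
    struct hlist_node node;
    int domain_id;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void push(struct hlist_head *h, struct pending_ent *ent)
{
    ent->node.next = h->first;
    h->first = &ent->node;
}

/* The suggested helper: peel off the first entry, or NULL when empty. */
static struct pending_ent *pending_first_or_null(struct hlist_head *h)
{
    return h->first ? container_of(h->first, struct pending_ent, node)
                    : NULL;
}

/*
 * Draining then needs no 'next' cursor: always remove the current head
 * until the list is empty. Returns the number of entries freed.
 */
static unsigned int drain(struct hlist_head *h)
{
    struct pending_ent *ent;
    unsigned int freed = 0;

    while ( (ent = pending_first_or_null(h)) != NULL )
    {
        h->first = ent->node.next;
        free(ent);
        freed++;
    }
    return freed;
}
```

The same head-removal loop would replace the `hlist_for_each_entry_safe` pattern in both pending_remove_all and wildcard_rings_pending_remove.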
> > +
> > +static void
> > +wildcard_rings_pending_remove(struct domain *d)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> > +                              node)
> > +    {
> > +        hlist_del(&ent->node);
> > +        ent->ring_info->npending--;
> > +        hlist_del(&ent->wildcard_node);
> > +        xfree(ent);
> > +    }
> > +}
> > +
> > +static void
> > +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
>
> I think the above requires a comment of why two different locks are
> used to protect the ring mfns, and why just having one of them locked
> is enough.

ack, I need to add further locking docs.

>
> > +
> > +    if ( !ring_info->mfns )
> > +        return;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    ring_unmap(ring_info);
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> > +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> > +
> > +    xfree(ring_info->mfns);
> > +    ring_info->mfns = NULL;
>
> Xen has a handy macro for this, XFREE. That would free the memory and
> assign the pointer to NULL.

thanks, that's great - done
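For readers unfamiliar with it, the macro frees the allocation and NULLs the pointer in one step, roughly like this simplified stand-in (the real XFREE wraps Xen's xfree() in xen/xmalloc.h; this sketch uses libc free() only for illustration):

```c
#include <stdlib.h>

/*
 * Free the allocation 'p' points at and clear the pointer, so no
 * dangling pointer survives the free.
 */
#define XFREE(p) do { free(p); (p) = NULL; } while ( 0 )
```

So the pair `xfree(ring_info->mfns); ring_info->mfns = NULL;` in ring_remove_mfns collapses to a single `XFREE(ring_info->mfns);`.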

>
> > +    ring_info->npage = 0;
> > +    xfree(ring_info->mfn_mapping);
> > +    ring_info->mfn_mapping = NULL;
> > +    ring_info->nmfns = 0;
> > +}
> > +
> > +static void
> > +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>
> I think the domain parameter can be constified here, since it's only
> used by ring_remove_mfns, and that function already expects a const
> domain struct.

ack, yes, thanks.

> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
> > +        return rc;
>
> Why not just 'return -EOPNOTSUPP;'?

good point - will do

> > +
> > +int
> > +argo_init(struct domain *d)
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
> > +
> > +    d->argo = argo;
>
> Where's the d->argo_lock initialization?

It was added to domain.c in this patch, but there is now no need, as that
lock is gone. Thanks for the catch.

>
> > +
> > +    write_unlock(&argo_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +argo_destroy(struct domain *d)
> > +{
> > +    BUG_ON(!d->is_dying);
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +        xfree(d->argo);
> > +        d->argo = NULL;
> > +    }
> > +    write_unlock(&argo_lock);
> > +}
> > +
> > +void
> > +argo_soft_reset(struct domain *d)
> > +{
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +
> > +        if ( !opt_argo_enabled )
> > +        {
> > +            xfree(d->argo);
> > +            d->argo = NULL;
>
> Can opt_argo_enabled change during runtime?

Not at the moment, no. It should be made changeable
later, but keeping it fixed assists with derisking this for
release consideration.

>
> > +        }
> > +        else
> > +            argo_domain_init(d->argo);
> > +    }
> > +
> > +    write_unlock(&argo_lock);
> >  }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index c623dae..9596840 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -32,6 +32,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenoprof.h>
> >  #include <xen/irq.h>
> > +#include <xen/argo.h>
> >  #include <asm/debugger.h>
> >  #include <asm/p2m.h>
> >  #include <asm/processor.h>
> > @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
> >
> >      xfree(d->pbuf);
> >
> > +#ifdef CONFIG_ARGO
> > +    argo_destroy(d);
> > +#endif
>
> Instead of adding such ifdefs you could provide dummy argo_destroy
> inline functions in argo.h when CONFIG_ARGO is not set.

ack, have done this.
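The stub pattern in question looks roughly like this (a sketch of how the argo.h declarations might read; the exact prototypes are assumptions based on the calls in this patch):

```c
/* Sketch of xen/include/xen/argo.h with CONFIG_ARGO stubs. */
struct domain;

#ifdef CONFIG_ARGO
int argo_init(struct domain *d);
void argo_destroy(struct domain *d);
void argo_soft_reset(struct domain *d);
#else /* !CONFIG_ARGO */
static inline int argo_init(struct domain *d)
{
    return 0; /* argo compiled out: nothing to set up */
}
static inline void argo_destroy(struct domain *d)
{
    /* nothing to tear down */
}
static inline void argo_soft_reset(struct domain *d)
{
    /* nothing to reset */
}
#endif
```

Callers such as _domain_destroy then invoke argo_destroy(d) unconditionally and the #ifdef in domain.c disappears, with the compiler eliminating the empty call.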

Thanks,

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel