
Re: [Xen-devel] [PATCH v4 04/14] argo: init, destroy and soft-reset, with enable command line opt



On Tue, Jan 15, 2019 at 01:27:36AM -0800, Christopher Clark wrote:
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
> 
> Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
> 
> Introduces a new Xen command line parameter 'argo': bool to enable/disable
> the argo hypercall. Defaults to disabled.
> 
> New headers:
>   public/argo.h: with definitions of addresses and ring structure, including
>   indexes for atomic update for communication between domain and hypervisor.
> 
>   xen/argo.h: to expose the hooks for integration into domain lifecycle:
>     argo_init: per-domain init of argo data structures for domain_create.
>     argo_destroy: teardown for domain_destroy and the error exit
>                   path of domain_create.
>     argo_soft_reset: reset of domain state for domain_soft_reset.
> 
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
> 
> In accordance with recent work on _domain_destroy, argo_destroy is
> idempotent. It will tear down: all rings registered by this domain, all
> rings where this domain is the single sender (ie. specified partner,
> non-wildcard rings), and all pending notifications where this domain is
> awaiting signal about available space in the rings of other domains.
> 
> A count will be maintained of the number of rings that a domain has
> registered in order to limit it below the fixed maximum limit defined here.
> 
> Macros are defined to verify the internal locking state within the argo
> implementation. The macros are ASSERTed on entry to functions to validate
> and document the required lock state prior to calling.
> 
> The software license on the public header is the BSD license, standard
> procedure for the public Xen headers. The public header was originally
> posted under a GPL license at: [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
> 
> The following ACK by Lars Kurth is to confirm that only people being
> employees of Citrix contributed to the header files in the series posted at
> [1] and that thus the copyright of the files in question is fully owned by
> Citrix. The ACK also confirms that Citrix is happy for the header files to
> be published under a BSD license in this series (which is based on [1]).

I've got some minor comments, and a couple of questions, but I think
the code is looking good.

> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6f782f7..1958fdc 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -16,8 +16,223 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>   */
>  
> +#include <xen/argo.h>
> +#include <xen/domain.h>
> +#include <xen/domain_page.h>
>  #include <xen/errno.h>
> +#include <xen/event.h>
>  #include <xen/guest_access.h>
> +#include <xen/nospec.h>
> +#include <xen/sched.h>
> +#include <xen/time.h>
> +
> +#include <public/argo.h>
> +
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled;
> +boolean_param("argo", opt_argo_enabled);
> +
> +typedef struct argo_ring_id
> +{
> +    xen_argo_port_t aport;
> +    domid_t partner_id;
> +    domid_t domain_id;
> +} argo_ring_id;
> +
> +/* Data about a domain's own ring that it has registered */
> +struct argo_ring_info
> +{
> +    /* next node in the hash, protected by rings_L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by rings_L2 */
> +    struct argo_ring_id id;
> +    /* L3, the ring_info lock: protects the members of this struct below */
> +    spinlock_t L3_lock;
> +    /* length of the ring, protected by L3 */
> +    unsigned int len;
> +    /* number of pages translated into mfns, protected by L3 */
> +    unsigned int nmfns;
> +    /* cached tx pointer location, protected by L3 */
> +    unsigned int tx_ptr;
> +    /* mapped ring pages protected by L3 */
> +    void **mfn_mapping;
> +    /* list of mfns of guest ring, protected by L3 */
> +    mfn_t *mfns;
> +    /* list of struct pending_ent for this ring, protected by L3 */
> +    struct hlist_head pending;
> +    /* number of pending entries queued for this ring, protected by L3 */
> +    unsigned int npending;
> +};
> +
> +/* Data about a single-sender ring, held by the sender (partner) domain */
> +struct argo_send_info
> +{
> +    /* next node in the hash, protected by send_L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by send_L2 */
> +    struct argo_ring_id id;
> +};
> +
> +/* A space-available notification that is awaiting sufficient space */
> +struct pending_ent
> +{
> +    /* List node within argo_ring_info's pending list */
> +    struct hlist_node node;
> +    /*
> +     * List node within argo_domain's wildcard_pend_list. Only used if the
> +     * ring is one with a wildcard partner (ie. that any domain may send to)
> +     * to enable cancelling signals on wildcard rings on domain destroy.
> +     */
> +    struct hlist_node wildcard_node;
> +    /*
> +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> +     * ring_info->npending is decremented when ents for wildcard rings are
> +     * cancelled for domain destroy.
> +     * Caution: Must hold the correct locks before accessing ring_info via this.
> +     */
> +    struct argo_ring_info *ring_info;
> +    /* minimum ring space available that this signal is waiting upon */
> +    unsigned int len;
> +    /* domain to be notified when space is available */
> +    domid_t domain_id;
> +};
> +
> +/*
> + * The value of the argo element in a struct domain is
> + * protected by L1_global_argo_rwlock
> + */
> +#define ARGO_HTABLE_SIZE 32
> +struct argo_domain
> +{
> +    /* rings_L2 */
> +    rwlock_t rings_L2_rwlock;
> +    /*
> +     * Hash table of argo_ring_info about rings this domain has registered.
> +     * Protected by rings_L2.
> +     */
> +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> +    /* Counter of rings registered by this domain. Protected by rings_L2. */
> +    unsigned int ring_count;
> +
> +    /* send_L2 */
> +    spinlock_t send_L2_lock;

The other locks are rw locks, while this one is a spinlock. I guess
that's because there aren't many concurrent read-only accesses to
send_hash?

> +    /*
> +     * Hash table of argo_send_info about rings other domains have registered
> +     * for this domain to send to. Single partner, non-wildcard rings.
> +     * Protected by send_L2.
> +     */
> +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> +
> +    /* wildcard_L2 */
> +    spinlock_t wildcard_L2_lock;
> +    /*
> +     * List of pending space-available signals for this domain about wildcard
> +     * rings registered by other domains. Protected by wildcard_L2.
> +     */
> +    struct hlist_head wildcard_pend_list;
> +};
> +
> +/*
> + * Locking is organized as follows:
> + *
> + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> + *              W(<lock>) means taking a write lock on it.
> + *
> + * == L1 : The global read/write lock: L1_global_argo_rwlock
> + * Protects the argo elements of all struct domain *d in the system.
> + * It does not protect any of the elements of d->argo, only their
> + * addresses.

But if you hold W(L1) you can basically modify anything in all d->argo
structs, so it does seem to protect the elements of d->argo when
write-locked?

> + * By extension since the destruction of a domain with a non-NULL
> + * d->argo will need to free the d->argo pointer, holding W(L1)
> + * guarantees that no domains pointers that argo is interested in
> + * become invalid whilst this lock is held.

AFAICT holding W(L1) guarantees not only that the pointers don't
change, but also that there are no changes at all to any of the data
contained in d->argo.

> + */
> +
> +static DEFINE_RWLOCK(L1_global_argo_rwlock); /* L1 */
> +
> +/*
> + * == rings_L2 : The per-domain ring hash lock: d->argo->rings_L2_rwlock
> + *
> + * Holding a read lock on rings_L2 protects the ring hash table and
> + * the elements in the hash_table d->argo->ring_hash, and
> + * the node and id fields in struct argo_ring_info in the
> + * hash table.
> + * Holding a write lock on rings_L2 protects all of the elements of all the
> + * struct argo_ring_info belonging to this domain.
> + *
> + * To take rings_L2 you must already have R(L1). W(L1) implies W(rings_L2) and
> + * L3.
> + *
> + * == L3 : The individual ring_info lock: ring_info->L3_lock
> + *
> + * Protects all the fields within the argo_ring_info, aside from the ones that
> + * rings_L2 already protects: node, id, lock.
> + *
> + * To acquire L3 you must already have R(rings_L2). W(rings_L2) implies L3.
> + *
> + * == send_L2 : The per-domain single-sender partner rings lock:
> + *              d->argo->send_L2_lock
> + *
> + * Protects the per-domain send hash table : d->argo->send_hash
> + * and the elements in the hash table, and the node and id fields
> + * in struct argo_send_info in the hash table.
> + *
> + * To take send_L2, you must already have R(L1). W(L1) implies send_L2.
> + * Do not attempt to acquire a rings_L2 on any domain after taking and while
> + * holding a send_L2 lock -- acquire the rings_L2 (if one is needed) beforehand.
> + *
> + * == wildcard_L2 : The per-domain wildcard pending list lock:
> + *                  d->argo->wildcard_L2_lock
> + *
> + * Protects the per-domain list of outstanding signals for space availability
> + * on wildcard rings.
> + *
> + * To take wildcard_L2, you must already have R(L1). W(L1) implies wildcard_L2.
> + * No other locks are acquired after obtaining wildcard_L2.
> + */
> +
> +/*
> + * Lock state validations macros
> + *
> + * These macros encode the logic to verify that the locking has adhered to the
> + * locking discipline above.
> + * eg. On entry to logic that requires holding at least R(rings_L2), this:
> + *      ASSERT(LOCKING_Read_rings_L2(d));
> + *
> + * checks that the lock state is sufficient, validating that one of the
> + * following must be true when executed:       R(rings_L2) && R(L1)
> + *                                        or:  W(rings_L2) && R(L1)
> + *                                        or:  W(L1)
> + */
> +
> +/* RAW macros here are only used to assist defining the other macros below */
> +#define RAW_LOCKING_Read_L1 (rw_is_locked(&L1_global_argo_rwlock))

Not sure whether it's relevant or not, but this macro would return
true as long as the lock is taken, regardless of whether it's read or
write locked. If you want to make sure it's only read-locked then you
will have to use:

rw_is_locked(&L1_global_argo_rwlock) &&
!rw_is_write_locked(&L1_global_argo_rwlock)

AFAICT.

> +#define RAW_LOCKING_Read_rings_L2(d) \
> +    (rw_is_locked(&d->argo->rings_L2_rwlock) && RAW_LOCKING_Read_L1)
> +
> +/* The LOCKING macros defined below here are for use at verification points */
> +#define LOCKING_Write_L1 (rw_is_write_locked(&L1_global_argo_rwlock))
> +#define LOCKING_Read_L1 (RAW_LOCKING_Read_L1 || LOCKING_Write_L1)

You can drop the LOCKING_Write_L1 here, since with the current macros
RAW_LOCKING_Read_L1 will return true regardless of whether the lock is
read or write locked.

> +
> +#define LOCKING_Write_rings_L2(d) \
> +    ((RAW_LOCKING_Read_L1 && rw_is_write_locked(&d->argo->rings_L2_rwlock)) || \

For safety you need parentheses around d here:

rw_is_write_locked(&(d)->argo->rings_L2_rwlock)

And also in the macros below, same applies to r.

> +     LOCKING_Write_L1)
> +
> +#define LOCKING_Read_rings_L2(d) \
> +    ((RAW_LOCKING_Read_L1 && rw_is_locked(&d->argo->rings_L2_rwlock)) || \
> +     LOCKING_Write_rings_L2(d) || LOCKING_Write_L1)
> +
> +#define LOCKING_L3(d, r) \
> +    ((RAW_LOCKING_Read_rings_L2(d) && spin_is_locked(&r->L3_lock)) || \
> +     LOCKING_Write_rings_L2(d) || LOCKING_Write_L1)
> +
> +#define LOCKING_send_L2(d) \
> +    ((RAW_LOCKING_Read_L1 && spin_is_locked(&d->argo->send_L2_lock)) || \
> +     LOCKING_Write_L1)
>  
>  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
>  #undef ARGO_DEBUG
> @@ -28,10 +243,365 @@
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>  
> +/* 
> + * FIXME for 4.12:
> + *  * Replace this hash function to get better distribution across buckets.
> + *  * Don't use casts in the replacement function.
> + *  * Drop the use of array_index_nospec.
> + */
> +/*
> + * This hash function is used to distribute rings within the per-domain
> + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> + * will provide a struct if a match is found with a 'argo_ring_id' key:
> + * ie. the key is a (domain id, argo port, partner domain id) tuple.
> + * Since argo port number varies the most in expected use, and the Linux driver
> + * allocates at both the high and low ends, incorporate high and low bits to
> + * help with distribution.
> + * Apply array_index_nospec as a defensive measure since this operates
> + * on user-supplied input and the array size that it indexes into is known.
> + */
> +static unsigned int
> +hash_index(const struct argo_ring_id *id)
> +{
> +    unsigned int hash;
> +
> +    hash = (uint16_t)(id->aport >> 16);
> +    hash ^= (uint16_t)id->aport;
> +    hash ^= id->domain_id;
> +    hash ^= id->partner_id;
> +    hash &= (ARGO_HTABLE_SIZE - 1);
> +
> +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
> +}
> +
> +static struct argo_ring_info *
> +find_ring_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    unsigned int ring_hash_index;
> +    struct hlist_node *node;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(LOCKING_Read_rings_L2(d));
> +
> +    ring_hash_index = hash_index(id);
> +
> +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> +                 d->argo, ring_hash_index,
> +                 d->argo->ring_hash[ring_hash_index].first, id);
> +    argo_dprintk("id.aport=%x id.domain=vm%u id.partner_id=vm%u\n",
> +                 id->aport, id->domain_id, id->partner_id);
> +
> +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> +                         node)
> +    {
> +        const struct argo_ring_id *cmpid = &ring_info->id;
> +
> +        if ( cmpid->aport == id->aport &&
> +             cmpid->domain_id == id->domain_id &&
> +             cmpid->partner_id == id->partner_id )
> +        {
> +            argo_dprintk("ring_info=%p\n", ring_info);
> +            return ring_info;
> +        }
> +    }
> +    argo_dprintk("no ring_info found\n");
> +
> +    return NULL;
> +}
> +
> +static void
> +ring_unmap(const struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    ASSERT(LOCKING_L3(d, ring_info));
> +
> +    if ( !ring_info->mfn_mapping )
> +        return;
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +    {
> +        if ( !ring_info->mfn_mapping[i] )
> +            continue;
> +        if ( ring_info->mfns )
> +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> +                         mfn_x(ring_info->mfns[i]),
> +                         ring_info->mfn_mapping[i]);

Is it actually possible to have a mapped page without a matching mfn
stored in the mfns array? That would imply there's no reference taken
on such a mapped page, which could be dangerous. I think you might
want to add an ASSERT(ring_info->mfns) instead of the current if
condition?

(Maybe I'm missing something here).

>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%lu,0x%lx)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, arg3, arg4);
> +
> +    if ( unlikely(!opt_argo_enabled) )
> +        return -EOPNOTSUPP;

I think this should return -ENOSYS: a hypervisor built with
CONFIG_ARGO but without argo enabled on the command line shouldn't
behave differently from a hypervisor built without CONFIG_ARGO.

> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> +
> +    return rc;
> +}
> +
> +static void
> +argo_domain_init(struct argo_domain *argo)
> +{
> +    unsigned int i;
> +
> +    rwlock_init(&argo->rings_L2_rwlock);
> +    spin_lock_init(&argo->send_L2_lock);
> +    spin_lock_init(&argo->wildcard_L2_lock);
> +    argo->ring_count = 0;

No need to set ring_count to 0: since you allocate the struct with
xzalloc it's going to be zeroed already.

In the argo_soft_reset case domain_rings_remove_all should have
already set ring_count to 0.

> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 4956a77..6e69afa 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -490,6 +490,11 @@ struct domain
>          unsigned int guest_request_enabled       : 1;
>          unsigned int guest_request_sync          : 1;
>      } monitor;
> +
> +#ifdef CONFIG_ARGO
> +    /* Argo interdomain communication support */
> +    struct argo_domain *argo;

I'm likely missing something, but argo_domain is declared in argo.c;
don't you need a forward declaration here for this to build?

Thanks, Roger.
