
Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op



 On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@xxxxxxxxx> wrote:
>
> The register op is used by a domain to register a region of memory for
> receiving messages from either a specified other domain, or, if specifying a
> wildcard, any domain.
>
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits,
> the hypervisor will use this mapping to copy data from a sending domain into
> this registered ring, making it accessible to the domain that registered the
> ring to receive data.
>
> Wildcard any-sender rings are default disabled and registration will be
> refused with EPERM unless they have been specifically enabled with the
> argo-mac boot option introduced here. The reason why the default for
> wildcard rings is 'deny' is that there is currently no means to protect the
> ring from DoS by a noisy domain spamming the ring, affecting other domains'
> ability to send to it. This will be addressed with XSM policy controls in
> subsequent work.
>
> Since denying access to any-sender rings is a significant functional
> constraint, a new bootparam is provided to enable overriding this:
>  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> Even though this is a boolean variable, use these descriptive strings in
> order to make it obvious to an administrator that this has potential
> security impact.
>
> The p2m type of the memory supplied by the guest for the ring must be
> p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
> is registered.
>
> xen_argo_page_descr_t type is introduced as a page descriptor, to convey
> both the physical address of the start of the page and its granularity. The
> smallest granularity page is assumed to be 4096 bytes and the lower twelve
> bits of the type are used to indicate the size of the page of memory supplied.
> The implementation of the hypercall op currently only supports 4K pages.
>
> array_index_nospec is used to guard the result of the ring id hash function.
> This is out of an abundance of caution, since this is a very basic hash
> function and it operates upon values supplied by the guest just before
> being used as an array index.
>
> Signed-off-by: Christopher Clark <christopher.clark6@xxxxxxxxxxxxxx>
> ---
> v2 self: disallow ring resize via reregister
> v2 feedback Jan: drop cookie, implement teardown
> v2 feedback Jan: drop message from argo_message_op
> v2 self: move hash_index function below locking comment
> v2 self: OVERHAUL
> v2 self/Jan: remove use of magic verification field and tidy up
> v2 self: merge max and min ring size check clauses
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v2 feedback #9, Jan: use the argo-mac bootparam at point of introduction
> v2 feedback #9, Jan: rename boot opt variable to comply with convention
> v2 feedback #9, Jan: rename the argo_mac bootparam to argo-mac
> v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1 feedback Roger: s/pfn/gfn/ and retire always-64-bit type
> v2. feedback Jan: document the argo-mac boot opt
> v2. feedback Jan: simplify re-register, drop mappings
> v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops
>
> v1 #13 feedback, Jan: register op : s/ECONNREFUSED/ESRCH/
> v1 #5 (#13) feedback Paul: register op: use currd in do_message_op
> v1 #13 feedback, Paul: register op: use mfn_eq comparator
> v1 #5 (#13) feedback Paul: register op: use currd in argo_register_ring
> v1 #13 feedback Paul: register op: whitespace, unsigned, bounds check
> v1 #13 feedback Paul: use of hex in limit constant definition
> v1 #13 feedback Paul, register op: set nmfns on loop termination
> v1 #13 feedback Paul: register op: do/while -> gotos, reindent
> v1 argo_ring_map_page: drop uint32_t for unsigned int
> v1. #13 feedback Julien: use page descriptors instead of gpfns.
>    - adds ABI support for pages with different granularity.
> v1 feedback #13, Paul: adjust log level of message
> v1 feedback #13, Paul: use gprintk for guest-triggered warning
> v1 feedback #13, Paul: gprintk and XENLOG_DEBUG for ring registration
> v1 feedback #13, Paul: use gprintk for errs in argo_ring_map_page
> v1 feedback #13, Paul: use ENOMEM if global mapping fails
> v1 feedback Paul: overflow check before shift
> v1: add define for copy_field_to_guest_errno
> v1: fix gprintk use for ARM as its defn dislikes split format strings
> v1: use copy_field_to_guest_errno
> v1 feedback #13, Jan: argo_hash_fn: no inline, rename, change type
> v1 feedback #13, Paul, Jan: EFAULT -> ENOMEM in argo_ring_map_page
> v1 feedback #13, Jan: rename page var in argo_ring_map_page
> v1 feedback #13, Jan: switch uint8_t* to void* and drop cast
> v1 feedback #13, Jan: switch memory barrier to smp_wmb
> v1 feedback #13, Jan: make 'ring' comment comply with single-line style
> v1 feedback #13, Jan: use xzalloc_array, drop loop NULL init
> v1 feedback #13, Jan: init bool with false rather than 0
> v1 feedback #13 Jan: use __copy; define and use __copy_field_to_guest_errno
> v1 feedback #13, Jan: use xzalloc, drop individual init zeroes
> v1 feedback #13, Jan: prefix public namespace with xen
> v1 feedback #13, Jan: blank line after op case in do_argo_message_op
> v1 self: reflow comment in argo_ring_map_page to within 80 char len
> v1 feedback #13, Roger: use true not 1 in assign to update_tx_ptr bool
> v1 feedback #21, Jan: fold in the array_index_nospec hash function guards
> v1 feedback #18, Jan: fold the max ring count limit into the series
> v1 self: use unsigned long type for XEN_ARGO_REGISTER_FLAG_MASK
> v1: feedback #15 Jan: handle upper-halves of hypercall args
> v1. feedback #13 Jan: add comment re: page alignment
> v1. self: confirm ring magic presence in supplied page array
> v1. feedback #13 Jan: add comment re: minimum ring size
> v1. feedback #13 Roger: use ASSERT_UNREACHABLE
> v1. feedback Roger: add comment to hash function
>
>  docs/misc/xen-command-line.pandoc  |  15 +
>  xen/common/argo.c                  | 566 +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h          |  72 +++++
>  xen/include/xlat.lst               |   1 +
>  6 files changed, 658 insertions(+)
>
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index aea13eb..68d4415 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
>  of memory rings with the hypervisor to receive messages, sending messages to
>  other domains by hypercall and querying the ring status of other domains.
>
> +### argo-mac
> +> `= permissive | enforcing`

Why not call this argo-mac-permissive and make it a boolean? The
default would be 'false', which would imply enforcing. That would also
get rid of parse_opt_argo_mac, since you could use the default boolean
parser.
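
Roughly (untested sketch; the option name is only a suggestion):

    /* Default is false, i.e. enforcing. */
    static bool __read_mostly opt_argo_mac_permissive;
    boolean_param("argo-mac-permissive", opt_argo_mac_permissive);

and then in register_ring():

    if ( (reg.partner_id == XEN_ARGO_DOMID_ANY) && !opt_argo_mac_permissive )
    {
        ret = -EPERM;
        goto out_unlock;
    }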

> +
> +> Default: `enforcing`
> +
> +Constrain the access control applied to the Argo communication mechanism.
> +
> +When `enforcing`, domains may not register rings that have wildcard specified
> +for the sender which would allow messages to be sent to the ring by any domain.
> +This is to protect rings and the services that utilize them against DoS by a
> +malicious or buggy domain spamming the ring.
> +
> +When the boot option is set to `permissive`, this constraint is relaxed and
> +wildcard any-sender rings are allowed to be registered.
> +
>  ### asid (x86)
>  > `= <boolean>`
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 86195d3..11988e7 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -23,16 +23,41 @@
>  #include <xen/event.h>
>  #include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/lib.h>
> +#include <xen/nospec.h>
>  #include <xen/time.h>
>  #include <public/argo.h>
>
> +#define MAX_RINGS_PER_DOMAIN            128U
> +
> +/* All messages on the ring are padded to a multiple of the slot size. */
> +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
> +
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
>
>  /* Xen command line option to enable argo */
>  static bool __read_mostly opt_argo_enabled;
>  boolean_param("argo", opt_argo_enabled);
>
> +/* Xen command line option for conservative or relaxed access control */
> +bool __read_mostly opt_argo_mac_enforcing = true;
> +
> +static int __init parse_opt_argo_mac(const char *s)
> +{
> +    if ( !strcmp(s, "enforcing") )
> +        opt_argo_mac_enforcing = true;
> +    else if ( !strcmp(s, "permissive") )
> +        opt_argo_mac_enforcing = false;
> +    else
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +custom_param("argo-mac", parse_opt_argo_mac);
> +
>  typedef struct argo_ring_id
>  {
>      uint32_t port;
> @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>
> +/*
> + * This hash function is used to distribute rings within the per-domain
> + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> + * will provide a struct if a match is found with a 'argo_ring_id' key:
> + * ie. the key is a (domain id, port, partner domain id) tuple.
> + * Since port number varies the most in expected use, and the Linux driver
> + * allocates at both the high and low ends, incorporate high and low bits to
> + * help with distribution.
> + * Apply array_index_nospec as a defensive measure since this operates
> + * on user-supplied input and the array size that it indexes into is known.
> + */
> +static unsigned int
> +hash_index(const struct argo_ring_id *id)
> +{
> +    unsigned int hash;
> +
> +    hash = (uint16_t)(id->port >> 16);
> +    hash ^= (uint16_t)id->port;
> +    hash ^= id->domain_id;
> +    hash ^= id->partner_id;
> +    hash &= (ARGO_HTABLE_SIZE - 1);
> +
> +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
> +}
> +
>  static void
>  ring_unmap(struct argo_ring_info *ring_info)
>  {
> @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
>
> +static int
> +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        gprintk(XENLOG_ERR,
> +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",
> +                ring_info->id.domain_id, ring_info->id.port,
> +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +        return -ENOMEM;
> +    }
> +
> +    if ( !ring_info->mfns || !ring_info->mfn_mapping)
> +    {
> +        ASSERT_UNREACHABLE();
> +        ring_info->len = 0;
> +        return -ENOMEM;
> +    }
> +
> +    if ( !ring_info->mfn_mapping[i] )
> +    {
> +        /*
> +         * TODO:
> +         * The first page of the ring contains the ring indices, so both read
> +         * and write access to the page is required by the hypervisor, but
> +         * read-access is not needed for this mapping for the remainder of the
> +         * ring.
> +         * Since this mapping will remain resident in Xen's address space for
> +         * the lifetime of the ring, and following the principle of least
> +         * privilege, it could be preferable to:
> +         *  # add a XSM check to determine what policy is wanted here
> +         *  # depending on the XSM query, optionally create this mapping as
> +         *    _write-only_ on platforms that can support it.
> +         *    (eg. Intel EPT/AMD NPT).

Why do Intel EPT or AMD NPT matter here?

You are mapping the page into Xen's address space, which doesn't use
either EPT or NPT. Writable or read-only mappings would be created by
setting the right bits in the Xen page tables.

> +         */
> +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> +

No need for the added blank line.

> +        if ( !ring_info->mfn_mapping[i] )
> +        {
> +            gprintk(XENLOG_ERR,
> +                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
> +                    ring_info->id.domain_id, ring_info->id.port,
> +                    ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +            return -ENOMEM;
> +        }
> +        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +                     mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +    }
> +
> +    if ( out_ptr )
> +        *out_ptr = ring_info->mfn_mapping[i];
> +
> +    return 0;
> +}
> +
> +static void
> +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    void *dst;
> +    uint32_t *p;
> +
> +    ASSERT(ring_info->mfn_mapping[0]);
> +
> +    ring_info->tx_ptr = tx_ptr;
> +
> +    dst = ring_info->mfn_mapping[0];
> +    p = dst + offsetof(xen_argo_ring_t, tx_ptr);

Hm, wouldn't it be easier to cast page 0 to the layout of the ring so
that you don't need pointer arithmetic to get at the fields? I.e. make
dst a pointer to xen_argo_ring_t.
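
Something along these lines (untested sketch):

    static void
    update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
    {
        xen_argo_ring_t *ringp = (xen_argo_ring_t *)ring_info->mfn_mapping[0];

        ASSERT(ringp);

        ring_info->tx_ptr = tx_ptr;
        write_atomic(&ringp->tx_ptr, tx_ptr);
        smp_wmb();
    }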

> +
> +    write_atomic(p, tx_ptr);
> +    smp_wmb();
> +}
> +
>  static void
>  wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
>  {
> @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
>      }
>  }
>
> +static int
> +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> +#else
> +    *mfn = p2m_lookup(d, gfn, &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, gfn_x(gfn));
> +#endif
> +
> +    return ret;
> +}
> +
> +static int
> +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +               uint32_t npage,
> +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +               uint32_t len)
> +{
> +    unsigned int i;
> +    int ret = 0;
> +    mfn_t *mfns;
> +    uint8_t **mfn_mapping;
> +
> +    /*
> +     * first bounds check on npage here also serves as an overflow check
> +     * before left shifting it
> +     */
> +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
> +         ((npage << PAGE_SHIFT) < len) )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /* Ring already existed: drop the previous mapping. */
> +        gprintk(XENLOG_INFO,
> +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id);
> +
> +        ring_remove_mfns(d, ring_info);
> +        ASSERT(!ring_info->mfns);
> +    }
> +
> +    mfns = xmalloc_array(mfn_t, npage);
> +    if ( !mfns )
> +        return -ENOMEM;
> +
> +    for ( i = 0; i < npage; i++ )
> +        mfns[i] = INVALID_MFN;
> +
> +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> +    if ( !mfn_mapping )
> +    {
> +        xfree(mfns);
> +        return -ENOMEM;
> +    }
> +
> +    ring_info->npage = npage;
> +    ring_info->mfns = mfns;
> +    ring_info->mfn_mapping = mfn_mapping;
> +
> +    ASSERT(ring_info->npage == npage);
> +
> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;
> +
> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )

This loop seems to assume that there can be pages already added to the
ring, but IIRC you said that redimensioning of the ring was removed in
this version?

I think for an initial version it would be easier not to allow
redimensioning of active rings, and just allow teardown and
re-initialization as the way to redimension a ring.
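
Concretely (sketch, assuming ring_remove_mfns() resets nmfns, which I
haven't checked): since any previous mapping is already dropped above,
nmfns should be zero at this point and the loop could simply be:

    for ( i = 0; i < npage; i++ )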

> +    {
> +        xen_argo_page_descr_t pg_descr;
> +        gfn_t gfn;
> +        mfn_t mfn;
> +
> +        ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
> +                -EFAULT : 0;
> +        if ( ret )
> +            break;
> +
> +        /* Implementation currently only supports handling 4K pages */
> +        if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
> +                XEN_ARGO_PAGE_DESCR_SIZE_4K )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +        gfn = _gfn(pg_descr >> PAGE_SHIFT);
> +
> +        ret = find_ring_mfn(d, gfn, &mfn);
> +        if ( ret )
> +        {
> +            gprintk(XENLOG_ERR,
> +               "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
> +                    d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
> +                    ring_info->id.port, ring_info->id.partner_id,
> +                    ring_info, i, ring_info->npage);
> +            break;
> +        }
> +
> +        ring_info->mfns[i] = mfn;
> +
> +        argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
> +                     i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
> +    }
> +
> +    ring_info->nmfns = i;
> +
> +    if ( ret )
> +        ring_remove_mfns(d, ring_info);
> +    else
> +    {
> +        ASSERT(ring_info->nmfns == ring_info->npage);
> +
> +        gprintk(XENLOG_DEBUG,
> +        "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id, ring_info,
> +                ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
> +    }
> +
> +    return ret;
> +}
> +
> +static struct argo_ring_info *
> +ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    unsigned int ring_hash_index;
> +    struct hlist_node *node;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    ring_hash_index = hash_index(id);
> +
> +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> +                 d->argo, ring_hash_index,
> +                 d->argo->ring_hash[ring_hash_index].first, id);
> +    argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
> +                 id->port, id->domain_id, id->partner_id);
> +
> +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> +                         node)
> +    {
> +        struct argo_ring_id *cmpid = &ring_info->id;

const?

> +
> +        if ( cmpid->port == id->port &&
> +             cmpid->domain_id == id->domain_id &&
> +             cmpid->partner_id == id->partner_id )
> +        {
> +            argo_dprintk("ring_info=%p\n", ring_info);
> +            return ring_info;
> +        }
> +    }
> +    argo_dprintk("no ring_info found\n");
> +
> +    return NULL;
> +}
> +
> +static long
> +register_ring(struct domain *currd,

If this is indeed the current domain (as the name suggests), why do
you need to pass it around? Or else just name the parameter d.

> +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +              uint32_t npage, bool fail_exist)
> +{
> +    xen_argo_register_ring_t reg;
> +    struct argo_ring_id ring_id;
> +    void *map_ringp;
> +    xen_argo_ring_t *ringp;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info = NULL;
> +    struct domain *dst_d = NULL;
> +    int ret = 0;
> +    uint32_t private_tx_ptr;
> +
> +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> +    {
> +        ret = -EFAULT;
> +        goto out;

I don't see the point of using an out label here; why not just use
'return -EFAULT;' (here and below)? That avoids the braces and also
removes the need for the ret assignment.
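
I.e. (sketch):

    if ( copy_from_guest(&reg, reg_hnd, 1) )
        return -EFAULT;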

> +    }
> +
> +    /*
> +     * A ring must be large enough to transmit messages, so requires space for:
> +     * * 1 message header, plus
> +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> +     *   for the message payload to be written into, plus
> +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> +     *   ensures that there can be space remaining when a message is present.
> +     * The above determines the minimum acceptable ring size.
> +     */
> +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> +         (reg.pad != 0) )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ring_id.partner_id = reg.partner_id;
> +    ring_id.port = reg.port;
> +    ring_id.domain_id = currd->domain_id;
> +
> +    read_lock(&argo_lock);
> +
> +    if ( !currd->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> +    {
> +        if ( opt_argo_mac_enforcing )
> +        {
> +            ret = -EPERM;
> +            goto out_unlock;
> +        }
> +    }
> +    else
> +    {
> +        dst_d = get_domain_by_id(reg.partner_id);
> +        if ( !dst_d )
> +        {
> +            argo_dprintk("!dst_d, ESRCH\n");
> +            ret = -ESRCH;
> +            goto out_unlock;
> +        }
> +
> +        if ( !dst_d->argo )
> +        {
> +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +            ret = -ECONNREFUSED;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +
> +        send_info = xzalloc(struct argo_send_info);
> +        if ( !send_info )
> +        {
> +            ret = -ENOMEM;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +        send_info->id = ring_id;
> +    }
> +
> +    write_lock(&currd->argo->lock);
> +
> +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> +    {
> +        ret = -ENOSPC;
> +        goto out_unlock2;
> +    }
> +
> +    ring_info = ring_find_info(currd, &ring_id);
> +    if ( !ring_info )
> +    {
> +        ring_info = xzalloc(struct argo_ring_info);
> +        if ( !ring_info )
> +        {
> +            ret = -ENOMEM;
> +            goto out_unlock2;
> +        }
> +
> +        spin_lock_init(&ring_info->lock);
> +
> +        ring_info->id = ring_id;
> +        INIT_HLIST_HEAD(&ring_info->pending);
> +
> +        hlist_add_head(&ring_info->node,
> +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> +
> +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +    }
> +    else
> +    {
> +        if ( ring_info->len )
> +        {
> +            /*
> +             * If the caller specified that the ring must not already exist,
> +             * fail at attempt to add a completed ring which already exists.
> +             */
> +            if ( fail_exist )
> +            {
> +                argo_dprintk("disallowed reregistration of existing ring\n");
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            if ( ring_info->len != reg.len )
> +            {
> +                /*
> +                 * Change of ring size could result in entries on the pending
> +                 * notifications list that will never trigger.
> +                 * Simple blunt solution: disallow ring resize for now.
> +                 * TODO: investigate enabling ring resize.
> +                 */

I think ring resizing was removed in this version?

> +                gprintk(XENLOG_ERR,
> +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> +                        ring_id.partner_id);
> +                /*
> +                 * Could return EINVAL here, but if the ring didn't already
> +                 * exist then the arguments would have been valid, so: EEXIST.
> +                 */
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            gprintk(XENLOG_DEBUG,
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> +                    ring_id.partner_id);
> +        }
> +    }
> +
> +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +
> +    /*
> +     * The first page of the memory supplied for the ring has the xen_argo_ring
> +     * structure at its head, which is where the ring indexes reside.
> +     */
> +    ret = ring_map_page(ring_info, 0, &map_ringp);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +    ringp = map_ringp;
> +
> +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> +
> +    if ( (private_tx_ptr >= reg.len) ||
> +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> +    {
> +        /*
> +         * Since the ring is a mess, attempt to flush the contents of it
> +         * here by setting the tx_ptr to the next aligned message slot past
> +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +         */
> +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> +
> +        if ( private_tx_ptr >= reg.len )
> +            private_tx_ptr = 0;
> +
> +        update_tx_ptr(ring_info, private_tx_ptr);
> +    }
> +
> +    ring_info->tx_ptr = private_tx_ptr;
> +    ring_info->len = reg.len;
> +    currd->argo->ring_count++;
> +
> +    if ( send_info )
> +    {
> +        spin_lock(&dst_d->argo->send_lock);
> +
> +        hlist_add_head(&send_info->node,
> +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> +
> +        spin_unlock(&dst_d->argo->send_lock);
> +    }
> +
> + out_unlock2:
> +    if ( !ret && send_info )
> +        xfree(send_info);

There's no need to check whether send_info is set; xfree(NULL) is safe.

> +
> +    if ( dst_d )
> +        put_domain(dst_d);
> +
> +    write_unlock(&currd->argo->lock);
> +
> + out_unlock:
> +    read_unlock(&argo_lock);
> +
> + out:
> +    return ret;
> +}
> +
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>
>      switch (cmd)
>      {
> +    case XEN_ARGO_OP_register_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
> +            guest_handle_cast(arg1, xen_argo_register_ring_t);
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
> +            guest_handle_cast(arg2, xen_argo_page_descr_t);
> +        /* arg3 is npage */
> +        /* arg4 is flags */
> +        bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
> +
> +        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +        /*
> +         * Check access to the whole array here so we can use the faster __copy
> +         * operations to read each element later.
> +         */
> +        if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
> +            break;
> +        /* arg4: reserve currently-undefined bits, require zero.  */
> +        if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
> +        break;
> +    }
> +
>      default:
>          rc = -EOPNOTSUPP;
>          break;
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 8997a1c..70e9a78 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> index ca700c9..8dde5d5 100644
> --- a/xen/include/asm-x86/guest_access.h
> +++ b/xen/include/asm-x86/guest_access.h
> @@ -41,6 +41,8 @@
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 4818684..8947230 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -31,6 +31,26 @@
>
>  #include "xen.h"
>
> +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16GB

Is such a big size really required as the default maximum? The size of
the internal structures required to support a 16GB ring would be quite
big; has this been taken into account?
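
For reference (back-of-envelope, assuming 4K pages and a 64-bit build):
a 16GB ring takes 16GB / 4KB = 4M page entries, i.e. roughly 32MB for
the mfn array plus another 32MB for the mapping-pointer array, per
ring, held inside Xen for the lifetime of the ring.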

Thanks, Roger.
