[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op



On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@xxxxxxxxx> wrote:
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -23,16 +23,41 @@
> >  #include <xen/event.h>
> >  #include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/lib.h>
> > +#include <xen/nospec.h>
> >  #include <xen/time.h>
> >  #include <public/argo.h>
> >
> > +#define MAX_RINGS_PER_DOMAIN            128U
> > +
> > +/* All messages on the ring are padded to a multiple of the slot size. */
> > +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
>
> Pointless outermost pair of parentheses.

ack, removed

>
> > @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +/*
> > + * This hash function is used to distribute rings within the per-domain
> > + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> > + * will provide a struct if a match is found with a 'argo_ring_id' key:
> > + * ie. the key is a (domain id, port, partner domain id) tuple.
> > + * Since port number varies the most in expected use, and the Linux driver
> > + * allocates at both the high and low ends, incorporate high and low bits 
> > to
> > + * help with distribution.
> > + * Apply array_index_nospec as a defensive measure since this operates
> > + * on user-supplied input and the array size that it indexes into is known.
> > + */
> > +static unsigned int
> > +hash_index(const struct argo_ring_id *id)
> > +{
> > +    unsigned int hash;
> > +
> > +    hash = (uint16_t)(id->port >> 16);
> > +    hash ^= (uint16_t)id->port;
>
> I may have asked this before, but are the casts really needed
> with ...
>
> > +    hash ^= id->domain_id;
> > +    hash ^= id->partner_id;
> > +    hash &= (ARGO_HTABLE_SIZE - 1);
>
> ... the masking done here?
>
> > +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
>
> With the masking above - is this really needed?
>
> And then the question is whether the quality of the hash is
> sufficient: There won't be more set bits in the result than
> are in any of the three input values, so if they're all small,
> higher hash table entries won't be used at all. I would
> assume the goal to be that by the time 32 entities appear,
> chances be good that at least about 30 of the hash table
> entries are in use.

ok, I'll replace this function and address the above.
I'm out of time today so have added a FIXME for today's series posting
and will get it done tomorrow.

>
> > @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
> >      }
> >  }
> >
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void 
> > **out_ptr)
> > +{
> > +    if ( i >= ring_info->nmfns )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of 
> > %u\n",
>
> ring_info->id.{domain,partner}_id look to be of the same type -
> why once %u and once %d? Same elsewhere.

Fixed across the series to use %u for domid_t output.

>
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > +        return -ENOMEM;
> > +    }
>
>     i = array_index_nospec(i, ring_info->nmfns);
>
> considering the array indexes here? Of course at this point only
> zero can be passed in, but I assume this changes in later patches
> and the index is at least indirectly guest controlled.

Added, thanks.

>
> > @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
> >      }
> >  }
> >
> > +static int
> > +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
>
> So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
> Any chance you could use a consistent ordering of "ring" and "find"?
> Or is there a reason behind the apparent mismatch?

I've renamed them to use 'find_' as the common prefix. Look cleaner to
me. thanks.

>
> > +{
> > +    p2m_type_t p2mt;
> > +    int ret = 0;
> > +
> > +#ifdef CONFIG_X86
> > +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> > +#else
> > +    *mfn = p2m_lookup(d, gfn, &p2mt);
> > +#endif
> > +
> > +    if ( !mfn_valid(*mfn) )
> > +        ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > +        ret = -EAGAIN;
> > +#endif
> > +    else if ( (p2mt != p2m_ram_rw) ||
> > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > +        ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > +    put_gfn(d, gfn_x(gfn));
> > +#endif
> > +
> > +    return ret;
> > +}
>
> Please check whether you could leverage check_get_page_from_gfn()
> here. If you can't, please at least take inspiration as to e.g. the
> #ifdef-s from that function.

Have added a temporary FIXME for this and will do this tomorrow.

>
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +               uint32_t npage,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +               uint32_t len)
>
> Noticing it here, but perhaps still an issue elsewhere as well: Didn't
> we agree on removing unnecessary use of fixed width types? Or
> was that in the context on an earlier patch of v3?

These are fixed and hopefully all the others that do not belong are
also gone in v4.

>
> > +{
> > +    unsigned int i;
> > +    int ret = 0;
> > +    mfn_t *mfns;
> > +    uint8_t **mfn_mapping;
> > +
> > +    /*
> > +     * first bounds check on npage here also serves as an overflow check
> > +     * before left shifting it
> > +     */
> > +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
>
> Isn't this redundant with the check in do_argo_p()?
>
> > +         ((npage << PAGE_SHIFT) < len) )
> > +        return -EINVAL;

Answering your point inline above: Yes - do_argo_op does the bounds
checking, so I've removed the entire check above.

> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /* Ring already existed: drop the previous mapping. */
> > +        gprintk(XENLOG_INFO,
> > +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears 
> > mapping\n",
>
> Indentation (also elsewhere).

Ack, fixed here and elsewhere.

>
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id);
> > +
> > +        ring_remove_mfns(d, ring_info);
> > +        ASSERT(!ring_info->mfns);
> > +    }
> > +
> > +    mfns = xmalloc_array(mfn_t, npage);
> > +    if ( !mfns )
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < npage; i++ )
> > +        mfns[i] = INVALID_MFN;
> > +
> > +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> > +    if ( !mfn_mapping )
> > +    {
> > +        xfree(mfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ring_info->npage = npage;
> > +    ring_info->mfns = mfns;
> > +    ring_info->mfn_mapping = mfn_mapping;
>
> As the inverse to the cleanup sequence in an earlier patch: Please
> set ->npage last here even if it doesn't strictly matter.

npage is now gone after implementing Roger's feedback to only keep "len".

>
> > +    ASSERT(ring_info->npage == npage);
>
> What is this trying to make sure, seeing the assignment just a
> few lines up?

removed

>
> > +    if ( ring_info->nmfns == ring_info->npage )
> > +        return 0;
>
> Can this happen with the ring_remove_mfns() call above?

No, not any more, you're right.

>
> > +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> And hence can i start from other than zero here? And why not
> use the (possibly cheaper to access) function argument "npage"
> as the loop upper bound? The other similar loop a few lines up
> is coded that simpler way.

Yes, thanks, done.

>
> > +static long
> > +register_ring(struct domain *currd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +              uint32_t npage, bool fail_exist)
> > +{
> > +    xen_argo_register_ring_t reg;
> > +    struct argo_ring_id ring_id;
> > +    void *map_ringp;
> > +    xen_argo_ring_t *ringp;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info = NULL;
> > +    struct domain *dst_d = NULL;
> > +    int ret = 0;
> > +    uint32_t private_tx_ptr;
> > +
> > +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> > +    {
> > +        ret = -EFAULT;
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * A ring must be large enough to transmit messages, so requires space 
> > for:
> > +     * * 1 message header, plus
> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 
> > bytes)
> > +     *   for the message payload to be written into, plus
> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> > +     *   single message -- see the logic in ringbuf_insert -- allowing for 
> > this
> > +     *   ensures that there can be space remaining when a message is 
> > present.
> > +     * The above determines the minimum acceptable ring size.
> > +     */
> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
>
> These two summands don't look to fulfill the "cannot be filled to
> capacity" constraint the comment describes, as (aiui) messages
> can be larger than 16 bytes. What's the deal?

This is intended to be about bound checking reg.len against a minimum
size: the smallest ring that you can fit a message onto, as determined
by the logic in ringbuf_insert. The smallest message you can send is:
sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)

and then ringbuf_insert won't accept a message unless there is (at
least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
message size, is the smallest viable ring. There's no point accepting
registration of a ring smaller than that.

You're right that messages can be larger than 16 bytes, but they can
only be sent to rings that are larger than that minimum - on a minimum
sized ring, they'll be rejected by sendv.

>
> > +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > +         (reg.pad != 0) )
> > +    {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    ring_id.partner_id = reg.partner_id;
> > +    ring_id.port = reg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
>
> From here to ...
>
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > +    {
> > +        if ( opt_argo_mac_enforcing )
> > +        {
> > +            ret = -EPERM;
> > +            goto out_unlock;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        dst_d = get_domain_by_id(reg.partner_id);
> > +        if ( !dst_d )
> > +        {
> > +            argo_dprintk("!dst_d, ESRCH\n");
> > +            ret = -ESRCH;
> > +            goto out_unlock;
> > +        }
> > +
> > +        if ( !dst_d->argo )
> > +        {
> > +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +            ret = -ECONNREFUSED;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +
> > +        send_info = xzalloc(struct argo_send_info);
> > +        if ( !send_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +        send_info->id = ring_id;
> > +    }
>
> ... here, what exactly is it that requires the global read lock
> to be held ...
>
> > +    write_lock(&currd->argo->lock);
>
> ... prior to this? Holding locks around allocations is not
> forbidden, but should be avoided whenever possible.
>
> And then further why does the global read lock need
> continued holding until the end of the function?

I've added FIXME to review this tomorrow. I understand the
argo-internal locking protocols and this is adhering to what they
state, in that accesses to the argo structs of currd and dst_d are
protected by the global read lock here, but at the moment I'm less
clear on what the expectations are for standard Xen domain locks,
references and lifecycle are.

>
> > +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > +    {
> > +        ret = -ENOSPC;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( !ring_info )
> > +    {
> > +        ring_info = xzalloc(struct argo_ring_info);
> > +        if ( !ring_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            goto out_unlock2;
> > +        }
> > +
> > +        spin_lock_init(&ring_info->lock);
> > +
> > +        ring_info->id = ring_id;
> > +        INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > +        hlist_add_head(&ring_info->node,
> > +                       
> > &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x 
> > vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +    }
> > +    else
> > +    {
> > +        if ( ring_info->len )
> > +        {
>
> Please fold into "else if ( )", removing a level of indentation.

ack, done

>
> > +            /*
> > +             * If the caller specified that the ring must not already 
> > exist,
> > +             * fail at attempt to add a completed ring which already 
> > exists.
> > +             */
> > +            if ( fail_exist )
> > +            {
> > +                argo_dprintk("disallowed reregistration of existing 
> > ring\n");
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            if ( ring_info->len != reg.len )
> > +            {
> > +                /*
> > +                 * Change of ring size could result in entries on the 
> > pending
> > +                 * notifications list that will never trigger.
> > +                 * Simple blunt solution: disallow ring resize for now.
> > +                 * TODO: investigate enabling ring resize.
> > +                 */
> > +                gprintk(XENLOG_ERR,
> > +                    "argo: vm%u attempted to change ring size(vm%u:%x 
> > vm%d)\n",
> > +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                        ring_id.partner_id);
> > +                /*
> > +                 * Could return EINVAL here, but if the ring didn't already
> > +                 * exist then the arguments would have been valid, so: 
> > EEXIST.
> > +                 */
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            gprintk(XENLOG_DEBUG,
> > +                    "argo: vm%u re-registering existing ring (vm%u:%x 
> > vm%d)\n",
> > +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                    ring_id.partner_id);
> > +        }
> > +    }
> > +
> > +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +
> > +    /*
> > +     * The first page of the memory supplied for the ring has the 
> > xen_argo_ring
> > +     * structure at its head, which is where the ring indexes reside.
> > +     */
> > +    ret = ring_map_page(ring_info, 0, &map_ringp);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +    ringp = map_ringp;
> > +
> > +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > +    if ( (private_tx_ptr >= reg.len) ||
> > +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > +    {
> > +        /*
> > +         * Since the ring is a mess, attempt to flush the contents of it
> > +         * here by setting the tx_ptr to the next aligned message slot past
> > +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > +         */
> > +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > +        if ( private_tx_ptr >= reg.len )
> > +            private_tx_ptr = 0;
> > +
> > +        update_tx_ptr(ring_info, private_tx_ptr);
> > +    }
> > +
> > +    ring_info->tx_ptr = private_tx_ptr;
> > +    ring_info->len = reg.len;
> > +    currd->argo->ring_count++;
> > +
> > +    if ( send_info )
> > +    {
> > +        spin_lock(&dst_d->argo->send_lock);
> > +
> > +        hlist_add_head(&send_info->node,
> > +                       
> > &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > +        spin_unlock(&dst_d->argo->send_lock);
> > +    }
> > +
> > + out_unlock2:
> > +    if ( !ret && send_info )
> > +        xfree(send_info);
> > +
> > +    if ( dst_d )
> > +        put_domain(dst_d);
> > +
> > +    write_unlock(&currd->argo->lock);
>
> Surely you can drop the lock before the other two cleanup
> actions? That would then allow you to add another label to
> absorb the two separate put_domain() calls on error paths.

That looks correct. Added a FIXME note now and will fix tomorrow. thanks.

>
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t 
> > ipa, void *buf,
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
>
> This is unused throughout the patch afaics.

Removed.

>
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> >  #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16GB
> > + *  -- which is 0x1000000 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> > +
> > +/*
> > + * Page descriptor: encoding both page address and size in a 64-bit value.
> > + * Intended to allow ABI to support use of different granularity pages.
> > + * example of how to populate:
> > + * xen_argo_page_descr_t pg_desc =
> > + *      (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> > + */
> > +typedef uint64_t xen_argo_page_descr_t;
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0
>
> Are the _DESCR_ infixes here really useful?

These are now gone, with Julien's approval for the change back to use
the gfn-using interfaces.

>
> > @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
> >  #endif
> >  } xen_argo_ring_t;
> >
> > +typedef struct xen_argo_register_ring
> > +{
> > +    uint32_t port;
> > +    domid_t partner_id;
> > +    uint16_t pad;
> > +    uint32_t len;
> > +} xen_argo_register_ring_t;
> > +
> > +/* Messages on the ring are padded to a multiple of this size. */
> > +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> > +
> > +struct xen_argo_ring_message_header
> > +{
> > +    uint32_t len;
> > +    xen_argo_addr_t source;
> > +    uint32_t message_type;
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > +    uint8_t data[];
> > +#elif defined(__GNUC__)
> > +    uint8_t data[0];
> > +#endif
> > +};
> > +
> > +/*
> > + * Hypercall operations
> > + */
> > +
> > +/*
> > + * XEN_ARGO_OP_register_ring
> > + *
> > + * Register a ring using the indicated memory.
> > + * Also used to reregister an existing ring (eg. after resume from 
> > hibernate).
> > + *
> > + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> > + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> > + * arg3: unsigned long npages
> > + * arg4: unsigned long flags
>
> The "unsigned long"-s here are not necessarily compatible with
> compat mode. At the very least flags above bit 31 won't be
> usable by compat mode guests. Hence I also question ...
>
> > + */
> > +#define XEN_ARGO_OP_register_ring     1
> > +
> > +/* Register op flags */
> > +/*
> > + * Fail exist:
> > + * If set, reject attempts to (re)register an existing established ring.
> > + * If clear, reregistration occurs if the ring exists, with the new ring
> > + * taking the place of the old, preserving tx_ptr if it remains valid.
> > + */
> > +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
> > +
> > +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit 
> > */
> > +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
>
> ... the UL suffix here. Also this last item should not be exposed
> (perhaps framed by "#ifdef __XEN__") and would perhaps anyway
> better be defined in terms of the other
> XEN_ARGO_REGISTER_FLAG_*.

Notes added in place for each of the above; will work on these tomorrow.

thanks

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.