Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op
On Thu, Jan 10, 2019 at 3:25 AM Roger Pau Monné <royger@xxxxxxxxx> wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@xxxxxxxxx> wrote:
> >
> > The register op is used by a domain to register a region of memory for
> > receiving messages from either a specified other domain, or, if specifying a
> > wildcard, any domain.
> >
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index aea13eb..68d4415 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -193,6 +193,21 @@
> > This allows domains access to the Argo hypercall, which supports registration
> > of memory rings with the hypervisor to receive messages, sending messages to
> > other domains by hypercall and querying the ring status of other domains.
> >
> > +### argo-mac
> > +> `= permissive | enforcing`
>
> Why not call this argo-mac-permissive and make it a boolean? Default
> would be 'false' and that would imply enforcing. This would get rid of
> parse_opt_argo_mac since you could use the default boolean parser.
Yes, that makes sense, thanks -- done
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > + if ( i >= ring_info->nmfns )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
> > + ring_info->id.domain_id, ring_info->id.port,
> > + ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > + return -ENOMEM;
> > + }
> > +
> > + if ( !ring_info->mfns || !ring_info->mfn_mapping)
> > + {
> > + ASSERT_UNREACHABLE();
> > + ring_info->len = 0;
> > + return -ENOMEM;
> > + }
> > +
> > + if ( !ring_info->mfn_mapping[i] )
> > + {
> > + /*
> > + * TODO:
> > + * The first page of the ring contains the ring indices, so both read
> > + * and write access to the page is required by the hypervisor, but
> > + * read-access is not needed for this mapping for the remainder of the
> > + * ring.
> > + * Since this mapping will remain resident in Xen's address space for
> > + * the lifetime of the ring, and following the principle of least
> > + * # add a XSM check to determine what policy is wanted here
> > + * # depending on the XSM query, optionally create this mapping as
> > + * _write-only_ on platforms that can support it.
> > + * (eg. Intel EPT/AMD NPT).
>
> Why do Intel EPT or AMD NPT matter here?
I think (though could be wrong and am open to correction here) that
EPT and NPT enable the construction of write-only (ie not readable)
memory mappings. Standard page tables can't do that: with those,
if it's writable, it's also readable.
> You are mapping the page to Xen address space, which doesn't use
> either EPT or NPT. Writable or read-only mappings would be created by
> setting the right bit in the Xen page tables.
ok. I've dropped the comment.
>
> > + */
> > + ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> > +
>
> No need for the newline.
ack.
> > +static void
> > +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> > +{
> > + void *dst;
> > + uint32_t *p;
> > +
> > + ASSERT(ring_info->mfn_mapping[0]);
> > +
> > + ring_info->tx_ptr = tx_ptr;
> > +
> > + dst = ring_info->mfn_mapping[0];
> > + p = dst + offsetof(xen_argo_ring_t, tx_ptr);
>
> Hm, wouldn't it be easier to cast page 0 to the layout of the ring so
> that you don't need to use pointer arithmetic to get the fields? Ie:
> make dst be of type xen_argo_ring_t.
Yes, good point - and that's what's already done elsewhere with the
rx_ptr, so that makes the code more consistent. Done.
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > + uint32_t npage,
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > + uint32_t len)
> > +{
> > + unsigned int i;
> > + int ret = 0;
> > + mfn_t *mfns;
> > + uint8_t **mfn_mapping;
> > +
> > + /*
> > + * first bounds check on npage here also serves as an overflow check
> > + * before left shifting it
> > + */
> > + if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
> > + ((npage << PAGE_SHIFT) < len) )
> > + return -EINVAL;
> > +
> > + if ( ring_info->mfns )
> > + {
> > + /* Ring already existed: drop the previous mapping. */
> > + gprintk(XENLOG_INFO,
> > + "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
> > + d->domain_id, ring_info->id.domain_id,
> > + ring_info->id.port, ring_info->id.partner_id);
> > +
> > + ring_remove_mfns(d, ring_info);
> > + ASSERT(!ring_info->mfns);
> > + }
> > +
> > + mfns = xmalloc_array(mfn_t, npage);
> > + if ( !mfns )
> > + return -ENOMEM;
> > +
> > + for ( i = 0; i < npage; i++ )
> > + mfns[i] = INVALID_MFN;
> > +
> > + mfn_mapping = xzalloc_array(uint8_t *, npage);
> > + if ( !mfn_mapping )
> > + {
> > + xfree(mfns);
> > + return -ENOMEM;
> > + }
> > +
> > + ring_info->npage = npage;
> > + ring_info->mfns = mfns;
> > + ring_info->mfn_mapping = mfn_mapping;
> > +
> > + ASSERT(ring_info->npage == npage);
> > +
> > + if ( ring_info->nmfns == ring_info->npage )
> > + return 0;
> > +
> > + for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> This loop seems to assume that there can be pages already added to the
> ring, but IIRC you said that redimensioning of the ring was removed in
> this version?
That's correct - it's currently rejected.
> I think for an initial version it would be easier to don't allow
> redimensioning of active rings, and just allow teardown and
> re-initialization as the way to redimension a ring.
ack. I'll look into how it affects this function and simplifying it.
>
> > + {
> > + xen_argo_page_descr_t pg_descr;
> > + gfn_t gfn;
> > + mfn_t mfn;
> > +
> > + ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
> > + -EFAULT : 0;
> > + if ( ret )
> > + break;
> > +
> > + /* Implementation currently only supports handling 4K pages */
> > + if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
> > + XEN_ARGO_PAGE_DESCR_SIZE_4K )
> > + {
> > + ret = -EINVAL;
> > + break;
> > + }
> > + gfn = _gfn(pg_descr >> PAGE_SHIFT);
> > +
> > + ret = find_ring_mfn(d, gfn, &mfn);
> > + if ( ret )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
> > + d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
> > + ring_info->id.port, ring_info->id.partner_id,
> > + ring_info, i, ring_info->npage);
> > + break;
> > + }
> > +
> > + ring_info->mfns[i] = mfn;
> > +
> > + argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
> > + i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
> > + }
> > +
> > + ring_info->nmfns = i;
> > +
> > + if ( ret )
> > + ring_remove_mfns(d, ring_info);
> > + else
> > + {
> > + ASSERT(ring_info->nmfns == ring_info->npage);
> > +
> > + gprintk(XENLOG_DEBUG,
> > + "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
> > + d->domain_id, ring_info->id.domain_id,
> > + ring_info->id.port, ring_info->id.partner_id, ring_info,
> > + ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static struct argo_ring_info *
> > +ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > + unsigned int ring_hash_index;
> > + struct hlist_node *node;
> > + struct argo_ring_info *ring_info;
> > +
> > + ASSERT(rw_is_locked(&d->argo->lock));
> > +
> > + ring_hash_index = hash_index(id);
> > +
> > + argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> > + d->argo, ring_hash_index,
> > + d->argo->ring_hash[ring_hash_index].first, id);
> > + argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
> > + id->port, id->domain_id, id->partner_id);
> > +
> > + hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> > + node)
> > + {
> > + struct argo_ring_id *cmpid = &ring_info->id;
>
> const?
yep, thanks, done.
>
> > +
> > + if ( cmpid->port == id->port &&
> > + cmpid->domain_id == id->domain_id &&
> > + cmpid->partner_id == id->partner_id )
> > + {
> > + argo_dprintk("ring_info=%p\n", ring_info);
> > + return ring_info;
> > + }
> > + }
> > + argo_dprintk("no ring_info found\n");
> > +
> > + return NULL;
> > +}
> > +
> > +static long
> > +register_ring(struct domain *currd,
>
> If this is indeed the current domain (as the name suggests), why do
> you need to pass it around? Or else just name the parameter d.
After the later in-thread discussion between you and Jan, I've left
the argument name 'currd' but added the ASSERT recommended by Jan,
that currd matches current->domain.
I've done the same in the other functions (across the series) that
take currd as an argument, except for notify_check_pending where I've
just renamed the argument to 'd'; there's no reason in that function
that it needs to be handling the current domain.
>
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > + uint32_t npage, bool fail_exist)
> > +{
> > + xen_argo_register_ring_t reg;
> > + struct argo_ring_id ring_id;
> > + void *map_ringp;
> > + xen_argo_ring_t *ringp;
> > + struct argo_ring_info *ring_info;
> > + struct argo_send_info *send_info = NULL;
> > + struct domain *dst_d = NULL;
> > + int ret = 0;
> > + uint32_t private_tx_ptr;
> > +
> > + if ( copy_from_guest(&reg, reg_hnd, 1) )
> > + {
> > + ret = -EFAULT;
> > + goto out;
>
> I don't see the point of using an out label, why not just use 'return
> -EFAULT;' (here and below). This avoids the braces and also removes
> the need for the ret assignment.
done.
>
> > + }
> > +
> > + /*
> > + * A ring must be large enough to transmit messages, so requires space for:
> > + * * 1 message header, plus
> > + * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > + * for the message payload to be written into, plus
> > + * * 1 more slot, so that the ring cannot be filled to capacity with a
> > + * single message -- see the logic in ringbuf_insert -- allowing for this
> > + * ensures that there can be space remaining when a message is present.
> > + * The above determines the minimum acceptable ring size.
> > + */
> > + if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > + + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> > + (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > + (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > + (reg.pad != 0) )
> > + {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + ring_id.partner_id = reg.partner_id;
> > + ring_id.port = reg.port;
> > + ring_id.domain_id = currd->domain_id;
> > +
> > + read_lock(&argo_lock);
> > +
> > + if ( !currd->argo )
> > + {
> > + ret = -ENODEV;
> > + goto out_unlock;
> > + }
> > +
> > + if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > + {
> > + if ( opt_argo_mac_enforcing )
> > + {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > + }
> > + else
> > + {
> > + dst_d = get_domain_by_id(reg.partner_id);
> > + if ( !dst_d )
> > + {
> > + argo_dprintk("!dst_d, ESRCH\n");
> > + ret = -ESRCH;
> > + goto out_unlock;
> > + }
> > +
> > + if ( !dst_d->argo )
> > + {
> > + argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > + ret = -ECONNREFUSED;
> > + put_domain(dst_d);
> > + goto out_unlock;
> > + }
> > +
> > + send_info = xzalloc(struct argo_send_info);
> > + if ( !send_info )
> > + {
> > + ret = -ENOMEM;
> > + put_domain(dst_d);
> > + goto out_unlock;
> > + }
> > + send_info->id = ring_id;
> > + }
> > +
> > + write_lock(&currd->argo->lock);
> > +
> > + if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > + {
> > + ret = -ENOSPC;
> > + goto out_unlock2;
> > + }
> > +
> > + ring_info = ring_find_info(currd, &ring_id);
> > + if ( !ring_info )
> > + {
> > + ring_info = xzalloc(struct argo_ring_info);
> > + if ( !ring_info )
> > + {
> > + ret = -ENOMEM;
> > + goto out_unlock2;
> > + }
> > +
> > + spin_lock_init(&ring_info->lock);
> > +
> > + ring_info->id = ring_id;
> > + INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > + hlist_add_head(&ring_info->node,
> > + &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > + gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + }
> > + else
> > + {
> > + if ( ring_info->len )
> > + {
> > + /*
> > + * If the caller specified that the ring must not already exist,
> > + * fail at attempt to add a completed ring which already exists.
> > + */
> > + if ( fail_exist )
> > + {
> > + argo_dprintk("disallowed reregistration of existing ring\n");
> > + ret = -EEXIST;
> > + goto out_unlock2;
> > + }
> > +
> > + if ( ring_info->len != reg.len )
> > + {
> > + /*
> > + * Change of ring size could result in entries on the pending
> > + * notifications list that will never trigger.
> > + * Simple blunt solution: disallow ring resize for now.
> > + * TODO: investigate enabling ring resize.
> > + */
>
> I think ring resizing was removed on this version?
Yes: This is the code that was introduced to prevent it.
>
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + /*
> > + * Could return EINVAL here, but if the ring didn't already
> > + * exist then the arguments would have been valid, so: EEXIST.
> > + */
> > + ret = -EEXIST;
> > + goto out_unlock2;
> > + }
> > +
> > + gprintk(XENLOG_DEBUG,
> > + "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + }
> > + }
> > +
> > + ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > + if ( ret )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > +
> > + ring_remove_info(currd, ring_info);
> > + goto out_unlock2;
> > + }
> > +
> > + /*
> > + * The first page of the memory supplied for the ring has the xen_argo_ring
> > + * structure at its head, which is where the ring indexes reside.
> > + */
> > + ret = ring_map_page(ring_info, 0, &map_ringp);
> > + if ( ret )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > +
> > + ring_remove_info(currd, ring_info);
> > + goto out_unlock2;
> > + }
> > + ringp = map_ringp;
> > +
> > + private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > + if ( (private_tx_ptr >= reg.len) ||
> > + (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > + {
> > + /*
> > + * Since the ring is a mess, attempt to flush the contents of it
> > + * here by setting the tx_ptr to the next aligned message slot past
> > + * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > + */
> > + private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > + if ( private_tx_ptr >= reg.len )
> > + private_tx_ptr = 0;
> > +
> > + update_tx_ptr(ring_info, private_tx_ptr);
> > + }
> > +
> > + ring_info->tx_ptr = private_tx_ptr;
> > + ring_info->len = reg.len;
> > + currd->argo->ring_count++;
> > +
> > + if ( send_info )
> > + {
> > + spin_lock(&dst_d->argo->send_lock);
> > +
> > + hlist_add_head(&send_info->node,
> > + &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > + spin_unlock(&dst_d->argo->send_lock);
> > + }
> > +
> > + out_unlock2:
> > + if ( !ret && send_info )
> > + xfree(send_info);
>
> There's no need to check if send_info is set, xfree(NULL) is safe.
done.
>
> > +
> > + if ( dst_d )
> > + put_domain(dst_d);
> > +
> > + write_unlock(&currd->argo->lock);
> > +
> > + out_unlock:
> > + read_unlock(&argo_lock);
> > +
> > + out:
> > + return ret;
> > +}
> > +
> > long
> > do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> > XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> > @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> > switch (cmd)
> > {
> > + case XEN_ARGO_OP_register_ring:
> > + {
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
> > + guest_handle_cast(arg1, xen_argo_register_ring_t);
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
> > + guest_handle_cast(arg2, xen_argo_page_descr_t);
> > + /* arg3 is npage */
> > + /* arg4 is flags */
> > + bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
> > +
> > + if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> > + {
> > + rc = -EINVAL;
> > + break;
> > + }
> > + /*
> > + * Check access to the whole array here so we can use the faster __copy
> > + * operations to read each element later.
> > + */
> > + if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
> > + break;
> > + /* arg4: reserve currently-undefined bits, require zero. */
> > + if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
> > + {
> > + rc = -EINVAL;
> > + break;
> > + }
> > +
> > + rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
> > + break;
> > + }
> > +
> > default:
> > rc = -EOPNOTSUPP;
> > break;
> > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> > index 8997a1c..70e9a78 100644
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> > /* Is the guest handle a NULL reference? */
> > #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> > /* Offset the given guest handle into the array it refers to. */
> > #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> > #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> > index ca700c9..8dde5d5 100644
> > --- a/xen/include/asm-x86/guest_access.h
> > +++ b/xen/include/asm-x86/guest_access.h
> > @@ -41,6 +41,8 @@
> > /* Is the guest handle a NULL reference? */
> > #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> > /* Offset the given guest handle into the array it refers to. */
> > #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> > #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > index 4818684..8947230 100644
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> > #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16GB
>
> Is such a big size really required as the default maximum? The size of
> the internal structures required to support a 16GB ring would be quite
> big, has this been taken into account?
Yes, that was incorrect. The comment is now fixed. 16MB is much more
reasonable.
thanks,
Christopher
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel