Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op
On Thu, Jan 10, 2019 at 3:25 AM Roger Pau Monné <royger@xxxxxxxxx> wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@xxxxxxxxx> wrote:
> >
> > The register op is used by a domain to register a region of memory for
> > receiving messages from either a specified other domain, or, if specifying a
> > wildcard, any domain.
> >
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index aea13eb..68d4415 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
> >  of memory rings with the hypervisor to receive messages, sending messages to
> >  other domains by hypercall and querying the ring status of other domains.
> >
> > +### argo-mac
> > +> `= permissive | enforcing`
>
> Why not call this argo-mac-permissive and make it a boolean? Default
> would be 'false' and that would imply enforcing. This would get rid of
> parse_opt_argo_mac since you could use the default boolean parser.

Yes, that makes sense, thanks -- done
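(For illustration, a minimal sketch of what the suggested boolean option
could look like in argo.c -- the option and variable names here are an
assumption, not taken from the patch:)

    /* Sketch only: default is false, which implies enforcing. */
    static bool __read_mostly opt_argo_mac_permissive;
    boolean_param("argo-mac-permissive", opt_argo_mac_permissive);

With the stock boolean parser, passing argo-mac-permissive on the Xen
command line sets the flag and omitting it keeps MAC enforcement, so the
custom parse_opt_argo_mac handler is no longer needed.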
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > +    if ( i >= ring_info->nmfns )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    if ( !ring_info->mfns || !ring_info->mfn_mapping)
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        ring_info->len = 0;
> > +        return -ENOMEM;
> > +    }
> > +
> > +    if ( !ring_info->mfn_mapping[i] )
> > +    {
> > +        /*
> > +         * TODO:
> > +         * The first page of the ring contains the ring indices, so both read
> > +         * and write access to the page is required by the hypervisor, but
> > +         * read-access is not needed for this mapping for the remainder of the
> > +         * ring.
> > +         * Since this mapping will remain resident in Xen's address space for
> > +         * the lifetime of the ring, and following the principle of least
> > +         * privilege, it could be preferable to:
> > +         *  # add a XSM check to determine what policy is wanted here
> > +         *  # depending on the XSM query, optionally create this mapping as
> > +         *    _write-only_ on platforms that can support it.
> > +         *    (eg. Intel EPT/AMD NPT).
>
> Why do Intel EPT or AMD NPT matter here?

I think (though could be wrong and am open to correction here) that EPT
and NPT enable the construction of write-only (ie not readable) memory
mappings. Standard page tables can't do that: with those, if it's
writable, it's also readable.

> You are mapping the page to Xen address space, which doesn't use
> either EPT or NPT. Writable or read-only mappings would be created by
> setting the right bit in the Xen page tables.

ok. I've dropped the comment.

>
> > +         */
> > +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> > +
>
> No need for the newline.

ack.

> > +static void
> > +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> > +{
> > +    void *dst;
> > +    uint32_t *p;
> > +
> > +    ASSERT(ring_info->mfn_mapping[0]);
> > +
> > +    ring_info->tx_ptr = tx_ptr;
> > +
> > +    dst = ring_info->mfn_mapping[0];
> > +    p = dst + offsetof(xen_argo_ring_t, tx_ptr);
>
> Hm, wouldn't it be easier to cast page 0 to the layout of the ring so
> that you don't need to use pointer arithmetic to get the fields? Ie:
> make dst be of type xen_argo_ring_t.

Yes, good point - and that's what's already done elsewhere with the
rx_ptr, so that makes the code more consistent. Done.
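(A rough sketch of that shape -- the tail of the function is not quoted
above, so the write_atomic()/smp_wmb() completion shown here is an
assumption:)

    static void
    update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
    {
        /* Page 0 of the ring holds the xen_argo_ring_t header. */
        xen_argo_ring_t *ringp = (xen_argo_ring_t *)ring_info->mfn_mapping[0];

        ASSERT(ring_info->mfn_mapping[0]);

        ring_info->tx_ptr = tx_ptr;

        /* Assumed remainder: publish the new tx_ptr to the guest-visible ring. */
        write_atomic(&ringp->tx_ptr, tx_ptr);
        smp_wmb();
    }

Accessing the field through the struct avoids the offsetof() pointer
arithmetic and matches how rx_ptr is already handled elsewhere.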
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +               uint32_t npage,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +               uint32_t len)
> > +{
> > +    unsigned int i;
> > +    int ret = 0;
> > +    mfn_t *mfns;
> > +    uint8_t **mfn_mapping;
> > +
> > +    /*
> > +     * first bounds check on npage here also serves as an overflow check
> > +     * before left shifting it
> > +     */
> > +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
> > +         ((npage << PAGE_SHIFT) < len) )
> > +        return -EINVAL;
> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /* Ring already existed: drop the previous mapping. */
> > +        gprintk(XENLOG_INFO,
> > +                "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id);
> > +
> > +        ring_remove_mfns(d, ring_info);
> > +        ASSERT(!ring_info->mfns);
> > +    }
> > +
> > +    mfns = xmalloc_array(mfn_t, npage);
> > +    if ( !mfns )
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < npage; i++ )
> > +        mfns[i] = INVALID_MFN;
> > +
> > +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> > +    if ( !mfn_mapping )
> > +    {
> > +        xfree(mfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ring_info->npage = npage;
> > +    ring_info->mfns = mfns;
> > +    ring_info->mfn_mapping = mfn_mapping;
> > +
> > +    ASSERT(ring_info->npage == npage);
> > +
> > +    if ( ring_info->nmfns == ring_info->npage )
> > +        return 0;
> > +
> > +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> This loop seems to assume that there can be pages already added to the
> ring, but IIRC you said that redimensioning of the ring was removed in
> this version?

That's correct - it's currently rejected.

> I think for an initial version it would be easier to not allow
> redimensioning of active rings, and just allow teardown and
> re-initialization as the way to redimension a ring.

ack. I'll look into how it affects this function and simplifying it.

>
> > +    {
> > +        xen_argo_page_descr_t pg_descr;
> > +        gfn_t gfn;
> > +        mfn_t mfn;
> > +
> > +        ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
> > +                -EFAULT : 0;
> > +        if ( ret )
> > +            break;
> > +
> > +        /* Implementation currently only supports handling 4K pages */
> > +        if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
> > +             XEN_ARGO_PAGE_DESCR_SIZE_4K )
> > +        {
> > +            ret = -EINVAL;
> > +            break;
> > +        }
> > +        gfn = _gfn(pg_descr >> PAGE_SHIFT);
> > +
> > +        ret = find_ring_mfn(d, gfn, &mfn);
> > +        if ( ret )
> > +        {
> > +            gprintk(XENLOG_ERR,
> > +                    "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
> > +                    d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
> > +                    ring_info->id.port, ring_info->id.partner_id,
> > +                    ring_info, i, ring_info->npage);
> > +            break;
> > +        }
> > +
> > +        ring_info->mfns[i] = mfn;
> > +
> > +        argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
> > +                     i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
> > +    }
> > +
> > +    ring_info->nmfns = i;
> > +
> > +    if ( ret )
> > +        ring_remove_mfns(d, ring_info);
> > +    else
> > +    {
> > +        ASSERT(ring_info->nmfns == ring_info->npage);
> > +
> > +        gprintk(XENLOG_DEBUG,
> > +                "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id, ring_info,
> > +                ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +static struct argo_ring_info *
> > +ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +    unsigned int ring_hash_index;
> > +    struct hlist_node *node;
> > +    struct argo_ring_info *ring_info;
> > +
> > +    ASSERT(rw_is_locked(&d->argo->lock));
> > +
> > +    ring_hash_index = hash_index(id);
> > +
> > +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> > +                 d->argo, ring_hash_index,
> > +                 d->argo->ring_hash[ring_hash_index].first, id);
> > +    argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
> > +                 id->port, id->domain_id, id->partner_id);
> > +
> > +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> > +                         node)
> > +    {
> > +        struct argo_ring_id *cmpid = &ring_info->id;
>
> const?

yep, thanks, done.

>
> > +
> > +        if ( cmpid->port == id->port &&
> > +             cmpid->domain_id == id->domain_id &&
> > +             cmpid->partner_id == id->partner_id )
> > +        {
> > +            argo_dprintk("ring_info=%p\n", ring_info);
> > +            return ring_info;
> > +        }
> > +    }
> > +    argo_dprintk("no ring_info found\n");
> > +
> > +    return NULL;
> > +}
> > +
> > +static long
> > +register_ring(struct domain *currd,
>
> If this is indeed the current domain (as the name suggests), why do
> you need to pass it around? Or else just name the parameter d.

After the later in-thread discussion between you and Jan, I've left the
argument name 'currd' but added the ASSERT recommended by Jan, that
currd matches current->domain. I've done the same in the other functions
(across the series) that take currd as an argument, except for
notify_check_pending, where I've just renamed the argument to 'd';
there's no reason in that function that it needs to be handling the
current domain.
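(Sketch of the assertion in question; exact placement is assumed:)

    /* At the top of register_ring() and the other functions taking currd: */
    ASSERT(currd == current->domain);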
>
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +              uint32_t npage, bool fail_exist)
> > +{
> > +    xen_argo_register_ring_t reg;
> > +    struct argo_ring_id ring_id;
> > +    void *map_ringp;
> > +    xen_argo_ring_t *ringp;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info = NULL;
> > +    struct domain *dst_d = NULL;
> > +    int ret = 0;
> > +    uint32_t private_tx_ptr;
> > +
> > +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> > +    {
> > +        ret = -EFAULT;
> > +        goto out;
>
> I don't see the point of using an out label, why not just use 'return
> -EFAULT;' (here and below). This avoids the braces and also removes
> the need for the ret assignment.

done.
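(The suggested form, for reference:)

    if ( copy_from_guest(&reg, reg_hnd, 1) )
        return -EFAULT;

Early returns work for the checks performed before any locks are taken;
once the locks are held, the unwind paths still need the unlock labels.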
>
> > +    }
> > +
> > +    /*
> > +     * A ring must be large enough to transmit messages, so requires space for:
> > +     * * 1 message header, plus
> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > +     *   for the message payload to be written into, plus
> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> > +     *   ensures that there can be space remaining when a message is present.
> > +     * The above determines the minimum acceptable ring size.
> > +     */
> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > +                     + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> > +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > +         (reg.pad != 0) )
> > +    {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    ring_id.partner_id = reg.partner_id;
> > +    ring_id.port = reg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > +    {
> > +        if ( opt_argo_mac_enforcing )
> > +        {
> > +            ret = -EPERM;
> > +            goto out_unlock;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        dst_d = get_domain_by_id(reg.partner_id);
> > +        if ( !dst_d )
> > +        {
> > +            argo_dprintk("!dst_d, ESRCH\n");
> > +            ret = -ESRCH;
> > +            goto out_unlock;
> > +        }
> > +
> > +        if ( !dst_d->argo )
> > +        {
> > +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +            ret = -ECONNREFUSED;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +
> > +        send_info = xzalloc(struct argo_send_info);
> > +        if ( !send_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +        send_info->id = ring_id;
> > +    }
> > +
> > +    write_lock(&currd->argo->lock);
> > +
> > +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > +    {
> > +        ret = -ENOSPC;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( !ring_info )
> > +    {
> > +        ring_info = xzalloc(struct argo_ring_info);
> > +        if ( !ring_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            goto out_unlock2;
> > +        }
> > +
> > +        spin_lock_init(&ring_info->lock);
> > +
> > +        ring_info->id = ring_id;
> > +        INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > +        hlist_add_head(&ring_info->node,
> > +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +    }
> > +    else
> > +    {
> > +        if ( ring_info->len )
> > +        {
> > +            /*
> > +             * If the caller specified that the ring must not already exist,
> > +             * fail at attempt to add a completed ring which already exists.
> > +             */
> > +            if ( fail_exist )
> > +            {
> > +                argo_dprintk("disallowed reregistration of existing ring\n");
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            if ( ring_info->len != reg.len )
> > +            {
> > +                /*
> > +                 * Change of ring size could result in entries on the pending
> > +                 * notifications list that will never trigger.
> > +                 * Simple blunt solution: disallow ring resize for now.
> > +                 * TODO: investigate enabling ring resize.
> > +                 */
>
> I think ring resizing was removed in this version?

Yes: This is the code that was introduced to prevent it.

>
> > +                gprintk(XENLOG_ERR,
> > +                        "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                        ring_id.partner_id);
> > +                /*
> > +                 * Could return EINVAL here, but if the ring didn't already
> > +                 * exist then the arguments would have been valid, so: EEXIST.
> > +                 */
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            gprintk(XENLOG_DEBUG,
> > +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                    ring_id.partner_id);
> > +        }
> > +    }
> > +
> > +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +
> > +    /*
> > +     * The first page of the memory supplied for the ring has the xen_argo_ring
> > +     * structure at its head, which is where the ring indexes reside.
> > +     */
> > +    ret = ring_map_page(ring_info, 0, &map_ringp);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +    ringp = map_ringp;
> > +
> > +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > +    if ( (private_tx_ptr >= reg.len) ||
> > +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > +    {
> > +        /*
> > +         * Since the ring is a mess, attempt to flush the contents of it
> > +         * here by setting the tx_ptr to the next aligned message slot past
> > +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > +         */
> > +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > +        if ( private_tx_ptr >= reg.len )
> > +            private_tx_ptr = 0;
> > +
> > +        update_tx_ptr(ring_info, private_tx_ptr);
> > +    }
> > +
> > +    ring_info->tx_ptr = private_tx_ptr;
> > +    ring_info->len = reg.len;
> > +    currd->argo->ring_count++;
> > +
> > +    if ( send_info )
> > +    {
> > +        spin_lock(&dst_d->argo->send_lock);
> > +
> > +        hlist_add_head(&send_info->node,
> > +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > +        spin_unlock(&dst_d->argo->send_lock);
> > +    }
> > +
> > + out_unlock2:
> > +    if ( !ret && send_info )
> > +        xfree(send_info);
>
> There's no need to check if send_info is set, xfree(NULL) is safe.

done.
>
> > +
> > +    if ( dst_d )
> > +        put_domain(dst_d);
> > +
> > +    write_unlock(&currd->argo->lock);
> > +
> > + out_unlock:
> > +    read_unlock(&argo_lock);
> > +
> > + out:
> > +    return ret;
> > +}
> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> > @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> >      switch (cmd)
> >      {
> > +    case XEN_ARGO_OP_register_ring:
> > +    {
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
> > +            guest_handle_cast(arg1, xen_argo_register_ring_t);
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
> > +            guest_handle_cast(arg2, xen_argo_page_descr_t);
> > +        /* arg3 is npage */
> > +        /* arg4 is flags */
> > +        bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
> > +
> > +        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
> > +        /*
> > +         * Check access to the whole array here so we can use the faster __copy
> > +         * operations to read each element later.
> > +         */
> > +        if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
> > +            break;
> > +        /* arg4: reserve currently-undefined bits, require zero. */
> > +        if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
> > +        break;
> > +    }
> > +
> >      default:
> >          rc = -EOPNOTSUPP;
> >          break;
> > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> > index 8997a1c..70e9a78 100644
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> >  /* Offset the given guest handle into the array it refers to. */
> >  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> >  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> > index ca700c9..8dde5d5 100644
> > --- a/xen/include/asm-x86/guest_access.h
> > +++ b/xen/include/asm-x86/guest_access.h
> > @@ -41,6 +41,8 @@
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> >  /* Offset the given guest handle into the array it refers to. */
> >  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> >  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > index 4818684..8947230 100644
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> >  #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16GB
>
> Is such a big size really required as the default maximum? The size of
> the internal structures required to support a 16GB ring would be quite
> big, has this been taken into account?

Yes, that was incorrect. The comment is now fixed. 16MB is much more
reasonable.
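(A rough worked example of why the cap matters for the internal structures;
the 16MB figure is from the reply above, and the per-ring overhead numbers
below are estimates assuming 4K pages and 8-byte mfn_t/pointer entries:)

    16MB ring / 4KB pages             = 4096 page descriptors
    mfns array:         4096 * 8 bytes = 32KB per ring
    mfn_mapping array:  4096 * 8 bytes = 32KB per ring

At the originally documented 16GB the same arrays would be 1024 times
larger -- tens of megabytes of Xen heap per ring -- which is why the
smaller maximum is much more reasonable.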
thanks,

Christopher