Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op
On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@xxxxxxxxx> wrote:
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -23,16 +23,41 @@
> >  #include <xen/event.h>
> >  #include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/lib.h>
> > +#include <xen/nospec.h>
> >  #include <xen/time.h>
> >  #include <public/argo.h>
> >
> > +#define MAX_RINGS_PER_DOMAIN 128U
> > +
> > +/* All messages on the ring are padded to a multiple of the slot size. */
> > +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
>
> Pointless outermost pair of parentheses.

ack, removed

> > @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +/*
> > + * This hash function is used to distribute rings within the per-domain
> > + * hash tables (d->argo->ring_hash and d->argo->send_hash). The hash table
> > + * will provide a struct if a match is found with an 'argo_ring_id' key:
> > + * i.e. the key is a (domain id, port, partner domain id) tuple.
> > + * Since port number varies the most in expected use, and the Linux driver
> > + * allocates at both the high and low ends, incorporate high and low bits to
> > + * help with distribution.
> > + * Apply array_index_nospec as a defensive measure since this operates
> > + * on user-supplied input and the array size that it indexes into is known.
> > + */
> > +static unsigned int
> > +hash_index(const struct argo_ring_id *id)
> > +{
> > +    unsigned int hash;
> > +
> > +    hash = (uint16_t)(id->port >> 16);
> > +    hash ^= (uint16_t)id->port;
>
> I may have asked this before, but are the casts really needed
> with ...
>
> > +    hash ^= id->domain_id;
> > +    hash ^= id->partner_id;
> > +    hash &= (ARGO_HTABLE_SIZE - 1);
>
> ... the masking done here?
>
> > +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
>
> With the masking above - is this really needed?
>
> And then the question is whether the quality of the hash is
> sufficient: There won't be more set bits in the result than
> are in any of the three input values, so if they're all small,
> higher hash table entries won't be used at all. I would
> assume the goal to be that by the time 32 entities appear,
> chances be good that at least about 30 of the hash table
> entries are in use.

ok, I'll replace this function and address the above. I'm out of time
today so have added a FIXME for today's series posting and will get it
done tomorrow.
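The direction I have in mind for the replacement is a multiplicative
(Fibonacci) hash, so that even small input values spread across the
whole table. An untested sketch only -- it assumes adding an
ARGO_HTABLE_SIZE_SHIFT constant, with ARGO_HTABLE_SIZE defined as
(1 << ARGO_HTABLE_SIZE_SHIFT):

static unsigned int
hash_index(const struct argo_ring_id *id)
{
    /* Combine all three fields of the key; port varies the most. */
    unsigned int hash = id->port;

    hash ^= ((unsigned int)id->domain_id << 16) | id->partner_id;

    /*
     * Multiply by 2^32 divided by the golden ratio (an odd constant)
     * and keep the top bits: every bucket becomes reachable even when
     * the inputs only have a few low bits set.
     */
    hash = (hash * 0x9e3779b1U) >> (32 - ARGO_HTABLE_SIZE_SHIFT);

    /* Already < ARGO_HTABLE_SIZE; nospec retained as a defensive measure. */
    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
}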
> > @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
> >      }
> >  }
> >
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > +    if ( i >= ring_info->nmfns )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
>
> ring_info->id.{domain,partner}_id look to be of the same type -
> why once %u and once %d? Same elsewhere.

Fixed across the series to use %u for domid_t output.

> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > +        return -ENOMEM;
> > +    }
>
>     i = array_index_nospec(i, ring_info->nmfns);
>
> considering the array indexes here? Of course at this point only
> zero can be passed in, but I assume this changes in later patches
> and the index is at least indirectly guest controlled.

Added, thanks.

> > @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
> >      }
> >  }
> >
> > +static int
> > +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
>
> So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
> Any chance you could use a consistent ordering of "ring" and "find"?
> Or is there a reason behind the apparent mismatch?

I've renamed them to use 'find_' as the common prefix. Looks cleaner
to me. thanks.

> > +{
> > +    p2m_type_t p2mt;
> > +    int ret = 0;
> > +
> > +#ifdef CONFIG_X86
> > +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> > +#else
> > +    *mfn = p2m_lookup(d, gfn, &p2mt);
> > +#endif
> > +
> > +    if ( !mfn_valid(*mfn) )
> > +        ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > +        ret = -EAGAIN;
> > +#endif
> > +    else if ( (p2mt != p2m_ram_rw) ||
> > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > +        ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > +    put_gfn(d, gfn_x(gfn));
> > +#endif
> > +
> > +    return ret;
> > +}
>
> Please check whether you could leverage check_get_page_from_gfn()
> here. If you can't, please at least take inspiration as to e.g. the
> #ifdef-s from that function.

Have added a temporary FIXME for this and will do this tomorrow.
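For reference, the shape I'd expect this to take if
check_get_page_from_gfn() can be used directly -- a rough sketch only,
since I still need to confirm that helper's exact signature and its
semantics for the unshare and logdirty cases against the tree:

static int
find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
{
    struct page_info *page;
    p2m_type_t p2mt;
    int ret;

    /*
     * Assumed helper behaviour: translate the gfn, take a general page
     * reference on success, and hide the x86/Arm #ifdef differences.
     */
    ret = check_get_page_from_gfn(d, gfn, false, &p2mt, &page);
    if ( ret )
        return ret;

    *mfn = page_to_mfn(page);

    /* Rings must be writable RAM, so also take a writable type ref. */
    if ( (p2mt != p2m_ram_rw) ||
         !get_page_and_type(page, d, PGT_writable_page) )
        ret = -EINVAL;

    /* Drop the helper's reference; refs from get_page_and_type remain. */
    put_page(page);

    return ret;
}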
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +               uint32_t npage,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +               uint32_t len)
>
> Noticing it here, but perhaps still an issue elsewhere as well: Didn't
> we agree on removing unnecessary use of fixed width types? Or
> was that in the context of an earlier patch of v3?

These are fixed and hopefully all the others that do not belong are
also gone in v4.

> > +{
> > +    unsigned int i;
> > +    int ret = 0;
> > +    mfn_t *mfns;
> > +    uint8_t **mfn_mapping;
> > +
> > +    /*
> > +     * first bounds check on npage here also serves as an overflow check
> > +     * before left shifting it
> > +     */
> > +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
>
> Isn't this redundant with the check in do_argo_op()?
>
> > +         ((npage << PAGE_SHIFT) < len) )
> > +        return -EINVAL;

Answering your point inline above: Yes - do_argo_op does the bounds
checking, so I've removed the entire check above.

> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /* Ring already existed: drop the previous mapping. */
> > +        gprintk(XENLOG_INFO,
> > +                "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
>
> Indentation (also elsewhere).

Ack, fixed here and elsewhere.

> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id);
> > +
> > +        ring_remove_mfns(d, ring_info);
> > +        ASSERT(!ring_info->mfns);
> > +    }
> > +
> > +    mfns = xmalloc_array(mfn_t, npage);
> > +    if ( !mfns )
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < npage; i++ )
> > +        mfns[i] = INVALID_MFN;
> > +
> > +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> > +    if ( !mfn_mapping )
> > +    {
> > +        xfree(mfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ring_info->npage = npage;
> > +    ring_info->mfns = mfns;
> > +    ring_info->mfn_mapping = mfn_mapping;
>
> As the inverse to the cleanup sequence in an earlier patch: Please
> set ->npage last here even if it doesn't strictly matter.

npage is now gone after implementing Roger's feedback to only keep
"len".

> > +    ASSERT(ring_info->npage == npage);
>
> What is this trying to make sure, seeing the assignment just a
> few lines up?

removed

> > +    if ( ring_info->nmfns == ring_info->npage )
> > +        return 0;
>
> Can this happen with the ring_remove_mfns() call above?

No, not any more, you're right.

> > +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> And hence can i start from other than zero here? And why not
> use the (possibly cheaper to access) function argument "npage"
> as the loop upper bound? The other similar loop a few lines up
> is coded that simpler way.

Yes, thanks, done.

> > +static long
> > +register_ring(struct domain *currd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +              uint32_t npage, bool fail_exist)
> > +{
> > +    xen_argo_register_ring_t reg;
> > +    struct argo_ring_id ring_id;
> > +    void *map_ringp;
> > +    xen_argo_ring_t *ringp;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info = NULL;
> > +    struct domain *dst_d = NULL;
> > +    int ret = 0;
> > +    uint32_t private_tx_ptr;
> > +
> > +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> > +    {
> > +        ret = -EFAULT;
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * A ring must be large enough to transmit messages, so requires space for:
> > +     * * 1 message header, plus
> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > +     *   for the message payload to be written into, plus
> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> > +     *   ensures that there can be space remaining when a message is present.
> > +     * The above determines the minimum acceptable ring size.
> > +     */
> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
>
> These two summands don't look to fulfill the "cannot be filled to
> capacity" constraint the comment describes, as (aiui) messages
> can be larger than 16 bytes. What's the deal?

This is intended to be about bounds checking reg.len against a minimum
size: the smallest ring that you can fit a message onto, as determined
by the logic in ringbuf_insert.

The smallest message you can send is:
    sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)

and then ringbuf_insert won't accept a message unless there is (at
least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
message size, is the smallest viable ring. There's no point accepting
registration of a ring smaller than that.

You're right that messages can be larger than 16 bytes, but they can
only be sent to rings that are larger than that minimum - on a minimum
sized ring, they'll be rejected by sendv.
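To put concrete numbers on that minimum (using the struct layout from
this patch, and assuming xen_argo_addr_t stays at 8 bytes):

/*
 * sizeof(struct xen_argo_ring_message_header):
 *     uint32_t len              4 bytes
 *     xen_argo_addr_t source    8 bytes (uint32_t port, domid_t, pad)
 *     uint32_t message_type     4 bytes
 *                            = 16 bytes
 *
 * So the smallest reg.len that register_ring accepts is:
 *        16                  message header
 *      + ROUNDUP_MESSAGE(1)  = 16, the smallest payload slot
 *      + ROUNDUP_MESSAGE(1)  = 16, the slack ringbuf_insert requires
 *      = 48 bytes
 */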
> > +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > +         (reg.pad != 0) )
> > +    {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    ring_id.partner_id = reg.partner_id;
> > +    ring_id.port = reg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
>
> From here to ...
>
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > +    {
> > +        if ( opt_argo_mac_enforcing )
> > +        {
> > +            ret = -EPERM;
> > +            goto out_unlock;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        dst_d = get_domain_by_id(reg.partner_id);
> > +        if ( !dst_d )
> > +        {
> > +            argo_dprintk("!dst_d, ESRCH\n");
> > +            ret = -ESRCH;
> > +            goto out_unlock;
> > +        }
> > +
> > +        if ( !dst_d->argo )
> > +        {
> > +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +            ret = -ECONNREFUSED;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +
> > +        send_info = xzalloc(struct argo_send_info);
> > +        if ( !send_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +        send_info->id = ring_id;
> > +    }
>
> ... here, what exactly is it that requires the global read lock
> to be held ...
>
> > +    write_lock(&currd->argo->lock);
>
> ... prior to this? Holding locks around allocations is not
> forbidden, but should be avoided whenever possible.
>
> And then further why does the global read lock need
> continued holding until the end of the function?

I've added a FIXME to review this tomorrow. I understand the
argo-internal locking protocols and this is adhering to what they
state, in that accesses to the argo structs of currd and dst_d are
protected by the global read lock here, but at the moment I'm less
clear on what the expectations are for standard Xen domain locks,
references and lifecycle.
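On the allocation point specifically: the restructure I have in mind is
to allocate before taking any locks, and treat insertion into the send
hash as transferring ownership, so a single xfree covers all the exit
paths. An untested sketch:

    struct argo_send_info *send_info = NULL;

    /* Allocate up front, before argo_lock is taken. */
    if ( reg.partner_id != XEN_ARGO_DOMID_ANY )
    {
        send_info = xzalloc(struct argo_send_info);
        if ( !send_info )
            return -ENOMEM;
    }

    read_lock(&argo_lock);

    /* ... validation and ring setup as before ... */

    if ( send_info )
    {
        send_info->id = ring_id;
        spin_lock(&dst_d->argo->send_lock);
        hlist_add_head(&send_info->node,
                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
        spin_unlock(&dst_d->argo->send_lock);
        send_info = NULL;    /* ownership passed to the hash table */
    }

 out_unlock:
    read_unlock(&argo_lock);

    /* Only frees when not consumed above; xfree(NULL) is a no-op. */
    xfree(send_info);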
> > +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > +    {
> > +        ret = -ENOSPC;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( !ring_info )
> > +    {
> > +        ring_info = xzalloc(struct argo_ring_info);
> > +        if ( !ring_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            goto out_unlock2;
> > +        }
> > +
> > +        spin_lock_init(&ring_info->lock);
> > +
> > +        ring_info->id = ring_id;
> > +        INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > +        hlist_add_head(&ring_info->node,
> > +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +    }
> > +    else
> > +    {
> > +        if ( ring_info->len )
> > +        {
>
> Please fold into "else if ( )", removing a level of indentation.

ack, done

> > +            /*
> > +             * If the caller specified that the ring must not already exist,
> > +             * fail at attempt to add a completed ring which already exists.
> > +             */
> > +            if ( fail_exist )
> > +            {
> > +                argo_dprintk("disallowed reregistration of existing ring\n");
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            if ( ring_info->len != reg.len )
> > +            {
> > +                /*
> > +                 * Change of ring size could result in entries on the pending
> > +                 * notifications list that will never trigger.
> > +                 * Simple blunt solution: disallow ring resize for now.
> > +                 * TODO: investigate enabling ring resize.
> > +                 */
> > +                gprintk(XENLOG_ERR,
> > +                        "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                        ring_id.partner_id);
> > +                /*
> > +                 * Could return EINVAL here, but if the ring didn't already
> > +                 * exist then the arguments would have been valid, so: EEXIST.
> > +                 */
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            gprintk(XENLOG_DEBUG,
> > +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                    ring_id.partner_id);
> > +        }
> > +    }
> > +
> > +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +
> > +    /*
> > +     * The first page of the memory supplied for the ring has the xen_argo_ring
> > +     * structure at its head, which is where the ring indexes reside.
> > +     */
> > +    ret = ring_map_page(ring_info, 0, &map_ringp);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +    ringp = map_ringp;
> > +
> > +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > +    if ( (private_tx_ptr >= reg.len) ||
> > +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > +    {
> > +        /*
> > +         * Since the ring is a mess, attempt to flush the contents of it
> > +         * here by setting the tx_ptr to the next aligned message slot past
> > +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > +         */
> > +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > +        if ( private_tx_ptr >= reg.len )
> > +            private_tx_ptr = 0;
> > +
> > +        update_tx_ptr(ring_info, private_tx_ptr);
> > +    }
> > +
> > +    ring_info->tx_ptr = private_tx_ptr;
> > +    ring_info->len = reg.len;
> > +    currd->argo->ring_count++;
> > +
> > +    if ( send_info )
> > +    {
> > +        spin_lock(&dst_d->argo->send_lock);
> > +
> > +        hlist_add_head(&send_info->node,
> > +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > +        spin_unlock(&dst_d->argo->send_lock);
> > +    }
> > +
> > + out_unlock2:
> > +    if ( !ret && send_info )
> > +        xfree(send_info);
> > +
> > +    if ( dst_d )
> > +        put_domain(dst_d);
> > +
> > +    write_unlock(&currd->argo->lock);
>
> Surely you can drop the lock before the other two cleanup
> actions? That would then allow you to add another label to
> absorb the two separate put_domain() calls on error paths.

That looks correct. Added a FIXME note now and will fix tomorrow.
thanks.
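For reference, the exit path layout I'd expect after that rework -- a
sketch only, with the early error paths switched from inline
put_domain() calls to 'goto out', and using the ownership-transfer
convention for send_info described above:

 out_unlock2:
    write_unlock(&currd->argo->lock);

 out_unlock:
    read_unlock(&argo_lock);

 out:
    /* Absorbs the put_domain() calls currently on the error paths. */
    if ( dst_d )
        put_domain(dst_d);

    /* send_info is NULLed at the point it enters the send hash. */
    xfree(send_info);

    return ret;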
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
>
> This is unused throughout the patch afaics.

Removed.

> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> >  #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16MB
> > + *   -- which is 0x1000000 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> > +
> > +/*
> > + * Page descriptor: encoding both page address and size in a 64-bit value.
> > + * Intended to allow ABI to support use of different granularity pages.
> > + * example of how to populate:
> > + * xen_argo_page_descr_t pg_desc =
> > + *     (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> > + */
> > +typedef uint64_t xen_argo_page_descr_t;
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0
>
> Are the _DESCR_ infixes here really useful?

These are now gone, with Julien's approval for the change back to use
the gfn-using interfaces.

> > @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
> >  #endif
> >  } xen_argo_ring_t;
> >
> > +typedef struct xen_argo_register_ring
> > +{
> > +    uint32_t port;
> > +    domid_t partner_id;
> > +    uint16_t pad;
> > +    uint32_t len;
> > +} xen_argo_register_ring_t;
> > +
> > +/* Messages on the ring are padded to a multiple of this size. */
> > +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> > +
> > +struct xen_argo_ring_message_header
> > +{
> > +    uint32_t len;
> > +    xen_argo_addr_t source;
> > +    uint32_t message_type;
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > +    uint8_t data[];
> > +#elif defined(__GNUC__)
> > +    uint8_t data[0];
> > +#endif
> > +};
> > +
> > +/*
> > + * Hypercall operations
> > + */
> > +
> > +/*
> > + * XEN_ARGO_OP_register_ring
> > + *
> > + * Register a ring using the indicated memory.
> > + * Also used to reregister an existing ring (eg. after resume from hibernate).
> > + *
> > + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> > + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> > + * arg3: unsigned long npages
> > + * arg4: unsigned long flags
>
> The "unsigned long"-s here are not necessarily compatible with
> compat mode. At the very least flags above bit 31 won't be
> usable by compat mode guests. Hence I also question ...
>
> > + */
> > +#define XEN_ARGO_OP_register_ring 1
> > +
> > +/* Register op flags */
> > +/*
> > + * Fail exist:
> > + * If set, reject attempts to (re)register an existing established ring.
> > + * If clear, reregistration occurs if the ring exists, with the new ring
> > + * taking the place of the old, preserving tx_ptr if it remains valid.
> > + */
> > +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST 0x1
> > +
> > +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
> > +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
>
> ... the UL suffix here. Also this last item should not be exposed
> (perhaps framed by "#ifdef __XEN__") and would perhaps anyway
> better be defined in terms of the other
> XEN_ARGO_REGISTER_FLAG_*.

Notes added in place for each of the above; will work on these
tomorrow.

thanks

Christopher