Xen project Mailing List

On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@xxxxxxxxx> wrote: > > The register op is used by a domain to register a region of memory for > receiving messages from either a specified other domain, or, if specifying a > wildcard, any domain. > > This operation creates a mapping within Xen's private address space that > will remain resident for the lifetime of the ring. In subsequent commits, > the hypervisor will use this mapping to copy data from a sending domain into > this registered ring, making it accessible to the domain that registered the > ring to receive data. > > Wildcard any-sender rings are default disabled and registration will be > refused with EPERM unless they have been specifically enabled with the > argo-mac boot option introduced here. The reason why the default for > wildcard rings is 'deny' is that there is currently no means to protect the > ring from DoS by a noisy domain spamming the ring, affecting other domains > ability to send to it. This will be addressed with XSM policy controls in > subsequent work. > > Since denying access to any-sender rings is a significant functional > constraint, a new bootparam is provided to enable overriding this: > "argo-mac" variable has allowed values: 'permissive' and 'enforcing'. > Even though this is a boolean variable, use these descriptive strings in > order to make it obvious to an administrator that this has potential > security impact. > > The p2m type of the memory supplied by the guest for the ring must be > p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring > is registered. > > xen_argo_page_descr_t type is introduced as a page descriptor, to convey > both the physical address of the start of the page and its granularity. The > smallest granularity page is assumed to be 4096 bytes and the lower twelve > bits of the type are used to indicate the size of page of memory supplied. > The implementation of the hypercall op currently only supports 4K pages. > > array_index_nospec is used to guard the result of the ring id hash function. > This is out of an abundance of caution, since this is a very basic hash > function and it operates upon values supplied by the guest just before > being used as an array index. > > Signed-off-by: Christopher Clark <christopher.clark6@xxxxxxxxxxxxxx> > --- > v2 self: disallow ring resize via reregister > v2 feedback Jan: drop cookie, implement teardown > v2 feedback Jan: drop message from argo_message_op > v2 self: move hash_index function below locking comment > v2 self: OVERHAUL > v2 self/Jan: remove use of magic verification field and tidy up > v2 self: merge max and min ring size check clauses > v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header > v2 feedback #9, Jan: use the argo-mac bootparam at point of introduction > v2 feedback #9, Jan: rename boot opt variable to comply with convention > v2 feedback #9, Jan: rename the argo_mac bootparam to argo-mac > v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown > v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions > v1 feedback Roger, Jan: drop argo prefix on static functions > v1 feedback Roger: s/pfn/gfn/ and retire always-64-bit type > v2. feedback Jan: document the argo-mac boot opt > v2. feedback Jan: simplify re-register, drop mappings > v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops > > v1 #13 feedback, Jan: register op : s/ECONNREFUSED/ESRCH/ > v1 #5 (#13) feedback Paul: register op: use currd in do_message_op > v1 #13 feedback, Paul: register op: use mfn_eq comparator > v1 #5 (#13) feedback Paul: register op: use currd in argo_register_ring > v1 #13 feedback Paul: register op: whitespace, unsigned, bounds check > v1 #13 feedback Paul: use of hex in limit constant definition > v1 #13 feedback Paul, register op: set nmfns on loop termination > v1 #13 feedback Paul: register op: do/while -> gotos, reindent > v1 argo_ring_map_page: drop uint32_t for unsigned int > v1. #13 feedback Julien: use page descriptors instead of gpfns. > - adds ABI support for pages with different granularity. > v1 feedback #13, Paul: adjust log level of message > v1 feedback #13, Paul: use gprintk for guest-triggered warning > v1 feedback #13, Paul: gprintk and XENLOG_DEBUG for ring registration > v1 feedback #13, Paul: use gprintk for errs in argo_ring_map_page > v1 feedback #13, Paul: use ENOMEM if global mapping fails > v1 feedback Paul: overflow check before shift > v1: add define for copy_field_to_guest_errno > v1: fix gprintk use for ARM as its defn dislikes split format strings > v1: use copy_field_to_guest_errno > v1 feedback #13, Jan: argo_hash_fn: no inline, rename, change type > v1 feedback #13, Paul, Jan: EFAULT -> ENOMEM in argo_ring_map_page > v1 feedback #13, Jan: rename page var in argo_ring_map_page > v1 feedback #13, Jan: switch uint8_t* to void* and drop cast > v1 feedback #13, Jan: switch memory barrier to smp_wmb > v1 feedback #13, Jan: make 'ring' comment comply with single-line style > v1 feedback #13, Jan: use xzalloc_array, drop loop NULL init > v1 feedback #13, Jan: init bool with false rather than 0 > v1 feedback #13 Jan: use __copy; define and use __copy_field_to_guest_errno > v1 feedback #13, Jan: use xzalloc, drop individual init zeroes > v1 feedback #13, Jan: prefix public namespace with xen > v1 feedback #13, Jan: blank line after op case in do_argo_message_op > v1 self: reflow comment in argo_ring_map_page to within 80 char len > v1 feedback #13, Roger: use true not 1 in assign to update_tx_ptr bool > v1 feedback #21, Jan: fold in the array_index_nospec hash function guards > v1 feedback #18, Jan: fold the max ring count limit into the series > v1 self: use unsigned long type for XEN_ARGO_REGISTER_FLAG_MASK > v1: feedback #15 Jan: handle upper-halves of hypercall args > v1. feedback #13 Jan: add comment re: page alignment > v1. self: confirm ring magic presence in supplied page array > v1. feedback #13 Jan: add comment re: minimum ring size > v1. feedback #13 Roger: use ASSERT_UNREACHABLE > v1. feedback Roger: add comment to hash function > > docs/misc/xen-command-line.pandoc | 15 + > xen/common/argo.c | 566 > +++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/guest_access.h | 2 + > xen/include/asm-x86/guest_access.h | 2 + > xen/include/public/argo.h | 72 +++++ > xen/include/xlat.lst | 1 + > 6 files changed, 658 insertions(+) > > diff --git a/docs/misc/xen-command-line.pandoc > b/docs/misc/xen-command-line.pandoc > index aea13eb..68d4415 100644 > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which > supports registration > of memory rings with the hypervisor to receive messages, sending messages to > other domains by hypercall and querying the ring status of other domains. > > +### argo-mac > +> `= permissive | enforcing` Why not call this argo-mac-permissive and make it a boolean? Default would be 'false' and that would imply enforcing. This would get rid of parse_opt_argo_mac since you could use the default boolean parser. > + > +> Default: `enforcing` > + > +Constrain the access control applied to the Argo communication mechanism. > + > +When `enforcing`, domains may not register rings that have wildcard specified > +for the sender which would allow messages to be sent to the ring by any > domain. > +This is to protect rings and the services that utilize them against DoS by a > +malicious or buggy domain spamming the ring. > + > +When the boot option is set to `permissive`, this constraint is relaxed and > +wildcard any-sender rings are allowed to be registered. > + > ### asid (x86) > > `= <boolean>` > > diff --git a/xen/common/argo.c b/xen/common/argo.c > index 86195d3..11988e7 100644 > --- a/xen/common/argo.c > +++ b/xen/common/argo.c > @@ -23,16 +23,41 @@ > #include <xen/event.h> > #include <xen/domain_page.h> > #include <xen/guest_access.h> > +#include <xen/lib.h> > +#include <xen/nospec.h> > #include <xen/time.h> > #include <public/argo.h> > > +#define MAX_RINGS_PER_DOMAIN 128U > + > +/* All messages on the ring are padded to a multiple of the slot size. */ > +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE)) > + > DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t); > +DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t); > +DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t); > DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t); > > /* Xen command line option to enable argo */ > static bool __read_mostly opt_argo_enabled; > boolean_param("argo", opt_argo_enabled); > > +/* Xen command line option for conservative or relaxed access control */ > +bool __read_mostly opt_argo_mac_enforcing = true; > + > +static int __init parse_opt_argo_mac(const char *s) > +{ > + if ( !strcmp(s, "enforcing") ) > + opt_argo_mac_enforcing = true; > + else if ( !strcmp(s, "permissive") ) > + opt_argo_mac_enforcing = false; > + else > + return -EINVAL; > + > + return 0; > +} > +custom_param("argo-mac", parse_opt_argo_mac); > + > typedef struct argo_ring_id > { > uint32_t port; > @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */ > #define argo_dprintk(format, ... ) ((void)0) > #endif > > +/* > + * This hash function is used to distribute rings within the per-domain > + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table > + * will provide a struct if a match is found with a 'argo_ring_id' key: > + * ie. the key is a (domain id, port, partner domain id) tuple. > + * Since port number varies the most in expected use, and the Linux driver > + * allocates at both the high and low ends, incorporate high and low bits to > + * help with distribution. > + * Apply array_index_nospec as a defensive measure since this operates > + * on user-supplied input and the array size that it indexes into is known. > + */ > +static unsigned int > +hash_index(const struct argo_ring_id *id) > +{ > + unsigned int hash; > + > + hash = (uint16_t)(id->port >> 16); > + hash ^= (uint16_t)id->port; > + hash ^= id->domain_id; > + hash ^= id->partner_id; > + hash &= (ARGO_HTABLE_SIZE - 1); > + > + return array_index_nospec(hash, ARGO_HTABLE_SIZE); > +} > + > static void > ring_unmap(struct argo_ring_info *ring_info) > { > @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info) > } > } > > +static int > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void > **out_ptr) > +{ > + if ( i >= ring_info->nmfns ) > + { > + gprintk(XENLOG_ERR, > + "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of > %u\n", > + ring_info->id.domain_id, ring_info->id.port, > + ring_info->id.partner_id, ring_info, i, ring_info->nmfns); > + return -ENOMEM; > + } > + > + if ( !ring_info->mfns || !ring_info->mfn_mapping) > + { > + ASSERT_UNREACHABLE(); > + ring_info->len = 0; > + return -ENOMEM; > + } > + > + if ( !ring_info->mfn_mapping[i] ) > + { > + /* > + * TODO: > + * The first page of the ring contains the ring indices, so both read > + * and write access to the page is required by the hypervisor, but > + * read-access is not needed for this mapping for the remainder of > the > + * ring. > + * Since this mapping will remain resident in Xen's address space for > + * the lifetime of the ring, and following the principle of least > + * privilege, it could be preferable to: > + * # add a XSM check to determine what policy is wanted here > + * # depending on the XSM query, optionally create this mapping as > + * _write-only_ on platforms that can support it. > + * (eg. Intel EPT/AMD NPT). Why do Intel EPT or AMD NPT matter here? You are mapping the page to Xen address space, which doesn't use either EPT or NPT. Writable or read-only mappings would be created by setting the right bit in the Xen page tables. > + */ > + ring_info->mfn_mapping[i] = > map_domain_page_global(ring_info->mfns[i]); > + No need for the newline. > + if ( !ring_info->mfn_mapping[i] ) > + { > + gprintk(XENLOG_ERR, > + "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of > %u\n", > + ring_info->id.domain_id, ring_info->id.port, > + ring_info->id.partner_id, ring_info, i, > ring_info->nmfns); > + return -ENOMEM; > + } > + argo_dprintk("mapping page %"PRI_mfn" to %p\n", > + mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]); > + } > + > + if ( out_ptr ) > + *out_ptr = ring_info->mfn_mapping[i]; > + > + return 0; > +} > + > +static void > +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr) > +{ > + void *dst; > + uint32_t *p; > + > + ASSERT(ring_info->mfn_mapping[0]); > + > + ring_info->tx_ptr = tx_ptr; > + > + dst = ring_info->mfn_mapping[0]; > + p = dst + offsetof(xen_argo_ring_t, tx_ptr); Hm, wouldn't it be easier to cast page 0 to the layout of the ring so that you don't need to use pointer arithmetic to get the fields? Ie: make dst be of type xen_argo_ring_t. > + > + write_atomic(p, tx_ptr); > + smp_wmb(); > +} > + > static void > wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent) > { > @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d) > } > } > > +static int > +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn) > +{ > + p2m_type_t p2mt; > + int ret = 0; > + > +#ifdef CONFIG_X86 > + *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt); > +#else > + *mfn = p2m_lookup(d, gfn, &p2mt); > +#endif > + > + if ( !mfn_valid(*mfn) ) > + ret = -EINVAL; > +#ifdef CONFIG_X86 > + else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) ) > + ret = -EAGAIN; > +#endif > + else if ( (p2mt != p2m_ram_rw) || > + !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) ) > + ret = -EINVAL; > + > +#ifdef CONFIG_X86 > + put_gfn(d, gfn_x(gfn)); > +#endif > + > + return ret; > +} > + > +static int > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info, > + uint32_t npage, > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd, > + uint32_t len) > +{ > + unsigned int i; > + int ret = 0; > + mfn_t *mfns; > + uint8_t **mfn_mapping; > + > + /* > + * first bounds check on npage here also serves as an overflow check > + * before left shifting it > + */ > + if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) || > + ((npage << PAGE_SHIFT) < len) ) > + return -EINVAL; > + > + if ( ring_info->mfns ) > + { > + /* Ring already existed: drop the previous mapping. */ > + gprintk(XENLOG_INFO, > + "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears > mapping\n", > + d->domain_id, ring_info->id.domain_id, > + ring_info->id.port, ring_info->id.partner_id); > + > + ring_remove_mfns(d, ring_info); > + ASSERT(!ring_info->mfns); > + } > + > + mfns = xmalloc_array(mfn_t, npage); > + if ( !mfns ) > + return -ENOMEM; > + > + for ( i = 0; i < npage; i++ ) > + mfns[i] = INVALID_MFN; > + > + mfn_mapping = xzalloc_array(uint8_t *, npage); > + if ( !mfn_mapping ) > + { > + xfree(mfns); > + return -ENOMEM; > + } > + > + ring_info->npage = npage; > + ring_info->mfns = mfns; > + ring_info->mfn_mapping = mfn_mapping; > + > + ASSERT(ring_info->npage == npage); > + > + if ( ring_info->nmfns == ring_info->npage ) > + return 0; > + > + for ( i = ring_info->nmfns; i < ring_info->npage; i++ ) This loop seems to assume that there can be pages already added to the ring, but IIRC you said that redimensioning of the ring was removed in this version? I think for an initial version it would be easier to don't allow redimensioning of active rings, and just allow teardown and re-initialization as the way to redimension a ring. > + { > + xen_argo_page_descr_t pg_descr; > + gfn_t gfn; > + mfn_t mfn; > + > + ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ? > + -EFAULT : 0; > + if ( ret ) > + break; > + > + /* Implementation currently only supports handling 4K pages */ > + if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) != > + XEN_ARGO_PAGE_DESCR_SIZE_4K ) > + { > + ret = -EINVAL; > + break; > + } > + gfn = _gfn(pg_descr >> PAGE_SHIFT); > + > + ret = find_ring_mfn(d, gfn, &mfn); > + if ( ret ) > + { > + gprintk(XENLOG_ERR, > + "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p > %d/%d\n", > + d->domain_id, gfn_x(gfn), ring_info->id.domain_id, > + ring_info->id.port, ring_info->id.partner_id, > + ring_info, i, ring_info->npage); > + break; > + } > + > + ring_info->mfns[i] = mfn; > + > + argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n", > + i, gfn_x(gfn), mfn_x(ring_info->mfns[i])); > + } > + > + ring_info->nmfns = i; > + > + if ( ret ) > + ring_remove_mfns(d, ring_info); > + else > + { > + ASSERT(ring_info->nmfns == ring_info->npage); > + > + gprintk(XENLOG_DEBUG, > + "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns > %d\n", > + d->domain_id, ring_info->id.domain_id, > + ring_info->id.port, ring_info->id.partner_id, ring_info, > + ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns); > + } > + > + return ret; > +} > + > +static struct argo_ring_info * > +ring_find_info(const struct domain *d, const struct argo_ring_id *id) > +{ > + unsigned int ring_hash_index; > + struct hlist_node *node; > + struct argo_ring_info *ring_info; > + > + ASSERT(rw_is_locked(&d->argo->lock)); > + > + ring_hash_index = hash_index(id); > + > + argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n", > + d->argo, ring_hash_index, > + d->argo->ring_hash[ring_hash_index].first, id); > + argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n", > + id->port, id->domain_id, id->partner_id); > + > + hlist_for_each_entry(ring_info, node, > &d->argo->ring_hash[ring_hash_index], > + node) > + { > + struct argo_ring_id *cmpid = &ring_info->id; const? > + > + if ( cmpid->port == id->port && > + cmpid->domain_id == id->domain_id && > + cmpid->partner_id == id->partner_id ) > + { > + argo_dprintk("ring_info=%p\n", ring_info); > + return ring_info; > + } > + } > + argo_dprintk("no ring_info found\n"); > + > + return NULL; > +} > + > +static long > +register_ring(struct domain *currd, If this is indeed the current domain (as the name suggests), why do you need to pass it around? Or else just name the parameter d. > + XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd, > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd, > + uint32_t npage, bool fail_exist) > +{ > + xen_argo_register_ring_t reg; > + struct argo_ring_id ring_id; > + void *map_ringp; > + xen_argo_ring_t *ringp; > + struct argo_ring_info *ring_info; > + struct argo_send_info *send_info = NULL; > + struct domain *dst_d = NULL; > + int ret = 0; > + uint32_t private_tx_ptr; > + > + if ( copy_from_guest(&reg, reg_hnd, 1) ) > + { > + ret = -EFAULT; > + goto out; I don't see the point of using an out label, why not just use 'return -EFAULT;' (here and below). This avoids the braces and also removes the need for the ret assignment. > + } > + > + /* > + * A ring must be large enough to transmit messages, so requires space > for: > + * * 1 message header, plus > + * * 1 payload slot (payload is always rounded to a multiple of 16 bytes) > + * for the message payload to be written into, plus > + * * 1 more slot, so that the ring cannot be filled to capacity with a > + * single message -- see the logic in ringbuf_insert -- allowing for > this > + * ensures that there can be space remaining when a message is present. > + * The above determines the minimum acceptable ring size. > + */ > + if ( (reg.len < (sizeof(struct xen_argo_ring_message_header) > + + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) || > + (reg.len > XEN_ARGO_MAX_RING_SIZE) || > + (reg.len != ROUNDUP_MESSAGE(reg.len)) || > + (reg.pad != 0) ) > + { > + ret = -EINVAL; > + goto out; > + } > + > + ring_id.partner_id = reg.partner_id; > + ring_id.port = reg.port; > + ring_id.domain_id = currd->domain_id; > + > + read_lock(&argo_lock); > + > + if ( !currd->argo ) > + { > + ret = -ENODEV; > + goto out_unlock; > + } > + > + if ( reg.partner_id == XEN_ARGO_DOMID_ANY ) > + { > + if ( opt_argo_mac_enforcing ) > + { > + ret = -EPERM; > + goto out_unlock; > + } > + } > + else > + { > + dst_d = get_domain_by_id(reg.partner_id); > + if ( !dst_d ) > + { > + argo_dprintk("!dst_d, ESRCH\n"); > + ret = -ESRCH; > + goto out_unlock; > + } > + > + if ( !dst_d->argo ) > + { > + argo_dprintk("!dst_d->argo, ECONNREFUSED\n"); > + ret = -ECONNREFUSED; > + put_domain(dst_d); > + goto out_unlock; > + } > + > + send_info = xzalloc(struct argo_send_info); > + if ( !send_info ) > + { > + ret = -ENOMEM; > + put_domain(dst_d); > + goto out_unlock; > + } > + send_info->id = ring_id; > + } > + > + write_lock(&currd->argo->lock); > + > + if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN ) > + { > + ret = -ENOSPC; > + goto out_unlock2; > + } > + > + ring_info = ring_find_info(currd, &ring_id); > + if ( !ring_info ) > + { > + ring_info = xzalloc(struct argo_ring_info); > + if ( !ring_info ) > + { > + ret = -ENOMEM; > + goto out_unlock2; > + } > + > + spin_lock_init(&ring_info->lock); > + > + ring_info->id = ring_id; > + INIT_HLIST_HEAD(&ring_info->pending); > + > + hlist_add_head(&ring_info->node, > + &currd->argo->ring_hash[hash_index(&ring_info->id)]); > + > + gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n", > + currd->domain_id, ring_id.domain_id, ring_id.port, > + ring_id.partner_id); > + } > + else > + { > + if ( ring_info->len ) > + { > + /* > + * If the caller specified that the ring must not already exist, > + * fail at attempt to add a completed ring which already exists. > + */ > + if ( fail_exist ) > + { > + argo_dprintk("disallowed reregistration of existing ring\n"); > + ret = -EEXIST; > + goto out_unlock2; > + } > + > + if ( ring_info->len != reg.len ) > + { > + /* > + * Change of ring size could result in entries on the pending > + * notifications list that will never trigger. > + * Simple blunt solution: disallow ring resize for now. > + * TODO: investigate enabling ring resize. > + */ I think ring resizing was removed on this version? > + gprintk(XENLOG_ERR, > + "argo: vm%u attempted to change ring size(vm%u:%x > vm%d)\n", > + currd->domain_id, ring_id.domain_id, ring_id.port, > + ring_id.partner_id); > + /* > + * Could return EINVAL here, but if the ring didn't already > + * exist then the arguments would have been valid, so: > EEXIST. > + */ > + ret = -EEXIST; > + goto out_unlock2; > + } > + > + gprintk(XENLOG_DEBUG, > + "argo: vm%u re-registering existing ring (vm%u:%x > vm%d)\n", > + currd->domain_id, ring_id.domain_id, ring_id.port, > + ring_id.partner_id); > + } > + } > + > + ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len); > + if ( ret ) > + { > + gprintk(XENLOG_ERR, > + "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n", > + currd->domain_id, ring_id.domain_id, ring_id.port, > + ring_id.partner_id); > + > + ring_remove_info(currd, ring_info); > + goto out_unlock2; > + } > + > + /* > + * The first page of the memory supplied for the ring has the > xen_argo_ring > + * structure at its head, which is where the ring indexes reside. > + */ > + ret = ring_map_page(ring_info, 0, &map_ringp); > + if ( ret ) > + { > + gprintk(XENLOG_ERR, > + "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n", > + currd->domain_id, ring_id.domain_id, ring_id.port, > + ring_id.partner_id); > + > + ring_remove_info(currd, ring_info); > + goto out_unlock2; > + } > + ringp = map_ringp; > + > + private_tx_ptr = read_atomic(&ringp->tx_ptr); > + > + if ( (private_tx_ptr >= reg.len) || > + (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) ) > + { > + /* > + * Since the ring is a mess, attempt to flush the contents of it > + * here by setting the tx_ptr to the next aligned message slot past > + * the latest rx_ptr we have observed. Handle ring wrap correctly. > + */ > + private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr)); > + > + if ( private_tx_ptr >= reg.len ) > + private_tx_ptr = 0; > + > + update_tx_ptr(ring_info, private_tx_ptr); > + } > + > + ring_info->tx_ptr = private_tx_ptr; > + ring_info->len = reg.len; > + currd->argo->ring_count++; > + > + if ( send_info ) > + { > + spin_lock(&dst_d->argo->send_lock); > + > + hlist_add_head(&send_info->node, > + &dst_d->argo->send_hash[hash_index(&send_info->id)]); > + > + spin_unlock(&dst_d->argo->send_lock); > + } > + > + out_unlock2: > + if ( !ret && send_info ) > + xfree(send_info); There's no need to check if send_info is set, xfree(NULL) is safe. > + > + if ( dst_d ) > + put_domain(dst_d); > + > + write_unlock(&currd->argo->lock); > + > + out_unlock: > + read_unlock(&argo_lock); > + > + out: > + return ret; > +} > + > long > do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1, > XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3, > @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, > XEN_GUEST_HANDLE_PARAM(void) arg1, > > switch (cmd) > { > + case XEN_ARGO_OP_register_ring: > + { > + XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd = > + guest_handle_cast(arg1, xen_argo_register_ring_t); > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd = > + guest_handle_cast(arg2, xen_argo_page_descr_t); > + /* arg3 is npage */ > + /* arg4 is flags */ > + bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST; > + > + if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) ) > + { > + rc = -EINVAL; > + break; > + } > + /* > + * Check access to the whole array here so we can use the faster > __copy > + * operations to read each element later. > + */ > + if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) ) > + break; > + /* arg4: reserve currently-undefined bits, require zero. */ > + if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) ) > + { > + rc = -EINVAL; > + break; > + } > + > + rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist); > + break; > + } > + > default: > rc = -EOPNOTSUPP; > break; > diff --git a/xen/include/asm-arm/guest_access.h > b/xen/include/asm-arm/guest_access.h > index 8997a1c..70e9a78 100644 > --- a/xen/include/asm-arm/guest_access.h > +++ b/xen/include/asm-arm/guest_access.h > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t > ipa, void *buf, > /* Is the guest handle a NULL reference? */ > #define guest_handle_is_null(hnd) ((hnd).p == NULL) > > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask))) > + > /* Offset the given guest handle into the array it refers to. */ > #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr)) > #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr)) > diff --git a/xen/include/asm-x86/guest_access.h > b/xen/include/asm-x86/guest_access.h > index ca700c9..8dde5d5 100644 > --- a/xen/include/asm-x86/guest_access.h > +++ b/xen/include/asm-x86/guest_access.h > @@ -41,6 +41,8 @@ > /* Is the guest handle a NULL reference? */ > #define guest_handle_is_null(hnd) ((hnd).p == NULL) > > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask))) > + > /* Offset the given guest handle into the array it refers to. */ > #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr)) > #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr)) > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h > index 4818684..8947230 100644 > --- a/xen/include/public/argo.h > +++ b/xen/include/public/argo.h > @@ -31,6 +31,26 @@ > > #include "xen.h" > > +#define XEN_ARGO_DOMID_ANY DOMID_INVALID > + > +/* > + * The maximum size of an Argo ring is defined to be: 16GB Is such a big size really required as the default maximum? The size of the internal structures required to support a 16GB ring would be quite big, has this been taken into account? Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.

Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op