[Xen-changelog] [xen staging] argo: implement the register op
commit 4048189d0125ed6033300a7951feb8a430504bf5
Author:     Christopher Clark <christopher.w.clark@xxxxxxxxx>
AuthorDate: Wed Feb 6 09:55:00 2019 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Feb 7 14:22:52 2019 +0100

    argo: implement the register op

    The register op is used by a domain to register a region of memory for
    receiving messages from either a specified other domain, or, if
    specifying a wildcard, any domain.

    This operation creates a mapping within Xen's private address space
    that will remain resident for the lifetime of the ring. In subsequent
    commits, the hypervisor will use this mapping to copy data from a
    sending domain into this registered ring, making it accessible to the
    domain that registered the ring to receive data.

    Wildcard any-sender rings are disabled by default and registration
    will be refused with EPERM unless they have been specifically enabled
    with the new mac-permissive flag that is added to the argo boot
    option here.

    The reason why the default for wildcard rings is 'deny' is that there
    is currently no means to protect the ring from DoS by a noisy domain
    spamming the ring, affecting other domains' ability to send to it.
    This will be addressed with XSM policy controls in subsequent work.

    Since denying access to any-sender rings is a significant functional
    constraint, the new option "mac-permissive" for the argo bootparam
    enables overriding this, e.g.: "argo=1,mac-permissive=1"

    The p2m type of the memory supplied by the guest for the ring must be
    p2m_ram_rw and the memory will be pinned as PGT_writable_page while
    the ring is registered.

    This hypercall op and its interface currently only support 4K-sized
    pages.

    Signed-off-by: Christopher Clark <christopher.clark6@xxxxxxxxxxxxxx>
    Tested-by: Chris Patterson <pattersonc@xxxxxxxxxxxx>
    Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
    Release-acked-by: Juergen Gross <jgross@xxxxxxxx>
---
 docs/misc/xen-command-line.pandoc |   9 +-
 xen/common/argo.c                 | 473 ++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h         |  69 ++++++
 xen/include/xlat.lst              |   2 +
 4 files changed, 552 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 605c544bd3..c8d1ced018 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -183,7 +183,7 @@ in combination with cpuidle. This option is only expected to be useful for
 developers wishing Xen to fall back to older timing methods on newer hardware.

 ### argo
-    = List of [ <bool> ]
+    = List of [ <bool>, mac-permissive=<bool> ]

 Controls for the Argo hypervisor-mediated interdomain communication service.
@@ -195,6 +195,13 @@ point of authority. Guests may register memory rings to recieve messages,
 query the status of other domains, and send messages by hypercall, all
 subject to appropriate auditing by Xen. Argo is disabled by default.

+* The `mac-permissive` boolean controls whether wildcard receive rings may be
+  registered (`mac-permissive=1`) or may not be registered
+  (`mac-permissive=0`).
+
+  This option is disabled by default, to protect domains from a DoS by a
+  buggy or malicious other domain spamming the ring.
+
 ### asid (x86)
 > `= <boolean>`
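For illustration only (not part of the patch): on a host booting Xen via
GRUB2, enabling Argo with wildcard rings permitted means appending the option
from the commit message to the Xen multiboot line. The menu entry, file paths
and kernel names below are hypothetical:

    menuentry 'Xen (Argo enabled, wildcard rings permitted)' {
        multiboot2 /boot/xen.gz argo=1,mac-permissive=1
        module2    /boot/vmlinuz-linux root=/dev/xvda1
        module2    /boot/initramfs-linux.img
    }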
diff --git a/xen/common/argo.c b/xen/common/argo.c
index f1626ca520..814dd0cf1c 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -22,6 +22,7 @@
 #include <xen/errno.h>
 #include <xen/event.h>
 #include <xen/guest_access.h>
+#include <xen/lib.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
 #include <xen/time.h>
@@ -31,13 +32,30 @@
 #ifdef CONFIG_COMPAT
 #include <compat/argo.h>
 CHECK_argo_addr;
+#undef CHECK_argo_addr
+#define CHECK_argo_addr struct xen_argo_addr
+CHECK_argo_register_ring;
 CHECK_argo_ring;
+CHECK_argo_ring_message_header;
 #endif

+#define MAX_RINGS_PER_DOMAIN 128U
+
+/* All messages on the ring are padded to a multiple of the slot size. */
+#define ROUNDUP_MESSAGE(a) ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE)
+
+/* Number of PAGEs needed to hold a ring of a given size in bytes */
+#define NPAGES_RING(ring_len) \
+    (ROUNDUP((ROUNDUP_MESSAGE(ring_len) + sizeof(xen_argo_ring_t)), PAGE_SIZE) \
+     >> PAGE_SHIFT)
+
 DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_gfn_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);

 static bool __read_mostly opt_argo;
+static bool __read_mostly opt_argo_mac_permissive;

 static int __init parse_argo(const char *s)
 {
@@ -51,6 +69,8 @@ static int __init parse_argo(const char *s)

         if ( (val = parse_bool(s, ss)) >= 0 )
             opt_argo = val;
+        else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
+            opt_argo_mac_permissive = val;
         else
             rc = -EINVAL;
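An editorial aside (not part of the patch): the NPAGES_RING() arithmetic
above can be sanity-checked in isolation. A minimal sketch, assuming 4K pages
and a 64-byte xen_argo_ring header; the exact header size is an assumption,
since the struct body is context not shown in this diff:

    /* Standalone check of the NPAGES_RING() calculation. */
    #include <assert.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define SLOT       16u                 /* XEN_ARGO_MSG_SLOT_SIZE */
    #define RUP(x, n)  (((x) + (n) - 1) / (n) * (n))

    int main(void)
    {
        unsigned int hdr = 64;             /* assumed sizeof(xen_argo_ring_t) */
        unsigned int len = 4096;           /* caller asks for a one-page ring */
        unsigned int npages = RUP(RUP(len, SLOT) + hdr, PAGE_SIZE) >> PAGE_SHIFT;

        /* The ring header spills the ring onto a second page, so a guest
         * registering a 4096-byte ring must supply two frames. */
        assert(npages == 2);
        return 0;
    }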
@@ -366,6 +386,74 @@ ring_unmap(const struct domain *d, struct argo_ring_info *ring_info)
     }
 }

+static int
+ring_map_page(const struct domain *d, struct argo_ring_info *ring_info,
+              unsigned int i, void **out_ptr)
+{
+    ASSERT(LOCKING_L3(d, ring_info));
+
+    /*
+     * FIXME: Investigate using vmap to create a single contiguous virtual
+     * address space mapping of the ring instead of using the array of single
+     * page mappings.
+     * Affects logic in memcpy_to_guest_ring, the mfn_mapping array data
+     * structure, and places where ring mappings are added or removed.
+     */
+
+    if ( i >= ring_info->nmfns )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: ring (vm%u:%x vm%u) %p attempted to map page %u of %u\n",
+                ring_info->id.domain_id, ring_info->id.aport,
+                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
+        return -ENOMEM;
+    }
+    i = array_index_nospec(i, ring_info->nmfns);
+
+    if ( !ring_info->mfns || !ring_info->mfn_mapping )
+    {
+        ASSERT_UNREACHABLE();
+        ring_info->len = 0;
+        return -ENOMEM;
+    }
+
+    if ( !ring_info->mfn_mapping[i] )
+    {
+        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
+        if ( !ring_info->mfn_mapping[i] )
+        {
+            gprintk(XENLOG_ERR, "argo: ring (vm%u:%x vm%u) %p attempted to map "
+                    "page %u of %u\n",
+                    ring_info->id.domain_id, ring_info->id.aport,
+                    ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
+            return -ENOMEM;
+        }
+        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
+                     mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
+    }
+
+    if ( out_ptr )
+        *out_ptr = ring_info->mfn_mapping[i];
+
+    return 0;
+}
+
+static void
+update_tx_ptr(const struct domain *d, struct argo_ring_info *ring_info,
+              uint32_t tx_ptr)
+{
+    xen_argo_ring_t *ringp;
+
+    ASSERT(LOCKING_L3(d, ring_info));
+    ASSERT(ring_info->mfn_mapping[0]);
+
+    ring_info->tx_ptr = tx_ptr;
+    ringp = ring_info->mfn_mapping[0];
+
+    write_atomic(&ringp->tx_ptr, tx_ptr);
+    smp_wmb();
+}
+
 static void
 wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
 {
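A note on update_tx_ptr() above (editorial, not part of the patch): the new
tx_ptr is stored into the guest-visible first ring page with write_atomic(),
and smp_wmb() is a store-store barrier keeping that store ordered before any
stores the hypervisor issues afterwards. A rough C11 analogue of the same
publish pattern, with hypothetical names:

    #include <stdatomic.h>

    struct ring_hdr { unsigned int rx_ptr; _Atomic unsigned int tx_ptr; };

    static void publish_tx_ptr(struct ring_hdr *shared, unsigned int tx)
    {
        /* Plain atomic store of the new producer index... */
        atomic_store_explicit(&shared->tx_ptr, tx, memory_order_relaxed);
        /* ...followed by a fence, so that later stores cannot be observed
         * ahead of the tx_ptr update (cf. smp_wmb()). */
        atomic_thread_fence(memory_order_release);
    }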
@@ -536,11 +624,369 @@ partner_rings_remove(struct domain *src_d)
     }
 }

+static int
+find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    int ret;
+
+    ret = check_get_page_from_gfn(d, gfn, false, &p2mt, &page);
+    if ( unlikely(ret) )
+        return ret;
+
+    *mfn = page_to_mfn(page);
+    if ( !mfn_valid(*mfn) )
+        ret = -EINVAL;
+#ifdef CONFIG_X86
+    else if ( p2mt == p2m_ram_logdirty )
+        ret = -EAGAIN;
+#endif
+    else if ( (p2mt != p2m_ram_rw) ||
+              !get_page_and_type(page, d, PGT_writable_page) )
+        ret = -EINVAL;
+
+    put_page(page);
+
+    return ret;
+}
+
+static int
+find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
+               const unsigned int npage,
+               XEN_GUEST_HANDLE_PARAM(xen_argo_gfn_t) gfn_hnd,
+               const unsigned int len)
+{
+    unsigned int i;
+    int ret = 0;
+    mfn_t *mfns;
+    void **mfn_mapping;
+
+    ASSERT(LOCKING_Write_rings_L2(d));
+
+    if ( ring_info->mfns )
+    {
+        /* Ring already existed: drop the previous mapping. */
+        gprintk(XENLOG_INFO, "argo: vm%u re-register existing ring "
+                "(vm%u:%x vm%u) clears mapping\n",
+                d->domain_id, ring_info->id.domain_id,
+                ring_info->id.aport, ring_info->id.partner_id);
+
+        ring_remove_mfns(d, ring_info);
+        ASSERT(!ring_info->mfns);
+    }
+
+    mfns = xmalloc_array(mfn_t, npage);
+    if ( !mfns )
+        return -ENOMEM;
+
+    for ( i = 0; i < npage; i++ )
+        mfns[i] = INVALID_MFN;
+
+    mfn_mapping = xzalloc_array(void *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree(mfns);
+        return -ENOMEM;
+    }
+
+    ring_info->mfns = mfns;
+    ring_info->mfn_mapping = mfn_mapping;
+
+    for ( i = 0; i < npage; i++ )
+    {
+        mfn_t mfn;
+        xen_argo_gfn_t argo_gfn;
+
+        ret = __copy_from_guest_offset(&argo_gfn, gfn_hnd, i, 1) ?
+                -EFAULT : 0;
+        if ( ret )
+            break;
+
+        ret = find_ring_mfn(d, _gfn(argo_gfn), &mfn);
+        if ( ret )
+        {
+            gprintk(XENLOG_ERR, "argo: vm%u: invalid gfn %"PRI_gfn" "
+                    "r:(vm%u:%x vm%u) %p %u/%u\n",
+                    d->domain_id, gfn_x(_gfn(argo_gfn)),
+                    ring_info->id.domain_id, ring_info->id.aport,
+                    ring_info->id.partner_id, ring_info, i, npage);
+            break;
+        }
+
+        ring_info->mfns[i] = mfn;
+
+        argo_dprintk("%u: %"PRI_gfn" -> %"PRI_mfn"\n",
+                     i, gfn_x(_gfn(argo_gfn)), mfn_x(ring_info->mfns[i]));
+    }
+
+    ring_info->nmfns = i;
+
+    if ( ret )
+        ring_remove_mfns(d, ring_info);
+    else
+    {
+        ASSERT(ring_info->nmfns == NPAGES_RING(len));
+
+        gprintk(XENLOG_DEBUG, "argo: vm%u ring (vm%u:%x vm%u) %p "
+                "mfn_mapping %p len %u nmfns %u\n",
+                d->domain_id, ring_info->id.domain_id,
+                ring_info->id.aport, ring_info->id.partner_id, ring_info,
+                ring_info->mfn_mapping, ring_info->len, ring_info->nmfns);
+    }
+
+    return ret;
+}
+
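Editorial aside, not part of the patch: register_ring() below rejects rings
smaller than one message header plus two message slots. With the structure
sizes from the public header added by this patch, that minimum can be checked
by hand:

    /* Standalone check of the minimum ring length enforced by register_ring(). */
    #include <assert.h>

    #define SLOT 16u  /* XEN_ARGO_MSG_SLOT_SIZE */

    int main(void)
    {
        /* xen_argo_ring_message_header: len(4) + xen_argo_addr(4+2+2) + type(4) */
        unsigned int msg_hdr = 16;
        unsigned int min_len = msg_hdr + SLOT + SLOT; /* header + payload + spare */

        assert(min_len == 48);        /* smallest reg.len the op accepts */
        assert(min_len % SLOT == 0);  /* reg.len must also be slot-aligned */
        return 0;
    }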
+static long
+register_ring(struct domain *currd,
+              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
+              XEN_GUEST_HANDLE_PARAM(xen_argo_gfn_t) gfn_hnd,
+              unsigned int npage, unsigned int flags)
+{
+    xen_argo_register_ring_t reg;
+    struct argo_ring_id ring_id;
+    void *map_ringp;
+    xen_argo_ring_t *ringp;
+    struct argo_ring_info *ring_info, *new_ring_info = NULL;
+    struct argo_send_info *send_info = NULL;
+    struct domain *dst_d = NULL;
+    int ret = 0;
+    unsigned int private_tx_ptr;
+
+    ASSERT(currd == current->domain);
+
+    /* flags: reserve currently-undefined bits, require zero. */
+    if ( unlikely(flags & ~XEN_ARGO_REGISTER_FLAG_MASK) )
+        return -EINVAL;
+
+    if ( copy_from_guest(&reg, reg_hnd, 1) )
+        return -EFAULT;
+
+    /*
+     * A ring must be large enough to transmit messages, so requires space
+     * for:
+     * * 1 message header, plus
+     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
+     *   for the message payload to be written into, plus
+     * * 1 more slot, so that the ring cannot be filled to capacity with a
+     *   single minimum-size message -- see the logic in ringbuf_insert --
+     *   allowing for this ensures that there can be space remaining when a
+     *   message is present.
+     * The above determines the minimum acceptable ring size.
+     */
+    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header) +
+                     ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
+         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
+         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
+         (NPAGES_RING(reg.len) != npage) ||
+         (reg.pad != 0) )
+        return -EINVAL;
+
+    ring_id.partner_id = reg.partner_id;
+    ring_id.aport = reg.aport;
+    ring_id.domain_id = currd->domain_id;
+
+    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
+    {
+        if ( !opt_argo_mac_permissive )
+            return -EPERM;
+    }
+    else
+    {
+        dst_d = get_domain_by_id(reg.partner_id);
+        if ( !dst_d )
+        {
+            argo_dprintk("!dst_d, ESRCH\n");
+            return -ESRCH;
+        }
+
+        send_info = xzalloc(struct argo_send_info);
+        if ( !send_info )
+        {
+            ret = -ENOMEM;
+            goto out;
+        }
+        send_info->id = ring_id;
+    }
+
+    /*
+     * Common case is that the ring doesn't already exist, so do the alloc
+     * here before picking up any locks.
+     */
+    new_ring_info = xzalloc(struct argo_ring_info);
+    if ( !new_ring_info )
+    {
+        ret = -ENOMEM;
+        goto out;
+    }
+
+    read_lock(&L1_global_argo_rwlock);
+
+    if ( !currd->argo )
+    {
+        ret = -ENODEV;
+        goto out_unlock;
+    }
+
+    if ( dst_d && !dst_d->argo )
+    {
+        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
+        ret = -ECONNREFUSED;
+        goto out_unlock;
+    }
+
+    write_lock(&currd->argo->rings_L2_rwlock);
+
+    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
+    {
+        ret = -ENOSPC;
+        goto out_unlock2;
+    }
+
+    ring_info = find_ring_info(currd, &ring_id);
+    if ( !ring_info )
+    {
+        ring_info = new_ring_info;
+        new_ring_info = NULL;
+
+        spin_lock_init(&ring_info->L3_lock);
+
+        ring_info->id = ring_id;
+        INIT_LIST_HEAD(&ring_info->pending);
+
+        list_add(&ring_info->node,
+                 &currd->argo->ring_hash[hash_index(&ring_info->id)]);
+
+        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%u)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.aport,
+                ring_id.partner_id);
+    }
+    else if ( ring_info->len )
+    {
+        /*
+         * If the caller specified that the ring must not already exist,
+         * fail at attempt to add a completed ring which already exists.
+         */
+        if ( flags & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST )
+        {
+            gprintk(XENLOG_ERR, "argo: vm%u disallowed reregistration of "
+                    "existing ring (vm%u:%x vm%u)\n",
+                    currd->domain_id, ring_id.domain_id, ring_id.aport,
+                    ring_id.partner_id);
+            ret = -EEXIST;
+            goto out_unlock2;
+        }
+
+        if ( ring_info->len != reg.len )
+        {
+            /*
+             * Change of ring size could result in entries on the pending
+             * notifications list that will never trigger.
+             * Simple blunt solution: disallow ring resize for now.
+             * TODO: investigate enabling ring resize.
+             */
+            gprintk(XENLOG_ERR, "argo: vm%u attempted to change ring size "
+                    "(vm%u:%x vm%u)\n",
+                    currd->domain_id, ring_id.domain_id, ring_id.aport,
+                    ring_id.partner_id);
+            /*
+             * Could return EINVAL here, but if the ring didn't already
+             * exist then the arguments would have been valid, so: EEXIST.
+             */
+            ret = -EEXIST;
+            goto out_unlock2;
+        }
+
+        gprintk(XENLOG_DEBUG,
+                "argo: vm%u re-registering existing ring (vm%u:%x vm%u)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.aport,
+                ring_id.partner_id);
+    }
+
+    ret = find_ring_mfns(currd, ring_info, npage, gfn_hnd, reg.len);
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: vm%u failed to find ring mfns (vm%u:%x vm%u)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.aport,
+                ring_id.partner_id);
+
+        ring_remove_info(currd, ring_info);
+        goto out_unlock2;
+    }
+
+    /*
+     * The first page of the memory supplied for the ring has the
+     * xen_argo_ring structure at its head, which is where the ring indexes
+     * reside.
+     */
+    ret = ring_map_page(currd, ring_info, 0, &map_ringp);
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%u)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.aport,
+                ring_id.partner_id);
+
+        ring_remove_info(currd, ring_info);
+        goto out_unlock2;
+    }
+    ringp = map_ringp;
+
+    private_tx_ptr = read_atomic(&ringp->tx_ptr);
+
+    if ( (private_tx_ptr >= reg.len) ||
+         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
+    {
+        /*
+         * Since the ring is a mess, attempt to flush the contents of it
+         * here by setting the tx_ptr to the next aligned message slot past
+         * the latest rx_ptr we have observed. Handle ring wrap correctly.
+         */
+        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
+
+        if ( private_tx_ptr >= reg.len )
+            private_tx_ptr = 0;
+
+        update_tx_ptr(currd, ring_info, private_tx_ptr);
+    }
+
+    ring_info->tx_ptr = private_tx_ptr;
+    ring_info->len = reg.len;
+    currd->argo->ring_count++;
+
+    if ( send_info )
+    {
+        spin_lock(&dst_d->argo->send_L2_lock);
+
+        list_add(&send_info->node,
+                 &dst_d->argo->send_hash[hash_index(&send_info->id)]);
+
+        spin_unlock(&dst_d->argo->send_L2_lock);
+    }
+
+ out_unlock2:
+    write_unlock(&currd->argo->rings_L2_rwlock);
+
+ out_unlock:
+    read_unlock(&L1_global_argo_rwlock);
+
+ out:
+    if ( dst_d )
+        put_domain(dst_d);
+
+    if ( ret )
+        xfree(send_info);
+
+    xfree(new_ring_info);
+
+    return ret;
+}
+
 long
 do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
            XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long raw_arg3,
            unsigned long raw_arg4)
 {
+    struct domain *currd = current->domain;
     long rc;
     unsigned int arg3 = raw_arg3, arg4 = raw_arg4;
@@ -556,6 +1002,33 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,

     switch ( cmd )
     {
+    case XEN_ARGO_OP_register_ring:
+    {
+        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
+            guest_handle_cast(arg1, xen_argo_register_ring_t);
+        XEN_GUEST_HANDLE_PARAM(xen_argo_gfn_t) gfn_hnd =
+            guest_handle_cast(arg2, xen_argo_gfn_t);
+        /* arg3: npage, arg4: flags */
+
+        BUILD_BUG_ON(!IS_ALIGNED(XEN_ARGO_MAX_RING_SIZE, PAGE_SIZE));
+
+        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        /* Check array to allow use of the faster __copy operations later */
+        if ( unlikely(!guest_handle_okay(gfn_hnd, arg3)) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        rc = register_ring(currd, reg_hnd, gfn_hnd, arg3, arg4);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;

diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 530bb82c62..8b763e7159 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -33,9 +33,15 @@

 #define XEN_ARGO_DOMID_ANY       DOMID_INVALID

+/* The maximum size of an Argo ring is defined to be: 16MB (0x1000000 bytes). */
+#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
+
 /* Fixed-width type for "argo port" number. Nothing to do with evtchns. */
 typedef uint32_t xen_argo_port_t;

+/* gfn type: 64-bit fixed-width on all architectures */
+typedef uint64_t xen_argo_gfn_t;
+
 typedef struct xen_argo_addr
 {
     xen_argo_port_t aport;
@@ -61,4 +67,67 @@ typedef struct xen_argo_ring
 #endif
 } xen_argo_ring_t;

+typedef struct xen_argo_register_ring
+{
+    xen_argo_port_t aport;
+    domid_t partner_id;
+    uint16_t pad;
+    uint32_t len;
+} xen_argo_register_ring_t;
+
+/* Messages on the ring are padded to a multiple of this size. */
+#define XEN_ARGO_MSG_SLOT_SIZE 0x10
+
+struct xen_argo_ring_message_header
+{
+    uint32_t len;
+    struct xen_argo_addr source;
+    uint32_t message_type;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t data[];
+#elif defined(__GNUC__)
+    uint8_t data[0];
+#endif
+};
+
+/*
+ * Hypercall operations
+ */
+
+/*
+ * XEN_ARGO_OP_register_ring
+ *
+ * Register a ring using the guest-supplied memory pages.
+ * Also used to reregister an existing ring (eg. after resume from hibernate).
+ *
+ * The first argument struct indicates the port number for the ring to
+ * register and the partner domain, if any, that is to be allowed to send to
+ * the ring.
+ * A wildcard (XEN_ARGO_DOMID_ANY) may be supplied instead of a partner
+ * domid, and if the hypervisor has wildcard sender rings enabled, this will
+ * allow any domain (XSM notwithstanding) to send to the ring.
+ *
+ * The second argument is an array of guest frame numbers and the third
+ * argument indicates the size of the array. This operation only supports
+ * 4K-sized pages.
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
+ * arg2: XEN_GUEST_HANDLE(xen_argo_gfn_t)
+ * arg3: unsigned long npages
+ * arg4: unsigned long flags (32-bit value)
+ */
+#define XEN_ARGO_OP_register_ring 1
+
+/* Register op flags */
+/*
+ * Fail exist:
+ * If set, reject attempts to (re)register an existing established ring.
+ * If clear, reregistration occurs if the ring exists, with the new ring
+ * taking the place of the old, preserving tx_ptr if it remains valid.
+ */
+#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
+
+#ifdef __XEN__
+/* Mask for all defined flags. */
+#define XEN_ARGO_REGISTER_FLAG_MASK  XEN_ARGO_REGISTER_FLAG_FAIL_EXIST
+#endif
+
 #endif

diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 16601d947f..349fbad38c 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -31,7 +31,9 @@
 !	mc_physcpuinfo			arch-x86/xen-mca.h
 ?	page_offline_action		arch-x86/xen-mca.h
 ?	argo_addr			argo.h
+?	argo_register_ring		argo.h
 ?	argo_ring			argo.h
+?	argo_ring_message_header	argo.h
 ?	evtchn_alloc_unbound		event_channel.h
 ?	evtchn_bind_interdomain		event_channel.h
 ?	evtchn_bind_ipi			event_channel.h
--
generated by git-patchbot for /home/xen/git/xen.git#staging
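To make the interface documentation above concrete, here is an illustrative
guest-side sketch (not part of the patch). hypercall_argo_op() stands in for
whatever five-argument hypercall wrapper the guest OS provides, obtaining the
gfns of the ring memory is OS-specific, and all names introduced here are
hypothetical:

    #include <stdint.h>

    typedef uint32_t xen_argo_port_t;
    typedef uint64_t xen_argo_gfn_t;

    typedef struct xen_argo_register_ring {
        xen_argo_port_t aport;
        uint16_t partner_id;     /* domid_t, or XEN_ARGO_DOMID_ANY */
        uint16_t pad;            /* must be zero */
        uint32_t len;            /* multiple of 16, >= 48, <= 16MB */
    } xen_argo_register_ring_t;

    #define XEN_ARGO_OP_register_ring          1
    #define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1

    /* Hypothetical hypercall wrapper supplied by the guest OS. */
    extern long hypercall_argo_op(unsigned int cmd, void *arg1, void *arg2,
                                  unsigned long arg3, unsigned long arg4);

    /* Register a 4096-byte ring: NPAGES_RING(4096) == 2, so two frames. */
    long register_example_ring(xen_argo_port_t aport, uint16_t partner_id,
                               const xen_argo_gfn_t gfns[2])
    {
        xen_argo_register_ring_t reg = {
            .aport = aport,
            .partner_id = partner_id,
            .pad = 0,
            .len = 4096,
        };

        /* arg3 carries the gfn count, arg4 the flags; fail if the ring
         * already exists rather than silently re-registering it. */
        return hypercall_argo_op(XEN_ARGO_OP_register_ring, &reg,
                                 (void *)gfns, 2,
                                 XEN_ARGO_REGISTER_FLAG_FAIL_EXIST);
    }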