[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH v3 10/15] argo: implement the notify op



Queries for data about space availability in registered rings and
causes notification to be sent when space has become available.

The hypercall op populates a supplied data structure with information about
ring state, and if insufficent space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.

Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate
(ie. a NULL argument) to trigger such a check and consequent notifications.

Limit the maximum number of notify requests in a single operation to a
simple fixed limit of 256.

Signed-off-by: Christopher Clark <christopher.clark6@xxxxxxxxxxxxxx>
---
v2 feedback Jan: drop cookie, implement teardown
v2 notify: add flag to indicate ring is shared
v2 argument name for fill_ring_data arg is now currd
v2 self: check ring size vs request and flag error rather than queue signal
v2 feedback Jan: drop 'message' from 'argo_message_op'
v2 self: simplify signal_domid, drop unnecessary label + goto
v2 self: skip the cookie check in pending_cancel
v2 self: implement npending limit on number of pending entries
v1 feedback #16 Jan: sanitize_ring in ringbuf_payload_space
v2 self: inline fill_ring_data_array
v2 self: avoid retesting dst_d for put_domain
v2 self/Jan: remove use of magic verification field and tidy up
v1 feedback #16 Jan: remove testing of magic in guest-supplied structure
v2 self: s/argo_pending_ent/pending_ent/g
v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
v1 feedback Roger, Jan: drop argo prefix on static functions
v2 self: reduce indentation via goto out if arg NULL
v1 feedback #13 Jan: resolve checking of array handle and use of __copy

v1 #5 (#16) feedback Paul: notify op: use currd in do_argo_message_op
v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify
v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify_check_pending
v1 #5 (#16) feedback Paul: notify op: use currd in argo_fill_ring_data_array
v1 #13 (#16) feedback Paul: notify op: do/while: reindent only
v1 #13 (#16) feedback Paul: notify op: do/while: goto
v1 : add compat xlat.lst entries
v1: add definition for copy_field_from_guest_errno
v1 #13 feedback Jan: make 'ring data' comment comply with single-line style
v1 feedback #13 Jan: use __copy; so define and use __copy_field_to_guest_errno
v1: #13 feedback Jan: public namespace: prefix with xen
v1: #13 feedback Jan: add blank line after case in do_argo_message_op
v1: self: rename ent id to domain_id
v1: self: ent id-> domain_id
v1: self: drop signal if domain_cookie mismatches
v1. feedback #15 Jan: make loop i unsigned
v1. self: drop unnecessary mb() in argo_notify_check_pending
v1. self: add blank line
v1 #16 feedback Jan: const domain arg to +argo_fill_ring_data
v1. feedback #15 Jan: check unusued hypercall args are zero
v1 feedback #16 Jan: add comment on space available signal policy
v1. feedback #16 Jan: move declr, drop braces, lower indent
v1. feedback #18 Jan: meld the resource limits into the main commit
v1. feedback #16 Jan: clarify use of magic field
v1. self: use single copy to read notify ring data struct
v1: argo_fill_ring_data: fix dprintk types for port field
v1: self: use %x for printing port as per other print sites
v1. feedback Jan: add comments explaining ring full vs empty
v1. following Jan: fix argo_ringbuf_payload_space calculation for empty ring

 xen/common/argo.c         | 359 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  67 +++++++++
 xen/include/xlat.lst      |   2 +
 3 files changed, 428 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 4548435..37eb291 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -29,6 +29,7 @@
 #include <public/argo.h>
 
 #define MAX_RINGS_PER_DOMAIN            128U
+#define MAX_NOTIFY_COUNT                256U
 #define MAX_PENDING_PER_RING             32U
 
 /* All messages on the ring are padded to a multiple of the slot size. */
@@ -43,6 +44,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_iov_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_send_addr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
 
@@ -231,6 +234,13 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
 #define argo_dprintk(format, ... ) ((void)0)
 #endif
 
+static struct argo_ring_info *
+ring_find_info(const struct domain *d, const struct argo_ring_id *id);
+
+static struct argo_ring_info *
+ring_find_info_by_match(const struct domain *d, uint32_t port,
+                        domid_t partner_id);
+
 /*
  * This hash function is used to distribute rings within the per-domain
  * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
@@ -265,6 +275,17 @@ signal_domain(struct domain *d)
 }
 
 static void
+signal_domid(domid_t domain_id)
+{
+    struct domain *d = get_domain_by_id(domain_id);
+    if ( !d )
+        return;
+
+    signal_domain(d);
+    put_domain(d);
+}
+
+static void
 ring_unmap(struct argo_ring_info *ring_info)
 {
     unsigned int i;
@@ -473,6 +494,62 @@ get_sanitized_ring(xen_argo_ring_t *ring, struct 
argo_ring_info *ring_info)
     return 0;
 }
 
+static uint32_t
+ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
+{
+    xen_argo_ring_t ring;
+    uint32_t len;
+    int32_t ret;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    len = ring_info->len;
+    if ( !len )
+        return 0;
+
+    ret = get_sanitized_ring(&ring, ring_info);
+    if ( ret )
+        return 0;
+
+    argo_dprintk("sanitized ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
+                 ring.tx_ptr, ring.rx_ptr);
+
+    /*
+     * rx_ptr == tx_ptr means that the ring has been emptied, so return
+     * the maximum payload size that can be accepted -- see message size
+     * checking logic in the entry to ringbuf_insert which ensures that
+     * there is always one message slot (of size ROUNDUP_MESSAGE(1)) left
+     * available, preventing a ring from being entirely filled. This ensures
+     * that matching ring indexes always indicate an empty ring and not a
+     * full one.
+     * The subtraction here will not underflow due to minimum size constraints
+     * enforced on ring size elsewhere.
+     */
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return len - sizeof(struct xen_argo_ring_message_header)
+                   - ROUNDUP_MESSAGE(1);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += len;
+
+    /*
+     * The maximum size payload for a message that will be accepted is:
+     * (the available space between the ring indexes)
+     *    minus (space for a message header)
+     *    minus (space for one message slot)
+     * since ringbuf_insert requires that one message slot be left
+     * unfilled, to avoid filling the ring to capacity and confusing a full
+     * ring with an empty one.
+     * Since the ring indexes are sanitized, the value in ret is aligned, so
+     * the simple subtraction here works to return the aligned value needed:
+     */
+    ret -= sizeof(struct xen_argo_ring_message_header);
+    ret -= ROUNDUP_MESSAGE(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
 /*
  * iov_count returns its count on success via an out variable to avoid
  * potential for a negative return value to be used incorrectly
@@ -812,6 +889,61 @@ pending_remove_all(struct argo_ring_info *ring_info)
     ring_info->npending = 0;
 }
 
+static void
+pending_notify(struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    hlist_for_each_entry_safe(ent, node, next, to_notify, node)
+    {
+        hlist_del(&ent->node);
+        signal_domid(ent->domain_id);
+        xfree(ent);
+    }
+}
+
+static void
+pending_find(const struct domain *d, struct argo_ring_info *ring_info,
+             uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    /*
+     * TODO: Current policy here is to signal _all_ of the waiting domains
+     *       interested in sending a message of size less than payload_space.
+     *
+     * This is likely to be suboptimal, since once one of them has added
+     * their message to the ring, there may well be insufficient room
+     * available for any of the others to transmit, meaning that they were
+     * woken in vain, which created extra work just to requeue their wait.
+     *
+     * Retain this simple policy for now since it at least avoids starving a
+     * domain of available space notifications because of a policy that only
+     * notified other domains instead. Improvement may be possible;
+     * investigation required.
+     */
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                wildcard_pending_list_remove(ent->domain_id, ent);
+            hlist_del(&ent->node);
+            ring_info->npending--;
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
 static int
 pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
               unsigned int len)
@@ -874,6 +1006,27 @@ pending_requeue(struct argo_ring_info *ring_info, domid_t 
src_id,
 }
 
 static void
+pending_cancel(struct argo_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->domain_id == src_id )
+        {
+            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                wildcard_pending_list_remove(ent->domain_id, ent);
+            hlist_del(&ent->node);
+            xfree(ent);
+            ring_info->npending--;
+        }
+    }
+}
+
+static void
 wildcard_rings_pending_remove(struct domain *d)
 {
     struct hlist_node *node, *next;
@@ -994,6 +1147,92 @@ partner_rings_remove(struct domain *src_d)
 }
 
 static int
+fill_ring_data(const struct domain *currd,
+               XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) data_ent_hnd)
+{
+    xen_argo_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct argo_ring_info *ring_info;
+    int ret;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    ret = __copy_from_guest(&ent, data_ent_hnd, 1) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
+                 ent.ring.domain_id, ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain_id);
+    if ( dst_d )
+    {
+        if ( dst_d->argo )
+        {
+            read_lock(&dst_d->argo->lock);
+
+            ring_info = ring_find_info_by_match(dst_d, ent.ring.port,
+                                                currd->domain_id);
+            if ( ring_info )
+            {
+                uint32_t space_avail;
+
+                ent.flags |= XEN_ARGO_RING_DATA_F_EXISTS;
+                ent.max_message_size = ring_info->len -
+                                   sizeof(struct xen_argo_ring_message_header) 
-
+                                   ROUNDUP_MESSAGE(1);
+
+                if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_SHARED;
+
+                spin_lock(&ring_info->lock);
+
+                space_avail = ringbuf_payload_space(dst_d, ring_info);
+
+                argo_dprintk("fill_ring_data: port=%x space_avail=%u"
+                             " space_wanted=%u\n",
+                             ring_info->id.port, space_avail,
+                             ent.space_required);
+
+                /* Do not queue a notification for an unachievable size */
+                if ( ent.space_required > ent.max_message_size )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_EMSGSIZE;
+                else if ( space_avail >= ent.space_required )
+                {
+                    pending_cancel(ring_info, currd->domain_id);
+                    ent.flags |= XEN_ARGO_RING_DATA_F_SUFFICIENT;
+                }
+                else
+                {
+                    pending_requeue(ring_info, currd->domain_id,
+                                    ent.space_required);
+                    ent.flags |= XEN_ARGO_RING_DATA_F_PENDING;
+                }
+
+                spin_unlock(&ring_info->lock);
+
+                if ( space_avail == ent.max_message_size )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_EMPTY;
+
+            }
+            read_unlock(&dst_d->argo->lock);
+        }
+        put_domain(dst_d);
+    }
+
+    ret = __copy_field_to_guest(data_ent_hnd, &ent, flags) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    ret = __copy_field_to_guest(data_ent_hnd, &ent, max_message_size) ?
+                -EFAULT : 0;
+ out:
+    return ret;
+}
+
+static int
 find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
 {
     p2m_type_t p2mt;
@@ -1526,6 +1765,111 @@ register_ring(struct domain *currd,
     return ret;
 }
 
+static void
+notify_ring(struct domain *d, struct argo_ring_info *ring_info,
+            struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&argo_lock));
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    spin_lock(&ring_info->lock);
+
+    if ( ring_info->len )
+        space = ringbuf_payload_space(d, ring_info);
+    else
+        space = 0;
+
+    spin_unlock(&ring_info->lock);
+
+    if ( space )
+        pending_find(d, ring_info, space, to_notify);
+}
+
+static void
+notify_check_pending(struct domain *currd)
+{
+    unsigned int i;
+    HLIST_HEAD(to_notify);
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    read_lock(&currd->argo->lock);
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
+    {
+        struct hlist_node *node, *next;
+        struct argo_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node, next,
+                                  &currd->argo->ring_hash[i], node)
+        {
+            notify_ring(currd, ring_info, &to_notify);
+        }
+    }
+
+    read_unlock(&currd->argo->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        pending_notify(&to_notify);
+}
+
+static long
+notify(struct domain *currd,
+       XEN_GUEST_HANDLE_PARAM(xen_argo_ring_data_t) ring_data_hnd)
+{
+    XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) ent_hnd;
+    xen_argo_ring_data_t ring_data;
+    int ret = 0;
+
+    read_lock(&argo_lock);
+
+    if ( !currd->argo )
+    {
+        argo_dprintk("!d->argo, ENODEV\n");
+        ret = -ENODEV;
+        goto out;
+    }
+
+    notify_check_pending(currd);
+
+    if ( guest_handle_is_null(ring_data_hnd) )
+        goto out;
+
+    ret = copy_from_guest(&ring_data, ring_data_hnd, 1) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    if ( ring_data.nent > MAX_NOTIFY_COUNT )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: notify entry count(%u) exceeds max(%u)\n",
+                ring_data.nent, MAX_NOTIFY_COUNT);
+        ret = -EACCES;
+        goto out;
+    }
+
+    ent_hnd = guest_handle_for_field(ring_data_hnd,
+                                     xen_argo_ring_data_ent_t, data[0]);
+    if ( unlikely(!guest_handle_okay(ent_hnd, ring_data.nent)) )
+    {
+        ret = -EFAULT;
+        goto out;
+    }
+
+    while ( !ret && ring_data.nent-- )
+    {
+        ret = fill_ring_data(currd, ent_hnd);
+        guest_handle_add_offset(ent_hnd, 1);
+    }
+
+ out:
+    read_unlock(&argo_lock);
+
+    return ret;
+}
+
 static long
 sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
       const xen_argo_addr_t *dst_addr,
@@ -1726,6 +2070,21 @@ do_argo_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg1,
         break;
     }
 
+    case XEN_ARGO_OP_notify:
+    {
+        XEN_GUEST_HANDLE_PARAM(xen_argo_ring_data_t) ring_data_hnd =
+                   guest_handle_cast(arg1, xen_argo_ring_data_t);
+
+        if ( unlikely((!guest_handle_is_null(arg2)) || arg3 || arg4) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = notify(currd, ring_data_hnd);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 8f7d05d..25fed82 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -128,6 +128,42 @@ typedef struct xen_argo_unregister_ring
 /* Messages on the ring are padded to a multiple of this size. */
 #define XEN_ARGO_MSG_SLOT_SIZE 0x10
 
+/*
+ * Notify flags
+ */
+/* Ring is empty */
+#define XEN_ARGO_RING_DATA_F_EMPTY       (1U << 0)
+/* Ring exists */
+#define XEN_ARGO_RING_DATA_F_EXISTS      (1U << 1)
+/* Pending interrupt exists. Do not rely on this field - for profiling only */
+#define XEN_ARGO_RING_DATA_F_PENDING     (1U << 2)
+/* Sufficient space to queue space_required bytes exists */
+#define XEN_ARGO_RING_DATA_F_SUFFICIENT  (1U << 3)
+/* Insufficient ring size for space_required bytes */
+#define XEN_ARGO_RING_DATA_F_EMSGSIZE    (1U << 4)
+/* Ring is shared, not unicast */
+#define XEN_ARGO_RING_DATA_F_SHARED      (1U << 5)
+
+typedef struct xen_argo_ring_data_ent
+{
+    xen_argo_addr_t ring;
+    uint16_t flags;
+    uint16_t pad;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} xen_argo_ring_data_ent_t;
+
+typedef struct xen_argo_ring_data
+{
+    uint32_t nent;
+    uint32_t pad;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    xen_argo_ring_data_ent_t data[];
+#elif defined(__GNUC__)
+    xen_argo_ring_data_ent_t data[0];
+#endif
+} xen_argo_ring_data_t;
+
 struct xen_argo_ring_message_header
 {
     uint32_t len;
@@ -207,4 +243,35 @@ struct xen_argo_ring_message_header
  */
 #define XEN_ARGO_OP_sendv               3
 
+/*
+ * XEN_ARGO_OP_notify
+ *
+ * Asks Xen for information about other rings in the system.
+ *
+ * ent->ring is the xen_argo_addr_t of the ring you want information on.
+ * Uses the same ring matching rules as XEN_ARGO_OP_sendv.
+ *
+ * ent->space_required : if this field is not null then Xen will check
+ * that there is space in the destination ring for this many bytes of payload.
+ * If the ring is too small for the requested space_required, it will set the
+ * XEN_ARGO_RING_DATA_F_EMSGSIZE flag on return.
+ * If sufficient space is available, it will set 
XEN_ARGO_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending notification for that ent->ring; otherwise it
+ * will schedule a notification event and the flag will not be set.
+ *
+ * These flags are set by Xen when notify replies:
+ * XEN_ARGO_RING_DATA_F_EMPTY      ring is empty
+ * XEN_ARGO_RING_DATA_F_PENDING    notify event is pending *don't rely on this*
+ * XEN_ARGO_RING_DATA_F_SUFFICIENT sufficient space for space_required is there
+ * XEN_ARGO_RING_DATA_F_EXISTS     ring exists
+ * XEN_ARGO_RING_DATA_F_EMSGSIZE   space_required too large for the ring size
+ * XEN_ARGO_RING_DATA_F_SHARED     ring is registered for wildcard partner
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_ring_data_t) ring_data (may be NULL)
+ * arg2: NULL
+ * arg3: 0 (ZERO)
+ * arg4: 0 (ZERO)
+ */
+#define XEN_ARGO_OP_notify              4
+
 #endif
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 3723980..e45b60e 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -154,3 +154,5 @@
 ?      argo_unregister_ring            argo.h
 ?      argo_iov                        argo.h
 ?      argo_send_addr                  argo.h
+?      argo_ring_data_ent              argo.h
+?      argo_ring_data                  argo.h
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.