
Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation



On 21.08.20 11:32, Costin Lupu wrote:
On 8/20/20 6:49 PM, Simon Kuenzer wrote:
On 13.08.20 10:53, Costin Lupu wrote:
Whenever a packet is transmitted, the request describing it is written in
the shared ring. The packet itself is written in a page referenced by the
request and shared, as well, with the backend. At the end of each transmit
operation, a cleanup operation is performed, freeing the resources
allocated for the previously transmitted packets.

Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
---
   plat/xen/drivers/net/netfront.c | 102 ++++++++++++++++++++++++++++++++
   1 file changed, 102 insertions(+)

diff --git a/plat/xen/drivers/net/netfront.c b/plat/xen/drivers/net/netfront.c
index 858d124a..f2b81329 100644
--- a/plat/xen/drivers/net/netfront.c
+++ b/plat/xen/drivers/net/netfront.c
@@ -39,6 +39,7 @@
   #include <uk/alloc.h>
   #include <uk/netdev_driver.h>
   #include <xen-x86/mm.h>
+#include <xen-x86/irq.h>
   #include <xenbus/xenbus.h>
   #include "netfront.h"
   #include "netfront_xb.h"
@@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t *freelist)
       return id;
   }
   +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq)
+{
+    RING_IDX prod, cons;
+    netif_tx_response_t *tx_rsp;
+    uint16_t id;
+    bool more_to_do;
+    int count = 0;
+
+    do {
+        prod = txq->ring.sring->rsp_prod;
+        rmb(); /* Ensure we see responses up to 'prod'. */
+
+        for (cons = txq->ring.rsp_cons; cons != prod; cons++) {
+            tx_rsp = RING_GET_RESPONSE(&txq->ring, cons);
+
+            if (tx_rsp->status == NETIF_RSP_NULL)
+                continue;
+
+            if (tx_rsp->status == NETIF_RSP_ERROR)
+                uk_pr_err("packet error\n");

"netdev%u: Transmission error on txq %u\n" ?


Ack.

+
+            id  = tx_rsp->id;
+            UK_ASSERT(id < NET_TX_RING_SIZE);
+
+            gnttab_end_access(txq->gref[id]);
+            txq->gref[id] = GRANT_INVALID_REF;
+
+            add_id_to_freelist(id, txq->freelist);
+            uk_semaphore_up(&txq->sem);
+
+            count++;
+        }
+
+        txq->ring.rsp_cons = prod;
+
+        RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do);
+    } while (more_to_do);
+
+    return count;
+}
+
+static int netfront_xmit(struct uk_netdev *n,
+        struct uk_netdev_tx_queue *txq,
+        struct uk_netbuf *pkt)
+{
+    struct netfront_dev *nfdev;
+    unsigned long flags;
+    uint16_t id;
+    RING_IDX req_prod;
+    netif_tx_request_t *tx_req;
+    int notify;
+    int status = 0, count;
+
+    UK_ASSERT(n != NULL);
+    UK_ASSERT(txq != NULL);
+    UK_ASSERT(pkt != NULL);
+    UK_ASSERT(pkt->len < PAGE_SIZE);
+
+    nfdev = to_netfront_dev(n);
+
+    /* get request id */
+    uk_semaphore_down(&txq->sem);
+    local_irq_save(flags);
+    id = get_id_from_freelist(txq->freelist);

What happens when we are out of space and the ring is full?
The xmit function should be non-blocking so that we can use it in
polling mode. We should return in such a case that the transmission
queue is currently full...


Ack.

+    local_irq_restore(flags);
+
+    /* get request */
+    req_prod = txq->ring.req_prod_pvt;
+    tx_req = RING_GET_REQUEST(&txq->ring, req_prod);
+
+    /* setup grant for buffer data */
+    txq->gref[id] = tx_req->gref =
+        gnttab_grant_access(nfdev->xendev->otherend_id,
+            virt_to_mfn(pkt->data), 1);
+    UK_ASSERT(tx_req->gref != GRANT_INVALID_REF);
+
+    tx_req->offset = 0;
+    tx_req->size = (uint16_t) pkt->len;
+    tx_req->flags = 0;
+    tx_req->id = id;
+
+    txq->ring.req_prod_pvt = req_prod + 1;
+    wmb(); /* Ensure backend sees requests */
+
+    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify);
+    if (notify)
+        notify_remote_via_evtchn(txq->evtchn);
+
+    status |= UK_NETDEV_STATUS_SUCCESS;
+
+    /* some cleanup */
+    local_irq_save(flags);
+    count = network_tx_buf_gc(txq);

The clean-up should happen before enqueuing the given packet to the
transmit queue; it should be the first thing xmit() does. We need to do it
anyway, and we gain no advantage by delaying this operation.

Of course there is an advantage. You push the packets in order for them
to be processed ASAP by the backend and deal with cleaning up
afterwards. It's a kind of parallelization. Otherwise you would delay
the transmissions.

We can also deal much better with full rings. Otherwise we
unnecessarily drop packets for transmission although there would
have been space once clean-up finished. After cleaning, you can
check whether there is space on the ring and return if we have run
out of it (use an `unlikely` on the condition so that we speed up
the successful sending case).


The only scenario in which this would solve anything would be when the
ring is full and, just before you decide to send a new packet, the
backend starts processing your previously transmitted packets, leaving
you at least one free slot. There are two unlikely events here: first,
the ring is full, and second, the backend starts to process again just
as the ring gets full (for the latter event, it's unlikely that the
backend will wake up given that it couldn't process 256 transmitted
packets, where 256 is the current number of slots). And for this very
unlikely case, following your suggestion would introduce a delay for
all transmissions.

Anyway, both Linux and FreeBSD do the cleanup after pushing the
requests, so I don't think we have much more to say about this.


Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver:

http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230

They clean the slots before sending. I derived my conclusion from there. However I noticed now that there is one minor difference: They do both - cleanup and enqueuing - before they notify the card about changes on the ring. In that case it does not add any delay if you swap the order of cleanup and enqueuing. In your case this is probably different because you can cleanup without notifying. Here, I agree with you that we add unnecessary xmit delays by swapping the order.

However, and in general, having a full ring is not an uncommon or unlikely case. In my experience, this can already happen with a single TCP connection if your network stack is feeding the next packets fast enough and the current TCP window allows sending them. It also depends on when the driver domain is scheduled; on which CPU core your netfront domain and the driver domain sit; and even on the underlying physical device, which might be busy with other traffic.

Another thing we have in mind for uknetdev - and this is probably different from Linux/BSD's implementation - is changing the xmit function to do batching (as DPDK does). Sharan could push virtio-net performance further with this. You are able to reduce the number of notifies to the backend because you enqueue multiple packets at once, and due to the locality of batching you can utilize CPU caches way better.

What about this idea for netfront: we could first check for available space and - if needed - clean up just as many extra slots as we need to complete the transmission (for now this is anyway just one slot). Then we proceed with enqueuing and do the cleanup of the remaining slots after we have notified the backend. What do you think?


Btw, I'm waiting for reviews on the other patches as well before I send
the v5.

Yes, sure.



+    local_irq_restore(flags);
+
+    status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0;
+
+    return status;
+}
+
   static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq,
           struct uk_netbuf *netbuf)
   {
@@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device *xendev)
       }
         /* register netdev */
+    nfdev->netdev.tx_one = netfront_xmit;
       nfdev->netdev.ops = &netfront_ops;
    rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator, DRIVER_NAME);
       if (rc < 0) {



Thanks,

Simon


