Xen project Mailing List

Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation

To: Sharan Santhanam <sharan.santhanam@xxxxxxxxx>, Simon Kuenzer <simon.kuenzer@xxxxxxxxx>, minios-devel@xxxxxxxxxxxxx

From: Costin Lupu <costin.lupu@xxxxxxxxx>

Date: Fri, 23 Oct 2020 18:54:58 +0300

Delivery-date: Fri, 23 Oct 2020 15:55:09 +0000

List-id: Mini-os development list <minios-devel.lists.xenproject.org>

Hi Sharan, Please see inline. On 10/22/20 4:38 PM, Sharan Santhanam wrote: > Hello Costin, > > Please find the review comments inline: > > Thanks & Regards > > Sharan > > On 8/21/20 3:04 PM, Simon Kuenzer wrote: >> On 21.08.20 13:24, Costin Lupu wrote: >>> On 8/21/20 1:49 PM, Simon Kuenzer wrote: >>>> On 21.08.20 11:32, Costin Lupu wrote: >>>>> On 8/20/20 6:49 PM, Simon Kuenzer wrote: >>>>>> On 13.08.20 10:53, Costin Lupu wrote: >>>>>>> Whenever a packet is transmitted, the request describing it is >>>>>>> written >>>>>>> in the >>>>>>> shared ring. The packet itself is written in a page referenced by >>>>>>> the >>>>>>> request >>>>>>> and shared, as well, with the backend. At the end of each transmit >>>>>>> operation, a >>>>>>> cleanup operation is performed free'ing the resources allocated >>>>>>> for the >>>>>>> previously transmitted packets. >>>>>>> >>>>>>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx> >>>>>>> Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx> >>>>>>> --- >>>>>>> plat/xen/drivers/net/netfront.c | 102 >>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>> 1 file changed, 102 insertions(+) >>>>>>> >>>>>>> diff --git a/plat/xen/drivers/net/netfront.c >>>>>>> b/plat/xen/drivers/net/netfront.c >>>>>>> index 858d124a..f2b81329 100644 >>>>>>> --- a/plat/xen/drivers/net/netfront.c >>>>>>> +++ b/plat/xen/drivers/net/netfront.c >>>>>>> @@ -39,6 +39,7 @@ >>>>>>> #include <uk/alloc.h> >>>>>>> #include <uk/netdev_driver.h> >>>>>>> #include <xen-x86/mm.h> >>>>>>> +#include <xen-x86/irq.h> >>>>>>> #include <xenbus/xenbus.h> >>>>>>> #include "netfront.h" >>>>>>> #include "netfront_xb.h" >>>>>>> @@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t >>>>>>> *freelist) >>>>>>> return id; >>>>>>> } >>>>>>> +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq) >>>>>>> +{ >>>>>>> + RING_IDX prod, cons; >>>>>>> + netif_tx_response_t *tx_rsp; >>>>>>> + uint16_t id; >>>>>>> + bool more_to_do; >>>>>>> + int count = 0; >>>>>>> + >>>>>>> + do { >>>>>>> + prod = txq->ring.sring->rsp_prod; >>>>>>> + rmb(); /* Ensure we see responses up to 'rp'. */ >>>>>>> + >>>>>>> + for (cons = txq->ring.rsp_cons; cons != prod; cons++) { >>>>>>> + tx_rsp = RING_GET_RESPONSE(&txq->ring, cons); >>>>>>> + >>>>>>> + if (tx_rsp->status == NETIF_RSP_NULL) >>>>>>> + continue; >>>>>>> + >>>>>>> + if (tx_rsp->status == NETIF_RSP_ERROR) >>>>>>> + uk_pr_err("packet error\n"); >>>>>> >>>>>> "netdev%u: Transmission error on txq %u\n" ? >>>>>> >>>>> >>>>> Ack. >>>>> >>>>>>> + >>>>>>> + id = tx_rsp->id; >>>>>>> + UK_ASSERT(id < NET_TX_RING_SIZE); >>>>>>> + >>>>>>> + gnttab_end_access(txq->gref[id]); >>>>>>> + txq->gref[id] = GRANT_INVALID_REF; >>>>>>> + >>>>>>> + add_id_to_freelist(id, txq->freelist); >>>>>>> + uk_semaphore_up(&txq->sem); > Shouldn't we free up the netbuf we receive from the ring? Yeah, I wasn't aware of that either. In minios the tx buffers are reused and therefore never freed. For the next version I follow the approach in virtio net driver and keep a reference to the netbuf in order to free it here. >>>>>>> + >>>>>>> + count++; >>>>>>> + } >>>>>>> + >>>>>>> + txq->ring.rsp_cons = prod; >>>>>>> + >>>>>>> + RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do); >>>>>>> + } while (more_to_do); >>>>>>> + >>>>>>> + return count; >>>>>>> +} >>>>>>> + >>>>>>> +static int netfront_xmit(struct uk_netdev *n, >>>>>>> + struct uk_netdev_tx_queue *txq, >>>>>>> + struct uk_netbuf *pkt) >>>>>>> +{ >>>>>>> + struct netfront_dev *nfdev; >>>>>>> + unsigned long flags; >>>>>>> + uint16_t id; >>>>>>> + RING_IDX req_prod; >>>>>>> + netif_tx_request_t *tx_req; >>>>>>> + int notify; >>>>>>> + int status = 0, count; >>>>>>> + >>>>>>> + UK_ASSERT(n != NULL); >>>>>>> + UK_ASSERT(txq != NULL); >>>>>>> + UK_ASSERT(pkt != NULL); >>>>>>> + UK_ASSERT(pkt->len < PAGE_SIZE); >>>>>>> + >>>>>>> + nfdev = to_netfront_dev(n); >>>>>>> + >>>>>>> + /* get request id */ >>>>>>> + uk_semaphore_down(&txq->sem); >>>>>>> + local_irq_save(flags); >>>>>>> + id = get_id_from_freelist(txq->freelist); >>>>>> >>>>>> What happens when we are out of space and the ring is full? >>>>>> The xmit function should be non-blocking so that we can use it in >>>>>> polling mode. We should return in such a case that the transmission >>>>>> queue is currently full... >>>>>> >>>>> >>>>> Ack. >>>>> >>>>>>> + local_irq_restore(flags); >>>>>>> + >>>>>>> + /* get request */ >>>>>>> + req_prod = txq->ring.req_prod_pvt; >>>>>>> + tx_req = RING_GET_REQUEST(&txq->ring, req_prod); > + >>>>>>> + /* setup grant for buffer data */ >>>>>>> + txq->gref[id] = tx_req->gref = >>>>>>> + gnttab_grant_access(nfdev->xendev->otherend_id, >>>>>>> + virt_to_mfn(pkt->data), 1); >>>>>>> + UK_ASSERT(tx_req->gref != GRANT_INVALID_REF); >>>>>>> + >>>>>>> + tx_req->offset = 0; >>>>>>> + tx_req->size = (uint16_t) pkt->len; >>>>>>> + tx_req->flags = 0; >>>>>>> + tx_req->id = id; >>>>>>> + >>>>>>> + txq->ring.req_prod_pvt = req_prod + 1; >>>>>>> + wmb(); /* Ensure backend sees requests */ >>>>>>> + >>>>>>> + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify); >>>>>>> + if (notify) >>>>>>> + notify_remote_via_evtchn(txq->evtchn); > + >>>>>>> + status |= UK_NETDEV_STATUS_SUCCESS; >>>>>>> + >>>>>>> + /* some cleanup */ >>>>>>> + local_irq_save(flags); >>>>>>> + count = network_tx_buf_gc(txq); >>>>>> >>>>>> The clean-up should happen before enqueuing the given packet to the >>>>>> transmit queue, it is the first thing of xmit(). Since we anyway >>>>>> need to >>>>>> do it and we do not have an advantage by delaying this operation. >>>>> >>>>> Of course there is an advantage. You push the packets in order for >>>>> them >>>>> to be processed ASAP by the backend and deal with cleaning up >>>>> afterwards. It's a kind of parallelization. Otherwise you would delay >>>>> the transmissions. >>>>> >>>>>> We can also deal much better with full rings. Otherwise we >>>>>> unnecessarily drop a packets for transmission although there would >>>>>> have been space when clean-up finished. After calling cleaning you >>>>>> can check if there is space on the ring and return if we run out of >>>>>> it (use an `unlikely` to the condition so that we speed up the >>>>>> successful sending case). >>>>>> >>>>> >>>>> The only scenario when this would solve anything would be when you >>>>> have >>>>> the ring full and just before you decide to send a new packet the >>>>> backend starts processing your previous transmitted packets, >>>>> leaving you >>>>> at least one free slot. There are 2 unlikely events here: first, the >>>>> ring is full, and second, the backend starts to process again when the >>>>> ring gets full (for the latter event, it's unlikely that the backend >>>>> will wake up given that it couldn't process 256 transmitted packets, >>>>> where 256 is the current number of slots). And for this very unlikely >>>>> context you would introduce a delay for all the transmissions if we >>>>> followed your suggestion. >>>>> >>>>> Anyway, both Linux and FreeBSD do the cleanup after pushing the >>>>> requests, so I don't think we have much more to say about this. >>>>> >>>> >>>> Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver: >>>> >>>> http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230 >>>> >>>> >>>> >>>> They clean the slots before sending. I derived my conclusion from >>>> there. >>>> However I noticed now that there is one minor difference: They do >>>> both - >>>> cleanup and enqueuing - before they notify the card about changes on >>>> the >>>> ring. In that case it does not add any delay if you swap the order of >>>> cleanup and enqueuing. In your case this is probably different because >>>> you can cleanup without notifying. Here, I agree with you that we add >>>> unnecessary xmit delays by swapping the order. >>>> >>> >>> ixgbe is a native driver, so there is no backend driver in that case. >>> It's a different scenario. >> >> You could consider the card itself as a backend driver, but fine. >> There is a ABI difference between ixgbe and netback. >> >>> >>>> However and in general, having a full ring is not an uncommon and >>>> unlikely case. To my experience, this can already happen with a single >>>> TCP connection if your network stack is feeding next packets fast >>>> enough >>>> and if the current TCP window allows to send them. It also depends on >>>> when the driver domain is scheduled; on which CPU core your netfront >>>> domain and the driver domain sit; and even on the underlying physical >>>> device that might be busy with other traffic. >>>> >>> >>> What I tried to say with the events probabilities explanation is that >>> you add delay for the common case in order to remove delay for the much >>> less probable corner case. >> >> I got that and agreed. >> >>> >>>> Another thing we are having in mind for uknetdev - and this is probably >>>> different to Linux/BSD's implementation - we plan to change the xmit >>>> function to do batching (as DPDK does). Sharan could push virtio-net >>>> performance further with this. You are able to reduce the number of >>>> notifies to the backend because you enqueue multiple packets at once >>>> and >>>> due to locality of batching you can utilize CPU caches ways better. >>>> >>>> What about this idea for netfront: We could do first a check for >>>> available space and - if needed - clean-up just as many extra slots >>>> that >>>> you need to complete the transmission (for now this is anyways just one >>>> slot). Then we proceed with enqueuing and do the cleanup of the rest of >>>> the slots after we notified the backend. What do you think? >>>> >>> >>> This will stay just as it is for the current patches. Feel free to >>> change anything after upstreaming these patches. >> >> No, it won't stay like this. At least, you have to call clean-up when >> the ring is full before you leave the non-blocking function. Otherwise >> you end up in never getting space back on a full ring, but I think >> this is clear. >> >>> >>>>> >>>>> Btw, I'm waiting for reviews on the other patches as well before I >>>>> send >>>>> the v5. >>>> >>>> Yes, sure. >>>> >>>>> >>>>> >>>>>>> + local_irq_restore(flags); >>>>>>> + >>>>>>> + status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0; >>>>>>> + >>>>>>> + return status; >>>>>>> +} >>>>>>> + >>>>>>> static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq, >>>>>>> struct uk_netbuf *netbuf) >>>>>>> { >>>>>>> @@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device >>>>>>> *xendev) >>>>>>> } >>>>>>> /* register netdev */ >>>>>>> + nfdev->netdev.tx_one = netfront_xmit; >>>>>>> nfdev->netdev.ops = &netfront_ops; >>>>>>> rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator, >>>>>>> DRIVER_NAME); >>>>>>> if (rc < 0) { >>>>>>> >>>>>> >>>> >>>> Thanks, >>>> >>>> Simon >>>> >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.