
Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation



On 8/21/20 1:49 PM, Simon Kuenzer wrote:
> On 21.08.20 11:32, Costin Lupu wrote:
>> On 8/20/20 6:49 PM, Simon Kuenzer wrote:
>>> On 13.08.20 10:53, Costin Lupu wrote:
>>>> Whenever a packet is transmitted, the request describing it is written
>>>> in the shared ring. The packet itself is written in a page referenced
>>>> by the request and shared, as well, with the backend. At the end of
>>>> each transmit operation, a cleanup operation is performed, freeing the
>>>> resources allocated for the previously transmitted packets.
>>>>
>>>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
>>>> Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
>>>> ---
>>>>    plat/xen/drivers/net/netfront.c | 102 ++++++++++++++++++++++++++++++++
>>>>    1 file changed, 102 insertions(+)
>>>>
>>>> diff --git a/plat/xen/drivers/net/netfront.c b/plat/xen/drivers/net/netfront.c
>>>> index 858d124a..f2b81329 100644
>>>> --- a/plat/xen/drivers/net/netfront.c
>>>> +++ b/plat/xen/drivers/net/netfront.c
>>>> @@ -39,6 +39,7 @@
>>>>    #include <uk/alloc.h>
>>>>    #include <uk/netdev_driver.h>
>>>>    #include <xen-x86/mm.h>
>>>> +#include <xen-x86/irq.h>
>>>>    #include <xenbus/xenbus.h>
>>>>    #include "netfront.h"
>>>>    #include "netfront_xb.h"
>>>> @@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t *freelist)
>>>>        return id;
>>>>    }
>>>>  
>>>> +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq)
>>>> +{
>>>> +    RING_IDX prod, cons;
>>>> +    netif_tx_response_t *tx_rsp;
>>>> +    uint16_t id;
>>>> +    bool more_to_do;
>>>> +    int count = 0;
>>>> +
>>>> +    do {
>>>> +        prod = txq->ring.sring->rsp_prod;
>>>> +        rmb(); /* Ensure we see responses up to 'rp'. */
>>>> +
>>>> +        for (cons = txq->ring.rsp_cons; cons != prod; cons++) {
>>>> +            tx_rsp = RING_GET_RESPONSE(&txq->ring, cons);
>>>> +
>>>> +            if (tx_rsp->status == NETIF_RSP_NULL)
>>>> +                continue;
>>>> +
>>>> +            if (tx_rsp->status == NETIF_RSP_ERROR)
>>>> +                uk_pr_err("packet error\n");
>>>
>>> "netdev%u: Transmission error on txq %u\n" ?
>>>
>>
>> Ack.
>>
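For v5 this will become something along these lines (just a sketch; the
device id accessor and the queue's local id field are assumptions on my
side, `lqueue_id` being a hypothetical name):

    /* Sketch: identify device and queue in the error message
     * (field names assumed, not taken from this patch)
     */
    uk_pr_err("netdev%u: Transmission error on txq %u\n",
              txq->netfront_dev->netdev._data->id, txq->lqueue_id);
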
>>>> +
>>>> +            id  = tx_rsp->id;
>>>> +            UK_ASSERT(id < NET_TX_RING_SIZE);
>>>> +
>>>> +            gnttab_end_access(txq->gref[id]);
>>>> +            txq->gref[id] = GRANT_INVALID_REF;
>>>> +
>>>> +            add_id_to_freelist(id, txq->freelist);
>>>> +            uk_semaphore_up(&txq->sem);
>>>> +
>>>> +            count++;
>>>> +        }
>>>> +
>>>> +        txq->ring.rsp_cons = prod;
>>>> +
>>>> +        RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do);
>>>> +    } while (more_to_do);
>>>> +
>>>> +    return count;
>>>> +}
>>>> +
>>>> +static int netfront_xmit(struct uk_netdev *n,
>>>> +        struct uk_netdev_tx_queue *txq,
>>>> +        struct uk_netbuf *pkt)
>>>> +{
>>>> +    struct netfront_dev *nfdev;
>>>> +    unsigned long flags;
>>>> +    uint16_t id;
>>>> +    RING_IDX req_prod;
>>>> +    netif_tx_request_t *tx_req;
>>>> +    int notify;
>>>> +    int status = 0, count;
>>>> +
>>>> +    UK_ASSERT(n != NULL);
>>>> +    UK_ASSERT(txq != NULL);
>>>> +    UK_ASSERT(pkt != NULL);
>>>> +    UK_ASSERT(pkt->len < PAGE_SIZE);
>>>> +
>>>> +    nfdev = to_netfront_dev(n);
>>>> +
>>>> +    /* get request id */
>>>> +    uk_semaphore_down(&txq->sem);
>>>> +    local_irq_save(flags);
>>>> +    id = get_id_from_freelist(txq->freelist);
>>>
>>> What happens when we are out of space and the ring is full?
>>> The xmit function should be non-blocking so that we can use it in
>>> polling mode. We should return in such a case that the transmission
>>> queue is currently full...
>>>
>>
>> Ack.
>>
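A minimal sketch of what I have in mind for v5, using the non-blocking
semaphore variant from uk/semaphore.h (uk_semaphore_down_try() returns
non-zero only when the semaphore could be taken):

    /* Sketch: do not block when the queue is full; returning without
     * UK_NETDEV_STATUS_SUCCESS tells the caller to retry later.
     */
    if (unlikely(!uk_semaphore_down_try(&txq->sem)))
        return 0;
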
>>>> +    local_irq_restore(flags);
>>>> +
>>>> +    /* get request */
>>>> +    req_prod = txq->ring.req_prod_pvt;
>>>> +    tx_req = RING_GET_REQUEST(&txq->ring, req_prod);
>>>> +
>>>> +    /* setup grant for buffer data */
>>>> +    txq->gref[id] = tx_req->gref =
>>>> +        gnttab_grant_access(nfdev->xendev->otherend_id,
>>>> +            virt_to_mfn(pkt->data), 1);
>>>> +    UK_ASSERT(tx_req->gref != GRANT_INVALID_REF);
>>>> +
>>>> +    tx_req->offset = 0;
>>>> +    tx_req->size = (uint16_t) pkt->len;
>>>> +    tx_req->flags = 0;
>>>> +    tx_req->id = id;
>>>> +
>>>> +    txq->ring.req_prod_pvt = req_prod + 1;
>>>> +    wmb(); /* Ensure backend sees requests */
>>>> +
>>>> +    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify);
>>>> +    if (notify)
>>>> +        notify_remote_via_evtchn(txq->evtchn);
>>>> +
>>>> +    status |= UK_NETDEV_STATUS_SUCCESS;
>>>> +
>>>> +    /* some cleanup */
>>>> +    local_irq_save(flags);
>>>> +    count = network_tx_buf_gc(txq);
>>>
>>> The clean-up should happen before enqueuing the given packet to the
>>> transmit queue, i.e., as the first thing in xmit(). We need to do it
>>> anyway, and we gain no advantage by delaying this operation.
>>
>> Of course there is an advantage. You push the packets in order for them
>> to be processed ASAP by the backend and deal with cleaning up
>> afterwards. It's a kind of parallelization. Otherwise you would delay
>> the transmissions.
>>
>>> We can also deal much better with full rings. Otherwise we
>>> unnecessarily drop packets for transmission although there would
>>> have been space once the clean-up finished. After the clean-up you
>>> can check whether there is space on the ring and return if we have
>>> run out of it (use an `unlikely` on the condition so that we speed
>>> up the successful sending case).
>>>
>>
>> The only scenario in which this would solve anything is when the ring
>> is full and, just before you decide to send a new packet, the backend
>> starts processing your previously transmitted packets, leaving you at
>> least one free slot. There are two unlikely events here: first, the
>> ring is full, and second, the backend starts processing again right
>> when the ring gets full (for the latter, it is unlikely that the
>> backend will wake up given that it could not keep up with 256
>> transmitted packets, where 256 is the current number of slots). And
>> for this very unlikely scenario, your suggestion would introduce a
>> delay for all transmissions.
>>
>> Anyway, both Linux and FreeBSD do the cleanup after pushing the
>> requests, so I don't think we have much more to say about this.
>>
> 
> Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver:
> 
> http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230
> 
> 
> They clean the slots before sending; I derived my conclusion from there.
> However, I noticed now that there is one minor difference: they do both,
> cleanup and enqueuing, before they notify the card about changes on the
> ring. In that case swapping the order of cleanup and enqueuing does not
> add any delay. In your case this is probably different because you can
> clean up without notifying. Here, I agree with you that we would add
> unnecessary xmit delays by swapping the order.
> 

ixgbe is a native driver, so there is no backend driver in that case.
It's a different scenario.

> However, in general a full ring is not an uncommon or unlikely case. In
> my experience, it can already happen with a single TCP connection if
> your network stack feeds the next packets fast enough and the current
> TCP window allows sending them. It also depends on when the driver
> domain is scheduled, on which CPU cores your netfront domain and the
> driver domain sit, and even on whether the underlying physical device
> is busy with other traffic.
> 

What I tried to say with the event probabilities above is that you would
add delay to the common case in order to remove delay from a much less
probable corner case.

> Another thing we have in mind for uknetdev, and this is probably
> different from the Linux/BSD implementations: we plan to change the
> xmit function to do batching (as DPDK does). Sharan could push
> virtio-net performance further with this. Batching lets you reduce the
> number of notifications to the backend because you enqueue multiple
> packets at once, and thanks to the locality of batching you can
> utilize the CPU caches much better.
> 
> What about this idea for netfront: we first check for available space
> and, if needed, clean up just as many extra slots as we need to
> complete the transmission (for now this is anyway just one slot). Then
> we proceed with enqueuing and clean up the rest of the slots after we
> have notified the backend. What do you think?
> 

This will stay just as it is for the current patches. Feel free to
change anything after upstreaming these patches.
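For reference, my understanding of the proposed ordering as a rough sketch,
not part of this series (RING_FULL() is the macro from Xen's io/ring.h; all
other names are from the patch above):

    /* Sketch of the proposal: (1) make room before enqueuing,
     * (2) enqueue and notify, (3) garbage-collect the rest afterwards.
     */
    if (unlikely(RING_FULL(&txq->ring)))
        network_tx_buf_gc(txq);     /* free just enough slots */
    if (unlikely(RING_FULL(&txq->ring)))
        return 0;                   /* still full: report it to the caller */

    /* ... set up tx_req, RING_PUSH_REQUESTS_AND_CHECK_NOTIFY() and
     * notify_remote_via_evtchn() exactly as in the patch ...
     */

    count = network_tx_buf_gc(txq); /* clean up the remaining slots */
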

>>
>> Btw, I'm waiting for reviews on the other patches as well before I send
>> the v5.
> 
> Yes, sure.
> 
>>
>>
>>>> +    local_irq_restore(flags);
>>>> +
>>>> +    status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0;
>>>> +
>>>> +    return status;
>>>> +}
>>>> +
>>>>    static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq,
>>>>            struct uk_netbuf *netbuf)
>>>>    {
>>>> @@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device *xendev)
>>>>        }
>>>>  
>>>>        /* register netdev */
>>>> +    nfdev->netdev.tx_one = netfront_xmit;
>>>>        nfdev->netdev.ops = &netfront_ops;
>>>>        rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator, DRIVER_NAME);
>>>>        if (rc < 0) {
>>>>
>>>
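As a side note for reviewers: with the tx_one hook registered, callers reach
this code through the uknetdev API. A hedged usage sketch (assuming `dev` and
`nb` were set up elsewhere and that queue 0 is configured):

    /* Sketch: transmit one packet on queue 0 and check the status bits */
    int status = uk_netdev_tx_one(dev, 0, nb);

    if (!uk_netdev_status_successful(status))
        uk_pr_warn("Failed to transmit packet: %d\n", status);
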
> 
> Thanks,
> 
> Simon
> 



 

