
Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation



Hi Sharan,

Please see inline.

On 10/22/20 4:38 PM, Sharan Santhanam wrote:
> Hello Costin,
> 
> Please find the review comments inline:
> 
> Thanks & Regards
> 
> Sharan
> 
> On 8/21/20 3:04 PM, Simon Kuenzer wrote:
>> On 21.08.20 13:24, Costin Lupu wrote:
>>> On 8/21/20 1:49 PM, Simon Kuenzer wrote:
>>>> On 21.08.20 11:32, Costin Lupu wrote:
>>>>> On 8/20/20 6:49 PM, Simon Kuenzer wrote:
>>>>>> On 13.08.20 10:53, Costin Lupu wrote:
>>>>>>> Whenever a packet is transmitted, the request describing it is
>>>>>>> written in the shared ring. The packet itself is written in a page
>>>>>>> referenced by the request and shared, as well, with the backend. At
>>>>>>> the end of each transmit operation, a cleanup operation is performed,
>>>>>>> freeing the resources allocated for the previously transmitted
>>>>>>> packets.
>>>>>>>
>>>>>>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
>>>>>>> Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
>>>>>>> ---
>>>>>>>     plat/xen/drivers/net/netfront.c | 102 ++++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 102 insertions(+)
>>>>>>>
>>>>>>> diff --git a/plat/xen/drivers/net/netfront.c
>>>>>>> b/plat/xen/drivers/net/netfront.c
>>>>>>> index 858d124a..f2b81329 100644
>>>>>>> --- a/plat/xen/drivers/net/netfront.c
>>>>>>> +++ b/plat/xen/drivers/net/netfront.c
>>>>>>> @@ -39,6 +39,7 @@
>>>>>>>     #include <uk/alloc.h>
>>>>>>>     #include <uk/netdev_driver.h>
>>>>>>>     #include <xen-x86/mm.h>
>>>>>>> +#include <xen-x86/irq.h>
>>>>>>>     #include <xenbus/xenbus.h>
>>>>>>>     #include "netfront.h"
>>>>>>>     #include "netfront_xb.h"
>>>>>>> @@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t
>>>>>>> *freelist)
>>>>>>>         return id;
>>>>>>>     }
>>>>>>>
>>>>>>> +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq)
>>>>>>> +{
>>>>>>> +    RING_IDX prod, cons;
>>>>>>> +    netif_tx_response_t *tx_rsp;
>>>>>>> +    uint16_t id;
>>>>>>> +    bool more_to_do;
>>>>>>> +    int count = 0;
>>>>>>> +
>>>>>>> +    do {
>>>>>>> +        prod = txq->ring.sring->rsp_prod;
>>>>>>> +        rmb(); /* Ensure we see responses up to 'rp'. */
>>>>>>> +
>>>>>>> +        for (cons = txq->ring.rsp_cons; cons != prod; cons++) {
>>>>>>> +            tx_rsp = RING_GET_RESPONSE(&txq->ring, cons);
>>>>>>> +
>>>>>>> +            if (tx_rsp->status == NETIF_RSP_NULL)
>>>>>>> +                continue;
>>>>>>> +
>>>>>>> +            if (tx_rsp->status == NETIF_RSP_ERROR)
>>>>>>> +                uk_pr_err("packet error\n");
>>>>>>
>>>>>> "netdev%u: Transmission error on txq %u\n" ?
>>>>>>
>>>>>
>>>>> Ack.
>>>>>
>>>>>>> +
>>>>>>> +            id  = tx_rsp->id;
>>>>>>> +            UK_ASSERT(id < NET_TX_RING_SIZE);
>>>>>>> +
>>>>>>> +            gnttab_end_access(txq->gref[id]);
>>>>>>> +            txq->gref[id] = GRANT_INVALID_REF;
>>>>>>> +
>>>>>>> +            add_id_to_freelist(id, txq->freelist);
>>>>>>> +            uk_semaphore_up(&txq->sem);
> Shouldn't we free up the netbuf we receive from the ring?

Yeah, I wasn't aware of that either. In minios the tx buffers are reused
and therefore never freed.

For the next version I'll follow the approach of the virtio net driver
and keep a reference to the netbuf in order to free it here.
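
A rough, self-contained sketch of that bookkeeping (type and function
names here are placeholders, not the actual driver API): on xmit the slot
remembers its netbuf, and the gc loop later detaches the reference so the
caller can free it (e.g. via uk_netbuf_free()).

```c
#include <assert.h>
#include <stddef.h>

#define NET_TX_RING_SIZE 256

/* Hypothetical stand-in for struct uk_netbuf. */
struct netbuf { void *data; };

/* Per-queue bookkeeping: one netbuf reference per ring slot. */
struct txq_bookkeeping {
	struct netbuf *netbuf[NET_TX_RING_SIZE];
};

/* Called from xmit: remember which netbuf occupies slot 'id'. */
static void tx_slot_fill(struct txq_bookkeeping *txq, unsigned int id,
			 struct netbuf *pkt)
{
	assert(id < NET_TX_RING_SIZE);
	txq->netbuf[id] = pkt;
}

/* Called from the tx gc loop: detach and return the netbuf kept for
 * slot 'id' so the caller can free it. */
static struct netbuf *tx_slot_release(struct txq_bookkeeping *txq,
				      unsigned int id)
{
	struct netbuf *nb;

	assert(id < NET_TX_RING_SIZE);
	nb = txq->netbuf[id];
	txq->netbuf[id] = NULL;
	return nb;
}
```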

>>>>>>> +
>>>>>>> +            count++;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        txq->ring.rsp_cons = prod;
>>>>>>> +
>>>>>>> +        RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do);
>>>>>>> +    } while (more_to_do);
>>>>>>> +
>>>>>>> +    return count;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int netfront_xmit(struct uk_netdev *n,
>>>>>>> +        struct uk_netdev_tx_queue *txq,
>>>>>>> +        struct uk_netbuf *pkt)
>>>>>>> +{
>>>>>>> +    struct netfront_dev *nfdev;
>>>>>>> +    unsigned long flags;
>>>>>>> +    uint16_t id;
>>>>>>> +    RING_IDX req_prod;
>>>>>>> +    netif_tx_request_t *tx_req;
>>>>>>> +    int notify;
>>>>>>> +    int status = 0, count;
>>>>>>> +
>>>>>>> +    UK_ASSERT(n != NULL);
>>>>>>> +    UK_ASSERT(txq != NULL);
>>>>>>> +    UK_ASSERT(pkt != NULL);
>>>>>>> +    UK_ASSERT(pkt->len < PAGE_SIZE);
>>>>>>> +
>>>>>>> +    nfdev = to_netfront_dev(n);
>>>>>>> +
>>>>>>> +    /* get request id */
>>>>>>> +    uk_semaphore_down(&txq->sem);
>>>>>>> +    local_irq_save(flags);
>>>>>>> +    id = get_id_from_freelist(txq->freelist);
>>>>>>
>>>>>> What happens when we are out of space and the ring is full?
>>>>>> The xmit function should be non-blocking so that we can use it in
>>>>>> polling mode. We should return in such a case that the transmission
>>>>>> queue is currently full...
>>>>>>
>>>>>
>>>>> Ack.
>>>>>
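
To make the non-blocking behavior concrete, a simplified, self-contained
sketch (all names are stand-ins rather than the real uknetdev API, and
the status bit value is made up): when the ring is full, attempt a
clean-up pass and, if that frees nothing, return without the SUCCESS bit
instead of sleeping on the semaphore.

```c
#include <assert.h>

/* Hypothetical status bit, mirroring how the driver ORs
 * UK_NETDEV_STATUS_SUCCESS into its return value. */
#define TX_STATUS_SUCCESS 0x1

struct txq_state {
	int free_slots; /* remaining free request slots on the tx ring */
};

/* Stand-in for network_tx_buf_gc(): reclaim completed slots and
 * return how many were freed (stubbed to 0 in this sketch). */
static int tx_reclaim(struct txq_state *txq)
{
	(void)txq;
	return 0;
}

/* Non-blocking xmit skeleton: never sleep on a full ring; let the
 * caller retry later. */
static int xmit_nonblocking(struct txq_state *txq)
{
	if (txq->free_slots == 0 && tx_reclaim(txq) == 0)
		return 0; /* queue full: no SUCCESS bit */

	txq->free_slots--;
	/* ... fill the request, push it, notify the backend ... */
	return TX_STATUS_SUCCESS;
}
```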
>>>>>>> +    local_irq_restore(flags);
>>>>>>> +
>>>>>>> +    /* get request */
>>>>>>> +    req_prod = txq->ring.req_prod_pvt;
>>>>>>> +    tx_req = RING_GET_REQUEST(&txq->ring, req_prod);
>>>>>>> +
>>>>>>> +    /* setup grant for buffer data */
>>>>>>> +    txq->gref[id] = tx_req->gref =
>>>>>>> +        gnttab_grant_access(nfdev->xendev->otherend_id,
>>>>>>> +            virt_to_mfn(pkt->data), 1);
>>>>>>> +    UK_ASSERT(tx_req->gref != GRANT_INVALID_REF);
>>>>>>> +
>>>>>>> +    tx_req->offset = 0;
>>>>>>> +    tx_req->size = (uint16_t) pkt->len;
>>>>>>> +    tx_req->flags = 0;
>>>>>>> +    tx_req->id = id;
>>>>>>> +
>>>>>>> +    txq->ring.req_prod_pvt = req_prod + 1;
>>>>>>> +    wmb(); /* Ensure backend sees requests */
>>>>>>> +
>>>>>>> +    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify);
>>>>>>> +    if (notify)
>>>>>>> +        notify_remote_via_evtchn(txq->evtchn);
>>>>>>> +
>>>>>>> +    status |= UK_NETDEV_STATUS_SUCCESS;
>>>>>>> +
>>>>>>> +    /* some cleanup */
>>>>>>> +    local_irq_save(flags);
>>>>>>> +    count = network_tx_buf_gc(txq);
>>>>>>
>>>>>> The clean-up should happen before enqueuing the given packet to the
>>>>>> transmit queue, it is the first thing of xmit(). Since we anyway
>>>>>> need to
>>>>>> do it and we do not have an advantage by delaying this operation.
>>>>>
>>>>> Of course there is an advantage. You push the packets in order for
>>>>> them
>>>>> to be processed ASAP by the backend and deal with cleaning up
>>>>> afterwards. It's a kind of parallelization. Otherwise you would delay
>>>>> the transmissions.
>>>>>
>>>>>> We can also deal much better with full rings. Otherwise we
>>>>>> unnecessarily drop packets for transmission although there would
>>>>>> have been space when clean-up finished. After calling the clean-up
>>>>>> you can check if there is space on the ring and return if we run
>>>>>> out of it (add an `unlikely` to the condition so that we speed up
>>>>>> the successful sending case).
>>>>>>
>>>>>
>>>>> The only scenario where this would solve anything is when the ring
>>>>> is full and, just before you decide to send a new packet, the
>>>>> backend starts processing your previously transmitted packets,
>>>>> leaving you at least one free slot. There are 2 unlikely events
>>>>> here: first, the ring is full, and second, the backend starts to
>>>>> process again when the ring gets full (for the latter event, it's
>>>>> unlikely that the backend will wake up given that it couldn't
>>>>> process 256 transmitted packets, where 256 is the current number of
>>>>> slots). And for this very unlikely context you would introduce a
>>>>> delay for all the transmissions if we followed your suggestion.
>>>>>
>>>>> Anyway, both Linux and FreeBSD do the cleanup after pushing the
>>>>> requests, so I don't think we have much more to say about this.
>>>>>
>>>>
>>>> Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver:
>>>>
>>>> http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230
>>>>
>>>>
>>>>
>>>> They clean the slots before sending. I derived my conclusion from
>>>> there. However, I noticed now that there is one minor difference:
>>>> they do both - cleanup and enqueuing - before they notify the card
>>>> about changes on the ring. In that case it does not add any delay if
>>>> you swap the order of cleanup and enqueuing. In your case this is
>>>> probably different because you can clean up without notifying. Here,
>>>> I agree with you that we add unnecessary xmit delays by swapping the
>>>> order.
>>>>
>>>
>>> ixgbe is a native driver, so there is no backend driver in that case.
>>> It's a different scenario.
>>
>> You could consider the card itself as a backend driver, but fine.
>> There is an ABI difference between ixgbe and netback.
>>
>>>
>>>> However, and in general, having a full ring is not an uncommon or
>>>> unlikely case. In my experience, this can already happen with a
>>>> single TCP connection if your network stack is feeding next packets
>>>> fast enough and if the current TCP window allows sending them. It
>>>> also depends on when the driver domain is scheduled; on which CPU
>>>> cores your netfront domain and the driver domain sit; and even on
>>>> the underlying physical device that might be busy with other traffic.
>>>>
>>>
>>> What I tried to say with the event probabilities explanation is that
>>> you add delay for the common case in order to remove delay for the
>>> much less probable corner case.
>>
>> I got that and agreed.
>>
>>>
>>>> Another thing we have in mind for uknetdev - and this is probably
>>>> different from Linux/BSD's implementation - is to change the xmit
>>>> function to do batching (as DPDK does). Sharan could push virtio-net
>>>> performance further with this. You are able to reduce the number of
>>>> notifies to the backend because you enqueue multiple packets at once,
>>>> and due to the locality of batching you can utilize CPU caches way
>>>> better.
>>>>
>>>> What about this idea for netfront: We could first check for
>>>> available space and - if needed - clean up just as many extra slots
>>>> as we need to complete the transmission (for now this is anyway just
>>>> one slot). Then we proceed with enqueuing and do the cleanup of the
>>>> rest of the slots after we have notified the backend. What do you
>>>> think?
>>>>
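
A minimal sketch of the order proposed above (simplified and
self-contained; all names here are hypothetical stand-ins, not the
actual driver code): reclaim only if a slot is needed, enqueue and
notify as early as possible, then run the bulk clean-up off the
transmit latency path.

```c
#include <assert.h>

struct txq_hybrid {
	int free_slots; /* free request slots on the tx ring */
	int gc_calls;   /* counts clean-up passes, for illustration */
};

/* Stand-in for the tx clean-up; returns number of slots reclaimed
 * (stubbed to 0 here). */
static int reclaim_slots(struct txq_hybrid *txq)
{
	txq->gc_calls++;
	return 0;
}

/* (1) reclaim only if we need a slot, (2) enqueue and notify early,
 * (3) deferred bulk clean-up afterwards. */
static int xmit_hybrid_order(struct txq_hybrid *txq)
{
	if (txq->free_slots == 0 && reclaim_slots(txq) == 0)
		return 0; /* still full: report it, don't block */

	txq->free_slots--;
	/* ... enqueue request, push to the ring, notify the backend ... */

	reclaim_slots(txq); /* deferred bulk clean-up */
	return 1;
}
```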
>>>
>>> This will stay just as it is for the current patches. Feel free to
>>> change anything after upstreaming these patches.
>>
>> No, it won't stay like this. At least, you have to call the clean-up
>> when the ring is full before you leave the non-blocking function.
>> Otherwise you end up never getting space back on a full ring, but I
>> think this is clear.
>>
>>>
>>>>>
>>>>> Btw, I'm waiting for reviews on the other patches as well before I
>>>>> send the v5.
>>>>
>>>> Yes, sure.
>>>>
>>>>>
>>>>>
>>>>>>> +    local_irq_restore(flags);
>>>>>>> +
>>>>>>> +    status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0;
>>>>>>> +
>>>>>>> +    return status;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq,
>>>>>>>             struct uk_netbuf *netbuf)
>>>>>>>     {
>>>>>>> @@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device
>>>>>>> *xendev)
>>>>>>>         }
>>>>>>>           /* register netdev */
>>>>>>> +    nfdev->netdev.tx_one = netfront_xmit;
>>>>>>>         nfdev->netdev.ops = &netfront_ops;
>>>>>>>         rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator,
>>>>>>> DRIVER_NAME);
>>>>>>>         if (rc < 0) {
>>>>>>>
>>>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Simon
>>>>
> 


