[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Minios-devel] [UNIKRAFT PATCH] lib/uknetdev: Unikraft Network API



Hi Razvan,

I commented inline.

On 18.07.2018 13:53, Razvan Cojocaru wrote:
Hey Simon,

I really appreciate your feedback, and I've added some notes to your comments below.

I'll post a new version, integrating your suggestions and adding some more
comments where it seems that the API wasn't clear enough.


În lun., 16 iul. 2018 la 11:54, Simon Kuenzer <simon.kuenzer@xxxxxxxxx <mailto:simon.kuenzer@xxxxxxxxx>> a scris:

    Hi Razvan,

    thanks a lot for your patch. It is a good start for providing a
    low-level & driver-independent network interface. See my comments
    inline:

    On 11.07.2018 16:39, Razvan Cojocaru wrote:
     > Introduces the Unikraft Network API, that acts as a generalised
     > interface between network drivers and network stack implementations
     > (or low level networking applications).
     >
     > Using the function definitions in netdev_core.h, the driver should
     > implement the functions in uk_netdev_opts and fill in the fields
     > from uk_netdev and uk_netdev_data.
     > The user-facing part of the API is in netdev.h. The network stack
     > glue code can use these functions to configure network devices, as
     > well as send/receive packets.
     >
     > Driver modes allow for explicit configuration of polling/interrupt
     > mechanisms.
     > The driver marks supported capabilities in the supported_modes flags,
     > while the user would select one of these supported modes when
     > setting up the driver.
     >
     > Inspired from from DPDK RTE Ethernet API.
     > IP utility functions taken from LWIP.

    Hum... do we really want to include the IP utility functions?
    I am rather expecting that TCP/IP stacks provide anyways their own
    and I
    would use them instead. It may avoid parsing addresses two times
    because
    data type incompatibilities. For the rare cases you do not have a
    stack,
    I would add a new library for IP conversions that provides just your
    utilities from lwIP.

    So, I was thinking if it may be better to provide a query interface
    that
    returns you the pointer to the actual data field. Since we do not know
    how the various drivers represent the extra configuration data, I would
    provide multiple variants for each type (e.g., a raw one and a string
    one). However, the driver would only return those that it actually has
    and would not parse them by its own.

    enum uk_netdev_econf_type {
             IPv4ADDR_INT,  /**< IPv4 address as raw int (4 bytes) */
             IPv4ADDR_STR,  /**< IPv4 address as null-terminated string */
             IPv4MASK_INT,  /**< IPv4 mask as raw int (4 bytes) */
             IPv4MASK_STR,  /**< IPv4 address as null-terminated string */
             IPv4GW_RAW,    /**< and so on... ;-) */
             IPv4GW_STR,
             IPv4DNS0_RAW,
             IPv4DNS0_STR,

             /*
              * This list is extensible in the future without needing
              * the drivers to adopt
              */
    }

    static inline
    const void *uk_netdev_econf_get(struct uk_netdev *dev,
                                      enum uk_netdev_econf_type econf)
    {
             if (!dev->econf_cb)
                     return NULL; /* driver does not provide
                                     any extra configuration */
             return dev->deconf_cb(dev, econf);
    }




    /**
       * Implemented by each driver
       */
    static const void *netfront_econf_get(struct uk_netdev *dev,
                                            enum uk_netdev_econf_type econf)
    {
             struct uk_netfront *nf;

             UK_ASSERT(dev->initialized);
             nf = _netdev_to_netfront(dev);
             UK_ASSERT(nf);

             switch (econf) {
             case IPv4ADDR_STR:
                     return nf->xenstore_econf.ipv4addr;
             case IPv4MASK_STR:
                     return nf->xenstore_econf.ipv4mask;
             case IPv4GW_STR:
                     return nf->xenstore_econf.ipv4gw;
             default:
                     break;
             }

             /* type not supported */
             return NULL;
    }

    What do you think?


Seems reasonable. Even though it adds some complexity on the user side,
it may be worth it in order to get rid of the IP utils.
I'll add it in the next version.

Since this is a quiet low-level interface to connect a stack with a driver, I think the introduced complexity is worth as long as it reduces overhead of the overall code (e.g., avoiding double parsing of IP addresses into different formats).


     >
     > Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > ---
     >   lib/Config.uk                         |   1 +
     >   lib/Makefile.uk                       |   1 +
     >   lib/uknetdev/Config.uk                |   7 +
     >   lib/uknetdev/Makefile.uk              |   7 +
     >   lib/uknetdev/include/uk/ip_addr.h     |  92 +++++++
     >   lib/uknetdev/include/uk/netdev.h      | 303 +++++++++++++++++++++++
     >   lib/uknetdev/include/uk/netdev_core.h | 308 +++++++++++++++++++++++
     >   lib/uknetdev/ip_addr.c                | 447
    ++++++++++++++++++++++++++++++++++
     >   lib/uknetdev/netdev.c                 | 182 ++++++++++++++
     >   9 files changed, 1348 insertions(+)
     >   create mode 100644 lib/uknetdev/Config.uk
     >   create mode 100644 lib/uknetdev/Makefile.uk
     >   create mode 100644 lib/uknetdev/include/uk/ip_addr.h
     >   create mode 100644 lib/uknetdev/include/uk/netdev.h
     >   create mode 100644 lib/uknetdev/include/uk/netdev_core.h
     >   create mode 100644 lib/uknetdev/ip_addr.c
     >   create mode 100644 lib/uknetdev/netdev.c
     >
     > diff --git a/lib/Config.uk b/lib/Config.uk
     > index e438603..003bd4f 100644
     > --- a/lib/Config.uk
     > +++ b/lib/Config.uk
     > @@ -37,3 +37,4 @@ source "lib/uklock/Config.uk"
     >   source "lib/ukmpi/Config.uk"
     >   source "lib/ukswrand/Config.uk"
     >   source "lib/ukbus/Config.uk"
     > +source "lib/uknetdev/Config.uk"
     > diff --git a/lib/Makefile.uk b/lib/Makefile.uk
     > index 40c65d0..bcf22f3 100644
     > --- a/lib/Makefile.uk
     > +++ b/lib/Makefile.uk
     > @@ -19,3 +19,4 @@ $(eval $(call
    _import_lib,$(CONFIG_UK_BASE)/lib/vfscore))
     >   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/uklock))
     >   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/ukmpi))
     >   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/ukbus))
     > +$(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/uknetdev))
     > diff --git a/lib/uknetdev/Config.uk b/lib/uknetdev/Config.uk
     > new file mode 100644
     > index 0000000..d1acdc0
     > --- /dev/null
     > +++ b/lib/uknetdev/Config.uk
     > @@ -0,0 +1,7 @@
     > +menuconfig LIBUKNETDEV
     > +     bool "uknetdev: Network driver interface"
     > +     default n
     > +     select LIBUKALLOC
     > +
     > +if LIBUKNETDEV
     > +endif
     > diff --git a/lib/uknetdev/Makefile.uk b/lib/uknetdev/Makefile.uk
     > new file mode 100644
     > index 0000000..d7502d8
     > --- /dev/null
     > +++ b/lib/uknetdev/Makefile.uk
     > @@ -0,0 +1,7 @@
     > +$(eval $(call addlib_s,libuknetdev,$(CONFIG_LIBUKNETDEV)))
     > +
     > +CINCLUDES-$(CONFIG_LIBUKNETDEV)              +=
    -I$(LIBUKNETDEV_BASE)/include
     > +CXXINCLUDES-$(CONFIG_LIBUKNETDEV)            +=
    -I$(LIBUKNETDEV_BASE)/include
     > +
     > +LIBUKBUS_SRCS-y += $(LIBUKNETDEV_BASE)/netdev.c
     > +LIBUKBUS_SRCS-y += $(LIBUKNETDEV_BASE)/ip_addr.c
     > diff --git a/lib/uknetdev/include/uk/ip_addr.h
    b/lib/uknetdev/include/uk/ip_addr.h
     > new file mode 100644
     > index 0000000..93ff8ac
     > --- /dev/null
     > +++ b/lib/uknetdev/include/uk/ip_addr.h
     > @@ -0,0 +1,92 @@
     > +/* SPDX-License-Identifier: BSD-3-Clause */
     > +/*
     > + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > + *
     > + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All
    rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without
     > + * modification, are permitted provided that the following
    conditions
     > + * are met:
     > + *
     > + * 1. Redistributions of source code must retain the above copyright
     > + *    notice, this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright
     > + *    notice, this list of conditions and the following
    disclaimer in the
     > + *    documentation and/or other materials provided with the
    distribution.
     > + * 3. Neither the name of the copyright holder nor the names of its
     > + *    contributors may be used to endorse or promote products
    derived from
     > + *    this software without specific prior written permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
    CONTRIBUTORS "AS IS"
     > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE
     > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    PARTICULAR PURPOSE
     > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
    CONTRIBUTORS BE
     > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    EXEMPLARY, OR
     > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF
     > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
    OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE)
     > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
    ADVISED OF THE
     > + * POSSIBILITY OF SUCH DAMAGE.
     > + *
     > + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
     > + */
     > +
     > +/*
     > + * Copyright (c) 2001-2004 Swedish Institute of Computer Science.
     > + * All rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without modification,
     > + * are permitted provided that the following conditions are met:
     > + *
     > + * 1. Redistributions of source code must retain the above
    copyright notice,
     > + *    this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright notice,
     > + *    this list of conditions and the following disclaimer in
    the documentation
     > + *    and/or other materials provided with the distribution.
     > + * 3. The name of the author may not be used to endorse or
    promote products
     > + *    derived from this software without specific prior written
    permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
    EXPRESS OR IMPLIED
     > + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
    WARRANTIES OF
     > + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT
     > + * SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL,
     > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT
     > + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
    PROFITS; OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE) ARISING
     > + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
    OF THE POSSIBILITY
     > + * OF SUCH DAMAGE.
     > + *
     > + * This file is part of the lwIP TCP/IP stack.
     > + *
     > + * Author: Adam Dunkels <adam@xxxxxxx <mailto:adam@xxxxxxx>>
     > + *
     > + */
     > +
     > +#ifndef __UK_IP_ADDR__
     > +#define __UK_IP_ADDR__
     > +
     > +#include <stdint.h>
     > +#include <stddef.h>
     > +#include <uk/assert.h>
     > +
     > +typedef union {
     > +     uint32_t ipv4_addr;     /**< IPv4 address in big endian. */
     > +     uint32_t ipv6_addr[4];  /**< IPv6 address in big endian. */
     > +} uk_ip_addr_t;
     > +
     > +typedef struct {
     > +     uk_ip_addr_t ip;
     > +     uk_ip_addr_t gateway;
     > +     uk_ip_addr_t netmask;
     > +} uk_ip_info_t;

    See my suggestion ahead. I am concerned that this would not fit to
    every
    driver (for instance, what about DNS addresses or VLAN tags)?


Replacing with query interface, got it.

     > +
     > +
     > +int uk_ip4addr_aton(const char *cp, uk_ip_addr_t *addr);
     > +char *uk_ip4addr_ntoa(const uk_ip_addr_t *addr, char *buf, int
    buflen);
     > +
     > +int uk_ip6addr_aton(const char *cp, uk_ip_addr_t *addr);
     > +char *uk_ip6addr_ntoa_r(const uk_ip_addr_t *addr, char *buf, int
    buflen);
     > +
     > +#endif //__UK_IP_ADDR__
     > diff --git a/lib/uknetdev/include/uk/netdev.h
    b/lib/uknetdev/include/uk/netdev.h
     > new file mode 100644
     > index 0000000..3128ffe
     > --- /dev/null
     > +++ b/lib/uknetdev/include/uk/netdev.h
     > @@ -0,0 +1,303 @@
     > +/* SPDX-License-Identifier: BSD-3-Clause */
     > +/*
     > + * Authors: Simon Kuenzer <simon.kuenzer@xxxxxxxxx
    <mailto:simon.kuenzer@xxxxxxxxx>>
     > + *          Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > + *
     > + * Copyright (c) 2010-2017 Intel Corporation
     > + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All
    rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without
     > + * modification, are permitted provided that the following
    conditions
     > + * are met:
     > + *
     > + * 1. Redistributions of source code must retain the above copyright
     > + *    notice, this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright
     > + *    notice, this list of conditions and the following
    disclaimer in the
     > + *    documentation and/or other materials provided with the
    distribution.
     > + * 3. Neither the name of the copyright holder nor the names of its
     > + *    contributors may be used to endorse or promote products
    derived from
     > + *    this software without specific prior written permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
    CONTRIBUTORS "AS IS"
     > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE
     > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    PARTICULAR PURPOSE
     > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
    CONTRIBUTORS BE
     > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    EXEMPLARY, OR
     > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF
     > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
    OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE)
     > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
    ADVISED OF THE
     > + * POSSIBILITY OF SUCH DAMAGE.
     > + *
     > + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
     > + */

    Please add a note that you derived this header form DPDK (and which
    file
    from there).


Will do, I thought it would be enough to just add the Intel copyright.

In general you are right. I just want that we keep track where we got things from so that it may be easier later to understand why things are the way they are and where the came from.


     > +
     > +#ifndef __UK_NETDEV__
     > +#define __UK_NETDEV__
     > +
     > +/**
     > + * Unikraft Network API
     > + *
     > + * The Unikraft NET API provides a generalised interface between
    Unikraft

    s/generalised/generalized/


Generalised is the British form, but I'll stick to the US one if that is the current norm :)

Good point. ;-) Yeah, I followed more the US style, so far.


     > + * drivers and network stack implementations or low-level
    network applications.
     > + *
     > + * Most NET API functions take as parameter a reference to the
    corresponding
     > + * Unikraft Network Device (struct uk_netdev) which can be
    obtained with a call
     > + * to uk_netdev_get(). The network application should store this
    reference and
     > + * use it for all subsequent API calls.
     > + *
     > + * The functions exported by the Unikraft NET API to setup a device
     > + * designated by its ID must be invoked in the following order:
     > + *     - uk_netdev_configure()
     > + *     - uk_netdev_tx_queue_setup()
     > + *     - uk_netdev_rx_queue_setup()
     > + *     - uk_netdev_start()
     > + * If the network application wants to change configurations
    (call queue_setup
     > + * or configure again), it must call rte_eth_dev_stop() first to
    stop the
     > + * device and then do the reconfiguration before calling
    rte_eth_dev_start()
     > + * again. The transmit and receive functions should not be
    invoked when the
     > + * device is stopped.

    s/rte_eth_dev/uk_netdev_/


Replacing...

     > + */
     > +
     > +
     > +#include <stddef.h>
     > +#include <stdint.h>
     > +#include <stdio.h>
     > +#include <errno.h>
     > +#include <uk/list.h>
     > +#include <uk/alloc.h>
     > +#include <uk/print.h>
     > +#include <uk/assert.h>
     > +#include "netdev_core.h"
     > +
     > +#ifdef __cplusplus
     > +extern "C" {
     > +#endif
     > +
     > +/**
     > + * Get the number of available Unikraft Network devices.
     > + *
     > + * @return
     > + *   - (unsigned int): number of network devices.
     > + */
     > +unsigned int uk_netdev_count(void);
     > +
     > +/**
     > + * Get a reference to a Unikraft Network Device, based on its ID.
     > + * This reference should be saved by the application and used
    for subsequent
     > + * API calls.
     > + *
     > + * @param id
     > + *   The identifier of the Unikraft network device to configure.
     > + * @return
     > + *   - NULL: device not found in list
     > + *   - (struct uk_netdev *): reference to be passed to API calls
     > + */
     > +struct uk_netdev *uk_netdev_get(unsigned int id);
     > +
     > +/**
     > + * Configure an Unikraft network device.
     > + * This function must be invoked first before any other function
    in the
     > + * Unikraft NET API. This function can also be re-invoked when a
    device is
     > + * in the stopped state.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param eth_conf
     > + *   The pointer to the configuration data to be used for the
    Unikraft
     > + *   network device.
     > + *
     > + *   Embedding all configuration information in a single data
    structure
     > + *   is the more flexible method that allows the addition of new
    features
     > + *   without changing the syntax of the API.
     > + * @return
     > + *   - 0: Success, device configured.
     > + *   - <0: Error code returned by the driver configuration function.
     > + */
     > +int uk_netdev_configure(struct uk_netdev *dev,
     > +             const struct uk_netdev_conf *conf);
     > +
     > +/**
     > + * Start a Network device.
     > + *
     > + * The device start step is the last one and consists of setting
    the configured
     > + * offload features and in starting the transmit and the receive
    units of the
     > + * device.
     > + * On success, all basic functions exported by the Unikraft NET
    API (link
     > + * status, receive/transmit, and so on) can be invoked.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @return
     > + *   - 0: Success, Unikraft network device started.
     > + *   - <0: Error code of the driver device start function.
     > + */
     > +int uk_netdev_start(struct uk_netdev *dev);
     > +
     > +/**
     > + * Stop an Unikraft network device. The device can be restarted
    with a call to
     > + * rte_eth_dev_start()

    uk_netdev_start()

     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + */
     > +void uk_netdev_stop(struct uk_netdev *dev);
     > +
     > +/**
     > + * Close a stopped Unikraft network device.
     > + * The function frees all resources except for needed by the
    closed state.

    What does this mean exactly "except for needed by the closed state"?


It's mostly up to the driver, but it could also mean that you keep the netdev in the list
so you don't have to poll the bus again if you would want to reconfigure.

Got it. Maybe this could be re-phrased a bit? The device returns to the state "configured", right? Or is the device in unconfigured state after the stop call? I agree that the device should be kept in the netdev list (to get it up again later ;-) ).


     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + */
     > +void uk_netdev_close(struct uk_netdev *dev);
     > +
     > +/**
     > + * Set the default MAC address.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param mac_addr
     > + *   New default MAC address.
    Does this mean we support multiple MAC addresses and one just get
    set as
    default? This might be actually possible for some NICs but I would just
    support only a single MAC address in the driver.


 We don't support multiple MACs for now. That's just a DPDK remnant I forgot to remove.

Sounds good.


     > + * @return
     > + *   - (0) if successful, or *mac_addr* didn't exist.
     > + *   - (-ENOTSUP) if hardware doesn't support.
     > + *   - (-ENODEV) if *id* invalid.
    What is an invalid id?


Removed. Forgot it there from previous version where we used netdev id's instead of references.

     > + *   - (-EINVAL) if MAC address is invalid.
     > + */
     > +int uk_netdev_mac_addr_set(struct uk_netdev *dev, struct
    ether_addr *mac_addr);

    Where is the interface to retrieve the current MAC address from the
    interface?

    I would also add a comment for promiscuous mode on/off so that we could
    implement this later. I would not suggest that a MAC address with just
    zeros or FFs would set this mode.


I'll add the DPDK methods for promiscuous mode on/off and some flag in the data. It won't be used in the first versions of the virtio/xen drivers, but still in the NET API for
future reference.

Actually, I agree, and it is fine to add stubs for promiscuous mode. They should return an error code if the driver does not support it (e.g., -ENOTSUP when a NULL pointer is registered as function callback).


     > +
     > +/**
     > + * Function that can be used by the driver to set an IP address
    to the network
     > + * device. The application or network stack is not forced to use
    this specific
     > + * IP information and can set another address using other methods.
     > + * Useful when the driver has access to more information,
    usually platform
     > + * specific.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param ip_info
     > + *   - (uk_ip_info_t *): IP layer information such as IP,
    netmask, gateway
     > + * @return
     > + *   - (0) if successful.
     > + *   - (-EINVAL) if IP information is invalid.
     > + */
     > +int uk_netdev_ip_set(struct uk_netdev *dev, uk_ip_info_t *ip_info);
     > +

    I am not getting how this function would be useful. I expect that
    network stacks are handling this by their own anyways. Why should I
    tell
    a network card driver which IP/IPs I am oging to use?


It was a function meant to be called by the drivers (optionally).
Will remove it along with the IP utils and it will be replaced by the query interface.

Hum... It looked like it as part of the user uk_netdev API. But this one should go away with the query interface, right? ;-)


     > +/**
     > + * Read the IP information set by the driver.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @return ip_info
     > + *   - (uk_ip_info_t *): IP layer information such as IP,
    netmask, gateway
     > + *   - (NULL): if the driver didn't set any IP information.
     > + */
     > +uk_ip_info_t *uk_netdev_ip_get(struct uk_netdev *dev);

    I guess it is filled out with information found on the Xenstore.
    What if
    we are out of mem? Do you return an ERRPTR? As mentioned in the header,
    I would prefer a query interface to avoid possible double parsing of
    values.


Same as for the query interface, the whole IP mechanism would have limited error handling since it would be pretty much optional. It would just return NULL for anything going wrong.

Hum... Better to return an ERRPTR for cases that are going wrong. As an API user it may matter to know if EINVAL or ENOMEM was the reason.


     > +
     > +/**
     > + * Change the MTU of an Unikraft network device.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param mtu
     > + *   A uint16_t for the MTU to be applied.
     > + * @return
     > + *   - (0) if successful.
     > + *   - (-ENOTSUP) if operation is not supported.
     > + *   - (-ENODEV) if *id* invalid.

    Which id is invalid? Better to do UK_ASSERT() for checking dev != NULL.


  Will remove.

     > + *   - (-EIO) if device is removed.
     > + *   - (-EINVAL) if *mtu* invalid.
     > + *   - (-EBUSY) if operation is not allowed when the device is
    running
     > + */
     > +int uk_netdev_mtu_set(struct uk_netdev *dev, uint16_t mtu);
     > +
     > +/**
     > + * Allocate and set up a receive queue for an Unikraft network
    device.
     > + *
     > + * The function handles setup of receive callback for
    interrupt-based modes.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param rx_queue_id
     > + *   The index of the receive queue to set up.
     > + *   The value must be in the range [0, nb_rx_queue - 1]
    previously supplied
     > + *   to rte_eth_dev_configure().

    Replace rte_eth_dev... I will not mention it anymore for the rest of
    the
    patch: just use search & replace. ;-)


Searching and replacing as we speak.

;-)


     > + * @param rx_conf
     > + *   The pointer to the configuration data to be used for the
    receive queue.
     > + *   NULL value is allowed, in which case default RX configuration
     > + *   will be used.
     > + *   The *rx_conf* structure contains an *rx_thresh* structure
    with the values
     > + *   of the Prefetch, Host, and Write-Back threshold registers
    of the receive
     > + *   ring.
     > + *   In addition it contains the hardware offloads features to
    activate using
     > + *   the DEV_RX_OFFLOAD_* flags.
     > + * @return
     > + *   - 0: Success, receive queue correctly set up.
     > + *   - -EIO: if device is removed.
     > + */
     > +int uk_netdev_rx_queue_setup(struct uk_netdev *dev, uint16_t
    rx_queue_id,
     > +             const struct uk_netdev_rxconf *rx_conf);
     > +

    DPDK provided the option to set the number of queues that should be
    configured (I think you removed it?). I would even add another function
    call that you could use to query how many are supported at most. The
    queue_id is otherwise difficult to guess.
    Alternatively, it is also fine to say that we support only a single rx
    queue and single tx queue for now, since we do not have SMP yet. In
    this
    case you should remove the queue_id parameter everywhere. ;-)


Queues are not actually supported as of this version. There are some elements there in order to be able to easily introduce them in future versions without changing
the API too much (keep the same number of parameters, etc).
I would not remove the queue_id, since if you want to later introduce queues and add another
param to most functions it would break any form of backwards compatibility.

If you prefer keeping the queue ID, I would add it everywhere where it makes sense. I prefer completeness on the API.


     > +/**
     > + * Allocate and set up a transmit queue for an Unikraft network
    device.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param tx_queue_id
     > + *   The index of the transmit queue to set up.
     > + *   The value must be in the range [0, nb_tx_queue - 1]
    previously supplied
     > + *   to rte_eth_dev_configure().
     > + * @param tx_conf
     > + *   The pointer to the configuration data to be used for the
    transmit queue.
     > + *   NULL value is allowed, in which case default TX configuration
     > + *   will be used.
     > + * @return
     > + *   - 0: Success, the transmit queue is correctly set up.
     > + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
     > + */
     > +int uk_netdev_tx_queue_setup(struct uk_netdev *dev, uint16_t
    tx_queue_id,
     > +             const struct uk_netdev_txconf *tx_conf);
     > +
     > +/**
     > + * Basic RX function.

    Maybe call it a compatibility function.
    How is it going to be implemented later?

    Knowing DPDK, I would rather expect a function like:

    uint16_t uk_netdev_rx_burst(struct uk_netdev *dev, uint16_t queue_id,
                        struct uk_mbuf** rx_pkts, uint16_t nb_pkts);

    and then having your uk_netdev_rx() as wrapper, maybe generalized for
    all network functions. Packet flags (e.g., checksum offloading)
    would be
    part of each uk_mbuf.

I agree that it would be a much more extensible option using mbufs, but I think this initial version is pretty complex as-is, even with these simplified versions
of TX/RX functions.
Adding mbufs would introduce a lot more complexity, and I think should be added
in a later version of the API.

I would suggest to introduce a simple mbuf with just size and payload pointer. Actually because of the same reason you want to keep the queue_id. ;-) I agree that the initial mbuf would be really simple, extra flags can be added later to the struct.


     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param data
     > + *   The data pointer where the packet will be placed by the driver.
     > + * @param max_len
     > + *   Maximum length of the packet.
     > + * @return
     > + *   - 0: No new packets
     > + *   - >0: Length of the received packet
     > + */
     > +uint16_t uk_netdev_rx(struct uk_netdev *dev, void *data,
    uint16_t max_len);

    How do I retrieve additional flags from the driver (e.g., TCP checksum
    is (not) okay or just partially calculated (VM-to-VM communication))? I
would prefer introducing a mbuf struct.
     > +
     > +/**
     > + * Basic TX function.
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + * @param data
     > + *   Raw packet data (including Ethernet headers) to be sent by
    the driver.
     > + * @param len
     > + *   The length of the packet.
     > + * @return
     > + */
     > +uint16_t uk_netdev_tx(struct uk_netdev *dev, void *data,
    uint16_t len);

    See my comments on uk_netdev_rx() also for the tx function.

     > +
     > +/**
     > + * Add a new Unikraft network device in the device list.
     > + * Should be called by the driver in the configuration step.
     > + *
     > + * @param dev
     > + *   The Unikraft Network Device.
     > + */
     > +void uk_netdev_register(struct uk_netdev *dev);
     > +
     > +#ifdef __cplusplus
     > +}
     > +#endif
     > +
     > +#endif //__UK_NETDEV__
     > diff --git a/lib/uknetdev/include/uk/netdev_core.h
    b/lib/uknetdev/include/uk/netdev_core.h
     > new file mode 100644
     > index 0000000..aec5cf5
     > --- /dev/null
     > +++ b/lib/uknetdev/include/uk/netdev_core.h
     > @@ -0,0 +1,308 @@
     > +/* SPDX-License-Identifier: BSD-3-Clause */
     > +/*
     > + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > + *
     > + * Copyright (c) 2017 Intel Corporation
     > + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All
    rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without
     > + * modification, are permitted provided that the following
    conditions
     > + * are met:
     > + *
     > + * 1. Redistributions of source code must retain the above copyright
     > + *    notice, this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright
     > + *    notice, this list of conditions and the following
    disclaimer in the
     > + *    documentation and/or other materials provided with the
    distribution.
     > + * 3. Neither the name of the copyright holder nor the names of its
     > + *    contributors may be used to endorse or promote products
    derived from
     > + *    this software without specific prior written permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
    CONTRIBUTORS "AS IS"
     > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE
     > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    PARTICULAR PURPOSE
     > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
    CONTRIBUTORS BE
     > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    EXEMPLARY, OR
     > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF
     > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
    OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE)
     > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
    ADVISED OF THE
     > + * POSSIBILITY OF SUCH DAMAGE.
     > + *
     > + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
     > + */
     > +
     > +#ifndef __UK_NETDEV_CORE__
     > +#define __UK_NETDEV_CORE__
     > +
     > +/**
     > + * Unikraft Network Device internal header.
     > + *
     > + * This header contains internal data types. But they are still
    part of the
     > + * public API because they are used by inline functions in the
    published API.
     > + *
     > + * The device data and operations are separated. This split
    allows the
     > + * function pointer and driver data to be per-process, while the
    actual
     > + * configuration data for the device is shared.
     > + */
     > +
     > +#include "ip_addr.h"
     > +
     > +
     > +#define NETDEV_MAX_QUEUES 10

    Where is this limit coming from?


I'll make the current limit 1 for now, to mark that we don't really support more queues
in this version.

Other question: Is this define currently used and needed? If not, take it out.


     > +#define NETDEV_NAME_MAX_LEN 64

    Why do we need a name? At least make this an menu-configurable optional
    parameter. You also may want to add a API function that returns the
    name
    set by the driver.


The name part is mostly the same as DPDK does it, and I think it could really help
in debugging purposes as humans tend to work better with names than ids.
Not having a name will obviously not break any functionality in the driver, worst
case you'll just end up getting some empty strings on the user-side.

I'll add an API function to return a NAME + ID string if you think that would be useful.

It is fine to have a name but I would say that this is an optional & configurable option (there should be an option in libuknetdev/Config.uk to enable names). The API function would probably return NULL if names are not supported.


     > +
     > +#define ETHER_ADDR_LEN 6 /**< Length of Ethernet address. */
     > +
     > +
     > +struct ether_addr {
     > +     uint8_t addr_bytes[ETHER_ADDR_LEN]; /**< Addr bytes in tx
    order */
     > +} __attribute__((__packed__));

    In <uk/essentials.h> we have a macro defintion for packed.


Replaced with macro.

     > +
     > +/**
     > + * A set of values to describe the possible states of an eth device.
     > + */
     > +enum uk_netdev_state {
     > +     UK_NETDEV_UNCONFIGURED = 0,
     > +     UK_NETDEV_CONFIGURED,
     > +     UK_NETDEV_RUNNING,
     > +};
     > +
     > +
     > +/**
     > + * @defgroup driver_mode Driver Receive Mode
     > + *
     > + * Driver modes provide a method of shifting complexity from
    driver to
     > + * network stack/application or otherwise.
     > + * Polling drivers would offer more control over TX/RX but with
    increased
     > + * complexity to network stack ports, while interrupt-based
    drivers would be
     > + * simpler to use for applications.
     > + * A driver should set implemented capabilities as
    supported_modes and the
     > + * network application requests the specific mode via
    requested_mode in the
     > + * receive queue setup configurations.
     > + * @{
     > + */
     > +
     > +/** Basic polling mode driver. Provides RX/TX functions and it's
    the user's
     > + * responsibility when/how to call them.
     > + */
     > +#define UK_NETDEV_MODE_POLLING   0x01U
     > +
     > +/** Interrupt/Event mode driver. The user defines a callback
    which is called
     > + * by the driver when a new packet is received.
     > + */
     > +#define UK_NETDEV_MODE_INTERRUPT 0x02U
     > +
     > +/** Mixed Polling/Interrupt mode. Hybrid mode that continuously
    polls the
     > + * driver, and after a number of unsuccessful polls enters in
    sleep mode
     > + * and enables interrupts. From a user perspective, similar to
    Interrupt
     > + * mode.
     > + */
     > +#define UK_NETDEV_MODE_HYBRID    0x04U

    Maybe you want an enum instead? You can't set all modes at the same
    time, right?
    Hum... I am still thinking that the modes requires drivers to implement
    policies and not just the mechanisms (e.g., the threshold when to
    decide
    to stop polling and witching to interrupt mode). Can't we add a
    functions to the API that disable and enable interrupts on a queue, so
    it is up to the uk_netdev programmer how the policy should look like? I
    would prefer something like:

       uk_netdev_rx_queue_disable_irq(struct uk_netdev *n, uint16_t
    queue_id);
       uk_netdev_rx_queue_enable_irq(struct uk_netdev *n, uint16_t
    queue_id);

       uk_netdev_tx_queue_disable_irq(struct uk_netdev *n, uint16_t
    queue_id);
       uk_netdev_tx_queue_enable_irq(struct uk_netdev *n,uint16_t queue_id);

    As part of bringing the device up, you would enable interrupts (if you
    registered a callback function). Default behavior should be interrupts
    having off.

    Those could be called by the network stack/application whenever needed.
    As part of lwIP driver code, you could then do there set the threshold
    between pooling and non-polling.

    Do you know how Linux or BSD solved this?


I'll remove the driver mode in the next version and just leave it up to the user
when to enable/disable interrupts.

I think we are not loosing any functionality by doing this. ;-)


     > +
     > +/**
     > + * @}
     > + */
     > +
     > +/**
     > + * A structure used to configure an Unikraft network device.
     > + */
     > +struct uk_netdev_conf {
     > +     uint8_t requested_mode;
     > +};
     > +
     > +UK_TAILQ_HEAD(uk_netdev_list, struct uk_netdev);
     > +
     > +#define UK_NETDEV_LIST_FOREACH(b)                    \
     > +     UK_TAILQ_FOREACH(b, &uk_netdev_list, next)
     > +
     > +/**
     > + * Function type used for RX packet processing packet callbacks.
     > + *
     > + * The callback function is called on RX with a packet that has
    been received
     > + * on the given device and queue.
     > + *
     > + * @param id
     > + *   The identifier of the device on which RX is being performed.
     > + * @param queue
     > + *   The queue on the Unikraft network device which is being
    used to receive
     > + *   the packets.
     > + * @param data
     > + *   Content of the received packet.
     > + * @param len
     > + *   Length of the packet data.
     > + * @return
     > + *   The number of packets returned to the user.
     > + */
     > +typedef void (*rx_callback_fn)(uint16_t id, uint16_t queue, void
    *data,
     > +                                                        uint16_t
    len);

    Hum, I wouldn't send the packet with the callback. I would just notify
    which interface it was (and which queue) and let the user call
    uk_netdev_rx() within the callback (or later) for emptying the queue:

    typedef void (*uk_netdev_rx_callback_fn)(struct uk_netdev *n,
                                              uint16_t queue);


Ok.

    Btw, as long as this function did not return, there should be no more
    interrupts fired by the NIC for the specific queue, right?

    I also miss a calback for TX that is fired when a transmission is
    completed. This would enable async TX which can make a difference on
    high traffic load. It is fine to add it later but then there should
    be a
    comment.


There is a pre-TX, post-RX callback mechanism in DPDK which I had commented-out.
I'm not sure if that is the thing you were referring.

Hum, I was actually only refering to post-TX and post-RX. I would even fire only a single interrupt when a burst of packets were sent - actually dependent on the number of pkts that were given with a burst call (but I am not sure how DPDK does this today). pre-TX, and pre-RX, we can add later if needed.


     > +
     > +/**
     > + * A structure used to configure an Unikraft network device.
     > + */

    Maybe you want to call them "struct uk_netdev_(rx|tx)queue_conf":


Will rename.

     > +struct uk_netdev_rxconf {
     > +     rx_callback_fn rx_cb;
     > +};
     > +
     > +/**
     > + * A structure used to configure an Unikraft network device.
     > + */
     > +struct uk_netdev_txconf {
     > +};
     > +
     > +
     > +typedef int  (*uk_netdev_configure_t)(struct uk_netdev *dev);
     > +/**< @internal Unikraft network device configuration. */
     > +
     > +typedef int  (*uk_netdev_start_t)(struct uk_netdev *dev);
     > +/**< @internal Function used to start a configured Unikraft
    network device. */
     > +
     > +typedef void (*uk_netdev_stop_t)(struct uk_netdev *dev);
     > +/**< @internal Function used to stop a configured Unikraft
    network device. */
     > +
     > +typedef void (*uk_netdev_close_t)(struct uk_netdev *dev);
     > +/**< @internal Function used to close a configured Unikraft
    network device. */
     > +
     > +typedef void (*uk_netdev_mac_addr_remove_t)(struct uk_netdev *dev);
     > +/**< @internal Remove MAC address*/
     > +
     > +typedef void (*uk_netdev_mac_addr_set_t)(struct uk_netdev *dev,
     > +             struct ether_addr *mac_addr);
     > +/**< @internal Set the MAC address */

    So, you support multiple MACs, right? ;-) If we restrict it to a single
    one (which is fine for now), you do not need the remove funciton.


Yes. Will remove it.

     > +
     > +typedef int (*uk_netdev_mtu_set_t)(struct uk_netdev *dev,
    uint16_t mtu);
     > +/**< @internal Set MTU. */
     > +
     > +typedef int (*uk_netdev_rx_queue_setup_t)(struct uk_netdev *dev,
     > +             uint16_t rx_queue_id, const struct uk_netdev_rxconf
    *rx_conf);
     > +/**< @internal Set up a receive queue of an Unikraft network
    device. */
     > +
     > +typedef int (*uk_netdev_tx_queue_setup_t)(struct uk_netdev *dev,
     > +             uint16_t tx_queue_id, const struct uk_netdev_txconf
    *tx_conf);
     > +/**< @internal Setup a transmit queue of an Unikraft network
    device. */
     > +
     > +typedef void (*uk_netdev_queue_release_t)(void *queue);
     > +/**< @internal Release memory resources allocated by given RX/TX
    queue. */
     > +
     > +typedef int (*uk_netdev_rx_enable_intr_t)(struct uk_netdev *dev,
     > +             uint16_t rx_queue_id);
     > +/**< @internal Enable interrupt of a receive queue of an
     > + * Unikraft network device.
     > + */

    Interesting, here it is... You do not need the mode then, right? ;-)


Consider the driver modes gone.

;-)


     > +
     > +typedef int (*uk_netdev_rx_disable_intr_t)(struct uk_netdev *dev,
     > +             uint16_t rx_queue_id);
     > +/**< @internal Disable interrupt of a receive queue of an
     > + * Unikraft network device.
     > + */
     > +
     > +
     > +typedef uint16_t (*uk_netdev_rx_t)(struct uk_netdev *dev, void
    *data,
     > +             uint16_t len);
     > +/**< @internal Retrieve one input packet from an Unikraft
    network device. */

    Do you think that you need this implemented by the driver as
    callback? I
    think we can provide sending/receiving of one packet by the generic
    layer of libuknetdev and base the implementation on
    uk_netdev_(rx|tx)_burst()


Implementing this in the generic layer of libuknetdev kills the whole purpose of it.
For some use-cases, you may not be that concerned with performance and for
compatibility sake, a one-packet-at-a-time approach might be enough.
This way, you could have a very simple interface for a basic driver and a quick
network stack port may use this basic rx|tx mechanism.

Obviously, for more complex systems you would want to use burst tx|rx, but for these stages that seems overkill and could slow down the whole development process.

I'm a fan of simple things that work :-)

I prefer the same thing. ;-) But you have a good point: Having an optimized & less complicated rx and tx function for a single packet provided by the driver may be benefitial for applications/stacks that only make use of this type of receiving/sending. Hum. Then let us provide only the single pkt tx/rx function as callback to the ops struct (or the other as discussed below) for now. We will add burst tx & rx, later. This may include adding generic wrappers that drivers could use if they implement only one of the both variants (both ways: burst->single, single->burst). But please use struct uk_mbuf for these API calls. What do you think?


     > +
     > +typedef uint16_t (*uk_netdev_tx_t)(struct uk_netdev *dev, void
    *data,
     > +             uint16_t len);
     > +/**< @internal Send one output packet to an Unikraft network
    device. */
     > +
     > +typedef uint16_t (*uk_netdev_rx_burst_t)(void *rxq,
     > +             void **rx_pkts, uint16_t nb_pkts);
     > +/**< @internal Retrieve input packets from a receive queue of an
     > + * Unikraft network device.
     > + */
     > +
     > +typedef uint16_t (*uk_netdev_tx_burst_t)(void *txq, void **tx_pkts,
     > +             uint16_t nb_pkts);
     > +/**< @internal Send output packets on a transmit queue of an
     > + * Unikraft network device.
     > + */
     > +
     > +

Please remove then any declaration that we do not support for now
(e.g., burst tx, rx functions).

     > +/**
     > + * @internal A structure containing the functions exported by a
    driver.
     > + */
     > +struct uk_netdev_ops {
     > +     uk_netdev_configure_t      dev_configure; /**< Configure
    device. */
     > +     uk_netdev_start_t          dev_start;     /**< Start device. */
     > +     uk_netdev_stop_t           dev_stop;      /**< Stop device. */
     > +     uk_netdev_close_t          dev_close;     /**< Close device. */
     > +
     > +     uk_netdev_mac_addr_set_t   mac_addr_set;  /**< Set a MAC
    address. */
     > +     uk_netdev_mtu_set_t        mtu_set;       /**< Set MTU. */
     > +
     > +     /**< Set up device RX queue. */
     > +     uk_netdev_rx_queue_setup_t  rx_queue_setup;
     > +     /**< Release RX queue. */
     > +     uk_netdev_queue_release_t   rx_queue_release;
     > +
     > +     /**< Set up device TX queue. */
     > +     uk_netdev_tx_queue_setup_t  tx_queue_setup;
     > +     /**< Release TX queue. */
     > +     uk_netdev_queue_release_t   tx_queue_release;
     > +};
     > +
     > +/**
     > + * @internal
     > + * The data part, with no function pointers, associated with each
     > + * network device.
     > + *
     > + * This structure is safe to place in shared memory to be common
    among different
     > + * processes in a multi-process configuration.
     > + */
     > +struct uk_netdev_data {
     > +     char name[NETDEV_NAME_MAX_LEN]; /**< Unique identifier name */
     > +     uint16_t id;           /**< Device [external] port
    identifier. */

    Is this needed?

     > +
     > +     void *rx_queue; /**< Pointer to RX queue. */
     > +     void *tx_queue; /**< Pointer to TX queue. */

    Shouldn't this be part of the drivers internal representation? The
    callbacks would handle this.


This is stubish since we don't really support multi-queues at the moment.
I'll remove them if you think they seem weird. I just added them from DPDK
for future reference.

Hum, good point. They probably used it to reference to the tx_queue and rx_queue structures directly as performance optimization. Then keep it for now as is. We will collect our experience with it, it makes porting of DPDK drivers easier, and overall it may also get clearer when seeing an actual driver implementation.


     > +
     > +     /** Driver mode (@see @ref driver_mode). Requested by user. */
     > +     uint8_t driver_mode;
     > +
     > +     uk_ip_info_t *ip_info;
    See my comment ahead about a query function. The fields would be
    received from the drivers internal representation.

     > +
     > +     struct ether_addr mac_addr;     /**< Device Ethernet Link
    address. */

    Hum... better to use a callback instead so that the driver stores and
    handles it internally? What do you think?

     > +     uint16_t mtu;                   /**< Maximum Transmission
    Unit. */
     > +

    Same here?


I think it would be helpful to have an API abstraction for MTU and MAC since
they're a pretty strict format and would really help on the user-side to have
a unified format.
The ether_addr was taken from DPDK and I think if it's generic enough there
it should be ok for us as well.

Okay. Good points. However, make sure that each driver is notified and able to decline an update of these values.


     > +     enum uk_netdev_state state; /**< Flag indicating the device
    state */
     > +     uint8_t rx_queue_state;
     > +     /** Queues state: STARTED(1) / STOPPED(0) */
     > +     uint8_t tx_queue_state;
     > +     /** Queues state: STARTED(1) / STOPPED(0) */
     > +
     > +     /** Supported modes (@see @ref driver_mode). Filled in by
    the driver. */
     > +     uint8_t supported_modes;

    See my comment ahead about the mode ;-)

     > +};
     > +
     > +/**
     > + * @internal
     > + * The generic data structure associated with each network device.
     > + *
     > + * Pointers to burst-oriented packet receive and transmit
    functions are
     > + * located at the beginning of the structure, along with the
    pointer to
     > + * where all the data elements for the particular device are
    stored in shared
     > + * memory. This split allows the function pointer and driver
    data to be per-
     > + * process, while the actual configuration data for the device
    is shared.
     > + */
     > +struct uk_netdev {
     > +     UK_TAILQ_ENTRY(struct uk_netdev) next;
     > +
     > +     /**< Pointer to burst receive function. */
     > +     uk_netdev_rx_burst_t rx_pkt_burst;
     > +     /**< Pointer to burst transmit function. */
     > +     uk_netdev_tx_burst_t tx_pkt_burst;
     > +
     > +     uk_netdev_rx_t rx_pkt; /**< Pointer to receive function. */
     > +     uk_netdev_tx_t tx_pkt; /**< Pointer to transmit function. */

    rx_pkt, tx_pkt is really needed here? If so, then it should be part of
    dev_ops. But I would only add rx_pkts_burst and tx_pkt_burst to devops
    and provide an implentation in libuknetdev for sending/receiving a
    single packet. You can probably provide it as 'static inline' in one of
    the header files.


The fact that RX/TX functions are not in the dev_ops is a performance optmization taken from DPDK. It prevents another indirection layer that would improve performance
since these are the most frequently called functions.
I would leave them here, since probably moving them around later could break a lot of stuff.

Okay, could you then add this explanation as a comment to the struct then?


     > +
     > +     /**< Pointer to device data */
     > +     struct uk_netdev_data *data;

    I would rather let the driver keep the data in its internal driver
    representation and store the callbacks in an referenced struct (as you
    do with dev_ops). For the internal device state data, I would really
    make use of container_of so that this struct does not need a reference
    for it.


I don't think that's a good idea. The uk_netdev_data is also part of the public NET API
and it really helps understanding the API as a whole.
It doesn't add a lot of overhead to driver ports and gives a clean, generic interface for users of the API. Relying too much on internal driver representation could cause some
platform fragmentation.

You have now good reasons why you want to keep some values in this generic data struct. This satisfies the existence of uk_netdev_data for me now. ;-)


The data/ops separation in this format is also one of the central things in the DPDK RTE API, and changing this could add a lot of complexity if in the future anyone tries to port
bare-metal drivers from DPDK.

This is a good point.


     > +     /**< Functions exported by driver */
     > +     const struct uk_netdev_ops *dev_ops;
     > +
     > +     /** User-supplied function called from driver on new packet
    RX */
     > +     rx_callback_fn rx_cb;
     > +};
     > +
     > +#endif //__UK_NETDEV_CORE__
     > diff --git a/lib/uknetdev/ip_addr.c b/lib/uknetdev/ip_addr.c
     > new file mode 100644
     > index 0000000..fe8250a
     > --- /dev/null
     > +++ b/lib/uknetdev/ip_addr.c
     > @@ -0,0 +1,447 @@
     > +/* SPDX-License-Identifier: BSD-3-Clause */
     > +/*
     > + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > + *
     > + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All
    rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without
     > + * modification, are permitted provided that the following
    conditions
     > + * are met:
     > + *
     > + * 1. Redistributions of source code must retain the above copyright
     > + *    notice, this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright
     > + *    notice, this list of conditions and the following
    disclaimer in the
     > + *    documentation and/or other materials provided with the
    distribution.
     > + * 3. Neither the name of the copyright holder nor the names of its
     > + *    contributors may be used to endorse or promote products
    derived from
     > + *    this software without specific prior written permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
    CONTRIBUTORS "AS IS"
     > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE
     > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    PARTICULAR PURPOSE
     > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
    CONTRIBUTORS BE
     > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    EXEMPLARY, OR
     > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF
     > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
    OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE)
     > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
    ADVISED OF THE
     > + * POSSIBILITY OF SUCH DAMAGE.
     > + *
     > + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
     > + */
     > +
     > +/*
     > + * Copyright (c) 2001-2004 Swedish Institute of Computer Science.
     > + * All rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without modification,
     > + * are permitted provided that the following conditions are met:
     > + *
     > + * 1. Redistributions of source code must retain the above
    copyright notice,
     > + *    this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright notice,
     > + *    this list of conditions and the following disclaimer in
    the documentation
     > + *    and/or other materials provided with the distribution.
     > + * 3. The name of the author may not be used to endorse or
    promote products
     > + *    derived from this software without specific prior written
    permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
    EXPRESS OR IMPLIED
     > + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
    WARRANTIES OF
     > + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT
     > + * SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL,
     > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT
     > + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
    PROFITS; OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE) ARISING
     > + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
    OF THE POSSIBILITY
     > + * OF SUCH DAMAGE.
     > + *
     > + * This file is part of the lwIP TCP/IP stack.
     > + *
     > + * Author: Adam Dunkels <adam@xxxxxxx <mailto:adam@xxxxxxx>>
     > + *
     > + */
     > +
     > +#include <uk/ip_addr.h>
     > +
     > +#ifndef isprint
     > +#define in_range(c, lo, up)  ((uint8_t)c >= lo && (uint8_t)c <= up)
     > +#define isprint(c)           in_range(c, 0x20, 0x7f)
     > +#define isdigit(c)           in_range(c, '0', '9')
     > +#define isxdigit(c)          (isdigit(c) || \
> +  in_range(c, 'a', 'f') || \ > +  in_range(c, 'A', 'F'))
     > +#define islower(c)           in_range(c, 'a', 'z')
     > +#define isspace(c)           (c == ' ' || c == '\f' || \
     > +                                                       c == '\n'
    || c == '\r' || \
     > +                                                       c == '\t'
    || c == '\v')
     > +#define xchar(i)             ((i) < 10 ? '0' + (i) : 'A' + (i) - 10)
     > +#endif
     > +
     > +#define UK_HTONL(x) ((((x) & 0x000000ffUL) << 24) | \
> +                                      (((x) & 0x0000ff00UL) << 8) | \ > +                                      (((x) & 0x00ff0000UL) >> 8) | \
     > +                                      (((x) & 0xff000000UL) >> 24))
     > +
     > +/**
     > + * Check whether "cp" is a valid ascii representation
     > + * of an Internet address and convert to a binary address.
     > + * Returns 1 if the address is valid, 0 if not.
     > + * This replaces inet_addr, the return value from which
     > + * cannot distinguish between failure and a local broadcast address.
     > + *
     > + * @param cp IP address in ascii representation (e.g. "127.0.0.1")
     > + * @param addr pointer to which to save the ip address in
    network order
     > + * @return 1 if cp could be converted to addr, 0 on failure
     > + */
     > +int uk_ip4addr_aton(const char *cp, uk_ip_addr_t *addr)
     > +{
     > +     uint32_t val;
     > +     uint8_t base;
     > +     char c;
     > +     uint32_t parts[4];
     > +     uint32_t *pp = parts;
     > +
     > +     c = *cp;
     > +     for (;;) {
     > +             /*
     > +              * Collect number up to ``.''.
     > +              * Values are specified as for C:
     > +              * 0x=hex, 0=octal, 1-9=decimal.
     > +              */
     > +             if (!isdigit(c))
     > +                     return 0;
     > +             val = 0;
     > +             base = 10;
     > +             if (c == '0') {
     > +                     c = *++cp;
     > +                     if (c == 'x' || c == 'X') {
     > +                             base = 16;
     > +                             c = *++cp;
     > +                     } else {
     > +                             base = 8;
     > +                     }
     > +             }
     > +             for (;;) {
     > +                     if (isdigit(c)) {
     > +                             val = (val * base) + (uint32_t)(c -
    '0');
     > +                             c = *++cp;
     > +                     } else if (base == 16 && isxdigit(c)) {
     > +                             val = (val << 4) | (uint32_t)(c + 10 -
     > +                                             (islower(c) ? 'a' :
    'A'));
     > +                             c = *++cp;
     > +                     } else {
     > +                             break;
     > +                     }
     > +             }
     > +             if (c == '.') {
     > +                     /*
     > +                      * Internet format:
     > +                      *  a.b.c.d
     > +                      *  a.b.c   (with c treated as 16 bits)
     > +                      *  a.b (with b treated as 24 bits)
     > +                      */
     > +                     if (pp >= parts + 3)
     > +                             return 0;
     > +                     *pp++ = val;
     > +                     c = *++cp;
     > +             } else {
     > +                     break;
     > +             }
     > +     }
     > +     /*
     > +      * Check for trailing characters.
     > +      */
     > +     if (c != '\0' && !isspace(c))
     > +             return 0;
     > +     /*
     > +      * Concoct the address according to
     > +      * the number of parts specified.
     > +      */
     > +     switch (pp - parts + 1) {
     > +
     > +     case 0:
     > +             return 0;       /* initial nondigit */
     > +
     > +     case 1:             /* a -- 32 bits */
     > +             break;
     > +
     > +     case 2:             /* a.b -- 8.24 bits */
     > +             if (val > 0xffffffUL)
     > +                     return 0;
     > +             if (parts[0] > 0xff)
     > +                     return 0;
     > +             val |= parts[0] << 24;
     > +             break;
     > +
     > +     case 3:             /* a.b.c -- 8.8.16 bits */
     > +             if (val > 0xffff)
     > +                     return 0;
     > +             if ((parts[0] > 0xff) || (parts[1] > 0xff))
     > +                     return 0;
     > +             val |= (parts[0] << 24) | (parts[1] << 16);
     > +             break;
     > +
     > +     case 4:             /* a.b.c.d -- 8.8.8.8 bits */
     > +             if (val > 0xff)
     > +                     return 0;
     > +             if ((parts[0] > 0xff) || (parts[1] > 0xff) ||
    (parts[2] > 0xff))
     > +                     return 0;
     > +             val |= (parts[0] << 24) | (parts[1] << 16) |
    (parts[2] << 8);
     > +             break;
     > +     default:
     > +             uk_printd(DLVL_ERR, "ipv4 addr parse error\n");
     > +             break;
     > +     }
     > +     if (addr)
     > +             addr->ipv4_addr = (uint32_t)UK_HTONL(val);
     > +     return 1;
     > +}
     > +
     > +/**
     > + * Convert numeric IP address into decimal dotted ASCII
    representation.
     > + *
     > + * @param addr ip address in network order to convert
     > + * @param buf target buffer where the string is stored
     > + * @param buflen length of buf
     > + * @return either pointer to buf which now holds the ASCII
     > + *         representation of addr or NULL if buf was too small
     > + */
     > +char *uk_ip4addr_ntoa(const uk_ip_addr_t *addr, char *buf, int
    buflen)
     > +{
     > +     uint32_t s_addr;
     > +     char inv[3];
     > +     char *rp;
     > +     uint8_t *ap;
     > +     uint8_t rem;
     > +     uint8_t n;
     > +     uint8_t i;
     > +     int len = 0;
     > +
     > +     s_addr = addr->ipv4_addr;
     > +
     > +     rp = buf;
     > +     ap = (uint8_t *)&s_addr;
     > +     for (n = 0; n < 4; n++) {
     > +             i = 0;
     > +             do {
     > +                     rem = *ap % (uint8_t)10;
     > +                     *ap /= (uint8_t)10;
     > +                     inv[i++] = (char)('0' + rem);
     > +             } while (*ap);
     > +             while (i--) {
     > +                     if (len++ >= buflen)
     > +                             return NULL;
     > +                     *rp++ = inv[i];
     > +             }
     > +             if (len++ >= buflen)
     > +                     return NULL;
     > +             *rp++ = '.';
     > +             ap++;
     > +     }
     > +     *--rp = 0;
     > +     return buf;
     > +}
     > +
     > +/**
     > + * Check whether "cp" is a valid ascii representation
     > + * of an IPv6 address and convert to a binary address.
     > + * Returns 1 if the address is valid, 0 if not.
     > + *
     > + * @param cp IPv6 address in ascii representation (e.g. "FF01::1")
     > + * @param addr pointer to which to save the ip address in
    network order
     > + * @return 1 if cp could be converted to addr, 0 on failure
     > + */
     > +int uk_ip6addr_aton(const char *cp, uk_ip_addr_t *addr)
     > +{
     > +     uint32_t addr_index, zero_blocks, current_block_index,
    current_block_value;
     > +     const char *s;
     > +
     > +     /* Count the number of colons, to count the number of blocks
     > +      * in a "::" sequence zero_blocks may be 1 even if there are
     > +      * no :: sequences
     > +      */
     > +     zero_blocks = 8;
     > +     for (s = cp; *s != 0; s++) {
     > +             if (*s == ':')
     > +                     zero_blocks--;
     > +             else if (!isxdigit(*s))
     > +                     break;
     > +     }
     > +
     > +     /* parse each block */
     > +     addr_index = 0;
     > +     current_block_index = 0;
     > +     current_block_value = 0;
     > +     for (s = cp; *s != 0; s++) {
     > +             if (*s == ':') {
     > +                     if (addr) {
     > +                             if (current_block_index & 0x1)
> +  addr->ipv6_addr[addr_index++] |= current_block_value;
     > +                             else
     > +                                     addr->ipv6_addr[addr_index]
    = current_block_value << 16;
     > +                     }
     > +                     current_block_index++;
     > +                     current_block_value = 0;
     > +                     if (current_block_index > 7) {
     > +                             /* address too long! */
     > +                             return 0;
     > +                     }
     > +                     if (s[1] == ':') {
     > +                             if (s[2] == ':') {
     > +                                     /* invalid format: three
    successive colons */
     > +                                     return 0;
     > +                             }
     > +                             s++;
     > +                             /* "::" found, set zeros */
     > +                             while (zero_blocks > 0) {
     > +                                     zero_blocks--;
     > +                                     if (current_block_index & 0x1)
     > +                                             addr_index++;
     > +                                     else if (addr)
> +  addr->ipv6_addr[addr_index] = 0;
     > +                                     current_block_index++;
     > +                                     if (current_block_index > 7) {
     > +                                             /* address too long! */
     > +                                             return 0;
     > +                                     }
     > +                             }
     > +                     }
     > +             } else if (isxdigit(*s)) {
     > +                     /* add current digit */
     > +                     current_block_value = (current_block_value
    << 4) + (isdigit(*s) ?
     > +                                     (uint32_t)(*s - '0') :
     > +                                     (uint32_t)(10 +
    (islower(*s) ? *s - 'a' : *s - 'A')));
     > +             } else {
     > +                     /* unexpected digit, space? CRLF? */
     > +                     break;
     > +             }
     > +     }
     > +
     > +     if (addr) {
     > +             if (current_block_index & 0x1)
     > +                     addr->ipv6_addr[addr_index++] |=
    current_block_value;
     > +             else
     > +                     addr->ipv6_addr[addr_index] =
    current_block_value << 16;
     > +     }
     > +
     > +     /* convert to network byte order. */
     > +     if (addr) {
     > +             for (addr_index = 0; addr_index < 4; addr_index++) {
     > +                     addr->ipv6_addr[addr_index] =
> +  (uint32_t)UK_HTONL(addr->ipv6_addr[addr_index]);
     > +             }
     > +     }
     > +
     > +     if (current_block_index != 7)
     > +             return 0;
     > +
     > +     return 1;
     > +}
     > +
     > +/**
     > + * Convert numeric IPv6 address into ASCII representation.
     > + *
     > + * @param addr ip6 address in network order to convert
     > + * @param buf target buffer where the string is stored
     > + * @param buflen length of buf
     > + * @return either pointer to buf which now holds the ASCII
     > + *         representation of addr or NULL if buf was too small
     > + */
     > +char *uk_ip6addr_ntoa_r(const uk_ip_addr_t *addr, char *buf, int
    buflen)
     > +{
     > +     uint32_t current_block_index, current_block_value,
    next_block_value;
     > +     int32_t i;
     > +     uint8_t zero_flag, empty_block_flag;
     > +
     > +     i = 0;
     > +     empty_block_flag = 0; /* used to indicate a zero chain for
    "::' */
     > +
     > +     for (current_block_index = 0; current_block_index < 8;
     > +              current_block_index++) {
     > +             /* get the current 16-bit block */
     > +             current_block_value =
> +  (uint32_t)UK_HTONL(addr->ipv6_addr[current_block_index >> 1]);
     > +             if ((current_block_index & 0x1) == 0)
     > +                     current_block_value = current_block_value
     >> 16;
     > +             current_block_value &= 0xffff;
     > +
     > +             /* Check for empty block. */
     > +             if (current_block_value == 0) {
     > +                     if (current_block_index == 7 &&
    empty_block_flag == 1) {
     > +                             /* special case, we must render a
    ':' for the last block. */
     > +                             buf[i++] = ':';
     > +                             if (i >= buflen)
     > +                                     return NULL;
     > +                             break;
     > +                     }
     > +                     if (empty_block_flag == 0) {
     > +                             /* generate empty block "::", but
    only if more than one
     > +                              * contiguous zero block, according
    to current formatting
     > +                              * suggestions RFC 5952.
     > +                              */
     > +                             next_block_value = (uint32_t)UK_HTONL(
> +  addr->ipv6_addr[(current_block_index + 1) >> 1]);
     > +                             if ((current_block_index & 0x1) ==
    0x01)
     > +                                     next_block_value =
    next_block_value >> 16;
     > +                             next_block_value &= 0xffff;
     > +                             if (next_block_value == 0) {
     > +                                     empty_block_flag = 1;
     > +                                     buf[i++] = ':';
     > +                                     if (i >= buflen)
     > +                                             return NULL;
     > +                                     continue; /* move on to
    next block. */
     > +                             }
     > +                     } else if (empty_block_flag == 1) {
     > +                             /* move on to next block. */
     > +                             continue;
     > +                     }
     > +             } else if (empty_block_flag == 1) {
     > +                     /* Set this flag value so we don't produce
    multiple empty blocks. */
     > +                     empty_block_flag = 2;
     > +             }
     > +
     > +             if (current_block_index > 0) {
     > +                     buf[i++] = ':';
     > +                     if (i >= buflen)
     > +                             return NULL;
     > +             }
     > +
     > +             if ((current_block_value & 0xf000) == 0) {
     > +                     zero_flag = 1;
     > +             } else {
     > +                     buf[i++] = xchar(((current_block_value &
    0xf000) >> 12));
     > +                     zero_flag = 0;
     > +                     if (i >= buflen)
     > +                             return NULL;
     > +             }
     > +
     > +             if (((current_block_value & 0xf00) == 0) &&
    (zero_flag)) {
     > +                     /* do nothing */
     > +             } else {
     > +                     buf[i++] = xchar(((current_block_value &
    0xf00) >> 8));
     > +                     zero_flag = 0;
     > +                     if (i >= buflen)
     > +                             return NULL;
     > +             }
     > +
     > +             if (((current_block_value & 0xf0) == 0) &&
    (zero_flag)) {
     > +                     /* do nothing */
     > +             } else {
     > +                     buf[i++] = xchar(((current_block_value &
    0xf0) >> 4));
     > +                     zero_flag = 0;
     > +                     if (i >= buflen)
     > +                             return NULL;
     > +             }
     > +
     > +             buf[i++] = xchar((current_block_value & 0xf));
     > +             if (i >= buflen)
     > +                     return NULL;
     > +     }
     > +
     > +     buf[i] = 0;
     > +
     > +     return buf;
     > +}
     > diff --git a/lib/uknetdev/netdev.c b/lib/uknetdev/netdev.c
     > new file mode 100644
     > index 0000000..2898434
     > --- /dev/null
     > +++ b/lib/uknetdev/netdev.c
     > @@ -0,0 +1,182 @@
     > +/* SPDX-License-Identifier: BSD-3-Clause */
     > +/*
     > + * Authors: Simon Kuenzer <simon.kuenzer@xxxxxxxxx
    <mailto:simon.kuenzer@xxxxxxxxx>>
     > + *          Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx
    <mailto:razvan.cojocaru93@xxxxxxxxx>>
     > + *
     > + * Copyright (c) 2017-2018, NEC Europe Ltd., NEC Corporation.
    All rights reserved.
     > + *
     > + * Redistribution and use in source and binary forms, with or
    without
     > + * modification, are permitted provided that the following
    conditions
     > + * are met:
     > + *
     > + * 1. Redistributions of source code must retain the above copyright
     > + *    notice, this list of conditions and the following disclaimer.
     > + * 2. Redistributions in binary form must reproduce the above
    copyright
     > + *    notice, this list of conditions and the following
    disclaimer in the
     > + *    documentation and/or other materials provided with the
    distribution.
     > + * 3. Neither the name of the copyright holder nor the names of its
     > + *    contributors may be used to endorse or promote products
    derived from
     > + *    this software without specific prior written permission.
     > + *
     > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
    CONTRIBUTORS "AS IS"
     > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE
     > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    PARTICULAR PURPOSE
     > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
    CONTRIBUTORS BE
     > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    EXEMPLARY, OR
     > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF
     > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
    OR BUSINESS
     > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    WHETHER IN
     > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE)
     > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
    ADVISED OF THE
     > + * POSSIBILITY OF SUCH DAMAGE.
     > + *
     > + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
     > + */
     > +
     > +#include <uk/netdev.h>
     > +#include <string.h>
     > +#include <uk/assert.h>
     > +#include <uk/print.h>
     > +#include <uk/plat/ctors.h>
     > +
     > +#define UK_NETDEV_CTOR_PRIO    (101U)

    You should use 102 if you want priority 1. 101 is the lowest you can
    set.


Ok. 102 it is then.

Forget to say that 101 is reserved for libc's. Your initialization may depend on it. Unfortunately, we do not have a convention yet.


     > +
     > +struct uk_netdev_list uk_netdev_list;
     > +static uint16_t netdev_count;
     > +
     > +/* This library does not have any dependency to another library for
     > + * initialization, except a libc -> We use priority 1
     > + */ > +static void _uk_netdev_ctor(void)
    __constructor_prio(UK_NETDEV_CTOR_PRIO);
     > +
     > +void uk_netdev_register(struct uk_netdev *dev)
     > +{
     > +     UK_ASSERT(dev != NULL);
     > +
     > +     uk_printd(DLVL_INFO, "Register netdev%u: %p\n",
     > +               netdev_count, dev);
     > +     dev->data->id = netdev_count;
     > +     UK_TAILQ_INSERT_TAIL(&uk_netdev_list, dev, next);
     > +
     > +     ++netdev_count;
     > +}
     > +
     > +unsigned int uk_netdev_count(void)
     > +{
     > +     return netdev_count;
     > +}
     > +
     > +struct uk_netdev *uk_netdev_get(unsigned int id)
     > +{
     > +     struct uk_netdev *n;
     > +
     > +     UK_NETDEV_LIST_FOREACH(n) {
     > +             if (n->data->id == id)
     > +                     return n;
     > +     }
     > +     return NULL;
     > +}
     > +
     > +int uk_netdev_configure(struct uk_netdev *dev,
     > +             const struct uk_netdev_conf *eth_conf)
     > +{
     > +     UK_ASSERT(dev);
     > +     uk_printd(DLVL_INFO, "Configure device 0x%p\n", dev);
     > +
     > +     if (eth_conf == NULL) {
     > +             dev->data->driver_mode = UK_NETDEV_MODE_POLLING;
     > +     } else {
     > +             if ((eth_conf->requested_mode &
    dev->data->supported_modes)
     > +                             != 0) {
     > +                     dev->data->driver_mode =
    eth_conf->requested_mode;
     > +             } else {
     > +                     uk_printd(DLVL_ERR, "Invalid driver mode
    requested\n");
     > +                     return -EINVAL;
     > +             }
     > +     }
     > +
     > +     uk_netdev_ip_set(dev, NULL);
     > +
     > +     return dev->dev_ops->dev_configure(dev);
     > +}
     > +
     > +int uk_netdev_rx_queue_setup(struct uk_netdev *dev, uint16_t
    rx_queue_id,
     > +             const struct uk_netdev_rxconf *rx_conf)
     > +{
     > +     UK_ASSERT(dev);
     > +
     > +     if (dev->data->driver_mode != UK_NETDEV_MODE_POLLING) {
     > +             if (rx_conf == NULL) {
     > +                     uk_printd(DLVL_ERR,
     > +                                       "Interrupt-based mode
    requested with no callback\n");
     > +                     return -EINVAL;
     > +             }
     > +             dev->rx_cb = rx_conf->rx_cb;
     > +     }
     > +
     > +     return dev->dev_ops->rx_queue_setup(dev, rx_queue_id, rx_conf);
     > +}
     > +

    I would provide as many of the following straight forward declarations
    as 'static inline'.

     > +int uk_netdev_tx_queue_setup(struct uk_netdev *dev, uint16_t
    tx_queue_id,
     > +             const struct uk_netdev_txconf *tx_conf)
     > +{
     > +     UK_ASSERT(dev);
     > +     return dev->dev_ops->tx_queue_setup(dev, tx_queue_id, tx_conf);
     > +}
     > +
     > +int uk_netdev_start(struct uk_netdev *dev)
     > +{
     > +     UK_ASSERT(dev);
     > +     return dev->dev_ops->dev_start(dev);
     > +}
     > +
     > +void uk_netdev_stop(struct uk_netdev *dev)
     > +{
     > +     UK_ASSERT(dev);
     > +     dev->dev_ops->dev_stop(dev);
     > +}
     > +
     > +int uk_netdev_mac_addr_set(struct uk_netdev *dev, struct
    ether_addr *mac_addr)
     > +{
     > +     UK_ASSERT(dev);
     > +     memcpy(&dev->data->mac_addr, mac_addr, sizeof(struct
    ether_addr));
     > +     return 0;
     > +}
     > +
     > +int uk_netdev_ip_set(struct uk_netdev *dev, uk_ip_info_t *ip)
     > +{
     > +     UK_ASSERT(dev);
     > +     dev->data->ip_info = ip;
     > +     return 0;
     > +}
     > +
     > +uk_ip_info_t *uk_netdev_ip_get(struct uk_netdev *dev)
     > +{
     > +     UK_ASSERT(dev);
     > +     return dev->data->ip_info;
     > +}
     > +
     > +int uk_netdev_mtu_set(struct uk_netdev *dev, uint16_t mtu)
     > +{
     > +     UK_ASSERT(dev);
     > +     dev->data->mtu = mtu;
     > +     return 0;
     > +}

    Maybe a specific MTU is not supported by a driver? so, you may want to
    let the driver do this operation and return an appropriate return code.

    There should also be an API interface to get the current MTU form the
    driver.


Will add a MTU getter and mtu_set in dev_opts.

     > +
     > +uint16_t uk_netdev_rx(struct uk_netdev *dev, void *data,
    uint16_t max_len)
     > +{
     > +     UK_ASSERT(dev);
     > +     return dev->rx_pkt(dev, data, max_len);
     > +}
     > +
     > +uint16_t uk_netdev_tx(struct uk_netdev *dev, void *data,
    uint16_t len)
     > +{
     > +     UK_ASSERT(dev);
     > +     return dev->tx_pkt(dev, data, len);
     > +}
     > +
     > +static void _uk_netdev_ctor(void)
     > +{
     > +     UK_TAILQ_INIT(&uk_netdev_list);
     > +     netdev_count = 0;
     > +}
     >

    Thanks,

    Simon


Thanks again for all the feedback!
I'll come up with the second version as soon as possible.

Razvan

Thanks for your comments. I am looking forward to the second version ;-)

Cheers,

Simon


_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/minios-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.