[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Minios-devel] [UNIKRAFT PATCH] lib/uknetdev: Unikraft Network API



Hey Simon,

I really appreciate your feedback, and I've added some notes to your comments below.

I'll post a new version, integrating your suggestions and adding some more
comments where it seems that the API wasn't clear enough.


În lun., 16 iul. 2018 la 11:54, Simon Kuenzer <simon.kuenzer@xxxxxxxxx> a scris:
Hi Razvan,

thanks a lot for your patch. It is a good start for providing a
low-level & driver-independent network interface. See my comments inline:

On 11.07.2018 16:39, Razvan Cojocaru wrote:
> Introduces the Unikraft Network API, that acts as a generalised
> interface between network drivers and network stack implementations
> (or low level networking applications).
>
> Using the function definitions in netdev_core.h, the driver should
> implement the functions in uk_netdev_opts and fill in the fields
> from uk_netdev and uk_netdev_data.
> The user-facing part of the API is in netdev.h. The network stack
> glue code can use these functions to configure network devices, as
> well as send/receive packets.
>
> Driver modes allow for explicit configuration of polling/interrupt
> mechanisms.
> The driver marks supported capabilities in the supported_modes flags,
> while the user would select one of these supported modes when
> setting up the driver.
>
> Inspired from from DPDK RTE Ethernet API.
> IP utility functions taken from LWIP.

Hum... do we really want to include the IP utility functions?
I am rather expecting that TCP/IP stacks provide anyways their own and I
would use them instead. It may avoid parsing addresses two times because
data type incompatibilities. For the rare cases you do not have a stack,
I would add a new library for IP conversions that provides just your
utilities from lwIP.

So, I was thinking if it may be better to provide a query interface that
returns you the pointer to the actual data field. Since we do not know
how the various drivers represent the extra configuration data, I would
provide multiple variants for each type (e.g., a raw one and a string
one). However, the driver would only return those that it actually has
and would not parse them by its own.

enum uk_netdev_econf_type {
        IPv4ADDR_INT,  /**< IPv4 address as raw int (4 bytes) */
        IPv4ADDR_STR,  /**< IPv4 address as null-terminated string */
        IPv4MASK_INT,  /**< IPv4 mask as raw int (4 bytes) */
        IPv4MASK_STR,  /**< IPv4 address as null-terminated string */
        IPv4GW_RAW,    /**< and so on... ;-) */
        IPv4GW_STR,
        IPv4DNS0_RAW,
        IPv4DNS0_STR,

        /*
         * This list is extensible in the future without needing
         * the drivers to adopt
         */
}

static inline
const void *uk_netdev_econf_get(struct uk_netdev *dev,
                                 enum uk_netdev_econf_type econf)
{
        if (!dev->econf_cb)
                return NULL; /* driver does not provide
                                any extra configuration */
        return dev->deconf_cb(dev, econf);
}




/**
  * Implemented by each driver
  */
static const void *netfront_econf_get(struct uk_netdev *dev,
                                       enum uk_netdev_econf_type econf)
{
        struct uk_netfront *nf;

        UK_ASSERT(dev->initialized);
        nf = _netdev_to_netfront(dev);
        UK_ASSERT(nf);

        switch (econf) {
        case IPv4ADDR_STR:
                return nf->xenstore_econf.ipv4addr;
        case IPv4MASK_STR:
                return nf->xenstore_econf.ipv4mask;
        case IPv4GW_STR:
                return nf->xenstore_econf.ipv4gw;
        default:
                break;
        }

        /* type not supported */
        return NULL;
}

What do you think?

Seems reasonable. Even though it adds some complexity on the user side,
it may be worth it in order to get rid of the IP utils.
I'll add it in the next version.
 
>
> Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> ---
>   lib/Config.uk                         |   1 +
>   lib/Makefile.uk                       |   1 +
>   lib/uknetdev/Config.uk                |   7 +
>   lib/uknetdev/Makefile.uk              |   7 +
>   lib/uknetdev/include/uk/ip_addr.h     |  92 +++++++
>   lib/uknetdev/include/uk/netdev.h      | 303 +++++++++++++++++++++++
>   lib/uknetdev/include/uk/netdev_core.h | 308 +++++++++++++++++++++++
>   lib/uknetdev/ip_addr.c                | 447 ++++++++++++++++++++++++++++++++++
>   lib/uknetdev/netdev.c                 | 182 ++++++++++++++
>   9 files changed, 1348 insertions(+)
>   create mode 100644 lib/uknetdev/Config.uk
>   create mode 100644 lib/uknetdev/Makefile.uk
>   create mode 100644 lib/uknetdev/include/uk/ip_addr.h
>   create mode 100644 lib/uknetdev/include/uk/netdev.h
>   create mode 100644 lib/uknetdev/include/uk/netdev_core.h
>   create mode 100644 lib/uknetdev/ip_addr.c
>   create mode 100644 lib/uknetdev/netdev.c
>
> diff --git a/lib/Config.uk b/lib/Config.uk
> index e438603..003bd4f 100644
> --- a/lib/Config.uk
> +++ b/lib/Config.uk
> @@ -37,3 +37,4 @@ source "lib/uklock/Config.uk"
>   source "lib/ukmpi/Config.uk"
>   source "lib/ukswrand/Config.uk"
>   source "lib/ukbus/Config.uk"
> +source "lib/uknetdev/Config.uk"
> diff --git a/lib/Makefile.uk b/lib/Makefile.uk
> index 40c65d0..bcf22f3 100644
> --- a/lib/Makefile.uk
> +++ b/lib/Makefile.uk
> @@ -19,3 +19,4 @@ $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/vfscore))
>   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/uklock))
>   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/ukmpi))
>   $(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/ukbus))
> +$(eval $(call _import_lib,$(CONFIG_UK_BASE)/lib/uknetdev))
> diff --git a/lib/uknetdev/Config.uk b/lib/uknetdev/Config.uk
> new file mode 100644
> index 0000000..d1acdc0
> --- /dev/null
> +++ b/lib/uknetdev/Config.uk
> @@ -0,0 +1,7 @@
> +menuconfig LIBUKNETDEV
> +     bool "uknetdev: Network driver interface"
> +     default n
> +     select LIBUKALLOC
> +
> +if LIBUKNETDEV
> +endif
> diff --git a/lib/uknetdev/Makefile.uk b/lib/uknetdev/Makefile.uk
> new file mode 100644
> index 0000000..d7502d8
> --- /dev/null
> +++ b/lib/uknetdev/Makefile.uk
> @@ -0,0 +1,7 @@
> +$(eval $(call addlib_s,libuknetdev,$(CONFIG_LIBUKNETDEV)))
> +
> +CINCLUDES-$(CONFIG_LIBUKNETDEV)              += -I$(LIBUKNETDEV_BASE)/include
> +CXXINCLUDES-$(CONFIG_LIBUKNETDEV)            += -I$(LIBUKNETDEV_BASE)/include
> +
> +LIBUKBUS_SRCS-y += $(LIBUKNETDEV_BASE)/netdev.c
> +LIBUKBUS_SRCS-y += $(LIBUKNETDEV_BASE)/ip_addr.c
> diff --git a/lib/uknetdev/include/uk/ip_addr.h b/lib/uknetdev/include/uk/ip_addr.h
> new file mode 100644
> index 0000000..93ff8ac
> --- /dev/null
> +++ b/lib/uknetdev/include/uk/ip_addr.h
> @@ -0,0 +1,92 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> + *
> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the copyright holder nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
> + */
> +
> +/*
> + * Copyright (c) 2001-2004 Swedish Institute of Computer Science.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without modification,
> + * are permitted provided that the following conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright notice,
> + *    this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright notice,
> + *    this list of conditions and the following disclaimer in the documentation
> + *    and/or other materials provided with the distribution.
> + * 3. The name of the author may not be used to endorse or promote products
> + *    derived from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
> + * SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
> + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
> + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
> + * OF SUCH DAMAGE.
> + *
> + * This file is part of the lwIP TCP/IP stack.
> + *
> + * Author: Adam Dunkels <adam@xxxxxxx>
> + *
> + */
> +
> +#ifndef __UK_IP_ADDR__
> +#define __UK_IP_ADDR__
> +
> +#include <stdint.h>
> +#include <stddef.h>
> +#include <uk/assert.h>
> +
> +typedef union {
> +     uint32_t ipv4_addr;     /**< IPv4 address in big endian. */
> +     uint32_t ipv6_addr[4];  /**< IPv6 address in big endian. */
> +} uk_ip_addr_t;
> +
> +typedef struct {
> +     uk_ip_addr_t ip;
> +     uk_ip_addr_t gateway;
> +     uk_ip_addr_t netmask;
> +} uk_ip_info_t;

See my suggestion ahead. I am concerned that this would not fit to every
driver (for instance, what about DNS addresses or VLAN tags)?

Replacing with query interface, got it.
 
> +
> +
> +int uk_ip4addr_aton(const char *cp, uk_ip_addr_t *addr);
> +char *uk_ip4addr_ntoa(const uk_ip_addr_t *addr, char *buf, int buflen);
> +
> +int uk_ip6addr_aton(const char *cp, uk_ip_addr_t *addr);
> +char *uk_ip6addr_ntoa_r(const uk_ip_addr_t *addr, char *buf, int buflen);
> +
> +#endif //__UK_IP_ADDR__
> diff --git a/lib/uknetdev/include/uk/netdev.h b/lib/uknetdev/include/uk/netdev.h
> new file mode 100644
> index 0000000..3128ffe
> --- /dev/null
> +++ b/lib/uknetdev/include/uk/netdev.h
> @@ -0,0 +1,303 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Authors: Simon Kuenzer <simon.kuenzer@xxxxxxxxx>
> + *          Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the copyright holder nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
> + */

Please add a note that you derived this header form DPDK (and which file
from there).

Will do, I thought it would be enough to just add the Intel copyright.
 
> +
> +#ifndef __UK_NETDEV__
> +#define __UK_NETDEV__
> +
> +/**
> + * Unikraft Network API
> + *
> + * The Unikraft NET API provides a generalised interface between Unikraft

s/generalised/generalized/

Generalised is the British form, but I'll stick to the US one if that is the current norm :)
 
> + * drivers and network stack implementations or low-level network applications.
> + *
> + * Most NET API functions take as parameter a reference to the corresponding
> + * Unikraft Network Device (struct uk_netdev) which can be obtained with a call
> + * to uk_netdev_get(). The network application should store this reference and
> + * use it for all subsequent API calls.
> + *
> + * The functions exported by the Unikraft NET API to setup a device
> + * designated by its ID must be invoked in the following order:
> + *     - uk_netdev_configure()
> + *     - uk_netdev_tx_queue_setup()
> + *     - uk_netdev_rx_queue_setup()
> + *     - uk_netdev_start()
> + * If the network application wants to change configurations (call queue_setup
> + * or configure again), it must call rte_eth_dev_stop() first to stop the
> + * device and then do the reconfiguration before calling rte_eth_dev_start()
> + * again. The transmit and receive functions should not be invoked when the
> + * device is stopped.

s/rte_eth_dev/uk_netdev_/

Replacing...
 
> + */
> +
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <errno.h>
> +#include <uk/list.h>
> +#include <uk/alloc.h>
> +#include <uk/print.h>
> +#include <uk/assert.h>
> +#include "netdev_core.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Get the number of available Unikraft Network devices.
> + *
> + * @return
> + *   - (unsigned int): number of network devices.
> + */
> +unsigned int uk_netdev_count(void);
> +
> +/**
> + * Get a reference to a Unikraft Network Device, based on its ID.
> + * This reference should be saved by the application and used for subsequent
> + * API calls.
> + *
> + * @param id
> + *   The identifier of the Unikraft network device to configure.
> + * @return
> + *   - NULL: device not found in list
> + *   - (struct uk_netdev *): reference to be passed to API calls
> + */
> +struct uk_netdev *uk_netdev_get(unsigned int id);
> +
> +/**
> + * Configure an Unikraft network device.
> + * This function must be invoked first before any other function in the
> + * Unikraft NET API. This function can also be re-invoked when a device is
> + * in the stopped state.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param eth_conf
> + *   The pointer to the configuration data to be used for the Unikraft
> + *   network device.
> + *
> + *   Embedding all configuration information in a single data structure
> + *   is the more flexible method that allows the addition of new features
> + *   without changing the syntax of the API.
> + * @return
> + *   - 0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +int uk_netdev_configure(struct uk_netdev *dev,
> +             const struct uk_netdev_conf *conf);
> +
> +/**
> + * Start a Network device.
> + *
> + * The device start step is the last one and consists of setting the configured
> + * offload features and in starting the transmit and the receive units of the
> + * device.
> + * On success, all basic functions exported by the Unikraft NET API (link
> + * status, receive/transmit, and so on) can be invoked.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @return
> + *   - 0: Success, Unikraft network device started.
> + *   - <0: Error code of the driver device start function.
> + */
> +int uk_netdev_start(struct uk_netdev *dev);
> +
> +/**
> + * Stop an Unikraft network device. The device can be restarted with a call to
> + * rte_eth_dev_start()

uk_netdev_start()

> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + */
> +void uk_netdev_stop(struct uk_netdev *dev);
> +
> +/**
> + * Close a stopped Unikraft network device.
> + * The function frees all resources except for needed by the closed state.

What does this mean exactly "except for needed by the closed state"?

It's mostly up to the driver, but it could also mean that you keep the netdev in the list
so you don't have to poll the bus again if you would want to reconfigure.
 
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + */
> +void uk_netdev_close(struct uk_netdev *dev);
> +
> +/**
> + * Set the default MAC address.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param mac_addr
> + *   New default MAC address.
Does this mean we support multiple MAC addresses and one just get set as
default? This might be actually possible for some NICs but I would just
support only a single MAC address in the driver.

 We don't support multiple MACs for now. That's just a DPDK remnant I forgot to remove.

> + * @return
> + *   - (0) if successful, or *mac_addr* didn't exist.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-ENODEV) if *id* invalid.
What is an invalid id?

Removed. Forgot it there from previous version where we used netdev id's instead of references.
 
> + *   - (-EINVAL) if MAC address is invalid.
> + */
> +int uk_netdev_mac_addr_set(struct uk_netdev *dev, struct ether_addr *mac_addr);

Where is the interface to retrieve the current MAC address from the
interface?

I would also add a comment for promiscuous mode on/off so that we could
implement this later. I would not suggest that a MAC address with just
zeros or FFs would set this mode.

I'll add the DPDK methods for promiscuous mode on/off and some flag in the data.
It won't be used in the first versions of the virtio/xen drivers, but still in the NET API for
future reference.
 
> +
> +/**
> + * Function that can be used by the driver to set an IP address to the network
> + * device. The application or network stack is not forced to use this specific
> + * IP information and can set another address using other methods.
> + * Useful when the driver has access to more information, usually platform
> + * specific.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param ip_info
> + *   - (uk_ip_info_t *): IP layer information such as IP, netmask, gateway
> + * @return
> + *   - (0) if successful.
> + *   - (-EINVAL) if IP information is invalid.
> + */
> +int uk_netdev_ip_set(struct uk_netdev *dev, uk_ip_info_t *ip_info);
> +

I am not getting how this function would be useful. I expect that
network stacks are handling this by their own anyways. Why should I tell
a network card driver which IP/IPs I am oging to use?

It was a function meant to be called by the drivers (optionally).
Will remove it along with the IP utils and it will be replaced by the query interface.
 
> +/**
> + * Read the IP information set by the driver.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @return ip_info
> + *   - (uk_ip_info_t *): IP layer information such as IP, netmask, gateway
> + *   - (NULL): if the driver didn't set any IP information.
> + */
> +uk_ip_info_t *uk_netdev_ip_get(struct uk_netdev *dev);

I guess it is filled out with information found on the Xenstore. What if
we are out of mem? Do you return an ERRPTR? As mentioned in the header,
I would prefer a query interface to avoid possible double parsing of values.

Same as for the query interface, the whole IP mechanism would have limited error handling
since it would be pretty much optional. It would just return NULL for anything going wrong.
 
> +
> +/**
> + * Change the MTU of an Unikraft network device.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param mtu
> + *   A uint16_t for the MTU to be applied.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if operation is not supported.
> + *   - (-ENODEV) if *id* invalid.

Which id is invalid? Better to do UK_ASSERT() for checking dev != NULL.

 Will remove.

> + *   - (-EIO) if device is removed.
> + *   - (-EINVAL) if *mtu* invalid.
> + *   - (-EBUSY) if operation is not allowed when the device is running
> + */
> +int uk_netdev_mtu_set(struct uk_netdev *dev, uint16_t mtu);
> +
> +/**
> + * Allocate and set up a receive queue for an Unikraft network device.
> + *
> + * The function handles setup of receive callback for interrupt-based modes.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().

Replace rte_eth_dev... I will not mention it anymore for the rest of the
patch: just use search & replace. ;-)

Searching and replacing as we speak.
 
> + * @param rx_conf
> + *   The pointer to the configuration data to be used for the receive queue.
> + *   NULL value is allowed, in which case default RX configuration
> + *   will be used.
> + *   The *rx_conf* structure contains an *rx_thresh* structure with the values
> + *   of the Prefetch, Host, and Write-Back threshold registers of the receive
> + *   ring.
> + *   In addition it contains the hardware offloads features to activate using
> + *   the DEV_RX_OFFLOAD_* flags.
> + * @return
> + *   - 0: Success, receive queue correctly set up.
> + *   - -EIO: if device is removed.
> + */
> +int uk_netdev_rx_queue_setup(struct uk_netdev *dev, uint16_t rx_queue_id,
> +             const struct uk_netdev_rxconf *rx_conf);
> +

DPDK provided the option to set the number of queues that should be
configured (I think you removed it?). I would even add another function
call that you could use to query how many are supported at most. The
queue_id is otherwise difficult to guess.
Alternatively, it is also fine to say that we support only a single rx
queue and single tx queue for now, since we do not have SMP yet. In this
case you should remove the queue_id parameter everywhere. ;-)

Queues are not actually supported as of this version. There are some elements
there in order to be able to easily introduce them in future versions without changing
the API too much (keep the same number of parameters, etc).
I would not remove the queue_id, since if you want to later introduce queues and add another
param to most functions it would break any form of backwards compatibility.
 
> +/**
> + * Allocate and set up a transmit queue for an Unikraft network device.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param tx_queue_id
> + *   The index of the transmit queue to set up.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_conf
> + *   The pointer to the configuration data to be used for the transmit queue.
> + *   NULL value is allowed, in which case default TX configuration
> + *   will be used.
> + * @return
> + *   - 0: Success, the transmit queue is correctly set up.
> + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
> + */
> +int uk_netdev_tx_queue_setup(struct uk_netdev *dev, uint16_t tx_queue_id,
> +             const struct uk_netdev_txconf *tx_conf);
> +
> +/**
> + * Basic RX function.

Maybe call it a compatibility function.
How is it going to be implemented later?

Knowing DPDK, I would rather expect a function like:

uint16_t uk_netdev_rx_burst(struct uk_netdev *dev, uint16_t queue_id,
                   struct uk_mbuf** rx_pkts, uint16_t nb_pkts);

and then having your uk_netdev_rx() as wrapper, maybe generalized for
all network functions. Packet flags (e.g., checksum offloading) would be
part of each uk_mbuf.
 
I agree that it would be a much more extensible option using mbufs, but I think
this initial version is pretty complex as-is, even with these simplified versions
of TX/RX functions.
Adding mbufs would introduce a lot more complexity, and I think should be added
in a later version of the API.
 
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param data
> + *   The data pointer where the packet will be placed by the driver.
> + * @param max_len
> + *   Maximum length of the packet.
> + * @return
> + *   - 0: No new packets
> + *   - >0: Length of the received packet
> + */
> +uint16_t uk_netdev_rx(struct uk_netdev *dev, void *data, uint16_t max_len);

How do I retrieve additional flags from the driver (e.g., TCP checksum
is (not) okay or just partially calculated (VM-to-VM communication))? I
would prefer introducing a mbuf struct. 
> +
> +/**
> + * Basic TX function.
> + * @param dev
> + *   The Unikraft Network Device.
> + * @param data
> + *   Raw packet data (including Ethernet headers) to be sent by the driver.
> + * @param len
> + *   The length of the packet.
> + * @return
> + */
> +uint16_t uk_netdev_tx(struct uk_netdev *dev, void *data, uint16_t len);

See my comments on uk_netdev_rx() also for the tx function.

> +
> +/**
> + * Add a new Unikraft network device in the device list.
> + * Should be called by the driver in the configuration step.
> + *
> + * @param dev
> + *   The Unikraft Network Device.
> + */
> +void uk_netdev_register(struct uk_netdev *dev);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif //__UK_NETDEV__
> diff --git a/lib/uknetdev/include/uk/netdev_core.h b/lib/uknetdev/include/uk/netdev_core.h
> new file mode 100644
> index 0000000..aec5cf5
> --- /dev/null
> +++ b/lib/uknetdev/include/uk/netdev_core.h
> @@ -0,0 +1,308 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> + *
> + * Copyright (c) 2017 Intel Corporation
> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the copyright holder nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
> + */
> +
> +#ifndef __UK_NETDEV_CORE__
> +#define __UK_NETDEV_CORE__
> +
> +/**
> + * Unikraft Network Device internal header.
> + *
> + * This header contains internal data types. But they are still part of the
> + * public API because they are used by inline functions in the published API.
> + *
> + * The device data and operations are separated. This split allows the
> + * function pointer and driver data to be per-process, while the actual
> + * configuration data for the device is shared.
> + */
> +
> +#include "ip_addr.h"
> +
> +
> +#define NETDEV_MAX_QUEUES 10

Where is this limit coming from?

I'll make the current limit 1 for now, to mark that we don't really support more queues
in this version.
 
> +#define NETDEV_NAME_MAX_LEN 64

Why do we need a name? At least make this an menu-configurable optional
parameter. You also may want to add a API function that returns the name
set by the driver.

The name part is mostly the same as DPDK does it, and I think it could really help
in debugging purposes as humans tend to work better with names than ids.
Not having a name will obviously not break any functionality in the driver, worst
case you'll just end up getting some empty strings on the user-side.

I'll add an API function to return a NAME + ID string if you think that would be useful.
 
> +
> +#define ETHER_ADDR_LEN 6 /**< Length of Ethernet address. */
> +
> +
> +struct ether_addr {
> +     uint8_t addr_bytes[ETHER_ADDR_LEN]; /**< Addr bytes in tx order */
> +} __attribute__((__packed__));

In <uk/essentials.h> we have a macro defintion for packed.

Replaced with macro.
 
> +
> +/**
> + * A set of values to describe the possible states of an eth device.
> + */
> +enum uk_netdev_state {
> +     UK_NETDEV_UNCONFIGURED = 0,
> +     UK_NETDEV_CONFIGURED,
> +     UK_NETDEV_RUNNING,
> +};
> +
> +
> +/**
> + * @defgroup driver_mode Driver Receive Mode
> + *
> + * Driver modes provide a method of shifting complexity from driver to
> + * network stack/application or otherwise.
> + * Polling drivers would offer more control over TX/RX but with increased
> + * complexity to network stack ports, while interrupt-based drivers would be
> + * simpler to use for applications.
> + * A driver should set implemented capabilities as supported_modes and the
> + * network application requests the specific mode via requested_mode in the
> + * receive queue setup configurations.
> + * @{
> + */
> +
> +/** Basic polling mode driver. Provides RX/TX functions and it's the user's
> + * responsibility when/how to call them.
> + */
> +#define UK_NETDEV_MODE_POLLING   0x01U
> +
> +/** Interrupt/Event mode driver. The user defines a callback which is called
> + * by the driver when a new packet is received.
> + */
> +#define UK_NETDEV_MODE_INTERRUPT 0x02U
> +
> +/** Mixed Polling/Interrupt mode. Hybrid mode that continuously polls the
> + * driver, and after a number of unsuccessful polls enters in sleep mode
> + * and enables interrupts. From a user perspective, similar to Interrupt
> + * mode.
> + */
> +#define UK_NETDEV_MODE_HYBRID    0x04U

Maybe you want an enum instead? You can't set all modes at the same
time, right?
Hum... I am still thinking that the modes requires drivers to implement
policies and not just the mechanisms (e.g., the threshold when to decide
to stop polling and witching to interrupt mode). Can't we add a
functions to the API that disable and enable interrupts on a queue, so
it is up to the uk_netdev programmer how the policy should look like? I
would prefer something like:

  uk_netdev_rx_queue_disable_irq(struct uk_netdev *n, uint16_t queue_id);
  uk_netdev_rx_queue_enable_irq(struct uk_netdev *n, uint16_t queue_id);

  uk_netdev_tx_queue_disable_irq(struct uk_netdev *n, uint16_t queue_id);
  uk_netdev_tx_queue_enable_irq(struct uk_netdev *n,uint16_t queue_id);

As part of bringing the device up, you would enable interrupts (if you
registered a callback function). Default behavior should be interrupts
having off.

Those could be called by the network stack/application whenever needed.
As part of lwIP driver code, you could then do there set the threshold
between pooling and non-polling.

Do you know how Linux or BSD solved this?

I'll remove the driver mode in the next version and just leave it up to the user
when to enable/disable interrupts.
 
> +
> +/**
> + * @}
> + */
> +
> +/**
> + * A structure used to configure an Unikraft network device.
> + */
> +struct uk_netdev_conf {
> +     uint8_t requested_mode;
> +};
> +
> +UK_TAILQ_HEAD(uk_netdev_list, struct uk_netdev);
> +
> +#define UK_NETDEV_LIST_FOREACH(b)                    \
> +     UK_TAILQ_FOREACH(b, &uk_netdev_list, next)
> +
> +/**
> + * Function type used for RX packet processing packet callbacks.
> + *
> + * The callback function is called on RX with a packet that has been received
> + * on the given device and queue.
> + *
> + * @param id
> + *   The identifier of the device on which RX is being performed.
> + * @param queue
> + *   The queue on the Unikraft network device which is being used to receive
> + *   the packets.
> + * @param data
> + *   Content of the received packet.
> + * @param len
> + *   Length of the packet data.
> + * @return
> + *   The number of packets returned to the user.
> + */
> +typedef void (*rx_callback_fn)(uint16_t id, uint16_t queue, void *data,
> +                                                        uint16_t len);

Hum, I wouldn't send the packet with the callback. I would just notify
which interface it was (and which queue) and let the user call
uk_netdev_rx() within the callback (or later) for emptying the queue:

typedef void (*uk_netdev_rx_callback_fn)(struct uk_netdev *n,
                                         uint16_t queue);

Ok.
 
Btw, as long as this function did not return, there should be no more
interrupts fired by the NIC for the specific queue, right?

I also miss a calback for TX that is fired when a transmission is
completed. This would enable async TX which can make a difference on
high traffic load. It is fine to add it later but then there should be a
comment.

There is a pre-TX, post-RX callback mechanism in DPDK which I had commented-out.
I'm not sure if that is the thing you were referring.

 
> +
> +/**
> + * A structure used to configure an Unikraft network device.
> + */

Maybe you want to call them "struct uk_netdev_(rx|tx)queue_conf":

Will rename. 

> +struct uk_netdev_rxconf {
> +     rx_callback_fn rx_cb;
> +};
> +
> +/**
> + * A structure used to configure an Unikraft network device.
> + */
> +struct uk_netdev_txconf {
> +};
> +
> +
> +typedef int  (*uk_netdev_configure_t)(struct uk_netdev *dev);
> +/**< @internal Unikraft network device configuration. */
> +
> +typedef int  (*uk_netdev_start_t)(struct uk_netdev *dev);
> +/**< @internal Function used to start a configured Unikraft network device. */
> +
> +typedef void (*uk_netdev_stop_t)(struct uk_netdev *dev);
> +/**< @internal Function used to stop a configured Unikraft network device. */
> +
> +typedef void (*uk_netdev_close_t)(struct uk_netdev *dev);
> +/**< @internal Function used to close a configured Unikraft network device. */
> +
> +typedef void (*uk_netdev_mac_addr_remove_t)(struct uk_netdev *dev);
> +/**< @internal Remove MAC address*/
> +
> +typedef void (*uk_netdev_mac_addr_set_t)(struct uk_netdev *dev,
> +             struct ether_addr *mac_addr);
> +/**< @internal Set the MAC address */

So, you support multiple MACs, right? ;-) If we restrict it to a single
one (which is fine for now), you do not need the remove funciton.

Yes. Will remove it.
 
> +
> +typedef int (*uk_netdev_mtu_set_t)(struct uk_netdev *dev, uint16_t mtu);
> +/**< @internal Set MTU. */
> +
> +typedef int (*uk_netdev_rx_queue_setup_t)(struct uk_netdev *dev,
> +             uint16_t rx_queue_id, const struct uk_netdev_rxconf *rx_conf);
> +/**< @internal Set up a receive queue of an Unikraft network device. */
> +
> +typedef int (*uk_netdev_tx_queue_setup_t)(struct uk_netdev *dev,
> +             uint16_t tx_queue_id, const struct uk_netdev_txconf *tx_conf);
> +/**< @internal Setup a transmit queue of an Unikraft network device. */
> +
> +typedef void (*uk_netdev_queue_release_t)(void *queue);
> +/**< @internal Release memory resources allocated by given RX/TX queue. */
> +
> +typedef int (*uk_netdev_rx_enable_intr_t)(struct uk_netdev *dev,
> +             uint16_t rx_queue_id);
> +/**< @internal Enable interrupt of a receive queue of an
> + * Unikraft network device.
> + */

Interesting, here it is... You do not need the mode then, right? ;-)

Consider the driver modes gone.
 
> +
> +typedef int (*uk_netdev_rx_disable_intr_t)(struct uk_netdev *dev,
> +             uint16_t rx_queue_id);
> +/**< @internal Disable interrupt of a receive queue of an
> + * Unikraft network device.
> + */
> +
> +
> +typedef uint16_t (*uk_netdev_rx_t)(struct uk_netdev *dev, void *data,
> +             uint16_t len);
> +/**< @internal Retrieve one input packet from an Unikraft network device. */

Do you think that you need this implemented by the driver as callback? I
think we can provide sending/receiving of one packet by the generic
layer of libuknetdev and base the implementation on
uk_netdev_(rx|tx)_burst()

Implementing this in the generic layer of libuknetdev kills the whole purpose of it.
For some use-cases, you may not be that concerned with performance and for
compatibility sake, a one-packet-at-a-time approach might be enough.
This way, you could have a very simple interface for a basic driver and a quick
network stack port may use this basic rx|tx mechanism.

Obviously, for more complex systems you would want to use burst tx|rx, but for
these stages that seems overkill and could slow down the whole development process.

I'm a fan of simple things that work :-)
 
> +
> +typedef uint16_t (*uk_netdev_tx_t)(struct uk_netdev *dev, void *data,
> +             uint16_t len);
> +/**< @internal Send one output packet to an Unikraft network device. */
> +
> +typedef uint16_t (*uk_netdev_rx_burst_t)(void *rxq,
> +             void **rx_pkts, uint16_t nb_pkts);
> +/**< @internal Retrieve input packets from a receive queue of an
> + * Unikraft network device.
> + */
> +
> +typedef uint16_t (*uk_netdev_tx_burst_t)(void *txq, void **tx_pkts,
> +             uint16_t nb_pkts);
> +/**< @internal Send output packets on a transmit queue of an
> + * Unikraft network device.
> + */
> +
> +
> +/**
> + * @internal A structure containing the functions exported by a driver.
> + */
> +struct uk_netdev_ops {
> +     uk_netdev_configure_t      dev_configure; /**< Configure device. */
> +     uk_netdev_start_t          dev_start;     /**< Start device. */
> +     uk_netdev_stop_t           dev_stop;      /**< Stop device. */
> +     uk_netdev_close_t          dev_close;     /**< Close device. */
> +
> +     uk_netdev_mac_addr_set_t   mac_addr_set;  /**< Set a MAC address. */
> +     uk_netdev_mtu_set_t        mtu_set;       /**< Set MTU. */
> +
> +     /**< Set up device RX queue. */
> +     uk_netdev_rx_queue_setup_t  rx_queue_setup;
> +     /**< Release RX queue. */
> +     uk_netdev_queue_release_t   rx_queue_release;
> +
> +     /**< Set up device TX queue. */
> +     uk_netdev_tx_queue_setup_t  tx_queue_setup;
> +     /**< Release TX queue. */
> +     uk_netdev_queue_release_t   tx_queue_release;
> +};
> +
> +/**
> + * @internal
> + * The data part, with no function pointers, associated with each
> + * network device.
> + *
> + * This structure is safe to place in shared memory to be common among different
> + * processes in a multi-process configuration.
> + */
> +struct uk_netdev_data {
> +     char name[NETDEV_NAME_MAX_LEN]; /**< Unique identifier name */
> +     uint16_t id;           /**< Device [external] port identifier. */

Is this needed?

> +
> +     void *rx_queue; /**< Pointer to RX queue. */
> +     void *tx_queue; /**< Pointer to TX queue. */

Shouldn't this be part of the drivers internal representation? The
callbacks would handle this.

This is stubish since we don't really support multi-queues at the moment.
I'll remove them if you think they seem weird. I just added them from DPDK
for future reference.
 
> +
> +     /** Driver mode (@see @ref driver_mode). Requested by user. */
> +     uint8_t driver_mode;
> +
> +     uk_ip_info_t *ip_info;
See my comment ahead about a query function. The fields would be
received from the drivers internal representation.

> +
> +     struct ether_addr mac_addr;     /**< Device Ethernet Link address. */

Hum... better to use a callback instead so that the driver stores and
handles it internally? What do you think?

> +     uint16_t mtu;                   /**< Maximum Transmission Unit. */
> +

Same here?

I think it would be helpful to have an API abstraction for MTU and MAC since
they're a pretty strict format and would really help on the user-side to have
a unified format.
The ether_addr was taken from DPDK and I think if it's generic enough there
it should be ok for us as well. 

> +     enum uk_netdev_state state; /**< Flag indicating the device state */
> +     uint8_t rx_queue_state;
> +     /** Queues state: STARTED(1) / STOPPED(0) */
> +     uint8_t tx_queue_state;
> +     /** Queues state: STARTED(1) / STOPPED(0) */
> +
> +     /** Supported modes (@see @ref driver_mode). Filled in by the driver. */
> +     uint8_t supported_modes;

See my comment ahead about the mode ;-)

> +};
> +
> +/**
> + * @internal
> + * The generic data structure associated with each network device.
> + *
> + * Pointers to burst-oriented packet receive and transmit functions are
> + * located at the beginning of the structure, along with the pointer to
> + * where all the data elements for the particular device are stored in shared
> + * memory. This split allows the function pointer and driver data to be per-
> + * process, while the actual configuration data for the device is shared.
> + */
> +struct uk_netdev {
> +     UK_TAILQ_ENTRY(struct uk_netdev) next;
> +
> +     /**< Pointer to burst receive function. */
> +     uk_netdev_rx_burst_t rx_pkt_burst;
> +     /**< Pointer to burst transmit function. */
> +     uk_netdev_tx_burst_t tx_pkt_burst;
> +
> +     uk_netdev_rx_t rx_pkt; /**< Pointer to receive function. */
> +     uk_netdev_tx_t tx_pkt; /**< Pointer to transmit function. */

rx_pkt, tx_pkt is really needed here? If so, then it should be part of
dev_ops. But I would only add rx_pkts_burst and tx_pkt_burst to devops
and provide an implentation in libuknetdev for sending/receiving a
single packet. You can probably provide it as 'static inline' in one of
the header files.

The fact that RX/TX functions are not in the dev_ops is a performance optmization
taken from DPDK. It prevents another indirection layer that would improve performance
since these are the most frequently called functions.
I would leave them here, since probably moving them around later could break a lot of stuff.
 
> +
> +     /**< Pointer to device data */
> +     struct uk_netdev_data *data;

I would rather let the driver keep the data in its internal driver
representation and store the callbacks in an referenced struct (as you
do with dev_ops). For the internal device state data, I would really
make use of container_of so that this struct does not need a reference
for it.

I don't think that's a good idea. The uk_netdev_data is also part of the public NET API
and it really helps understanding the API as a whole.
It doesn't add a lot of overhead to driver ports and gives a clean, generic interface for
users of the API. Relying too much on internal driver representation could cause some
platform fragmentation. 

The data/ops separation in this format is also one of the central things in the DPDK RTE API,
and changing this could add a lot of complexity if in the future anyone tries to port
bare-metal drivers from DPDK.
 
> +     /**< Functions exported by driver */
> +     const struct uk_netdev_ops *dev_ops;
> +
> +     /** User-supplied function called from driver on new packet RX */
> +     rx_callback_fn rx_cb;
> +};
> +
> +#endif //__UK_NETDEV_CORE__
> diff --git a/lib/uknetdev/ip_addr.c b/lib/uknetdev/ip_addr.c
> new file mode 100644
> index 0000000..fe8250a
> --- /dev/null
> +++ b/lib/uknetdev/ip_addr.c
> @@ -0,0 +1,447 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Authors: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> + *
> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the copyright holder nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
> + */
> +
> +/*
> + * Copyright (c) 2001-2004 Swedish Institute of Computer Science.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without modification,
> + * are permitted provided that the following conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright notice,
> + *    this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright notice,
> + *    this list of conditions and the following disclaimer in the documentation
> + *    and/or other materials provided with the distribution.
> + * 3. The name of the author may not be used to endorse or promote products
> + *    derived from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
> + * SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
> + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
> + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
> + * OF SUCH DAMAGE.
> + *
> + * This file is part of the lwIP TCP/IP stack.
> + *
> + * Author: Adam Dunkels <adam@xxxxxxx>
> + *
> + */
> +
> +#include <uk/ip_addr.h>
> +
> +#ifndef isprint
> +#define in_range(c, lo, up)  ((uint8_t)c >= lo && (uint8_t)c <= up)
> +#define isprint(c)           in_range(c, 0x20, 0x7f)
> +#define isdigit(c)           in_range(c, '0', '9')
> +#define isxdigit(c)          (isdigit(c) || \
> +                                                       in_range(c, 'a', 'f') || \
> +                                                       in_range(c, 'A', 'F'))
> +#define islower(c)           in_range(c, 'a', 'z')
> +#define isspace(c)           (c == ' ' || c == '\f' || \
> +                                                       c == '\n' || c == '\r' || \
> +                                                       c == '\t' || c == '\v')
> +#define xchar(i)             ((i) < 10 ? '0' + (i) : 'A' + (i) - 10)
> +#endif
> +
> +#define UK_HTONL(x) ((((x) & 0x000000ffUL) << 24) | \
> +                                      (((x) & 0x0000ff00UL) <<  8) | \
> +                                      (((x) & 0x00ff0000UL) >>  8) | \
> +                                      (((x) & 0xff000000UL) >> 24))
> +
> +/**
> + * Check whether "cp" is a valid ascii representation
> + * of an Internet address and convert to a binary address.
> + * Returns 1 if the address is valid, 0 if not.
> + * This replaces inet_addr, the return value from which
> + * cannot distinguish between failure and a local broadcast address.
> + *
> + * @param cp IP address in ascii representation (e.g. "127.0.0.1")
> + * @param addr pointer to which to save the ip address in network order
> + * @return 1 if cp could be converted to addr, 0 on failure
> + */
> +int uk_ip4addr_aton(const char *cp, uk_ip_addr_t *addr)
> +{
> +     uint32_t val;
> +     uint8_t base;
> +     char c;
> +     uint32_t parts[4];
> +     uint32_t *pp = parts;
> +
> +     c = *cp;
> +     for (;;) {
> +             /*
> +              * Collect number up to ``.''.
> +              * Values are specified as for C:
> +              * 0x=hex, 0=octal, 1-9=decimal.
> +              */
> +             if (!isdigit(c))
> +                     return 0;
> +             val = 0;
> +             base = 10;
> +             if (c == '0') {
> +                     c = *++cp;
> +                     if (c == 'x' || c == 'X') {
> +                             base = 16;
> +                             c = *++cp;
> +                     } else {
> +                             base = 8;
> +                     }
> +             }
> +             for (;;) {
> +                     if (isdigit(c)) {
> +                             val = (val * base) + (uint32_t)(c - '0');
> +                             c = *++cp;
> +                     } else if (base == 16 && isxdigit(c)) {
> +                             val = (val << 4) | (uint32_t)(c + 10 -
> +                                             (islower(c) ? 'a' : 'A'));
> +                             c = *++cp;
> +                     } else {
> +                             break;
> +                     }
> +             }
> +             if (c == '.') {
> +                     /*
> +                      * Internet format:
> +                      *  a.b.c.d
> +                      *  a.b.c   (with c treated as 16 bits)
> +                      *  a.b (with b treated as 24 bits)
> +                      */
> +                     if (pp >= parts + 3)
> +                             return 0;
> +                     *pp++ = val;
> +                     c = *++cp;
> +             } else {
> +                     break;
> +             }
> +     }
> +     /*
> +      * Check for trailing characters.
> +      */
> +     if (c != '\0' && !isspace(c))
> +             return 0;
> +     /*
> +      * Concoct the address according to
> +      * the number of parts specified.
> +      */
> +     switch (pp - parts + 1) {
> +
> +     case 0:
> +             return 0;       /* initial nondigit */
> +
> +     case 1:             /* a -- 32 bits */
> +             break;
> +
> +     case 2:             /* a.b -- 8.24 bits */
> +             if (val > 0xffffffUL)
> +                     return 0;
> +             if (parts[0] > 0xff)
> +                     return 0;
> +             val |= parts[0] << 24;
> +             break;
> +
> +     case 3:             /* a.b.c -- 8.8.16 bits */
> +             if (val > 0xffff)
> +                     return 0;
> +             if ((parts[0] > 0xff) || (parts[1] > 0xff))
> +                     return 0;
> +             val |= (parts[0] << 24) | (parts[1] << 16);
> +             break;
> +
> +     case 4:             /* a.b.c.d -- 8.8.8.8 bits */
> +             if (val > 0xff)
> +                     return 0;
> +             if ((parts[0] > 0xff) || (parts[1] > 0xff) || (parts[2] > 0xff))
> +                     return 0;
> +             val |= (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8);
> +             break;
> +     default:
> +             uk_printd(DLVL_ERR, "ipv4 addr parse error\n");
> +             break;
> +     }
> +     if (addr)
> +             addr->ipv4_addr = (uint32_t)UK_HTONL(val);
> +     return 1;
> +}
> +
> +/**
> + * Convert numeric IP address into decimal dotted ASCII representation.
> + *
> + * @param addr ip address in network order to convert
> + * @param buf target buffer where the string is stored
> + * @param buflen length of buf
> + * @return either pointer to buf which now holds the ASCII
> + *         representation of addr or NULL if buf was too small
> + */
> +char *uk_ip4addr_ntoa(const uk_ip_addr_t *addr, char *buf, int buflen)
> +{
> +     uint32_t s_addr;
> +     char inv[3];
> +     char *rp;
> +     uint8_t *ap;
> +     uint8_t rem;
> +     uint8_t n;
> +     uint8_t i;
> +     int len = 0;
> +
> +     s_addr = addr->ipv4_addr;
> +
> +     rp = buf;
> +     ap = (uint8_t *)&s_addr;
> +     for (n = 0; n < 4; n++) {
> +             i = 0;
> +             do {
> +                     rem = *ap % (uint8_t)10;
> +                     *ap /= (uint8_t)10;
> +                     inv[i++] = (char)('0' + rem);
> +             } while (*ap);
> +             while (i--) {
> +                     if (len++ >= buflen)
> +                             return NULL;
> +                     *rp++ = inv[i];
> +             }
> +             if (len++ >= buflen)
> +                     return NULL;
> +             *rp++ = '.';
> +             ap++;
> +     }
> +     *--rp = 0;
> +     return buf;
> +}
> +
> +/**
> + * Check whether "cp" is a valid ascii representation
> + * of an IPv6 address and convert to a binary address.
> + * Returns 1 if the address is valid, 0 if not.
> + *
> + * @param cp IPv6 address in ascii representation (e.g. "FF01::1")
> + * @param addr pointer to which to save the ip address in network order
> + * @return 1 if cp could be converted to addr, 0 on failure
> + */
> +int uk_ip6addr_aton(const char *cp, uk_ip_addr_t *addr)
> +{
> +     uint32_t addr_index, zero_blocks, current_block_index, current_block_value;
> +     const char *s;
> +
> +     /* Count the number of colons, to count the number of blocks
> +      * in a "::" sequence zero_blocks may be 1 even if there are
> +      * no :: sequences
> +      */
> +     zero_blocks = 8;
> +     for (s = cp; *s != 0; s++) {
> +             if (*s == ':')
> +                     zero_blocks--;
> +             else if (!isxdigit(*s))
> +                     break;
> +     }
> +
> +     /* parse each block */
> +     addr_index = 0;
> +     current_block_index = 0;
> +     current_block_value = 0;
> +     for (s = cp; *s != 0; s++) {
> +             if (*s == ':') {
> +                     if (addr) {
> +                             if (current_block_index & 0x1)
> +                                     addr->ipv6_addr[addr_index++] |= current_block_value;
> +                             else
> +                                     addr->ipv6_addr[addr_index] = current_block_value << 16;
> +                     }
> +                     current_block_index++;
> +                     current_block_value = 0;
> +                     if (current_block_index > 7) {
> +                             /* address too long! */
> +                             return 0;
> +                     }
> +                     if (s[1] == ':') {
> +                             if (s[2] == ':') {
> +                                     /* invalid format: three successive colons */
> +                                     return 0;
> +                             }
> +                             s++;
> +                             /* "::" found, set zeros */
> +                             while (zero_blocks > 0) {
> +                                     zero_blocks--;
> +                                     if (current_block_index & 0x1)
> +                                             addr_index++;
> +                                     else if (addr)
> +                                             addr->ipv6_addr[addr_index] = 0;
> +                                     current_block_index++;
> +                                     if (current_block_index > 7) {
> +                                             /* address too long! */
> +                                             return 0;
> +                                     }
> +                             }
> +                     }
> +             } else if (isxdigit(*s)) {
> +                     /* add current digit */
> +                     current_block_value = (current_block_value << 4) + (isdigit(*s) ?
> +                                     (uint32_t)(*s - '0') :
> +                                     (uint32_t)(10 + (islower(*s) ? *s - 'a' : *s - 'A')));
> +             } else {
> +                     /* unexpected digit, space? CRLF? */
> +                     break;
> +             }
> +     }
> +
> +     if (addr) {
> +             if (current_block_index & 0x1)
> +                     addr->ipv6_addr[addr_index++] |= current_block_value;
> +             else
> +                     addr->ipv6_addr[addr_index] = current_block_value << 16;
> +     }
> +
> +     /* convert to network byte order. */
> +     if (addr) {
> +             for (addr_index = 0; addr_index < 4; addr_index++) {
> +                     addr->ipv6_addr[addr_index] =
> +                                     (uint32_t)UK_HTONL(addr->ipv6_addr[addr_index]);
> +             }
> +     }
> +
> +     if (current_block_index != 7)
> +             return 0;
> +
> +     return 1;
> +}
> +
> +/**
> + * Convert numeric IPv6 address into ASCII representation.
> + *
> + * @param addr ip6 address in network order to convert
> + * @param buf target buffer where the string is stored
> + * @param buflen length of buf
> + * @return either pointer to buf which now holds the ASCII
> + *         representation of addr or NULL if buf was too small
> + */
> +char *uk_ip6addr_ntoa_r(const uk_ip_addr_t *addr, char *buf, int buflen)
> +{
> +     uint32_t current_block_index, current_block_value, next_block_value;
> +     int32_t i;
> +     uint8_t zero_flag, empty_block_flag;
> +
> +     i = 0;
> +     empty_block_flag = 0; /* used to indicate a zero chain for "::' */
> +
> +     for (current_block_index = 0; current_block_index < 8;
> +              current_block_index++) {
> +             /* get the current 16-bit block */
> +             current_block_value =
> +                             (uint32_t)UK_HTONL(addr->ipv6_addr[current_block_index >> 1]);
> +             if ((current_block_index & 0x1) == 0)
> +                     current_block_value = current_block_value >> 16;
> +             current_block_value &= 0xffff;
> +
> +             /* Check for empty block. */
> +             if (current_block_value == 0) {
> +                     if (current_block_index == 7 && empty_block_flag == 1) {
> +                             /* special case, we must render a ':' for the last block. */
> +                             buf[i++] = ':';
> +                             if (i >= buflen)
> +                                     return NULL;
> +                             break;
> +                     }
> +                     if (empty_block_flag == 0) {
> +                             /* generate empty block "::", but only if more than one
> +                              * contiguous zero block, according to current formatting
> +                              * suggestions RFC 5952.
> +                              */
> +                             next_block_value = (uint32_t)UK_HTONL(
> +                                             addr->ipv6_addr[(current_block_index + 1) >> 1]);
> +                             if ((current_block_index & 0x1) == 0x01)
> +                                     next_block_value = next_block_value >> 16;
> +                             next_block_value &= 0xffff;
> +                             if (next_block_value == 0) {
> +                                     empty_block_flag = 1;
> +                                     buf[i++] = ':';
> +                                     if (i >= buflen)
> +                                             return NULL;
> +                                     continue; /* move on to next block. */
> +                             }
> +                     } else if (empty_block_flag == 1) {
> +                             /* move on to next block. */
> +                             continue;
> +                     }
> +             } else if (empty_block_flag == 1) {
> +                     /* Set this flag value so we don't produce multiple empty blocks. */
> +                     empty_block_flag = 2;
> +             }
> +
> +             if (current_block_index > 0) {
> +                     buf[i++] = ':';
> +                     if (i >= buflen)
> +                             return NULL;
> +             }
> +
> +             if ((current_block_value & 0xf000) == 0) {
> +                     zero_flag = 1;
> +             } else {
> +                     buf[i++] = xchar(((current_block_value & 0xf000) >> 12));
> +                     zero_flag = 0;
> +                     if (i >= buflen)
> +                             return NULL;
> +             }
> +
> +             if (((current_block_value & 0xf00) == 0) && (zero_flag)) {
> +                     /* do nothing */
> +             } else {
> +                     buf[i++] = xchar(((current_block_value & 0xf00) >> 8));
> +                     zero_flag = 0;
> +                     if (i >= buflen)
> +                             return NULL;
> +             }
> +
> +             if (((current_block_value & 0xf0) == 0) && (zero_flag)) {
> +                     /* do nothing */
> +             } else {
> +                     buf[i++] = xchar(((current_block_value & 0xf0) >> 4));
> +                     zero_flag = 0;
> +                     if (i >= buflen)
> +                             return NULL;
> +             }
> +
> +             buf[i++] = xchar((current_block_value & 0xf));
> +             if (i >= buflen)
> +                     return NULL;
> +     }
> +
> +     buf[i] = 0;
> +
> +     return buf;
> +}
> diff --git a/lib/uknetdev/netdev.c b/lib/uknetdev/netdev.c
> new file mode 100644
> index 0000000..2898434
> --- /dev/null
> +++ b/lib/uknetdev/netdev.c
> @@ -0,0 +1,182 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Authors: Simon Kuenzer <simon.kuenzer@xxxxxxxxx>
> + *          Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
> + *
> + * Copyright (c) 2017-2018, NEC Europe Ltd., NEC Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the copyright holder nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
> + */
> +
> +#include <uk/netdev.h>
> +#include <string.h>
> +#include <uk/assert.h>
> +#include <uk/print.h>
> +#include <uk/plat/ctors.h>
> +
> +#define UK_NETDEV_CTOR_PRIO    (101U)

You should use 102 if you want priority 1. 101 is the lowest you can set.

Ok. 102 it is then.
 
> +
> +struct uk_netdev_list uk_netdev_list;
> +static uint16_t netdev_count;
> +
> +/* This library does not have any dependency to another library for
> + * initialization, except a libc -> We use priority 1
> + */ > +static void _uk_netdev_ctor(void) __constructor_prio(UK_NETDEV_CTOR_PRIO);
> +
> +void uk_netdev_register(struct uk_netdev *dev)
> +{
> +     UK_ASSERT(dev != NULL);
> +
> +     uk_printd(DLVL_INFO, "Register netdev%u: %p\n",
> +               netdev_count, dev);
> +     dev->data->id = netdev_count;
> +     UK_TAILQ_INSERT_TAIL(&uk_netdev_list, dev, next);
> +
> +     ++netdev_count;
> +}
> +
> +unsigned int uk_netdev_count(void)
> +{
> +     return netdev_count;
> +}
> +
> +struct uk_netdev *uk_netdev_get(unsigned int id)
> +{
> +     struct uk_netdev *n;
> +
> +     UK_NETDEV_LIST_FOREACH(n) {
> +             if (n->data->id == id)
> +                     return n;
> +     }
> +     return NULL;
> +}
> +
> +int uk_netdev_configure(struct uk_netdev *dev,
> +             const struct uk_netdev_conf *eth_conf)
> +{
> +     UK_ASSERT(dev);
> +     uk_printd(DLVL_INFO, "Configure device 0x%p\n", dev);
> +
> +     if (eth_conf == NULL) {
> +             dev->data->driver_mode = UK_NETDEV_MODE_POLLING;
> +     } else {
> +             if ((eth_conf->requested_mode & dev->data->supported_modes)
> +                             != 0) {
> +                     dev->data->driver_mode = eth_conf->requested_mode;
> +             } else {
> +                     uk_printd(DLVL_ERR, "Invalid driver mode requested\n");
> +                     return -EINVAL;
> +             }
> +     }
> +
> +     uk_netdev_ip_set(dev, NULL);
> +
> +     return dev->dev_ops->dev_configure(dev);
> +}
> +
> +int uk_netdev_rx_queue_setup(struct uk_netdev *dev, uint16_t rx_queue_id,
> +             const struct uk_netdev_rxconf *rx_conf)
> +{
> +     UK_ASSERT(dev);
> +
> +     if (dev->data->driver_mode != UK_NETDEV_MODE_POLLING) {
> +             if (rx_conf == NULL) {
> +                     uk_printd(DLVL_ERR,
> +                                       "Interrupt-based mode requested with no callback\n");
> +                     return -EINVAL;
> +             }
> +             dev->rx_cb = rx_conf->rx_cb;
> +     }
> +
> +     return dev->dev_ops->rx_queue_setup(dev, rx_queue_id, rx_conf);
> +}
> +

I would provide as many of the following straight forward declarations
as 'static inline'.

> +int uk_netdev_tx_queue_setup(struct uk_netdev *dev, uint16_t tx_queue_id,
> +             const struct uk_netdev_txconf *tx_conf)
> +{
> +     UK_ASSERT(dev);
> +     return dev->dev_ops->tx_queue_setup(dev, tx_queue_id, tx_conf);
> +}
> +
> +int uk_netdev_start(struct uk_netdev *dev)
> +{
> +     UK_ASSERT(dev);
> +     return dev->dev_ops->dev_start(dev);
> +}
> +
> +void uk_netdev_stop(struct uk_netdev *dev)
> +{
> +     UK_ASSERT(dev);
> +     dev->dev_ops->dev_stop(dev);
> +}
> +
> +int uk_netdev_mac_addr_set(struct uk_netdev *dev, struct ether_addr *mac_addr)
> +{
> +     UK_ASSERT(dev);
> +     memcpy(&dev->data->mac_addr, mac_addr, sizeof(struct ether_addr));
> +     return 0;
> +}
> +
> +int uk_netdev_ip_set(struct uk_netdev *dev, uk_ip_info_t *ip)
> +{
> +     UK_ASSERT(dev);
> +     dev->data->ip_info = ip;
> +     return 0;
> +}
> +
> +uk_ip_info_t *uk_netdev_ip_get(struct uk_netdev *dev)
> +{
> +     UK_ASSERT(dev);
> +     return dev->data->ip_info;
> +}
> +
> +int uk_netdev_mtu_set(struct uk_netdev *dev, uint16_t mtu)
> +{
> +     UK_ASSERT(dev);
> +     dev->data->mtu = mtu;
> +     return 0;
> +}

Maybe a specific MTU is not supported by a driver? so, you may want to
let the driver do this operation and return an appropriate return code.

There should also be an API interface to get the current MTU form the
driver.

Will add a MTU getter and mtu_set in dev_opts.
 
> +
> +uint16_t uk_netdev_rx(struct uk_netdev *dev, void *data, uint16_t max_len)
> +{
> +     UK_ASSERT(dev);
> +     return dev->rx_pkt(dev, data, max_len);
> +}
> +
> +uint16_t uk_netdev_tx(struct uk_netdev *dev, void *data, uint16_t len)
> +{
> +     UK_ASSERT(dev);
> +     return dev->tx_pkt(dev, data, len);
> +}
> +
> +static void _uk_netdev_ctor(void)
> +{
> +     UK_TAILQ_INIT(&uk_netdev_list);
> +     netdev_count = 0;
> +}
>

Thanks,

Simon

Thanks again for all the feedback!
I'll come up with the second version as soon as possible.

Razvan 
_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/minios-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.