[Xen-devel] [PATCH] Network Checksum Removal
Currently in Xen, interdomain communication needlessly wastes CPU cycles calculating and verifying TCP/UDP checksums. This is unnecessary, as the probability of packet corruption between domains is minuscule (and memory corruption can be detected via ECC). Also, domUs are unable to take advantage of any adapter hardware checksum offload capabilities when transmitting packets outside of the system.

This patch removes the inter-Xen network checksums by using the existing Linux hardware checksum offload infrastructure. This minimized the changes needed by this patch and made it easy to use hardware checksumming on the physical devices.

Here is how the traffic flow now works (generically):

Traffic generated by dom0 does not have its TCP/UDP checksums computed, and dom0 notifies domU of this via the csum bit in netif_rx_response_t. domU checks the csum bit on each incoming packet and, if it is not set, verifies the checksum itself.

Traffic generated externally: if rx hardware checksum is available and enabled, dom0 tells domU that validating the checksum is unnecessary (provided the checksum is valid) by setting the csum bit. If domU is not told that validation is unnecessary, it validates the checksum itself.

Traffic generated by domU does not have its TCP/UDP checksums computed, and domU notifies dom0 of this via the csum bit in netif_tx_request_t. dom0 checks the csum bit on each incoming packet and, if it is set, fills in the field needed for hardware checksum offload (skb->csum, the offset of the checksum field within the TCP/UDP header). It also sets:

    skb->ip_summed = CHECKSUM_UNNECESSARY;
    skb->flags |= SKB_FDW_NO_CSUM;

ip_summed is set for the case where the packet is destined for dom0; it prevents dom0 from checking the TCP/UDP checksum. Unfortunately, this flag is stomped on by both routing and bridging. So I added a new skb field (flags, which replaces the old cloned field) and a new flag, SKB_FDW_NO_CSUM.
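The dom0 side of this scheme boils down to locating the TCP/UDP checksum field and, when the adapter cannot do it, computing the checksum in software. Below is a minimal user-space sketch of that, assuming a well-formed IPv4 packet in a flat byte buffer. struct tcp_hdr_sketch, struct udp_hdr_sketch and all helper names are hypothetical stand-ins, not the kernel's types or API: csum_field_offset() models the value netback stores in skb->csum, csum_field_pos() models the iph + ihl * 4 transport-header pointer that dev_queue_xmit() restores, and fill_l4_csum()/l4_csum_ok() model filling in the checksum on transmit and verifying it on a receiver that did not get the csum bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for struct tcphdr / struct udphdr
 * (RFC 793 and RFC 768 layouts); only the position of `check` matters. */
struct tcp_hdr_sketch {
    uint16_t source, dest;
    uint32_t seq, ack_seq;
    uint16_t doff_flags, window;
    uint16_t check, urg_ptr;
};

struct udp_hdr_sketch {
    uint16_t source, dest, len, check;
};

enum { PROTO_TCP = 6, PROTO_UDP = 17 };   /* IPPROTO_TCP, IPPROTO_UDP */

/* Models what netback stores in skb->csum: the offset of the checksum
 * field from the start of the transport header, or -1 otherwise. */
static int csum_field_offset(uint8_t protocol)
{
    if (protocol == PROTO_TCP)
        return (int)offsetof(struct tcp_hdr_sketch, check);
    if (protocol == PROTO_UDP)
        return (int)offsetof(struct udp_hdr_sketch, check);
    return -1;
}

/* Byte position of the checksum field in an IPv4 packet: the transport
 * header starts at iph + ihl * 4 (the value dev_queue_xmit() restores in
 * skb->h.raw), and the field sits csum_field_offset() bytes into it. */
static long csum_field_pos(const uint8_t *pkt)
{
    int off = csum_field_offset(pkt[9]);
    if (off < 0)
        return -1;
    return (long)(pkt[0] & 0x0f) * 4 + off;
}

/* Ones-complement sum over big-endian 16-bit words (RFC 1071). */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)p[0] << 8 | p[1];
    if (len)                         /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Folded sum of the IPv4 pseudo-header plus the whole TCP/UDP segment.
 * Assumes tot_len >= ihl * 4 and that the buffer holds tot_len bytes. */
static uint16_t l4_sum(const uint8_t *pkt)
{
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    size_t l4_len = ((size_t)pkt[2] << 8 | pkt[3]) - ihl;
    uint32_t sum = sum16(pkt + 12, 8, 0);    /* source + destination address */
    sum += pkt[9];                           /* zero byte + protocol */
    sum += (uint32_t)l4_len;                 /* TCP/UDP length */
    return fold(sum16(pkt + ihl, l4_len, sum));
}

/* A segment is intact when the sum, checksum field included, is 0xffff. */
static int l4_csum_ok(const uint8_t *pkt)
{
    return l4_sum(pkt) == 0xffff;
}

/* Software fallback: zero the field, sum, store the complement.
 * Returns -1 for protocols other than TCP/UDP. */
static int fill_l4_csum(uint8_t *pkt)
{
    long pos = csum_field_pos(pkt);
    if (pos < 0)
        return -1;
    pkt[pos] = pkt[pos + 1] = 0;
    uint16_t c = (uint16_t)~l4_sum(pkt);
    if (c == 0 && pkt[9] == PROTO_UDP)
        c = 0xffff;                  /* UDP sends a zero result as 0xffff */
    pkt[pos] = (uint8_t)(c >> 8);
    pkt[pos + 1] = (uint8_t)(c & 0xff);
    assert(l4_csum_ok(pkt));         /* self-check: the result verifies */
    return 0;
}
```

In the kernel, the sending stack typically pre-seeds the checksum field with the pseudo-header sum, so the real skb_checksum_help() only sums from the transport header onward; the sketch computes the full pseudo-header sum itself so that it stays self-contained.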
On transmission, dev_queue_xmit() checks for the SKB_FDW_NO_CSUM flag and corrects the fields (ip_summed and the transport header pointer, skb->h.raw) that have been modified by the bridging/routing code. Once these fields have been corrected, the adapter (if it can checksum on transmit) or the stack (via skb_checksum_help()) calculates the TCP/UDP checksum.

Performance:

I ran the following test cases with netperf3 TCP_STREAM and got the following throughput improvements (using bridging):

    Direction            Improvement
    domU->dom0           500 Mbps
    dom0->domU            10 Mbps
    domU->remote host    none
    domU->domU            70 Mbps

Note: I have a small bridging patch which increases dom0 throughput. I am in the process of having it accepted into the Linux kernel.

I currently do not have CPU utilization numbers (which is where the real boost of this patch would be), and I do not have throughput numbers for routing/NAT.

Also, I added the ability to enable/disable transmit checksum offload in netfront via the ethtool command.

Signed-off-by: Jon Mason <jdmason@xxxxxxxxxx>

--- ../xen-unstable-pristine/xen/include/public/io/netif.h	2005-05-04 22:20:10.000000000 -0500
+++ xen/include/public/io/netif.h	2005-05-18 12:05:41.000000000 -0500
@@ -12,7 +12,8 @@
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8: Echoed in response message. */
+    u16      csum:1;
+    u16      id:15;  /*  8: Echoed in response message. */
     u16      size;   /* 10: Packet size in bytes.       */
 } PACKED netif_tx_request_t; /* 12 bytes */
 
@@ -29,7 +30,8 @@ typedef struct {
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8:  */
+    u16      csum:1;
+    u16      id:15;  /*  8:  */
     s16      status; /* 10: -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
 } PACKED netif_rx_response_t; /* 12 bytes */
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-04 22:20:01.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-19 13:25:50.000000000 -0500
@@ -13,6 +13,9 @@
 #include "common.h"
 #include <asm-xen/balloon.h>
 #include <asm-xen/evtchn.h>
+#include <net/ip.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 #include <linux/delay.h>
@@ -154,10 +157,14 @@ int netif_be_start_xmit(struct sk_buff *
         __skb_put(nskb, skb->len);
         (void)skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen);
         nskb->dev = skb->dev;
+        nskb->ip_summed = skb->ip_summed;
         dev_kfree_skb(skb);
         skb = nskb;
     }
 
+    if (skb->ip_summed > 0)
+        netif->rx->ring[MASK_NETIF_RX_IDX(netif->rx_resp_prod)].resp.csum = 1;
+
     netif->rx_req_cons++;
     netif_get(netif);
 
@@ -646,6 +653,18 @@ static void net_tx_action(unsigned long 
         skb->dev      = netif->dev;
         skb->protocol = eth_type_trans(skb, skb->dev);
 
+        skb->csum = 0;
+        if (txreq.csum) {
+            skb->ip_summed = CHECKSUM_UNNECESSARY;
+            skb->flags |= SKB_FDW_NO_CSUM;
+            skb->nh.iph = (struct iphdr *) skb->data;
+            if (skb->nh.iph->protocol == IPPROTO_TCP)
+                skb->csum = offsetof(struct tcphdr, check);
+            if (skb->nh.iph->protocol == IPPROTO_UDP)
+                skb->csum = offsetof(struct udphdr, check);
+        } else
+            skb->ip_summed = CHECKSUM_NONE;
+
         netif->stats.rx_bytes += txreq.size;
         netif->stats.rx_packets++;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-04 22:20:09.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-20 10:36:14.000000000 -0500
@@ -159,6 +159,7 @@ void netif_create(netif_be_create_t *cre
     dev->get_stats       = netif_be_get_stats;
     dev->open            = net_open;
     dev->stop            = net_close;
+    dev->features        = NETIF_F_NO_CSUM;
 
     /* Disable queuing. */
     dev->tx_queue_len = 0;
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-04 22:20:11.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-20 13:15:39.000000000 -0500
@@ -40,6 +40,7 @@
 #include <linux/init.h>
 #include <linux/bitops.h>
 #include <linux/proc_fs.h>
+#include <linux/ethtool.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 #include <net/arp.h>
@@ -287,6 +288,11 @@ static int send_fake_arp(struct net_devi
     return dev_queue_xmit(skb);
 }
 
+static struct ethtool_ops network_ethtool_ops = {
+    .get_tx_csum = ethtool_op_get_tx_csum,
+    .set_tx_csum = ethtool_op_set_tx_csum,
+};
+
 static int network_open(struct net_device *dev)
 {
     struct net_private *np = netdev_priv(dev);
@@ -472,6 +478,7 @@ static int network_start_xmit(struct sk_
     tx->id   = id;
     tx->addr = virt_to_machine(skb->data);
     tx->size = skb->len;
+    tx->csum = (skb->ip_summed) ? 1 : 0;
 
     wmb(); /* Ensure that backend will see the request. */
     np->tx->req_prod = i + 1;
@@ -572,6 +579,9 @@ static int netif_poll(struct net_device 
         skb->len  = rx->status;
         skb->tail = skb->data + skb->len;
 
+        if (rx->csum)
+            skb->ip_summed = CHECKSUM_UNNECESSARY;
+
         np->stats.rx_packets++;
         np->stats.rx_bytes += rx->status;
 
@@ -966,7 +976,9 @@ static int create_netdev(int handle, str
     dev->get_stats       = network_get_stats;
     dev->poll            = netif_poll;
     dev->weight          = 64;
-    
+    dev->features        = NETIF_F_IP_CSUM;
+    SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
+
     if ((err = register_netdev(dev)) != 0) {
         printk(KERN_WARNING "%s> register_netdev err=%d\n", __FUNCTION__, err);
         goto exit;
--- ../xen-unstable-pristine/linux-2.6.11-xen0/include/linux/skbuff.h	2005-03-02 01:38:38.000000000 -0600
+++ linux-2.6.11-xen0/include/linux/skbuff.h	2005-05-18 12:05:41.000000000 -0500
@@ -37,6 +37,10 @@
 #define CHECKSUM_HW 1
 #define CHECKSUM_UNNECESSARY 2
 
+#define SKB_CLONED	1
+#define SKB_NOHDR	2
+#define SKB_FDW_NO_CSUM	4
+
 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
 				 ~(SMP_CACHE_BYTES - 1))
 #define SKB_MAX_ORDER(X, ORDER)	(((PAGE_SIZE << (ORDER)) - (X) - \
@@ -238,7 +242,7 @@ struct sk_buff {
 				mac_len,
 				csum;
 	unsigned char		local_df,
-				cloned,
+				flags,
 				pkt_type,
 				ip_summed;
 	__u32			priority;
@@ -370,7 +374,7 @@ static inline void kfree_skb(struct sk_b
  */
 static inline int skb_cloned(const struct sk_buff *skb)
 {
-	return skb->cloned && atomic_read(&skb_shinfo(skb)->dataref) != 1;
+	return (skb->flags & SKB_CLONED) && atomic_read(&skb_shinfo(skb)->dataref) != 1;
 }
 
 /**
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/skbuff.c	2005-03-02 01:38:17.000000000 -0600
+++ linux-2.6.11-xen0/net/core/skbuff.c	2005-05-18 12:05:41.000000000 -0500
@@ -240,7 +240,7 @@ static void skb_clone_fraglist(struct sk
 
 void skb_release_data(struct sk_buff *skb)
 {
-	if (!skb->cloned ||
+	if (!(skb->flags & SKB_CLONED) ||
 	    atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
 		if (skb_shinfo(skb)->nr_frags) {
 			int i;
@@ -352,7 +352,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(data_len);
 	C(csum);
 	C(local_df);
-	n->cloned = 1;
+	n->flags = skb->flags | SKB_CLONED;
 	C(pkt_type);
 	C(ip_summed);
 	C(priority);
@@ -395,7 +395,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(end);
 
 	atomic_inc(&(skb_shinfo(skb)->dataref));
-	skb->cloned = 1;
+	skb->flags |= SKB_CLONED;
 
 	return n;
 }
@@ -603,7 +603,7 @@ int pskb_expand_head(struct sk_buff *skb
 	skb->mac.raw += off;
 	skb->h.raw   += off;
 	skb->nh.raw  += off;
-	skb->cloned   = 0;
+	skb->flags   &= ~SKB_CLONED;
 	atomic_set(&skb_shinfo(skb)->dataref, 1);
 	return 0;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/dev.c	2005-03-02 01:38:09.000000000 -0600
+++ linux-2.6.11-xen0/net/core/dev.c	2005-05-20 10:20:36.000000000 -0500
@@ -98,6 +98,7 @@
 #include <linux/stat.h>
 #include <linux/if_bridge.h>
 #include <linux/divert.h>
+#include <net/ip.h>
 #include <net/dst.h>
 #include <net/pkt_sched.h>
 #include <net/checksum.h>
@@ -1182,7 +1183,7 @@ int __skb_linearize(struct sk_buff *skb,
 	skb->data += offset;
 
 	/* We are no longer a clone, even if we were. */
-	skb->cloned = 0;
+	skb->flags &= ~SKB_CLONED;
 
 	skb->tail     += skb->data_len;
 	skb->data_len  = 0;
@@ -1236,6 +1237,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 	    __skb_linearize(skb, GFP_ATOMIC))
 		goto out_kfree_skb;
 
+	/* If packet is forwarded to a device that needs a checksum and not
+	 * checksummed, correct the pointers and enable checksumming in the
+	 * next function.
+	 */
+	if (skb->flags & SKB_FDW_NO_CSUM) {
+		skb->ip_summed = CHECKSUM_HW;
+		skb->h.raw = (void *)skb->nh.iph + (skb->nh.iph->ihl * 4);
+	}
+
 	/* If packet is not checksummed and device does not support
 	 * checksumming for this protocol, complete checksumming here.
 	 */

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel