Xen project Mailing List

[Xen-devel] checksum `offload'

From: Ian Jackson <ian@xxxxxxxxxxxxxxxxxxxxxxxx>

Date: Wed, 22 Mar 2006 18:40:35 +0000

Delivery-date: Wed, 22 Mar 2006 18:41:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I wrote to xen-users a few weeks ago about difficulty I was having with the TCP checksum offload feature[1]. The xen-users list seems to have a fair few people who are having difficulty with this optimisation. In order to debug my problem, I ended up modifying the Xen 3.0.1 network backend driver[2]. Since I'm suggesting a change to the behaviour and default configuration, it seems appropriate to post here.

Background:

Hardcoded in the Xen 3.0.1 network backend driver (in the supplied patch to Linux 2.6.12) is the notion that packets `outbound' through the network backend (destined for a frontend in another guest) never need to be checksummed. I can't find any design documentation which explains this decision, but I presume that it is the result of the following chain of reasoning about virtual network interfaces:

 1. The backend is in dom0 and the frontend is in some domU.
 2. The domU does not have or use any physical network hardware.
 3. The domU does not act as a router or encapsulator (eg, it does not run a VPN client, a tunnel endpoint, etc.).
 4. The domU will always know correctly whether a packet originated from dom0 (checksum not needed, not calculated) or came from some other machine and merely passed through dom0 (checksum calculated and needed).
 5. Therefore all packets leaving dom0 for a domU will terminate on that domU and do not need to be checksummed.

(It is possible that there's something fancy happening in the frontend; I briefly looked at that code but didn't take the time to understand it fully.)

All of assumptions 1-4 can be false. Assumptions 1-3 can be false in many network topologies, and the system should not assume that the network topology is the one set up by the provided default configuration scripts. Assumption 4 is apparently false in my case and caused the symptoms I saw.
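To make "checksummed" concrete: the TCP and UDP checksums are the 16-bit one's complement Internet checksum described in RFC 1071. The Python below is a toy sketch, not Xen or Linux code. It omits the IP pseudo-header that real TCP/UDP checksums also cover, and the segment layout in the hypothetical `build` helper (a 2-byte checksum field followed by the payload) and the payload bytes are made up for illustration. It shows why assumption 5 matters: a segment whose checksum was suppressed fails at any endpoint that actually verifies it.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # odd length: pad with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Value a sender writes into a checksum field that was zero when summed."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verify(segment: bytes) -> bool:
    """A receiver that checks sums everything, checksum field included."""
    return ones_complement_sum16(segment) == 0xFFFF


def build(payload: bytes, fill_checksum: bool) -> bytes:
    """Hypothetical layout: 2-byte checksum field, then the payload.

    fill_checksum=False leaves the field zero, standing in for a sender
    that has suppressed checksum generation.
    """
    csum = internet_checksum(b"\x00\x00" + payload) if fill_checksum else 0
    return csum.to_bytes(2, "big") + payload


payload = b"hello, domU"  # made-up example payload
print(hex(internet_checksum(payload)))              # 0xd72f
print(verify(build(payload, fill_checksum=True)))   # True
print(verify(build(payload, fill_checksum=False)))  # False
```

If the domU forwards such a packet onward (assumption 3 false) or misjudges where it came from (assumption 4 false), whichever host finally checks the checksum drops it.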
While Xen allows the frontend interface's `transmit checksum offload' (ie, for packets leaving that guest) to be enabled and disabled from userland, so that checksum calculation can be suppressed, it does not allow the `receive checksum offload' (for packets entering the guest) to be controlled, and it does not allow the backend's checksum processing to be enabled and disabled (in 3.0.1, at least).

Some observations:

In the general case, it is not possible to determine whether any particular packet needs checksum processing (generation for outbound packets, or checking for inbound ones) without knowledge of the network topology and configuration. This topology and configuration could be very complex, as many of the guest operating systems supported by Xen have very sophisticated (not to say dangerous!) mixed-layer packet routing and mangling capabilities; additionally, Xen guests (including dom0 and domU) may well contain instances of routers or encapsulators, which further complicate the topology.

Therefore, it is not possible to encode rules for correct behaviour in the code for Xen's virtual network devices. The correct behaviour can only be determined by the network configuration scripts, which are also responsible for establishing the desired network topology. Ie, the behaviour must be configurable from userland.

In many (most?) scenarios, checksums cannot safely be suppressed for any significant proportion of the traffic. If the guests are strongly isolated with their own filesystems, and the purpose is to provide multiple largely independent hardware platforms, guest-to-guest communication will be relatively rare, and of course communications from one guest to the Internet at large must be checksummed.
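A minimal toy model of the point that only the topology decides whether suppression is safe. It is not the Xen drivers: `CsumSettings` and `accepted_at_endpoint` are hypothetical names standing in for per-device switches of the kind `ethtool -K <dev> tx on|off rx on|off` toggles. It assumes that a virtual device with transmit offload enabled sends packets without a valid checksum (there is no NIC to fill it in), that receive offload means the stack skips verification, and that only the final endpoint verifies the transport checksum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsumSettings:
    """Hypothetical per-interface switches, loosely mirroring the tx/rx
    settings of `ethtool -K <dev>`. Both default to off, the default the
    message argues for."""
    tx_offload: bool = False  # True: do not generate checksums on transmit
    rx_offload: bool = False  # True: do not verify checksums on receive


def leaves_with_valid_checksum(sender: CsumSettings) -> bool:
    # Assumption: a virtual device has no NIC to fill the checksum in
    # later, so transmit "offload" means the packet leaves without one.
    return not sender.tx_offload


def accepted_at_endpoint(sender: CsumSettings, endpoint: CsumSettings) -> bool:
    """Assumption: the transport checksum is end to end, so in this model
    only the sender and the final endpoint matter. Which endpoint a packet
    reaches depends on routing, bridging and tunnelling configuration,
    none of which the sending driver can see."""
    return leaves_with_valid_checksum(sender) or endpoint.rx_offload


default = CsumSettings()
suppressed = CsumSettings(tx_offload=True, rx_offload=True)

scenarios = {
    "defaults everywhere": (default, default),
    "private virtual network, both ends suppressed": (suppressed, suppressed),
    "suppressing guest routes to an outside host": (suppressed, default),
}
for name, (snd, rcv) in scenarios.items():
    verdict = "delivered" if accepted_at_endpoint(snd, rcv) else "dropped"
    print(f"{name}: {verdict}")
```

This prints "delivered", "delivered" and "dropped" for the three scenarios: identical sender settings are safe or unsafe depending on where the packet ends up, which is why the choice has to come from the userland scripts that set up the topology.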
The suppression is only useful when a large amount of network traffic has the different guests as endpoints. The most likely scenario is one where the guests share `network' filesystems from dom0, but this is not the default configuration with the supplied scripts, and doing it safely involves significant effort to ensure that the fs traffic is protected from interference. Ie, the checksum offload should be disabled by default.

It's probably too hard to write sensible rules, or provide a sensible mechanism, to allow different packets traversing the same interface to be treated differently. The administrator would probably want to control the checksumming via iptables rules, routing tables, or other normal host-side mechanisms, and Linux's packet-handling system is not ideally suited for this AFAIAA.

So, I conclude that:

 * Checksum suppression for virtual network backends should not be done with NETIF_F_NO_CSUM but with NETIF_F_IP_CSUM or the like, as for the frontends.
 * Any code in the frontend that attempts to decide whether the peer for a packet is the backend guest itself or some other machine further away should be removed.
 * Checksum suppression control with ethtool -K should be supported for both outbound and inbound packets on both frontend and backend devices.
 * The default should have checksum suppression disabled.
 * Ideally, there would be example scripts which provide guest domains with a set of eth1's on a private, entirely virtual network, all of whose interfaces have checksums suppressed, and which does not exchange packets with the wider Internet. This could be used for intra-system NFS, etc.

Thanks,
Ian.

[1] http://lists.xensource.com/archives/html/xen-users/2006-03/msg00135.html and the subsequent thread.
[2] http://lists.xensource.com/archives/html/xen-users/2006-03/msg00159.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
