Re: [Xen-devel] checksum `offload'
Nivedita Singhvi writes ("Re: [Xen-devel] checksum `offload'"):
> Excellent and timely summary. I just started looking into
> the offload problem for VLANs. Jon Mason and Jim Dykman
> generated a patch for the IPSec environment issue, but
> due to concerns about whether it would be acceptable
> upstream, this hasn't yet been blessed. I'd really like
> to look at that bug in a wider context with many of the
> issues you just specified addressed, but this was going
> to be post 3.0.2 and distro release happening.

Would it be better to disable this feature in 3.0.2 in the meantime?
Just a suggestion. When I first encountered this problem I naturally
searched the xen-users archives. It seems to be causing trouble for a
fair few people, and the ethtool -K rune is being handed around as
folklore amongst the poor unwashed there (although of course it doesn't
always work).

> > [various assumptions, including:]
> > 3. The domU does not act as a router-encapsulator. (eg,
> >    run a VPN client, tunnel endpoint, etc. etc.)
>
> At the point this was done, there was not support for
> a different model (backend in dom0, frontend in domU).
> It was assumed to be the traffic model.

My assumption no.3 could still have been violated easily, surely, by
running a VPN client in a domU which the dom0 uses for some traffic?
VPN packets would leave dom0 for domU via the virtual interface, be
encapsulated and encrypted there (bad checksum and all), and be routed
back out via dom0. The eventual receiving system would decrypt the
packet and find the checksum was wrong.

I assume no-one has done that, or if they did they've noticed it
doesn't work and have tried something else. Given the still rather
high prevalence of weird and strange VPN endpoint programs, wanting to
encapsulate one in a domU isn't that silly an idea. Likewise with
IPv6-over-IPv4 tunnelling and other such things.
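The VPN failure described above can be illustrated with a toy model.
This is only a sketch under stated assumptions: the packet layout (a
2-byte checksum field followed by the payload), the XOR `encryption',
and the names build, is_valid, encapsulate and decapsulate are all
hypothetical and are not Xen or kernel code. Only the checksum
arithmetic follows the standard Internet checksum (RFC 1071).

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build(payload: bytes, fill_checksum: bool) -> bytes:
    """Toy packet: 2-byte checksum field, then the payload (assumed layout)."""
    body = b"\x00\x00" + payload
    if not fill_checksum:
        return body                         # checksum left for someone else
    return internet_checksum(body).to_bytes(2, "big") + payload


def is_valid(packet: bytes) -> bool:
    """A packet with a correct checksum field sums to zero."""
    return internet_checksum(packet) == 0


KEY = 0x5A  # stand-in for real encryption, purely illustrative


def encapsulate(inner: bytes) -> bytes:
    """Scramble the inner packet and wrap it in a correctly summed outer one."""
    scrambled = bytes(b ^ KEY for b in inner)
    return build(scrambled, fill_checksum=True)


def decapsulate(outer: bytes) -> bytes:
    """Strip the outer checksum field and unscramble the inner packet."""
    return bytes(b ^ KEY for b in outer[2:])


# dom0 hands the packet to the domU with the checksum not yet calculated.
inner = build(b"hello domU", fill_checksum=False)
outer = encapsulate(inner)                  # done by the VPN client in the domU
print(is_valid(outer))                      # True: the outer packet looks fine
print(is_valid(decapsulate(outer)))         # False: the far end sees a bad checksum
```

Nothing that handles the outer packet can repair the inner checksum,
because the inner packet is opaque once it has been encapsulated.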
> > While Xen allows the frontend interface's `transmit checksum offload'
> > (ie, for packets leaving that guest) to be enabled and disabled from
> > userland, so that checksum calculation can be suppressed, it does not
> > allow the `receive checksum offload' (for packets entering the guest)
> > to be controlled, and it does not allow the backend's checksum
> > processing to be enabled and disabled (in 3.0.1, at least).
>
> Since I believe we only initiate for outgoing, suppressing
> the offload on the transmit on DomU should be enough to
> bypass this behaviour(?).

I don't understand the word `initiate' in this context. Do you mean to
refer to which endpoint initiates the traffic flow? That doesn't seem
relevant and is in any case not even necessarily a meaningful context
in IP (the modern prevalence of NAT and stateful firewalling
notwithstanding).

Suppressing the offload on the transmit in domU is not sufficient. I
found that it was necessary to suppress the offload on the transmit in
dom0 (ie, to stop the checksum calculation being `optimised' away, so
that the checksum is actually calculated), which can only be done with
a source code patch.

This must have been because the machinery for suppressing the checksum
_checking_ on the _receive_ in domU wasn't working. I haven't read the
frontend driver code, but if the backend code ever works at all there
_must_ be some such suppression arrangements.

It seems very likely to me that these arrangements for suppressing
receive checksum checking will sometimes suppress the check
inappropriately. After all, the information needed to make a correct
decision is not available. In my case the checking was mistakenly not
suppressed, so the packets were rejected by the domU; but in another
case the checking might be mistakenly suppressed, so that corrupted
packets from outside the physical host might be accepted unquestioned
by a domU.
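The two failure modes just described (a good packet rejected because
checking was not suppressed, and a corrupted packet accepted because
checking was suppressed) can be sketched as follows. This is a toy
model, not the netfront/netback code: the Packet class, toy_checksum,
transmit and receive are hypothetical names, and the checksum is a
deliberately simple byte sum in which 0 stands for `not calculated'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    checksum: int  # 0 means "not calculated" in this toy model


def toy_checksum(payload: bytes) -> int:
    """Illustrative byte sum, offset by 1 so that it is never 0."""
    return (sum(payload) % 0xFFFF) + 1


def transmit(payload: bytes, tx_offload: bool) -> Packet:
    """With transmit offload on, the sender skips the calculation."""
    return Packet(payload, 0 if tx_offload else toy_checksum(payload))


def receive(pkt: Packet, rx_check: bool) -> Optional[bytes]:
    """Return the payload, or None if the packet is rejected."""
    if rx_check and pkt.checksum != toy_checksum(pkt.payload):
        return None
    return pkt.payload


# Checking mistakenly NOT suppressed: an undamaged packet is rejected.
print(receive(transmit(b"ping", tx_offload=True), rx_check=True))   # None

# Intended offload path: sender skips the sum, receiver skips the check.
print(receive(transmit(b"ping", tx_offload=True), rx_check=False))  # b'ping'

# Checking mistakenly suppressed: a packet damaged on the wire is accepted.
damaged = Packet(b"pong", toy_checksum(b"ping"))
print(receive(damaged, rx_check=False))                             # b'pong'
```

The receiver has no way to tell the second and third cases apart
unless something tells it where the packet came from.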
It seems quite possible to me that this bug does in fact exist in my
own setup and I can only hope that it doesn't bite me somehow with
corrupted data. (If I were more worried I'd patch the frontend driver
too to remove the offload feature.)

> Deferring the checksum to dom0 [Assumption = dom0 is where
> it reaches the physical hw] where it can be offloaded
> to the real hardware is not a bad idea - expected to be a
> non-trivial performance boost.

Yes, I can see that that might be useful. But it's very complicated:

If you want to do this I think you have to add a flag to the packet as
it crosses the domU<->dom0 interface which indicates whether the
checksum has been suppressed. This is because otherwise the kernel
with the actual hardware will not know to instruct the hardware to
compute and insert the checksum, since it will think that the checksum
is already correct.

There are three possibilities:

 1. `Transmitter' has not calculated the checksum; the `receiver' must
    do so if the packet is to leave via another interface (or arrange
    that the onward interface offload does so).

 2. Packet was received from another physical host by the virtual
    interface `transmitter' and the `transmitter' (or the incoming
    other interface offload) has already checked the checksum, so the
    `receiver' need not do so; the `receiver' may assume that the
    packet checksum is correct so that nothing special needs to be
    done if the packet will leave via another interface.

 3. Packet checksum is supposed to be valid but must be checked by the
    `receiver'.

This information needs to be correctly propagated through the
in-kernel routing system - and arrangements need to be made for the
checksum to be checked/computed/recomputed if (eg) iptables rules need
values of checksum-covered fields, or modify them.
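The three possibilities above amount to a small per-packet state that
has to travel with the packet. A minimal sketch of that idea follows;
the names CsumState, Action and receiver_action are hypothetical, and
the treatment of locally delivered packets in state 1 (nothing to do)
is an assumption read into the text rather than something it states.

```python
from enum import Enum


class CsumState(Enum):
    NOT_CALCULATED = 1    # possibility 1: transmitter skipped the calculation
    ALREADY_VERIFIED = 2  # possibility 2: checked on the way in, known good
    MUST_VERIFY = 3       # possibility 3: claimed valid, not yet checked


class Action(Enum):
    NONE = "nothing to do"
    COMPUTE = "compute checksum (or have the onward offload do it)"
    VERIFY = "verify checksum before trusting the packet"


def receiver_action(state: CsumState, leaves_via_other_interface: bool) -> Action:
    """What the `receiver' side of the virtual interface has to do."""
    if state is CsumState.NOT_CALCULATED:
        # Assumption: a packet consumed locally needs no checksum filled in.
        return Action.COMPUTE if leaves_via_other_interface else Action.NONE
    if state is CsumState.ALREADY_VERIFIED:
        return Action.NONE
    return Action.VERIFY


for state in CsumState:
    for forwarded in (False, True):
        print(state.name, forwarded, receiver_action(state, forwarded).name)
```

A real implementation would also have to revisit this state whenever
something in the path (iptables, for example) reads or rewrites a
checksum-covered field, which this sketch does not attempt.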
Note that in principle these considerations apply separately to each
checksum in the header: a UDP packet inside IPv4 inside an ethernet
frame has several checksums, some of which are transparently passed
through by (say) dom0 and some of which are checked and recomputed -
and the behaviour depends on whether the relay kernel (dom0, probably)
is acting as a switch, router, NAPT, or something even more horrid.

With knowledge of the topology, it might be possible to arrange that
these kinds of decisions don't need a flag to accompany the
dom0<->domU packet transmission, but that's not the hard part: the
hard part is threading the `must still calculate checksum on this'
note through the kernel's routing/bridging system so that it knows to
overwrite the correct subset of the checksums.

It is not safe to always overwrite the checksums unless they were
checked earlier, because that risks fixing up the checksum(s) on
already damaged packets.

> > * The default should have checksum suppression enabled.
>
> Agreed.

Oh dear, I meant `disabled'. That is, the checksums should be
calculated and checked `normally'.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel