
Re: [Xen-devel] netif.h clarifications



On Fri, May 20, 2016 at 12:55:16PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> [snip]
> > > > And then I've also seen some issues with TSO/LRO (GSO in Linux
> > > > terminology) when using packet forwarding inside of a FreeBSD DomU.
> > > > For example in the following scenario:
> > > >
> > > >                                    +
> > > >                                    |
> > > >    +---------+           +--------------------+           +----------+
> > > >    |         |A         B|       router       |C         D|          |
> > > >    | Guest 1 +-----------+         +          +-----------+ Guest 2  |
> > > >    |         |  bridge0  |         |          |  bridge1  |          |
> > > >    +---------+           +--------------------+           +----------+
> > > >    172.16.1.67          172.16.1.66|   10.0.1.1           10.0.1.2
> > > >                                    |
> > > >              +--------------------------------------------->
> > > >               ssh 10.0.1.2         |
> > > >                                    |
> > > >                                    |
> > > >                                    |
> > > >                                    +
> > > >
> > > > All those VMs are inside of the same host, and one of them acts as
> > > > a gateway between them because they are on two different subnets.
> > > > In this case I'm seeing issues because even though I disable
> > > > TSO/LRO on the "router" at runtime, the backend doesn't watch the
> > > > xenstore feature flag, and never disables it from the vif on the
> > > > Dom0 bridge. This causes LRO packets (non-fragmented) to be
> > > > received at point 'C', and then when the gateway tries to inject
> > > > them into the other NIC it fails because the size is greater than
> > > > the MTU and the "don't fragment" bit is set.
> > > >
> > >
> > > Yes, GSO cannot be disabled/enabled dynamically on the netback tx
> > > side (i.e. guest rx side), so you can't turn it off. The Windows PV
> > > driver leaves it on all the time and does the fragmentation itself
> > > if the stack doesn't want GRO. Doing the fragmentation in the
> > > frontend makes more sense anyway, since the CPU cycles are burned by
> > > the VM rather than dom0 and so it scales better.
> > 
> > The weird thing is that GSO can usually be dynamically enabled/disabled on
> > all network cards, so it would make sense to allow netfront to do the same.
> > I guess the only way is to reset the netfront/netback connection when
> > changing this property.
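
As an aside, the alternative to a full reconnect would be for the backend
to keep a xenstore watch on the frontend's feature node and react when it
flips. A minimal userspace sketch using libxenstore follows; error
handling is stripped and the vif path is invented, so treat it as
illustrative only, not as what netback actually does:

    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    int main(void)
    {
        /* Hypothetical frontend path; the real one depends on the
         * domid and the handle of the vif. */
        const char *node =
            "/local/domain/1/device/vif/0/feature-gso-tcpv4";
        struct xs_handle *xs = xs_open(0);
        unsigned int num, len;
        char **ev;
        char *val;

        if (!xs || !xs_watch(xs, node, "gso-token"))
            return 1;

        for (;;) {
            /* Blocks until the node changes; an initial event also
             * fires right after the watch is registered. */
            ev = xs_read_watch(xs, &num);
            if (!ev)
                break;
            val = xs_read(xs, XBT_NULL, ev[XS_WATCH_PATH], &len);
            if (val) {
                /* "1" => frontend accepts GSO, "0" => stop sending it. */
                printf("feature-gso-tcpv4 is now %s\n", val);
                free(val);
            }
            free(ev);
        }
        xs_close(xs);
        return 0;
    }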
> 
> Or implement GSO fragmentation in netfront, as I did for Windows.
>
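
For the archives, such in-frontend segmentation essentially re-slices the
TCP payload at the MSS and clones the headers onto each slice. A heavily
simplified sketch (BSD-style netinet headers; checksum and ip_id updates
omitted; xmit_frame() is a made-up transmit hook), not the actual Windows
driver code:

    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>
    #include <netinet/tcp.h>

    /* Hypothetical per-segment transmit hook. */
    extern void xmit_frame(const uint8_t *pkt, size_t len);

    void gso_segment(const uint8_t *pkt, size_t len, uint16_t mss)
    {
        const struct ip *iph = (const struct ip *)pkt;
        size_t iphl = iph->ip_hl * 4;
        const struct tcphdr *th = (const struct tcphdr *)(pkt + iphl);
        size_t hdrs = iphl + th->th_off * 4;
        const uint8_t *payload = pkt + hdrs;
        size_t left = len - hdrs;
        uint32_t seq = ntohl(th->th_seq);
        static uint8_t seg[65536];

        while (left > 0) {
            size_t chunk = left < mss ? left : mss;
            struct ip *sip = (struct ip *)seg;
            struct tcphdr *sth = (struct tcphdr *)(seg + iphl);

            memcpy(seg, pkt, hdrs);            /* clone IP+TCP headers */
            memcpy(seg + hdrs, payload, chunk);

            sip->ip_len = htons((uint16_t)(hdrs + chunk));
            sth->th_seq = htonl(seq);
            if (chunk < left)                  /* not the last segment */
                sth->th_flags &= ~(TH_FIN | TH_PSH);

            /* Real code would also bump ip_id and recompute the IP
             * and TCP checksums here. */
            xmit_frame(seg, hdrs + chunk);

            payload += chunk;
            seq += chunk;
            left -= chunk;
        }
    }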
> > 
> > > > How does Linux deal with this situation? Does it simply ignore the
> > > > "no fragment" flag and fragment the packet? Does it simply inject
> > > > the packet to the other end, ignoring the MTU and propagating the
> > > > GSO flag?
> > > >
> > >
> > > I've not looked at the netfront rx code but I assume that the large
> > > packet that is passed from netback is just marked as GSO and makes
> > > its way to wherever it's going (being fragmented by the stack if it's
> > > forwarded to an interface that doesn't have the TSO flag set).
> > 
> > But it cannot be fragmented if it has the IP "don't fragment" flag set.
> > 
> 
> Huh? This is GSO we're talking about here, not IP fragmentation. They are not 
> the same thing.

Well, as I understand it GSO works by offloading the fragmentation to the 
NIC, so the NIC performs the TCP/IP fragmentation itself. In which case I 
think it's relevant, because if you receive a 64KB GSO packet with the 
"don't fragment" IP flag set, you should not fragment it AFAIK, even if 
it's a GSO packet.

I think this is all caused by the fact that there's no real medium here; 
it's all bridges and virtual network interfaces on the same host. The 
bridge has no real MTU, but in the real world the packet would be 
fragmented the moment it hits the wire.

OTOH, when using the PV net protocol we are basically passing mbufs (or 
skbs in the Linux world) around, so is the expectation that fragmentation 
is only performed when the packet is put on a real wire with a real MTU, 
i.e. that the last entity that touches it must do the fragmentation?

IMHO, this approach seems very dangerous, and we are breaking the 
end-to-end principle.

> > What I'm seeing here is that at point C netback passes GSO packets to
> > the "router" VM. These packets have not been fragmented, and then when
> > the router VM tries to forward them to point B it has to issue a "need
> > fragmentation" ICMP message because the MTU of the interface is 1500
> > and the IP header has the "don't fragment" bit set (and of course the
> > GSO chain is bigger than 1500).
> > 
> 
> That's presumably because they've lost the GSO information somewhere (i.e. 
> the flag saying it's GSO and the MSS).

AFAICT, I'm correctly passing the GSO information around.

> > Is Linux ignoring the "don't fragment" IP flag here and simply fragmenting
> > it?
> 
> Yes. As I said GSO != IP fragmentation; the DF bit has no bearing on it. You 
> do need the GSO information though.

I'm sorry but I don't think I'm following here. GSO basically offloads 
IP/TCP fragmentation to the NIC, so I don't see why the DF bit is not 
relevant here. The DF bit is clearly not relevant if it's a locally 
generated packet, but it matters if it's a packet coming from another 
entity.

In the diagram that I've posted above, for example, if you replace bridge0 
with a physical medium, and the guests at both ends want to establish an 
SSH connection, the fragmentation would then be done at point B (for 
packets going from guest 2 to 1), which seems completely wrong to me for 
packets that have the DF bit set, because the fragmentation would be done 
by the _router_, not the sender (which AFAICT is what the DF flag is 
trying to avoid).
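
To make the two readings concrete, this is the forwarding decision I
understand each of us to be describing, as a compilable pseudo-C sketch
(all types and helpers are hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    #define IP_DF 0x4000                  /* "don't fragment" IP flag */

    struct pkt   { size_t len; uint16_t ip_flags; int gso; };
    struct netif { size_t mtu; };

    /* Hypothetical helpers standing in for the real datapath. */
    void transmit(struct netif *o, struct pkt *p);
    void tcp_resegment(struct pkt *p, struct netif *o);
    void ip_fragment(struct pkt *p, struct netif *o);
    void icmp_frag_needed(struct pkt *p);

    void forward(struct pkt *p, struct netif *out)
    {
        if (p->len <= out->mtu) {
            transmit(out, p);             /* fits, nothing to decide  */
        } else if (p->gso) {
            /* Paul's point: GSO resegmentation emits complete TCP/IP
             * packets of at most MSS bytes, not IP fragments, so on
             * this path Linux never consults DF. */
            tcp_resegment(p, out);
        } else if (!(p->ip_flags & IP_DF)) {
            ip_fragment(p, out);          /* classic IP fragmentation */
        } else {
            /* My point: without the GSO metadata this is all you can
             * legitimately do, which is what FreeBSD does at point B. */
            icmp_frag_needed(p);
        }
    }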

Roger.
