
Re: [Xen-devel] netif.h clarifications



> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 20 May 2016 13:34
> To: Paul Durrant
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Wei Liu; David Vrabel
> Subject: Re: netif.h clarifications
> 
> On Fri, May 20, 2016 at 12:55:16PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > [snip]
> > > > > And then I've also seen some issues with TSO/LRO (GSO in Linux
> > > > > terminology) when using packet forwarding inside of a FreeBSD DomU.
> > > > > For example in the following scenario:
> > > > >
> > > > >                                    +
> > > > >                                    |
> > > > >    +---------+           +--------------------+           +----------+
> > > > >    |         |A         B|       router       |C         D|          |
> > > > >    | Guest 1 +-----------+         +          +-----------+ Guest 2  |
> > > > >    |         |  bridge0  |         |          |  bridge1  |          |
> > > > >    +---------+           +--------------------+           +----------+
> > > > >    172.16.1.67          172.16.1.66|   10.0.1.1           10.0.1.2
> > > > >                                    |
> > > > >              +--------------------------------------------->
> > > > >               ssh 10.0.1.2         |
> > > > >                                    |
> > > > >                                    |
> > > > >                                    |
> > > > >                                    +
> > > > >
> > > > > All those VMs are inside of the same host, and one of them acts as a
> > > > > gateway between them because they are on two different subnets. In
> > > > > this case I'm seeing issues because even though I disable TSO/LRO on
> > > > > the "router" at runtime, the backend doesn't watch the xenstore
> > > > > feature flag, and never disables it from the vif on the Dom0 bridge.
> > > > > This causes LRO packets (non-fragmented) to be received at point 'C',
> > > > > and then when the gateway tries to inject them into the other NIC it
> > > > > fails because the size is greater than the MTU, and the "no fragment"
> > > > > bit is set.
> > > > >
> > > >
> > > > Yes, GSO cannot be disabled/enabled dynamically on the netback tx side
> > > > (i.e. guest rx side) so you can't turn it off. The Windows PV driver
> > > > leaves it on all the time and does the fragmentation itself if the
> > > > stack doesn't want GRO. Doing the fragmentation in the frontend makes
> > > > more sense anyway since the cpu cycles are burned by the VM rather
> > > > than dom0 and so it scales better.
> > >
> > > The weird thing is that GSO can usually be dynamically enabled/disabled
> > > on all network cards, so it would make sense to allow netfront to do the
> > > same. I guess the only way is to reset the netfront/netback connection
> > > when changing this property.
> >
> > Or implement GSO fragmentation in netfront, as I did for Windows.
> >
> > >
> > > > > How does Linux deal with this situation? Does it simply ignore the
> > > > > "no fragment" flag and fragment the packet? Does it simply inject
> > > > > the packet to the other end ignoring the MTU and propagating the
> > > > > GSO flag?
> > > > >
> > > >
> > > > I've not looked at the netfront rx code but I assume that the large
> > > > packet that is passed from netback is just marked as GSO and makes its
> > > > way to wherever it's going (being fragmented by the stack if it's
> > > > forwarded to an interface that doesn't have the TSO flag set).
> > >
> > > But it cannot be fragmented if it has the IP "don't fragment" flag set.
> > >
> >
> > Huh? This is GSO we're talking about here, not IP fragmentation. They
> > are not the same thing.
> 
> Well, as I understand it GSO works by offloading the fragmentation to the
> NIC, so the NIC performs the TCP/IP fragmentation itself. In which case I
> think it's relevant, because if you receive a 64KB GSO packet with the
> "don't fragment" IP flag set, you should not fragment it AFAIK, even if
> it's a GSO packet.

I don't believe that is the case. The DF bit is not relevant because you are
not fragmenting an IP packet; you are taking a large TCP segment and splitting
it into MSS-sized segments.

> 
> I think this is all caused because there's no real media here, it's all
> bridges and virtual network interfaces on the same host. The bridge has no
> real MTU, but in the real world the packet would be fragmented the moment
> it hits the wire.
> 

Yes. MTU is not really relevant until a bit of wire gets involved.

> OTOH, when using the PV net protocol we are basically passing mbufs (or
> skbs in the Linux world) around, so is it expected that the fragmentation
> is going to be performed when the packet is put on a real wire that has a
> real MTU, so the last entity that touches it must do the fragmentation?
> 

Let's call it 'segmentation' to avoid getting confused again... Yes, if 
something along the path does not know about large TCP segments (a.k.a. GSO 
packets) then the segmentation must be done at that boundary. So, for VM <-> VM 
traffic on the same host you really want everything to know about GSO packets 
so that the payload never has to be segmented.

> IMHO, this approach seems very dangerous, and we are breaking the
> end-to-end principle.

Why? What principle are we breaking? As long as the MSS information is carried
in the packet metadata then it can always be segmented at the point where the
packet is handed to something that either doesn't know how to handle that
metadata, or when it needs to go on a bit of wire.

> 
> > > What I'm seeing here is that at point C netback passes GSO packets to
> > > the "router" VM. These packets have not been fragmented, and then when
> > > the router VM tries to forward them to point B it has to issue a "need
> > > fragmentation" icmp message because the MTU of the interface is 1500
> > > and the IP header has the "don't fragment" bit set (and of course the
> > > GSO chain is bigger than 1500).
> > >
> >
> > That's presumably because they've lost the GSO information somewhere
> > (i.e. the flag saying it's GSO and the MSS).
> 
> AFAICT, I'm correctly passing the GSO information around.
> 
> > > Is Linux ignoring the "don't fragment" IP flag here and simply fragmenting
> > > it?
> >
> > Yes. As I said GSO != IP fragmentation; the DF bit has no bearing on it.
> > You do need the GSO information though.
> 
> I'm sorry but I don't think I'm following here. GSO basically offloads
> IP/TCP fragmentation to the NIC, so I don't see why the DF bit is not
> relevant here. The DF bit is clearly not relevant if it's a locally
> generated packet, but it matters if it's a packet coming from another
> entity.
> 

The DF bit says whether the *IP packet* may be fragmented. IP packets are not
fragmented when TCP segmentation offload is done.

> In the diagram that I've posted above for example, if you change bridge0
> with a physical media, and the guests at both ends want to establish an SSH
> connection, the fragmentation would then be done at point B (for packets
> going from guest 2 to 1), which seems completely wrong to me for packets
> that have the DF bit set, because the fragmentation would be done by the
> _router_, not the sender (which AFAICT is what the DF flag is trying to
> avoid).
> 

It would be wrong if IP packets are being fragmented, but they are not.

  Paul

> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

