Xen project Mailing List

It turned out that both of the suspected problems were real problems and the interference between the two was confusing the debugging. It looks like the max skbuff frags is 18 (65536/page_size + 2) and indeed if the chain of packet fragments is longer than 19 (the logic probably allows for one extra) it locks up the ring permanently. The other problem was that the ring does indeed get depleted down to the point where the available slots are fewer than the number needed for the current chain of frags. Unfortunately in this case the write is still permitted, which overwrites/corrupts freely, and things immediately or pretty soon thereafter go kaplooey. To confirm that there is nothing else, I implemented a quick workaround - the chain of frags is never allowed to be longer than 19 and if there aren't enough free slots then the whole chain is dropped. With these two changes all tests always completed and completed correctly. However, just dropping when there are not enough slots causes excessive pkt loss, so it slows things randomly and a lot - it should either block or the write should fail with an ENOBUFS flavoured exception. The good news though is that it still works and a lot of the other tricky machinery also works correctly.
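
For illustration only, the two checks described above can be modelled as a toy transmit ring. This is a sketch in Python rather than the actual Mirage code: the class name, the ring size, the 4096-byte page size and the use of EMSGSIZE for over-long chains are all assumptions, and it raises on an over-long chain where the quick workaround simply refused it. The point is that a chain is queued whole or not at all, and that a full ring fails the write with ENOBUFS instead of overwriting slots.

```python
import errno

PAGE_SIZE = 4096                          # assumed page size
MAX_SKB_FRAGS = 65536 // PAGE_SIZE + 2    # 18 with 4 KiB pages
MAX_CHAIN = MAX_SKB_FRAGS + 1             # 19, allowing for the one extra


class TxRing:
    """Toy model of a transmit ring (hypothetical, not the Mirage netfront)."""

    def __init__(self, size=256):
        self.size = size
        self.in_flight = 0    # slots queued and not yet acknowledged

    def free_slots(self):
        return self.size - self.in_flight

    def write_chain(self, frags):
        """Queue the whole chain or nothing; never overwrite in-flight slots."""
        if len(frags) > MAX_CHAIN:
            # EMSGSIZE is an illustrative choice, not taken from the thread.
            raise OSError(errno.EMSGSIZE, "fragment chain too long")
        if len(frags) > self.free_slots():
            raise OSError(errno.ENOBUFS, "not enough free tx slots")
        self.in_flight += len(frags)

    def ack(self, n):
        """The backend has consumed n slots."""
        self.in_flight -= min(n, self.in_flight)
```

A caller that catches the ENOBUFS error can wait for acknowledgements and retry, which avoids the random slowdown that silent dropping causes.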

Balraj

On Sun, May 26, 2013 at 10:53 AM, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:

The long chain of 36-byte frags is consistent with the backend dropping it. Does it work better if you restrict the total fragment chain size to just 10 or 11?

The first unexplained packet loss is a real alarm bell though. The entire TCP retransmit code on our stack is just a canary that spots latent bugs elsewhere in the device stack :-)

-anil

On 25 May 2013, at 22:25, Balraj Singh <balraj.singh@xxxxxxxxxxxx> wrote:

In the particular test I am using I write 36 bytes of payload and use the Mirage equivalent of TCP_NODELAY. This works for a bit but then suffers some packet loss (why? TBD) and triggers a rexmit. The retransmitted packet is 1400+ bytes and is made up of a long chain of 36 byte io_pages. I thought that it may be that the ring did not have enough slots to take all the chunks of the pkt. Making the retransmitted pkt be the size of the original write improved it very significantly but it would still fail in the same way, though less frequently. I'm working on it - I see available txring slots vary, but I haven't yet found a case where the slots are fully depleted or down to fewer than the chunks that need to be written. I'm still narrowing it down.
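
As a rough check of the numbers in this test, the size of the retransmitted chain can be worked out directly. The sketch below is only arithmetic: the function name is made up, 1400 is taken as a lower bound for the "1400+ bytes", and header fragments are not counted.

```python
def payload_fragments(payload_bytes, write_size):
    """Number of write_size chunks needed to hold payload_bytes (rounded up)."""
    return -(-payload_bytes // write_size)


original = payload_fragments(36, 36)       # one chunk per original write
retransmit = payload_fragments(1400, 36)   # chunks in the coalesced retransmit
```

Here `original` is 1 and `retransmit` is 39, so even before any header fragments are added the retransmit is roughly twice as long as the 18 or 19 fragment limit reported at the top of this thread.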

This test originally was with 1-byte writes, but that seemed to wedge even before the 1st data packet made it to the wire. This may be because of the limitation Steven mentioned. I think I'm getting close on the 36 byte write test, once this is figured out I'll try it with 1 byte writes again.

Balraj

On Sat, May 25, 2013 at 11:11 AM, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:

Balraj noticed that a stream of 1-byte writes on a TCP connection would cause netfront/netback to wedge. This is obviously quite unrealistic, but a good regression test.

A quick chat with Steven Smith pointed out that some Linux netbacks had a limit on the number of fragments allowed (based on the skbuff chain limit size). So you might be ending up with a situation where the backend drops the entire set of fragments, and the frontend is retransmitting all the time.

So if you modify our frontend to limit the number of fragments to ~10 or so for any given packet, that might help. On the other hand, if you're doing writes with a TCP segment size of 1, but still only 3-4 fragments (for the Ethernet/IP/TCP headers), then we have some other bug. What does the Netif request look like, Balraj?
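
One possible way to cap the number of fragments per packet is to copy a long chain of small fragments into a few page-sized buffers before handing it to the ring. The sketch below is a hypothetical illustration in Python, not the Mirage frontend; the function name, the default limit of 10 and the 4096-byte page size are assumptions.

```python
def coalesce(frags, max_frags=10, page_size=4096):
    """Return at most max_frags buffers holding the same bytes as frags."""
    if len(frags) <= max_frags:
        return list(frags)               # short chains pass through untouched
    data = b"".join(frags)               # one copy of the whole packet
    pages = [data[i:i + page_size] for i in range(0, len(data), page_size)]
    if len(pages) > max_frags:
        raise ValueError("packet does not fit in max_frags pages")
    return pages
```

The copy costs some CPU, but it only happens for chains that would otherwise exceed the limit, and a 3-4 fragment packet goes through unchanged.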

-anil

Re: one-byte TCP writes wedging