Re: [Xen-devel] kernel panic in skb_copy_bits

To: Eric Dumazet <eric.dumazet@xxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>
From: Alex Bligh <alex@xxxxxxxxxxx>
Date: Thu, 04 Jul 2013 13:57:16 +0100
Cc: Frank Blaschka <frank.blaschka@xxxxxxxxxx>, zheng.x.li@xxxxxxxxxx, Alex Bligh <alex@xxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxxxxxx, Joe Jin <joe.jin@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Xen Devel <xen-devel@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>
Delivery-date: Thu, 04 Jul 2013 12:57:49 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>



--On 4 July 2013 03:12:10 -0700 Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:

> It looks like a typical COW issue to me.
>
> If the page content is written while there is still a reference on this
> page, we should allocate a new page and copy the previous content.
>
> And this has little to do with networking.


I suspect this would get more attention if we could make Ian's case
below trigger (a) outside Xen, (b) outside networking.

>         memset(buf, 0xaa, 4096);
>         write(fd, buf, 4096);
>         memset(buf, 0x55, 4096);
> (where fd is O_DIRECT on NFS) can result in 0x55 being seen on the wire
> in the TCP retransmit.


We know this should fail using O_DIRECT+NFS. We've had reports suggesting
it also fails with O_DIRECT+iSCSI, although there the symptom has been a
kernel panic (under Xen) rather than the data corruption described above.
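
For anyone wanting to try the sequence above away from Xen, it can be
wrapped in a small C helper. This is only a sketch: the function name
write_then_scribble and its use_direct parameter are invented here, and
the path is whatever the caller supplies. On a local filesystem it is not
expected to show corruption by itself; the interesting case would be a path
on an NFS mount, with TCP retransmits occurring and a capture running on
the wire.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fallback so the sketch still compiles where O_DIRECT is not exposed
 * (it needs _GNU_SOURCE on glibc). With the fallback the write is simply
 * buffered, which defeats the purpose but keeps the code buildable. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

#define BUF_SIZE 4096

/*
 * Hypothetical helper: write one buffer of 0xaa to `path`, then overwrite
 * the same buffer with 0x55 as soon as write() returns.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_then_scribble(const char *path, int use_direct)
{
    void *buf = NULL;
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    ssize_t n;
    int fd;

    assert(path != NULL);

    if (use_direct)
        flags |= O_DIRECT;

    /* O_DIRECT generally wants an aligned buffer; page alignment is a
     * conservative choice. */
    if (posix_memalign(&buf, BUF_SIZE, BUF_SIZE) != 0)
        return -1;

    fd = open(path, flags, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    memset(buf, 0xaa, BUF_SIZE);
    n = write(fd, buf, BUF_SIZE);
    /* The buffer is reused immediately. If anything below still points
     * at these pages, it now sees 0x55 instead of 0xaa. */
    memset(buf, 0x55, BUF_SIZE);

    close(fd);
    free(buf);
    return n;
}
```

Calling write_then_scribble(path, 1) with a path on the filesystem under
test performs the O_DIRECT variant; passing 0 gives a buffered baseline
for comparison.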

Historical trawling suggests this is also an issue with DRBD (see Ian's
original thread from the mists of time).

I don't quite understand why we aren't seeing corruption with standard
ATA devices + O_DIRECT and no Xen involved at all.

My memory is a bit misty on this but I had thought the reason why
this would NOT be solved simply by O_DIRECT taking a reference to
the page was that the O_DIRECT I/O completed (and thus the reference
would be freed up) before the networking stack had actually finished
with the page. If the O_DIRECT I/O did not complete until the
page was actually finished with, we wouldn't see the problem in the
first place. I may be completely off base here.
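
The ordering described above can be illustrated with a toy user-space
model. Nothing here is kernel code: toy_page, toy_segment and the toy_*
helpers are invented stand-ins, and the model simply assumes that the
O_DIRECT reference is dropped before the queued segment is finished with
the page, which is the recollection above and may not match what the
kernel actually does.

```c
#include <assert.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Toy stand-in for a page with a reference count (not struct page). */
struct toy_page {
    unsigned char data[PAGE_BYTES];
    int refs;
};

/* Toy stand-in for a queued TCP segment that points at the page
 * instead of copying it. */
struct toy_segment {
    struct toy_page *page;
};

static void toy_get(struct toy_page *p) { p->refs++; }
static void toy_put(struct toy_page *p) { p->refs--; }

/* Queue the page for transmission; the segment takes its own reference. */
static void toy_queue(struct toy_segment *seg, struct toy_page *p)
{
    seg->page = p;
    toy_get(p);
}

/* What a (re)transmit would send: the page's current first byte. */
static unsigned char toy_wire_byte(const struct toy_segment *seg)
{
    return seg->page->data[0];
}

/* Runs the assumed sequence and returns the byte a retransmit would carry. */
unsigned char toy_retransmit_after_reuse(void)
{
    struct toy_page page;
    struct toy_segment seg;

    page.refs = 0;
    memset(page.data, 0xaa, PAGE_BYTES);

    toy_get(&page);          /* O_DIRECT pins the page for the I/O */
    toy_queue(&seg, &page);  /* the network side references the same page */

    toy_put(&page);          /* I/O reported complete: O_DIRECT ref dropped */
    assert(page.refs == 1);  /* the segment still holds its reference */

    /* The application believes the buffer is its own again. */
    memset(page.data, 0x55, PAGE_BYTES);

    return toy_wire_byte(&seg);
}
```

In this model the segment's reference keeps the page alive but does
nothing to stop the application rewriting it, so the retransmit carries
the new contents. That is why a reference alone would not be enough unless
the I/O completion were delayed until the last user of the page had
finished.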

--
Alex Bligh
