Re: [Xen-devel] Dom0 crash with apache bench (ab)





On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> >
> > > On 31/07/15 11:24, Stefano Stabellini wrote:
> > > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > > > > CC'ing relevant people. As you can see from the links below, the
> > > > > crash is:
> > > >
> > > > [ 253.619326] Call Trace:
> > > > [ 253.619330] <IRQ>
> > > > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> > > > [ 253.619347] [<ffffffff815e8525>]
> > > > __netif_receive_skb_core+0x6f5/0x940
> > > > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> > > > [ 253.619360] [<ffffffff815e87f8>]
> > > > netif_receive_skb_internal+0x28/0x90
> > > > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> > > > [ 253.619378] [<ffffffffa01b1173>]
> > > > mlx4_en_process_rx_cq+0x753/0xb50
> > > [mlx4_en]
> > > > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160
> > > [mlx4_en]
> > >
> > > > What makes you think this is Xen specific? I suggest raising this
> > > > with the mlx4 maintainers.
> > >
> > >
> > > Linux native and KVM guests (same hw, same kernel version+config) run
> > > just fine under the same workload.
> >
> Ping?
>
> From the fact that bare-metal and KVM work fine with this hardware I
> still think it's reasonable to assume that it's a Xen issue and not an
> mlx4 issue.
>
> Is this completely flawed?

My (somewhat educated) guess is that this is to do with the difference
between (pseudo-)physical addresses and machine (AKA real-physical)
addresses when running under Xen.

The way this often shows up is in drivers which do not make correct use of
the kernel's DMA APIs but which happen to work on native x86 because
physical==bus address on x86.
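
To make the distinction concrete, here is a toy model in Python. It is not
kernel code and not taken from the mlx4 driver; the p2m tables, frame numbers
and function names are all made up for illustration. It only shows why a
driver that treats a physical address as a bus address gets away with it when
the mapping is the identity, and hands the device the wrong address when it
is not.

```python
PAGE_SHIFT = 12                      # 4 KiB pages, as on x86
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical frame tables: pseudo-physical frame number -> machine frame number.
NATIVE_P2M = {0: 0, 1: 1, 2: 2}            # native x86: identity mapping
XEN_P2M = {0: 0x5A, 1: 0x13, 2: 0x77}      # under Xen: frames can be scattered


def to_machine(p2m, phys_addr):
    """Translate a (pseudo-)physical address to a machine address."""
    pfn = phys_addr >> PAGE_SHIFT
    offset = phys_addr & PAGE_MASK
    return (p2m[pfn] << PAGE_SHIFT) | offset


def buggy_bus_addr(p2m, phys_addr):
    """Driver that skips the mapping step and assumes bus == physical."""
    return phys_addr


def correct_bus_addr(p2m, phys_addr):
    """Driver that asks for a translation before handing out the address."""
    return to_machine(p2m, phys_addr)


if __name__ == "__main__":
    buf = (1 << PAGE_SHIFT) + 0x80   # a buffer at offset 0x80 in frame 1
    for name, p2m in (("native", NATIVE_P2M), ("xen", XEN_P2M)):
        print(name,
              hex(buggy_bus_addr(p2m, buf)),
              hex(correct_bus_addr(p2m, buf)))
```

With the identity table both functions return the same address, so the
shortcut goes unnoticed; with the scattered table they differ, which is the
situation described above.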

Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
sorts of issues.

Indeed it does, on both v4.0 and v4.3-rc2.
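
For anyone repeating the native test, a small helper like the one below can
confirm that the running kernel was actually booted with both options. This is
only a sketch: the function names are my own, and it assumes the usual Linux
/proc/cmdline file is available.

```python
def has_forced_swiotlb(cmdline):
    """Return True if both 'iommu=soft' and 'swiotlb=force' are on the command line."""
    options = cmdline.split()
    return "iommu=soft" in options and "swiotlb=force" in options


def read_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    if has_forced_swiotlb(read_cmdline()):
        print("swiotlb is forced on this boot")
    else:
        print("iommu=soft and/or swiotlb=force missing from the command line")
```

The check is a plain token match, so it will not notice options given in a
different spelling or set through other means.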

You are running 64-bit so I don't think the recent "config: Enable
NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
relevant (it's already unconditionally on for 64-bit).

The trace appears to be on rx from a physical nic, there shouldn't be any
magic Xen stuff (granted pages etc) getting themselves into that path at
all. If it were tx then maybe it might be an issue with foreign pages. In
any case I think you are able to repro with just dom0, i.e. never having
started a domU, is that right?


Yes, I can reproduce on Dom0.

I will send this to the Mellanox people.

Thanks,
-Christoffer

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

