
[Xen-devel] bnx2x DMA mapping errors cause iscsi problems



Hi,

We are running open source Xen 4.1.4 on Debian 7.4 amd64.
The hardware is an HP BL460c Gen8. The NIC is a Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10).

We are experiencing sporadic network blackouts after a few days on eth4, which is used for iSCSI block storage for the VMs. The VMs' file systems switch to read-only, so we lose all the VMs. We then have to reboot the hypervisor to regain network connectivity.

MTU = 9000 on eth4.

We were using the Broadcom kernel driver from the official Debian 7.4 kernel (3.2.54-2). Since updating to the latest driver published on the Broadcom website, we get some additional logging:

[1200406.207855] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
[1200406.207978] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
.....
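
For anyone who wants to count how often this message fires per interface, here is a minimal C sketch that parses one such log line. The struct and function names (rx_map_error, parse_rx_map_error) are made up for illustration, and the line format is assumed to be exactly the one shown above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "Can't map rx data" kernel log line. */
struct rx_map_error {
    double uptime;      /* seconds since boot, from the leading [..] stamp */
    int    src_line;    /* driver source line reported in the message */
    char   ifname[16];  /* interface name, e.g. "eth4" */
};

/*
 * Returns 1 and fills *out if the line matches the format shown above,
 * 0 otherwise. Assumes the message text is exactly "Can't map rx data".
 */
int parse_rx_map_error(const char *line, struct rx_map_error *out)
{
    static const char msg[] = "Can't map rx data";
    int consumed = 0;

    /* %n records how many characters were consumed up to the message. */
    if (sscanf(line, " [%lf] [bnx2x_alloc_rx_data:%d(%15[^)])]%n",
               &out->uptime, &out->src_line, out->ifname, &consumed) != 3)
        return 0;

    /* consumed stays 0 if the closing ")]" did not match. */
    if (consumed == 0)
        return 0;

    return strncmp(line + consumed, msg, strlen(msg)) == 0;
}
```

Feeding each line of the kernel log through parse_rx_map_error() and tallying the matches by ifname and uptime would show whether the errors arrive in bursts or build up gradually before a blackout.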

Here are bnx2x module versions we tried :
Debian 7.4 stock kernel : 1.70.30
Broadcom website (latest) : 1.78.58
Broadcom Firmware : Latest from HP BROADCOM 2.9.26 CP021537 package

Looking at the bnx2x source code (bnx2x_cmn.c), it appears this error is caused by a DMA mapping error for rx buffers (memory leak?):

static int bnx2x_alloc_rx_data(struct bnx2x *bp, struct bnx2x_fastpath *fp,
                               u16 index, gfp_t gfp_mask)
{
        ....
        mapping = dma_map_single(&bp->pdev->dev, data + NET_SKB_PAD,
                                 fp->rx_buf_size,
                                 DMA_FROM_DEVICE);

        if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {

#ifdef BCM_HAS_BUILD_SKB /* BNX2X_UPSTREAM */
                bnx2x_frag_free(fp, data);
#else
                dev_kfree_skb_any(data);
#endif
                BNX2X_ERR("Can't map rx data\n");
                return -ENOMEM;
        }
        ...
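
To make the shape of that error path easier to reason about, here is a small user-space sketch of the same allocate / map / free-on-failure pattern. Everything in it (fake_alloc, fake_free, fake_map, live_buffers, fake_map_should_fail, alloc_rx_data_sketch) is a hypothetical stand-in and not a kernel API. It only illustrates that, in the excerpt as quoted, the buffer is released before -ENOMEM is returned. It says nothing about why the mapping itself fails.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static int live_buffers;          /* buffers allocated and not yet freed */
static int fake_map_should_fail;  /* set to 1 to simulate a mapping error */

static void *fake_alloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        live_buffers++;
    return p;
}

static void fake_free(void *p)
{
    free(p);
    live_buffers--;
}

/* Stand-in for the dma_map_single() + dma_mapping_error() pair. */
static int fake_map(void *data)
{
    (void)data;
    return fake_map_should_fail ? -1 : 0;
}

/* Same shape as the excerpt: allocate, map, free on mapping failure. */
int alloc_rx_data_sketch(size_t size, void **out)
{
    void *data = fake_alloc(size);

    if (!data)
        return -ENOMEM;

    if (fake_map(data) != 0) {
        fake_free(data);   /* buffer is released on the error path */
        return -ENOMEM;
    }

    *out = data;
    return 0;
}
```

With fake_map_should_fail set to 1, alloc_rx_data_sketch() returns -ENOMEM and live_buffers goes back to 0, so this particular path does not hold on to the buffer it just allocated.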

We found several other reports from people suffering from the same problem.
Here are two threads concerning Citrix XenServer 6 that show the exact same problem on BL460c G6 and Gen8:
http://discussions.citrix.com/topic/324343-xenserver-61-bnx2x-sw-iommu/
http://discussions.citrix.com/topic/333281-xenserver-62-crash-bug/page-3

From other references, it seems that similar problems with this driver mostly occur in virtualized environments.

We found a rather old workaround from VMware: reduce the number of queues used by the driver (num_queues parameter). Unfortunately, the problem still occurs, only after a longer period.

There are threads on this mailing list related to DMA allocation in Xen ( http://markmail.org/message/uududlw5w6xlqcp2 ), but I can't tell whether they are related to our problem.

Thanks for your help,

Patrick



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

