
Re: [Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain





On 19/03/15 03:40, openlui wrote:
Hi, all:

I am trying to use an HVM guest with a PCI pass-through NIC as a network driver domain. 
However, when I send packets larger than 128 bytes from DomU using the pkt-gen tool, 
after several seconds the network between the driver domain and the destination host 
becomes blocked.

The network topology used during testing is shown below:
Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <--> VIF (in Driver Domain) <--> OVS (in 
Driver Domain) <--> pNIC (pass-through NIC in Driver Domain) <--> Another Host
        
The summarized results are as follows:
1. When we just ping from DomU to another host, the network seems fine.
2. When sending 64- or 128-byte UDP packets from DomU, the network is not blocked.
3. When sending 256-, 1024- or 1400-byte UDP packets from DomU with the scatter-gather 
feature of the pass-through NIC in the driver domain enabled, the network becomes 
blocked.
4. When sending 256-, 1024- or 1400-byte UDP packets from DomU with the scatter-gather 
feature of the pass-through NIC in the driver domain disabled, the network is not 
blocked.

As shown in the detailed syslog below, when the network is blocked it seems that the 
pass-through NIC's driver enters an error state and the Tx queue hangs.
As far as I know, when sending 64- or 128-byte packets, the skb generated by 
netback contains only linear data, and that data is stored in a page 
allocated from the driver domain's memory. But for packets larger 
than 128 bytes, the skb also carries a frag page that is grant-mapped from 
DomU's memory. And if we disable the scatter-gather feature of the NIC, the skb 
sent from netback is linearized first, so its data ends up in pages allocated 
from the driver domain rather than from DomU's memory.
Yes, you are correct: the first slot (at most 128 bytes from it) is grant copied to a locally allocated skb, whilst the rest is grant mapped from the guest's memory in this case.
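To illustrate the scatter-gather-off case (a rough sketch only, not the actual netback or core code; the helper below is hypothetical): when the egress device lacks NETIF_F_SG, the stack linearizes the skb before the driver sees it, which copies the grant-mapped frag pages into freshly allocated driver-domain memory:

/* Rough sketch only -- not the real netback/core code path.  It shows
 * why turning scatter-gather off changes where the payload lives: a
 * non-linear skb whose frags are grant-mapped from DomU is copied into
 * newly allocated (driver-domain) memory by skb_linearize().
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

/* Hypothetical helper, for illustration only. */
static int sketch_xmit_prepare(struct sk_buff *skb, struct net_device *dev)
{
        if (skb_is_nonlinear(skb) && !(dev->features & NETIF_F_SG)) {
                /* Copies every frag into a fresh linear buffer and drops
                 * the references to the grant-mapped pages, so the NIC
                 * only ever DMAs from driver-domain memory.
                 */
                if (skb_linearize(skb))
                        return -ENOMEM;
        }
        return 0;
}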

I am wondering whether this problem is caused by PCI pass-through and DMA 
operations, or whether there is some misconfiguration in our environment. How can 
I continue to debug this problem? I am looking forward to your reply and 
advice. Thanks.

The environment we used is as follows:
a. Dom0: SUSE 12 (kernel: 3.12.28)
b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
c. DomU: kernel 3.17.4
d. Driver Domain: kernel 3.17.8
I would try an upstream kernel; there were some grant-mapping changes recently, and maybe that solves your issue.
Also, have you set the kernel's loglevel to DEBUG?
ixgbe also has a module parameter to enable further logging.
e. OVS: 2.1.2
f. Host: Huawei RH2288, CPU Intel Xeon E5645@xxxxxxx, Hyper-Threading disabled, 
VT-d enabled
g. pNIC: we tried Intel 82599 10GE NIC (ixgbe v3.23.2), Intel 82576 1GE NIC 
(igb) and Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
h. para-virtualization driver: netfront/netback
i. MTU: 1500

The detailed logs in the driver domain after the network is blocked are as follows:
1. When using the 82599 10GE NIC, syslog and dmesg include the messages below. The log 
shows that a Tx Unit Hang is detected and the driver repeatedly tries to reset the 
adapter; however, the network remains blocked.

<snip>
ixgbe: 0000:00:04.0 eth10: Detected Tx Unit Hang
Tx Queue             <0>
TDH, TDT             <1fd>, <5a>
next_to_use          <5a>
next_to_clean        <1fc>
ixgbe: 0000:00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
ixgbe: 0000:00:04.0 eth10: Reset adapter
ixgbe: 0000:00:04.0 eth10: PCIe transaction pending bit also did not clear
ixgbe: 0000:00:04.0 master disable timed out
ixgbe: 0000:00:04.0 eth10: detected SFP+: 3
ixgbe: 0000:00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
...
</snip>

I have tried removing the "reset adapter" call in the ixgbe driver's ndo_tx_timeout 
function; the resulting logs are shown below, and a rough sketch of the change follows 
the log. The log shows that once the network is blocked, the NIC's TDH register is no 
longer incremented.

<snip>
ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
Tx Queue             <0>
TDH, TDT             <1fd>, <5a>
next_to_use          <5a>
next_to_clean        <1fc>
ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp           <1075b74ca>
jiffies              <1075b791c>
ixgbe 0000:00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
Tx Queue             <0>
TDH, TDT             <1fd>, <5a>
next_to_use          <5a>
next_to_clean        <1fc>
ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp           <1075b74ca>
jiffies              <1075b7b11>
...
</snip>
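For reference, the change was roughly the following (a sketch only; it is based on the mainline ixgbe tx-timeout handler of that era, and the exact function names may differ in the v3.23.2 driver we used):

static void ixgbe_tx_timeout(struct net_device *netdev)
{
        struct ixgbe_adapter *adapter = netdev_priv(netdev);

        /* Original driver: schedule a full adapter reset outside
         * interrupt context.  Removed for this test, so only the
         * "Detected Tx Unit Hang" diagnostics keep being printed.
         *
         * ixgbe_tx_timeout_reset(adapter);
         */
        (void)adapter;
}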

I have also compared the NIC's PCI status before and after the network hang, and found 
that the "DevSta" field changed from "TransPend-" to "TransPend+" after the network 
is blocked:

<snip>
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
</snip>

The network can only be recovered by reloading the ixgbe module in the driver 
domain.

2. When using the BCM5709 NIC, the result is similar. After the network is 
blocked, syslog contains the messages below:

<snip>
bnx2 0000:00:04.0 eth14: <--- start FTQ dump --->
bnx2 0000:00:04.0 eth14: RV2P_PFTQ_CTL 00010000
bnx2 0000:00:04.0 eth14: RV2P_TFTQ_CTL 00020000
...
bnx2 0000:00:04.0 eth14: CP_CPQ_FTQ_CTL 00004000
bnx2 0000:00:04.0 eth14: CPU states:
bnx2 0000:00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 
8001280 pc 8001288 instr 8e030000
...
bnx2 0000:00:04.0 eth14: 185000 mode b8cc state 80000000 evt_mask 500 pc 
8000ca8 pc 8000920 instr 8ca50020
bnx2 0000:00:04.0 eth14: <--- end FTQ dump --->
bnx2 0000:00:04.0 eth14: <--- start TBDC dump --->
...
</snip>

Comparing the lspci output before and after the hang shows that the Status 
field changed from "MAbort-" to "MAbort+":

<snip>
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ 
>SERR- <PERR- INTx-
</snip>

The network cannot be recovered even after reloading the bnx2 module in the
driver domain.

----------
openlui
Best Regards





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

