Hi Oliver,

This thread may be marginally interesting:

https://lore.kernel.org/netdev/217e3fa9-7782-08c7-1f2b-8dabacaa83f9@xxxxxxxxx/T/

It suggests that the Realtek driver might have some issues.

Cheers,

Paul

From: Oliver Linden <oliver_linden@xxxxxxxxxxx>
Sent: 07 May 2020 13:27
To: paul@xxxxxxx; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Likely bug in the PV driver v9.0

Hi Paul,

the Dom0 is sitting on brand-new hardware with a freshly installed Ubuntu 20.04, and the NIC in use is the onboard one:

lspci -vv:

09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
    Subsystem: ASRock Incorporation Motherboard (one of many)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 35
    Region 0: I/O ports at b000 [size=256]
    Region 2: Memory at fc304000 (64-bit, non-prefetchable) [size=4K]
    Region 4: Memory at fc300000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
            10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS-, TPHComp-, ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 7a-96-1a-59-a1-a8-00-00
    Capabilities: [170 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [178 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
            PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
            T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Kernel driver in use: r8169
    Kernel modules: r8169

All settings are on default.

Best,
Oliver
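In view of the r8169 issues referenced at the top of this thread, a quick dom0-side cross-check is to take the Realtek hardware checksumming out of the picture and let the dom0 kernel compute TX checksums in software; a minimal sketch, assuming the NIC shows up as enp9s0 (an assumption; substitute the real name from "ip link"):

    # Show which offloads the dom0 NIC currently has enabled
    ethtool -k enp9s0 | grep -iE 'checksum|segmentation'

    # Disable hardware TX checksumming; the dom0 kernel then computes
    # checksums in software before handing packets to the r8169
    sudo ethtool -K enp9s0 tx off

If RDP then works with the guest's offloads left enabled, that points at the NIC or its driver rather than the PV frontend.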
On 07.05.20 at 14:03, Paul Durrant wrote:

Oliver,

That's interesting. It suggests a bug in the guest TX-side checksum calculation. This is normally done by netback setting up metadata in the skb and having either the kernel or the h/w driver do the calculation. Disabling the option in the guest means the calculation will instead be done in-guest by XENVIF before the segment is passed to netback. Hence, it sounds like your problem may actually be in your dom0 or NIC (possibly failing to handle some quirk of the RDP packets).

Cheers,

Paul

Hi Paul,

that was a perfect hint! Disabling all features in the advanced properties pane allowed me to reconnect via RDP. I subsequently re-activated the features one by one and was able to nail it down to the two "TCP Checksum Offload (IPv4)" and "TCP Checksum Offload (IPv6)" entries. Those two need to be set to either "Disabled" or "Rx Enabled"; "Tx Enabled" and "Rx & Tx Enabled" break RDP connectivity.

Many thanks for your support, it's highly appreciated.

Best,
Oliver

On 06.05.20 at 09:26, Paul Durrant wrote:

Hi Oliver,

Xen 4.9 and Ubuntu 18.04 are clearly both a little old. I guess it is possible that changes in netback have caused problems. I think the next step is probably to disable all offloads (checksum and LSO) in the advanced properties pane for the PV network frontend and see if the problem still exists. If that doesn't have any effect, then the next step would be to collect wireshark traces and look for oddities around the RDP login.

Cheers,

Paul

Hello Paul,

thanks a lot for your reply. I already reached out to the Xen IRC channels but didn't get a response besides the email address I've used for my initial email.

The Windows firewall settings don't have any influence on the behavior; I tested that already. But as far as I can recall, I had the drivers installed on a Windows 10 1909 DomU on my old server running an upgraded Ubuntu 18.04 with Xen 4.9 and xend still being active (the original server install date was Dec 2012, with Ubuntu 12.04 and their Xen version). With this setup the v9 drivers were working with RDP. Does this give you any hint/idea?

Best,
Oliver

On 05.05.20 at 15:56, Paul Durrant wrote:

Hi Oliver,

I can only think this is a checksumming issue. I can't see how else a network driver operating no higher than L4 (for checksum/LSO) would be able to affect a very specific part of a higher-level protocol. Although, one thing to watch out for... in the past I have seen Windows do things like re-enabling firewall rules when there is a change in the network stack, so you might want to check that.

Cheers,

Paul
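For reference, the offload settings identified above can also be inspected and changed from an elevated PowerShell prompt in the guest instead of through the Device Manager advanced properties pane; a minimal sketch, assuming the PV adapter is named "Ethernet" and that the driver exposes the standardized *TCPChecksumOffloadIPv4/IPv6 registry keywords (both names are assumptions; verify them with the first command):

    # List the adapter's advanced properties to find the exact keywords
    Get-NetAdapterAdvancedProperty -Name "Ethernet" |
        Format-Table DisplayName, RegistryKeyword, DisplayValue

    # Standardized values: 0 = Disabled, 1 = Tx Enabled, 2 = Rx Enabled,
    # 3 = Rx & Tx Enabled; per the finding above, 0 or 2 keeps RDP working
    Set-NetAdapterAdvancedProperty -Name "Ethernet" -RegistryKeyword "*TCPChecksumOffloadIPv4" -RegistryValue 0
    Set-NetAdapterAdvancedProperty -Name "Ethernet" -RegistryKeyword "*TCPChecksumOffloadIPv6" -RegistryValue 0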
Dear all,

I'm observing an annoying bug with the v9 Windows PV drivers. It's easily reproducible here on my side:

Dom0: Ubuntu 20.04, freshly installed around Easter, with their version of Xen (4.11) and only using the xl tool-stack.

DomU: With every fresh installation of Windows 10 v1909 German the following can be reproduced:

- Everything is working, but slow, as expected
- Installation of the v9 drivers works well, without any issues, and results in significantly improved speed, but RDP connections to such a machine aren't possible anymore. The RDP service always asks for a user/password but never accepts it.

To be precise, RDP connections do work:
- with the plain vanilla installation
- with the v9 drivers installed, except for the network class & driver
Since I'm using my Windows machines predominantly via RDP clients, it would be great if this could be solved. Please let me know if you need any kind of details from my side.

Best,
Oliver
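Should the wireshark route from the 06.05 mail above become necessary, the trace is best taken on the guest's vif in dom0, since that shows the segments exactly as netback receives them from the frontend; a minimal sketch, assuming the DomU is domain 1 with a single interface, so the vif is named vif1.0 (an assumption; check with "ip link" or "xl list"):

    # Capture the RDP exchange on the dom0 side of the PV interface
    sudo tcpdump -i vif1.0 -s 0 -w rdp-login.pcap 'port 3389'

    # Note: with TX checksum offload enabled in the guest, checksums in
    # this capture may legitimately appear wrong, because they have not
    # been computed yet at this point in the path.

The file can then be opened in wireshark with TCP checksum validation enabled, comparing a failing login (offload on) against a working one (offload off).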