[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote: > [+cc linux-pci] > > On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote: > > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote: > > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki > > > wrote: > > > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports > > > > all 0xff when accessing its config space. This happens only after device > > > > reset (which is also triggered when binding the device to the > > > > xen-pciback driver). > > > > > > Thanks for the report and for all the debugging you've already done! > > > > > > > Reproducer: > > > > > > > > # lspci -xs 01:00.0 > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI > > > > Express Wireless Network Adapter > > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00 > > > > ... > > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset > > > > # lspci -xs 01:00.0 > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI > > > > Express Wireless Network Adapter > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > The same operation done on Linux 6.12 running without Xen works fine. > > > > > > > > git bisect points at: > > > > > > > > commit d591f6804e7e1310881c9224d72247a2b65039af > > > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > > Date: Tue Aug 27 18:48:46 2024 -0500 > > > > > > > > PCI: Wait for device readiness with Configuration RRS > > > > > > > > part of that commit: > > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, > > > > char *reset_type, int timeout) > > > > return -ENOTTY; > > > > } > > > > > > > > - pci_read_config_dword(dev, PCI_COMMAND, &id); > > > > - if (!PCI_POSSIBLE_ERROR(id)) > > > > - break; > > > > + if (root && root->config_crs_sv) { > > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id); > > > > + if (!pci_bus_crs_vendor_id(id)) > > > > + break; > > > > + } else { > > > > + pci_read_config_dword(dev, PCI_COMMAND, &id); > > > > + if (!PCI_POSSIBLE_ERROR(id)) > > > > + break; > > > > + } > > > > > > > > > > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns > > > > initially 0xffffffff. If I extend the condition with > > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the > > > > patch description, it would break VF. > > > > I'm not sure where the issue is, but given it breaks only when running > > > > with Xen, I guess something is wrong with "Configuration RRS Software > > > > Visibility" in that case. > > > > > > I'm missing something. If you get 0xffffffff, that is not the 0x0001 > > > Vendor ID, so pci_dev_wait() should exit immediately. > > > > I'm not sure what is going on there either, but my _guess_ is that the > > loop exits too early due to the above. And it makes some further actions > > to fail. > > When RRS SV is enabled, reading PCI_VENDOR_ID should always return > 0x0001 (if the device isn't ready and responds with RRS status) or the > valid Vendor ID. I don't think it should ever return 0xffff (unless > the device is powered off, unplugged, or broken, of course). Maybe it isn't really enabled when Xen is involved? By looking at lspci of the bridge for this device, I do see RootCtl: ... CRSVisible+, but maybe there is something else needed too? Just in case, full lspci -vvvs 2.2 (the bridge): 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 102 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000f000-00000fff [disabled] [32-bit] Memory behind bridge: 90b00000-90bfffff [size=1M] [32-bit] Prefetchable memory behind bridge: 8010900000-80109fffff [size=1M] [32-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag+ RBE+ TEE-IO- DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #4, Speed 16GT/s, Width x1, ASPM L1, Exit Latency L1 <64us ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+ LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+ ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x1 TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #0, PowerLimit 75W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet- LinkState+ RootCap: CRSVisible+ RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+ RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+ AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS- DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq- 10BitTagReq- OBFF Disabled, EETLPPrefixBlk- LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Address: 00000000fee08000 Data: 4000 Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453 Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+ Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [2a0 v1] Access Control Services ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+ ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans- Capabilities: [370 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=150us L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ T_CommonMode=10us LTR1.2_Threshold=166912ns L1SubCtl2: T_PwrOn=150us Capabilities: [400 v1] Data Link Feature <?> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?> Capabilities: [440 v1] Lane Margining at the Receiver PortCap: Uses Driver- PortSta: MargReady- MargSoftReady- Kernel driver in use: pcieport > > > > But the log at > > > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149 > > > says it *doesn't* exit and eventually times out. > > > > Note this log is from "working" kernel, so that timeout must be > > something else. > > I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly > since there are no interesting messages from the "BUG PRESENT" part. > Awfully funny coincidence if it's unrelated. The timeout thing I have seen before, possibly also on a different hardware (although I'm not 100% sure), I think it's a different issue. > > > And the lspci above shows ~0 data for much of the header, even though > > > the device must be ready by then. > > > > > > I don't have any good ideas, but since the problem only happens with > > > Xen, and it seems to affect more than just the Vendor ID, maybe you > > > could instrument xen_pcibk_config_read() and see if there's something > > > wonky going on there? > > > > This one is used when pcifront (from a different PV VM) is asking pciback > > to read something. I see the issue even before starting any other VM and > > not even attaching the device to the xen-pciback driver... > > The report claims the problem only happens with Xen. I'm not a Xen > person, and I don't know how to find the relevant config accessors. > The snippets of kernel messages I see at [1] all mention pciback, so > that's my only clue of where to look. Bottom line, I have no idea > what the config accessor path is, and maybe we could learn something > by looking at whatever it is. AFAIK there are no separate config accessors under Xen dom0, the default ones are used. xen-pcifront takes over PCI config space access (and few more) only in a domU (and only for PV), when PCI passthrough is used. Here, it didn't went that far... But then, Xen may intercept such access [2]. If I read it right, it should allow all access (is_hardware_domain(dom0)==true, and also the device is not on ro_map - otherwise reset wouldn't work at all). > > [1] > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149 [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295 -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab Attachment:
signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |