Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
[+cc linux-pci]

On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > all 0xff when accessing its config space. This happens only after device
> > > reset (which is also triggered when binding the device to the
> > > xen-pciback driver).
> >
> > Thanks for the report and for all the debugging you've already done!
> >
> > > Reproducer:
> > >
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > ...
> > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >
> > > The same operation done on Linux 6.12 running without Xen works fine.
> > >
> > > git bisect points at:
> > >
> > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > Date:   Tue Aug 27 18:48:46 2024 -0500
> > >
> > >     PCI: Wait for device readiness with Configuration RRS
> > >
> > > part of that commit:
> > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > >  			return -ENOTTY;
> > >  		}
> > >
> > > -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > -		if (!PCI_POSSIBLE_ERROR(id))
> > > -			break;
> > > +		if (root && root->config_crs_sv) {
> > > +			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > +			if (!pci_bus_crs_vendor_id(id))
> > > +				break;
> > > +		} else {
> > > +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > +			if (!PCI_POSSIBLE_ERROR(id))
> > > +				break;
> > > +		}
> > >
> > >
> > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > initially 0xffffffff. If I extend the condition with
> > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> > > patch description, it would break VF.
> > > I'm not sure where the issue is, but given it breaks only when running
> > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > Visibility" in that case.
> >
> > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > Vendor ID, so pci_dev_wait() should exit immediately.
>
> I'm not sure what is going on there either, but my _guess_ is that the
> loop exits too early due to the above. And it makes some further actions
> to fail.

When RRS SV is enabled, reading PCI_VENDOR_ID should always return
0x0001 (if the device isn't ready and responds with RRS status) or the
valid Vendor ID. I don't think it should ever return 0xffff (unless
the device is powered off, unplugged, or broken, of course).

> > But the log at
> > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > says it *doesn't* exit and eventually times out.
>
> Note this log is from "working" kernel, so that timeout must be
> something else.

I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
since there are no interesting messages from the "BUG PRESENT" part.
Awfully funny coincidence if it's unrelated.

> > And the lspci above shows ~0 data for much of the header, even though
> > the device must be ready by then.
> >
> > I don't have any good ideas, but since the problem only happens with
> > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > could instrument xen_pcibk_config_read() and see if there's something
> > wonky going on there?
>
> This one is used when pcifront (from a different PV VM) is asking pciback
> to read something. I see the issue even before starting any other VM and
> not even attaching the device to the xen-pciback driver...

The report claims the problem only happens with Xen. I'm not a Xen
person, and I don't know how to find the relevant config accessors.
The snippets of kernel messages I see at [1] all mention pciback, so
that's my only clue of where to look.

Bottom line, I have no idea what the config accessor path is, and
maybe we could learn something by looking at whatever it is.

[1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
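
To make the three cases discussed in this thread concrete (RRS placeholder 0x0001, all-ones 0xffffffff, and a real Vendor ID), here is a minimal userspace sketch in C. It is not kernel code: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the two helpers only imitate what pci_bus_crs_vendor_id() and PCI_POSSIBLE_ERROR() appear to test in the quoted hunk. The last two functions contrast the exit test from the RRS SV branch of that hunk with the extra "&& !PCI_POSSIBLE_ERROR(id)" check Marek tried.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical and exist only for this illustration. */
#define SKETCH_RRS_VENDOR_ID 0x0001u

enum sketch_read_state {
	SKETCH_READ_RRS,      /* low 16 bits are 0x0001: device not ready yet */
	SKETCH_READ_ALL_ONES, /* 0xffffffff: the read looks like it failed */
	SKETCH_READ_READY,    /* anything else: treated as a valid ID */
};

/* Assumed equivalent of the Vendor ID test in the quoted hunk. */
static bool sketch_is_rrs(uint32_t id)
{
	return (id & 0xffffu) == SKETCH_RRS_VENDOR_ID;
}

/* Assumed equivalent of an "all bits set" error check on a dword. */
static bool sketch_is_all_ones(uint32_t id)
{
	return id == 0xffffffffu;
}

/* Classify the dword read at offset 0 (Vendor ID + Device ID). */
static enum sketch_read_state sketch_classify(uint32_t id)
{
	if (sketch_is_all_ones(id))
		return SKETCH_READ_ALL_ONES;
	if (sketch_is_rrs(id))
		return SKETCH_READ_RRS;
	return SKETCH_READ_READY;
}

/* Exit test modeled on the RRS SV branch of the quoted hunk. */
static bool sketch_done_rrs_only(uint32_t id)
{
	return !sketch_is_rrs(id);
}

/* Same test with the extra all-ones check from Marek's experiment. */
static bool sketch_done_rrs_and_error(uint32_t id)
{
	return !sketch_is_rrs(id) && !sketch_is_all_ones(id);
}
```

With the values from the reproducer, sketch_classify(0x061614c3) is the "ready" case (bytes c3 14 16 06 read as a little-endian dword), while 0xffffffff ends the wait under the first exit test but keeps it polling under the second. That difference matches the behavior change Marek observed; it says nothing about why the read returns all ones under Xen.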