Re: [PATCH] x86/x2apic: introduce a mixed physical/cluster mode
Hello,
Thanks a lot for all the details and explanations! :)

On 2023-11-27 11:11, Andrew Cooper wrote:
> On 24/11/2023 7:54 pm, Neowutran wrote:
> > Hi,
> > I did some more tests and research, indeed this patch improved/solved my
> > specific case.
> >
> > Starting point:
> >
> > I am using Xen version 4.17.2 (exactly this source
> > https://github.com/QubesOS/qubes-vmm-xen).
> > In the bios (an Asus motherboard), I configured the "local apic" parameter
> > to "X2APIC".
> > For Xen, I did not set the parameter "x2apic-mode" nor the parameter
> > "x2apic_phys".
> >
> > Case 1:
> > I tried to boot just like that, result: system is unusably slow
> >
> > Case 2:
> > Then, I applied a backport of the patch
> > https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@xxxxxxxxxx/raw
> > to the original Xen version of QubesOS and I recompiled.
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
> > Result: it works, the system is usable.
> >
> > Case 3:
> > Then, I applied the patch
> > https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
> > to the original Xen version of QubesOS and I recompiled.
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
> > Result: system is unusably slow.
> >
> >
> > In "Case 2", the value returned by the function "apic_x2apic_probe" is
> > "&apic_x2apic_mixed".
> > In "Case 3", the value returned by the function "apic_x2apic_probe" is
> > "&apic_x2apic_cluster".
> >
> >
> > -------------------
> > If you want / need, details for the function "apic_x2apic_probe":
> >
> > Known "input" value:
> >
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "iommu_intremap == iommu_intremap_off" = false
> > "acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
> > "acpi_gbl_FADT.flags" = 247205 (in decimal)
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "CONFIG_X2APIC_MIXED" is defined, because it is the default choice
> > "x2apic_mode" = 0
> > "x2apic_phys" = -1
> >
> >
> >
> > Trace log (I did some call "printk" to trace what was going on)
> > Case 2:
> > (XEN) NEOWUTRAN: X2APIC_MODE: 0
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different
> > (XEN) Neowutran: PASSE 2
> > (XEN) Neowutran: PASSE 4
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3
> > (XEN) Neowutran: PASSE 7
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3
> >
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different
> >
> > Case 3:
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: -1
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different
> > (XEN) Neowutran2: Passe 1
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0
> > (XEN) Neowutran2: Passe 6
> > (XEN) Neowutran2: Passe 7
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different
> > (XEN) Neowutran2: Passe 2
> > (XEN) Neowutran2: Passe 4
> > (XEN) Neowutran2: Passe 7
> >
> >
> >
> > If you require the full logs, I could publish the full logs somewhere.
> > ----------------------
> >
> > ( However I do not understand if the root issue is a buggy motherboard, a
> > bug in xen, or if the parameter "X2APIC_PHYSICAL" should have been set
> > by the QubesOS project, or something else)
>
> Hello,
>
> Thank you for the analysis.
>
> For your base version of QubeOS Xen, was that 4.13.2-5 ?
> I can't see any APIC changes in the patchqueue, and I believe all relevant
> bugfixes are in 4.17.2, but I'd just like to confirm.

I was using the qubes-vmm-xen release "4.17.2-5", which uses Xen version
"4.17.2". I don't see any custom APIC modifications in the patches applied
to Xen by QubesOS.

>
> First, by "unusable slow", other than the speed, did everything else
> appear to operate adequately? Any chance you could guess the slowdown.
> i.e. was it half the speed, or "seconds per log console line during
> boot" levels of slow?

Once I was logged in, it took me around 10 minutes to type the command
"sudo dmesg > log".
There were also graphical instabilities (the screen displays something, then
goes black, and a few seconds later it displays things again; sometimes it
crashes completely and I need to reboot to try to finish the boot and login
process), and I was unable to start guests because the system was too slow.

Some of the logs gathered from "sudo dmesg" that only appear for case 1 and
case 3:

"
nvme nvme1: I/O 998 QID 1 timeout, completion polled
nvme nvme1: I/O 854 QID 5 timeout, completion polled
...
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring sdma0
...
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
...
"
Things like that, repeated hundreds of times.

>
> Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
> wrong.
>
> ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
> in ACPI 3.0), and is not relevant outside of xAPIC mode. xAPIC has 2
> different logical destination modes, cluster and flat, and their
> applicability is dependent on whether you have fewer or more than 8
> local APICs, hence that property being called out in the ACPI spec.
>
> x2APIC does not have this property.
> DFR was removed from the architecture, and logical mode is strictly
> cluster. So the bit should never have been interpreted on an x2APIC
> code path.
>
> Not that it matters in your case - the bit isn't set in your FADT, hence
> why case 1 and 3 have the same behaviour.
>
>
> This brings us to case 2, where mixed mode does seem to resolve the perf
> problem.
>
> Since that patch was written, I've learnt how cluster delivery mode
> works for external interrupts, and Xen should never ever have been using
> it (Xen appears to be alone in OS software here). For an external
> interrupt in Logical cluster mode, it always sends to the lowest ID in
> the cluster. If that APIC decides that the local processor is too busy
> to handle the interrupt now, it forwards the interrupt to the next APIC
> in the cluster, and this cycle continues until one APIC accepts the message.
>
> You get most interrupts hitting the lowest APIC in the cluster, but the
> interrupt can be forwarded between APICs for an unbounded quantity of
> time depending on system utilisation.
>
>
> Could you please take case 2 and confirm what happens when booting with
> x2apic-mode={physical,cluster}? If the pattern holds, the physical
> should be fine, and cluster should see the same problems as case 1 and 3.

I confirm that the pattern holds.
"physical" is fine and "cluster" has the same issue as case 1 and case 3.

> Thanks,
>
> ~Andrew

Thanks,
Neowutran