|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 2/2] xen/x86/pvh: copy ACPI tables to Dom0 instead of mapping
On Wed, 17 May 2023, Roger Pau Monné wrote:
> On Tue, May 16, 2023 at 04:34:09PM -0700, Stefano Stabellini wrote:
> > On Tue, 16 May 2023, Roger Pau Monné wrote:
> > > On Mon, May 15, 2023 at 05:11:25PM -0700, Stefano Stabellini wrote:
> > > > On Mon, 15 May 2023, Roger Pau Monné wrote:
> > > > > On Fri, May 12, 2023 at 06:17:20PM -0700, Stefano Stabellini wrote:
> > > > > > From: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> > > > > >
> > > > > > Mapping the ACPI tables to Dom0 PVH 1:1 leads to memory corruptions
> > > > > > of
> > > > > > the tables in the guest. Instead, copy the tables to Dom0.
> > > > > >
> > > > > > This is a workaround.
> > > > > >
> > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> > > > > > ---
> > > > > > As mentioned in the cover letter, this is a RFC workaround as I
> > > > > > don't
> > > > > > know the cause of the underlying problem. I do know that this patch
> > > > > > solves what would be otherwise a hang at boot when Dom0 PVH
> > > > > > attempts to
> > > > > > parse ACPI tables.
> > > > >
> > > > > I'm unsure how safe this is for native systems, as it's possible for
> > > > > firmware to modify the data in the tables, so copying them would
> > > > > break that functionality.
> > > > >
> > > > > I think we need to get to the root cause that triggers this behavior
> > > > > on QEMU. Is it the table checksum that fail, or something else? Is
> > > > > there an error from Linux you could reference?
> > > >
> > > > I agree with you but so far I haven't managed to find a way to the root
> > > > of the issue. Here is what I know. These are the logs of a successful
> > > > boot using this patch:
> > > >
> > > > [ 10.437488] ACPI: Early table checksum verification disabled
> > > > [ 10.439345] ACPI: RSDP 0x000000004005F955 000024 (v02 BOCHS )
> > > > [ 10.441033] ACPI: RSDT 0x000000004005F979 000034 (v01 BOCHS
> > > > BXPCRSDT 00000001 BXPC 00000001)
> > > > [ 10.444045] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS
> > > > BXPCAPIC 00000001 BXPC 00000001)
> > > > [ 10.445984] ACPI: FACP 0x000000004005FA65 000074 (v01 BOCHS
> > > > BXPCFACP 00000001 BXPC 00000001)
> > > > [ 10.447170] ACPI BIOS Warning (bug): Incorrect checksum in table
> > > > [FACP] - 0x67, should be 0x30 (20220331/tbprint-174)
> > > > [ 10.449522] ACPI: DSDT 0x000000004005FB19 00145D (v01 BOCHS
> > > > BXPCDSDT 00000001 BXPC 00000001)
> > > > [ 10.451258] ACPI: FACS 0x000000004005FAD9 000040
> > > > [ 10.452245] ACPI: Reserving APIC table memory at [mem
> > > > 0x40060f76-0x40060fff]
> > > > [ 10.452389] ACPI: Reserving FACP table memory at [mem
> > > > 0x4005fa65-0x4005fad8]
> > > > [ 10.452497] ACPI: Reserving DSDT table memory at [mem
> > > > 0x4005fb19-0x40060f75]
> > > > [ 10.452602] ACPI: Reserving FACS table memory at [mem
> > > > 0x4005fad9-0x4005fb18]
> > > >
> > > >
> > > > And these are the logs of the same boot (unsuccessful) without this
> > > > patch:
> > > >
> > > > [ 10.516015] ACPI: Early table checksum verification disabled
> > > > [ 10.517732] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> > > > [ 10.519535] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS
> > > > BXPCRSDT 00000001 BXPC 00000001)
> > > > [ 10.522523] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS
> > > > BXPCAPIC 00000001 BXPC 00000001)
> > > > [ 10.527453] ACPI: ���� 0x000000007FFE149D FFFFFFFF (v255 ������
> > > > �������� FFFFFFFF ���� FFFFFFFF)
> > > > [ 10.528362] ACPI: Reserving APIC table memory at [mem
> > > > 0x40060f76-0x40060fff]
> > > > [ 10.528491] ACPI: Reserving ���� table memory at [mem
> > > > 0x7ffe149d-0x17ffe149b]
> > > >
> > > > It is clearly a memory corruption around FACS but I couldn't find the
> > > > reason for it. The mapping code looks correct. I hope you can suggest a
> > > > way to narrow down the problem. If I could, I would suggest to apply
> > > > this patch just for the QEMU PVH tests but we don't have the
> > > > infrastructure for that in gitlab-ci as there is a single Xen build for
> > > > all tests.
> > >
> > > Would be helpful to see the memory map provided to Linux, just in case
> > > we messed up and there's some overlap.
> >
> > Everything looks correct. Here are some more logs:
> >
> > (XEN) Xen-e820 RAM map:
> > (XEN) [0000000000000000, 000000000009fbff] (usable)
> > (XEN) [000000000009fc00, 000000000009ffff] (reserved)
> > (XEN) [00000000000f0000, 00000000000fffff] (reserved)
> > (XEN) [0000000000100000, 000000007ffdffff] (usable)
> > (XEN) [000000007ffe0000, 000000007fffffff] (reserved)
> > (XEN) [00000000fffc0000, 00000000ffffffff] (reserved)
> > (XEN) [000000fd00000000, 000000ffffffffff] (reserved)
> > (XEN) Microcode loading not available
> > (XEN) New Xen image base address: 0x7f600000
> > (XEN) System RAM: 2047MB (2096636kB)
> > (XEN) ACPI: RSDP 000F58D0, 0014 (r0 BOCHS )
> > (XEN) ACPI: RSDT 7FFE1B21, 0034 (r1 BOCHS BXPC 1 BXPC 1)
> > (XEN) ACPI: FACP 7FFE19CD, 0074 (r1 BOCHS BXPC 1 BXPC 1)
> > (XEN) ACPI: DSDT 7FFE0040, 198D (r1 BOCHS BXPC 1 BXPC 1)
> > (XEN) ACPI: FACS 7FFE0000, 0040
> > (XEN) ACPI: APIC 7FFE1A41, 0080 (r1 BOCHS BXPC 1 BXPC 1)
> > (XEN) ACPI: HPET 7FFE1AC1, 0038 (r1 BOCHS BXPC 1 BXPC 1)
> > (XEN) ACPI: WAET 7FFE1AF9, 0028 (r1 BOCHS BXPC 1 BXPC 1)
> > [...]
> > (XEN) Dom0 memory map:
> > (XEN) [0000000000000000, 000000000009efff] (usable)
> > (XEN) [000000000009fc00, 000000000009ffff] (reserved)
> > (XEN) [00000000000f0000, 00000000000fffff] (reserved)
> > (XEN) [0000000000100000, 0000000040060f1d] (usable)
> > (XEN) [0000000040060f1e, 0000000040060fa7] (ACPI data)
> > (XEN) [0000000040061000, 000000007ffdffff] (unusable)
> > (XEN) [000000007ffe0000, 000000007fffffff] (reserved)
> > (XEN) [00000000fffc0000, 00000000ffffffff] (reserved)
> > (XEN) [000000fd00000000, 000000ffffffffff] (reserved)
> > [...]
> > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
> > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x00000000000fffff]
> > reserved
> > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000040060f1d] usable
> > [ 0.000000] BIOS-e820: [mem 0x0000000040060f1e-0x0000000040060fa7] ACPI
> > data
> > [ 0.000000] BIOS-e820: [mem 0x0000000040061000-0x000000007ffdffff]
> > unusable
> > [ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff]
> > reserved
> > [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff]
> > reserved
> > [ 0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff]
> > reserved
> > [...]
> > [ 10.102427] ACPI: Early table checksum verification disabled
> > [ 10.104455] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> > [ 10.106250] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS BXPC
> > 00000001 BXPC 00000001)
> > [ 10.109549] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS BXPC
> > 00000001 BXPC 00000001)
> > [ 10.115173] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ ��������
> > FFFFFFFF ���� FFFFFFFF)
> > [ 10.116054] ACPI: Reserving APIC table memory at [mem
> > 0x40060f76-0x40060fff]
> > [ 10.116182] ACPI: Reserving ���� table memory at [mem
> > 0x7ffe19cd-0x17ffe19cb]
> >
> >
> >
> > > It seems like some of the XSDT entries (the FADT one) is corrupt?
> > >
> > > Could you maybe add some debug to the Xen-crafted XSDT placement.
> >
> > I added a printk just after:
> >
> > xsdt->table_offset_entry[j++] = tables[i].address;
> >
> > And it printed only once:
> >
> > (XEN) DEBUG pvh_setup_acpi_xsdt 1000 name=FACP address=7ffe19cd
> >
> > That actually matches the address read by Linux:
> >
> > [ 10.175448] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������
> > �������� FFFFFFFF ���� FFFFFFFF)
> >
> > So the address seems correct. It is the content of the FADT/FACP table
> > that is corrupted.
> >
> > I wrote the following function in Xen:
> >
> > static void check(void)
> > {
> > unsigned long addr = 0x7ffe19cd;
> > struct acpi_table_fadt *fadt;
> > fadt = acpi_os_map_memory(addr, sizeof(*fadt));
> > printk("DEBUG %s %d s=%s\n",__func__,__LINE__,fadt->header.signature);
> > acpi_os_unmap_memory(fadt, sizeof(*fadt));
> > }
> >
> > It prints the right table signature at the end of pvh_setup_acpi.
> > I also added a call at the top of xenmem_add_to_physmap_one, and the
> > signature is still correct. Then I added a call at the beginning of
> > __update_vcpu_system_time. Here is the surprise: from Xen point of view
> > the table never gets corrupted. Here are the logs:
> >
> > [...]
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) d0v0: upcall vector f3
> > [ 0.000000] Linux version 6.1.19 (root@124de7fbba7f) (gcc (Debian
> > 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_3
> > [ 0.000000] Command line: console=hvc0
> > [...]
> > [ 10.371610] ACPI: Early table checksum verification disabled
> > [ 10.373633] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> > [ 10.375548] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS BXPC
> > 00000001 BXPC 00000001)
> > [ 10.378732] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS BXPC
> > 00000001 BXPC 00000001)
> > [ 10.384188] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ ��������
> > FFFFFFFF ���� FFFFFFFF)
> > [ 10.385374] ACPI: Reserving APIC table memory at [mem
> > 0x40060f76-0x40060fff]
> > [ 10.385519] ACPI: Reserving ���� table memory at [mem
> > 0x7ffe19cd-0x17ffe19cb]
> > [...]
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> > (XEN) DEBUG fadt_check 1551 s=FACPt
> >
> >
> > So it looks like it is a problem with the mapping itself? Xen sees the
> > data correctly and Linux sees it corrupted?
>
> It seems to me like the page is not correctly mapped, and so 1s are
> returned? (same behavior as a hole). IOW: would seem to me like MMIO
> areas are not correctly handled by nested NPT? (I assume you are
> running this on AMD).
>
> Does it make a difference if you try to boot with dom0=pvh,shadow?
>
> A couple of wild ideas. Maybe the nested virt support that you are
> using doesn't handle the UC bit in second stage page table entries?
> You could to remove this in p2m_type_to_flags() (see the
> p2m_mmio_direct case).
>
> Another wild idea I have is that the emulated NPT code doesn't like
> having the bits 63:52 from the PTE set to anything different than 0,
> and hence only p2m_ram_rw works (p2m_mmio_direct is 5).
Many thanks to Xenia for figuring out the root cause of the bug. The
underlying memory region is already added as E820_RESERVED to the guest
(instead of E820_ACPI). When pvh_add_mem_range is called with E820_ACPI
as parameter for the interesting table, pvh_add_mem_range returns with
-EEXIST without doing anything.
The original fix by Xenia was to carve out the relevant subset of the
reserved region and mark it as E820_ACPI. Instead, I rewrote it as
changing the type of the entire region to E820_ACPI because it is
simpler and doesn't have to deal with the edge cases (partially
overlapping, overlapping 2 existing regions, etc.)
What do you think?
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index e1043e40d2..6c1c73d853 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -241,6 +241,20 @@ static int __init pvh_add_mem_range(struct domain *d,
uint64_t s, uint64_t e,
if ( rs >= e )
break;
+ if ( re >= e && rs <= s )
+ {
+ /*
+ * An existing overlapping memory range exists and it is
+ * marked as reserved. This happens on QEMU. Change the type
+ * to E820_ACPI.
+ */
+ if ( d->arch.e820[i].type == E820_RESERVED && type == E820_ACPI )
+ {
+ d->arch.e820[i].type = E820_ACPI;
+ break;
+ }
+ }
+
if ( re > s )
return -EEXIST;
}
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |