
Re: [Xen-devel] acpidump crashes on some machines



On 06/30/2012 04:19 AM, Konrad Rzeszutek Wilk wrote:

Konrad, David,

back on track for this issue. Thanks for your input, I could do some more debugging (see below for a refresh):

It seems to affect only the first page of the 1:1 mapping. I didn't have any issues with the last PFN or the page behind it (which failed properly).

David, thanks for the hint about varying the dom0_mem parameter. I thought I had already checked this, but I tried it once again and it turned out that it is only an issue if dom0_mem is smaller than the ACPI area, which generates a hole in the memory map. So we have (simplified):
* 1:1 mapping to 1 MB
* normal mapping till dom0_mem
* unmapped area till ACPI E820 area
* ACPI E820 1:1 mapping

As far as I could chase it down, the 1:1 mapping itself looks OK; I couldn't find any off-by-one bugs there. So maybe it is code that later on invalidates areas between the normal guest mapping and the ACPI mem?
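To make the simplified layout above concrete, here is a minimal user-space sketch that classifies a PFN into one of the four areas. It is only a model of the list, not kernel code: the struct, enum, and function names are hypothetical, and it assumes 4 KiB pages so that the 1 MB boundary is PFN 0x100.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the four areas in the simplified layout. */
enum region_kind {
    REGION_IDENTITY_LOW,   /* 1:1 mapping up to 1 MB */
    REGION_NORMAL,         /* normal mapping up to dom0_mem */
    REGION_HOLE,           /* unmapped area below the ACPI E820 area */
    REGION_IDENTITY_ACPI,  /* ACPI E820 1:1 mapping */
    REGION_OUTSIDE         /* anything past the ACPI area */
};

/* All *_end_pfn fields are exclusive (first PFN past the area). */
struct layout {
    uint64_t low_end_pfn;    /* 0x100 for 1 MB with 4 KiB pages */
    uint64_t dom0_end_pfn;   /* derived from dom0_mem */
    uint64_t acpi_start_pfn;
    uint64_t acpi_end_pfn;
};

static enum region_kind classify_pfn(const struct layout *l, uint64_t pfn)
{
    assert(l->acpi_start_pfn <= l->acpi_end_pfn);

    /* Check the ACPI area first so it wins even when dom0_mem reaches past it. */
    if (pfn >= l->acpi_start_pfn && pfn < l->acpi_end_pfn)
        return REGION_IDENTITY_ACPI;
    if (pfn < l->low_end_pfn)
        return REGION_IDENTITY_LOW;
    if (pfn < l->dom0_end_pfn)
        return REGION_NORMAL;
    /* The hole only exists when dom0_mem ends below the ACPI area. */
    if (pfn < l->acpi_start_pfn)
        return REGION_HOLE;
    return REGION_OUTSIDE;
}
```

With dom0_end_pfn below acpi_start_pfn, every PFN in between is reported as REGION_HOLE, which is the configuration where the problem shows up.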

Hope that helps, I will also try to find more about this.

Thanks,
Andre.

[  351.964914] ------------[ cut here ]------------
[  351.964924] WARNING: at /src/linux-2.6/xentest/testxenmap.c:24 acpitest_init+0x5e/0x1000 [testxenmap]()
[  351.964926] Hardware name: empty
[  351.964928] We get cfef0 instead of ffffffffffffffff!

Is cfef0 part of the 1-1 mapping and in ACPI? On my box I see this:

# dmesg | head -30 | grep bc55
[    0.000000] 1-1 mapping on bc558->bc5ac
[    0.000000] Xen: [mem 0x0000000040200000-0x00000000bc557fff] usable
[    0.000000] Xen: [mem 0x00000000bc558000-0x00000000bc560fff] ACPI data

So the E820 has it marked as ACPI data, and sure enough I also see this:

[    0.000000] ACPI: DSDT 00000000bc558168 079E1 (v02 INTEL  DQ67SW   00000016 INTL 20051117)
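The relation between these addresses and the PFNs in the "1-1 mapping" lines can be checked with a small sketch. It assumes 4 KiB pages (a shift of 12) and treats the E820 range as inclusive on both ends, as printed; the helper names are hypothetical and this is plain user-space C, not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Physical address to page frame number. */
static uint64_t toy_phys_to_pfn(uint64_t paddr)
{
    return paddr >> TOY_PAGE_SHIFT;
}

/* True if paddr lies in the inclusive byte range [start, end], as E820 prints it. */
static int toy_in_e820_region(uint64_t paddr, uint64_t start, uint64_t end)
{
    assert(start <= end);
    return paddr >= start && paddr <= end;
}
```

For the values above, toy_phys_to_pfn(0xbc558168) is 0xbc558, so the DSDT starts in the very first page of the ACPI data region 0xbc558000-0xbc560fff.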

Let me see what I get with the little module.

So:
[    0.000000] 1-1 mapping on 9a->100
[    0.000000] 1-1 mapping on 20000->20200
[    0.000000] 1-1 mapping on 40000->40200
[    0.000000] 1-1 mapping on bc558->bc5ac
[    0.000000] 1-1 mapping on bc5b4->bc8c5
[    0.000000] 1-1 mapping on bc8c6->bcb7c
[    0.000000] 1-1 mapping on bcd00->100000

dmesg | grep ACPI: | head
[    0.000000] ACPI: RSDP 00000000000f0450 00024 (v02  INTEL)
[    0.000000] ACPI: XSDT 00000000bc558070 00064 (v01 INTEL  DQ67SW   01072009 AMI  00010013)
[    0.000000] ACPI: FACP 00000000bc55fb50 000F4 (v04 INTEL  DQ67SW   01072009 AMI  00010013)
[    0.000000] ACPI: DSDT 00000000bc558168 079E1 (v02 INTEL  DQ67SW   00000016 INTL 20051117)
[    0.000000] ACPI: FACS 00000000bc8dbf80 00040
[    0.000000] ACPI: APIC 00000000bc55fc48 00072 (v03 INTEL  DQ67SW   01072009 AMI  00010013)
[    0.000000] ACPI: TCPA 00000000bc55fcc0 00032 (v02 INTEL  DQ67SW   00000001 MSFT 01000013)
[    0.000000] ACPI: SSDT 00000000bc55fcf8 00102 (v01 INTEL  DQ67SW   00000001 MSFT 03000001)
[    0.000000] ACPI: MCFG 00000000bc55fe00 0003C (v01 INTEL  DQ67SW   01072009 MSFT 00000097)
[    0.000000] ACPI: HPET 00000000bc55fe40 00038 (v01 INTEL  DQ67SW   01072009 AMI. 00000004)

02:11:06 # 42 :~/
rmmod acpidump;insmod /acpidump.ko pfn=0xbc55e

02:11:15 # 43 :~/
rmmod acpidump;insmod /acpidump.ko pfn=0xbc559

02:11:26 # 44 :~/
rmmod acpidump;insmod /acpidump.ko pfn=0xbc558
insmod: error inserting '/acpidump.ko': -1 Invalid parameters

2:16:37 # 8 :/data/
insmod /acpidump.ko pfn=0xbc5ac
insmod: error inserting '/acpidump.ko': -1 Invalid parameters

02:16:45 # 10 :/data/
dmesg | grep p2m
[  389.847683] raw p2m (bc558) gives us: ffffffffffffffff
[  701.348502] raw p2m (bc5ac) gives us: ffffffffffffffff

Huh? Looks like I can access the ACPI regions (bc559 had a bunch of stuff),
but _not_ on the boundary PFNs.

Plot thickens - but sadly I won't be able to do much until Thursday.

I think the issue is somewhere in set_phys_range_identity. This
loop:
        for (pfn = pfn_s; pfn < pfn_e; pfn++)
                if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
                        break;

Probably needs pfn <= pfn_e. But that still does not explain
why pfn_s is failing.
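As a sanity check on which PFNs the loop quoted above touches, here is a user-space model with the same loop shape. It is only a sketch: the toy_p2m structure and the toy_* functions are hypothetical stand-ins, and the bounds check merely imitates __set_phys_to_machine() returning failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a slice of the p2m: one flag per PFN,
 * covering PFNs [base, base + len). Not the kernel's data structure. */
struct toy_p2m {
    uint64_t base;
    size_t len;
    bool *identity;
};

/* Same shape as the quoted loop: walks the half-open range [pfn_s, pfn_e). */
static uint64_t toy_set_identity(struct toy_p2m *m, uint64_t pfn_s, uint64_t pfn_e)
{
    uint64_t pfn;

    assert(pfn_s <= pfn_e);
    for (pfn = pfn_s; pfn < pfn_e; pfn++) {
        if (pfn < m->base || pfn - m->base >= m->len)
            break;  /* imitates __set_phys_to_machine() failing */
        m->identity[pfn - m->base] = true;
    }
    return pfn - pfn_s;  /* number of PFNs actually marked */
}

static bool toy_is_identity(const struct toy_p2m *m, uint64_t pfn)
{
    if (pfn < m->base || pfn - m->base >= m->len)
        return false;
    return m->identity[pfn - m->base];
}
```

In this model a call with pfn_s=0xbc558 and pfn_e=0xbc5ac marks 0xbc558 through 0xbc5ab and leaves 0xbc5ac alone. So if pfn_e is meant as an exclusive end, the miss on bc5ac is expected from the loop alone, while the miss on bc558 is not.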

Or maybe in the pfn_to_mfn machinery. It certainly has a lot of
overrides in it. If you were to instrument any of those to print
out more details on the offending PFNs that could help.





--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712




 

