Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1



On 04/01/18 12:44, Juergen Gross wrote:
> On 04/01/18 11:17, Sander Eikelenboom wrote:
>> Hi Boris / Juergen,
>>
>> First of all, best wishes for what is a rather turbulent start to the new year.
>>
>> Now that the holidays are over, I finally got to test a Linux 4.15-rc6 kernel
>> and experienced a crash in early dom0 boot on my system (AMD Phenom X6).
>>
>> I tested some earlier Linux 4.15 rcs and experienced crashes then as well,
>> but didn't have time to set up a serial console to capture them
>> (and I waited to see whether the issue Boris fixed with AMD PCI 64-bit BARs
>> could be the cause).
>>
>> But since that patch went in before 4.15-rc6, that doesn't seem to be the
>> issue.
>> So the culprit may have gone in fairly early in the 4.15 cycle.
>>
>> The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel on 
>> xen-unstable.
>>
>> Hopefully you have a pointer to what is wrong; if not, I can try to do a
>> bisect.
> 
> A bisect would be very welcome.

Hi Juergen / Boris / Pavel,

Bisection result is:

a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
Author: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
Date:   Wed Nov 15 17:36:31 2017 -0800

    mm: zero reserved and unavailable struct pages
    
    Some memory is reserved but unavailable: not present in memblock.memory
    (because not backed by physical pages), but present in memblock.reserved.
    Such memory has backing struct pages, but they are not initialized by
    going through __init_single_page().
    
    In some cases these struct pages are accessed even if they do not
    contain any data.  One example is page_to_pfn() might access page->flags
    if this is where section information is stored (CONFIG_SPARSEMEM,
    SECTION_IN_PAGE_FLAGS).
    
    One example of such memory: trim_low_memory_range() unconditionally
    reserves from pfn 0, but e820__memblock_setup() might provide the
    existing memory from pfn 1 (i.e.  KVM).
    
    Since struct pages are zeroed in __init_single_page(), and not during
    allocation time, we must zero such struct pages explicitly.
    
    The patch involves adding a new memblock iterator:
            for_each_resv_unavail_range(i, p_start, p_end)
    
    Which iterates through reserved && !memory lists, and we zero struct pages
    explicitly by calling mm_zero_struct_page().
    
    ===
    
    Here is more detailed example of problem that this patch is addressing:
    
    Run tested on qemu with the following arguments:
    
            -enable-kvm -cpu kvm64 -m 512 -smp 2
    
    This patch reports that there are 98 unavailable pages.
    
    They are: pfn 0 and pfns in range [159, 255].
    
    Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does
    not reserve [159, 255] ones.
    
    e820__memblock_setup() reports to Linux that the following physical ranges
    are available:
            [1, 158]
            [256, 130783]
    
    Notice that exactly the unavailable pfns are missing!
    
    Now, let's check what we have in zone 0: [1, 131039]
    
    pfn 0, is not part of the zone, but pfns [1, 158], are.
    
    However, the bigger problem we have if we do not initialize these struct
    pages is with memory hotplug.  Because, that path operates at 2M
    boundaries (section_nr).  And checks if 2M range of pages is hot
    removable.  It starts with first pfn from zone, rounds it down to 2M
    boundary (struct pages are allocated at 2M boundaries when vmemmap is
    created), and checks if that section is hot removable.  In this case
    start with pfn 1 and convert it down to pfn 0.  Later pfn is converted
    to struct page, and some fields are checked.  Now, if we do not zero
    struct pages, we get unpredictable results.
    
    In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all
    vmemmap memory to ones, the following panic is observed with kernel test
    without this patch applied:
    
      BUG: unable to handle kernel NULL pointer dereference at          (null)
      IP: is_pageblock_removable_nolock+0x35/0x90
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT
      ...
      task: ffff88001f4e2900 task.stack: ffffc90000314000
      RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
      Call Trace:
       ? is_mem_section_removable+0x5a/0xd0
       show_mem_removable+0x6b/0xa0
       dev_attr_show+0x1b/0x50
       sysfs_kf_seq_show+0xa1/0x100
       kernfs_seq_show+0x22/0x30
       seq_read+0x1ac/0x3a0
       kernfs_fop_read+0x36/0x190
       ? security_file_permission+0x90/0xb0
       __vfs_read+0x16/0x30
       vfs_read+0x81/0x130
       SyS_read+0x44/0xa0
       entry_SYSCALL_64_fastpath+0x1f/0xbd
    
    Link: http://lkml.kernel.org/r/20171013173214.27300-7-pasha.tatashin@xxxxxxxxxx
    Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
    Reviewed-by: Steven Sistare <steven.sistare@xxxxxxxxxx>
    Reviewed-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
    Reviewed-by: Bob Picco <bob.picco@xxxxxxxxxx>
    Tested-by: Bob Picco <bob.picco@xxxxxxxxxx>
    Acked-by: Michal Hocko <mhocko@xxxxxxxx>
    Cc: Alexander Potapenko <glider@xxxxxxxxxx>
    Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
    Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
    Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
    Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
    Cc: David S. Miller <davem@xxxxxxxxxxxxx>
    Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
    Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
    Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
    Cc: Ingo Molnar <mingo@xxxxxxxxxx>
    Cc: Mark Rutland <mark.rutland@xxxxxxx>
    Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
    Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
    Cc: Michal Hocko <mhocko@xxxxxxxxxx>
    Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
    Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Cc: Will Deacon <will.deacon@xxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

:040000 040000 b0422cb4f5ef60f5bc7f0686d135c869680c603d 51ef20afe641afceaf5530b83b4f1b9a51563939 M      include
:040000 040000 55be7a5dd879578dc3f88bec059bcc392e3f1a1c b4c9f81df05629bb034b6d0bdc0454579f2986fe M      mm
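
The qemu example in the commit message above can be reproduced with plain interval arithmetic. The following is a user-space Python model of the "reserved && !memory" iteration, not the kernel's for_each_resv_unavail_range() implementation. The memory ranges are the ones listed in the commit message; the reserved list is an assumption made up for illustration (the message only states that trim_low_memory_range() reserves pfns [0, 15] and that [159, 255] is not reserved by it), and the helper name resv_unavail_ranges is hypothetical.

```python
def resv_unavail_ranges(reserved, memory):
    """Yield inclusive (start_pfn, end_pfn) ranges that are reserved but not memory.

    Both inputs are lists of inclusive pfn ranges; memory ranges must not overlap.
    """
    memory = sorted(memory)
    for r_start, r_end in sorted(reserved):
        cursor = r_start  # first pfn of this reserved range not yet classified
        for m_start, m_end in memory:
            if m_end < cursor:
                continue  # memory range lies entirely before the cursor
            if m_start > r_end:
                break  # memory range lies entirely after this reserved range
            if m_start > cursor:
                yield (cursor, m_start - 1)  # gap in front of this memory range
            cursor = m_end + 1
            if cursor > r_end:
                break
        if cursor <= r_end:
            yield (cursor, r_end)  # tail of the reserved range not covered by memory


# Memory ranges quoted in the commit message.
memory = [(1, 158), (256, 130783)]
# ASSUMPTION: illustrative reserved list; only [0, 15] is attributed to
# trim_low_memory_range() in the commit message.
reserved = [(0, 15), (159, 255)]

unavailable = list(resv_unavail_ranges(reserved, memory))
count = sum(end - start + 1 for start, end in unavailable)
print(unavailable, count)
```

Under that assumed reserved list the model yields pfn 0 and pfns [159, 255], 98 pages in total, which matches the numbers given in the commit message.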

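The hotplug scenario in the commit message depends on rounding the zone's first pfn down to a 2M boundary. A minimal sketch of that arithmetic, assuming 4 KiB pages (so a 2M span covers 512 pfns); the constant and function names are illustrative and are not kernel symbols:

```python
PAGE_SIZE = 4096                                 # assumption: 4 KiB pages
PAGES_PER_2M = (2 * 1024 * 1024) // PAGE_SIZE    # 512 pfns per 2M span


def round_down_to_2m(pfn):
    """Return the first pfn of the 2M-aligned span that contains pfn."""
    return pfn & ~(PAGES_PER_2M - 1)


# The zone in the commit message starts at pfn 1.
print(round_down_to_2m(1))
```

With the zone starting at pfn 1, the rounded-down pfn is 0, which is exactly the page that is outside the zone and, without the commit, never initialized.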

--
Sander

> 
> Juergen
> 
>>
>> --
>> Sander
>>
>> Attached: .config and full serial log
>>
>> [    0.000000] ACPI: Early table checksum verification disabled
>> [    0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>> [    0.000000] ACPI: RSDT 0x00000000C7F90000 000048 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>> [    0.000000] ACPI: FACP 0x00000000C7F90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: DSDT 0x00000000C7F905E0 009427 (v01 A7640  A7640100 00000100 INTL 20051117)
>> [    0.000000] ACPI: FACS 0x00000000C7F9E000 000040
>> [    0.000000] ACPI: APIC 0x00000000C7F90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: MCFG 0x00000000C7F90420 00003C (v01 7640MS OEMMCFG  20100913 MSFT 00000097)
>> [    0.000000] ACPI: SLIC 0x00000000C7F90460 000176 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>> [    0.000000] ACPI: OEMB 0x00000000C7F9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: SRAT 0x00000000C7F9A5E0 000108 (v03 AMD    FAM_F_10 00000002 AMD  00000001)
>> [    0.000000] ACPI: HPET 0x00000000C7F9A6F0 000038 (v01 7640MS OEMHPET  20100913 MSFT 00000097)
>> [    0.000000] ACPI: IVRS 0x00000000C7F9A730 000110 (v01 AMD    RD890S   00202031 AMD  00000000)
>> [    0.000000] ACPI: SSDT 0x00000000C7F9A840 000DA4 (v01 A M I  POWERNOW 00000001 AMD  00000001)
>> [    0.000000] ACPI: Local APIC address 0xfee00000
>> [    0.000000] Setting APIC routing to Xen PV.
>> [    0.000000] NUMA turned off
>> [    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
>> [    0.000000] NODE_DATA(0) allocated [mem 0x7fc15000-0x7fc1efff]
>> [    0.000000] tsc: Fast TSC calibration using PIT
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>> [    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007fffffff]
>> [    0.000000]   Normal   empty
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000001000-0x0000000000095fff]
>> [    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
>> [    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fffffff]
>> [    0.000000] On node 0 totalpages: 524181
>> [    0.000000]   DMA zone: 64 pages used for memmap
>> [    0.000000]   DMA zone: 21 pages reserved
>> [    0.000000]   DMA zone: 3989 pages, LIFO batch:0
>> [    0.000000]   DMA32 zone: 8128 pages used for memmap
>> [    0.000000]   DMA32 zone: 520192 pages, LIFO batch:31
>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at           (null)
>> [    0.000000] IP: zero_resv_unavail+0x8e/0xe1
>> [    0.000000] PGD 0 P4D 0 
>> [    0.000000] Oops: 0002 [#1] SMP
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc6-20180104-linus-doflr+ #1
>> [    0.000000] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [    0.000000] RIP: e030:zero_resv_unavail+0x8e/0xe1
>> [    0.000000] RSP: e02b:ffffffff82803d68 EFLAGS: 00010006
>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000010
>> [    0.000000] RDX: 000000000007ffff RSI: 0000000000000100 RDI: ffffea0002000000
>> [    0.000000] RBP: ffffffff82803d70 R08: ffffea0002000000 R09: 0000000000000002
>> [    0.000000] R10: 0000000000000002 R11: 0000000000000003 R12: ffffea0000000000
>> [    0.000000] R13: 0000000000000000 R14: ffffffff82803f20 R15: 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff82e16000(0000) knlGS:0000000000000000
>> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000000 CR3: 0000000002823000 CR4: 0000000000000660
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>> [    0.000000] Call Trace:
>> [    0.000000]  ? free_area_init_nodes+0x690/0x69f
>> [    0.000000]  ? zone_sizes_init+0x4b/0x50
>> [    0.000000]  ? xen_pagetable_init+0x13/0x43f
>> [    0.000000]  ? memblock_find_dma_reserve+0x141/0x15b
>> [    0.000000]  ? memblock_find_dma_reserve+0x150/0x15b
>> [    0.000000]  ? numa_init+0x43c/0x453
>> [    0.000000]  ? setup_arch+0x7a0/0x87f
>> [    0.000000]  ? start_kernel+0x58/0x3a8
>> [    0.000000]  ? iommu_shutdown_noop+0x10/0x10
>> [    0.000000]  ? xen_start_kernel+0x528/0x534
>> [    0.000000] Code: da 49 c1 e0 06 4d 01 e0 48 8b 44 24 08 48 8d 0c 1a 48 05 ff 0f 00 00 48 c1 e8 0c 48 39 c8 76 16 4c 89 c7 b9 10 00 00 00 44 89 e8 <f3> ab 48 ff c3 49 83 c0 40 eb d2 6a 00 55 31 d2 49 c7 c0 90 78 
>> [    0.000000] RIP: zero_resv_unavail+0x8e/0xe1 RSP: ffffffff82803d68
>> [    0.000000] CR2: 0000000000000000
>> [    0.000000] ---[ end trace b788f32e38f6de39 ]---
>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>> (XEN) [2018-01-04 09:52:49.218] Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>
> 

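One way to read the register dump in the quoted oops is to translate the destination pointer back into a pfn. The sketch below is only a reading aid, not a diagnosis. It assumes that R12 (ffffea0000000000) holds the vmemmap base and that struct page is 64 bytes on this build, which the dump suggests (RCX is 0x10 for the faulting "rep stos" of 32-bit words, and the Code bytes step R8 by 0x40 per iteration). The function name is hypothetical.

```python
VMEMMAP_BASE = 0xFFFFEA0000000000   # ASSUMPTION: value of R12 in the dump
STRUCT_PAGE_SIZE = 64               # ASSUMPTION: 0x10 dwords zeroed per page
PAGE_SIZE = 4096                    # ASSUMPTION: 4 KiB pages


def struct_page_to_pfn(addr, base=VMEMMAP_BASE, size=STRUCT_PAGE_SIZE):
    """Translate a struct page address inside vmemmap back to its pfn."""
    offset = addr - base
    if offset < 0 or offset % size:
        raise ValueError("address is not a struct page slot in vmemmap")
    return offset // size


rdi = 0xFFFFEA0002000000            # RDI (and R08) at the time of the fault
pfn = struct_page_to_pfn(rdi)
print(hex(pfn), hex(pfn * PAGE_SIZE))
```

Under these assumptions RDI corresponds to pfn 0x80000 (physical address 0x80000000), which is RDX + RBX in the dump and the first page past the node range [mem 0x0000000000001000-0x000000007fffffff] printed earlier in the log. Whether the struct page for that pfn is simply not mapped under Xen PV would still need to be confirmed.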
