Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1



On 09/01/18 17:38, Boris Ostrovsky wrote:
> On 01/09/2018 11:31 AM, Sander Eikelenboom wrote:
>> On 09/01/18 17:16, Pavel Tatashin wrote:
>>> Hi Juergen,
>>>
>>> Do you have this patch applied:
>>>
>>> https://github.com/torvalds/linux/commit/e8c24773d6b2cd9bc8b36bd6e60beff599be14be
>> Seems this hasn't made it to Linus yet ?

Hmm, that was a stupid remark, since the link actually points to Linus's
GitHub repo :p (though not his git.kernel.org repo).

>> I will give it a test and report back, thanks !

Testing shows the patch helps: dom0 boots fine now.
Thanks!

> 
> 
> BTW, I assume this problem goes away if you don't specify dom0_mem?

Haven't tested, since I need dom0_mem for PCI passthrough.

> -boris
> 

--
Sander


>>
>>> Thank you,
>>> Pavel
>>>
>>> On 01/09/2018 11:10 AM, Juergen Gross wrote:
>>>> On 09/01/18 16:29, Sander Eikelenboom wrote:
>>>>> Since it's already rc7:
>>>>> "Give me a subtle ping, Vasili. One subtle ping only, please."
>>>> I like that film :-)
>> :)
>>
>> --
>> Sander
>>
>>>> Pavel, can you please comment? Do you have an idea how to repair the
>>>> issue or should we revert your patch in 4.15?
>>>>
>>>>
>>>> Juergen
>>>>
>>>>> On 04/01/18 21:02, Sander Eikelenboom wrote:
>>>>>> On 04/01/18 12:44, Juergen Gross wrote:
>>>>>>> On 04/01/18 11:17, Sander Eikelenboom wrote:
>>>>>>>> Hi Boris / Juergen,
>>>>>>>>
>>>>>>>> First of all, best wishes for a new year that is off to a quite turbulent start.
>>>>>>>>
>>>>>>>> Now that the holidays are over, I finally got to test a Linux 4.15-rc6 kernel
>>>>>>>> and experienced a crash in early dom0 boot on my system (AMD Phenom X6).
>>>>>>>>
>>>>>>>> I tested some earlier Linux 4.15 rc's and experienced crashes then as well,
>>>>>>>> but didn't have time to set up a serial console to send them in
>>>>>>>> (and waited to see if the issue Boris fixed with AMD PCI 64-bit BARs could be it).
>>>>>>>>
>>>>>>>> But since that patch went in before 4.15-rc6, that doesn't seem to be the issue.
>>>>>>>> So it could be that the culprit went in fairly early in the 4.15 cycle.
>>>>>>>>
>>>>>>>> The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel on xen-unstable.
>>>>>>>>
>>>>>>>> Hopefully you have a pointer to what is wrong; if not, I can try to do a bisect.
>>>>>>> A bisect would be very welcome.
>>>>>> Hi Juergen / Boris / Pavel,
>>>>>>
>>>>>> Bisection result is:
>>>>>>
>>>>>> a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
>>>>>> commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
>>>>>> Author: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>>>>>> Date:   Wed Nov 15 17:36:31 2017 -0800
>>>>>>
>>>>>>      mm: zero reserved and unavailable struct pages
>>>>>>      
>>>>>>      Some memory is reserved but unavailable: not present in memblock.memory
>>>>>>      (because not backed by physical pages), but present in memblock.reserved.
>>>>>>      Such memory has backing struct pages, but they are not initialized by
>>>>>>      going through __init_single_page().
>>>>>>      
>>>>>>      In some cases these struct pages are accessed even if they do not
>>>>>>      contain any data.  One example is page_to_pfn() might access page->flags
>>>>>>      if this is where section information is stored (CONFIG_SPARSEMEM,
>>>>>>      SECTION_IN_PAGE_FLAGS).
>>>>>>      
>>>>>>      One example of such memory: trim_low_memory_range() unconditionally
>>>>>>      reserves from pfn 0, but e820__memblock_setup() might provide the
>>>>>>      exiting memory from pfn 1 (i.e.  KVM).
>>>>>>      
>>>>>>      Since struct pages are zeroed in __init_single_page(), and not during
>>>>>>      allocation time, we must zero such struct pages explicitly.
>>>>>>      
>>>>>>      The patch involves adding a new memblock iterator:
>>>>>>              for_each_resv_unavail_range(i, p_start, p_end)
>>>>>>      
>>>>>>      Which iterates through reserved && !memory lists, and we zero struct pages
>>>>>>      explicitly by calling mm_zero_struct_page().
>>>>>>      
>>>>>>      ===
>>>>>>      
>>>>>>      Here is more detailed example of problem that this patch is addressing:
>>>>>>      
>>>>>>      Run tested on qemu with the following arguments:
>>>>>>      
>>>>>>              -enable-kvm -cpu kvm64 -m 512 -smp 2
>>>>>>      
>>>>>>      This patch reports that there are 98 unavailable pages.
>>>>>>      
>>>>>>      They are: pfn 0 and pfns in range [159, 255].
>>>>>>      
>>>>>>      Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does
>>>>>>      not reserve [159, 255] ones.
>>>>>>      
>>>>>>      e820__memblock_setup() reports linux that the following physical ranges are
>>>>>>      available:
>>>>>>          [1 , 158]
>>>>>>      [256, 130783]
>>>>>>      
>>>>>>      Notice, that exactly unavailable pfns are missing!
>>>>>>      
>>>>>>      Now, lets check what we have in zone 0: [1, 131039]
>>>>>>      
>>>>>>      pfn 0, is not part of the zone, but pfns [1, 158], are.
>>>>>>      
>>>>>>      However, the bigger problem we have if we do not initialize these struct
>>>>>>      pages is with memory hotplug.  Because, that path operates at 2M
>>>>>>      boundaries (section_nr).  And checks if 2M range of pages is hot
>>>>>>      removable.  It starts with first pfn from zone, rounds it down to 2M
>>>>>>      boundary (sturct pages are allocated at 2M boundaries when vmemmap is
>>>>>>      created), and checks if that section is hot removable.  In this case
>>>>>>      start with pfn 1 and convert it down to pfn 0.  Later pfn is converted
>>>>>>      to struct page, and some fields are checked.  Now, if we do not zero
>>>>>>      struct pages, we get unpredictable results.
>>>>>>      
>>>>>>      In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all
>>>>>>      vmemmap memory to ones, the following panic is observed with kernel test
>>>>>>      without this patch applied:
>>>>>>      
>>>>>>        BUG: unable to handle kernel NULL pointer dereference at          (null)
>>>>>>        IP: is_pageblock_removable_nolock+0x35/0x90
>>>>>>        PGD 0 P4D 0
>>>>>>        Oops: 0000 [#1] PREEMPT
>>>>>>        ...
>>>>>>        task: ffff88001f4e2900 task.stack: ffffc90000314000
>>>>>>        RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
>>>>>>        Call Trace:
>>>>>>         ? is_mem_section_removable+0x5a/0xd0
>>>>>>         show_mem_removable+0x6b/0xa0
>>>>>>         dev_attr_show+0x1b/0x50
>>>>>>         sysfs_kf_seq_show+0xa1/0x100
>>>>>>         kernfs_seq_show+0x22/0x30
>>>>>>         seq_read+0x1ac/0x3a0
>>>>>>         kernfs_fop_read+0x36/0x190
>>>>>>         ? security_file_permission+0x90/0xb0
>>>>>>         __vfs_read+0x16/0x30
>>>>>>         vfs_read+0x81/0x130
>>>>>>         SyS_read+0x44/0xa0
>>>>>>         entry_SYSCALL_64_fastpath+0x1f/0xbd
>>>>>>      
>>>>>>      Link: http://lkml.kernel.org/r/20171013173214.27300-7-pasha.tatashin@xxxxxxxxxx
>>>>>>      Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>>>>>>      Reviewed-by: Steven Sistare <steven.sistare@xxxxxxxxxx>
>>>>>>      Reviewed-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
>>>>>>      Reviewed-by: Bob Picco <bob.picco@xxxxxxxxxx>
>>>>>>      Tested-by: Bob Picco <bob.picco@xxxxxxxxxx>
>>>>>>      Acked-by: Michal Hocko <mhocko@xxxxxxxx>
>>>>>>      Cc: Alexander Potapenko <glider@xxxxxxxxxx>
>>>>>>      Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
>>>>>>      Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>>>>>>      Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>>>>>      Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
>>>>>>      Cc: David S. Miller <davem@xxxxxxxxxxxxx>
>>>>>>      Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>>>>>>      Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
>>>>>>      Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>>>>>>      Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>>>>>      Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>>>>>      Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>>>>>>      Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>>>>>>      Cc: Michal Hocko <mhocko@xxxxxxxxxx>
>>>>>>      Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
>>>>>>      Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>>>>      Cc: Will Deacon <will.deacon@xxxxxxx>
>>>>>>      Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>>      Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>>>>
>>>>>> :040000 040000 b0422cb4f5ef60f5bc7f0686d135c869680c603d 51ef20afe641afceaf5530b83b4f1b9a51563939 M       include
>>>>>> :040000 040000 55be7a5dd879578dc3f88bec059bcc392e3f1a1c b4c9f81df05629bb034b6d0bdc0454579f2986fe M       mm
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sander
>>>>>>
>>>>>>> Juergen
>>>>>>>
>>>>>>>> --
>>>>>>>> Sander
>>>>>>>>
>>>>>>>> Attached: .config and full serial log
>>>>>>>>
>>>>>>>> [    0.000000] ACPI: Early table checksum verification disabled
>>>>>>>> [    0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>>>>>>>> [    0.000000] ACPI: RSDT 0x00000000C7F90000 000048 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: FACP 0x00000000C7F90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: DSDT 0x00000000C7F905E0 009427 (v01 A7640  A7640100 00000100 INTL 20051117)
>>>>>>>> [    0.000000] ACPI: FACS 0x00000000C7F9E000 000040
>>>>>>>> [    0.000000] ACPI: APIC 0x00000000C7F90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: MCFG 0x00000000C7F90420 00003C (v01 7640MS OEMMCFG  20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: SLIC 0x00000000C7F90460 000176 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: OEMB 0x00000000C7F9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: SRAT 0x00000000C7F9A5E0 000108 (v03 AMD    FAM_F_10 00000002 AMD  00000001)
>>>>>>>> [    0.000000] ACPI: HPET 0x00000000C7F9A6F0 000038 (v01 7640MS OEMHPET  20100913 MSFT 00000097)
>>>>>>>> [    0.000000] ACPI: IVRS 0x00000000C7F9A730 000110 (v01 AMD    RD890S   00202031 AMD  00000000)
>>>>>>>> [    0.000000] ACPI: SSDT 0x00000000C7F9A840 000DA4 (v01 A M I  POWERNOW 00000001 AMD  00000001)
>>>>>>>> [    0.000000] ACPI: Local APIC address 0xfee00000
>>>>>>>> [    0.000000] Setting APIC routing to Xen PV.
>>>>>>>> [    0.000000] NUMA turned off
>>>>>>>> [    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
>>>>>>>> [    0.000000] NODE_DATA(0) allocated [mem 0x7fc15000-0x7fc1efff]
>>>>>>>> [    0.000000] tsc: Fast TSC calibration using PIT
>>>>>>>> [    0.000000] Zone ranges:
>>>>>>>> [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>>>>>>> [    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007fffffff]
>>>>>>>> [    0.000000]   Normal   empty
>>>>>>>> [    0.000000] Movable zone start for each node
>>>>>>>> [    0.000000] Early memory node ranges
>>>>>>>> [    0.000000]   node   0: [mem 0x0000000000001000-0x0000000000095fff]
>>>>>>>> [    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
>>>>>>>> [    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fffffff]
>>>>>>>> [    0.000000] On node 0 totalpages: 524181
>>>>>>>> [    0.000000]   DMA zone: 64 pages used for memmap
>>>>>>>> [    0.000000]   DMA zone: 21 pages reserved
>>>>>>>> [    0.000000]   DMA zone: 3989 pages, LIFO batch:0
>>>>>>>> [    0.000000]   DMA32 zone: 8128 pages used for memmap
>>>>>>>> [    0.000000]   DMA32 zone: 520192 pages, LIFO batch:31
>>>>>>>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at           (null)
>>>>>>>> [    0.000000] IP: zero_resv_unavail+0x8e/0xe1
>>>>>>>> [    0.000000] PGD 0 P4D 0
>>>>>>>> [    0.000000] Oops: 0002 [#1] SMP
>>>>>>>> [    0.000000] Modules linked in:
>>>>>>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc6-20180104-linus-doflr+ #1
>>>>>>>> [    0.000000] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>>>>>>> [    0.000000] RIP: e030:zero_resv_unavail+0x8e/0xe1
>>>>>>>> [    0.000000] RSP: e02b:ffffffff82803d68 EFLAGS: 00010006
>>>>>>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000010
>>>>>>>> [    0.000000] RDX: 000000000007ffff RSI: 0000000000000100 RDI: ffffea0002000000
>>>>>>>> [    0.000000] RBP: ffffffff82803d70 R08: ffffea0002000000 R09: 0000000000000002
>>>>>>>> [    0.000000] R10: 0000000000000002 R11: 0000000000000003 R12: ffffea0000000000
>>>>>>>> [    0.000000] R13: 0000000000000000 R14: ffffffff82803f20 R15: 0000000000000000
>>>>>>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff82e16000(0000) knlGS:0000000000000000
>>>>>>>> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [    0.000000] CR2: 0000000000000000 CR3: 0000000002823000 CR4: 0000000000000660
>>>>>>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>>>>>>>> [    0.000000] Call Trace:
>>>>>>>> [    0.000000]  ? free_area_init_nodes+0x690/0x69f
>>>>>>>> [    0.000000]  ? zone_sizes_init+0x4b/0x50
>>>>>>>> [    0.000000]  ? xen_pagetable_init+0x13/0x43f
>>>>>>>> [    0.000000]  ? memblock_find_dma_reserve+0x141/0x15b
>>>>>>>> [    0.000000]  ? memblock_find_dma_reserve+0x150/0x15b
>>>>>>>> [    0.000000]  ? numa_init+0x43c/0x453
>>>>>>>> [    0.000000]  ? setup_arch+0x7a0/0x87f
>>>>>>>> [    0.000000]  ? start_kernel+0x58/0x3a8
>>>>>>>> [    0.000000]  ? iommu_shutdown_noop+0x10/0x10
>>>>>>>> [    0.000000]  ? xen_start_kernel+0x528/0x534
>>>>>>>> [    0.000000] Code: da 49 c1 e0 06 4d 01 e0 48 8b 44 24 08 48 8d 0c 1a 48 05 ff 0f 00 00 48 c1 e8 0c 48 39 c8 76 16 4c 89 c7 b9 10 00 00 00 44 89 e8 <f3> ab 48 ff c3 49 83 c0 40 eb d2 6a 00 55 31 d2 49 c7 c0 90 78
>>>>>>>> [    0.000000] RIP: zero_resv_unavail+0x8e/0xe1 RSP: ffffffff82803d68
>>>>>>>> [    0.000000] CR2: 0000000000000000
>>>>>>>> [    0.000000] ---[ end trace b788f32e38f6de39 ]---
>>>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>>>> (XEN) [2018-01-04 09:52:49.218] Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>>>>>>>
>>>>>
> 
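
For readers following the quoted commit message: the "reserved && !memory"
iteration it describes amounts to a range set difference. The sketch below is
only an illustration in Python, not the kernel's memblock code. MEMORY uses the
pfn ranges quoted from the commit message; RESERVED is an assumption picked so
that the result matches the 98 pages the message reports. All function names
are hypothetical.

```python
# Inclusive pfn ranges. MEMORY comes from the quoted commit message;
# RESERVED is an assumption chosen to reproduce its reported result.
MEMORY = [(1, 158), (256, 130783)]
RESERVED = [(0, 15), (159, 255)]

PAGE_SIZE = 4096
PAGES_PER_2M = (2 * 1024 * 1024) // PAGE_SIZE  # 512 pages of 4 KiB


def subtract_ranges(reserved, memory):
    """Return the parts of `reserved` that no range in `memory` covers."""
    result = []
    for r_start, r_end in sorted(reserved):
        cursor = r_start  # first pfn of this reserved range not yet classified
        for m_start, m_end in sorted(memory):
            if m_end < cursor or m_start > r_end:
                continue  # no overlap with the unclassified remainder
            if m_start > cursor:
                result.append((cursor, m_start - 1))  # gap before this memory range
            cursor = max(cursor, m_end + 1)
            if cursor > r_end:
                break
        if cursor <= r_end:
            result.append((cursor, r_end))  # tail not covered by any memory range
    return result


def count_pages(ranges):
    """Number of pfns in a list of inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def round_down(pfn, align):
    """Round a pfn down to a multiple of `align`."""
    return pfn - (pfn % align)


unavailable = subtract_ranges(RESERVED, MEMORY)
print(unavailable, count_pages(unavailable))
print(round_down(1, PAGES_PER_2M))
```

With these inputs the first line printed is [(0, 0), (159, 255)] 98, i.e. pfn 0
plus pfns 159 to 255, and the second is 0, which is the "start with pfn 1 and
convert it down to pfn 0" step from the commit message.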

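One possible way to read the register dump in the quoted oops, offered as a
sketch under stated assumptions rather than a diagnosis: if R12 holds the
vmemmap base and struct page is 64 bytes (both assumptions, although the
"shl $6" and "add $0x40" in the Code: bytes are consistent with the latter),
then RDI can be converted back to a pfn and a physical address. The constant
and function names below are hypothetical.

```python
# Values copied from the register dump in the quoted log.
RDI = 0xFFFFEA0002000000  # destination of the faulting 'rep stos'
R12 = 0xFFFFEA0000000000
RDX = 0x7FFFF
RBX = 0x1

VMEMMAP_BASE = R12     # assumption: R12 is the start of the struct page array
STRUCT_PAGE_SIZE = 64  # assumption: sizeof(struct page) on this build
PAGE_SHIFT = 12        # 4 KiB pages on x86

# End of the node range from the "Initmem setup node 0" line of the log.
NODE_END = 0x7FFFFFFF


def page_addr_to_pfn(addr):
    """Convert a struct page address to a pfn under the assumptions above."""
    return (addr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE


pfn = page_addr_to_pfn(RDI)
phys = pfn << PAGE_SHIFT

print(hex(pfn), hex(phys))
print(phys == NODE_END + 1)  # does the address sit just past the node range?
print(pfn == RDX + RBX)      # cross-check against two other registers
```

Under these assumptions the pfn comes out as 0x80000 and the physical address
as 0x80000000, which is the first byte past the 0x7fffffff node end shown in
the log. That would fit a store to a struct page beyond dom0's memory, but the
thread itself does not confirm this reading.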
