
Re: [Xen-devel] Linux kernel tmem regression v4.1 -> v4.4



On 28/09/17 10:42, James Dingwall wrote:
> Hi,
> 
> I am trying to migrate my domU instances from v4.1.44 to v4.4.88, and it
> seems that setting e820_host = 1 in the domU configuration triggers the
> following stack trace.  Please note I have #define MC_DEBUG 1 in
> arch/x86/xen/multicall.c so the failed hypervisor call is logged.
> I'm unsure which side of the kernel/xen boundary this really falls on.
> 
> Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
> Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted
> 4.4.88 #157
> Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
> Sep 25 22:02:50 [kernel]  0000000000000000 ffff88001e31fa78
> ffffffff812f9a28 ffff88001f80a220
> Sep 25 22:02:50 [kernel]  ffff88001f80a238 ffff88001e31fab0
> ffffffff81004d79 0000000000115bb7
> Sep 25 22:02:50 [kernel]  ffff88001f80a270 ffff88001f80b330
> ffff880195bb7000 0000000000000000
> Sep 25 22:02:50 [kernel] Call Trace:
> Sep 25 22:02:50 [kernel]  [<ffffffff812f9a28>] dump_stack+0x61/0x7e
> Sep 25 22:02:50 [kernel]  [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
> Sep 25 22:02:50 [kernel]  [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
> Sep 25 22:02:50 [kernel]  [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
> Sep 25 22:02:50 [kernel]  [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
> Sep 25 22:02:50 [kernel]  [<ffffffff81546022>]
> kernel_physical_mapping_init+0x15e/0x233
> Sep 25 22:02:50 [kernel]  [<ffffffff81542694>]
> init_memory_mapping+0x1c7/0x264
> Sep 25 22:02:50 [kernel]  [<ffffffff810411be>] arch_add_memory+0x50/0xda
> Sep 25 22:02:50 [kernel]  [<ffffffff81543191>]
> add_memory_resource+0x9c/0x12d
> Sep 25 22:02:50 [kernel]  [<ffffffff8137462f>]
> reserve_additional_memory+0x125/0x16b
> Sep 25 22:02:50 [kernel]  [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
> Sep 25 22:02:50 [kernel]  [<ffffffff8107df27>] ?
> __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
> Sep 25 22:02:50 [kernel]  [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
> Sep 25 22:02:50 [kernel]  [<ffffffff8106162a>] worker_thread+0x27d/0x36e
> Sep 25 22:02:50 [kernel]  [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
> Sep 25 22:02:50 [kernel]  [<ffffffff8106575b>] kthread+0xda/0xe2
> Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ?
> kthread_worker_fn+0x13f/0x13f
> Sep 25 22:02:50 [kernel]  [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
> Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ?
> kthread_worker_fn+0x13f/0x13f
> Sep 25 22:02:50 [kernel]   call  1/2: op=14 arg=[ffff880115bb7000]
> result=0_xen_alloc_pte+0x81/0x18e
> Sep 25 22:02:50 [kernel]   call  2/2: op=26 arg=[ffff88001f80b330]
> result=-1_xen_alloc_pte+0xd7/0x18e
> Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
> 
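For reference when reading the two "call n/2" lines above, here is a minimal sketch that decodes the op numbers into hypercall names. The name table is a small subset of the hypercall numbers defined in Xen's public xen.h header; the function name decode_multicalls and the regular expression are illustrative assumptions, not part of any Xen or kernel tool. Read this way, op=14 (update_va_mapping) succeeded and op=26 (mmuext_op) returned -1, which would be consistent with the page being made read-only and the subsequent pin request failing, although the log alone does not prove that.

```python
import re

# Subset of hypercall numbers from Xen's public xen.h header.
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    20: "grant_table_op",
    26: "mmuext_op",
}

# Matches the MC_DEBUG output format seen in the log above.
CALL_RE = re.compile(
    r"call\s+(\d+)/(\d+): op=(\d+) arg=\[([0-9a-f]+)\]\s+result=(-?\d+)"
)


def decode_multicalls(log_text):
    """Return one dict per logged multicall entry (hypothetical helper)."""
    # Drop mail quote prefixes and rejoin lines wrapped by the mail client.
    lines = [re.sub(r"^>\s?", "", line) for line in log_text.splitlines()]
    joined = " ".join(lines)
    calls = []
    for match in CALL_RE.finditer(joined):
        index, total, op, arg, result = match.groups()
        calls.append({
            "index": int(index),
            "total": int(total),
            "op": int(op),
            "name": HYPERCALL_NAMES.get(int(op), "unknown"),
            "arg": int(arg, 16),
            "result": int(result),
            "failed": int(result) < 0,
        })
    return calls


if __name__ == "__main__":
    sample = (
        "> Sep 25 22:02:50 [kernel]   call  1/2: op=14 arg=[ffff880115bb7000]\n"
        "> result=0_xen_alloc_pte+0x81/0x18e\n"
        "> Sep 25 22:02:50 [kernel]   call  2/2: op=26 arg=[ffff88001f80b330]\n"
        "> result=-1_xen_alloc_pte+0xd7/0x18e\n"
    )
    for call in decode_multicalls(sample):
        status = "FAILED" if call["failed"] else "ok"
        print(call["index"], call["name"], hex(call["arg"]), status)
```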
> 
> The Xen version is 4.8.1-r3 from Gentoo; dom0 is 4.1.44.  I have seen the
> same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel.  I don't
> have a specific test case which triggers this, but it usually appears
> within 24 hours, depending on how much work the domU has been
> performing (so probably how much ballooning it has been doing).  Setting
> e820_host = 0 in the config seems to prevent this from happening.
> 
> In the kernel tree, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows
> some commits which seem to relate to the failed hypervisor operation and
> to working round the e820 map.  I have not done a bisect to try to isolate
> this more definitively.  I suspect this could be a more general balloon
> issue that is simply revealed more easily with tmem, as the rate of
> ballooning up/down is higher than with occasional manual changes.
> 
> This is the guest /proc/iomem with e820_host = 0:
> 
> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
> TMEM MODULE PARAMS:
> /sys/module/tmem/parameters/cleancache: Y
> /sys/module/tmem/parameters/frontswap: Y
> /sys/module/tmem/parameters/selfballooning: Y
> /sys/module/tmem/parameters/selfshrinking: Y
> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
> /proc/iomem:
> 00000000-00000fff : reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-3fffffff : System RAM
>   01000000-015509ad : Kernel code
>   015509ae-01807ebf : Kernel data
>   01914000-019c1fff : Kernel bss
> fee00000-fee00fff : Local APIC
> 
> And with e820_host = 1:
> 
> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
> TMEM MODULE PARAMS:
> /sys/module/tmem/parameters/cleancache: Y
> /sys/module/tmem/parameters/frontswap: Y
> /sys/module/tmem/parameters/selfballooning: Y
> /sys/module/tmem/parameters/selfshrinking: Y
> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
> /proc/iomem:
> 00000000-00000fff : reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-1fffffff : System RAM
>   01000000-015509ad : Kernel code
>   015509ae-01807ebf : Kernel data
>   01914000-019c1fff : Kernel bss
> 20000000-d7feffff : Unusable memory
> d7ff0000-d7ffdfff : ACPI Tables
> d7ffe000-d7ffffff : ACPI Non-volatile Storage
> fee00000-fee00fff : Local APIC
> 100000000-11fffffff : System RAM
> 
> 
> If other information about the environment is useful please let me know.

Cc-ing Konrad, who should be much more familiar with tmem than I am.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel