Re: [Xen-devel] page faults on machines with > 4TB memory
On Thu, Jul 23, 2015 at 06:01:45PM +0100, Andrew Cooper wrote:
> On 23/07/15 17:35, Elena Ufimtseva wrote:
> > Hi
> >
> > While working on boot-time bugs on a large Oracle server, we hit a problem
> > booting Xen on machines with more than 4TB of memory, such as the Oracle
> > x4-8.
> > The page fault initially occurred while uploading Xen processor PM info to
> > the hypervisor (see the attached serial log 4.4.2_no_mem_override).
> > Tracing the issue shows that the page fault occurs in the timer.c code
> > while reading the heap size.
> >
> > Here is the original call trace:
> > rocessor: Uploading Xen processor PM info
> > @ (XEN) ----[ Xen-4.4.3-preOVM x86_64 debug=n Tainted: C ]----
> > @ (XEN) CPU: 0
> > @ (XEN) RIP: e008:[<ffff82d08022e747>] add_entry+0x27/0x120
> > @ (XEN) RFLAGS: 0000000000010082 CONTEXT: hypervisor
> > @ (XEN) rax: ffff8a2d080513a20 rbx: ffff83808e802300 rcx: 00000000000000e8
> > @ (XEN) rdx: 00000000000000e8 rsi: 00000000000000e8 rdi: ffff83808e802300
> > @ (XEN) rbp: ffff82d080513a20 rsp: ffff82d0804d7c70 r8: ffff8840ffdb5010
> > @ (XEN) r9: 0000000000000017 r10: ffff83808e802180 r11: 0200200200200200
> > @ (XEN) r12: ffff82d080533080 r13: 0000000000000296 r14: 0100100100100100
> > @ (XEN) r15: 00000000000000e8 cr0: 0000000080050033 cr4: 00000000001526f0
> > @ (XEN) cr3: 00000100818b2000 cr2: ffff8840ffdb5010
> > @ (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > @ (XEN) Xen stack trace from rsp=ffff82d0804d7c70:
> > @ (XEN) ffff83808e802300 ffff82d080513a20 ffff82d08022f59b ffff82d080533080
> > @ (XEN) ffff82d080532f50 00000000000000e8 ffff83808e802328 0000000000000000
> > @ (XEN) ffff82d080513a20 ffff83808e8022c0 ffff82d080533200 00000000000000e8
> > @ (XEN) 00000000000000f0 ffff82d0805331c0 ffff82d0802458e2 0000000000000000
> > @ (XEN) 00000000000000e8 ffff83808e802334 ffff8384be7979b0 ffff82d0804d7d78
> > @ (XEN) 0000000000000000 ffff8384be77c700 ffff82d0804d7d78 ffff82d080513a20
> > @ (XEN) ffff82d080246207 00000000000000e8 00000000000000e8 ffff8384be7979b0
> > @ (XEN) ffff82d08024518a ffff82d080533080 0000000000000070 ffff82d080533da8
> > @ (XEN) 00000001000000e8 ffff8384be797a00 000000e800000001 002ab980002abd68
> > @ (XEN) 0000271000124f80 002abd6800124f80 00000000002ab980 ffff82d0803753e0
> > @ (XEN) 0000000000010101 0000000000000001 ffff82d0804d7e18 ffff881fb4afbc88
> > @ (XEN) ffff82d0804d0000 ffff881fb28a4400 ffff82d0804fca80 ffffffff819b7080
> > @ (XEN) ffff82d080266c16 ffff83808fb46ba8 ffff82d080208a82 ffff83006bddd190
> > @ (XEN) 0000000000000292 0300000100000036 00000001000000f6 000000000000000f
> > @ (XEN) 0000007f000c0082 0000000000000000 0000007f000c0082 0000000000000000
> > @ (XEN) 000000000000000a ffff881fb28a4400 0000000000000005 0000000000000000
> > @ (XEN) 0000000000000000 00000000000000fe 0000000000000001 0000000000000001
> > @ (XEN) 0000000000000000 0000000000000000 ffff82d08031f521 0000000000000000
> > @ (XEN) 0000000000000246 ffffffff810010ea 0000000000000000 ffffffff810010ea
> > @ (XEN) 000000000000e030 0000000000000246 ffff83006bddd000 ffff881fb4afbd48
> > @ (XEN) Xen call trace:
> > @ (XEN) [<ffff82d08022e747>] add_entry+0x27/0x120
> > @ (XEN) [<ffff82d08022f59b>] set_timer+0x10b/0x220
> > @ (XEN) [<ffff82d0802458e2>] cpufreq_governor_dbs+0x1e2/0x2f0
> > @ (XEN) [<ffff82d080246207>] __cpufreq_set_policy+0x87/0x120
> > @ (XEN) [<ffff82d08024518a>] cpufreq_add_cpu+0x24a/0x4f0
> > @ (XEN) [<ffff82d080266c16>] do_platform_op+0x9c6/0x1650
> > @ (XEN) [<ffff82d080208a82>] evtchn_check_pollers+0x22/0xb0
> > @ (XEN) [<ffff82d08031f521>] do_iret+0xc1/0x1a0
> > @ (XEN) [<ffff82d0803243a9>] syscall_enter+0xa9/0xae
> > @ (XEN)
> > @ (XEN) Pagetable walk from ffff8840ffdb5010:
> > @ (XEN) L4[0x110] = 00000100818b3067 00000000000018b3
> > @ (XEN) L3[0x103] = 0000000000000000 ffffffffffffffff
> > @ (XEN)
> > @ (XEN) ****************************************
> >
> > 0xffff82d08022e720 <add_entry>: movzwl 0x28(%rdi),%edx
> > 0xffff82d08022e724 <add_entry+4>: push %rbp
> > 0xffff82d08022e725 <add_entry+5>:  lea 0x2e52f4(%rip),%rax # 0xffff82d080513a20 <__per_cpu_offset>
> > 0xffff82d08022e72c <add_entry+12>: lea 0x30494d(%rip),%r10 # 0xffff82d080533080 <per_cpu__timers>
> > 0xffff82d08022e733 <add_entry+19>: push %rbx
> > 0xffff82d08022e734 <add_entry+20>: add (%rax,%rdx,8),%r10
> > 0xffff82d08022e738 <add_entry+24>: movl $0x0,0x8(%rdi)
> > 0xffff82d08022e73f <add_entry+31>: movb $0x3,0x2a(%rdi)
> > 0xffff82d08022e743 <add_entry+35>: mov 0x8(%r10),%r8
> > 0xffff82d08022e747 <add_entry+39>: movzwl (%r8),%ecx
> >
> > And this points to
> >     int sz = GET_HEAP_SIZE(heap);
> > in add_to_heap(), which is inlined into add_entry() in timer.c.
> >
> > static int add_entry(struct timer *t)
> > {
> > ffff82d08022cad3:       53                      push   %rbx
> >     struct timers *timers = &per_cpu(timers, t->cpu);
> > ffff82d08022cad4:       4c 03 14 d0             add    (%rax,%rdx,8),%r10
> >     int rc;
> >
> >     ASSERT(t->status == TIMER_STATUS_invalid);
> >
> >     /* Try to add to heap. t->heap_offset indicates whether we succeed. */
> >     t->heap_offset = 0;
> > ffff82d08022cad8:       c7 47 08 00 00 00 00    movl   $0x0,0x8(%rdi)
> >     t->status = TIMER_STATUS_in_heap;
> > ffff82d08022cadf:       c6 47 2a 03             movb   $0x3,0x2a(%rdi)
> >     rc = add_to_heap(timers->heap, t);
> > ffff82d08022cae3:       4d 8b 42 08             mov    0x8(%r10),%r8
> >
> > /* Add new entry @t to @heap. Return TRUE if new top of heap. */
> > static int add_to_heap(struct timer **heap, struct timer *t)
> > {
> >     int sz = GET_HEAP_SIZE(heap);
> > ffff82d08022cae7:       41 0f b7 08             movzwl (%r8),%ecx
> >
> >     /* Fail if the heap is full. */
> >     if ( unlikely(sz == GET_HEAP_LIMIT(heap)) )
> >
> > But checking the values of nr_cpumask_bits, nr_cpu_ids and NR_CPUS did
> > not provide any clues as to why it fails here.
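> >
> > For reference, the heap bookkeeping that this load touches is roughly the
> > following (a sketch only; the macro bodies are paraphrased from memory of
> > xen/common/timer.c, so treat the exact details as assumptions). The heap's
> > current size and limit are kept as two u16 values in element 0 of the heap
> > array, which is why the faulting instruction is a 16-bit load through the
> > heap pointer held in r8:
> >
> > #include <stdint.h>
> > typedef uint16_t u16;
> >
> > /* Sketch of the timer-heap metadata accessors (not verbatim):
> >  * element 0 of the heap array doubles as a {size, limit} header. */
> > #define GET_HEAP_SIZE(_h)      ((int)(((u16 *)(_h))[0]))
> > #define SET_HEAP_SIZE(_h, _v)  (((u16 *)(_h))[0] = (u16)(_v))
> > #define GET_HEAP_LIMIT(_h)     ((int)(((u16 *)(_h))[1]))
> > #define SET_HEAP_LIMIT(_h, _v) (((u16 *)(_h))[1] = (u16)(_v))
> >
> > Note that cr2 equals r8 (ffff8840ffdb5010), so it is the heap pointer
> > itself (timers->heap for the target CPU) that is the bad address, not
> > anything derived from NR_CPUS.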
> >
> > After disabling the Xen cpufreq driver in Linux, this page fault no longer
> > appeared, but creating a new guest caused another fatal page fault:
> >
> > CPU: 0
> > @ (XEN) RIP: e008:[<ffff82d08025d59b>] __find_first_bit+0xb/0x30
> > @ (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> > @ (XEN) rax: 0000000000000000 rbx: 00000000ffdb53c0 rcx: 0000000000000004
> > @ (XEN) rdx: ffff82d080513a20 rsi: 00000000000000f0 rdi: ffff8840ffdb53c0
> > @ (XEN) rbp: 00000000000000e9 rsp: ffff82d0804d7d88 r8: 0000000000000000
> > @ (XEN) r9: 0000000000000000 r10: 0000000000000017 r11: 0000000000000000
> > @ (XEN) r12: ffff8381875ee3e0 r13: ffff82d0804d7e98 r14: 00000000000000e9
> > @ (XEN) r15: 00000000000000f0 cr0: 0000000080050033 cr4: 00000000001526f0
> > @ (XEN) cr3: 0000008174093000 cr2: ffff8840ffdb53c0
> > @ (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > @ (XEN) Xen stack trace from rsp=ffff82d0804d7d88:
> > @ (XEN) 00000000000000e7 ffff82d080206030 000000cf7d47d0a2 00000000000000e9
> > @ (XEN) 00000000000000f0 0000000000000002 ffff83808fb6ffd0 ffff82d080533db8
> > @ (XEN) 0000000000000000 ffff82d080532f50 ffff82d0804d0000 ffff82d080533db8
> > @ (XEN) 00007fa8c83e5004 ffff82d0804d7e08 ffff82d080533db8 ffff83818b4e5000
> > @ (XEN) 000000090000000f 00007fa8c8390001 00007fa800000002 00007fa8ae7f8eb8
> > @ (XEN) 0000000000000002 00007fa898004170 000000000159c320 00000034ccc6cffe
> > @ (XEN) 00007fa8c83e5000 0000000000000000 000000000159c320 fffffc73ffffffff
> > @ (XEN) 00000034ccf6e920 00000034ccf6e920 00000034ccf6e920 00000034ccc94298
> > @ (XEN) 00007fa898004170 00000034ccc94220 ffffffffffffffff ffffffffffffffff
> > @ (XEN) ffffffffffffffff 000000ffffffffff 00000034ca0e08c7 0000000000000100
> > @ (XEN) 00000034ca0e08c7 0000000000000033 0000000000000246 ffff83006bddd000
> > @ (XEN) ffff8808456f1e98 00007fa8ae7f8d90 ffff88084ad1d900 0000000000000001
> > @ (XEN) 00007fa8ae7f8d90 ffff82d0803243a9 00000000ffffffff 0000000001d0085c
> > @ (XEN) 00007fa8c84549c0 00007fa898004170 ffff8808456f1e98 00007fa8ae7f8d90
> > @ (XEN) 0000000000000282 00000000019c9998 0000000000000003 0000000001d00a49
> > @ (XEN) 0000000000000024 ffffffff8100148a 00007fa898004170 00007fa8ae7f8ed0
> > @ (XEN) 00007fa8c83e5004 0001010000000000 ffffffff8100148a 000000000000e033
> > @ (XEN) 0000000000000282 ffff8808456f1e40 000000000000e02b 0000000000000000
> > @ (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > @ (XEN) ffff83006bddd000 0000000000000000 0000000000000000
> > @ (XEN) Xen call trace:
> > @ (XEN) [<ffff82d08025d59b>] __find_first_bit+0xb/0x30
> > @ (XEN) [<ffff82d080206030>] do_domctl+0x12b0/0x13d0
> > @ (XEN) [<ffff82d0803243a9>] syscall_enter+0xa9/0xae
> > @ (XEN)
> > @ (XEN) Pagetable walk from ffff8840ffdb53c0:
> > @ (XEN) L4[0x110] = 00000080818b3067 00000000000018b3
> >
> > Booting upstream Xen on the same server (with the same command line as in
> > the other cases) causes yet another page fault (see the attached
> > upstream_no_mem_override.log).
> >
> > We remembered that there is another open bug about a problem when booting
> > with more than 4TB of memory. The workaround for that was to override mem
> > on the Xen command line. We tried this, and with both upstream Xen and the
> > 4.4.3 build with the Linux cpufreq driver enabled, the problem disappears.
> > See the attached logs upstream_with_mem_override.log and
> > 4.4.3_with_mem_overrride.log.
> >
> > Any information on what the issue might be, or any other pointers, would
> > be very helpful. I will provide additional info if needed.
> >
> > Thank you
> > Elena
>
> This is an issue we have found in XenServer as well.
>
> Observe that ffff8840ffdb53c0 is actually a pointer in the 64bit PV
> virtual region, because the xenheap allocator has wandered off the top
> of the directmap region. This is a direct result of passing numa node
> information to alloc_xenheap_page(), which overrides the check that
> keeps the allocation inside the directmap region.
Thanks Andrew.
Ok, that explains why the address looked odd.
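To double-check the arithmetic, here is a small standalone sketch (the layout
constants are assumptions, taken from the usual pre-BIGMEM x86-64 layout with
the directmap starting at PML4 slot 262 and covering 5TiB): a xenheap page
whose machine address lies above 5TiB gets a directmap-style virtual address
that lands past the end of the directmap, in the guest-defined 64-bit PV area,
which is exactly where cr2 points.

#include <stdio.h>
#include <stdint.h>

/* Assumed pre-BIGMEM Xen x86-64 layout: directmap occupies PML4 slots
 * 262-271, i.e. 10 * 512GiB = 5TiB of virtual address space. */
#define DIRECTMAP_VIRT_START 0xffff830000000000UL
#define DIRECTMAP_SIZE       (5UL << 40)
#define DIRECTMAP_VIRT_END   (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)

/* Linear maddr -> vaddr translation used for directmap accesses. */
static uint64_t maddr_to_virt(uint64_t maddr)
{
    return DIRECTMAP_VIRT_START + maddr;
}

int main(void)
{
    /* Hypothetical machine address of a xenheap page on a node above 5TiB;
     * this particular value reproduces the cr2 from the first crash. */
    uint64_t maddr = 0x540ffdb5010UL;
    uint64_t vaddr = maddr_to_virt(maddr);

    printf("vaddr = %#llx (%s the directmap)\n",
           (unsigned long long)vaddr,
           vaddr < DIRECTMAP_VIRT_END ? "inside" : "beyond");
    return 0;
}

Running this prints vaddr = 0xffff8840ffdb5010, i.e. the faulting address from
the first trace, about 0.25TiB past the 5TiB directmap end at
0xffff880000000000.
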
>
> I have worked around this in XenServer with
>
> diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
> index 3c64f19..715765a 100644
> --- a/xen/arch/x86/e820.c
> +++ b/xen/arch/x86/e820.c
> @@ -15,7 +15,7 @@
> * opt_mem: Limit maximum address of physical RAM.
> * Any RAM beyond this address limit is ignored.
> */
> -static unsigned long long __initdata opt_mem;
> +static unsigned long long __initdata opt_mem = GB(5 * 1024);
> size_param("mem", opt_mem);
>
> /*
>
> This causes Xen to ignore any RAM above the top of the directmap region,
> which happens to be 5TiB on Xen 4.5.
Yes, it looks like the mem override is the current workaround in our case too.
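For anyone hitting the same thing, the workaround we are using looks roughly
like this (an illustrative GRUB2 fragment; the paths and dom0 options are
placeholders, and the mem= value just needs to stay at or below the 5TiB
directmap limit):

    # Clamp the RAM Xen will use to 5 TiB (example entry, adjust to taste).
    multiboot2 /boot/xen.gz mem=5120G dom0_mem=4096M,max:4096M console=com1 com1=115200,8n1
    module2    /boot/vmlinuz root=/dev/sda1 ro console=hvc0
    module2    /boot/initrd.img
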
>
> In some copious free time, I was going to look into segmenting the
> directmap region by numa node, rather than having it linear from 0, so
> xenheap pages can still be properly numa-located.
Thanks Andrew.
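If it helps the discussion, my rough understanding of that idea is something
like the sketch below (purely conceptual; every name and size here is made up
for illustration): give each NUMA node its own fixed-size window inside the
directmap, so that a node whose machine addresses start above 5TiB still maps
into virtual addresses below the directmap end.

#include <stdio.h>
#include <stdint.h>

#define DIRECTMAP_VIRT_START 0xffff830000000000UL
#define NODE_WINDOW_SIZE     (1UL << 40)      /* e.g. 1TiB of vaddr per node */
#define MAX_NODES            8                /* example only */

/* Lowest machine address owned by each node, filled in at boot. */
static uint64_t node_maddr_base[MAX_NODES];

/* maddr -> vaddr: place the offset within the owning node into that node's
 * window, so sparse or high machine addresses stay inside the directmap. */
static uint64_t node_maddr_to_virt(unsigned int node, uint64_t maddr)
{
    return DIRECTMAP_VIRT_START + (uint64_t)node * NODE_WINDOW_SIZE
           + (maddr - node_maddr_base[node]);
}

int main(void)
{
    /* Example: node 3 owns machine memory starting at 5TiB; its pages still
     * map below the directmap end. */
    node_maddr_base[3] = 5UL << 40;
    printf("node 3, maddr 5TiB -> vaddr %#llx\n",
           (unsigned long long)node_maddr_to_virt(3, 5UL << 40));
    return 0;
}
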
>
> ~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel