Xen project Mailing List

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

To: "Boris Ostrovsky" <boris.ostrovsky@xxxxxxxxxx>, "Kevin Moraga" <kmoragas@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 10 May 2016 01:23:01 -0600

Delivery-date: Tue, 10 May 2016 07:23:19 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 09.05.16 at 20:40, <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>
>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>>> On 09.05.16 at 16:52, <kmoragas@xxxxxxxxxx> wrote:
>>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>>> On 09.05.16 at 00:51, <kmoragas@xxxxxxxxxx> wrote:
>>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>>
>>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>>> configuration works within another machine but with another processor
>>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>>> you built yourself - can you make said binary from exactly that
>>>>>>> build available somewhere?
>>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>>> 4.5.3.
>>>>>>
>>>>>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>>> apart vmlinuz would be quite cumbersome.
>>>>>
>>>>> Jan
>>>>>
>>>> Oh sorry, here is the link to vmlinux
>>>>
>>>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>>> This is still vmlinuz but the failure is at
>>>
>>> ffffffff81007ef3: 48 3b 1d 4e 2e ec 00  cmp 0xec2e4e(%rip),%rbx # 0xffffffff81ecad48
>>> ffffffff81007efa: 73 51                 jae 0xffffffff81007f4d
>>> ffffffff81007efc: 31 c0                 xor %eax,%eax
>>> ffffffff81007efe: 48 8b 15 03 d2 c0 00  mov 0xc0d203(%rip),%rdx # 0xffffffff81c15108
>>> ffffffff81007f05: 90                    nop
>>> ffffffff81007f06: 90                    nop
>>> ffffffff81007f07: 90                    nop
>>> ffffffff81007f08: 4c 8b 2c da           mov (%rdx,%rbx,8),%r13 <======
>>> ffffffff81007f0c: 90                    nop
>>> ffffffff81007f0d: 90                    nop
>>> ffffffff81007f0e: 90                    nop
>>> ffffffff81007f0f: 85 c0                 test %eax,%eax
>>> ffffffff81007f11: 78 3a                 js 0xffffffff81007f4d
>>> ffffffff81007f13: 48 8b 05 ee 11 d2 00  mov 0xd211ee(%rip),%rax # 0xffffffff81d29108
>>> ffffffff81007f1a: 49 39 c5              cmp %rax,%r13
>>> ffffffff81007f1d: 73 6f                 jae 0xffffffff81007f8e
>>> ffffffff81007f1f: 48 8b 05 ea 11 d2 00  mov 0xd211ea(%rip),%rax # 0xffffffff81d29110
>>> ffffffff81007f26: 4a 8b 04 e8           mov (%rax,%r13,8),%rax
>>>
>>> Any chance you could provide an un-stripped binary or System.map?
>> Here is the link for System.map
>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>>
>
> So my semi-educated guess at your stack is
> __early_ioremap
> -> __early_set_fixmap
> -> set_pte
> -> xen_set_pte_init
> -> mask_rw_pte
> -> pte_pfn
> -> pte_val
> -> xen_pte_val
> -> pte_mfn_to_pfn
> -> mfn_to_pfn_no_overrides
> -> ret = xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>
> With ffffffff81007f08 being the faulted address the last one looks
> plausible:
>
> ffffffff81007efe: 48 8b 15 03 d2 c0 00  mov 0xc0d203(%rip),%rdx # 0xffffffff81c15108
> ffffffff81007f05: 90                    nop
> ffffffff81007f06: 90                    nop
> ffffffff81007f07: 90                    nop
> ffffffff81007f08: 4c 8b 2c da           mov (%rdx,%rbx,8),%r13
>
> since
>
> ostr@workbase> grep ffffffff81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
> ffffffff81c15108 D machine_to_phys_mapping
> ostr@workbase>
>
> But %rdx is not ffffffff81c15108, it is ffff800000000000:
>
> (XEN) rax: 0000000000000000 rbx: 00000000000d7bdc rcx: ffff880002059000
> (XEN) rdx: ffff800000000000 rsi: 80000000d7bdc063 rdi: 80000000d7bdc063

But that's a MOV above, i.e. %rdx = [0xffffffff81c15108], which
sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx would
then match with the value in %cr2.

Question is - where does MFN 0xd7bdc come from (it's in a
reserved range, and hence can only be MMIO, which shouldn't be
subject to M2P translation), and why is this a problem only on
Skylake (or maybe that's not CPU related at all, but just
dependent on the memory layout produced by the firmware).

Obviously, accesses to the sparse[!] M2P prior to a proper #PF
handler being established can't end well. With no RAM present in
the range 0xc0000000-0xffffffff, the 4th 2MB M2P page doesn't
get populated, i.e. this page walk

(XEN) Pagetable walk from ffff8000006bdee0:
(XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
(XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
(XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff

is to be expected.

Anyway, Kevin, it would really make things a lot easier if you
provided the vmlinux matching the vmlinuz, which you should
have (assuming my understanding is correct that this is a kernel
you built yourself). After all what we may need to figure out is
the caller of __early_ioremap() in the call stack Boris deduced.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
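
The address arithmetic in the message can be cross-checked with a short sketch. It is only an illustration and assumes 8-byte M2P entries (the scale factor in `mov (%rdx,%rbx,8),%r13`), 4 KiB pages, and standard x86-64 4-level paging with 9 index bits per level. The function names are hypothetical and are not taken from Xen or Linux.

```python
# Sketch under stated assumptions: 8-byte M2P entries, 4 KiB pages,
# x86-64 4-level paging. All function names here are illustrative only.

MACH2PHYS_VIRT_START = 0xFFFF800000000000  # value observed in %rdx
M2P_ENTRY_SIZE = 8                         # scale in mov (%rdx,%rbx,8),%r13
PAGE_SHIFT = 12                            # 4 KiB pages
MASK64 = 0xFFFFFFFFFFFFFFFF


def m2p_entry_address(mfn):
    """Virtual address of the M2P entry that would be read for this MFN."""
    return (MACH2PHYS_VIRT_START + mfn * M2P_ENTRY_SIZE) & MASK64


def pagetable_indices(vaddr):
    """Return the (L4, L3, L2, L1) page-table indices of a virtual address."""
    return tuple((vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12))


def rip_relative_target(insn_addr, insn_len, disp):
    """Effective address of a RIP-relative operand (relative to next insn)."""
    return (insn_addr + insn_len + disp) & MASK64


mfn = 0xD7BDC                                # value in %rbx
fault = m2p_entry_address(mfn)
l4, l3, l2, _l1 = pagetable_indices(fault)

print(hex(fault))                            # 0xffff8000006bdee0 (the %cr2 value)
print(hex(l4), hex(l3), hex(l2))             # 0x100 0x0 0x3 (matches the page walk)
print(hex(mfn << PAGE_SHIFT))                # 0xd7bdc000 (inside 0xc0000000-0xffffffff)
print(hex(rip_relative_target(0xFFFFFFFF81007EFE, 7, 0xC0D203)))  # 0xffffffff81c15108
```

Under these assumptions, one 2MB M2P page holds 2MB / 8 = 0x40000 entries, i.e. it covers 1GB of machine address space, so L2 index 3 corresponds to the range 0xc0000000-0xffffffff mentioned above.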
