
Re: [Xen-devel] Fix VGA logdirty related display freezes with altp2m



On Mon, Oct 22, 2018 at 3:22 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 22/10/2018 22:17, Razvan Cojocaru wrote:
> > On 10/22/18 11:48 PM, Tamas K Lengyel wrote:
> >> On Thu, Oct 18, 2018 at 3:12 PM Razvan Cojocaru
> >> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> >>> On 10/18/18 11:08 PM, Tamas K Lengyel wrote:
> >>>> On Thu, Oct 18, 2018 at 4:09 AM Razvan Cojocaru
> >>>> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> This series aims to prevent the display from freezing when
> >>>>> enabling altp2m and switching to a new view (and assorted problems
> >>>>> when resizing the display).
> >>>>>
> >>>>> The first patch propagates ept.ad changes to all active altp2ms,
> >>>>> and the second one allocates a new logdirty rangeset for each
> >>>>> new altp2m, and propagates (under lock) changes to all p2ms.
> >>>>>
> >>>>> The first patch is the same as:
> >>>>> [PATCH V4] x86/altp2m: propagate ept.ad changes to all active altp2ms
> >>>>> but as it is now required for the second one to apply cleanly, it
> >>>>> has been resent as part of this series.
> >>>>>
> >>>>> [PATCH 1/2] x86/altp2m: propagate ept.ad changes to all active altp2ms
> >>>>> [PATCH 2/2] x86/altp2m: fix display frozen when switching to a new
> >>>> Hi Razvan,
> >>>> I would be happy to give this a spin, can you push it as a git branch somewhere?
> >>> Sure, here you go:
> >>>
> >>> https://github.com/razvan-cojocaru/xen/tree/altp2m-logdirty-take1
> >> I ran into this crash when my config incorrectly pointed to a
> >> non-valid disk location:
> >>
> >> (XEN) Assertion 'p2m->sync.logdirty_ranges' failed at p2m-ept.c:1475
> >> (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
> >> (XEN) CPU:    4
> >> (XEN) RIP:    e008:[<ffff82d08033f40c>] p2m_uninit_altp2m_ept+0x29/0x2b
> >> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> >> (XEN) rax: ffff83046d27802c   rbx: ffff8304558dd880   rcx: 0000000000000000
> >> (XEN) rdx: ffff83046d277fff   rsi: 00000000004680c0   rdi: 0000000000000000
> >> (XEN) rbp: ffff83046d277d60   rsp: ffff83046d277d50   r8:  ffff82d0809304a0
> >> (XEN) r9:  0000000000455940   r10: ffff82e008d01000   r11: 0000000000000017
> >> (XEN) r12: ffff8304558dd880   r13: ffff8304558df830   r14: ffff8304558df000
> >> (XEN) r15: fffffffffffffff8   cr0: 000000008005003b   cr4: 00000000003526e0
> >> (XEN) cr3: 000000005da16000   cr2: ffff880456cd6e80
> >> (XEN) fsb: 0000000000000000   gsb: ffff880467f40000   gss: 0000000000000000
> >> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN) Xen code around <ffff82d08033f40c> (p2m_uninit_altp2m_ept+0x29/0x2b):
> >> (XEN)  00 48 83 c4 08 5b 5d c3 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53 48 8d 05
> >> (XEN) Xen call trace:
> >> (XEN)    [<ffff82d08033f40c>] p2m_uninit_altp2m_ept+0x29/0x2b
> >> (XEN)    [<ffff82d0803305ab>] p2m.c#p2m_teardown_altp2m+0x36/0x52
> >> (XEN)    [<ffff82d0803331b5>] p2m_final_teardown+0x11/0x28
> >> (XEN)    [<ffff82d08034509c>] paging_final_teardown+0x2e/0x3c
> >> (XEN)    [<ffff82d080276439>] arch_domain_destroy+0x50/0xa1
> >> (XEN)    [<ffff82d08020595c>] domain.c#complete_domain_destroy+0x86/0x159
> >> (XEN)    [<ffff82d080228f4f>] rcupdate.c#rcu_process_callbacks+0xa5/0x1cf
> >> (XEN)    [<ffff82d08023ae6b>] softirq.c#__do_softirq+0x71/0x9a
> >> (XEN)    [<ffff82d08023aede>] do_softirq+0x13/0x15
> >> (XEN)    [<ffff82d080275068>] domain.c#idle_loop+0x63/0xb9
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 4:
> >> (XEN) Assertion 'p2m->sync.logdirty_ranges' failed at p2m-ept.c:1475
> >> (XEN) ****************************************
> > Right, I've come across that one now as well. It will be fixed in the
> > next series as a result of doing what Andrew has suggested, which is to say:
> >
> > "Please make all destroy functions idempotent.  i.e.
> >
> > if ( p2m->sync.logdirty_ranges )
> > {
> >     rangeset_destroy(p2m->sync.logdirty_ranges);
> >     p2m->sync.logdirty_ranges = NULL;
> > }
> >
> > and use this destroy function in the cleanup path of init()."
>
> Indeed.
>
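For anyone following along, the idempotent teardown suggested above can be sketched in isolation roughly as follows. This is only an illustration under stated assumptions, not the code from the series: struct p2m_sketch, sketch_rangeset_new(), sketch_rangeset_destroy(), sketch_init() and sketch_uninit() are hypothetical stand-ins built on malloc/free, and the fail_late parameter merely simulates a later setup step failing.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the hypervisor types; NOT the real Xen definitions. */
struct rangeset { int unused; };

struct p2m_sketch {
    struct rangeset *logdirty_ranges;
};

/* Counts outstanding allocations, for demonstration purposes only. */
static int live_rangesets;

static struct rangeset *sketch_rangeset_new(void)
{
    struct rangeset *r = calloc(1, sizeof(*r));

    if ( r )
        live_rangesets++;

    return r;
}

static void sketch_rangeset_destroy(struct rangeset *r)
{
    live_rangesets--;
    free(r);
}

/*
 * Idempotent: safe to call on a p2m that was never initialised, only
 * partially initialised, or already torn down.
 */
static void sketch_uninit(struct p2m_sketch *p2m)
{
    if ( p2m->logdirty_ranges )
    {
        sketch_rangeset_destroy(p2m->logdirty_ranges);
        p2m->logdirty_ranges = NULL;
    }
}

/* init() reuses the destroy function on its error path. */
static int sketch_init(struct p2m_sketch *p2m, int fail_late)
{
    p2m->logdirty_ranges = sketch_rangeset_new();
    if ( !p2m->logdirty_ranges )
        return -1;

    if ( fail_late ) /* hypothetical later setup step failing */
    {
        sketch_uninit(p2m);
        return -1;
    }

    return 0;
}
```

With this shape, a teardown path that runs after a failed or skipped init (as in the domain-destroy trace above) finds a NULL pointer and does nothing, so an assertion on the pointer is no longer needed.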
> >
> >> With the config fixed it boots but when I run DRAKVUF on the domain I
> >> get the following crash:
> >>
> >> (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
> >> (XEN) CPU:    0
> >> (XEN) RIP:    e008:[<000000007bdb630c>] 000000007bdb630c
> >> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d0v5)
> >> (XEN) rax: 00000000ee138470   rbx: 0000000000000000   rcx: 000000008000b098
> >> (XEN) rdx: 0000000000000cf8   rsi: 0000000000000000   rdi: 000000046d2ef000
> >> (XEN) rbp: 0000000000000000   rsp: ffff83005da27a10   r8:  0000000000000cf8
> >> (XEN) r9:  0000000000000cf8   r10: ffff83005da27ab8   r11: ffff83005da27a08
> >> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000065
> >> (XEN) r15: 00000000000005a7   cr0: 0000000080050033   cr4: 0000000000372660
> >> (XEN) cr3: 000000046d2ef000   cr2: 00000000ee138470
> >> (XEN) fsb: 00007fe46d97bbc0   gsb: ffff880467f40000   gss: 0000000000000000
> >> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN) Xen code around <000000007bdb630c> (000000007bdb630c):
> >> (XEN)  80 74 0b 05 70 84 00 00 <c7> 00 00 00 00 e0 80 3d 7a 34 00 00 00 75 64 48
> >> (XEN) Xen stack trace from rsp=ffff83005da27a10:
> >> (XEN)    0000000000000000 0000000000000065 ffff83005da27a50 ffff82d08037aafc
> >> (XEN)    00000000fffffffe ffff82d08037ae14 0000000000000000 ffff83005da27a90
> >> (XEN)    0000000000372660 000000046d2ef000 0000000393e91000 ffff82d0809602b0
> >> (XEN)    000000fe00000000 ffff82d0802a3b98 ffffffffffffffff ffff83005da27ab8
> >> (XEN)    ffff83005da27b08 ffff82d0802a3511 ffff82d08046b028 ffff83005da27b08
> >> (XEN)    ffff82d0802a3511 ffff83005da27fff 0000138800000292 000082d0808176a0
> >> (XEN)    0000000000000000 ffff82d08023b889 0000000000000292 ffff82d08046b028
> >> (XEN)    ffff82d080451ac8 ffff82d080454af2 00000000000005a7 ffff83005da27b78
> >> (XEN)    ffff82d080251d6f ffff82d080250fcd 0000000000000028 ffff83005da27b88
> >> (XEN)    ffff83005da27b38 000000000000e010 ffff82d080454c73 ffff82d080451ac8
> >> (XEN)    ffff82d080454af2 00000000000005a7 0000000000000030 ffff83005da27bf8
> >> (XEN)    ffff82d080454c73 ffff83005da27be8 ffff82d0802aaebc ffff82d08033f3dc
> >> (XEN)    ffff82d080451ac8 ffff82d08037d969 ffff82d08037d95d ffff82d08037d969
> >> (XEN)    0b0f82d08037d95d ffff82d08037d969 ffff83005fe5b000 0000000000000000
> >> (XEN)    0000000000000000 ffff83005da27fff 0000000000000000 00007cffa25d83e7
> >> (XEN)    ffff82d08037da2d deadbeefdeadf00d ffff83018caf2530 ffff83005da27d38
> >> (XEN)    ffff83040a492830 ffff83005da27cc8 ffff83040bab2880 0000000000000000
> >> (XEN)    0000000000000000 deadbeefdeadf00d deadbeefdeadf00d 0000000000000000
> >> (XEN)    0000000000000000 ffff830451835000 0000000000000000 ffff83040a492000
> >> (XEN)    0000000600000000 ffff82d08033f3da 000000000000e008 0000000000010282
> >> (XEN) Xen call trace:
> >> (XEN)    [<000000007bdb630c>] 000000007bdb630c
> >> (XEN)
> >> (XEN) Pagetable walk from 00000000ee138470:
> >> (XEN)  L4[0x000] = 000000046d2ee063 ffffffffffffffff
> >> (XEN)  L3[0x003] = 000000005da11063 ffffffffffffffff
> >> (XEN)  L2[0x170] = 0000000000000000 ffffffffffffffff
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 0:
> >> (XEN) FATAL PAGE FAULT
> >> (XEN) [error_code=0002]
> >> (XEN) Faulting linear address: 00000000ee138470
> >> (XEN) ****************************************
> >> (XEN)
> >> (XEN) Reboot in five seconds...
> > This one I'm not sure about. What does your introspection agent do at
> > that point?
>
> This crash is bizarre.  Xen has most likely followed a corrupt function
> pointer, because none of Xen's .text section lives just below the 2G boundary.
>

It's reproducible and happens immediately after a successful call to
xc_altp2m_set_domain_state to enable altp2m.
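
In case it helps with triage, the error_code=0002 in the quoted log can be decoded using the architectural x86 page-fault error code bits. The sketch below is a standalone illustration, not Xen code: the SKETCH_PFEC_* macros, struct pfec_decoded and decode_pfec() are names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits (names are hypothetical). */
#define SKETCH_PFEC_PRESENT     (1u << 0) /* 0: page not present, 1: protection violation */
#define SKETCH_PFEC_WRITE       (1u << 1) /* 0: read access, 1: write access */
#define SKETCH_PFEC_USER        (1u << 2) /* 0: supervisor mode, 1: user mode */
#define SKETCH_PFEC_RSVD        (1u << 3) /* reserved bit set in a paging entry */
#define SKETCH_PFEC_INSN_FETCH  (1u << 4) /* fault caused by an instruction fetch */

struct pfec_decoded {
    bool present;
    bool write;
    bool user;
    bool reserved_bit;
    bool insn_fetch;
};

/* Split a raw error code into its individual flags. */
static struct pfec_decoded decode_pfec(uint32_t ec)
{
    struct pfec_decoded d = {
        .present      = ec & SKETCH_PFEC_PRESENT,
        .write        = ec & SKETCH_PFEC_WRITE,
        .user         = ec & SKETCH_PFEC_USER,
        .reserved_bit = ec & SKETCH_PFEC_RSVD,
        .insn_fetch   = ec & SKETCH_PFEC_INSN_FETCH,
    };

    return d;
}
```

For 0x0002 only the write bit is set, i.e. a supervisor-mode write to a not-present page. That matches the rest of the dump: the pagetable walk ends at an empty L2 entry, rax equals cr2 (00000000ee138470), and the bytes at the faulting RIP (c7 00 00 00 00 e0) look like a 32-bit immediate store through rax.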

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

