
Re: [Xen-devel] [PATCH v9 0/9] xen/x86: various XPTI speedups



On 03/05/18 19:41, Andrew Cooper wrote:
> On 02/05/18 11:38, Juergen Gross wrote:
>> On 01/05/18 11:28, Andrew Cooper wrote:
>>> On 26/04/18 12:33, Juergen Gross wrote:
>>>> This patch series aims at reducing the overhead of the XPTI Meltdown
>>>> mitigation.
>>> With just the first 3 patches of this series (in a bisection attempt),
>>> on a XenServer build based off staging, XenRT finds the following:
>>>
>>> (XEN) Assertion 'first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & 
>>> PGC_need_scrub)' failed at page_alloc.c:979
>>> (XEN) ----[ Xen-4.11.0-6.0.0-d  x86_64  debug=y   Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d080229914>] 
>>> page_alloc.c#alloc_heap_pages+0x371/0x6f2
>>> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d33v0)
>>> (XEN) rax: ffff82e01307ade8   rbx: 000000000007ffff   rcx: 8180000000000000
>>> (XEN) rdx: 0000000000000000   rsi: 00000000000001b5   rdi: 0000000000000000
>>> (XEN) rbp: ffff8300952b7ba8   rsp: ffff8300952b7b18   r8:  8000000000000000
>>> (XEN) r9:  ffff82e01307ade8   r10: 0180000000000000   r11: 7fffffffffffffff
>>> (XEN) r12: 0000000000000000   r13: 00000000024c2e83   r14: 0000000000000000
>>> (XEN) r15: ffff82e01307add8   cr0: 0000000080050033   cr4: 00000000001526e0
>>> (XEN) cr3: 0000000799c41000   cr2: 00007fdaf5539000
>>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen code around <ffff82d080229914> 
>>> (page_alloc.c#alloc_heap_pages+0x371/0x6f2):
>>> (XEN)  ff 0f 0b 48 85 c9 79 31 <0f> 0b 48 c7 42 08 00 00 00 00 c7 42 10 00 
>>> 00 00
>>> (XEN) Xen stack trace from rsp=ffff8300952b7b18:
>>> (XEN)    0000000000000001 ffff830799cdd000 0000000000000000 00000000003037e9
>>> (XEN)    0000000100000004 ffff8300952b7b68 0000000100000000 ffff830095738000
>>> (XEN)    ffff8300952b7be8 000000008033bfe8 ffff82e01295e540 0000000000001adc
>>> (XEN)    ffff830756971770 0000000000000028 0000000000000000 ffff830799cdd000
>>> (XEN)    0000000000000000 ffff830799cdd000 ffff8300952b7be8 ffff82d080229d4c
>>> (XEN)    0000000000000000 ffff8300952b7d40 0000000000000000 0000000000000000
>>> (XEN)    00000000000000a8 ffff830799cdd000 ffff8300952b7c98 ffff82d080221d90
>>> (XEN)    0000000100000000 ffff830799cdd000 0000000000000000 0000000099cdd000
>>> (XEN)    ffff82e009cd0fd8 00000000000e7b1f ffff8300952b7c88 0000000000000020
>>> (XEN)    ffff8800e7b1fdd8 0000000000000002 0000000000000006 ffff830799cdd000
>>> (XEN)    ffff8300952b7c78 000000000039f480 0000000000000000 000000000000008d
>>> (XEN)    ffff8800e7b1fdd8 ffff830799cdd000 0000000000000006 ffff830799cdd000
>>> (XEN)    ffff8300952b7db8 ffff82d080223ad7 0000000000000046 ffff830088ff9000
>>> (XEN)    ffff8300952b7d18 ffff82d08023cfaf ffff82c000230118 ffff830842ceeb8c
>>> (XEN)    ffff82e009f54db8 00000000003bc78b ffff830842cd2770 ffff830088ff9000
>>> (XEN)    0000000000000000 0000000000000000 ffff83085d6b9350 0000000000000000
>>> (XEN)    ffff8300952b7d28 ffff82d08023d766 ffff8300952b7d58 ffff82d08020c9a2
>>> (XEN)    ffff830842cee000 ffff830799cdd000 ffffffff81adbec0 0000000000000200
>>> (XEN)    0000008d00000000 ffff82d000000000 ffffffff81adbec0 0000000000000200
>>> (XEN)    0000000000000000 0000000000007ff0 ffff83085d6b9350 0000000000000006
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d080229914>] page_alloc.c#alloc_heap_pages+0x371/0x6f2
>>> (XEN)    [<ffff82d080229d4c>] alloc_domheap_pages+0xb7/0x157
>>> (XEN)    [<ffff82d080221d90>] memory.c#populate_physmap+0x27e/0x4c9
>>> (XEN)    [<ffff82d080223ad7>] do_memory_op+0x2e2/0x2695
>>> (XEN)    [<ffff82d080308be9>] hypercall.c#hvm_memory_op+0x36/0x60
>>> (XEN)    [<ffff82d0803091c2>] hvm_hypercall+0x5af/0x681
>>> (XEN)    [<ffff82d08032fee6>] vmx_vmexit_handler+0x1040/0x1e14
>>> (XEN)    [<ffff82d080335f88>] vmx_asm_vmexit_handler+0xe8/0x250
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & 
>>> PGC_need_scrub)' failed at page_alloc.c:979
>>> (XEN) ****************************************
>>>
>>> Running repeated tests on adjacent builds, we never see the assertion
>>> failure without the patches (6 runs), and have so far seen it in 3 of 4
>>> runs (2 still pending) with the patches.
>>>
>>> What is rather strange is that there is a lot of migration and
>>> ballooning going on, but only for HVM VMs (Debian Jessie, not that this
>>> should matter).  dom0 will be the only PV domain in the system, and
>>> is 64-bit.
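
For anyone trying to reason about that ASSERT(): the invariant it encodes
is, roughly, that a free chunk whose first_dirty index claims "nothing left
to scrub" (INVALID_DIRTY_IDX) must not contain any page still carrying the
PGC_need_scrub flag. Below is a small, self-contained toy model of that
check -- not the real page_alloc.c code; toy_page_info, PGC_NEED_SCRUB and
check_chunk() are stand-ins for illustration only.

    /*
     * Toy model of the invariant behind the failed ASSERT() -- NOT the
     * real allocator code.  INVALID_DIRTY_IDX means "no page in this
     * chunk still needs scrubbing", so a page carrying the need-scrub
     * flag inside such a chunk is a metadata/flag mismatch.
     */
    #include <assert.h>
    #include <stdio.h>

    #define PGC_NEED_SCRUB     (1UL << 0)   /* stand-in for PGC_need_scrub */
    #define INVALID_DIRTY_IDX  (~0U)        /* stand-in for Xen's sentinel */

    struct toy_page_info {
        unsigned long count_info;
    };

    /* Same shape as the check in the log: chunk index vs. per-page flag. */
    static void check_chunk(const struct toy_page_info *pg, unsigned int order,
                            unsigned int first_dirty)
    {
        for ( unsigned int i = 0; i < (1U << order); i++ )
            assert(first_dirty != INVALID_DIRTY_IDX ||
                   !(pg[i].count_info & PGC_NEED_SCRUB));
    }

    int main(void)
    {
        struct toy_page_info chunk[4] = { { 0 }, { 0 }, { 0 }, { 0 } };

        check_chunk(chunk, 2, INVALID_DIRTY_IDX); /* clean chunk: passes */

        chunk[2].count_info |= PGC_NEED_SCRUB;
        check_chunk(chunk, 2, 2);                 /* tracked dirty page: passes */
        check_chunk(chunk, 2, INVALID_DIRTY_IDX); /* untracked dirty page: trips,
                                                     like the crash above */
        puts("not reached");
        return 0;
    }

In other words, the panic above means a page in the chunk being allocated
still had PGC_need_scrub set while the chunk's dirty-tracking metadata said
the chunk was fully clean.
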
>> Are you sure you have no other patches compared to staging in your
>> hypervisor? I can't imagine how one of the 3 patches could cause that
>> behavior.
>>
>> I've tried to do similar testing on my machine: 2 HVM domains + 64-bit
>> PV dom0. dom0 and one HVM domain are ballooned up and down all the time
>> while the other HVM domain is being migrated (localhost) in a loop.
>>
>> Migration count is at 600 already...
> 
> So it turns out that I've now reproduced this ASSERT() once without any
> patches from this series applied.
> 
> Therefore, it is a latent bug in either XenServer or Xen, but shouldn't
> block this series (especially as this series makes it easier to reproduce).
> 
> At this point, as we're planning to take the series for 4.11, it might
> be better to throw the whole series in and get some wider testing that way.

I believe taking this for RC3 tomorrow isn't the best idea, so let's wait
until Monday. This way we can let OSStest have a go at the series.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

