
Error during update_runstate_area with KPTI activated


  • To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Thu, 14 May 2020 14:28:12 +0000
  • Accept-language: en-GB, en-US
  • Cc: nd <nd@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
  • Delivery-date: Thu, 14 May 2020 14:29:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: Error during update_runstate_area with KPTI activated

Hi,

When running Linux on arm64 with KPTI activated (in Dom0 or in a DomU), I 
get a lot of page-table walk errors like this:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0

After adding a call trace, I found that the errors come from 
update_runstate_area when Linux has KPTI activated.

I have the following call trace:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
(XEN) backtrace.c:29: Stacktrace start at 0x8007638efbb0 depth 10
(XEN)    [<000000000027780c>] get_page_from_gva+0x180/0x35c
(XEN)    [<00000000002700c8>] guestcopy.c#copy_guest+0x1b0/0x2e4
(XEN)    [<0000000000270228>] raw_copy_to_guest+0x2c/0x34
(XEN)    [<0000000000268dd0>] domain.c#update_runstate_area+0x90/0xc8
(XEN)    [<000000000026909c>] domain.c#schedule_tail+0x294/0x2d8
(XEN)    [<0000000000269524>] context_switch+0x58/0x70
(XEN)    [<00000000002479c4>] core.c#sched_context_switch+0x88/0x1e4
(XEN)    [<000000000024845c>] core.c#schedule+0x224/0x2ec
(XEN)    [<0000000000224018>] softirq.c#__do_softirq+0xe4/0x128
(XEN)    [<00000000002240d4>] do_softirq+0x14/0x1c

When I discussed this with Stefano, he pointed me to a discussion on this 
subject that was started in November 2018:
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03053.html

And a patch was submitted:
https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02320.html

I rebased this patch on current master and it solves the problem I have 
seen.

Introducing VCPUOP_register_runstate_phys_memory_area sounds like a good 
solution to me: Xen would no longer depend on the area actually being 
mapped in the guest when a context switch is done. That is exactly the 
problem here: when a context switch is triggered while a guest is running 
in EL0, the area is not mapped, because with KPTI the kernel mappings are 
not present in the page tables used at EL0.
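
To illustrate the failure mode, here is a toy model in C. It is not Xen 
code: every name in it (toy_vcpu, toy_walk, toy_update_by_va, 
toy_update_by_pa, toy_kpti_scenario) is hypothetical, and the page tables 
are reduced to a flat lookup array. It only shows why an update through a 
guest virtual address depends on which page tables are active, while an 
update through a guest physical address does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: none of these names exist in Xen. */
#define TOY_RAM_SIZE 4096u

struct toy_mapping { uint64_t va; uint64_t pa; };

struct toy_page_table {
    const struct toy_mapping *entries;
    size_t count;
};

struct toy_vcpu {
    uint8_t ram[TOY_RAM_SIZE];            /* guest physical memory */
    const struct toy_page_table *active;  /* tables installed right now */
    uint64_t runstate_va;  /* area registered by guest virtual address */
    uint64_t runstate_pa;  /* area registered by guest physical address */
};

/* Translate va using the given tables; false if va is not mapped. */
static bool toy_walk(const struct toy_page_table *pt, uint64_t va,
                     uint64_t *pa)
{
    for (size_t i = 0; i < pt->count; i++) {
        if (pt->entries[i].va == va) {
            *pa = pt->entries[i].pa;
            return true;
        }
    }
    return false;
}

/* VA-based update: needs the area mapped in the currently active tables. */
static bool toy_update_by_va(struct toy_vcpu *v, uint64_t state)
{
    uint64_t pa;

    if (!toy_walk(v->active, v->runstate_va, &pa))
        return false;  /* the "Failed to walk page-table" case */
    assert(pa + sizeof(state) <= TOY_RAM_SIZE);
    memcpy(&v->ram[pa], &state, sizeof(state));
    return true;
}

/* PA-based update: independent of whatever tables the guest installed. */
static bool toy_update_by_pa(struct toy_vcpu *v, uint64_t state)
{
    if (v->runstate_pa + sizeof(state) > TOY_RAM_SIZE)
        return false;
    memcpy(&v->ram[v->runstate_pa], &state, sizeof(state));
    return true;
}

/*
 * Returns true if the model reproduces the situation described above:
 * the VA update works with the kernel tables, fails with the EL0 (KPTI)
 * tables, and the PA update still works in that second case.
 */
static bool toy_kpti_scenario(void)
{
    static const struct toy_mapping kernel_map[] = {
        { 0xffffff837ebe0cd0ULL, 0x100 },
    };
    const struct toy_page_table kernel_pt = { kernel_map, 1 };
    const struct toy_page_table user_pt = { NULL, 0 }; /* kernel unmapped */
    struct toy_vcpu v;

    memset(&v, 0, sizeof(v));
    v.runstate_va = 0xffffff837ebe0cd0ULL;
    v.runstate_pa = 0x100;

    v.active = &kernel_pt;                 /* guest in the kernel */
    bool va_ok_in_kernel = toy_update_by_va(&v, 1);

    v.active = &user_pt;                   /* guest in EL0 with KPTI */
    bool va_ok_in_user = toy_update_by_va(&v, 2);
    bool pa_ok_in_user = toy_update_by_pa(&v, 2);

    return va_ok_in_kernel && !va_ok_in_user && pa_ok_in_user;
}
```

In this model toy_kpti_scenario() returns true, which matches what I 
observe: the copy fails only when the context switch happens while the 
guest has its EL0 page tables installed.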

Is there any reason why this was not merged in the end?

Thanks
Bertrand
