[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate


  • To: Julien Grall <julien@xxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Fri, 29 May 2020 14:02:29 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=97AUB9UKezqq0E2Ia39v4XMDJZMNKBJzPcztESS3ZLw=; b=gz5ScWkhWlQgY6v4EI48oSHgL2q3gOv9NwNXJ2tFp8ZKDSR1E6GZsVSMS3ySI05GHdWILqoD7kHrADtJ/HjEEb2gbkIi0IKLc0Xj0sk3ocPRttdrE4uH8SrvCK4/GTglDmveCsZuMh8ysdxnH0BSXgPA5/SXZAkSx9k8eVxuCtbhZSxBg9OQ+PcfEHW/mwmOYbGJxAqQMG+3s0XauNTfU5HQHdYy7pVXkgPufeEjy293hKJWevgiRD4XSvguA1CdDJuCocf9MtJcKbTJvTu/gTmVcDnScvU1rNvLzK506+DSFPzJGzi/OrAKe+1Io2u6MG1ZRdi4uqE8XEuGBHIy0w==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U0dGsYGgGgc4OjFT40K13gA2Jtv/oKjchl5s+K7CoZypLszutEhvW2ptprPsIRSAlC+A9oqeAdkyPhrvTHULK5AACa48VLOxYBhsdOOW1GmKiT3kKwyrFkA/g8uLVz5GKiHP3kWunzbpUHBXkvQDZuqrmk83D1jEQd48RgdGuSSYfPr9kBgyO8f5wOxdBYxnrqwudYBbidw0WTmCF7+LJTtb+QWEkOHQssPhNFMfmQv6gL+YD4T9IjsrVWd8e69KT9OipemRY2bNkEkVkXpJ/tRciflqtpzksHudjwffrFqC8ZJhQX0eC5UaM+IIDVUHme+bKZ+Yt7ydPLPYCaX21g==
  • Authentication-results-original: xen.org; dkim=none (message not signed) header.d=none;xen.org; dmarc=none action=none header.from=arm.com;
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Xia, Hongyan" <hongyxia@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 29 May 2020 14:02:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: xen.org; dkim=none (message not signed) header.d=none;xen.org; dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHWNP6nIxIB/r3XC025DLnqIbmaJai92N6AgADfWQCAABkLAIAASFMA
  • Thread-topic: [RFC PATCH 1/1] xen: Use a global mapping for runstate


> On 29 May 2020, at 10:43, Julien Grall <julien@xxxxxxx> wrote:
> 
> Hi Bertrand,
> 
> On 29/05/2020 09:13, Bertrand Marquis wrote:
>> Hi Julien,
>>> On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote:
>>> 
>>> Hi Bertrand,
>>> 
>>> Thank you for the patch.
>>> 
>>> On 28/05/2020 16:25, Bertrand Marquis wrote:
>>>> At the moment on Arm, a Linux guest running with KTPI enabled will
>>>> cause the following error when a context switch happens in user mode:
>>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>> This patch is modifying runstate handling to map the area given by the
>>>> guest inside Xen during the hypercall.
>>>> This is removing the guest virtual to physical conversion during context
>>>> switches which removes the bug
>>> 
>>> It would be good to spell out that a virtual address is not stable. So 
>>> relying on it is wrong.
>>> 
>>>> and improve performance by preventing to
>>>> walk page tables during context switches.
>>> 
>>> With Secret free hypervisor in mind, I would like to suggest to map/unmap 
>>> the runstate during context switch.
>>> 
>>> The cost should be minimal when there is a direct map (i.e on Arm64 and 
>>> x86) and still provide better performance on Arm32.
>> Even with a minimal cost this is still adding some non real-time behaviour 
>> to the context switch.
> 
> Just to be clear, by minimal I meant the mapping part is just a virt_to_mfn() 
> call and the unmapping is a NOP.
> 
> IHMO, if virt_to_mfn() ends up to add non-RT behavior then you have much 
> bigger problem than just this call.
> 
>> But definitely from the security point of view as we have to map a page from 
>> the guest, we could have accessible in Xen some data that should not be 
>> there.
>> There is a trade here where:
>> - xen can protect by map/unmapping
>> - a guest which wants to secure his data should either not use it or make 
>> sure there is nothing else in the page
> 
> Both are valid and depends on your setup. One may want to protect all the 
> existing guests, so modifying a guest may not be a solution.

The fact to map/unmap is increasing the protection but not removing the problem 
completely.

> 
>> That sounds like a thread local storage kind of problematic where we want 
>> data from xen to be accessible fast from the guest and easy to be modified 
>> from xen.
> 
> Agree. On x86, they have a perdomain mapping so it would be possible to do 
> it. We would need to add the feature on Arm.

That would definitely be cleaner.

> 
>>> 
>>> The change should be minimal compare to the current approach but this could 
>>> be taken care separately if you don't have time.
>> I could add that to the serie in a separate patch so that it can be 
>> discussed and test separately ?
> 
> If you are picking the task, then I would suggest to add it directly in this 
> patch.

I will see that but the discussion is leading more on a result where we accept 
the current status.

> 
>>> 
>>>> --
>>>> In the current status, this patch is only working on Arm and needs to
>>>> be fixed on X86 (see #error on domain.c for missing get_page_from_gva).
>>>> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx>
>>>> ---
>>>>  xen/arch/arm/domain.c   | 32 +++++++++-------
>>>>  xen/arch/x86/domain.c   | 51 ++++++++++++++-----------
>>>>  xen/common/domain.c     | 84 ++++++++++++++++++++++++++++++++++-------
>>>>  xen/include/xen/sched.h | 11 ++++--
>>>>  4 files changed, 124 insertions(+), 54 deletions(-)
>>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>>> index 31169326b2..799b0e0103 100644
>>>> --- a/xen/arch/arm/domain.c
>>>> +++ b/xen/arch/arm/domain.c
>>>> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n)
>>>>  /* Update per-VCPU guest runstate shared memory area (if registered). */
>>>>  static void update_runstate_area(struct vcpu *v)
>>>>  {
>>>> -    void __user *guest_handle = NULL;
>>>> -    struct vcpu_runstate_info runstate;
>>>> +    struct vcpu_runstate_info *runstate;
>>>>  -    if ( guest_handle_is_null(runstate_guest(v)) )
>>>> +    /* XXX why do we accept not to block here */
>>>> +    if ( !spin_trylock(&v->runstate_guest_lock) )
>>>>          return;
>>>>  -    memcpy(&runstate, &v->runstate, sizeof(runstate));
>>>> +    runstate = runstate_guest(v);
>>>> +
>>>> +    if (runstate == NULL)
>>>> +    {
>>>> +        spin_unlock(&v->runstate_guest_lock);
>>>> +        return;
>>>> +    }
>>>>        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>>>      {
>>>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>>>> -        guest_handle--;
>>>> -        runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>>> -        __raw_copy_to_guest(guest_handle,
>>>> -                            (void *)(&runstate.state_entry_time + 1) - 1, 
>>>> 1);
>>>> +        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>>>>          smp_wmb();
>>> 
>>> Because you set v->runstate.state_entry_time below, the placement of the 
>>> barrier seems a bit odd.
>>> 
>>> I would suggest to update v->runstate.state_entry_time first and then 
>>> update runstate->state_entry_time.
>> We do want the guest to know when we modify the runstate so we need to make 
>> sure the XEN_RUNSTATE_UPDATE is actually set in a visible way before we do 
>> the memcpy.
>> That’s why the barrier is after.
> 
> I think you misunderstood my comment here. I meant to write the following 
> code:
> 
> v->runstate.state_entry_time = ...
> runstate->state_entry_time = ...
> smp_wmb()
> 
> So it is much clearer that the barrier is because runstate->state_entry_time 
> needs to be updated before the memcpy().
> 
>>>> +
>>>> +#ifdef CONFIG_ARM
>>>> +    page = get_page_from_gva(v, area->addr.p, GV2M_WRITE);
>>> 
>>> A guest is allowed to setup the runstate for a different vCPU than the 
>>> current one. This will lead to get_page_from_gva() to fail as the function 
>>> cannot yet work with a vCPU other than current.
>> If the area is mapped per cpu but isn’t the aim of this to have a way to 
>> check other cpus status ?
> 
> The aim is to collect how much time a vCPU has been unscheduled. This doesn't 
> have to be run from a different vCPU.
> 
> Anyway, my point is the hypercall allows it today. If you want to make such 
> restriction, then we need to agree on it and document it.
> 
>>> 
>>> AFAICT, there is no restriction on when the runstate hypercall can be 
>>> called. So this can even be called before the vCPU is brought up.
>> I understand the remark but it still feels very weird to allow an invalid 
>> address in an hypercall.
>> Wouldn’t we have a lot of potential issues accepting an address that we 
>> cannot check ?
> 
> Well, that's why you see errors when using KPTI ;). If you use a global 
> mapping, then it is not possible to continue without validating the address.
> 
> But to do this, you will have to put restriction on the hypercalls. I would 
> be OK to make such restriction on Arm.
> 
> Cheers,
> 
> -- 
> Julien Grall


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.