Xen project Mailing List

Re: [PATCH v3] xen/arm: Convert runstate address during hypcall

To: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>

From: Stefano Stabellini <sstabellini@xxxxxxxxxx>

Date: Fri, 31 Jul 2020 16:03:29 -0700 (PDT)

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Fri, 31 Jul 2020 23:03:54 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, 31 Jul 2020, Bertrand Marquis wrote: > Sorry missed some points in my previous answer. > > > On 30 Jul 2020, at 22:50, Julien Grall <julien@xxxxxxx> wrote: > > > > Hi Bertrand, > > > > To avoid extra work on your side, I would recommend to wait a bit before > > sending a new version. It would be good to at least settle the conversation > > in v2 regarding the approach taken. > > > > On 30/07/2020 11:24, Bertrand Marquis wrote: > >> At the moment on Arm, a Linux guest running with KTPI enabled will > >> cause the following error when a context switch happens in user mode: > >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > >> The error is caused by the virtual address for the runstate area > >> registered by the guest only being accessible when the guest is running > >> in kernel space when KPTI is enabled. > >> To solve this issue, this patch is doing the translation from virtual > >> address to physical address during the hypercall and mapping the > >> required pages using vmap. This is removing the conversion from virtual > >> to physical address during the context switch which is solving the > >> problem with KPTI. > > > > To echo what Jan said on the previous version, this is a change in a stable > > ABI and therefore may break existing guest. FAOD, I agree in principle with > > the idea. However, we want to explain why breaking the ABI is the *only* > > viable solution. > > > > From my understanding, it is not possible to fix without an ABI breakage > > because the hypervisor doesn't know when the guest will switch back from > > userspace to kernel space. The risk is the information provided by the > > runstate wouldn't contain accurate information and could affect how the > > guest handle stolen time. > > > > Additionally there are a few issues with the current interface: > > 1) It is assuming the virtual address cannot be re-used by the userspace. > > Thanksfully Linux have a split address space. But this may change with KPTI > > in place. > > 2) When update the page-tables, the guest has to go through an invalid > > mapping. So the translation may fail at any point. > > > > IOW, the existing interface can lead to random memory corruption and > > inacurracy of the stolen time. > > > >> This is done only on arm architecture, the behaviour on x86 is not > >> modified by this patch and the address conversion is done as before > >> during each context switch. > >> This is introducing several limitations in comparison to the previous > >> behaviour (on arm only): > >> - if the guest is remapping the area at a different physical address Xen > >> will continue to update the area at the previous physical address. As > >> the area is in kernel space and usually defined as a global variable this > >> is something which is believed not to happen. If this is required by a > >> guest, it will have to call the hypercall with the new area (even if it > >> is at the same virtual address). > >> - the area needs to be mapped during the hypercall. For the same reasons > >> as for the previous case, even if the area is registered for a different > >> vcpu. It is believed that registering an area using a virtual address > >> unmapped is not something done. > > > > This is not clear whether the virtual address refer to the current vCPU or > > the vCPU you register the runstate for. From the past discussion, I think > > you refer to the former. It would be good to clarify. > > > > Additionally, all the new restrictions should be documented in the public > > interface. So an OS developper can find the differences between the > > architectures. > > > > To answer Jan's concern, we certainly don't know all the guest OSes > > existing, however we also need to balance the benefit for a large majority > > of the users. > > > > From previous discussion, the current approach was deemed to be acceptable > > on Arm and, AFAICT, also x86 (see [1]). > > > > TBH, I would rather see the approach to be common. For that, we would an > > agreement from Andrew and Jan in the approach here. Meanwhile, I think this > > is the best approach to address the concern from Arm users. > > > >> inline functions in headers could not be used as the architecture > >> domain.h is included before the global domain.h making it impossible > >> to use the struct vcpu inside the architecture header. > >> This should not have any performance impact as the hypercall is only > >> called once per vcpu usually. > >> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx> > >> --- > >> Changes in v2 > >> - use vmap to map the pages during the hypercall. > >> - reintroduce initial copy during hypercall. > >> Changes in v3 > >> - Fix Coding style > >> - Fix vaddr printing on arm32 > >> - use write_atomic to modify state_entry_time update bit (only > >> in guest structure as the bit is not used inside Xen copy) > >> --- > >> xen/arch/arm/domain.c | 161 ++++++++++++++++++++++++++++++----- > >> xen/arch/x86/domain.c | 29 ++++++- > >> xen/arch/x86/x86_64/domain.c | 4 +- > >> xen/common/domain.c | 19 ++--- > >> xen/include/asm-arm/domain.h | 9 ++ > >> xen/include/asm-x86/domain.h | 16 ++++ > >> xen/include/xen/domain.h | 5 ++ > >> xen/include/xen/sched.h | 16 +--- > >> 8 files changed, 206 insertions(+), 53 deletions(-) > >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > >> index 31169326b2..8b36946017 100644 > >> --- a/xen/arch/arm/domain.c > >> +++ b/xen/arch/arm/domain.c > >> @@ -19,6 +19,7 @@ > >> #include <xen/sched.h> > >> #include <xen/softirq.h> > >> #include <xen/wait.h> > >> +#include <xen/vmap.h> > >> #include <asm/alternative.h> > >> #include <asm/cpuerrata.h> > >> @@ -275,36 +276,156 @@ static void ctxt_switch_to(struct vcpu *n) > >> virt_timer_restore(n); > >> } > >> -/* Update per-VCPU guest runstate shared memory area (if registered). */ > >> -static void update_runstate_area(struct vcpu *v) > >> +static void cleanup_runstate_vcpu_locked(struct vcpu *v) > >> { > >> - void __user *guest_handle = NULL; > >> + if ( v->arch.runstate_guest ) > >> + { > >> + vunmap((void *)((unsigned long)v->arch.runstate_guest & > >> PAGE_MASK)); > >> + > >> + put_page(v->arch.runstate_guest_page[0]); > >> + > >> + if ( v->arch.runstate_guest_page[1] ) > >> + put_page(v->arch.runstate_guest_page[1]); > >> + > >> + v->arch.runstate_guest = NULL; > >> + } > >> +} > >> + > >> +void arch_vcpu_cleanup_runstate(struct vcpu *v) > >> +{ > >> + spin_lock(&v->arch.runstate_guest_lock); > >> + > >> + cleanup_runstate_vcpu_locked(v); > >> + > >> + spin_unlock(&v->arch.runstate_guest_lock); > >> +} > >> + > >> +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr) > >> +{ > >> + unsigned int offset; > >> + mfn_t mfn[2]; > >> + struct page_info *page; > >> + unsigned int numpages; > >> struct vcpu_runstate_info runstate; > >> + void *p; > >> - if ( guest_handle_is_null(runstate_guest(v)) ) > >> - return; > >> + /* user can pass a NULL address to unregister a previous area */ > >> + if ( vaddr == 0 ) > >> + return 0; > >> + > >> + offset = vaddr & ~PAGE_MASK; > >> + > >> + /* provided address must be aligned to a 64bit */ > >> + if ( offset % alignof(struct vcpu_runstate_info) ) > > > > This new restriction wants to be explained in the commit message and public > > header. > > ok > > > > >> + { > >> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > >> 0x%"PRIvaddr > >> + ": Invalid offset\n", vaddr); > > > > We usually enforce 80 character per lines except for format string. So it > > is easier to grep them in the code. > > Ok i will fix this one and the following ones. > But here PRIvaddr would break any attempt to grep something in fact. > > > > >> + return -EINVAL; > >> + } > >> + > >> + page = get_page_from_gva(v, vaddr, GV2M_WRITE); > >> + if ( !page ) > >> + { > >> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > >> 0x%"PRIvaddr > >> + ": Page is not mapped\n", vaddr); > >> + return -EINVAL; > >> + } > >> + > >> + mfn[0] = page_to_mfn(page); > >> + v->arch.runstate_guest_page[0] = page; > >> + > >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > >> + { > >> + /* guest area is crossing pages */ > >> + page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE); > >> + if ( !page ) > >> + { > >> + put_page(v->arch.runstate_guest_page[0]); > >> + gprintk(XENLOG_WARNING, > >> + "Cannot map runstate pointer at 0x%"PRIvaddr > >> + ": 2nd Page is not mapped\n", vaddr); > >> + return -EINVAL; > >> + } > >> + mfn[1] = page_to_mfn(page); > >> + v->arch.runstate_guest_page[1] = page; > >> + numpages = 2; > >> + } > >> + else > >> + { > >> + v->arch.runstate_guest_page[1] = NULL; > >> + numpages = 1; > >> + } > >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); > >> + p = vmap(mfn, numpages); > >> + if ( !p ) > >> + { > >> + put_page(v->arch.runstate_guest_page[0]); > >> + if ( numpages == 2 ) > >> + put_page(v->arch.runstate_guest_page[1]); > >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > >> 0x%"PRIvaddr > >> + ": vmap error\n", vaddr); > >> + return -EINVAL; > >> + } > >> + > >> + v->arch.runstate_guest = p + offset; > >> + > >> + if (v == current) > >> + memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate)); > >> + else > >> { > >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; > >> - guest_handle--; > >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >> - __raw_copy_to_guest(guest_handle, > >> - (void *)(&runstate.state_entry_time + 1) - 1, > >> 1); > >> - smp_wmb(); > >> + vcpu_runstate_get(v, &runstate); > >> + memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate)); > >> } > >> - __copy_to_guest(runstate_guest(v), &runstate, 1); > >> + return 0; > >> +} > >> + > >> +int arch_vcpu_setup_runstate(struct vcpu *v, > >> + struct vcpu_register_runstate_memory_area > >> area) > >> +{ > >> + int rc; > >> + > >> + spin_lock(&v->arch.runstate_guest_lock); > >> + > >> + /* cleanup if we are recalled */ > >> + cleanup_runstate_vcpu_locked(v); > >> + > >> + rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v); > >> + > >> + spin_unlock(&v->arch.runstate_guest_lock); > >> - if ( guest_handle ) > >> + return rc; > >> +} > >> + > >> + > >> +/* Update per-VCPU guest runstate shared memory area (if registered). */ > >> +static void update_runstate_area(struct vcpu *v) > >> +{ > >> + spin_lock(&v->arch.runstate_guest_lock); > >> + > >> + if ( v->arch.runstate_guest ) > >> { > >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > >> - smp_wmb(); > >> - __raw_copy_to_guest(guest_handle, > >> - (void *)(&runstate.state_entry_time + 1) - 1, > >> 1); > >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >> + { > >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >> + write_atomic(&(v->arch.runstate_guest->state_entry_time), > >> + v->runstate.state_entry_time); > > > > NIT: You want to indent v-> at the same level as the argument from the > > first line. > > Ok > > > > > Also, I think you are missing a smp_wmb() here. > > The atomic operation itself would not need a barrier. > I do not see why you think a barrier is needed here. > For the internal structure ? We need to make sure the other-end sees the XEN_RUNSTATE_UPDATE change before other changes. Otherwise, due to cpu reordering, the writes could be seen in reverse order. (Technically the reader would have to use a read-barrier but that's a separate topic.)

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.