Xen project Mailing List

Re: [Xen-devel] [PATCH v3] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall

>>> On 24.05.19 at 20:12, <andrii.anisov@xxxxxxxxx> wrote: > From: Andrii Anisov <andrii_anisov@xxxxxxxx> > > Existing interface to register runstate are with its virtual address > is prone to issues which became more obvious with KPTI enablement in > guests. The nature of those issues is the fact that the guest could > be interrupted by the hypervisor at any time, and there is no guarantee > to have the registered virtual address translated with the currently > available guest's page tables. Before the KPTI such a situation was > possible in case the guest is caught in the middle of PT processing > (e.g. superpage shattering). With the KPTI this happens also when the > guest runs userspace, so has a pretty high probability. Except when there's no need for KPTI in the guest in the first place, as is the case for x86-64 PV guests. I think this is worthwhile clarifying. > So it was agreed to register runstate with the guest's physical address > so that its mapping is permanent from the hypervisor point of view. [1] > > The hypercall employs the same vcpu_register_runstate_memory_area > structure for the interface, but requires a registered area to not > cross a page boundary. > > [1] > https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html > > Signed-off-by: Andrii Anisov <andrii_anisov@xxxxxxxx> I would have really hoped that you would outline the intended interaction between both interfaces, such that while reviewing one can judge whether you're actually matching the goal. I think it is actually mandatory to make explicit in the public header what level of mixing is permitted, what is not permitted, and what mix of requests is simply undefined. In particular, while we did work out during prior discussions that some level of mixing has to be permitted, I'm unconvinced that arbitrary mixing has to be allowed. For example, switching from one model to another could be permitted only with just a single active vCPU. This might allow to do away again with the runstate_in_use field you add. > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -274,17 +274,15 @@ static void ctxt_switch_to(struct vcpu *n) > virt_timer_restore(n); > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > -static void update_runstate_area(struct vcpu *v) > +static void update_runstate_by_gvaddr(struct vcpu *v) > { > void __user *guest_handle = NULL; > > - if ( guest_handle_is_null(runstate_guest(v)) ) > - return; > + ASSERT(!guest_handle_is_null(runstate_guest_virt(v))); > > if ( VM_ASSIST(v->domain, runstate_update_flag) ) > { > - guest_handle = &v->runstate_guest.p->state_entry_time + 1; > + guest_handle = &v->runstate_guest.virt.p->state_entry_time + 1; > guest_handle--; > v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > __raw_copy_to_guest(guest_handle, > @@ -292,7 +290,7 @@ static void update_runstate_area(struct vcpu *v) > smp_wmb(); > } > > - __copy_to_guest(runstate_guest(v), &v->runstate, 1); > + __copy_to_guest(runstate_guest_virt(v), &v->runstate, 1); In a prior version you did the mechanical part of adjusting the VA-based code in a prereq patch, aiding review. Is there a particular reason you folded everything into one patch now? > @@ -303,6 +301,53 @@ static void update_runstate_area(struct vcpu *v) > } > } > > +static void update_runstate_by_gpaddr(struct vcpu *v) > +{ > + struct vcpu_runstate_info *runstate = > + (struct vcpu_runstate_info *)v->runstate_guest.phys; What's the cast for here? > @@ -1614,6 +1613,92 @@ bool update_runstate_area(struct vcpu *v) > return rc; > } > > +static bool update_runstate_by_gpaddr_native(struct vcpu *v) > +{ > + struct vcpu_runstate_info *runstate = > + (struct vcpu_runstate_info *)v->runstate_guest.phys; > + > + ASSERT(runstate != NULL); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + smp_wmb(); > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + } > + > + memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate)); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + smp_wmb(); > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + } > + > + return true; > +} I can't help thinking that this matches the Arm code. Can common things please be put in common code? I may be asking too much, but if the pre-existing implementations are similar enough (I didn't check) perhaps they could be folded in a first step, too? > +static bool update_runstate_by_gpaddr_compat(struct vcpu *v) > +{ > + struct compat_vcpu_runstate_info *runstate = > + (struct compat_vcpu_runstate_info *)v->runstate_guest.phys; > + > + ASSERT(runstate != NULL); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + smp_wmb(); > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + } > + > + { > + struct compat_vcpu_runstate_info info; > + XLAT_vcpu_runstate_info(&info, &v->runstate); > + memcpy(v->runstate_guest.phys, &info, sizeof(info)); > + } > + else > + memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate)); This "else" does not seem to be paired with an if(). Does this build at all? > --- a/xen/arch/x86/x86_64/domain.c > +++ b/xen/arch/x86/x86_64/domain.c > @@ -12,6 +12,8 @@ > CHECK_vcpu_get_physid; > #undef xen_vcpu_get_physid > > +extern void discard_runstate_area(struct vcpu *v); No, this is not allowed. The declaration must be visible to both consumer and producer, so that when either side is changed things won't build until the other side gets changed too. > @@ -35,8 +37,16 @@ arch_compat_vcpu_op( > !compat_handle_okay(area.addr.h, 1) ) > break; > > + while( xchg(&v->runstate_in_use, 1) == 0); At the very least such loops want a cpu_relax() in their bodies. But this being on a hypercall path - are there theoretical guarantees that a guest can't abuse this to lock up a CPU? Furthermore I don't understand how this is supposed to work in the first place. The first xchg() will store 1, no matter what. Thus on the second iteration you'll find the wanted value unless the other side stored 0. Yet the other side is a simple xchg() too. Hence its first attempt would fail, but its second attempt (which we didn't exit the critical region here yet) would succeed. Also - style. > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -698,6 +698,74 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, > struct domain **d) > return 0; > } > > +static void unmap_runstate_area(struct vcpu *v) > +{ > + mfn_t mfn; > + > + if ( ! v->runstate_guest.phys ) Stray blank after ! . > + return; > + > + mfn = domain_page_map_to_mfn(v->runstate_guest.phys); > + > + unmap_domain_page_global((void *) > + ((unsigned long)v->runstate_guest.phys & > + PAGE_MASK)); > + > + v->runstate_guest.phys = NULL; I think you would better store NULL before unmapping. > @@ -734,7 +802,10 @@ int domain_kill(struct domain *d) > if ( cpupool_move_domain(d, cpupool0) ) > return -ERESTART; > for_each_vcpu ( d, v ) > + { > + discard_runstate_area_locked(v); > unmap_vcpu_info(v); > + } What is the "locked" aspect here about? The guest itself is dead (i.e. fully paused) at this point, isn't it? And it won't ever be unpaused again. > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -163,17 +163,31 @@ struct vcpu > void *sched_priv; /* scheduler-specific data */ > > struct vcpu_runstate_info runstate; > + > + enum { > + RUNSTATE_NONE = 0, > + RUNSTATE_PADDR = 1, > + RUNSTATE_VADDR = 2, > + } runstate_guest_type; > + > + unsigned long runstate_in_use; Why "unsigned long"? Isn't a bool all you need? Also these would now all want to be grouped in a sub-structure named "runstate", rather than having "runstate_" prefixes. > + void* phys; Bad ordering between * and the blanks. There ought to be a blank ahead of the *, and I don't see why you need any after it. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.