Xen project Mailing List

Re: [Xen-devel] [PATCH] xen: add hypercall option to temporarily pin a vcpu

On 26/02/16 11:39, Jan Beulich wrote: >>>> On 25.02.16 at 17:50, <JGross@xxxxxxxx> wrote: >> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu) >> if ( cpumask_empty(&online_affinity) && >> cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) >> { >> - printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v); >> + if ( v->affinity_broken ) >> + { >> + /* The vcpu is temporarily pinned, can't move it. */ >> + vcpu_schedule_unlock_irqrestore(lock, flags, v); >> + ret = -EBUSY; >> + continue; >> + } > > So far the function can only return 0 or -EAGAIN. By using "continue" > here you will make it impossible for the caller to reliably determine > whether possibly both things failed. Despite -EBUSY being a logical > choice here, I think you'd better use -EAGAIN here too. And it needs > to be determined whether continuing the loop in this as well as the > pre-existing cases is actually the right thing to do. EBUSY vs. EAGAIN: by returning EAGAIN I would signal to Xen tools that the hypervisor is currently not able to do the desired operation (especially removing a cpu from a cpupool), but the situation will change automatically via scheduling. EBUSY will stop retries in Xen tools and this is want I want here: I can't be sure the situation will change soon. Regarding continuation of the loop: I think you are right in the EBUSY case: I should break out of the loop. I should not do so in the EAGAIN case as I want to remove as many vcpus from the physical cpu as possible without returning to the Xen tools in between. > >> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu) >> v->affinity_broken = 1; >> } >> >> + printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v); > > Wouldn't it be even better to make this the "else" to the > preceding if(), since in the suspend case this is otherwise going > to be printed for every vCPU not currently running on pCPU0? Yes, I'll change it. > >> @@ -753,14 +767,22 @@ static int vcpu_set_affinity( >> struct vcpu *v, const cpumask_t *affinity, cpumask_t *which) >> { >> spinlock_t *lock; >> + int ret = 0; >> >> lock = vcpu_schedule_lock_irq(v); >> >> - cpumask_copy(which, affinity); >> + if ( v->affinity_broken ) >> + { >> + ret = -EBUSY; >> + } > > Unnecessary braces. Will remove. > >> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d) >> kill_timer(&d->watchdog_timer[i]); >> } >> >> +static long do_pin_temp(int cpu) >> +{ >> + struct vcpu *v = current; >> + spinlock_t *lock; >> + long ret = -EINVAL; >> + >> + lock = vcpu_schedule_lock_irq(v); >> + >> + if ( cpu == -1 ) >> + { >> + if ( v->affinity_broken ) >> + { >> + cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved); >> + v->affinity_broken = 0; >> + set_bit(_VPF_migrating, &v->pause_flags); >> + ret = 0; >> + } >> + } >> + else if ( cpu < nr_cpu_ids && cpu >= 0 ) > > Perhaps easier to simply use "cpu < 0" in the first if()? Okay. > >> + { >> + if ( v->affinity_broken ) >> + { >> + ret = -EBUSY; >> + } >> + else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) ) >> + { > > This is a rather ugly restriction: How would a caller fulfill its job > when this is not the case? He can't. We should document that at least on hardware requiring this functionality it is a bad idea to remove cpu 0 from the cpupool with the hardware domain. > >> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, >> XEN_GUEST_HANDLE_PARAM(void) arg) >> break; >> } >> >> + case SCHEDOP_pin_temp: >> + { >> + struct sched_pin_temp sched_pin_temp; >> + >> + ret = -EFAULT; >> + if ( copy_from_guest(&sched_pin_temp, arg, 1) ) >> + break; >> + >> + ret = xsm_schedop_pin_temp(XSM_PRIV); >> + if ( ret ) >> + break; >> + >> + ret = do_pin_temp(sched_pin_temp.pcpu); >> + >> + break; >> + } > > So having come here I still don't see why this is called "temp": > Nothing enforces this to be a temporary state, and hence the > sub-op name currently is actively misleading. I've chosen this name as the old affinity is saved and can (and should) be recovered later. So it is intended to be temporary. >> --- a/xen/include/public/sched.h >> +++ b/xen/include/public/sched.h >> @@ -118,6 +118,15 @@ >> * With id != 0 and timeout != 0, poke watchdog timer and set new timeout. >> */ >> #define SCHEDOP_watchdog 6 >> + >> +/* >> + * Temporarily pin the current vcpu to one physical cpu or undo that >> pinning. >> + * @arg == pointer to sched_pin_temp_t structure. >> + * >> + * Setting pcpu to -1 will undo a previous temporary pinning. >> + * This call is allowed for domains with domain control privilege only. >> + */ > > Why domain control privilege? I'd actually suggest limiting the > ability to the hardware domain, at once eliminating the need > for the XSM check. Sure, I'd be happy to simplify the patch. > >> +struct sched_pin_temp { >> + int pcpu; > > Fixed width types only please in the public interface. Also this needs > an entry in xen/include/xlat.lst, and a consumer of the resulting > check macro. Aah, okay. Thanks for the review, Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.