[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 for Xen 4.7 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler



On Thu, 2016-02-04 at 16:50 -0600, Chong Li wrote:
> Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
> to independently get and set the scheduling parameters of each
> vCPU of a domain
> 
> Signed-off-by: Chong Li <chong.li@xxxxxxxxx>
> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> Signed-off-by: Sisu Xi <xisisu@xxxxxxxxx>
> 
> ---
> Changes on PATCH v4:
> 1) Add uint32_t vcpu_index to struct xen_domctl_scheduler_op.
> When processing XEN_DOMCTL_SCHEDOP_get/putvcpuinfo, we call
> hypercall_preemption_check in case the current hypercall lasts
> too long. If we decide to preempt the current hypercall, we record
> the index of the most-recent finished vcpu into the vcpu_index of
> struct xen_domctl_scheduler_op. So when we resume the hypercall after
> preemption, we start processing from the posion specified by
> vcpu_index,
> and don't need to repeat the work that has already been done in the
> hypercall before the preemption.
> (This design is based on the do_grant_table_op() in grant_table.c)
> 
Ok.

This now looks like it could be fine now. However, I remember asking
you to look at howÂXEN_SYSCTL_pcitopoinfo is handled, because I thought
that way of doing things is a better fit for this usecase.

So, I've got to ask to check that and give it a try implementing things
that way.

In fact, apart from being a better suited arrangement of the code, it
should, if I'm not mistaken, allow you to avoid having to introduce the
vcpu_index field in the domctl struct.

> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 46b967e..b294221 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -847,9 +847,14 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> ÂÂÂÂÂ}
> Â
> ÂÂÂÂÂcase XEN_DOMCTL_scheduler_op:
> +ÂÂÂÂ{
> ÂÂÂÂÂÂÂÂÂret = sched_adjust(d, &op->u.scheduler_op);
> +ÂÂÂÂÂÂÂÂif ( ret == -ERESTART )
> +ÂÂÂÂÂÂÂÂÂÂÂÂret = hypercall_create_continuation(
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ__HYPERVISOR_domctl, "h", u_domctl);
> ÂÂÂÂÂÂÂÂÂcopyback = 1;
> ÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂ}
> 
As said, have a look at theÂXEN_SYSCTL_pcitopoinfo case inÂdo_sysctl(),
and atÂxc_pcitopoinfo().

That basically works by re-issueing the hypercall in libxc, rather than
on create_continuation.

Elaborating a bit more, I think that is a better fit in this case
because:
Â- like in that case, you know how many vcpus you want to act upon inÂ
 Âlibxc already, and you can take advantage of that in there;
Â- it makes the xc_ call do something useful;
Â- it avoids (again, if I'm not wrong) having to clobber the Xen
 Âinterface with a useless and weirdly-looking vcpu_index field;
Â- it makes it very easy to process a bunch of vcpus, and then check
 Âfor preemption, instead of checking for that at each iteration
 Â(of course, this can be implemented with the 'standard' contination
 Âapproach as well, but again, I like how it's done there).

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 3f1d047..34ae48d 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -86,8 +86,21 @@
> Â#define RTDS_DEFAULT_PERIODÂÂÂÂÂ(MICROSECS(10000))
> Â#define RTDS_DEFAULT_BUDGETÂÂÂÂÂ(MICROSECS(4000))
> Â
> +/*
> + * Max period: max delta of time type
> + * Min period: 100 us, considering the scheduling overhead
> + */
>
Comments like these are not super useful. You need to explain why you
are using the maximum delta ("because period is added to the time a
vcpu activates, so this must not overflow etc..."), and express a
little bit better what you mean with "considering the scheduling
overhead".

Moreover...

> +#define RTDS_MAX_PERIODÂÂÂÂÂ(STIME_DELTA_MAX)
> +#define RTDS_MIN_PERIODÂÂÂÂÂ(MICROSECS(10))
> +
Comment says 100us, but then it's 10?

> +/*
> + * Min budget: 100 us
> + */
> +#define RTDS_MIN_BUDGETÂÂÂÂÂ(MICROSECS(10))
> +
As above, why? (scheduling overhead again, I would say).

> Â#define UPDATE_LIMIT_SHIFTÂÂÂÂÂÂ10
> Â#define MAX_SCHEDULEÂÂÂÂÂÂÂÂÂÂÂÂ(MILLISECS(1))
> +
>
Spurious blank line addition.

> Â/*
> Â * Flags
> Â */

> @@ -1163,6 +1168,94 @@ rt_dom_cntl(
> ÂÂÂÂÂÂÂÂÂ}
> ÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> ÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂcase XEN_DOMCTL_SCHEDOP_getvcpuinfo:Â
> +ÂÂÂÂÂÂÂÂfor ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
> index++ )
> +ÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂspin_lock_irqsave(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( copy_from_guest_offset(&local_sched,
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂop->u.v.vcpus, index, 1) )
>
vcpus is an handle, and it needs to be validated with
guest_handle_is_null().

> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EFAULT;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( local_sched.vcpuid >= d->max_vcpus ||
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂd->vcpu[local_sched.vcpuid] == NULL )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EINVAL;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂÂÂÂÂsvc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> +
> +ÂÂÂÂÂÂÂÂÂÂÂÂlocal_sched.s.rtds.budget = svc->budget / MICROSECS(1);
> +ÂÂÂÂÂÂÂÂÂÂÂÂlocal_sched.s.rtds.period = svc->period / MICROSECS(1);
> +
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( __copy_to_guest_offset(op->u.v.vcpus, index,
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ&local_sched, 1) )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EFAULT;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
>
In any case, i.e., no matter whether the code keeps using this
approach, or becomes more similar toÂXEN_SYSCTL_pcitopoinfo, can these
failing paths be factored instead than repeated?

> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( hypercall_preempt_check() )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂop->u.v.vcpu_index = index + 1;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/* hypercall (after preemption) will continue at
> vcpu_index */
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -ERESTART;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂcase XEN_DOMCTL_SCHEDOP_putvcpuinfo:
>
All what I've said about adopting the XEN_SYSCTL_pcitopoinfo, handle
validation and (attempting to) factoring the error paths applies to
both get and put.

> +ÂÂÂÂÂÂÂÂfor ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
> index++ )
> +ÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂspin_lock_irqsave(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( copy_from_guest_offset(&local_sched,
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂop->u.v.vcpus, index, 1) )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EFAULT;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( local_sched.vcpuid >= d->max_vcpus ||
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂd->vcpu[local_sched.vcpuid] == NULL )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EINVAL;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂÂÂÂÂsvc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> +ÂÂÂÂÂÂÂÂÂÂÂÂperiod = MICROSECS(local_sched.s.rtds.period);
> +ÂÂÂÂÂÂÂÂÂÂÂÂbudget = MICROSECS(local_sched.s.rtds.budget);
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( period > RTDS_MAX_PERIOD || budget <
> RTDS_MIN_BUDGET ||
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbudget > period )
>
Isn't checking against RTDS_MIN_PERIOD missing?

> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -EINVAL;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +
> +ÂÂÂÂÂÂÂÂÂÂÂÂ/*Â
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ* We accept period/budget less than 100 us, but will
> warn users about
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ* the large scheduling overhead due to it
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ*/
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( period < MICROSECS(100) || budget < MICROSECS(100)
> )
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂprintk("Warning: period/budget less than 100 micro-
> secs "
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ"results in large scheduling overhead.\n");
> +
"WARNING: period or budget set to less than 100us.\n"
" This may result in high scheduling overhead"

Or something like this. But try to make the two lines as independent as
possible. In general, we try not to break these messages, so that
grepping the source code for any substring of the will work. In this
case, we indeed need to break this, but we still keep grep-ability in
mind.

See, for example, sched_credit2:2147.

> +ÂÂÂÂÂÂÂÂÂÂÂÂsvc->period = period;
> +ÂÂÂÂÂÂÂÂÂÂÂÂsvc->budget = budget;
> +ÂÂÂÂÂÂÂÂÂÂÂÂspin_unlock_irqrestore(&prv->lock, flags);
> +ÂÂÂÂÂÂÂÂÂÂÂÂif ( hypercall_preempt_check() )
> +ÂÂÂÂÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂop->u.v.vcpu_index = index + 1;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂrc = -ERESTART;
> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂ}
> +ÂÂÂÂÂÂÂÂbreak;
> ÂÂÂÂÂ}
> Â
> ÂÂÂÂÂreturn rc;
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index c195129..f4a4032 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1148,10 +1148,19 @@ long sched_adjust(struct domain *d, struct
> xen_domctl_scheduler_op *op)
> ÂÂÂÂÂif ( ret )
> ÂÂÂÂÂÂÂÂÂreturn ret;
> Â
> -ÂÂÂÂif ( (op->sched_id != DOM2OP(d)->sched_id) ||
> -ÂÂÂÂÂÂÂÂÂ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> -ÂÂÂÂÂÂÂÂÂÂ(op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> +ÂÂÂÂif ( op->sched_id != DOM2OP(d)->sched_id )
> ÂÂÂÂÂÂÂÂÂreturn -EINVAL;
> +ÂÂÂÂelse
> +ÂÂÂÂÂÂÂÂswitch ( op->cmd )
> +ÂÂÂÂÂÂÂÂ{
> +ÂÂÂÂÂÂÂÂcase XEN_DOMCTL_SCHEDOP_putinfo:
> +ÂÂÂÂÂÂÂÂcase XEN_DOMCTL_SCHEDOP_getinfo:
> +ÂÂÂÂÂÂÂÂcase XEN_DOMCTL_SCHEDOP_putvcpuinfo:
> +ÂÂÂÂÂÂÂÂcase XEN_DOMCTL_SCHEDOP_getvcpuinfo:
> +ÂÂÂÂÂÂÂÂÂÂÂÂbreak;
> +ÂÂÂÂÂÂÂÂdefault:
> +ÂÂÂÂÂÂÂÂÂÂÂÂreturn -EINVAL;
> +ÂÂÂÂÂÂÂÂ}
> Â
Maybe I'm misremembering (in which case, sorry), was using a switch
like this instead of an if suggested during one of the previous rounds?

> diff --git a/xen/include/public/domctl.h
> b/xen/include/public/domctl.h
> index 7a56b3f..6f429ec 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -338,24 +338,57 @@
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> Â#define XEN_SCHEDULER_ARINC653 7
> Â#define XEN_SCHEDULER_RTDSÂÂÂÂÂ8
> Â
> -/* Set or get info? */
> +typedef struct xen_domctl_sched_credit {
> +ÂÂÂÂuint16_t weight;
> +ÂÂÂÂuint16_t cap;
> +} xen_domctl_sched_credit_t;
> +
> +typedef struct xen_domctl_sched_credit2 {
> +ÂÂÂÂuint16_t weight;
> +} xen_domctl_sched_credit2_t;
> +
> +typedef struct xen_domctl_sched_rtds {
> +ÂÂÂÂuint32_t period;
> +ÂÂÂÂuint32_t budget;
> +} xen_domctl_sched_rtds_t;
> +
> +typedef struct xen_domctl_schedparam_vcpu {
> +ÂÂÂÂunion {
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_credit_t credit;
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_credit2_t credit2;
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_rtds_t rtds;
> +ÂÂÂÂ} s;
> +ÂÂÂÂuint16_t vcpuid;
> +ÂÂÂÂuint16_t padding[3];
> +} xen_domctl_schedparam_vcpu_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t);
> +
> +/* Set or get info?
>
Comment style (opening '/*').

> + * For schedulers supporting per-vcpu settings (e.g., RTDS):
> + * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
>
kill the "using", and maybe indent the three lines by 1 and/or put a
marker (like '-' or '+') to make this looks like a bulleted list.

> + * using XEN_DOMCTL_SCHEDOP_getinfo gets default params;
> + * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo sets (gets) params of
> vcpus;
>
I'd leave an empty line here (still within the comment).

> + * For schedulers not supporting per-vcpu settings:
> + * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
> + * using XEN_DOMCTL_SCHEDOP_getinfo gets domain-wise params;
>
The wording of these two items above should be the same.

> + * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo returns error code;
>
Just "returns error".

> + */
> Â#define XEN_DOMCTL_SCHEDOP_putinfo 0
> Â#define XEN_DOMCTL_SCHEDOP_getinfo 1
> +#define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2
> +#define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3
> Âstruct xen_domctl_scheduler_op {
> ÂÂÂÂÂuint32_t sched_id;ÂÂ/* XEN_SCHEDULER_* */
> ÂÂÂÂÂuint32_t cmd;ÂÂÂÂÂÂÂ/* XEN_DOMCTL_SCHEDOP_* */
> ÂÂÂÂÂunion {
> -ÂÂÂÂÂÂÂÂstruct xen_domctl_sched_credit {
> -ÂÂÂÂÂÂÂÂÂÂÂÂuint16_t weight;
> -ÂÂÂÂÂÂÂÂÂÂÂÂuint16_t cap;
> -ÂÂÂÂÂÂÂÂ} credit;
> -ÂÂÂÂÂÂÂÂstruct xen_domctl_sched_credit2 {
> -ÂÂÂÂÂÂÂÂÂÂÂÂuint16_t weight;
> -ÂÂÂÂÂÂÂÂ} credit2;
> -ÂÂÂÂÂÂÂÂstruct xen_domctl_sched_rtds {
> -ÂÂÂÂÂÂÂÂÂÂÂÂuint32_t period;
> -ÂÂÂÂÂÂÂÂÂÂÂÂuint32_t budget;
> -ÂÂÂÂÂÂÂÂ} rtds;
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_credit_t credit;
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_credit2_t credit2;
> +ÂÂÂÂÂÂÂÂxen_domctl_sched_rtds_t rtds;
> +ÂÂÂÂÂÂÂÂstruct {
> +ÂÂÂÂÂÂÂÂÂÂÂÂXEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus;
> +ÂÂÂÂÂÂÂÂÂÂÂÂuint32_t nr_vcpus;
> +ÂÂÂÂÂÂÂÂÂÂÂÂuint32_t vcpu_index;
> +ÂÂÂÂÂÂÂÂ} v;
> ÂÂÂÂÂ} u;
> Â};
> Âtypedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.