
Re: [Xen-devel] [PATCH v6 for Xen 4.7 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler



On Tue, 2016-03-08 at 02:10 -0700, Jan Beulich wrote:
> > > > On 07.03.16 at 18:53, <dario.faggioli@xxxxxxxxxx> wrote:
> > On Mon, 2016-03-07 at 09:40 -0700, Jan Beulich wrote:
> > > 
> > IIRC, I was looking at how XEN_SYSCTL_pcitopoinfo is handled, for
> > reference, and that has some guest_handle_is_null()==>EINVAL sanity
> > checking (in xen/common/sysctl.c), which, when I thought about it,
> > made sense to me.
> > 
> > My reasoning was, sort of:
> >  1. if the handle is NULL, there is no point getting into the
> >     somewhat complicated logic of the while loop,
> >  2. more accurate error reporting: being passed a NULL handle
> >     looked like something we could identify and call invalid,
> >     rather than waiting for the copy to fault.
> I think the XEN_SYSCTL_pcitopoinfo handling was misguided in this
> respect, cloning logic which is not applicable here, namely the logic
> that, for a few other operations, returns the number of needed
> (array) elements in such a case.
> 
> 
Sorry, I'm not sure I'm getting it: are you saying that, for _these_
domctls, we should consider a NULL handle to be the caller's way of
asking for the size of the array?

*If* yes, well, that is "just" the number of vcpus of the guest but,
nevertheless, that looks fine to me, FWIW.
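
Just to be sure we mean the same thing, here's a minimal sketch of how
I'd picture handling that (field names taken from the v5 hunks pasted
at the bottom; purely illustrative, not actual patch code):

    /* Illustrative only: a NULL guest handle asks for the array size. */
    if ( guest_handle_is_null(op->u.v.vcpus) )
    {
        /* Report how many elements the caller's array needs. */
        op->u.v.nr_vcpus = d->max_vcpus;
        rc = 0;
        break;
    }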

> > About the structure of the code, as said above, I do like how
> > XEN_SYSCTL_pcitopoinfo ended up being handled. I think it is a
> > great fit for this specific case and, comparing this and the
> > previous version, I do think this one is (bugs apart) looking
> > better.
> > 
> > I'm sure I said this --long ago-- when discussing v4 (and maybe
> > even previous versions), as well as more recently, when reviewing
> > v5, and that's why Chong (finally! :-D) did it.
> > 
> > So, with the comment in place (and with bugs fixed :-)), are you
> > (Jan) ok with this being done this way?
>
> Well, this _might_ be acceptable for "get" (since the caller
> abandoning the sequence of calls prematurely is no problem),
> but for "set" it looks less suitable, as similar abandoning would
> leave the guest in some inconsistent / unintended state.
>
Are you referring to the fact that, with this interface, the caller can
intentionally abandon the sequence of calls, or to the possibility that
not all vcpus get updated because of some bug (still in the caller)?

Well, if it's intentional, or even if the caller is buggy in the sense
that its code is written in a way that misses updating some vcpus (and
provided the interface and the behaviour are well documented, as you
request), then one gets what one "wants" (and, in the latter case, it
wouldn't be too hard to debug and figure out the issue, I think).

If it's about bugs (still in the caller) like copy_from_guest_offset()
faulting because the array is too small, that can happen when using
continuations too, can't it? And it would still leave the guest in a
similarly inconsistent or unintended state, IMO...

One last point. Of course, since we are talking about bugs, the final
state is not the desired one, but it's not inconsistent in the sense
that the guest can't continue running, or crashes, or anything like
that. It's something like:
 - you want all the 32 vcpus of guest A to have these new parameters;
 - due to a bug, you're (for instance) passing me an array with
   parameters for only 16 vcpus;
 - result: only 16 vcpus will have the new parameters.

>  The
> issue with XEN_SYSCTL_pcitopoinfo was, iirc, the lack of a
> good way of encoding the continuation information, and while
> that would seem applicable here too I'm not sure now whether
> doing it the way it was done was the best choice. 
>
As far as I can remember and see, it was being done by means of an
additional dedicated field in the interface structure (ti->first_dev
in that case). Then, at some point, you said:

http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02623.html
"Considering this is a tools only interface, enforcing a not too high
 limit on num_devs would seem better than this not really clean
 continuation mechanism. The (tool stack) caller(s) can be made
 iterate."

I agreed with that (and I still do :-)), and I also agree that we are
basically in the same situation here.
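
Just to spell out what "the caller iterates" means to me, here's a
hypothetical libxc-side sketch (rtds_vcpu_get_chunk(), CHUNK and the
struct name are made up / assumed from this series; nothing like this
exists in the tree as is):

#include <xenctrl.h>

/*
 * Hypothetical helper wrapping one batched domctl; neither this nor the
 * struct name below is actual libxc API, they're only here to show the
 * shape of the loop.
 */
int rtds_vcpu_get_chunk(xc_interface *xch, uint32_t domid,
                        struct xen_domctl_schedparam_vcpu *vcpus,
                        uint32_t nr);

#define CHUNK 16  /* arbitrary, the tool stack picks the batch size */

static int get_all_vcpus(xc_interface *xch, uint32_t domid,
                         struct xen_domctl_schedparam_vcpu *vcpus,
                         uint32_t nr_vcpus)
{
    uint32_t done = 0;

    while ( done < nr_vcpus )
    {
        uint32_t batch = nr_vcpus - done;
        int rc;

        if ( batch > CHUNK )
            batch = CHUNK;
        /* One domctl per batch: no in-hypervisor continuation needed. */
        rc = rtds_vcpu_get_chunk(xch, domid, &vcpus[done], batch);
        if ( rc )
            return rc;
        done += batch;
    }
    return 0;
}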

Chong tried doing things with continuations for a few rounds, including
v5, which is here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00817.html

and he also used an additional field (vcpu_index).

So, all this being said, my preference stays with the way the code
looks in this version (with all the due commenting added). Of course,
it's your preference that really matters here, me not being the
maintainer of this code. :-)

So, how would you prefer Chong to proceed with this?

> Clearly
> stating (in the public interface header) that certain normally
> input-only fields are volatile would allow the continuation to
> be handled without tool stack assistance afaict.
>
Which (sorry, I'm not getting it again) I guess is something different
from, or more than, what was done in v5 (the relevant hunks of which
I'm pasting at the bottom of this email, for convenience)?
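
In case it helps, here is the kind of comment I'd imagine ending up in
the public header (xen/include/public/domctl.h) for that; the wording
is entirely mine and the field names are the v5 ones, so take it as a
guess at what you are suggesting rather than as a proposal:

    /*
     * vcpu_index: on input, the index at which to start processing the
     *             vcpus array; the hypervisor updates it while handling
     *             a continuation, so callers must treat the field as
     *             volatile and not expect its value to be preserved.
     */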

> > BTW, Chong, I'm not sure this has to do with what Jan is saying,
> > but, looking again at XEN_SYSCTL_pcitopoinfo, it looks to me like
> > you're missing copying nr_vcpus back up to the guest (which is
> > actually what lets libxc know whether all vcpus have been processed
> > or not).
> Indeed that is why the conditional makes sense there, but not here.
> And the copying back is already being taken care of by the caller of
> sched_adjust().
> 
Indeed. So your original point was: we're always copying back, so it's
fine to always update the field, the only exception being when an error
has occurred. I do get it now.
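
(I.e., the tail of do_domctl(), which, simplified and from memory, does
roughly this after the big switch:)

    domctl_lock_release();
    /* If the handler asked for it, copy the (possibly updated) op back. */
    if ( copyback && __copy_to_guest(u_domctl, op, 1) )
        ret = -EFAULT;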

Regards,
Dario

---
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 46b967e..b294221 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -847,9 +847,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
 
     case XEN_DOMCTL_scheduler_op:
+    {
         ret = sched_adjust(d, &op->u.scheduler_op);
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(
+                __HYPERVISOR_domctl, "h", u_domctl);
         copyback = 1;
         break;
+    }
 
     case XEN_DOMCTL_getdomaininfo:
     {

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 3f1d047..34ae48d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1163,6 +1168,94 @@ rt_dom_cntl(
         }
         spin_unlock_irqrestore(&prv->lock, flags);
         break;
+    case XEN_DOMCTL_SCHEDOP_getvcpuinfo: 
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+
+            local_sched.s.rtds.budget = svc->budget / MICROSECS(1);
+            local_sched.s.rtds.period = svc->period / MICROSECS(1);
+
+            if ( __copy_to_guest_offset(op->u.v.vcpus, index,
+                    &local_sched, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                /* hypercall (after preemption) will continue at vcpu_index */
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
+    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+            period = MICROSECS(local_sched.s.rtds.period);
+            budget = MICROSECS(local_sched.s.rtds.budget);
+            if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
+                          budget > period )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+
+            /*
+             * We accept period/budget less than 100 us, but will warn users about
+             * the large scheduling overhead due to it
+             */
+            if ( period < MICROSECS(100) || budget < MICROSECS(100) )
+                printk("Warning: period/budget less than 100 micro-secs "
+                       "results in large scheduling overhead.\n");
+
+            svc->period = period;
+            svc->budget = budget;
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
     }
 
     return rc;
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
