
Re: [Xen-devel] [PATCH v1 1/4] xen: add real time scheduler rt



Hi Dario,

I have modified the code based on your comments in the previous email.
Since there are a lot of points to address, if I don't reply to a
point it means I agree with your comment and have changed it. :-)

>
>> +/*
>> + * Useful inline functions
>> + */
>> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
>> +{
>> +    return _ops->sched_data;
>> +}
>> +
>> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
>> +{
>> +    return _vcpu->sched_priv;
>> +}
>> +
>> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
>> +{
>> +    return _dom->sched_priv;
>> +}
>> +
>> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
>> +{
>> +    return &RT_PRIV(_ops)->runq;
>> +}
>> +
> I see what's happening, and I remember the suggestion of using static
> inline-s, with which I agree. At that point, however, I'd have the
> function names lower case-ed.
>
> It's probably not a big deal, and I don't think we have any official
> saying about this, but it looks more consistent to me.

Because the other schedulers use upper-case names (they define these
helpers as macros rather than static inlines), I think I will keep the
names upper-case to stay consistent with them. I could send another
patch set that converts the other schedulers' macro definitions of
these helpers to static inlines and then lower-cases the names
everywhere. What do you think?
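
For the record, that follow-up rename would be mechanical; a sketch of
what the lower-cased helpers could look like (the function names here
are just my suggestion, nothing that exists in the tree yet):

    static inline struct rt_private *rt_priv(const struct scheduler *ops)
    {
        return ops->sched_data;
    }

    static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
    {
        return vcpu->sched_priv;
    }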

>> +/*
>> + * Debug related code, dump vcpu/cpu information
>> + */
>> +static void
>> +rt_dump_vcpu(const struct rt_vcpu *svc)
>> +{
>> +    char cpustr[1024];
>> +
>> +    ASSERT(svc != NULL);
>> +    /* flag vcpu */
>> +    if( svc->sdom == NULL )
>> +        return;
>> +
>> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
>> +    printk("[%5d.%-2u] cpu %u, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
>>
> long line.
>
> Also, we have PRI_stime (look in xen/include/xen/time.h).
>
>> +            svc->vcpu->domain->domain_id,
>> +            svc->vcpu->vcpu_id,
>> +            svc->vcpu->processor,
>> +            svc->period,
>> +            svc->budget,
>> +            svc->cur_budget,
>> +            svc->cur_deadline,
>> +            svc->last_start,
>> +            __vcpu_on_runq(svc),
>> +            vcpu_runnable(svc->vcpu),
>> +            cpustr);
>> +    memset(cpustr, 0, sizeof(cpustr));
>> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
> here too.
>
>> +    printk("cpupool=%s\n", cpustr);
>> +
>> +    /* TRACE */
>> +    {
>> +        struct {
>> +            unsigned dom:16,vcpu:16;
>> +            unsigned processor;
>> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
>> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
>> +        } d;
>> +        d.dom = svc->vcpu->domain->domain_id;
>> +        d.vcpu = svc->vcpu->vcpu_id;
>> +        d.processor = svc->vcpu->processor;
>> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
>> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
>> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
>> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
>> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
>> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
>> +        trace_var(TRC_RT_VCPU_DUMP, 1,
>> +                  sizeof(d),
>> +                  (unsigned char *)&d);
>> +    }
> Not a too big deal, but is it really useful to trace vcpu dumps? How?

Actually, this function is called from several other functions in this
file. Whenever we want to look into the details of a vcpu (for
example, when we insert it into the RunQ), we call this function and
it traces the vcpu's information. I extracted it into a helper to
avoid writing the same trace code again and again in functions such as
rt_alloc_vdata, rt_free_vdata and burn_budgets.

In addition, we use xl sched-rt to dump the RunQ information and check
that the RunQ is properly sorted, for debugging purposes.

When it is upstreamed, I will change this function to a static inline,
remove the printk and keep only the trace.
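
A minimal sketch of the trace-only variant I have in mind, reusing the
TRC_RT_VCPU_DUMP record layout quoted above (the helper name
rt_trace_vcpu is just a placeholder, not final):

    static inline void
    rt_trace_vcpu(const struct rt_vcpu *svc)
    {
        struct {
            unsigned dom:16, vcpu:16;
            unsigned processor;
            unsigned cur_budget_lo, cur_budget_hi;
            unsigned cur_deadline_lo, cur_deadline_hi;
            unsigned is_vcpu_on_runq:16, is_vcpu_runnable:16;
        } d;

        /* Skip the flag/idle vcpu, which has no domain behind it. */
        if ( svc->sdom == NULL )
            return;

        d.dom = svc->vcpu->domain->domain_id;
        d.vcpu = svc->vcpu->vcpu_id;
        d.processor = svc->vcpu->processor;
        d.cur_budget_lo = (unsigned)svc->cur_budget;
        d.cur_budget_hi = (unsigned)(svc->cur_budget >> 32);
        d.cur_deadline_lo = (unsigned)svc->cur_deadline;
        d.cur_deadline_hi = (unsigned)(svc->cur_deadline >> 32);
        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);

        trace_var(TRC_RT_VCPU_DUMP, 1, sizeof(d), (unsigned char *)&d);
    }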

>> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
>>
> Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
> really don't like the name. 'depleted_vcpus'? Or something else (sorry,
> not very creative in this very moment :-))... but flag_vcpu, I found it
> confusing.

It's declared in struct rt_private and initialized in rt_init().

The reason why I need this vcpu is: the RunQ has two parts. The first
part holds the vcpus that still have budget and is sorted by vcpu
priority; the second part holds the vcpus whose budget is depleted and
is unsorted (we never schedule a depleted vcpu, so there is no need to
sort them). The flag_vcpu is a dummy vcpu that separates the two
parts, so that we don't have to scan the whole RunQ when we insert a
depleted vcpu. (If you remember, you suggested splitting the RunQ into
two queues: one for vcpus with budget and one for vcpus without
budget. This is my implementation of that suggestion; it virtually
splits the RunQ. If I used a separate depleted queue, I would need
extra functions such as depleted_queue_insert, which adds much more
code than the current implementation. :-P)
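
To make the "virtual split" concrete, the insertion logic looks
roughly like this (a sketch based on the helpers quoted in this patch;
the exact function in the next version may differ in detail):

    static void
    __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
    {
        struct rt_private *prv = RT_PRIV(ops);
        struct list_head *iter;

        if ( svc->cur_budget == 0 )
        {
            /* Depleted vcpus go right after the flag_vcpu, unsorted. */
            list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
            return;
        }

        /* Vcpus with budget stay EDF-sorted in front of the flag_vcpu. */
        list_for_each ( iter, RUNQ(ops) )
        {
            struct rt_vcpu *iter_svc = __runq_elem(iter);

            if ( iter_svc->sdom == NULL ||    /* reached the divider */
                 svc->cur_deadline <= iter_svc->cur_deadline )
                break;
        }
        list_add_tail(&svc->runq_elem, iter);
    }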

>
>> +/*
>> + * point per_cpu spinlock to the global system lock; all cpu have same global system lock
> long line.
>
>> + */
>> +static void *
>> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> +{
>> +    struct rt_private *prv = RT_PRIV(ops);
>> +
>> +    cpumask_set_cpu(cpu, &prv->cpus);
>> +
>> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> +
>> +    printtime();
>> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
>> +    /* same as credit2, not a bogus pointer */
>> +    return (void *)1;
>>
> Can you put in the comment the reason why it is ok to do this, without
> referencing credit2?

This is explained in the comment in the next patch. In schedule.c, the
return value is only used to check whether this function allocated the
pdata correctly, by testing whether it is non-NULL. (Well, this is
inconsistent with other code, which uses 0 to indicate success.) The
function has to return a void *, so we cast the 1 to void *. That's
the reason. :-P
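
Concretely, I am thinking of something like this (same code as in the
patch, with the debug printk dropped for brevity; only the comment
text is new and its wording is of course open to change):

    static void *
    rt_alloc_pdata(const struct scheduler *ops, int cpu)
    {
        struct rt_private *prv = RT_PRIV(ops);

        cpumask_set_cpu(cpu, &prv->cpus);

        /* All pcpus share one global lock in this scheduler. */
        per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;

        /*
         * The caller in schedule.c only checks the return value against
         * NULL to detect allocation failure; the pointer is never
         * dereferenced. Since this scheduler has no per-pcpu data to
         * allocate, returning any non-NULL token is fine.
         */
        return (void *)1;
    }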

>> +static void *
>> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>> +{
>> +    struct rt_vcpu *svc;
>> +    s_time_t now = NOW();
>> +    long count;
>> +
>> +    /* Allocate per-VCPU info */
>> +    svc = xzalloc(struct rt_vcpu);
>> +    if ( svc == NULL )
>> +    {
>> +        printk("%s, xzalloc failed\n", __func__);
>> +        return NULL;
>> +    }
>> +
>> +    INIT_LIST_HEAD(&svc->runq_elem);
>> +    INIT_LIST_HEAD(&svc->sdom_elem);
>> +    svc->flags = 0U;
>> +    svc->sdom = dd;
>> +    svc->vcpu = vc;
>> +    svc->last_start = 0;            /* init last_start is 0 */
>> +
> As it is, the comment is not very useful... we can see you're
> initializing it to zero. If there's something about the why you do it
> that you think it's worth mentioning, go ahead, otherwise, kill it.
>
>> +    svc->period = RT_DEFAULT_PERIOD;
>> +    if ( !is_idle_vcpu(vc) )
>> +        svc->budget = RT_DEFAULT_BUDGET;
>> +
>> +    count = (now/MICROSECS(svc->period)) + 1;
>> +    /* sync all VCPU's start time to 0 */
>> +    svc->cur_deadline += count * MICROSECS(svc->period);
>> +
> why "+="? What value do you expect cur_deadline to hold at this time?

It should be "=". This does not cause a bug, because it is in
rt_alloc_vdata() and svc->cur_deadline is still 0 there, but "+=" is
not logically correct. I have changed it. Thank you!

>
> Also, unless I'm missing something, or doing the math wrong, what you
> want to do here is place the deadline one period ahead of NOW().

Ah, no. I think you are thinking of the CBS server mechanism. (Now I
start to use some notation from the real-time literature.) For a
deferrable server, if the release offset of an implicit-deadline
deferrable vcpu is O_i, its deadlines are O_i + k * p_i, where k is a
natural number. So we want to set the deadline to the end of the
period in which NOW() falls.
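
A quick worked example with made-up numbers, matching the calculation
in the quoted code:

    /*
     * Suppose period p_i = 10ms, release offset O_i = 0, and the vcpu
     * is created at NOW() = 25ms:
     *
     *   count        = (25ms / 10ms) + 1 = 3
     *   cur_deadline = count * 10ms      = 30ms
     *
     * i.e. the deadline is the end of the period [20ms, 30ms) that
     * contains NOW(), not NOW() + p_i = 35ms.
     */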

>
> In my head, this is just:
>
>     svc->cur_deadline = NOW() + svc->period;
>
> What is it that fiddling with count is buying you that I don't see?

So that calculation would not be correct for a deferrable server. :-P

>
>> +    /* Debug only: dump new vcpu's info */
>> +    printtime();
>> +    rt_dump_vcpu(svc);
>> +
> As said before, this has to go, quite possibly by means of replacing it
> by some tracing.

When there are no more comments on this patch set, I will make
rt_dump_vcpu a static inline function that only contains the tracing.
Is that OK? For now I just want to keep some dumps around to help me
with quick debugging. :-)

>> +/*
>> + * TODO: same as rt_vcpu_insert()
>> + */
>> +static void
>> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>> +{
>> +    struct rt_vcpu * const svc = RT_VCPU(vc);
>> +    struct rt_dom * const sdom = svc->sdom;
>> +
>> +    printtime();
>> +    rt_dump_vcpu(svc);
>> +
>> +    BUG_ON( sdom == NULL );
>> +    BUG_ON( __vcpu_on_runq(svc) );
>> +
>> +    if ( !is_idle_vcpu(vc) )
>> +        list_del_init(&svc->sdom_elem);
>> +}
>> +
> I remember commenting about these two functions already, during RFC
> phase. It does not look like they've changed much, in response to that
> commenting.
>
> Can you please re-look at
> http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html , and
> either reply/explain, or cope with that? In summary, what I was asking
> was whether adding/removing the vcpu to the list of vcpus of a domain
> could be done somewhere or somehow else, as doing it like this means,
> for some code-path, removing and then re-adding it right away.
>
> Looking more carefully, I do see that credit2 does the same thing but,
> for one, that doesn't make it perfect (I still don't like it :-P), and
> for two, credit2 does a bunch of other stuff there, that you don't.
>
> Looking even more carefully, this seems to be related to the TODO you
> put (which may be a new addition wrt RFCv2, I can't remember). So, yes,
> I seriously think you should take care of the fact that, when this
> function is called for moving a domain from a cpupool to another, you
> need to move the vcpus from the old cpupool's runqueue to the new one's
> runqueue.
>
> Have you tested doing such operation (moving domains between cpupools)?
> Because, with the code looking like this, it seems a bit unlikely...

Hmm, I actually did test this operation. I migrated a domU from Pool-0
to a newly created cpupool ("test") that uses the credit scheduler.
That's also the scenario I showed in the cover letter of this patch
set. :-) But I didn't try to migrate the domU back to Pool-0 and check
whether its vcpus were inserted back into the RunQ.

Now I have added the code that removes the vcpu from the RunQ, and
tested that it is removed from the RunQ when I move the domU to the
new cpupool.

As for removing the vcpu from the domain's vcpu list: I'm not touching
the vcpu list of struct domain{}, which is used by other components in
Xen. We have a scheduler-private structure rt_vcpu, whose runq_elem
field links the vcpu onto the scheduler's RunQ, and a
scheduler-private structure rt_dom (defined in sched_rt.c) that holds
the scheduler-specific data for the domain, including a list of the
domain's scheduler-specific vcpus. When we move a domU from one
cpupool to another (say, one using the rt scheduler), the domU's
scheduler-specific vcpus have to move to the destination cpupool's
RunQ. So we remove them from the source cpupool's RunQ in
rt_vcpu_remove() and insert them into the destination cpupool's RunQ
in rt_vcpu_insert().

Does this make sense to you?
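
Roughly, the updated remove hook in my tree now looks like this (a
sketch only; __runq_remove() is the assumed list-removal counterpart
of __runq_insert() in this patch):

    static void
    rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu * const svc = RT_VCPU(vc);

        BUG_ON( svc->sdom == NULL );

        /* New: take the vcpu off the source cpupool's RunQ, if queued. */
        if ( __vcpu_on_runq(svc) )
            __runq_remove(svc);

        if ( !is_idle_vcpu(vc) )
            list_del_init(&svc->sdom_elem);
    }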

>
> Finally, the insert_vcpu hook is called from sched_init_vcpu() too,
> still in schedule.c. And in fact, all the other scheduler use this
> function to put a vcpu on its runqueue for the first time. Once you do
> this, you a fine behaviour, cpupool-wise, almost for free. You
> apparently don't. Why is that? If you'd go for a similar approach, you
> will get the same benefit. Where is it that you insert a vcpu in the
> RunQ for the first time? Again, why there and not here?

I have added a runq_insert() call in this function, so the vcpu is now
added to the RunQ here.
In the old patch, the vcpu was only inserted in the wake-up path. :-(
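
Concretely, the insert hook now does roughly the following (again a
sketch, with __runq_insert() being the queue-insert helper from this
patch):

    static void
    rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu *svc = RT_VCPU(vc);

        /* New: queue the vcpu here, not only on its first wake-up. */
        if ( !is_idle_vcpu(vc) && vcpu_runnable(vc) && !__vcpu_on_runq(svc) )
            __runq_insert(ops, svc);

        /* ... plus the existing per-domain bookkeeping. */
    }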

>> +     * update deadline info: When deadline is in the past,
>> +     * it need to be updated to the deadline of the current period,
>> +     * and replenish the budget
>> +     */
>> +    delta = now - svc->cur_deadline;
>> +    if ( delta >= 0 )
>> +    {
>> +        count = ( delta/MICROSECS(svc->period) ) + 1;
>> +        svc->cur_deadline += count * MICROSECS(svc->period);
>> +        svc->cur_budget = svc->budget * 1000;
>> +
> Ditto, about time units and conversions.
>
> I also am sure I said before that I prefer an approach like:
>
>     while ( svc->cur_deadline < now )
>         svc->cur_deadline += svc->period;

I explained this in an earlier comment. :-) NOW() can be several
periods past the current deadline, and this is a deferrable server. :-)
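
For reference, this is the shape of the catch-up I am using. It is
equivalent to your while loop but O(1) when many periods have been
missed (a sketch, under the assumption that period, budget and
cur_deadline are already kept in the same time unit, which also
addresses your point about conversions):

    /*
     * Catch the deadline up to the end of the period containing 'now'
     * and replenish the budget (deferrable server). Equivalent to
     *
     *     while ( svc->cur_deadline <= now )
     *         svc->cur_deadline += svc->period;
     *
     * but constant-time regardless of how many periods have elapsed.
     */
    if ( now >= svc->cur_deadline )
    {
        s_time_t missed = now - svc->cur_deadline;

        svc->cur_deadline += ( missed / svc->period + 1 ) * svc->period;
        svc->cur_budget = svc->budget;
    }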

>> +    delta = now - svc->last_start;
>> +    /*
>> +     * delta < 0 only happens in nested virtualization;
>> +     * TODO: how should we handle delta < 0 in a better way? */
>>
> Ah, nice to hear you tested this in nested virt? Whay config is that,
> Xen on top of VMWare (I'm assuming this from what I've seen on the
> rt-xen mailing list).

I run Xen in VirtualBox on a MacBook Air; VirtualBox needs Intel VT-x
enabled in the VM's configuration. I actually developed the code in
VirtualBox, which reboots much faster. After it worked well in
VirtualBox, I tested it on the bare machine.

>
>> +    if ( delta < 0 )
>> +    {
>> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
>> +                __func__, delta);
>> +        rt_dump_vcpu(svc);
>> +        svc->last_start = now;  /* update last_start */
>> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
>> +        return;
>> +    }
>> +
> Bha, this just should not happen, and if it does, it's either a bug or
> an hardware problem, isn't it? I don't think you should do much more
> than this. I'd remove the rt_dump_vcpu() but, in this case, I'm fine
> with the printk(), as it's something that really should not happen, and
> we want to inform sysadmin that something has gone very bad. :-)

When I run Xen in VirtualBox and configure 4 (virtual) cores for the
VM, those 4 cores are backed by four threads on my host machine (i.e.,
the MacBook Air). My guess is that the virtual cores' clocks are not
well (or timely) synchronized, so when a vcpu migrates to another
virtual core, NOW() on that core can be behind the vcpu's last_start
time.

This never happens when I run Xen on the bare machine; it only happens
when I run Xen inside VirtualBox on the MacBook Air.


>> +    if ( svc->cur_budget == 0 )
>> +        return;
>> +
>> +    svc->cur_budget -= delta;
>> +    if ( svc->cur_budget < 0 )
>> +        svc->cur_budget = 0;
>> +
> I see you're not dealing with overruns (where with dealing I mean try to
> compensate). That's fine for now, let's leave this as a future
> development.
>

Sure.

>> +/*
>> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
>> + * lock is grabbed before calling this function
>> + */
>> +static struct rt_vcpu *
>> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
>> +{
>> +    struct list_head *runq = RUNQ(ops);
>> +    struct list_head *iter;
>> +    struct rt_vcpu *svc = NULL;
>> +    struct rt_vcpu *iter_svc = NULL;
>> +    cpumask_t cpu_common;
>> +    cpumask_t *online;
>> +    struct rt_private * prv = RT_PRIV(ops);
>> +
>> +    list_for_each(iter, runq)
>> +    {
>> +        iter_svc = __runq_elem(iter);
>> +
>> +        /* flag vcpu */
>> +        if(iter_svc->sdom == NULL)
>> +            break;
>> +
>> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
>> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
>> +        cpumask_and(&cpu_common, online, &prv->cpus);
>> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
>> +        cpumask_and(&cpu_common, &mask, &cpu_common);
>> +        if ( cpumask_empty(&cpu_common) )
>> +            continue;
>> +
> I remember asking this in
> http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html:
>
> "What's in priv->cpus, BTW? How is it different form the cpupool online
> mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
>
> while I don't remember having received an answer, and I see it's still.
> here in the code. Am I missing something? If not, can you explain?

prv->cpus is the set of online cpus for this scheduler. It should have
the same value as cpupool_scheduler_cpumask().

Should I remove it? (I plan to release the next version this weekend;
I will remove it once you confirm. :-))
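
If prv->cpus is indeed redundant, the mask computation in __runq_pick
would reduce to something like this (a sketch, assuming prv->cpus
really always equals the cpupool's online mask):

    /* Intersect the candidate mask with the cpupool's online cpus and
     * the vcpu's hard affinity; prv->cpus dropped as redundant. */
    online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
    cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
    cpumask_and(&cpu_common, &cpu_common, &mask);
    if ( cpumask_empty(&cpu_common) )
        continue;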


Thank you very much for your time and review!
I hope I have addressed all of your comments, except for the questions
I asked in this email. (I think I have addressed all of them now. :-P)

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
