Re: [Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU
On Tue, Mar 24, 2015 at 3:50 AM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
> Hi Dario and George,
>
> I'm exploring the design choice of eliminating the Xen scheduler overhead
> on the dedicated CPU. A dedicated CPU is a PCPU that has a full-capacity
> VCPU pinned onto it and no other VCPUs will run on that PCPU.

Hey Meng! This sounds awesome, thanks for looking into it.

> [Problems]
> The issue I'm encountering is as follows:
> After I implemented the dedicated cpu feature, I compared the latency of a
> cpu-intensive task in domU on a dedicated CPU (denoted as R_dedcpu) and the
> latency on a non-dedicated CPU (denoted as R_nodedcpu). The expected result
> should be R_dedcpu < R_nodedcpu, since we avoid the scheduler overhead.
> However, the actual result is R_dedcpu > R_nodedcpu, with
> R_dedcpu - R_nodedcpu ~= 1000 cycles.
>
> After adding some trace to every function that may raise SCHEDULE_SOFTIRQ,
> I found:
> When a cpu is not marked as a dedicated cpu and the scheduler on it is not
> disabled, vcpu_block() is triggered 2896 times during 58,280,322,928ns
> (i.e., once every 20,124,421ns on average) on that cpu.
> However, when I disable the scheduler on a dedicated cpu, vcpu_block(void)
> @schedule.c is triggered very frequently: 644,824 times during
> 8,918,636,761ns (i.e., once every 13,831ns on average) on the dedicated cpu.
>
> To sum up the problem I'm facing: vcpu_block(void) is triggered much faster
> and more frequently when the scheduler is disabled on a cpu than when the
> scheduler is enabled.
>
> [My question]
> I'm very confused about why vcpu_block(void) is triggered so frequently
> when the scheduler is disabled. vcpu_block(void) is called by the
> SCHEDOP_block hypercall, but why is this hypercall triggered so frequently?
>
> It would be great if you know the answer directly. (This is just a pure
> hope and I cannot really expect it. :-) )
> But I would really appreciate it if you could give me some directions on
> how to figure it out. I grepped vcpu_block(void) and SCHEDOP_block in the
> xen code base, but didn't find many calls to them.
>
> What confuses me most is that the dedicated VCPU should be blocked less
> frequently, not more frequently, when the scheduler is disabled on the
> dedicated CPU, because the dedicated VCPU is now always running on that CPU
> without the hypervisor scheduler's interference.

So if I had to guess, I would guess that you're not actually blocking when
the guest tries to block. Normally, when the guest blocks, it blocks in a
loop like this:

    do {
        enable_irqs();
        hlt();
        disable_irqs();
    } while ( !interrupt_pending );

For a PV guest, the hlt() would be replaced with a PV block() hypercall.

Normally, when a guest calls block(), it's taken off the runqueue; and if
there's nothing else on the runqueue, the scheduler will run the idle
domain -- it's the idle domain that actually does the blocking. If you've
hardwired the scheduler always to return the vcpu in question rather than
the idle domain, then it will never block -- it will busy-wait, calling
block() millions of times.

The simplest way to get your prototype working, in that case, would be to
return the idle vcpu for that pcpu whenever the guest is blocked.
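Concretely, the do_schedule hook for the dedicated case could look roughly
like the fragment below. This is only a sketch from memory of the 4.x
do_schedule interface, not tested code: the function name and the per-cpu
"dedicated_vcpu" pointer are invented, state your patch would have to
maintain itself; idle_vcpu[], vcpu_runnable() and struct task_slice are the
existing symbols.

    /* Assumed per-pcpu pointer to the vcpu that owns this pcpu; your
     * dedicated-cpu patch would have to set this up when the vcpu is
     * pinned.
     */
    static DEFINE_PER_CPU(struct vcpu *, dedicated_vcpu);

    static struct task_slice dedicated_do_schedule(const struct scheduler *ops,
                                                   s_time_t now,
                                                   bool_t tasklet_work_scheduled)
    {
        const unsigned int cpu = smp_processor_id();
        struct vcpu *v = per_cpu(dedicated_vcpu, cpu);
        struct task_slice ret;

        /*
         * If there is tasklet work pending, or the dedicated vcpu has
         * blocked, hand the pcpu to its idle vcpu.  Returning the blocked
         * vcpu here is exactly what makes SCHEDOP_block return immediately,
         * so the guest spins re-issuing the hypercall instead of sleeping.
         */
        if ( tasklet_work_scheduled || v == NULL || !vcpu_runnable(v) )
            ret.task = idle_vcpu[cpu];
        else
            ret.task = v;

        ret.time = -1;      /* no time limit: no scheduler timer on this pcpu */
        ret.migrated = 0;

        return ret;
    }

With the time slice set to "no limit", the pcpu takes no scheduler timer
interrupt at all and only re-enters schedule() when the dedicated vcpu
blocks or wakes, or a tasklet needs to run -- which is, I think, the
behaviour you're after.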
But a brief comment on your design. Looking at it at the moment, you will
get rid of the overhead of the scheduler-related interrupts, and of any
pluggable-scheduler accounting that needs to happen (e.g., calculating
credits burned, &c). And that's certainly not nothing. But it's not really
accurate to say that you're avoiding the scheduler entirely: as far as I can
tell, you're still going through all the normal schedule.c machinery between
wake-up and actually running the vm, and the normal machinery for interrupt
delivery.

I'm wondering -- are people really going to want to pin just a single vcpu
from a domain like this? Or are they going to want to pin all vcpus from a
given domain? For the first to be useful, the guest OS would need to
understand somehow that this vcpu has better properties than its other
vcpus, which I suppose could be handled manually (e.g., by the guest admin
pinning processes to that cpu or something).

The reason I'm asking is that another option, which would avoid the need for
special per-cpu flags, would be to make a "sched_place" scheduler
(sched_partition?), which would essentially do what you've done here: when
you add a vcpu to the scheduler, it simply chooses one of its free cpus and
dedicates it to that vcpu; if no such cpus are available, it returns an
error. In that case, you could use the normal cpupool machinery to assign
cpus to that scheduler, without needing to introduce these extra flags and
without making each of the pluggable schedulers deal with the complexity of
implementing "dedicated" scheduling. The only downside is that at the moment
a domain can't cross cpupools, so either all vcpus of a domain would have to
be dedicated, or none.
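For what it's worth, the placement logic for such a scheduler would be
tiny. The fragment below is purely illustrative -- the scheduler name, the
partition_priv structure and its free_cpus / pcpu_owner fields are all
invented, and the insert_vcpu hook in the tree may well return void rather
than an error code, in which case the failure would have to be reported from
the vcpu-data allocation path instead -- but it shows the idea:

    /* Hypothetical insert_vcpu hook for a "sched_partition" scheduler.
     * partition_priv, free_cpus and pcpu_owner are invented names; the
     * cpumask helpers and ops->sched_data are real, but double-check the
     * hook signature against the tree you are patching.
     */
    struct partition_priv {
        cpumask_t free_cpus;               /* pcpus in the pool not yet dedicated */
        struct vcpu *pcpu_owner[NR_CPUS];  /* which vcpu owns each pcpu           */
    };

    static int partition_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
    {
        struct partition_priv *prv = ops->sched_data;
        unsigned int cpu = cpumask_first(&prv->free_cpus);

        /* No free pcpu left in this pool: refuse to add the vcpu. */
        if ( cpu >= nr_cpu_ids )
            return -ENOSPC;

        cpumask_clear_cpu(cpu, &prv->free_cpus);
        prv->pcpu_owner[cpu] = v;   /* dedicate the pcpu to this vcpu */
        v->processor = cpu;         /* and run the vcpu only there    */

        return 0;
    }

With that in place, pick_cpu and do_schedule become trivial, since each pcpu
in the pool only ever has its one dedicated vcpu plus the idle vcpu.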
Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel