Re: [Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU
On Tue, Mar 24, 2015 at 3:27 PM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
>> The simplest way to get your prototype working, in that case, would be
>> to return the idle vcpu for that pcpu if the guest is blocked.
>
> Exactly! Thank you so much for pointing this out! I did hardwire it always
> to return the vcpu that is supposed to be blocked. Now I totally understand
> what happened. :-)
>
> But this leads to another issue with my design:
> If I return the idle vcpu when the dedicated VCPU is blocked, it will do
> context_switch(prev, next); when the dedicated VCPU is unblocked, another
> context_switch() is triggered.
> That means we cannot eliminate the context-switch overhead for the
> dedicated CPU.
> The ideal performance for the dedicated VCPU on the dedicated CPU should be
> super-close to a bare-metal CPU. Here we still have the context-switch
> overhead, which is about 1500-2000 cycles.
>
> Can we avoid the context switch overhead?

If you look at xen/arch/x86/domain.c:context_switch(), you'll see that it already has clever logic for avoiding as much context-switch work as possible. In particular, __context_switch() (which on x86 does the actual work of context switching) won't be called when switching *into* the idle vcpu; nor will it be called if you're switching from the idle vcpu back to the vcpu it switched away from (curr_vcpu == next). I'm not familiar with the arm path, but hopefully it does something similar.

IOW, a context switch to the idle domain isn't really a context switch. :-)
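In rough outline, the fast path looks something like the sketch below. This is a simplified paraphrase of the logic, not the literal source -- the real function does a lot more bookkeeping, and I'm writing the condition from memory, so treat the details as approximate:

/* Simplified sketch of the idle fast path in
 * xen/arch/x86/domain.c:context_switch(); illustrative only, not the
 * actual source. */
void context_switch(struct vcpu *prev, struct vcpu *next)
{
    unsigned int cpu = smp_processor_id();

    /* per_cpu(curr_vcpu, cpu) tracks whose register state is actually
     * loaded on this pcpu, which can differ from "current" while the
     * idle vcpu is nominally running. */
    if ( is_idle_vcpu(next) || per_cpu(curr_vcpu, cpu) == next )
    {
        /* Switching *into* idle, or from idle straight back to the
         * vcpu whose state is still loaded: skip the expensive
         * save/restore entirely. */
    }
    else
    {
        __context_switch();   /* the real state save/restore on x86 */
    }

    /* ... runstate accounting, TLB/FPU handling, &c. elided ... */
}

So in the blocked -> idle -> unblocked round trip, __context_switch() shouldn't run at all.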
> However, because the credit2 scheduler counts credit at the domain level,
> the accounting of the credit burned should not be avoided.

Actually, that's not true. In credit2, the weight is set at the domain level, but that only changes the "burn rate". Individual vcpus are assigned and charged their own credits; and the credit of a vcpu in one runqueue has no comparison to, or direct effect on, the credit of a vcpu in another runqueue.

It wouldn't be at all inconsistent to simply not do the credit calculation for a "dedicated" vcpu. The effect on other vcpus would be exactly the same as having that vcpu on a runqueue by itself.

>> But it's not really accurate to say
>> that you're avoiding the scheduler entirely. At the moment, as far as
>> I can tell, you're still going through all the normal schedule.c
>> machinery between wake-up and actually running the vm; and the normal
>> machinery for interrupt delivery.
>
> Yes. :-(
> Ideally, I want to isolate all such interference from the dedicated CPU so
> that the dedicated VCPU on it will have performance close to a bare-metal
> cpu. However, I'm concerned about how complex it will be and how it will
> affect the existing functions that rely on interrupts.

Right; so there are several bits of overhead you might address:

1. The overhead of scheduling calculations -- credit, load balancing, sorting lists, &c; and regular scheduling interrupts.

2. The overhead in the generic code of having the flexibility to run more than one vcpu. This would (probably) be measured in the number of instructions from a waking interrupt to actually running the guest OS handler.

3. The maintenance things that happen in softirq context, like periodic clock synchronization, &c.

Addressing #1 is fairly easy. The simplest thing to do would be to make a new scheduler and use cpupools; but it shouldn't be terribly difficult to build the functionality within existing schedulers either.
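To give a flavour of what such a "dedicated" scheduler would look like -- purely an illustrative sketch, not tested code; the hook signature and task_slice fields are from memory of the scheduler interface, and the plumbing that assigns a vcpu to each pcpu in the pool is omitted -- its core do_schedule hook could be as trivial as:

/* Sketch only: assumes a per-pcpu pointer to the single vcpu assigned
 * to this pcpu has been set up elsewhere (pool / vcpu assignment code
 * omitted).  Helper and field names are from memory. */
static DEFINE_PER_CPU(struct vcpu *, dedicated_vcpu);

static struct task_slice
dedicated_schedule(const struct scheduler *ops, s_time_t now,
                   bool_t tasklet_work_scheduled)
{
    const unsigned int cpu = smp_processor_id();
    struct vcpu *v = per_cpu(dedicated_vcpu, cpu);
    struct task_slice ret = { .migrated = 0 };

    /* No credits, no runqueues, no load balancing: run the one
     * assigned vcpu if it's runnable (and no tasklet work is pending),
     * otherwise hand back the idle vcpu for this pcpu. */
    if ( v != NULL && vcpu_runnable(v) && !tasklet_work_scheduled )
        ret.task = v;
    else
        ret.task = idle_vcpu[cpu];

    ret.time = -1;  /* no time slice: don't program a scheduling timer */

    return ret;
}

All the credit accounting, runqueue sorting and load balancing simply doesn't exist for cpus in that pool; and combined with the context_switch() behaviour above, a blocked dedicated vcpu just parks the pcpu in the idle loop until the next wake-up.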
My guess is that #2 would involve basically rewriting a parallel set of entry / exit routines, pared down to an absolute minimum, and then having machinery in place to switch a CPU to use those routines (with a specific vcpu) rather than the current, more fully-functional ones. It might also require cutting back on the functionality given to the guest in terms of hypercalls -- making this "minimalist" Xen environment work with all the existing hypercalls might be a lot of work.

That sounds like a lot of very complicated work, and before you tried it I think you'd want to be very much convinced that it would pay off in terms of reduced wake-up latency. Getting from 5000 cycles down to 1000 cycles might be worth it; getting from 1400 cycles down to 1000, or from 5000 cycles down to 4600, maybe not so much. :-)

I'm not sure exactly what #3 would entail; it might involve basically taking the cpu offline from Xen's perspective. (Again, not sure whether it's possible or worth it.)

You might take a look at this presentation from FOSDEM last year, to see if you can get any interesting ideas:

https://archive.fosdem.org/2014/schedule/event/virtiaas13/

Any opinions, Dario / Jan / Tim?

 -George