Re: [Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU
2015-03-24 7:54 GMT-04:00 George Dunlap <George.Dunlap@xxxxxxxxxxxxx>:
> On Tue, Mar 24, 2015 at 3:50 AM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
>> :-) I think it is a useful feature for extremely low-latency applications.
Exactly! Thank you so much for pointing this out! I had hardwired it to always return the vcpu that is supposed to be blocked. Now I totally understand what happened. :-)

But this leads to another issue with my design: if I return the idle vcpu when the dedicated VCPU is blocked, the scheduler will do context_switch(prev, next); when the dedicated VCPU is unblocked, another context_switch() is triggered. This means we cannot eliminate the context-switch overhead for the dedicated CPU. The ideal performance for the dedicated VCPU on the dedicated CPU should be super-close to a bare-metal CPU, but here we still have the context-switch overhead, which is about 1500-2000 cycles. Can we avoid the context-switch overhead?
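(For reference, the reason the idle-vcpu return costs us a switch is the tail of the generic schedule() path: it only skips context_switch() when the scheduler hands back the vcpu that is already running. Roughly, paraphrasing xen/common/schedule.c rather than quoting it verbatim:

    /* Rough paraphrase of the tail of schedule(); arguments and the
     * surrounding locking/trace logic are elided or approximate. */
    next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
    next = next_slice.task;

    if ( unlikely(prev == next) )
    {
        /* Same vcpu picked again: nothing is saved or restored, so the
         * ~1500-2000 cycle context-switch cost is avoided. */
        return continue_running(prev);
    }

    /* Different vcpu (e.g. the idle vcpu when the dedicated VCPU blocks):
     * a full context switch now, and another one when the dedicated VCPU
     * wakes up again. */
    context_switch(prev, next);
)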
Yes, the schedule() function is avoided. Right now I only apply the dedicated-CPU feature to the RTDS scheduler. When a dedicated VCPU is pinned and running on the dedicated CPU, it should be a full-capacity vcpu, so we don't need to account for the budget it burns. However, because the credit2 scheduler accounts credit at the domain level, the credit-accounting function could not be skipped there. The trace code in schedule() will also be bypassed on the dedicated CPU. I'm not sure whether we need tracing to keep working on the dedicated CPU; since we are aiming to give the dedicated VCPU close-to-bare-metal CPU performance, the tracing mechanism in schedule() is unnecessary IMHO.

But it's not really accurate to just say "yes". :-( Ideally, I want to isolate all such interference from the dedicated CPU so that the dedicated VCPU on it gets performance that is close to a bare-metal CPU. However, I'm concerned about how complex that will be and how it will affect the existing functions that rely on interrupts.
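To make the budget point concrete, a minimal sketch of what I mean inside rt_schedule() (the "is_dedicated" flag is hypothetical, not existing code; burn_budget() is the normal RTDS accounting helper):

    /* Minimal sketch, not actual patch code: skip RTDS budget accounting
     * when the currently running vcpu is the dedicated one. */
    if ( !scurr->is_dedicated )           /* hypothetical per-vcpu flag */
        burn_budget(ops, scurr, now);     /* normal per-vcpu budget accounting */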
Right, the guest OS will be running on heterogeneous cpus. In my mind, not all processes in the guest need extremely low latency, so the guest OS can pin the latency-critical processes onto the dedicated VCPU (which is mapped to the dedicated CPU) and pin the other processes to the non-dedicated VCPUs. This is more flexible for the guest OS and lets us accommodate more domains on the same number of cpus, but (of course) it introduces more complexity into the hypervisor and into guest-OS management.
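For example (a guest-side sketch, assuming the dedicated VCPU shows up as vcpu 3 inside the guest), a latency-critical process could pin itself with sched_setaffinity():

    /* Guest-side sketch: pin the calling process to (assumed) vcpu 3, i.e.
     * the VCPU that the hypervisor runs on the dedicated physical CPU. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(3, &set);                  /* vcpu number is an assumption */
        if ( sched_setaffinity(0, sizeof(set), &set) != 0 )
            perror("sched_setaffinity");

        /* ... latency-critical work runs here, on the dedicated VCPU ... */
        return 0;
    }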
This is also a good idea, if we don't aim to avoid the context-switch overhead and avoid calling the schedule() function. The biggest strength of this approach is that it has as little impact as possible on the existing functions. Actually, I can extend the RTDS scheduler to include this feature. It is more like a fast path in the scheduler on the dedicated CPU: instead of scanning the runq and deciding which vcpu should run next, we just always pick the dedicated VCPU if it is not blocked (if the dedicated VCPU is blocked, we pick the idle VCPU). However, this only reduces (rather than removes) the schedule() overhead, and it cannot avoid the context-switch overhead either.
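Roughly, that fast path would look like this (the per-cpu "dedicated_vcpu" pointer and its name are illustrative, not existing code; vcpu_runnable() and idle_vcpu[] are existing Xen helpers):

    /* Sketch of a fast path at the top of rt_schedule(): skip the runq
     * scan entirely on a dedicated CPU. */
    if ( dedicated_vcpu[cpu] != NULL )
    {
        struct vcpu *dv = dedicated_vcpu[cpu];

        ret.task = vcpu_runnable(dv) ? dv : idle_vcpu[cpu];
        ret.time = -1;   /* negative slice: schedule() does not re-arm s_timer */
        return ret;
    }
    /* ... otherwise fall through to the normal runq scan ... */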
Yes, I think this is a secondary concern. I'm more concerned about how much overhead we can remove by using the dedicated CPU; ideally, the more overhead we remove, the better performance we get. Do you have any suggestions/insights on the performance goal of the dedicated-CPU feature? I think it will affect how far we should go to remove the overheads.

Thank you very much!

Best,
Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel