Re: [Xen-devel] vmx: VT-d posted-interrupt core logic handling
On 10/03/16 10:46, George Dunlap wrote:
> On 10/03/16 10:35, David Vrabel wrote:
>> On 10/03/16 10:18, Jan Beulich wrote:
>>>>>> On 10.03.16 at 11:05, <kevin.tian@xxxxxxxxx> wrote:
>>>>> From: Tian, Kevin
>>>>> Sent: Thursday, March 10, 2016 5:20 PM
>>>>>
>>>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>>>>>> Sent: Thursday, March 10, 2016 5:06 PM
>>>>>>
>>>>>>
>>>>>>> There are many linked list usages in the Xen hypervisor today, which
>>>>>>> have different theoretical maximum possible numbers of entries. The
>>>>>>> closest one to PI might be the usage in tmem (pool->share_list),
>>>>>>> which is page based and so could grow overly large. Other examples
>>>>>>> are orders of magnitude lower, e.g. s->ioreq_vcpu_list in the ioreq
>>>>>>> server (which could be 8K in the above example), and
>>>>>>> d->arch.hvm_domain.msixtbl_list in MSI-X virtualization (which could
>>>>>>> be 2^11 per spec). Do we also want to create some artificial
>>>>>>> scenarios to examine them, since in actual operation K-level entry
>>>>>>> counts may also become a problem?
>>>>>>>
>>>>>>> Just want to figure out how best we can solve all related linked-list
>>>>>>> usages in the current hypervisor.
>>>>>>
>>>>>> As you say, those are (perhaps with the exception of tmem, which
>>>>>> isn't supported anyway due to XSA-15, and which therefore also
>>>>>> isn't on by default) in the order of a few thousand list elements.
>>>>>> And as mentioned above, different bounds apply for lists traversed
>>>>>> in interrupt context vs such traversed only in "normal" context.
>>>>>>
>>>>>
>>>>> That's a good point. Interrupt context should have more restrictions.
>>>>
>>>> Hi, Jan,
>>>>
>>>> I'm thinking about your earlier idea of an evenly distributed list:
>>>>
>>>> --
>>>> Ah, right, I think that limitation was named before, yet I've
>>>> forgotten about it again. But that only slightly alters the
>>>> suggestion: To distribute vCPU-s evenly would then require to
>>>> change their placement on the pCPU in the course of entering
>>>> blocked state.
>>>> --
>>>>
>>>> Actually, after more thinking, there is no hard requirement that
>>>> the vcpu must block on the pcpu which is configured in the 'NDST'
>>>> field of that vcpu's PI descriptor. What really matters is that the
>>>> vcpu is added to the linked list of that particular pcpu, so that
>>>> when a PI notification comes we can always find the vcpu struct from
>>>> that pcpu's linked list. Of course, one drawback of such placement
>>>> is the additional IPI incurred in the wake-up path.
>>>>
>>>> Then one possible optimized policy within vmx_vcpu_block could
>>>> be:
>>>>
>>>> (Say VCPU1 is currently blocked on PCPU1)
>>>> - As long as the number of vcpus in the linked list on PCPU1 is
>>>> below a threshold (say 16), add VCPU1 to the list, with NDST set to
>>>> PCPU1. Upon PI notification on PCPU1, the local linked list is
>>>> searched to find VCPU1, and VCPU1 is then unblocked on PCPU1.
>>>>
>>>> - Otherwise, add VCPU1 to PCPU2's list based on a simple
>>>> distribution algorithm (based on vcpu_id/vm_id). VCPU1 still blocks
>>>> on PCPU1, but NDST is set to PCPU2. Upon notification on PCPU2, the
>>>> local linked list is searched to find VCPU1, and an IPI is then sent
>>>> to PCPU1 to unblock VCPU1.
>>>
>>> Sounds possible, if the lock handling can be got right. But of
>>> course there can't be any hard limit like 16, at least not alone
>>> (on a system with extremely many mostly idle vCPU-s we'd
>>> need to allow larger counts - see my earlier explanations in this
>>> regard).
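
[For illustration only: a minimal standalone C sketch of the placement
policy Kevin proposes above. Everything here (struct vcpu_model,
pi_list_len, pick_ndst, the hash, and the example threshold of 16) is
made up for the example and is not the actual Xen PI code; it merely
models the "stay on the local pCPU while its list is short, otherwise
spread by vm_id/vcpu_id at the cost of an extra wake-up IPI" decision.]

/*
 * Standalone model of the proposed NDST placement policy; all names
 * are illustrative, not the real Xen structures or API.
 */
#include <stdio.h>

#define NR_PCPUS           8
#define PI_LIST_THRESHOLD 16  /* example threshold from the mail, not a hard limit */

struct vcpu_model {
    unsigned int vm_id;
    unsigned int vcpu_id;
    unsigned int blocked_on;  /* pCPU the vCPU actually blocks on */
};

/* Per-pCPU count of vCPUs already on that pCPU's PI blocking list. */
static unsigned int pi_list_len[NR_PCPUS];

/* Pick the pCPU whose blocking list (and PI descriptor NDST) to use. */
static unsigned int pick_ndst(const struct vcpu_model *v)
{
    unsigned int cpu = v->blocked_on;

    /* Short local list: keep everything on the blocking pCPU. */
    if ( pi_list_len[cpu] < PI_LIST_THRESHOLD )
        return cpu;

    /*
     * Long local list: spread to another pCPU by a simple hash of
     * vm_id/vcpu_id.  The vCPU still blocks on 'cpu'; a notification
     * arriving on the chosen pCPU then needs an extra IPI back to
     * 'cpu' to unblock it.
     */
    return (v->vm_id * 31 + v->vcpu_id) % NR_PCPUS;
}

int main(void)
{
    struct vcpu_model v = { .vm_id = 3, .vcpu_id = 5, .blocked_on = 2 };

    pi_list_len[2] = 20;  /* pretend pCPU2's list is already over the threshold */
    printf("d%uv%u blocks on pCPU%u, NDST -> pCPU%u\n",
           v.vm_id, v.vcpu_id, v.blocked_on, pick_ndst(&v));
    return 0;
}
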
>>
>> You could also consider only waking the first N VCPUs and just making
>> the rest runnable. If you wake more VCPUs than PCPUs at the same time,
>> most of them won't actually be scheduled.
>
> "Waking" a vcpu means "changing from blocked to runnable", so those two
> things are the same. And I can't figure out what you mean instead --
> can you elaborate?
>
> Waking up 1000 vcpus is going to take strictly more time than checking
> whether there's a PI interrupt pending on 1000 vcpus to see if they need
> to be woken up.

Waking means making it runnable /and/ attempting to make it running. So
I mean, for the > N'th VCPU don't call __runq_tickle(), only call
__runq_insert().

David
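
[Again purely illustrative: a toy model of David's "insert all, tickle
only the first N" suggestion. The runq_insert()/runq_tickle() stubs
below are stand-ins for the idea, not the scheduler's real
__runq_insert()/__runq_tickle(), and nr_idle_pcpus is an assumed input.]

/*
 * Toy model of "make every woken vCPU runnable, but only force an
 * immediate reschedule for as many as could plausibly run right away".
 */
#include <stdio.h>

static void runq_insert(int vcpu) { printf("v%d -> runnable\n", vcpu); }
static void runq_tickle(int vcpu) { printf("v%d -> reschedule now\n", vcpu); }

static void wake_blocked(const int *vcpus, int n, int nr_idle_pcpus)
{
    for ( int i = 0; i < n; i++ )
    {
        runq_insert(vcpus[i]);      /* always make runnable */
        if ( i < nr_idle_pcpus )
            runq_tickle(vcpus[i]);  /* tickle only the first N */
    }
}

int main(void)
{
    int vcpus[] = { 11, 12, 13, 14, 15 };

    wake_blocked(vcpus, 5, 2);      /* pretend only 2 pCPUs are idle */
    return 0;
}
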