Re: [Xen-devel] Xen/arm: Virtual ITS command queue handling
On Fri, 2015-05-15 at 16:56 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 4:29 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> > On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
> >> >>>   on that vits;
> >> >>> * On receipt of an interrupt notification arising from Xen's own use
> >> >>>   of `INT`; (see discussion under Completion)
> >> >>> * On any interrupt injection arising from a guest's use of the `INT`
> >> >>>   command; (XXX perhaps, see discussion under Completion)
> >> >>
> >> >> With all the solutions suggested, it is very likely that we will try
> >> >> to execute multiple scheduling passes at the same time.
> >> >>
> >> >> One way is to wait until the previous pass has finished. But that would
> >> >> mean that the scheduler would be executed very often.
> >> >>
> >> >> Or maybe you plan to offload the scheduler to a softirq?
> >> >
> >> > Good point.
> >> >
> >> > A softirq might be one solution, but it is problematic during emulation
> >> > of `CREADR`, when we would like to do a pass immediately to complete any
> >> > operations outstanding for the domain doing the read.
> >> >
> >> > Or just using spin_try_lock and not bothering if one is already in
> >> > progress might be another. But that has similar problems.
> >> >
> >> > Or we could defer only scheduling from `INT` (either the guest's or Xen's own)
> >> > to a softirq but do ones from `CREADR` emulation synchronously? The
> >> > softirq would be run on return from the interrupt handler but multiple
> >> > such would be coalesced I think?
> >>
> >> I think we could defer the scheduling to a softirq for CREADR too, if
> >> the guest is using:
> >>   - INT completion: vits.creadr would have been correctly updated when
> >>     receiving the INT in Xen.
> >>   - polling completion: the guest will loop on CREADR. It will likely get
> >>     the info on the next read. The drawback is the guest may lose a few
> >>     instruction cycles.
> >>
> >> Overall, I don't think it's necessary to have an accurate CREADR.
> >
> > Yes, deferring the update by one exit+enter might be tolerable. I added
> > after this list:
> >
> >     This may result in lots of contention on the scheduler
> >     locking. Therefore we consider that in each case all that happens is
> >     triggering of a softirq which will be processed on return to guest,
> >     and just once even for multiple events. This is considered OK for the
> >     `CREADR` case because at worst the value read will be one cycle out of
> >     date.
> >
> >> [..]
> >>
> >> >> AFAIU the process suggested, Xen will inject small batches as long as
> >> >> the physical command queue is not full.
> >> >
> >> >> Let's take a simple case, only a single domain is using vITS on the
> >> >> platform. If it injects a huge number of commands, Xen will split it
> >> >> into lots of small batches. All batches will be injected in the same
> >> >> pass as long as they fit in the physical command queue. Am I correct?
> >> >
> >> > That's how it is currently written, yes. With the "possible
> >> > simplification" above the answer is no, only a batch at a time would be
> >> > written for each guest.
> >> >
> >> > BTW, it doesn't have to be a single guest, the sum total of the
> >> > injections across all guests could also take a similar amount of time.
> >> > Is that a concern?
> >>
> >> Yes, the example with only one guest was easier to explain.
> >
> > So as well as limiting the number of commands in each domain's batch we
> > also want to limit the total number of batches?
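
For what it's worth, a rough sketch of the coalesced-softirq idea discussed
above. This is purely illustrative, not existing Xen code: VITS_SCHED_SOFTIRQ
would be a new entry in the softirq enum, and struct pits, pits_list and
vits_schedule() are invented names.

    #include <xen/list.h>
    #include <xen/softirq.h>

    struct pits {
        struct list_head entry;            /* on pits_list */
        /* ... physical command queue state ... */
    };

    static LIST_HEAD(pits_list);           /* all host ITSs, filled at boot */

    void vits_schedule(struct pits *pits); /* per-pITS scheduling pass */

    static void vits_sched_softirq(void)
    {
        struct pits *pits;

        /*
         * One scheduling pass per pITS, run on return to guest.  Several
         * raise_softirq() calls before we get back here collapse into a
         * single pass, which is why CREADR may read as (at worst) one
         * cycle out of date.
         */
        list_for_each_entry ( pits, &pits_list, entry )
            vits_schedule(pits);
    }

    /* Called from the INT completion handler and from CREADR emulation. */
    void vits_kick_scheduler(void)
    {
        raise_softirq(VITS_SCHED_SOFTIRQ);
    }

    void vits_sched_init(void)
    {
        open_softirq(VITS_SCHED_SOFTIRQ, vits_sched_softirq);
    }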
> >> >> I think we have to restrict the total number of batches (i.e. for all
> >> >> the domains) injected in a single scheduling pass.
> >> >>
> >> >> I would even tend to allow only one in-flight batch per domain. That
> >> >> would limit the possible problem I pointed out.
> >> >
> >> > This is the "possible simplification" I think. Since it simplifies other
> >> > things (I think) as well as addressing this issue I think it might be a
> >> > good idea.
> >>
> >> With the limit on commands sent per batch, would the fairness you
> >> were talking about in the design doc still be required?
> >
> > I think we still want to schedule the guests in a strict round-robin
> > manner, to avoid one guest monopolising things.
> >
> >> >>> Therefore it is proposed that the restriction that a single vITS maps
> >> >>> to one pITS be retained. If a guest requires access to devices
> >> >>> associated with multiple pITSs then multiple vITS should be
> >> >>> configured.
> >> >>
> >> >> Having multiple vITS per domain brings other issues:
> >> >>   - How do you know the number of ITS to describe in the device tree
> >> >>     at boot?
> >> >
> >> > I'm not sure. I don't think 1 vs N is very different from the question
> >> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> >>
> >> I don't see why the tools would need to know the pITS setup.
> >
> > Even with only a single vITS the tools need to know if the system has 0,
> > 1, or more pITSs, to know whether to create a vITS at all or not.
> >
> >> >>   - How do you tell the guest that the PCI device is mapped to a
> >> >>     specific vITS?
> >> >
> >> > Device Tree or IORT, just like on native and just like we'd have to tell
> >> > the guest about that mapping even if there was a single vITS.
> >>
> >> Right, although the root controller can only be attached to one ITS.
> >>
> >> It will be necessary to have multiple root controllers in the guest in
> >> the case where we passthrough devices using different ITSs.
> >>
> >> Is pci-back able to expose multiple root controllers?
> >
> > In principle the xenstore protocol supports it, but AFAIK all toolstacks
> > have only ever used "bus" 0, so I wouldn't be surprised if there were
> > bugs lurking.
> >
> > But we could fix those, I don't think it is a requirement that this
> > stuff suddenly springs into life on ARM even with existing kernels.
> >
> >> > I think the complexity of having one vITS target multiple pITSs is going
> >> > to be quite high in terms of data structures and the amount of
> >> > thinking/tracking scheduler code will have to do, mostly down to out of
> >> > order completion of things put in the pITS queue.
> >>
> >> I understand the complexity, but exposing one vITS per pITS means that we
> >> are exposing the underlying hardware to the guest.
> >
> > Some aspect of it, yes, but it is still a virtual ITS.
> >
> >> That brings a lot of complexity in the guest layout, which is right now
> >> static. How do you decide the number of vITS/root controllers exposed
> >> (think about PCI hotplug)?
> >>
> >> Given that PCI passthrough doesn't allow migration, maybe we could use
> >> the layout of the hardware.
> >
> > That's an option.
> >
> >> If we are going to expose multiple vITS to the guest, we should only use
> >> vITS for guests using PCI passthrough. This is because migration won't be
> >> compatible with it.
> >
> > It would be possible to support one s/w only vITS for migration, i.e. the
> > evtchn thing at the end, but for the general case that is correct.
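
As an aside, here is a sketch of the strict round-robin, one-in-flight-batch-
per-domain scheme discussed earlier in this mail. It reuses the invented
struct pits from the previous sketch; struct vits and the vits_* helpers are
likewise made-up names, not anything in Xen or the design doc.

    #include <xen/errno.h>
    #include <xen/types.h>

    struct vits {
        bool batch_in_flight;   /* cleared when the batch's commands complete */
        /* ... per-domain virtual command queue state ... */
    };

    /* Invented helpers, assumed to exist elsewhere for this sketch. */
    bool vits_has_pending(const struct vits *v);
    int vits_inject_batch(struct pits *pits, struct vits *v);
    struct vits *vits_next(struct pits *pits, struct vits *v);

    /* One scheduling pass over the vITSs sharing this pITS. */
    void vits_schedule(struct pits *pits)
    {
        struct vits *v, *start = pits->rr_cursor;

        if ( start == NULL )
            return;

        v = start;
        do {
            /* At most one batch per domain may be on the physical queue. */
            if ( !v->batch_in_flight && vits_has_pending(v) )
            {
                if ( vits_inject_batch(pits, v) == -ENOSPC )
                {
                    /* Physical queue is full: resume here on the next pass. */
                    pits->rr_cursor = v;
                    return;
                }
                v->batch_in_flight = true;
            }
            v = vits_next(pits, v);     /* wraps around the per-pITS list */
        } while ( v != start );

        /* Everyone had a turn; rotate the starting point for fairness. */
        pits->rr_cursor = vits_next(pits, start);
    }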
> > On x86 I believe that if you hot unplug all passthrough devices you can
> > migrate and then plug in other devices at the other end.
> >
> > Anyway, more generally there are certainly problems with multiple vITS.
> > However there are also problems with a single vITS feeding multiple
> > pITSs:
> >
> >  * What to do with global commands? Inject to all pITS and then
> >    synchronise on them all finishing.
> >  * Handling of out of order completion of commands queued with
> >    different pITS, since the vITS must appear to complete in order.
> >    Apart from the book keeping question it makes scheduling more
> >    interesting:
> >     * What if you have a pITS with slots available, and the
> >       guest command queue contains commands which could go to
> >       the pITS, but behind ones which are targeting another
> >       pITS which has no slots
> >     * What if one pITS is very busy and another is mostly idle
> >       and a guest submits one command to the busy one
> >       (contending with other guests) followed by a load of
> >       commands targeting the idle one. Those commands would be
> >       held up in this situation.
> >     * Reasoning about fairness may be harder.
> >
> > I've put both your list and mine into the next revision of the document.
> > I think this remains an important open question.
>
> Handling of a single vITS and multiple pITSs can be made simple.
>
> All ITS commands except SYNC & INVALL have a device ID which will
> tell us to which pITS each should be sent.
>
> SYNC & INVALL can be dropped by Xen on guest request, and Xen can
> append SYNC & INVALL wherever required (e.g. the Linux driver adds
> SYNC for the commands that require it).
> With this assumption, all ITS commands are mapped to a pITS and no
> synchronisation is needed across pITSs.

You've ignored the second bullet and its three sub-bullets, I think.

Ian.
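
For concreteness, a rough sketch of the device-ID based routing Vijay
describes above. It only covers the routing half, not the out-of-order
completion issue in the second bullet; devid_to_pits() and
pits_queue_command() are invented helpers, and the command field layout
follows the GICv3 ITS command format (type in bits [7:0], DeviceID in bits
[63:32] of the first doubleword for the commands that carry one).

    #include <xen/types.h>

    #define ITS_CMD_SYNC    0x05
    #define ITS_CMD_INVALL  0x0d

    struct pits;
    struct vits;

    /* Invented helpers, assumed to exist elsewhere for this sketch. */
    struct pits *devid_to_pits(struct vits *vits, uint32_t devid);
    int pits_queue_command(struct pits *pits, const uint64_t cmd[4]);

    /* Route one 32-byte guest command to the pITS its DeviceID maps to. */
    static int vits_route_command(struct vits *vits, const uint64_t cmd[4])
    {
        uint8_t type = cmd[0] & 0xff;
        uint32_t devid = cmd[0] >> 32;

        switch ( type )
        {
        case ITS_CMD_SYNC:
        case ITS_CMD_INVALL:
            /*
             * Drop the guest's own SYNC/INVALL; Xen appends its own after
             * the commands which need them.
             */
            return 0;
        default:
            return pits_queue_command(devid_to_pits(vits, devid), cmd);
        }
    }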