Re: [Xen-devel] Xen/arm: Virtual ITS command queue handling

On Fri, May 15, 2015 at 4:29 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
>> >>>   on that vits;
>> >>> * On receipt of an interrupt notification arising from Xen's own use
>> >>>   of `INT`; (see discussion under Completion)
>> >>> * On any interrupt injection arising from a guest's use of the `INT`
>> >>>   command; (XXX perhaps, see discussion under Completion)
>> >>
>> >> With all the solutions suggested, it is very likely that we will try
>> >> to execute multiple scheduling passes at the same time.
>> >>
>> >> One way is to wait until the previous pass has finished. But that would
>> >> mean that the scheduler would be executed very often.
>> >>
>> >> Or maybe you plan to offload the scheduler in a softirq?
>> >
>> > Good point.
>> >
>> > A soft irq might be one solution, but it is problematic during emulation
>> > of `CREADR`, when we would like to do a pass immediately to complete any
>> > operations outstanding for the domain doing the read.
>> >
>> > Or just using spin_try_lock and not bothering if one is already in
>> > progress might be another. But has similar problems.
>> >
>> > Or we could defer only scheduling from `INT` (either guest or Xen's own)
>> > to a softirq but do ones from `CREADR` emulation synchronously? The
>> > softirq would be run on return from the interrupt handler but multiple
>> > such would be coalesced I think?
>> I think we could defer the scheduling to a softirq for CREADR too, if
>> the guest is using:
>>       - INT completion: vits.creadr would have been correctly updated when
>> receiving the INT in Xen.
>>       - polling completion: the guest will loop on CREADR. It will likely get
>> the info on the next read. The drawback is the guest may lose a few
>> instruction cycles.
>> Overall, I don't think it's necessary to have an accurate CREADR.
> Yes, deferring the update by one exit+enter might be tolerable. I added
> after this list:
>         This may result in lots of contention on the scheduler
>         locking. Therefore we consider that in each case all which happens is
>         triggering of a softirq which will be processed on return to guest,
>         and just once even for multiple events. This is considered OK for the
>         `CREADR` case because at worst the value read will be one cycle out of
>         date.
>> [..]
>> >> AFAIU the process suggested, Xen will inject small batches as long as the
>> >> physical command queue is not full.
>> >
>> >> Let's take a simple case, only a single domain is using vITS on the
>> >> platform. If it injects a huge number of commands, Xen will split it
>> >> into lots of small batches. All batches will be injected in the same pass
>> >> as long as they fit in the physical command queue. Am I correct?
>> >
>> > That's how it is currently written, yes. With the "possible
>> > simplification" above the answer is no, only a batch at a time would be
>> > written for each guest.
>> >
>> > BTW, it doesn't have to be a single guest, the sum total of the
>> > injections across all guests could also take a similar amount of time.
>> > Is that a concern?
>> Yes, the example with only one guest was easier to explain.
> So as well as limiting the number of commands in each domain's batch we
> also want to limit the total number of batches?
>> >> I think we have to restrict the total number of batches (i.e. for all the
>> >> domains) injected in the same scheduling pass.
>> >>
>> >> I would even tend to allow only one in flight batch per domain. That
>> >> would limit the possible problem I pointed out.
>> >
>> > This is the "possible simplification" I think. Since it simplifies other
>> > things (I think) as well as addressing this issue I think it might be a
>> > good idea.
>> With the limit on commands sent per batch, would the fairness you
>> were talking about in the design doc still be required?
> I think we still want to schedule the guests in a strict round robin
> manner, to avoid one guest monopolising things.
>> >>> Therefore it is proposed that the restriction that a single vITS maps
>> >>> to one pITS be retained. If a guest requires access to devices
>> >>> associated with multiple pITSs then multiple vITS should be
>> >>> configured.
>> >>
>> >> Having multiple vITS per domain brings other issues:
>> >>    - How do you know the number of ITS to describe in the device tree at 
>> >> boot?
>> >
>> > I'm not sure. I don't think 1 vs N is very different from the question
>> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
>> I don't see why the tools would require to know the pITS setup.
> Even with only a single vits the tools need to know if the system has 0,
> 1, or more pits, to know whether to create a vits at all or not.
>> >>    - How do you tell the guest that the PCI device is mapped to a
>> >> specific vITS?
>> >
>> > Device Tree or IORT, just like on native and just like we'd have to tell
>> > the guest about that mapping even if there was a single vITS.
>> Right, although the root controller can only be attached to one ITS.
>> It will be necessary to have multiple root controllers in the guest if
>> we pass through devices using different ITSs.
>> Is pci-back able to expose multiple root controllers?
> In principle the xenstore protocol supports it, but AFAIK all toolstacks
> have only ever used "bus" 0, so I wouldn't be surprised if there were
> bugs lurking.
> But we could fix those, I don't think it is a requirement that this
> stuff suddenly springs into life on ARM even with existing kernels.
>> > I think the complexity of having one vITS target multiple pITSs is going
>> > to be quite high in terms of data structures and the amount of
>> > thinking/tracking scheduler code will have to do, mostly down to out of
>> > order completion of things put in the pITS queue.
>> I understand the complexity, but exposing one vITS per pITS means that we
>> are exposing the underlying hardware to the guest.
> Some aspect of it, yes, but it is still a virtual ITS.
>> That brings a lot of complexity in the guest layout, which is right now
>> static. How do you decide the number of vITS/root controller exposed
>> (think about PCI hotplug)?
>> Given that PCI passthrough doesn't allow migration, maybe we could use
>> the layout of the hardware.
> That's an option.
>> If we are going to expose multiple vITS to the guest, we should only use
>> vITS for guests using PCI passthrough. This is because migration won't be
>> compatible with it.
> It would be possible to support one s/w only vits for migration, i.e the
> evtchn thing at the end, but for the general case that is correct. On
> x86 I believe that if you hot unplug all passthrough devices you can
> migrate and then plug in other devices at the other end.
> Anyway, more generally there are certainly problems with multiple vITS.
> However there are also problems with a single vITS feeding multiple
> pITSs:
>       * What to do with global commands? Inject to all pITS and then
>         synchronise on them all finishing.
>       * Handling of out of order completion of commands queued with
>         different pITS, since the vITS must appear to complete in order.
>         Apart from the book keeping question it makes scheduling more
>         interesting:
>               * What if you have a pITS with slots available, and the
>                 guest command queue contains commands which could go to
>                 the pITS, but behind ones which are targeting another
>                 pITS which has no slots
>               * What if one pITS is very busy and another is mostly idle
>                 and a guest submits one command to the busy one
>                 (contending with other guests) followed by a load of
>                 commands targeting the idle one. Those commands would be
>                 held up in this situation.
>               * Reasoning about fairness may be harder.
> I've put both your list and mine into the next revision of the document.
> I think this remains an important open question.

Handling a single vITS with multiple pITSs can be made simple.

All ITS commands except SYNC and INVALL carry a device ID, which
tells us which pITS each command should be sent to.

Xen can drop the SYNC and INVALL commands issued by the guest and
append its own wherever a SYNC or INVALL is required
(for example, the Linux driver adds a SYNC after the commands that need one).
With this assumption, every ITS command maps to exactly one pITS
and there is no need for synchronization across pITSs.
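
To make the idea concrete, here is a rough sketch of the routing step in C.
It is not Xen code: every name in it (struct vits_cmd, devid_to_pits(),
vits_route(), the queue length, the two-pITS setup) is hypothetical, the
device-ID lookup is a placeholder, and appending a SYNC after every forwarded
command is a deliberately conservative assumption rather than what a real
implementation would have to do. It only models the commands named in the
enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PITS        2   /* assumed number of physical ITSs */
#define PITS_QUEUE_LEN 16  /* assumed free slots per pITS queue */

/* Simplified command model, not the real ITS command encoding. */
enum vits_cmd_type { CMD_MAPD, CMD_MAPVI, CMD_INT, CMD_SYNC, CMD_INVALL };

struct vits_cmd {
    enum vits_cmd_type type;
    uint32_t devid;            /* unused for SYNC and INVALL */
};

struct pits_queue {
    struct vits_cmd cmds[PITS_QUEUE_LEN];
    size_t len;
};

/* Placeholder lookup: a real one would consult the device assignment. */
static unsigned int devid_to_pits(uint32_t devid)
{
    return devid % NR_PITS;
}

static void pits_enqueue(struct pits_queue *q, struct vits_cmd c)
{
    assert(q->len < PITS_QUEUE_LEN);
    q->cmds[q->len++] = c;
}

/*
 * Route guest commands to the pITS selected by their device ID.
 * Guest SYNC and INVALL are dropped; Xen appends its own SYNC after each
 * forwarded command. Stops at the first command whose target queue lacks
 * room, so guest ordering is preserved. Returns the number of guest
 * commands consumed.
 */
static size_t vits_route(const struct vits_cmd *guest, size_t n,
                         struct pits_queue pits[NR_PITS])
{
    size_t i;

    for (i = 0; i < n; i++) {
        const struct vits_cmd *c = &guest[i];
        struct pits_queue *q;

        if (c->type == CMD_SYNC || c->type == CMD_INVALL)
            continue;                       /* dropped on guest request */

        q = &pits[devid_to_pits(c->devid)];
        if (PITS_QUEUE_LEN - q->len < 2)    /* command + Xen's SYNC */
            break;

        pits_enqueue(q, *c);
        pits_enqueue(q, (struct vits_cmd){ .type = CMD_SYNC });
    }
    return i;
}
```

Each forwarded command lands in exactly one pITS queue, so nothing in this
sketch has to wait on more than one pITS. Stopping at the first full queue
keeps the guest's ordering but still shows the head-of-line blocking Ian
described above.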
