
Re: [Xen-devel] Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)



Hi Ian,

On 15/05/15 12:45, Ian Campbell wrote:
> On Tue, 2015-05-12 at 16:02 +0100, Ian Campbell wrote:
>> I've written up my thinking as a design doc below (it's pandoc and the
>> pdf version is also at
>> http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).
> 
> Here is a second draft based on the feedback so far. Also at
> http://xenbits.xen.org/people/ianc/vits/draftB.{pdf,html}.
> 
> So far I think we are mostly at the stage of gather open questions and
> enumerate the issues rather than actually beginning reaching any
> conclusion. That's OK (and part of the purpose).
> 
> Ian.
> -----
> 
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@xxxxxxxxxx>
> % Draft B
> 
> # Changelog
> 
> ## Since Draft A
> 
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
> 
> # Introduction
> 
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
> 
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".

I read the spec again today and noticed that I was wrong about the maximum
size of the command queue. The GITS_CBASER.Size field encodes the number
of 4KB pages minus one. The field is 8 bits wide, which means the maximum
size is 2^8 * 4KB = 1MB.

Given that each command is 32 bytes, we would have a maximum of 32768
commands in the queue.
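
As a quick check of the arithmetic above, here is a minimal sketch. The
helper and macro names are made up for illustration and are not taken
from Xen or from the spec; only the 4KB page size, the "minus one"
encoding and the 32-byte command size come from the text above.

```c
#include <stdint.h>

#define ITS_PAGE_SIZE  4096u  /* GITS_CBASER.Size counts 4KB pages */
#define ITS_CMD_SIZE   32u    /* each ITS command is 32 bytes */

/* Hypothetical helper: command slots for a given raw Size field value. */
static inline uint32_t its_cmdq_nr_commands(uint8_t size_field)
{
    /* The field holds (number of pages - 1), so 0xff means 256 pages. */
    uint32_t bytes = ((uint32_t)size_field + 1u) * ITS_PAGE_SIZE;

    return bytes / ITS_CMD_SIZE;
}
```

With the largest field value, 0xff, this gives 256 pages, i.e. 1MB and
32768 commands; with 0 it gives a single page holding 128 commands.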

I don't think that changes the design though, as processing such a
number of commands in one go can be very slow.

[..]

> ### Command translation
> 
> In order to virtualise the Command Queue each command must be
> translated (this is described in the GIC spec).
> 
> Translation of certain commands can be expensive (XXX citation
> needed).

The term "expensive" is subjective. I think we can end up with cheap
translation if we properly pre-allocate the information (such as devices,
LPIs...). We can have all the information before the guest boots or
during hotplug. It wouldn't use more memory than is needed anyway.

During command translation, we would just need to enable the device/LPIs.

The remaining expensive part would be the validation. I think we can
get most of the checks down to O(1) (such as collection checking) or
O(log(n)) (such as device checking).
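
To illustrate the kind of checks I have in mind, here is a rough sketch.
Everything in it is an assumption for the sake of the example: the
structures, field names and helpers are hypothetical, and it uses the C
library's bsearch() only to stay standalone (the hypervisor would need
its own lookup, e.g. a tree or a sorted table filled in at boot/hotplug).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-domain state, filled in at boot or on hotplug. */
struct vits_device {
    uint32_t vdevid;   /* device ID as seen by the guest */
    uint32_t pdevid;   /* matching physical device ID */
};

struct vits_state {
    uint16_t nr_collections;            /* collections the guest may use */
    size_t nr_devices;
    const struct vits_device *devices;  /* sorted by vdevid */
};

/* O(1): a collection ID is valid iff it is below the per-domain limit. */
static bool vits_collection_valid(const struct vits_state *v, uint16_t icid)
{
    return icid < v->nr_collections;
}

/* Comparison callback for bsearch(): key is a pointer to a vdevid. */
static int vits_device_cmp(const void *key, const void *elem)
{
    uint32_t id = *(const uint32_t *)key;
    const struct vits_device *dev = elem;

    return (id > dev->vdevid) - (id < dev->vdevid);
}

/* O(log(n)): binary search over the pre-sorted device table. */
static const struct vits_device *
vits_device_lookup(const struct vits_state *v, uint32_t vdevid)
{
    return bsearch(&vdevid, v->devices, v->nr_devices,
                   sizeof(*v->devices), vits_device_cmp);
}
```

A command referring to an unknown device gets NULL back from the lookup
and can be rejected before any translation work is done.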

> Translation can be done in two places:
> 
> * During scheduling.
> * On write to `CWRITER`, into a per `vits_cq` queue which the
>   scheduler then propagates to the pits.
> 
> Doing the translate during scheduling means that potentially expensive
> operations may be accounted to `current`, who may have nothing to do
> with those operations (this is true whether it is IRQ context or
> SOFTIRQ context).
> 
> Doing the translate during `CWRITER` emulation accounts it to the
> right place, but introduces a potentially long synchronous operation
> which ties down a VCPU. Introducing batching here means we have
> essentially the same issue wrt when to replenish the translated queue
> as doing translate during scheduling.
> 
> Translate during `CWRITER` also has memory overheads. Unclear if they
> are at a problematic scale or not.
> 
> XXX need a solution for this.

Command translation can be improved. It may be good to add a section
explaining how the translation of a given command can be done.

> ### pITS Scheduling
> 
> A pITS scheduling pass is attempted:
> 
> * On write to any virtual `CWRITER` iff that write results in there
>   being new outstanding requests for that vits;
> * On read from a virtual `CREADR` iff there are commands outstanding
>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)
> 
> This may result in lots of contention on the scheduler
> locking. Therefore we consider that in each case all which happens is
> triggering of a softirq which will be processed on return to guest,
> and just once even for multiple events.
> 
> Such deferal could be considered OK (XXX ???) for the `CREADR` case

deferral?

> because at worst the value read will be one cycle out of date. A guest
> which receives an `INT` notification might reasonably expect a
> subsequent read of `CREADR` to reflect that. However that should be
> covered by the softint processing which would occur on entry to the
> guest to inject the `INT`.
> 
> Each scheduling pass will:
> 
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).

[..]

> ### Filling the pITS Command Queue.
> 
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e not from the top of the list each time).
> 
> If a `vits_cq` has no pending commands then it is removed from the
> list.
> 
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
> 
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full or there are no more outstanding commands.
> 
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and corresponding the index into
> the virtual command queue, for use when completing a command.
> 
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> 
> Possible simplification: If we arrange that no guest ever has multiple
> batches in flight (which can occur if we wrap around the list several
> times) then we may be able to simplify the book keeping
> required. However this may need some careful thought wrt fairness for
> guests submitting frequent small batches of commands vs those sending
> large batches.
> 
> XXX concern: Time spent filling the pITS queue could be significant if
> guests are allowed to fill the ring completely.

I guess you sent this design before the end of the discussion? I think
that limiting the number of batches/commands sent per pass would keep
each pass short.
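
Something along these lines, as a sketch only. PITS_PASS_BUDGET, the
function name and the plain array standing in for `pits.schedule_list`
are assumptions of mine, not part of the draft; VITS_BATCH_SIZE and the
min() of free slots, outstanding commands and batch size are from the
text quoted above. Unlike the draft, idle queues are skipped rather than
removed from the list.

```c
#include <stddef.h>

#define VITS_BATCH_SIZE   4u   /* per-vITS batch, "4 or 8" in the draft */
#define PITS_PASS_BUDGET  16u  /* hypothetical cap on commands per pass */

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical refill pass. outstanding[i] is the number of commands
 * pending on vits_cq i. The walk starts at *cursor (where the previous
 * pass stopped), visits each queue at most once, and stops as soon as
 * either the free slots or the per-pass budget are used up.
 * Returns the number of commands queued during this pass.
 */
static unsigned int pits_refill_pass(unsigned int *outstanding, size_t nr_cq,
                                     size_t *cursor, unsigned int free_slots)
{
    unsigned int budget = min_uint(free_slots, PITS_PASS_BUDGET);
    unsigned int queued = 0;
    size_t visited;

    if ( nr_cq == 0 )
        return 0;

    for ( visited = 0; visited < nr_cq && queued < budget; visited++ )
    {
        size_t i = (*cursor + visited) % nr_cq;
        unsigned int n = min_uint(min_uint(budget - queued, outstanding[i]),
                                  VITS_BATCH_SIZE);

        /* Translation and the copy to the pITS queue would happen here. */
        outstanding[i] -= n;
        queued += n;
    }

    /* Resume after the last queue visited, not from the top of the list. */
    *cursor = (*cursor + visited) % nr_cq;

    return queued;
}
```

The work done in one pass is then bounded by PITS_PASS_BUDGET commands
and one walk of the list, regardless of how full the guests' rings are.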

[..]

> ### Multiple vITS instances in a single guest
> 
> As described above each vITS maps to exactly one pITS (while each pITS
> serves multiple vITSs).
> 
> It could be possible to arrange that a vITS can enqueue commands to
> different pITSs depending on e.g. the device id.
> 
> However each approach has issues.
> 
> In 1 vITS per pITS:
> 
> * Exposing on vITS per pITS means that we are exposing something about

s/on/one/

>   the underlying hardware to the guest.
> * Adds complexity to the guest layout, which is right now static. How
>   do you decide the number of vITS/root controller exposed:
>     * Hotplug is tricky
> * Toolstack needs greater knowledge of the host layout
> * Given that PCI passthrough doesn't allow migration, maybe we could
>   use the layout of the hardware.
> 
> In 1 vITS for all pITS:
> 
> * What to do with global commands? Inject to all pITS and then
>   synchronise on them all finishing.
> * Handling of out of order completion of commands queued with
>   different pITS, since the vITS must appear to complete in
>   order. Apart from the book keeping question it makes scheduling more
>   interesting:
>     * What if you have a pITS with slots available, and the guest command
>       queue contains commands which could go to the pITS, but behind ones
>       which are targetting another pITS which has no slots
>     * What if one pITS is very busy and another is mostly idle and a
>       guest submits one command to the busy one (contending with other
>       guest) followed by a load of commands targeting the idle one. Those
>       commands would be held up in this situation.
>     * Reasoning about fairness may be harder.
> 
> XXX need a solution/decision here.

> In addition the introduction of direct interrupt injection in version
> 4 GICs may imply a vITS per pITS. (Update: it seems not)

Other items to add: NUMA and I/O NUMA. I don't know much about them, but
I think the first solution would be more suitable.

Regards,
-- 
Julien Grall
