
Re: [Xen-devel] Xen/arm: Virtual ITS command queue handling

Hi Ian,

On 13/05/15 14:23, Ian Campbell wrote:
> On Tue, 2015-05-12 at 18:35 +0100, Julien Grall wrote:
>>> On read from the virtual `CREADR` iff the vits_cq is such that
>> s/iff/if/
> "iff" is a shorthand for "if and only if". Apparently not as common as I
> think it is though!

Oh ok. I wasn't aware of this shorthand.

>>> commands are outstanding then a scheduling pass is attempted (in order
>>> to update `vits_cq.creadr`). The current value of `vitq_cq.creadr` is
>>> then returned.
>>> ### pITS Scheduling
>> I'm not sure if the design document is the right place to talk about it.
>> If a domain dies during the process, how would it affect the scheduler?
> So I think we have to wait for them to finish.
> Vague thoughts:
>         We can't free a `vits_cq` while it has things on the physical
>         control queue, and we cannot cancel things which are on the
>         control queue.
>         So we must wait.
>         Obviously don't enqueue anything new onto the pits if
>         `d->is_dying`.


>         `domain_relinquish_resources()` waits (somehow, with suitable
>         continuations etc) for anything which the `vits_cq` has
>         outstanding to be completed so that the datastructures can be
>         cleared.
> ?

I think that would work.

> I've added that to a new section "Domain Shutdown" right after
> scheduling.


>>>   on that vits;
>>> * On receipt of an interrupt notification arising from Xen's own use
>>>   of `INT`; (see discussion under Completion)
>>> * On any interrupt injection arising from a guests use of the `INT`
>>>   command; (XXX perhaps, see discussion under Completion)
>> With all the solutions suggested, it is very likely that we will try
>> to execute multiple scheduling passes at the same time.
>> One way is to wait until the previous pass has finished. But that
>> would mean that the scheduler would be executed very often.
>> Or maybe you plan to offload the scheduler in a softirq?
> Good point.
> A soft irq might be one solution, but it is problematic during emulation
> of `CREADR`, when we would like to do a pass immediately to complete any
> operations outstanding for the domain doing the read.
> Or just using spin_try_lock and not bothering if one is already in
> progress might be another. But it has similar problems.
> Or we could defer only scheduling from `INT` (either guest or Xen's own)
> to a softirq but do ones from `CREADR` emulation synchronously? The
> softirq would be run on return from the interrupt handler but multiple
> such would be coalesced I think?

I think we could defer the scheduling to a softirq for CREADR too, if
the guest is using:
        - INT completion: vits.creadr would have been correctly updated when
receiving the INT in Xen.
        - polling completion: the guest will loop on CREADR. It will likely get
the info on the next read. The drawback is that the guest may lose a few
instruction cycles.

Overall, I don't think it's necessary to have an accurate CREADR.


>> AFAIU the process suggested, Xen will inject small batches as long as
>> the physical command queue is not full.
>> Let's take a simple case: only a single domain is using the vITS on the
>> platform. If it injects a huge number of commands, Xen will split it
>> into lots of small batches. All batches will be injected in the same
>> pass as long as they fit in the physical command queue. Am I correct?
> That's how it is currently written, yes. With the "possible
> simplification" above the answer is no, only a batch at a time would be
> written for each guest.
> BTW, it doesn't have to be a single guest, the sum total of the
> injections across all guests could also take a similar amount of time.
> Is that a concern?

Yes, it is; the example with only one guest was just easier to explain.

>> I think we have to restrict the total number of batches (i.e. for all
>> domains) injected in a single scheduling pass.
>> I would even tend to allow only one in-flight batch per domain. That
>> would limit the possible problem I pointed out.
> This is the "possible simplification" I think. Since it simplifies other
> things (I think) as well as addressing this issue I think it might be a
> good idea.

With the limit on the number of commands sent per batch, would the
fairness you were talking about in the design doc still be required?


>>> This assumes that there is no particular benefit to keeping the
>>> `CWRITER` rolling ahead of the pITS's actual processing.
>> I don't understand this assumption. CWRITER will always point to the
>> last command in the queue.
> Correct, but that might be ahead of where the pITS has actually gotten
> to (which we cannot see).
> What I am trying to say here is that there is no point in trying to
> eagerly complete things (by checking `CREADR`) such that we can write
> new things (and hence push `CWRITER` forward) just to keep ahead of the
> pITS' processing.

With your explanation IRL, I now understand this point better. Thanks
for the explanation.

>>> Therefore it is proposed that the restriction that a single vITS maps
>>> to one pITS be retained. If a guest requires access to devices
>>> associated with multiple pITSs then multiple vITS should be
>>> configured.
>> Having multiple vITS per domain brings other issues:
>>      - How do you know the number of ITS to describe in the device tree at 
>> boot?
> I'm not sure. I don't think 1 vs N is very different from the question
> of 0 vs 1 though, somehow the tools need to know about the pITS setup.

I don't see why the tools would need to know the pITS setup.

>>      - How do you tell to the guest that the PCI device is mapped to a
>> specific vITS?
> Device Tree or IORT, just like on native and just like we'd have to tell
> the guest about that mapping even if there was a single vITS.

Right, although a root controller can only be attached to one ITS.

It will be necessary to have multiple root controllers in the guest if
we passthrough devices using different ITSs.

Is pci-back able to expose multiple root controllers?

> I think the complexity of having one vITS target multiple pITSs is going
> to be quite high in terms of data structures and the amount of
> thinking/tracking scheduler code will have to do, mostly down to out of
> order completion of things put in the pITS queue.

I understand the complexity, but exposing one vITS per pITS means that
we are exposing the underlying hardware to the guest.

That brings a lot of complexity to the guest layout, which is right now
static. How do you decide the number of vITSs/root controllers exposed
(think about PCI hotplug)?

Given that PCI passthrough doesn't allow migration, maybe we could use
the layout of the hardware.

If we are going to expose multiple vITSs to the guest, we should only
use a vITS for guests using PCI passthrough, because migration won't be
compatible with it.


Julien Grall
