
Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling



On Wed, May 27, 2015 at 5:18 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> Here follows draft C based on previous feedback.
>
> Also at:
>
> http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}
>
> I think I've captured most of the previous discussion, except where
> explicitly noted by XXX or in other replies, but please do point out
> places where I've missed something.
>
> One area where I am pretty sure I've dropped the ball is on the
> completion and update of `CREADR`. That conversation ended up
> bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
> manage to get the various proposals straight. Since we've now agreed on
> N:N hopefully we can reach a conclusion (no pun intended) on the
> completion aspect too (sorry that this probably means rehashing at least
> a subset of the previous thread).
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@xxxxxxxxxx>
> % Draft C
>
> # Changelog
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settled on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> Interrupts (MSIs) from devices to LPIs injected on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> ## Device Identifiers
>
> Each device using the ITS is associated with a unique identifier.
>
> The device IDs are typically described via system firmware, e.g. the
> ACPI IORT table or via device tree.
>
> The number of device ids is variable and can be discovered via
> `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
> devices.
>
> ## Interrupt Collections
>
> Each interrupt is a member of an Interrupt Collection. This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor. The format
> of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> ## ITS Translation Table
>
> Message signalled interrupts are translated into an LPI via an ITS
> translation table which must be configured for each device which can
> generate an MSI.
>
> The ITS translation table maps the device id of the originating device
> into an Interrupt Collection and then into a target address.
>
> ## ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring a Translation Table for each device, via an in-memory ring
> shared between the CPU and the ITS controller. The ring is managed via
> the `GITS_CBASER` register and indexed by the `GITS_CWRITER` and
> `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages, minus
> one, making up the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
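>
> As a rough illustration, the implied geometry can be computed as
> follows (a sketch; the helper name is hypothetical):
>
>     #include <stdint.h>
>
>     #define ITS_CMD_SIZE 32U     /* each ITS command is 32 bytes */
>
>     /* Number of command slots implied by GITS_CBASER.Size (bits [7:0]). */
>     static inline uint32_t its_cmd_queue_slots(uint64_t gits_cbaser)
>     {
>         uint32_t pages = (gits_cbaser & 0xff) + 1;   /* pages minus one */
>
>         return pages * 4096U / ITS_CMD_SIZE;  /* max: 256*4096/32 = 32768 */
>     }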
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> ## LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or a write to `GICR_INVLPIR`) for the effect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementation to reread the configuration table at any time (IOW it
> is _not_ guaranteed that a change to the LPI Configuration Table won't
> be visible until an invalidate is issued).
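>
> As an illustrative sketch (assuming the configuration byte layout from
> the GIC spec: priority in bits [7:2], enable in bit [0]):
>
>     #include <stdbool.h>
>     #include <stdint.h>
>
>     #define LPI_PROP_ENABLE      (1U << 0)      /* bit [0]: LPI enabled */
>     #define LPI_PROP_PRIORITY(p) ((p) & 0xfcU)  /* bits [7:2]: priority */
>
>     /*
>      * Update one LPI's configuration byte. Per the above, the change
>      * is not guaranteed visible until an INV/INVALL/GICR_INVLPIR.
>      */
>     static void lpi_set_config(uint8_t *prop_table, uint32_t lpi,
>                                uint8_t priority, bool enabled)
>     {
>         uint8_t byte = LPI_PROP_PRIORITY(priority);
>
>         if ( enabled )
>             byte |= LPI_PROP_ENABLE;
>         prop_table[lpi - 8192] = byte;    /* LPI INTIDs start at 8192 */
>     }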
>
> ## LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> XXX there are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are not currently considered here. XXX
> Should they be/do they need to be?
>
> # Requirements
>
> Emulation should not block in the hypervisor for extended periods. In
> particular Xen should not busy wait on the physical ITS. Doing so
> blocks the physical CPU from doing anything else (such as scheduling
> other VCPUs).
>
> There may be multiple guests which have a vITS, all targeting the same
> underlying pITS. A single guest VCPU should not be able to monopolise
> the pITS via its vITS and all guests should be able to make forward
> progress.
>
> # vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> We assume that a given device is only associated with one pITS.
>
> A guest which is given access to multiple devices associated with
> multiple pITSs will need to be given virtualised access to all
> associated pITSs.
>
> There are several possible models for achieving this:
>
> * `1:N`: One virtual ITS tied to multiple physical ITSs.
> * `N:N`: One virtual ITS per physical ITS.
> * `M:N`: Multiple virtual ITS tied to a differing number of physical ITSs.
>
> This design assumes an `N:N` model, which is thought to be simpler on
> the Xen side since it avoids questions of how to fairly schedule
> commands in the `1:N` model while avoiding starvation as well as
> simplifying the virtualisation of global commands such as `INVALL` or
> `SYNC`.
>
> The `N:N` model is also a better fit for I/O NUMA systems.
>
> Since the choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables, we are not tied to this
> model as an ABI if we decide to change.
>
> New toolstack domctls or extension to existing domctls will likely be
> required to allow the toolstack to determine the number of vITS which
> will be required for the guest and to determine the mapping for
> passed-through devices.
>
> # LPI Configuration Table Virtualisation
>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the physical LPI Configuration Table.
>
> A host wide LPI dirty bit map, with 1 bit per LPI, will be maintained
> which indicates whether an update to the physical LPI Configuration
> Table has been flushed (via an invalidate command). The corresponding
> bit will be set whenever a guest changes the configuration of an LPI.
>
> This dirty bit map will be used during the handling of relevant ITS
> Commands (`INV`, `INVALL` etc).
>
> Note that no invalidate is required during the handling of an LPI
> Configuration Table trap.
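>
> A minimal sketch of the trap handler's bookkeeping (all names
> hypothetical; `vlpi_to_plpi()` stands in for whatever vLPI to pLPI
> lookup we end up with):
>
>     /* Host-wide dirty map, 1 bit per LPI. */
>     static unsigned long *lpi_dirty_bitmap;
>     static uint8_t *host_lpi_prop_table;
>
>     /* Stage-2 write fault to the guest's LPI Configuration Table. */
>     static void vlpi_config_write(struct domain *d, uint32_t vlpi,
>                                   uint8_t new_byte)
>     {
>         uint32_t plpi = vlpi_to_plpi(d, vlpi);
>
>         host_lpi_prop_table[plpi - 8192] = new_byte; /* propagate */
>         set_bit(plpi - 8192, lpi_dirty_bitmap);      /* mark unflushed */
>         /* No invalidate here; deferred to INV/INVALL handling. */
>     }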
>
> # Command Queue Virtualisation
>
> The command queue of each vITS is represented by a data structure:
>
>     struct vits_cq {
>         list_head schedule_list; /* Queued onto pits.schedule_list */
>         uint32_t creadr;         /* Virtual creadr */
>         uint32_t cwriter;        /* Virtual cwriter */
>         uint32_t progress;       /* Index of last command queued to pits */
>         [ Reference to command queue memory ]
>     };
>
> Each pITS has an associated data structure:
>
>     struct pits {
>         list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
>         uint32_t last_creadr;
>     };
>
> On write to the virtual `CWRITER` the cwriter field is updated and if
> that results in there being new outstanding requests then the vits_cq
> is enqueued onto pITS' schedule_list (unless it is already there).
>
> On read from the virtual `CREADR`, iff the vits_cq has commands
> outstanding then a scheduling pass is attempted (in order
> to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
> then returned.
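>
> A sketch of both register emulations (assuming a per-pITS lock and a
> `vits_schedule()` helper which performs the scheduling pass described
> below):
>
>     /* Guest write to the virtual GITS_CWRITER. */
>     static void vits_cwriter_write(struct pits *pits, struct vits_cq *vcq,
>                                    uint32_t new_cwriter)
>     {
>         spin_lock(&pits->lock);
>         vcq->cwriter = new_cwriter;
>         /* New outstanding commands: enqueue if not already queued. */
>         if ( vcq->cwriter != vcq->creadr &&
>              list_empty(&vcq->schedule_list) )
>             list_add_tail(&vcq->schedule_list, &pits->schedule_list);
>         spin_unlock(&pits->lock);
>     }
>
>     /* Guest read from the virtual GITS_CREADR. */
>     static uint32_t vits_creadr_read(struct pits *pits, struct vits_cq *vcq)
>     {
>         if ( vcq->cwriter != vcq->creadr )   /* commands outstanding */
>             vits_schedule(pits);             /* attempt a scheduling pass */
>         return vcq->creadr;
>     }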
>
> ## Command translation
>
> In order to virtualise the Command Queue each command must be
> translated (this is described in the GIC spec).
>
> Translation of certain commands is potentially expensive, however we
> will attempt to arrange things (data structures etc) such that the
> overhead at translation time is minimised (see later).
>
> Translation can be done in two places:
>
> * During scheduling.
> * On write to `CWRITER`, into a per `vits_cq` queue which the
>   scheduler then propagates to the pits.
>
> Doing the translate during scheduling means that potentially expensive
> operations may be accounted to `current`, who may have nothing to do
> with those operations (this is true whether it is IRQ context or
> SOFTIRQ context).
>
> Doing the translate during `CWRITER` emulation accounts it to the
> right place, but introduces a potentially long synchronous operation
> which ties down a VCPU. Introducing batching here means we have
> essentially the same issue wrt when to replenish the translated queue
> as doing translate during scheduling.
>
> Translating during `CWRITER` emulation also has memory overheads. It is
> unclear whether these are at a problematic scale or not.
>
> Since we have arranged for translation overheads to be minimised it
> seems that translation during scheduling should be tolerable.
>
> ## pITS Scheduling
>
> A pITS scheduling pass is attempted:
>
> * On write to any virtual `CWRITER` iff that write results in there
>   being new outstanding requests for that vits;

   You mean, a scheduling pass (softirq trigger) is triggered iff there are
no ongoing requests from that vits?

> * On read from a virtual `CREADR` iff there are commands outstanding
>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)
>
> This may result in lots of contention on the scheduler
> locking. Therefore we consider that in each case all that happens is
> the triggering of a softirq which will be processed on return to the
> guest, and just once even for multiple events.

Is it required for all of these cases to trigger a scheduling pass?
Isn't triggering just on CWRITER (when there is no ongoing request) and
on Xen's own completion INT sufficient?

>
> Such deferral could be considered OK (XXX ???) for the `CREADR` case
> because at worst the value read will be one cycle out of date. A guest
> which receives an `INT` notification might reasonably expect a
> subsequent read of `CREADR` to reflect that. However that should be
> covered by the softirq processing which would occur on entry to the
> guest to inject the `INT`.
>
> Each scheduling pass will:
>
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).
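>
> In outline (a sketch; the shadow slot bookkeeping is per the "Filling
> the pITS Command Queue" section below, helper names hypothetical):
>
>     /* One scheduling pass over a pITS. */
>     static void vits_schedule(struct pits *pits)
>     {
>         uint32_t creadr = readl(pits->base + GITS_CREADR);
>
>         /* Process completion of each command retired by the pITS. */
>         while ( pits->last_creadr != creadr )
>         {
>             struct pits_slot *slot = shadow_slot(pits, pits->last_creadr);
>
>             /* Advance the owning vITS's virtual CREADR past this
>              * command. */
>             slot->vcq->creadr = slot->next_vcreadr;
>             pits->last_creadr = next_cmd(pits, pits->last_creadr);
>         }
>
>         vits_refill(pits);   /* refill, see below */
>     }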
>
> ## Domain Shutdown
>
> We can't free a `vits_cq` while it has things on the physical command
> queue, and we cannot cancel things which are on the command queue.
>
> So we must wait.
>
> Obviously don't enqueue anything new onto the pits if `d->is_dying`.
>
> `domain_relinquish_resources()` waits (somehow, with suitable
> continuations etc) for anything which the `vits_cq` has outstanding to
> be completed so that the data structures can be cleared.
>
> ## Filling the pITS Command Queue.
>
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e. not from the top of the list each time).
>
> In order to simplify bookkeeping and to bound the amount of time spent
> on a single scheduling pass each `vits_cq` will only have a single
> batch of commands enqueued with the pITS at a time.
>
> If a `vits_cq` has no pending commands then it is removed from the
> list.
>
> If a `vits_cq` already has commands enqueued with the pITS Command
> Queue then it is skipped.
>
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
>
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full, there are no more outstanding commands or each active
> `vits_cq` has commands enqueued with the pITS.
>
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and the corresponding index into
> the virtual command queue, for use when completing a command.
>
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
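>
> Putting the above together, a sketch of the refill step (helper names
> hypothetical; for brevity this walks from the head of the list rather
> than resuming where the previous refill finished):
>
>     #define VITS_BATCH_SIZE 4   /* TBD, see above */
>
>     static void vits_refill(struct pits *pits)
>     {
>         struct vits_cq *vcq, *tmp;
>
>         list_for_each_entry_safe(vcq, tmp, &pits->schedule_list,
>                                  schedule_list)
>         {
>             uint32_t take = pits_free_slots(pits);
>
>             if ( !take )
>                 break;                     /* pITS Command Queue full */
>             if ( !vcq_outstanding(vcq) )
>             {
>                 list_del_init(&vcq->schedule_list);  /* nothing pending */
>                 continue;
>             }
>             if ( vcq_batch_enqueued(vcq) )
>                 continue;          /* already has a batch on the pITS */
>
>             take = min(take, vcq_outstanding(vcq));
>             take = min(take, (uint32_t)VITS_BATCH_SIZE);
>             translate_batch(pits, vcq, take);  /* translate, copy, shadow */
>             vcq->progress += take;             /* modulo virtual queue size */
>         }
>     }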
>
> ## Completion
>
> It is expected that commands will normally be completed (resulting in
> an update of the corresponding `vits_cq.creadr`) via guest read from
> `CREADR`. This will trigger a scheduling pass which will ensure the
> `vits_cq.creadr` value is up to date before it is returned.
>
> A guest which does completion via the use of `INT` cannot observe
> `CREADR` without reading it, so updating on read from `CREADR`
> suffices from the point of view of the guest's observation of the
> state. (Of course we will inject the interrupt at the designated point
> and the guest may well then read `CREADR`.)
>
> However in order to keep the pITS Command Queue moving along we need
> to consider what happens if there are no `INT` based events nor reads
> from `CREADR` to drive completion and therefore refilling of the Queue
> with other outstanding commands.
>
> A guest which enqueues some commands and then never checks for
> completion cannot itself block things because any other guest which
> reads `CREADR` will drive completion. However if _no_ guest reads from
> `CREADR` then completion will not occur and this must be dealt with.
>
> Even if we include completion on `INT`-based interrupt injection then
> it is possible that the pITS queue may not contain any such
> interrupts, either because no guest is using them or because the
> batching means that none of them are enqueued on the active ring at
> the moment.
>
> So we need a fallback to ensure that the queue keeps moving. There are
> several options:
>
> * A periodic timer in Xen which runs whenever there are outstanding
>   commands in the pITS. This is simple but pretty sucky.
> * Xen injects its own `INT` commands into the pITS ring. This requires
>   figuring out a device ID to use.
>
> The second option is likely to be preferable if the issue of selecting
> a device ID can be addressed.
>
> A secondary question is when these `INT` commands should be inserted
> into the command stream:
>
> * After each batch taken from a single `vits_cq`;

   Is this not enough? Because a scheduling pass sends just one batch of
commands with Xen's INT command.

> * After each scheduling pass;
> * One active in the command stream at any given time;
>
> The latter should be sufficient: by arranging to insert an `INT` into
> the stream at the end of any scheduling pass which occurs while there
> is no currently outstanding `INT`, we have a sufficient backstop to
> allow us to refill the ring.
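>
> i.e. at the end of each scheduling pass, something like (a sketch;
> `XEN_ITS_DEVID`/`XEN_ITS_EVENTID` being whatever device ID we settle
> on per the discussion above):
>
>     static void vits_maybe_insert_backstop_int(struct pits *pits)
>     {
>         if ( pits->xen_int_outstanding )  /* one active at a time */
>             return;
>         if ( !pits_commands_in_flight(pits) || !pits_free_slots(pits) )
>             return;
>         its_send_int(pits, XEN_ITS_DEVID, XEN_ITS_EVENTID);
>         pits->xen_int_outstanding = true; /* cleared on completion */
>     }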
>
> This assumes that there is no particular benefit to keeping the
> `CWRITER` rolling ahead of the pITS's actual processing. This is true
> because the ITS operates on commands in the order they appear in the
> queue, so there is no need to maintain a runway ahead of the ITS
> processing. (XXX If this is a concern perhaps the INT could be
> inserted at the head of the final batch of commands in a scheduling
> pass instead of the tail).
>
> Xen itself should never need to issue an associated `SYNC` command,
> since the individual guests would need to issue those themselves when
> they care. The `INT` only serves to allow Xen to enqueue new commands
> when there is space on the ring; Xen itself has no interest in the
> actual completion.
>
> ## Locking
>
> It may be preferable to use `atomic_t` types for various fields
> (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> locking required.
>
> # ITS Command Translation
>
> This section is based on section 5.13 of the GICv3 specification
> (PRD03-GENC-010745 24.0). The goal is to provide insight into the cost
> of emulating ITS commands in Xen.
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. Possible command parameters are the device ID
> (`ID`), Event ID (`vID`), Collection ID (`vCID`) and Target Address
> (`vTA`).
>
> These parameters need to be validated and translated from Virtual to
> Physical.
>
> ## Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> use in Xen or before being passed to the hardware.
>
> ### Device ID (`ID`)
>
> This parameter is used by commands which manage a specific device and
> the interrupts associated with that device. Checking if a device is
> present and retrieving the data structure must be fast.
>
> The device identifiers may not be assigned contiguously and the maximum
> number is very high (2^32).
>
> XXX In the context of virtualised device ids this may not be the case,
> e.g. we can arrange for (mostly) contiguous device ids and we know the
> bound is significantly lower than 2^32
>
> Possible efficient data structures would be:
>
> 1. List: Lookup/deletion is O(n) and the insertion cost depends on
>    whether the list is kept sorted by identifier. The memory overhead
>    is 18 bytes per element.
> 2. Red-black tree: All the operations are O(log(n)). The memory
>    overhead is 24 bytes per element.
>
> A red-black tree seems more suitable for fast device ID validation
> even though the memory overhead is a bit higher compared to the
> list.
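>
> For example, using Xen's rbtree (a sketch; field and helper names
> hypothetical):
>
>     #include <xen/rbtree.h>
>
>     struct its_device {
>         struct rb_node node;
>         uint32_t devid;
>         /* ... ITT address/size, vlpi_map, owning domain, ... */
>     };
>
>     static struct its_device *its_device_find(struct rb_root *root,
>                                               uint32_t devid)
>     {
>         struct rb_node *n = root->rb_node;
>
>         while ( n )
>         {
>             struct its_device *dev = rb_entry(n, struct its_device, node);
>
>             if ( devid < dev->devid )
>                 n = n->rb_left;
>             else if ( devid > dev->devid )
>                 n = n->rb_right;
>             else
>                 return dev;              /* O(log n) lookup */
>         }
>         return NULL;
>     }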

When PHYSDEVOP_pci_device_add is called, memory for the its_device
structure and the other structures needed for this device is allocated
and the device is added to the RB-tree with all the necessary
information.

>
> ### Event ID (`vID`)
>
> This is the per-device Interrupt identifier (i.e. the MSI index). It
> is configured by the device driver software.
>
> It is not necessary to translate a `vID`, however they may need to be
> represented in various data structures given to the pITS.
>
> XXX is any of this true?
>
> ### Interrupt Collection (`vCID`)
>
> This parameter is used in commands which manage collections and
> interrupts in order to move them from one CPU to another. The ITS is
> only mandated to implement N + 1 collections where N is the number of
> processors on the platform (i.e. the max number of VCPUs for a given
> guest). Furthermore, the identifiers are always contiguous.
>
> If we decide to implement the strict minimum (i.e N + 1), an array is
> enough and will allow operations in O(1).
>
> XXX Could forgo array and go straight to vcpu_info/domain_info.
>
> ### Target Address (`vTA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor. The format differs depending on the
> value of the `GITS_TYPER.PTA` bit. The value of the field is fixed by
> the ITS implementation and the software has to handle the 2 cases.
>
> A solution with `GITS_TYPER.PTA` set to one will require some
> computation in order to find the VCPU associated with the
> redistributor address. It will be similar to get_vcpu_from_rdist in
> the vGICv3 emulation (xen/arch/arm/vgic-v3.c).
>
> On the other hand, setting `GITS_TYPER.PTA` to zero gives us control
> over the linear processor number, which could simply be the vcpu_id
> (always linear).
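>
> In which case the translation is trivial (a sketch, assuming linear
> vcpu_ids):
>
>     /* vTA is a linear processor number == vcpu_id when PTA is 0. */
>     static struct vcpu *vta_to_vcpu(struct domain *d, uint64_t vta)
>     {
>         if ( vta >= d->max_vcpus )
>             return NULL;          /* invalid target: fail the command */
>         return d->vcpu[vta];
>     }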
>
> XXX Non-linear VCPUs e.g. via the Aff1 hierarchy?
>
> ## Command Translation
>
> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> potentially time-consuming commands as they create entries in
> the Xen ITS structures, which are used to validate other ITS commands.
>
> `INVALL` and `SYNC` are global and potentially disruptive to other
> guests and so need consideration.
>
> All other ITS commands, like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`,
> just validate their parameters and generate a physical command.
>
> ### `MAPC` command translation
>
> Format: `MAPC vCID, vTA`
>
   -  GITS_TYPER.PTA is emulated as 0, hence vTA always represents a
      vcpu number. vTA is validated against the physical Collection IDs
      by querying the ITS driver and the corresponding Physical
      Collection ID is retrieved.
   -  Each vITS will have a cid_map (struct cid_mapping) which holds
      the mapping of Virtual Collection ID (vCID) and Virtual Target
      Address (vTA) to Physical Collection ID (pCID). If a vCID entry
      already exists in cid_map then that mapping is updated with the
      new pCID and vTA, else a new entry is made in cid_map.
   -  A MAPC pCID, pTA physical ITS command is generated.

   There is no overhead here: the cid_map entries are preallocated with
   size nr_cpus in the platform.
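
For concreteness, a sketch of such a preallocated map (struct name as
above, layout hypothetical):

    struct cid_mapping {
        uint32_t nr;           /* preallocated with nr_cpus entries */
        struct cid_entry {
            uint32_t vcid;     /* Virtual Collection ID */
            uint64_t vta;      /* Virtual Target Address (vcpu number) */
            uint32_t pcid;     /* Physical Collection ID */
            bool     valid;
        } map[];
    };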


> - `MAPC pCID, pTA` physical ITS command is generated
>
> ### `MAPD` Command translation
>
> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>
> `MAPD` is sent with the `Valid` bit set if the device is to be added and
> with it clear when the device is removed.
>
> If `Valid` bit is set:
>
> - Allocate memory for `its_device` struct
> - Validate ITT IPA & ITT size and update its_device struct
> - Find the number of vectors (nrvecs) for this device by querying a PCI
>   helper function
> - Allocate nrvecs LPIs. XXX nrvecs is a function of `ITT Size`?
> - Allocate memory for `struct vlpi_map` for this device. This
>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> - Find physical ITS node with which this device is associated
> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> - Validate ITT Size
> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>
> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>
> XXX Suggestion was to preallocate some of those at device passthrough
> setup time?

If the Valid bit is set:
   - Query the its_device tree and get the its_device structure for
     this device.
   - (XXX: If the pci device is hidden from dom0, is this device still
     added with the PHYSDEVOP_pci_device_add hypercall?)
   - If the device does not exist, return.
   - If the device exists in the RB-tree then:
        - Validate ITT IPA & ITT size and update the its_device struct
        - Check if the device is already assigned to the domain;
          if not then:
             - Find the number of vectors (nrvecs) for this device.
             - Allocate nrvecs LPIs.
             - Fetch the vlpi_map for this device (preallocated at the
               time of adding this device to Xen). This vlpi_map holds
               the mapping of Virtual LPI to Physical LPI and ID.
             - Call p2m_lookup on the ITT IPA address and get the
               physical ITT address.
             - Assign this device to this domain and mark it as
               enabled.
        - If this device already exists with the domain (the domain is
          remapping the device):
             - Validate ITT IPA & ITT size and update the its_device
               struct.
             - Call p2m_lookup on the ITT IPA address and get the
               physical ITT address.
             - Disable all the LPIs of this device by searching through
               the vlpi_map and LPI configuration table.
        - Generate/format the physical ITS command: MAPD, ITT PA,
          ITT Size

>
> If the `Valid` bit is not set:
>
> - Validate if the device exists by checking the vITS device list
> - Clear all `vlpis` assigned for this device
> - Remove this device from vITS list
> - Free memory
>
> XXX If preallocation presumably shouldn't free here either.
>

If the Valid bit is not set:
    - Validate that the device exists by checking the RB-tree and that
      it is assigned to this domain.
    - Disable all the LPIs associated with this device and release the
      irqs.
    - Clear all the vlpi mappings for this device.
    - Remove this device from the domain.

> ### `MAPVI`/`MAPI` Command translation
>
> Format: `MAPVI device, ID, vID, vCID`
>
> - Validate if the device exists by checking the vITS device list
> - Validate vCID and get the pCID by searching the cid_map
>
> - If vID does not have an entry in `vlpi_entries` of this device, allocate
>   a new pID from the `vlpi_map` of this device and update `vlpi_entries`
>   with the new pID
> - Allocate irq descriptor and add to RB tree
> - call `route_irq_to_guest()` for this pID
> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
>

- Validate that the device exists by checking the vITS device RB-tree.
- Validate vCID and get the pCID by searching the cid_map.
- If vID does not have an entry in the vlpi_entries of this device:
      - Allot a pID from the vlpi_map of this device and update
        vlpi_entries with the new pID.
      - Allocate an irq descriptor and add it to the RB-tree.
      - Call route_irq_to_guest() for this pID.
  If it exists:
      - If vCID is different (remapping interrupts to a different
        collection):
            - Disable the LPI
            - Update the vlpi_map
              (XXX: Enable the LPI on guest request?)
- Generate/format the physical ITS command: MAPVI device ID, pID, pCID

> Here the overhead is allocating the physical ID, allocating memory for
> the irq descriptor and routing the interrupt.
>
> XXX Suggested to preallocate?
>
> ### `INVALL` Command translation
>
> A physical `INVALL` is only generated if the LPI dirty bitmap has any
> bits set. Otherwise it is skipped.
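>
> i.e. something like (a sketch, reusing the host-wide dirty bitmap from
> the LPI Configuration Table section; names hypothetical):
>
>     /* Translate a guest INVALL; elide it when nothing is dirty. */
>     static void vits_handle_invall(struct pits *pits, uint32_t pcid)
>     {
>         if ( bitmap_empty(lpi_dirty_bitmap, nr_host_lpis) )
>             return;                       /* nothing unflushed: skip */
>
>         bitmap_zero(lpi_dirty_bitmap, nr_host_lpis);
>         its_send_invall(pits, pcid);      /* hypothetical emit helper */
>     }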
>
> XXX Perhaps bitmap should just be a simple counter?
>
> XXX bitmap is host global, a per-domain bitmap would allow us to elide
> `INVALL` unless an LPI associated with the guest making the request
> was dirty. Would also need some sort of "ITS INVALL" clock in order
> that other guests can elide their own `INVALL` if one has already
> happened. Complexity not worth it at this stage?
>
> ### `SYNC` Command translation
>
> Can be omitted from the physical command stream if the previous
> command was also a `SYNC`, i.e. due to a guest sending a series of
> `SYNC` commands or one guest's batch ending with one and the next's
> beginning with one.
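>
> A sketch of that elision (tracking the last command emitted on each
> pITS; names hypothetical):
>
>     /* Emit a guest SYNC, eliding back-to-back duplicates. */
>     static void vits_handle_sync(struct pits *pits, uint64_t pta)
>     {
>         if ( pits->last_cmd_was_sync )
>             return;                    /* previous command was a SYNC */
>
>         its_send_sync(pits, pta);      /* hypothetical emit helper */
>         pits->last_cmd_was_sync = true;
>         /* last_cmd_was_sync is cleared whenever any other command is
>          * enqueued. */
>     }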
>
> XXX TBD can we do anything more? e.g. omit sync if the guest hasn't
> done anything of importance since the last sync?
>
> # GICv4 Direct Interrupt Injection
>
> GICv4 will directly mark the LPIs pending in the virtual pending table
> which is per-redistributor (i.e per-vCPU).
>
> LPIs will be received by the guest the same way as SPIs, i.e. trap in
> IRQ mode then read `ICC_IAR1_EL1` (for GICv3).
>
> Therefore GICv4 will not require one vITS per pITS.
>
> # Event Channels
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> to the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0
>
>
