
Re: [Xen-devel] [Draft D] Xen on ARM vITS Handling



On 05/06/15 11:24, Ian Campbell wrote:
> On Thu, 2015-06-04 at 18:55 +0100, Julien Grall wrote:
>>> # Scope
>>>
>>> The ITS is rather complicated, especially when combined with
>>> virtualisation. To simplify things we initially omit the following
>>> functionality:
>>>
>>> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>>>   virtual Collections is a feature of GICv4, thus is omitted in this
>>>   design for GICv3. Physical interrupts which occur on a pCPU where
>>>   the target vCPU is not already resident will be forwarded (via IPI)
>>>   to the correct pCPU for injection via the existing
>>>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>>>   correctly).
>>> - Clearing of the pending state of an LPI under various circumstances
>>>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>>>   in guests seeing some perhaps spurious interrupts.
>>> - vITS functionality will only be available on 64-bit ARM hosts,
>>>   avoiding the need to worry about fast access to guest owned data
>>>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>>>   hosts can be considered to have access)
>>>
>>> XXX Can we assume that `GITS_TYPER.Devbits` will be sane,
>>> i.e. requiring support for the full 2^32 device ids would require a
>>> 32GB device table even for native, which is improbable except on
>>> systems with RAM measured in TB. So we can probably assume that
>>> Devbits will be appropriate to the size of the system. _Note_: We
>>> require per guest device tables, so size of the native Device Table is
>>> not the only factor here.
>>
>> As we control the vBDF we can control the vDevID.
> 
> Not for dom0 though, unfortunately.
> 
>> If we have a single PCI bus, the number won't be too high.
> 
> Right.
> 
> Perhaps we can make some useful simplification from assuming 1:1 for
> dom0 and a reduced Devbits space (controlled by us) for domU?
> 
>>> XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
>>
>> The GITS_TYPER.IDbits of who? The physical ITS?
> 
> Yes, although I think it then has an impact on what we virtualise, for
> the dom0 case in particular.

AFAICT it only impacts DOM0. For DOMU we are able to know the maximum
number of LPIs (based on the PCI devices assigned), so we can limit
GITS_TYPER.IDbits accordingly.
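
Something like the following, perhaps (a rough sketch only; the
per-domain vits_devices list, its entry field and the helper name are
assumptions based on the struct its_device quoted below, not existing
code):

    /* Sketch: compute the smallest GITS_TYPER.IDbits which covers the
     * events of every device assigned to the domain. */
    unsigned int vits_max_idbits(struct domain *d)
    {
        struct its_device *dev;
        unsigned int max_events = 0;

        list_for_each_entry ( dev, &d->arch.vits_devices, entry )
            max_events = max(max_events, dev->nr_events);

        if ( max_events < 2 )
            return 0;                  /* a single event needs 1 bit */

        /* Event IDs are 0..nr_events-1; IDbits encodes (width - 1). */
        return fls(max_events - 1) - 1;
    }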

> 
>>> ## Device Table
>>>
>>> The `pITS` device table will be allocated and given to the pITS at
>>> start of day.
>>
>> We don't really care about this. This is part of the memory provision at
>> initialization based on GITS_BASER*
>>
>> Furthermore, the ITS may already have the table in memory and
>> therefore this allocation is not necessary.
> 
> 
> I was just trying to indicate that the physical device was set up in
> some appropriate way. I've replaced this section with another sentence
> at the end of the previous one:
>         The physical ITS will be provisioned with whatever tables it requests
>         via its `GITS_BASERn` registers.
> 
>>> ## Per Device Information
>>>
>>> Each physical device in the system which can be used together with an
>>> ITS (whether using passthrough or not) will have associated with it a
>>> data structure:
>>>
>>>     struct its_device {
>>>         uintNN_t phys_device_id;
>>>         uintNN_t virt_device_id;
>>>         unsigned int *events;
>>>         unsigned int nr_events;
>>>         struct page_info *pitt;
>>>         unsigned int nr_pitt_pages
>>
>> You need to have a pointer to the pITS associated.
> 
> In general I suppose that is likely; so far as this particular vits
> spec goes, is there a specific use here which requires it?

Well, you were speaking about the structure associated with a physical
device, which provides information about the physical ITS; that is not,
strictly speaking, vITS specific.
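
i.e. something like (a sketch; the pits type and field name are
illustrative):

    struct its_device {
        struct pits *pits;      /* physical ITS this device sits behind */
        uintNN_t phys_device_id;
        uintNN_t virt_device_id;
        /* ... rest as in the draft ... */
    };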

> I'll add the field, but I'll also add:
> 
>       /* Other fields relating to pITS maintenance but unrelated to vITS */
> 
>>> ## Device Discovery/Registration and Configuration
>>>
>>> Per device information will be discovered based on firmware tables (DT
>>> or ACPI) and information provided by dom0 (e.g. registration via
>>> PHYSDEVOP_pci_device_add or new custom hypercalls).
>>>
>>> This information shall include at least:
>>>
>>> - The Device ID of the device.
>>> - The maximum number of Events which the device is capable of
>>>   generating.
>>
>> Well, the maximum number of Events doesn't need to come through a new
>> hypercall. We can directly retrieve it via the PCI CFG space.
> 
> I don't think it matters; the point of this paragraph is that we know
> the values of all these things from somewhere before the device can be
> used with the ITS.
> 
> Anyway, I added PCI cfg space to the e.g. in the first paragraph. From
> the point of view of this spec I don't really care about the details of
> where the nr events comes from.
> 
>>> (IPA, size, cache attributes to use when mapping) will be recorded in
>>> `struct domain`.
>>
>> map_domain_page assumes that the memory is WB/WT. What happens if the
>> guest decides to choose a different attribute?
> 
> This is an interesting point. Although GITS_BASERn (and similar
> registers) allows the cacheability of the tables to be specified by the
> OS, it's not clear to me why: the tables are essentially opaque to the
> OS, so why would it be accessing them at all?

I have no idea why they require the cacheability to be specified in
GITS_BASER.

> 
> IOW so long as Xen always uses the same attributes for its mappings I'm
> not sure we need to worry about this. Anything odd which occurs because
> the guest touches the memory with the wrong cache attribute would be
> equally odd if the guest touched it with the matching one.

> 
>>
>>>
>>> All other `GITS_BASERn.Valid == 0`.
>>>
>>> ## vITS to pITS mapping
>>>
>>> A physical system may have multiple physical ITSs.
>>>
>>> With the simplified vits command model presented here only a single
>>> `vits` is required.
>>
>> What about DOM0? We would have to browse the firmware table (ACPI or DT)
>> to replace the ITS parent.
> 
> That's ok though, isn't it?
> 
>>> On write `bit[0]` of the written byte is the enable/disable state for
>>> the irq and is handled thus:
>>>
>>>     lpi = (addr - table_base) + 8192; /* LPI INTIDs start at 8192 */
>>>     if ( byte & 1 )
>>>         enable_irq(lpi);
>>>     else
>>>         disable_irq(lpi);
>>>
>>> Note that in the context of this emulation we do not have access to a
>>> Device ID, and therefore cannot make decisions based on whether the
>>> LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
>>> hand and not an `event`, IOW we would need to do a _reverse_ lookup in
>>> the ITT.
>>
>> I'm struggling to see how this would work. After enabling/disabling an
>> IRQ you need to send an INV, which requires the devID and the eventID.
> 
> This is a detail for the pits driver, which is deliberately out of scope
> here. The interface to the core Xen system is enable_irq/disable_irq and
> so that is what we must use here.
> 
> I imagine the pits driver will need to store some details about this
> somewhere for its own internal purposes, perhaps in struct its_device.

It's rather strange to put this out of scope because it would have
solved your issue (i.e. getting the vDevID and eventID).

If you have a mapping vLPI -> pLPI, you can get the its_device which
stores the vDevID, and from that you can derive the eventID.

With that you can decide what to do.
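
Roughly (a sketch only; the lookup helpers, the cfg base field and
its_send_inv are assumptions, not existing interfaces):

    /* Sketch: on a guest write to its LPI CFG table, everything the
     * INV needs can be recovered from Xen-owned state. */
    void vits_cfg_write(struct domain *d, paddr_t addr, uint8_t byte)
    {
        unsigned int vlpi = (addr - d->arch.vits_cfg_base) + 8192;
        struct irq_desc *desc = vlpi_to_desc(d, vlpi); /* vLPI -> pLPI */
        struct its_device *dev = irq_get_guest(desc)->its_device;
        unsigned int event = plpi_to_event(dev, desc->irq);

        if ( byte & 1 )
            enable_irq(desc->irq);
        else
            disable_irq(desc->irq);

        /* The INV needs (devID, eventID); both are now known. */
        its_send_inv(dev->pits, dev->phys_device_id, event);
    }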

>> Also, until now you haven't explained how the vLPI will be mapped to the
>> pLPI.
> 
> I didn't think anything here so far needed that mapping, so of course I
> didn't explain it.
> 
> If some alternative scheme which you are proposing (or solution to an
> outstanding issue) requires this mapping then we would have to add it.
> 
>>> LPI priority (the remaining bits in the written byte) is currently
>>> ignored.
>>>
>>> ## LPI Pending Table Virtualisation
>>>
>>> XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
>>> sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.
>>>
>>> ## Device Table Virtualisation
>>>
>>> The IPA, size and cacheability attributes of the guest device table
>>> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>>>
>>> In order to lookup an entry for `device`:
>>>
>>>     define {get,set}_vdevice_entry(domain, device,
>>>                                    struct vdevice_table *entry):
>>>         offset = device*sizeof(struct vdevice_table)
>>>         if offset > <DT size>: error
>>>
>>>         dt_entry = <DT base IPA> + offset
>>>         page = p2m_lookup(domain, dt_entry, p2m_ram)
>>>         if !page: error
>>>         /* nb: non-RAM pages, e.g. grant mappings,
>>>          * are rejected by this lookup */
>>
>> It is allowed to pass device memory here. See Device-nGnRnE.
> 
> The guest is allowed to tell us that we should use RAM with strongly
> ordered etc device-like semantics, but telling us to use an actual MMIO
> region is nuts, do you really think we need to support that case?

I agree that it may be strange, but I wasn't able to find a place which
rules out this possibility...

Though, it may be possible to have an MMIO region used specifically for
the DOM0 ITS (?).
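
If we did decide to allow it, the draft's lookup would need to accept
more than plain RAM, e.g. (in the draft's pseudo-code; the type names
are illustrative):

    page = p2m_lookup(domain, dt_entry, p2m_any)
    if !page: error
    /* accept RAM or direct-mapped MMIO; still reject grant mappings */
    if type(page) not in { p2m_ram, p2m_mmio_direct }: error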

> 
>>>         d = irq_guest->domain
>>>         virq = irq_guest->virq
>>>         its_device = irq_guest->its_device
>>>
>>>         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>>>         vcpu = d->its_collections[vitt.collection]
>>>         vgic_vcpu_inject_irq(d, &d->vcpus[vcpu])
>>
>> Shouldn't you pass at least the vLPI?
> 
> Yes, I meant
> 
>         vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vlpi)
> 
>> We would also need to ensure that vitt.vlpi is actually an LPI.
>> Otherwise the guest may provide an invalid value (such as 1023) which
>> could potentially crash the platform.
> 
> Yes. This sort of thing is largely implicit as far as the error handling
> goes, i.e. error handling should be done at all stages.
> 
> I added a specific line about this though.
> 
>>> - Not `MAPD`d -- on `MAPD` enable all associated LPIs which are enabled
>>>   in LPI CFG Table.
>>> - Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable LPI if enabled in
>>>   CFG Table.
>>> - Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
>>>   virtual collection. A `list_head` in `struct irq_guest` implies a
>>>   fair bit of overhead, number of LPIs per collection is the total
>>>   number of LPIs (guest could assign all to the same
>>>   collection). Could walk entire LPI CFG and reenable all set LPIs,
>>>   might involve walking over several KB of memory though. Could inject
>>>   unmapped collections to vcpu0, forcing the guest to deal with the
>>>   spurious interrupts?
>>>
>>> XXX Only the `MAPC` issue seems problematic. Is this a critical issue or
>>> can we get away with it?
>>
>> This is not the only issue. With your solution, you leave open the
>> possibility of the guest having one vLPI for multiple (devID, ID)/LPI
>> pairs.
> 
> I'm not sure, but isn't this UNPREDICTABLE or something? The "Where:"
> under Figure 19 (and most of the other related tables) suggests that the
> identifier must be unique. I'd like to find a stronger statement if
> we were to rely on this though, true.

I think so. But you are assuming that the guest is nice and will never
touch the memory provisioned to the ITS (ITT, Device Table...).

The vLPI is stored in guest memory. A malicious guest could decide to
modify different ITT entries to point to the same vLPI. It would then be
impossible to know which pLPI we have to EOI.

We can't validate this case when the interrupt is mapped via MAPVI, as
the guest can modify the memory afterwards. And checking at injection
time would require browsing all the ITTs.

So, shall we trust a guest using the vITS? If not, I'm afraid we can't
make use of memory provisioned by the guest.

FWIW, this is not the first place in this proposal where you assume the
tables won't be modified by the guest.
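
One possible way out, sketched (all names illustrative): keep the
vLPI <-> pLPI association in Xen-owned memory, populated when MAPVI is
emulated, so a guest rewriting its ITT afterwards cannot confuse the
EOI path:

    struct vlpi_map {
        struct irq_desc *desc;    /* pLPI descriptor, NULL if unmapped */
    };

    int vits_map_vlpi(struct domain *d, unsigned int vlpi,
                      struct irq_desc *desc)
    {
        if ( vlpi < 8192 || vlpi - 8192 >= d->arch.vits_nr_lpis )
            return -EINVAL;

        /* Enforce one pLPI per vLPI, whatever the guest's ITT says. */
        if ( d->arch.vlpi_map[vlpi - 8192].desc )
            return -EBUSY;

        d->arch.vlpi_map[vlpi - 8192].desc = desc;
        return 0;
    }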

> 
>>  Even if we
>> validate it when the MAPI/MAPVI command is sent, the guest could
>> itself modify the ITT afterwards.
>>
>> Furthermore, each virtual IRQ of a domain is associated with a struct
>> pending_irq. This structure contains an irq_desc which is a pointer to
>> the physical IRQ. With a crafted ITT the guest could mess up those
>> data structures. Although I think we can get back on our feet after the
>> domain is destroyed.
>>
>> After reading this draft, I think we can avoid browsing the Device
>> Table and the ITT. As said in the previous paragraph, the pending_irq
>> structure is linked to an irq_desc. In your proposal, you suggested
>> storing the its_device in the irq_guest (part of irq_desc). If we make
>> use of pending_irq->desc to store the physical descriptor, we can have
>> a mapping vLPI <=> pLPI. Therefore, this would resolve UI1 and, AFAICT,
>> the memory usage in Xen would be the same as for the ITT/Device Table
>> based solution.
> 
> If you have a scheme along those lines that would be good. From your
> next message:
> 
>> BTW, I will suggest pseudo-code based on your draft tomorrow. It may
>> be easier to understand.
> 
> Yes, please, but please update everything which is affected, adding new
> sections as necessary etc. so the document remains a self-consistent,
> plausible design.

I'm afraid to say that I won't be able to come up with a full proposal
today, and I will be away from tonight until the 15th of June.

I can try to suggest a few leads, but I still need to think about the
INT problem.

Regards,

-- 
Julien Grall
