
Re: [Xen-devel] [RFC] [Draft Design] ACPI/IORT Support in Xen.



Hi,

On 25/10/17 09:22, Manish Jaggi wrote:
>
>
> On 10/23/2017 7:27 PM, Andre Przywara wrote:
>> Hi Manish,
>>
>> On 12/10/17 22:03, Manish Jaggi wrote:
>>> ACPI/IORT Support in Xen.
>>> --------------------------------------
>>>
>>> I had sent out a patch series [0] to hide the SMMU from the Dom0 IORT.
>>> Extending the scope here to include all that is required to support
>>> ACPI/IORT in Xen. Presented for review is the first _draft_ of the
>>> design of ACPI/IORT support in Xen. It is not complete yet.
>>>
>>> Discussed is the parsing and generation of the IORT table for Dom0 and
>>> DomUs. It is proposed that the IORT be parsed and the information saved
>>> into a Xen data structure, say host_iort_struct, which is reused by all
>>> Xen subsystems like the ITS / SMMU etc.
>>>
>>> Since this is a first draft, it is open to technical comments,
>>> modifications and suggestions. Please be open and feel free to add any
>>> missing points / additions.
>>>
>>> 1. What is IORT. What are its components ?
>>> 2. Current Support in Xen
>>> 3. IORT for Dom0
>>> 4. IORT for DomU
>>> 5. Parsing of IORT in Xen
>>> 6. Generation of IORT
>>> 7. Future Work and TODOs
>>>
>>> 1. What is IORT. What are its components ?
>>> --------------------------------------------
>>> The IORT refers to the Input Output Remapping Table. It is essentially
>>> used to find information about the IO topology (PCIRC-SMMU-ITS) and the
>>> relationships between devices.
>>>
>>> The general structure of the IORT has nodes which hold information
>>> about PCI RCs, SMMUs, ITSes and platform devices. Using an IORT table
>>> the relationship RID -> StreamID -> DeviceID can be obtained. More
>>> specifically, which device is behind which SMMU and which interrupt
>>> controller; this topology is described in the IORT table.
>>>
>>> The RID is a requester ID in the PCI context,
>>> the StreamID is the ID of the device in the SMMU context,
>>> the DeviceID is the ID programmed into the ITS.
>>>
>>> For a non-PCI device the RID could simply be an ID.
>>>
>>> Each iort_node contains an ID map array to translate from one ID into
>>> another:
>>> IDmap Entry {input_range, output_range, output_node_ref, id_count}
>>> This array is present in the PCI RC node, SMMU node, Named Component
>>> node etc. and can reference an SMMU or ITS node.
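As an aside, the per-entry translation described above can be sketched in C. The struct layout and names below are illustrative only, not the actual ACPI table or Xen definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shape of one IORT ID mapping entry, following the
 * {input_range, output_range, output_node_ref, id_count} layout above. */
struct idmap_entry {
    uint32_t input_base;   /* first input ID covered by this entry */
    uint32_t output_base;  /* corresponding first output ID */
    uint32_t id_count;     /* number of IDs in the range */
    /* void *output_node_ref;  reference to the SMMU or ITS node */
};

/* Translate an input ID (e.g. a RID) to an output ID (e.g. a StreamID).
 * Returns false if the input ID is not covered by this entry. */
static bool idmap_translate(const struct idmap_entry *e, uint32_t in,
                            uint32_t *out)
{
    if (in < e->input_base || in >= e->input_base + e->id_count)
        return false;
    *out = e->output_base + (in - e->input_base);
    return true;
}
```

With an entry {input_base = 0x100, output_base = 0x800, id_count = 8}, input ID 0x103 would translate to output ID 0x803.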
>>>
>>> 2. Current Support of IORT
>>> ---------------------------
>>> Currently Xen passes the host IORT table to Dom0 without any
>>> modifications. For a DomU no IORT table is passed.
>>>
>>> 3. IORT for Dom0
>>> -----------------
>>> The IORT for Dom0 is prepared by Xen and is fairly similar to the host
>>> IORT. However, a few nodes could be removed or modified. For instance:
>>> - host SMMU nodes should not be present
>>> - ITS group nodes are the same as in the host IORT, but no stage2
>>> mapping is done for them.
>> What do you mean with stage2 mapping?
> Please ignore this line. Copy paste error. Read it as follows
>
> - ITS group nodes are same as host iort.
> (though I would modify the same as in next draft)
>
>>
>>> - platform nodes (named components) may be selectively present,
>>> depending on whether Xen is using some of them. This could be
>>> controlled by a Xen command line option.
>> Mmh, I am not so sure platform devices described in the IORT (those
>> which use MSIs!) are so much different from PCI devices here. My
>> understanding is those platform devices are network adapters, for
>> instance, for which Xen has no use.
> ok.
>> So I would translate "Named Components" or "platform devices" as devices
>> just not using the PCIe bus (so no config space and no (S)BDF), but
>> being otherwise the same from an ITS or SMMU point of view.
> Correct.
>>> - More items : TODO
>> I think we agreed upon rewriting the IORT table instead of patching it?
> yes. In fact if you look at my patch v2 on IORT SMMU hiding, it was
> _rewriting_ most of Dom0 IORT and not patching it.

I was just after the wording above:
"IORT for Dom0 is prepared by xen and it is fairly similar to the host
iort. However few nodes could be removed removed or modified."
... which sounds a bit like you alter the h/w IORT.
It would be good to clarify this by explicitly mentioning the
parsing/generation cycle, as this is a fundamental design decision.

> We can have a IRC discussion on this.
>
> I think apart from rewriting, the other required tasks handled in this
> epic are:
> - parse the IORT and save it in Xen internal data structures
> - common code to generate the IORT for dom0/domU
> - all Xen code that currently parses the IORT multiple times now uses
> the Xen internal data structures.

Yes, that sounds about right.

> (I have explained this in this mail below)
>> So to some degree your statements are true, but when we rewrite the IORT
>> table without SMMUs (and possibly without other components like the
>> PMUs), it would be kind of a stretch to call it "fairly similar to the
>> host IORT". I think "based on the host IORT" would be more precise.
> Yes. Based on host IORT is better,thanks.
>>
>>> 4. IORT for DomU
>>> -----------------
>>> The IORT for a DomU is generated by the toolstack. The IORT topology
>>> is different when the DomU supports device passthrough.
>> Can you elaborate on that? Different compared to what? My understanding
>> is that without device passthrough there would be no IORT in the first
>> place?
> I was exploring the possibility of having virtual devices for DomU.
> So if a virtual device is assigned to a guest, there needs to be some
> mapping in the IORT as well.
> This virtual device can be on a PCI bus / or a platform device.
>
> Device Pass-through can be split into two parts
> a. platform device passthrough (not on PCI bus)
> b. PCI device PT

I understand that, but am still wondering how it would be "different".
We just start with creating our mapping data structure *from scratch*,
the same one we generate by *parsing* the host IORT.
Whether this points to a purely virtual device, a PCI PT or a platform
PT, should not matter for this purpose.

> => If we discount the possibility of a virtual device for domU and
> platform device passthrough,
> then you are correct: no IORT is required.

I believe we need an IORT once we have devices which use MSIs.

> When PCI device passthrough is supported, the PCIRC is itself virtual
> (emulated by Xen).
> One can have any number of virtual PCIRCs and maybe virtual SMMUs.
> Hence the topology can vary.

I think I don't disagree, my initial comment was just about the
confusion that this "IORT topology is *different* from" term created.

> Now read the below lines.
>>> At a minimum the domU IORT should include a single PCIRC and ITS
>>> group. A similar PCIRC can be added in the DSDT.
>>> An additional node can be added if a platform device is assigned to
>>> the domU. No extra node should be required for PCI device pass-through.
>> Again I don't fully understand this last sentence.
> The last line is continuation of the first line "At a minimum..."

OK, but still I don't get how we would end up with an IORT without
(passed-through) PCI devices in the first place?

>>> It is proposed that the ID ranges of the PCIRC and ITS group be
>>> constant for domUs.
>> "constant" is a bit confusing here. Maybe "arbitrary", "from scratch" or
>> "independent from the actual h/w"?
> ok. that is implementation defined.
>>
>>> In case of PCI PT, using a domctl the toolstack can communicate the
>>> physical RID : virtual RID and deviceID : virtual deviceID mappings to
>>> Xen.
>>>
>>> It is assumed that domU PCI config accesses would be trapped in Xen.
>>> The RID at which the assigned device is enumerated would be the one
>>> provided by the domctl, domctl_set_deviceid_mapping.
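For illustration, the mapping information such a domctl would carry might look like this. This is a sketch only: domctl_set_deviceid_mapping is the name proposed in the draft above, and no such Xen ABI exists yet; the field names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical payload for the proposed domctl_set_deviceid_mapping.
 * None of this is an existing Xen interface; it merely shows the four
 * pieces of information the draft says the toolstack would communicate. */
struct xen_domctl_set_deviceid_mapping {
    uint32_t phys_rid;      /* physical requester ID of the device */
    uint32_t virt_rid;      /* RID at which the guest enumerates it */
    uint32_t phys_deviceid; /* deviceID programmed in the physical ITS */
    uint32_t virt_deviceid; /* deviceID the guest uses in ITS commands */
};
```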
>>>
>>> TODO: device assign domctl i/f.
>>> Note: This should suffice for the virtual deviceID support pointed out
>>> by Andre. [4]
>> Well, there's more to it. First thing: while I tried to allow virtual
>> ITS deviceIDs to be different from physical ones, at the moment they
>> are fixed to being mapped 1:1 in the code.
> oh
>> So the first step would be to go over the ITS code and identify where
>> "devid" refers to a virtual deviceID and where to a physical one
>> (probably renaming them accordingly). Then we would need a function to
>> translate between the two. At the moment this would be a dummy function
>> (just return the input value). Later we would loop in the actual table.
> Some thought here...
> Wouldn't it be better to call a helper function to translate the devid
> coming from the guest? The helper function
> would look at the table created by handling successive domctls (the one
> mentioned here).

Exactly.
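A minimal sketch of such a helper, assuming a per-domain table filled by the proposed domctl (all names here are hypothetical, this is not existing Xen code):

```c
#include <stdint.h>

/* Hypothetical per-domain table built up by successive
 * "set deviceID mapping" domctls; names are illustrative only. */
struct devid_map {
    uint32_t guest_devid;
    uint32_t host_devid;
};

struct domain_devid_table {
    struct devid_map *map;
    unsigned int count;
};

/* Translate a guest deviceID (trapped from an ITS command) to the host
 * deviceID.  As a first step this could simply be the identity function;
 * later it loops over the domctl-populated table. */
static uint32_t guest_to_host_devid(const struct domain_devid_table *t,
                                    uint32_t guest_devid)
{
    unsigned int i;

    for (i = 0; i < t->count; i++)
        if (t->map[i].guest_devid == guest_devid)
            return t->map[i].host_devid;

    return guest_devid; /* fallback: 1:1 mapping, as in the current code */
}
```

Callers in the ITS emulation would then never see a physical deviceID directly; swapping the dummy 1:1 fallback for a real table lookup becomes a local change.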

>>> We might not need this domctl if the assign_device hypercall is
>>> extended to provide this information.
>> Do we actually need a new interface or even extend the existing one?
>> If I got Julien correctly, the existing interface is just fine?
> Could you explain which existing interface can be used to translate a
> guest deviceID to a host deviceID when an ITS command gets trapped in
> Xen? Maybe I am missing something here.

I haven't looked in detail, but will do.

>>> 5. Parsing of IORT in Xen
>>> --------------------------
>>> IORT nodes can be saved in structures so that IORT table parsing is
>>> done once and the result is reused by all Xen subsystems like the
>>> ITS / SMMU etc. and domain creation.
>>> Proposed are the following structures to hold IORT information, very
>>> similar to the ACPI structures.
>>>
>>> iort_id_map {
>>>      range_t input_range;
>>>      range_t output_range;
>>>      void *output_reference;
>>> ...
>>> }
>> I guess you would need a "struct list_head list" here to chain the
>> ranges?
> yes :). That was in ...
>>
>>> => output_reference points to an iort_node object.
>>>
>>> struct iort_node {
>>>      struct list_head id_map;
>>>      void *context;
>>>      struct list_head list;
>>> }
>>> => context could be a reference to acpi_iort_node.
>>>
>>> struct iort_table_struct {
>>>      struct list_head pci_rc_nodes;
>>>      struct list_head smmu_nodes;
>>>      struct list_head plat_devices;
>>>      struct list_head its_group;
>>> }
>> So, quickly brainstorming with Julien, I was wondering if we could
>> actually simplify this significantly:
>> From Xen's point of view all we need to know is the mapping from PCI
>> requestor IDs (or some platform device IDs) to the physical ITS device
>> ID, and from requestor IDs to the SMMU stream ID.
>> That would be just *two* lookup tables, not connected to each other
>> aside from possibly having the same input ranges. At this point we could
>> also have *one* table, containing both the ITS deviceID and the SMMU
>> stream ID:
>>
>> struct iort_id_map {
>>     range_t input_range;
>>     uint32_t its_devid_base;
>>     uint32_t smmu_streamid_base;
>>     struct list_head list;
>> };
> This is just a simpler case of what the spec supports.

Sure, and without the need for a *chained* mapping (RC->SMMU->ITS)
this is actually all we need (RC->ITS; RC->SMMU). I have the feeling
that this simplifies the code a lot.

> Each PCIRC node can have an array of idmaps; each idmap entry can map to
> multiple idmap entries in
> the output-referenced SMMU idmap.
>
> I had a similar discussion with julien on the v1 version of my IORT SMMU
> hide patch.
>
> Moreover I don't quite understand where iort_id_map would fit.
> As you can see in the reply below, we have other things to take care of
> as well.
>>
>> So parsing the IORT would create and fill a list of those structures.
>> For a lookup we would just iterate over that list, find a matching entry
>> and:
>> return (input_id - match->input_range.base) + match->its_devid_base;
>>
>> Ideally we abstract this via some functions, so that we can later swap
>> this for more efficient data structures should the need arise.
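A self-contained sketch of that lookup, using a plain array in place of Xen's list primitives purely to keep the example standalone:

```c
#include <stdbool.h>
#include <stdint.h>

/* Flattened version of the proposed struct iort_id_map above (an array
 * stands in for Xen's struct list_head here, for illustration only). */
struct iort_id_map {
    uint32_t input_base;          /* first requester ID in the range */
    uint32_t count;               /* number of IDs in the range */
    uint32_t its_devid_base;      /* ITS deviceID of input_base */
    uint32_t smmu_streamid_base;  /* SMMU streamID of input_base */
};

/* Iterate over the parsed mappings and translate a requester ID to its
 * ITS deviceID, exactly as in the return expression quoted above. */
static bool rid_to_its_devid(const struct iort_id_map *maps, unsigned int n,
                             uint32_t rid, uint32_t *devid)
{
    unsigned int i;

    for (i = 0; i < n; i++) {
        if (rid >= maps[i].input_base &&
            rid < maps[i].input_base + maps[i].count) {
            *devid = (rid - maps[i].input_base) + maps[i].its_devid_base;
            return true;
        }
    }
    return false;
}
```

An analogous helper would return smmu_streamid_base-relative IDs; hiding both behind accessor functions keeps the door open for swapping in a more efficient structure later, as suggested above.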
>>
>>> This structure is created at the point the IORT table is parsed, say
>>> from acpi_iort_init.
>>> It is proposed to use this structure's information in
>>> iort_init_platform_devices.
>>> [2] [RFC v2 4/7] ACPI: arm: Support for IORT
> I guess you missed this part. Without it the context is lost.
>
> The main purpose of this whole task is split into two parts:
> a. IORT parsing should be done once in Xen; later, whenever IORT
> parsing is required, the Xen internal data structures are used
>       (iort_table_struct)
>
>     - this would be helpful for (b) below
>     - and for SMMU / platform device initialization.
>       If you look at [2], it again parses the IORT.
>       So the approach here is that [2] and (b) should use the same Xen
> internal IORT data structures.

I think I get this, I was just wondering why we would need a more or
less exact replication of the IORT, with the pointer chasing complicating
things. I believe what we need is:
1) a mapping from a PCI-RC or PT-NC to stream IDs, for programming the
SMMU in Xen
2) a mapping from a PCI-RC or PT-NC to ITS devIDs, for programming the
ITS when being asked for by a guest (incl. Dom0)
I think that the IORT is a streamlined and optimized representation of
those mappings, which we don't necessarily need to replicate 1:1 in an
in-memory data structure.
But admittedly I haven't looked into this in too much detail, so if
you convince me that we need this graph structure, then so be it.

> =>      For that reason [2]/[5]  might need to be rebased on this task's
> patch. <=
> [5] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123080.html
>
> b. Generate the IORT for Doms without patching the host IORT; rather,
> regenerate it from the Xen internal data structures.
>
> Based on this rationale, I think the data structures mentioned would be
> required.
>
>>>
>>> 6. IORT Generation
>>> -------------------
>>> There would be common code to generate the IORT table from the
>>> iort_table_struct.
>> That sounds useful, but we would need to be careful with sharing code
>> between Xen and the tool stack. Has this actually been done before?
> I added the code sharing part here, but I am not hopeful that this would
> work, as it would require a lot of code changes in the toolstack.
> A simple difference is that the ACPI header structures have different
> member variables. The same is true for other structures.
> So we might have to create a lot of defines in the common code for
> sharing, with the possibility of errors.
>
> See: struct acpi_header in acpi2_0.h (tools/libacpi)
> and struct acpi_table_header in actbl.h (xen/include/acpi)
>
> That is why I preferred a domctl, so Xen could prepare the IORT for DomU.

I don't think it's justified to move a simple table generation task
into Xen just to allow code sharing. After all, this does not require
any Xen-internal knowledge, so it should definitely be done in the
toolstack.
I think we should follow Julien's suggestion of looking at xen/common/libelf.

Cheers,
Andre.

> If not code sharing, then code duplication might also work (in that case
> no domctl is required).
> We can discuss this more...
>>
>>> a. For Dom0
>>>      the structure (iort_table_struct) is modified to remove SMMU
>>> nodes and update the id_mappings.
>>>      PCIRC idmap -> output reference to ITS group.
>>>      (RID -> DeviceID).
>>>
>>>      TODO: Describe the algorithm in the update_id_mapping function
>>> used to map RID -> DeviceID in my earlier patch [3]
>> If the above approach works, this would become a simple list iteration,
>> creating PCI rc nodes with the appropriate pointer to the ITS nodes.
> Yes, it works. see [3]
>>
>>> b. For DomU
>>>      - the iort_table_struct would have a minimum of 2 nodes (1 PCIRC
>>> and 1 ITS group)
>>>      - populate a basic IORT in a buffer passed by the toolstack
>>> (using a domctl : domctl_prepare_dom_iort)
>> I think we should reduce this to iterating the same data structure as
>> for Dom0. Each pass-through-ed PCI device would possibly create one
>> struct instance, and later on we do the same iteration as we do for
>> Dom0. If that proves to be simple enough, we might even live with the
>> code duplication between Xen and the toolstack.
> Yes, that's the idea. For domU and dom0 the IORT generation code would
> take the iort_table_struct as input.
>
>>
>> Cheers,
>> Andre.
>>
>>>      - The DSDT for the DomU is updated by the toolstack to include a
>>> PCIRC.
>>>      - If a named component is added to the domU, that information is
>>> passed in the same / an additional domctl.
>>>          - <TODO: domctl_prepare_dom_iort i/f >
>>>      Note: Julien, I have tried to incorporate your suggestion for
>>> code reuse.
>>>
>>> 7. References:
>>> -------------
>>> [0] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg121667.html
>>> [1] ARM DEN0049C:
>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0049c/DEN0049C_IO_Remapping_Table.pdf
>>>
>>>
>>> [2] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123082.html
>>> [3] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg121669.html:
>>> update_id_mapping function.
>>> [4] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123434.html
>>>
>>>
>
