[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN RFC PATCH v4 2/5] docs/designs: Add a design document for IOMMU subsystem redesign



Hi,

I have absolutely no idea how much of the code is yours, vs what was there
before. I'm just going to infodump as I go and hope that proves helpful in some
way.

On Mon Nov 4, 2024 at 2:28 PM GMT, Teddy Astie wrote:
> Current IOMMU subsystem has some limitations that make PV-IOMMU practically 
> impossible.
> One of them is the assumtion that each domain is bound to a single "IOMMU 
> domain", which
> also causes complications with quarantine implementation.
>
> Moreover, current IOMMU subsystem is not entirely well-defined, for instance, 
> the behavior
> of map_page between ARM SMMUv3 and x86 VT-d/AMD-Vi greatly differs. On ARM, 
> it can modifies
> the domain page table while on x86, it may be forbidden (e.g using HAP with 
> PVH), or only
> modifying the devices PoV (e.g using PV).
>
> The goal of this redesign is to define more explicitely the behavior and 
> interface of the
> IOMMU subsystem while allowing PV-IOMMU to be effectively implemented.
>
> Signed-off-by Teddy Astie <teddy.astie@xxxxxxxxxx>
> ---
> Changed in V2:
> * nit s/dettach/detach/
>
> Changed in v4:
> * updated for iommu_context locking changes
> ---
>  docs/designs/iommu-contexts.md | 403 +++++++++++++++++++++++++++++++++
>  1 file changed, 403 insertions(+)
>  create mode 100644 docs/designs/iommu-contexts.md
>
> diff --git a/docs/designs/iommu-contexts.md b/docs/designs/iommu-contexts.md
> new file mode 100644
> index 0000000000..9d6fb95549
> --- /dev/null
> +++ b/docs/designs/iommu-contexts.md
> @@ -0,0 +1,403 @@
> +# IOMMU context management in Xen
> +
> +Status: Experimental
> +Revision: 0
> +
> +# Background
> +
> +The design for *IOMMU paravirtualization for Dom0* [1] explains that some 
> guests may

nit: imo, either "explains why" or "states that".

> +want to access to IOMMU features. In order to implement this in Xen, several 
> adjustments
> +needs to be made to the IOMMU subsystem.

nit: s/needs/need

> +
> +This "hardware IOMMU domain" is currently implemented on a per-domain basis 
> such as each

The "hardware IOMMU domain" is briefly mentioned in the commit message, but
wants defining before assuming it's known.

> +domain actually has a specific *hardware IOMMU domain*, this design aims to 
> allow a
> +single Xen domain to manage several "IOMMU context", and allow some domains 
> (e.g Dom0
> +[1]) to modify their IOMMU contexts.

Without an adequate definition of "hardware IOMMU domain", I really don't
understand this paragraph. I don't know if it's explaining what we have or what
we want to have.

> +
> +In addition to this, quarantine feature can be refactored into using IOMMU 
> contexts

nit: s/ quarantine/ the quarantine

Also, "can" or "will be"?

> +to reduce the complexity of platform-specific implementations and ensuring 
> more
> +consistency across platforms.
> +
> +# IOMMU context
> +
> +We define a "IOMMU context" as being a *hardware IOMMU domain*, but named as 
> a context
> +to avoid confusion with Xen domains.

"hardware IOMMU domain" was never defined in the first place though. Are they
domains in the more abstract sense then?

If so, I definitely prefer the "context" name and it'd be very helpful to have
a note at beginning of the design note stating that a hardware IOMMU domain in
the old nomenclature is conceptually the same as an IOMMU context in the new
one.

> +It represents some hardware-specific data structure that contains mappings 
> from a device
> +frame-number to a machine frame-number (e.g using a pagetable) that can be 
> applied to

e.g? or i.e? Is there any other way besides page tables in the intended design?

> +a device using IOMMU hardware.

nit: from "device frame numberS"

> +
> +This structure is bound to a Xen domain, but a Xen domain may have several 
> IOMMU context.

nit: s/context/contexts

Is "this structure" a context?

> +These contexts may be modifiable using the interface as defined in [1] aside 
> some
> +specific cases (e.g modifying default context).
> +
> +This is implemented in Xen as a new structure that will hold context-specific
> +data.

nit: Either "This is... that holds..." or "This will be... that will be".

> +
> +```c
> +struct iommu_context {
> +    u16 id; /* Context id (0 means default context) */

s/u16/uint16_t/ here and elsewhere, but...

also, seeing how this is a new ABI, I'd avoid mistakes of the past and go for a
64bit number. That way it'll never roll over in this universe's lifetime and
we'll never have to care about how to track used context IDs.

Are there any hot paths that take it? (I'd hope not). Any reason not to widen?

> +    struct list_head devices;
> +
> +    struct arch_iommu_context arch;
> +
> +    bool opaque; /* context can't be modified nor accessed (e.g HAP) */
> +};
> +```
> +
> +A context is identified by a number that is domain-specific and may be used 
> by IOMMU
> +users such as PV-IOMMU by the guest.

nit: are there any other users? The identifier _is_ a PV-IOMMU context id,
right?

> +
> +struct arch_iommu_context is splited from struct arch_iommu

nit: s/is splitted/is split/

> +
> +```c
> +struct arch_iommu_context
> +{

Lots to say about the way data seems arranged, but I suspect I'll see it better
in context (pun intended) in PoC later on.

> +    spinlock_t pgtables_lock;
> +    struct page_list_head pgtables;
> +
> +    union {
> +        /* Intel VT-d */
> +        struct {
> +            uint64_t pgd_maddr; /* io page directory machine address */
> +            domid_t *didmap; /* per-iommu DID */
> +            unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the 
> context uses */
> +        } vtd;
> +        /* AMD IOMMU */
> +        struct {
> +            struct page_info *root_table;
> +        } amd;
> +    };
> +};
> +
> +struct arch_iommu
> +{
> +    spinlock_t mapping_lock; /* io page table lock */
> +    struct {
> +        struct page_list_head list;
> +        spinlock_t lock;
> +    } pgtables;
> +
> +    struct list_head identity_maps;
> +
> +    union {
> +        /* Intel VT-d */
> +        struct {
> +            /* no more context-specific values */
> +            unsigned int agaw; /* adjusted guest address width, 0 is level 2 
> 30-bit */
> +        } vtd;
> +        /* AMD IOMMU */
> +        struct {
> +            unsigned int paging_mode;
> +            struct guest_iommu *g_iommu;
> +        } amd;
> +    };
> +};
> +```
> +
> +IOMMU context information is now carried by iommu_context rather than being 
> integrated to
> +struct arch_iommu.

That's good. I wonder about the concurrency story though, as every context in a
domain is gated by a single lock. Presumably we could make it per-context
instead? There's parallelism already on the IOMMU when operating on contexts of
different domains, so I'm not sure why the limitation. Or whether there's a
restriction we're missing and we should have the lock be per IOMMU instead.

> +
> +# Xen domain IOMMU structure
> +
> +`struct domain_iommu` is modified to allow multiples context within a single 
> Xen domain

nit: s/multiples context/multiple contexts/

> +to exist :
> +
> +```c
> +struct iommu_context_list {
> +    uint16_t count; /* Context count excluding default context */
> +
> +    /* if count > 0 */
> +
> +    uint64_t *bitmap; /* bitmap of context allocation */
> +    struct iommu_context *map; /* Map of contexts */
> +};
> +
> +struct domain_iommu {
> +    /* ... */
> +
> +    struct iommu_context default_ctx;
> +    struct iommu_context_list other_contexts;
> +
> +    /* ... */
> +}
> +```
> +
> +default_ctx is a special context with id=0 that holds the page table mapping 
> the entire
> +domain, which basically preserve the previous behavior. All devices are 
> expected to be
> +bound to this context during initialization.

nit: "identity" seems better termed

> +
> +Along with this default context that always exist, we use a pool of contexts 
> that has a
> +fixed size at domain initialization, where contexts can be allocated (if 
> possible), and
> +have a id matching their position in the map (considering that id != 0).
> +These contexts may be used by IOMMU contexts users such as PV-IOMMU or 
> quarantine domain
> +(DomIO).
> +

... I'll leave it here for the time being. I need time to digest the rest.

Cheers,
Alejandro



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.