Re: [Xen-devel] Unshared IOMMU issues

>>> On 16.02.17 at 19:09, <julien.grall@xxxxxxx> wrote:
> On 16/02/17 16:34, Jan Beulich wrote:
>>>>> On 16.02.17 at 17:11, <julien.grall@xxxxxxx> wrote:
>>> On 16/02/17 15:52, Jan Beulich wrote:
>>>>>>> On 16.02.17 at 16:02, <olekstysh@xxxxxxxxx> wrote:
>>>>> On Thu, Feb 16, 2017 at 11:36 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>> On 15.02.17 at 18:43, <olekstysh@xxxxxxxxx> wrote:
>>>>>>> 1.
>>>>>>> I need:
>>>>>>> Allow P2M core on ARM to update IOMMU mapping from the first 
>>>>>>> "p2m_set_entry".
>>>>>>> I do:
>>>>>>> I explicitly set the need_iommu flag for *every* guest domain during
>>>>>>> iommu_domain_init() on ARM if the page table is not shared.
>>>>>>> At that moment I do not know whether any device will be assigned
>>>>>>> to this domain. I just want to receive all mapping updates
>>>>>>> from the P2M code. The P2M will update the IOMMU mapping only when
>>>>>>> need_iommu is set and the page table is not shared.
>>>>>>> I have doubts:
>>>>>>> Is it correct to just force need_iommu flag?
>>>>>> No, I don't think so. This is a waste of a measurable amount of
>>>>>> resources when page tables aren't shared.
>>>>>>> Or maybe another flag should be introduced?
>>>>>> Not sure what you think of here. Where's the problem with building
>>>>>> IOMMU page tables at the time the first device gets assigned, just
>>>>>> like x86 does?
>>>>> OK, I have already had a look at arch_iommu_populate_page_table() for
>>>>> x86.
>>>>> I don't know at the moment how this solution can help me.
>>>>> There are at least two points that prevent me from doing a similar thing.
>>>>> 1. To create an IOMMU mapping I need both the mfn and the gfn (+ flags).
>>>>> I am able to get the mfn only. How can I find the corresponding gfn?
>>>> As the x86 one shows, via mfn_to_gmfn(). If ARM doesn't have
>>>> this, perhaps it needs to gain it?
>>> Looking at the x86 implementation, mfn_to_gmfn uses a table indexed by
>>> the MFN for that. This requires virtual address space, which is
>>> already scarce on ARM32, and also uses physical memory.
>>> I am not convinced this is the right thing to do on ARM, as the only
>>> user so far will be the IOMMU code.
>>> Another solution would be to go through the stage-2 page table and
>>> replicate all the mappings.
>> That's certainly an option, if you want to save the memory (and
>> VA space on ARM32). It only makes the x86 model of establishing
>> the mappings slightly more compute intensive.
> I made a quick calculation: ARM32 supports up to 40-bit PA and IPA (i.e.
> guest address), which means 28 bits of MFN/GFN. The GFN would have to be
> stored in 32 bits for alignment, so we would need 2^28 * 4 = 1GiB of
> virtual address space and potentially physical memory.
> We don't have 1GiB of VA space free on 32-bit right now.

Right, you'd have to pay a performance price here. Either, as you
say, by looking the translations up from the stage-2 tables, or by
using some on demand mapping scheme for the table here.
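
As a rough check of the figures in this calculation, here is a minimal
Python sketch. It assumes 4 KiB pages (which is what turns a 40-bit PA
into 28 bits of MFN) and one fixed-size entry per possible MFN; the
helper name m2p_table_bytes is made up for illustration and is not a
Xen function.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, so frame bits = PA bits - 12
GIB = 1 << 30


def m2p_table_bytes(pa_bits, entry_bytes):
    """Size in bytes of a flat table holding one entry per possible MFN."""
    frame_bits = pa_bits - PAGE_SHIFT
    return (1 << frame_bits) * entry_bytes


# 40-bit PA with 32-bit entries, and 48-bit PA with 64-bit entries.
print(m2p_table_bytes(40, 4) // GIB, "GiB")
print(m2p_table_bytes(48, 8) // GIB, "GiB")
```

The same helper can be fed other address widths or entry sizes to see
how quickly a flat MFN-indexed table grows.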

> ARM64 currently supports up to 48-bit PA and 48-bit IPA, which means
> 36 bits of MFN/GFN. The GFN would have to be stored in 64 bits for
> alignment, so we would need 2^36 * 8 = 512GiB of virtual address space
> and potentially physical memory. While virtual address space is not a
> problem, the memory is a problem for embedded platforms. We want Xen to
> be as lean as possible.

Which then leaves the stage-2 table lookup as the only option. Of
course one might consider a hybrid model - memory constrained
systems could go the stage-2 table lookup route, but on larger
systems the cheap direct table lookup could be used.
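
To make the trade-off in such a hybrid model concrete, here is a toy
Python sketch. It is not Xen code: the class ToyP2M, its methods and
the keep_m2p switch are hypothetical, and Python dicts stand in for
both the stage-2 tables and the MFN-indexed table. With keep_m2p set,
the reverse lookup is a single table access paid for with extra
memory; without it, the lookup scans the forward mappings.

```python
class ToyP2M:
    """Toy stand-in for one domain's GFN -> MFN mappings (hypothetical)."""

    def __init__(self, keep_m2p):
        self.stage2 = {}                      # gfn -> mfn (forward map)
        self.m2p = {} if keep_m2p else None   # mfn -> gfn (optional reverse map)

    def set_entry(self, gfn, mfn):
        """Map gfn to mfn, keeping the reverse table in sync if present."""
        old_mfn = self.stage2.get(gfn)
        if self.m2p is not None and old_mfn is not None:
            self.m2p.pop(old_mfn, None)       # drop the stale reverse entry
        self.stage2[gfn] = mfn
        if self.m2p is not None:
            self.m2p[mfn] = gfn

    def mfn_to_gfn(self, mfn):
        """Return the GFN mapping to mfn, or None if there is none."""
        if self.m2p is not None:
            return self.m2p.get(mfn)          # cheap direct lookup
        for gfn, mapped in self.stage2.items():
            if mapped == mfn:                 # slow path: scan forward map
                return gfn
        return None


small = ToyP2M(keep_m2p=False)   # memory-constrained configuration
large = ToyP2M(keep_m2p=True)    # larger system, spends memory on a table
for p2m in (small, large):
    p2m.set_entry(0x10, 0x100)
    print(hex(p2m.mfn_to_gfn(0x100)))
```

Both configurations return the same answers; only the cost of the
lookup and the memory footprint differ.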

> I thought a bit more about the advantage of creating the IOMMU page
> tables later on.
> For devices assigned at domain creation, we know that devices will be
> assigned, so we could let Xen populate the IOMMU while allocating the
> memory for the domain.
> For hotplugged devices, this would only happen for PCI, as integrated
> devices cannot be hotplugged. As we go towards emulating a root complex in
> Xen rather than the PV approach, you would need the root complex to be
> instantiated when the domain is created (unless we want to hotplug it
> too?). IMHO, if you assign a root complex, it is likely that you will want
> to assign a PCI device afterwards. So allocating page tables at that time
> sounds sensible.
> This would avoid walking the stage-2 page tables at runtime.

Well, in the end it's your call, but I don't think this is an acceptable
model in the general case. Quite often - see the Viridian support in
x86 Xen for a very good example - distros (XenServer in this case)
enable functionality even if a guest (Linux in the case here) would
never really want to make use of it. Also you need to keep in mind
that for an admin it is better to not have to take care of all
eventualities before first starting a (perhaps long running) guest.
Granted we have a number of other limitations of that same kind,
but if such can be avoided, I'd always prefer to do so.

>>>>> 2. The d->page_list seems to contain only domain RAM (not 100% sure).
>>>>> Where can I get other regions (mmios, etc)?
>>>> These necessarily are being tracked for the domain, so you need to
>>>> take them from wherever they're stored on ARM.
>>> Is there any reason why you don't seem to have such code on x86? AFAICT
>>> only RAM is currently mapped.
>> Well, no-one cared so far, I would guess. Even runtime mappings of
>> MMIO space were made to work properly only very recently (by Roger).
>>> Regarding ARM, we know whether a domain is allowed to access a certain
>>> range of MMIO, but, similarly to above, we don't have the conversion MFN
>>> -> GFN for them. However in this case, we would not be able to use an
>>> M2P, as the same MFN may be mapped in multiple domains.
>> Mapped by multiple domains? If one DomU and Dom0, I can see
>> this as possible, but not a requirement. If multiple DomU-s I have
>> to raise the question of security.
> The interrupt controller GICv2 supports virtualization and allows the
> guest to manage interrupts as if it were running on bare metal. There is
> a per-CPU interface that is mapped into every domain. Obviously, the
> state is saved/restored during vCPU context switch.

Now that looks like a very special case, which the code doing the
mapping could (and should) be aware of. Quite likely this area
even gets mapped at a predetermined GFN (range) for guests
(in which case no lookup is necessary at all)?
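
If such a region does sit at a predetermined GFN range, the special
case could look roughly like the Python sketch below. Everything in it
is hypothetical: the function name, the region list and the frame
numbers are made-up illustration values, not taken from Xen or from
any real platform.

```python
# Hypothetical table of regions mapped at a fixed guest location:
# (first MFN, number of frames, first GFN). Values are made up.
FIXED_REGIONS = [
    (0x1000, 2, 0x8000),
]


def fixed_gfn_for(mfn, regions=FIXED_REGIONS):
    """Return the predetermined GFN for mfn, or None if it is not special."""
    for mfn_start, nr_frames, gfn_start in regions:
        if mfn_start <= mfn < mfn_start + nr_frames:
            return gfn_start + (mfn - mfn_start)
    return None


print(fixed_gfn_for(0x1001))  # inside the fixed region
print(fixed_gfn_for(0x2000))  # not special: a general lookup would be needed
```

Only frames outside such regions would then need a general MFN to GFN
lookup.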

