
Re: [PATCH v2 06/18] IOMMU/x86: restrict IO-APIC mappings for PV Dom0



On 02/12/2021 15:28, Jan Beulich wrote:
> On 02.12.2021 16:12, Roger Pau Monné wrote:
>> On Wed, Dec 01, 2021 at 12:45:12PM +0100, Jan Beulich wrote:
>>> On 01.12.2021 11:32, Roger Pau Monné wrote:
>>>> On Wed, Dec 01, 2021 at 10:27:21AM +0100, Jan Beulich wrote:
>>>>> On 01.12.2021 10:09, Roger Pau Monné wrote:
>>>>>> On Fri, Sep 24, 2021 at 11:46:57AM +0200, Jan Beulich wrote:
>>>>>>> @@ -267,44 +267,60 @@ static bool __hwdom_init hwdom_iommu_map
>>>>>>>       * that fall in unusable ranges for PV Dom0.
>>>>>>>       */
>>>>>>>      if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>>>>>>> -        return false;
>>>>>>> +        return 0;
>>>>>>>  
>>>>>>>      switch ( type = page_get_ram_type(mfn) )
>>>>>>>      {
>>>>>>>      case RAM_TYPE_UNUSABLE:
>>>>>>> -        return false;
>>>>>>> +        return 0;
>>>>>>>  
>>>>>>>      case RAM_TYPE_CONVENTIONAL:
>>>>>>>          if ( iommu_hwdom_strict )
>>>>>>> -            return false;
>>>>>>> +            return 0;
>>>>>>>          break;
>>>>>>>  
>>>>>>>      default:
>>>>>>>          if ( type & RAM_TYPE_RESERVED )
>>>>>>>          {
>>>>>>>              if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
>>>>>>> -                return false;
>>>>>>> +                perms = 0;
>>>>>>>          }
>>>>>>> -        else if ( is_hvm_domain(d) || !iommu_hwdom_inclusive || pfn > max_pfn )
>>>>>>> -            return false;
>>>>>>> +        else if ( is_hvm_domain(d) )
>>>>>>> +            return 0;
>>>>>>> +        else if ( !iommu_hwdom_inclusive || pfn > max_pfn )
>>>>>>> +            perms = 0;
>>>>>> I'm confused about the reason to set perms = 0 instead of just
>>>>>> returning here. AFAICT perms won't be set to any other value below,
>>>>>> so you might as well just return 0.
>>>>> This is so that ...
>>>>>
>>>>>>>      }
>>>>>>>  
>>>>>>>      /* Check that it doesn't overlap with the Interrupt Address Range. */
>>>>>>>      if ( pfn >= 0xfee00 && pfn <= 0xfeeff )
>>>>>>> -        return false;
>>>>>>> +        return 0;
>>>>>>>      /* ... or the IO-APIC */
>>>>>>> -    for ( i = 0; has_vioapic(d) && i < d->arch.hvm.nr_vioapics; i++ )
>>>>>>> -        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>>> -            return false;
>>>>>>> +    if ( has_vioapic(d) )
>>>>>>> +    {
>>>>>>> +        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
>>>>>>> +            if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>>> +                return 0;
>>>>>>> +    }
>>>>>>> +    else if ( is_pv_domain(d) )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * Be consistent with CPU mappings: Dom0 is permitted to establish r/o
>>>>>>> +         * ones there, so it should also have such established for IOMMUs.
>>>>>>> +         */
>>>>>>> +        for ( i = 0; i < nr_ioapics; i++ )
>>>>>>> +            if ( pfn == PFN_DOWN(mp_ioapics[i].mpc_apicaddr) )
>>>>>>> +                return rangeset_contains_singleton(mmio_ro_ranges, pfn)
>>>>>>> +                       ? IOMMUF_readable : 0;
>>>>>>> +    }
>>>>> ... this return, as per the comment, takes precedence over returning
>>>>> zero.
>>>> I see. This is because you want to map those in the IOMMU page tables
>>>> even if the IO-APIC ranges are outside of a reserved region.
>>>>
>>>> I have to admit this is kind of weird, because the purpose of this
>>>> function is to add mappings for all memory below 4G, and/or for all
>>>> reserved regions.
>>> Well, that was what it started out as. The purpose here is to be consistent
>>> about IO-APICs: Either have them all mapped, or none of them. Since we map
>>> them in the CPU page tables and since Andrew asked for the two mappings to
>>> be consistent, this is the only way to satisfy the requests. Personally I'd
>>> be okay with not mapping IO-APICs here (but then regardless of whether they
>>> are covered by a reserved region).
>> I'm unsure of the best way to deal with this; it seems the CPU and
>> the IOMMU page tables will never be equal for PV dom0, because we
>> have no intention to map the MSI-X tables in RO mode in the IOMMU
>> page tables.
>>
>> I'm not really opposed to having the IO-APIC mapped RO in the IOMMU
>> page tables, but I also don't see much benefit in doing it unless we
>> have a use-case for it. The IO-APIC handling in PV is already
>> different from native, so I would be fine if we add a comment noting
>> that while the IO-APIC is mappable in the CPU page tables as RO it's
>> not present in the IOMMU page tables (and then adjust hwdom_iommu_map
>> to prevent its mapping).
> Andrew, you did request both mappings to get in sync - thoughts?

Let's step back to first principles.

On real hardware, there is no such thing as read-only-ness of the
physical address space.  Anything like that is a device which accepts
and discards writes.

It's not clear what a real hardware platform would do in this scenario,
but from reading some of the platform docs, I suspect the System Address
Decoder would provide a symmetric view of the hardware address space,
but this doesn't mean that UBOX would tolerate memory accesses uniformly
from all sources.  Also, there's nothing to say that all platforms
behave the same.


For HVM with shared-pt, the CPU and IOMMU mappings really are
identical.  The IOMMU really will get a read-only mapping of real MMCFG,
and holes for fully-emulated devices, which would suffer an IOMMU fault
if targeted.

For HVM without shared-pt, the translations are mostly kept in sync, but
the permissions in the CPU mappings may be reduced for e.g. logdirty
reasons.

For PV guests, things are mostly like the HVM shared-pt case, except
we've got the real IO-APICs mapped read-only, and no fully-emulated devices.


Putting the real IO-APICs in the IOMMU is about as short-sighted as
letting the PV guest see them to begin with, but there is nothing
fundamentally wrong with letting a PV guest do a DMA read of the
IO-APIC, seeing as we let it do a CPU read.  (And whether the platform
will even allow it, is a different matter.)


However, it is really important for there to not be a load of special
casing (all undocumented, naturally) keeping the CPU and IOMMU views
different.  It is an error that the views were ever different
(translation wise), and the only legitimate permission difference I can
think of is to support logdirty mode for migration.  (Introspection
protection for device-enabled VMs will be left as an exercise to
whomever first wants to use it.)

Making the guest physical address space view consistent between the CPU
and device is a "because it's obviously the correct thing to do" issue.
Deciding "well, it makes no sense for you to have an IO mapping of $FOO"
is a matter of policy that Xen has no legitimate right to be enforcing.

~Andrew
