
Re: [PATCH v4 05/21] IOMMU/x86: restrict IO-APIC mappings for PV Dom0


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 4 May 2022 14:01:08 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Wed, 04 May 2022 12:01:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 04, 2022 at 12:51:25PM +0200, Jan Beulich wrote:
> On 04.05.2022 12:30, Roger Pau Monné wrote:
> > On Wed, May 04, 2022 at 11:32:51AM +0200, Jan Beulich wrote:
> >> On 03.05.2022 16:50, Jan Beulich wrote:
> >>> On 03.05.2022 15:00, Roger Pau Monné wrote:
> >>>> On Mon, Apr 25, 2022 at 10:34:23AM +0200, Jan Beulich wrote:
> >>>>> While already the case for PVH, there's no reason to treat PV
> >>>>> differently here, though of course the addresses get taken from another
> >>>>> source in this case. Except that, to match CPU side mappings, by default
> >>>>> we permit r/o ones. This then also means we now deal consistently with
> >>>>> IO-APICs whose MMIO is or is not covered by E820 reserved regions.
> >>>>>
> >>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>> ---
> >>>>> [integrated] v1: Integrate into series.
> >>>>> [standalone] v2: Keep IOMMU mappings in sync with CPU ones.
> >>>>>
> >>>>> --- a/xen/drivers/passthrough/x86/iommu.c
> >>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
> >>>>> @@ -275,12 +275,12 @@ void iommu_identity_map_teardown(struct
> >>>>>      }
> >>>>>  }
> >>>>>  
> >>>>> -static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> >>>>> -                                         unsigned long pfn,
> >>>>> -                                         unsigned long max_pfn)
> >>>>> +static unsigned int __hwdom_init hwdom_iommu_map(const struct domain *d,
> >>>>> +                                                 unsigned long pfn,
> >>>>> +                                                 unsigned long max_pfn)
> >>>>>  {
> >>>>>      mfn_t mfn = _mfn(pfn);
> >>>>> -    unsigned int i, type;
> >>>>> +    unsigned int i, type, perms = IOMMUF_readable | IOMMUF_writable;
> >>>>>  
> >>>>>      /*
> >>>>>      * Set up 1:1 mapping for dom0. Default to include only conventional RAM
> >>>>> @@ -289,44 +289,60 @@ static bool __hwdom_init hwdom_iommu_map
> >>>>>       * that fall in unusable ranges for PV Dom0.
> >>>>>       */
> >>>>>      if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
> >>>>> -        return false;
> >>>>> +        return 0;
> >>>>>  
> >>>>>      switch ( type = page_get_ram_type(mfn) )
> >>>>>      {
> >>>>>      case RAM_TYPE_UNUSABLE:
> >>>>> -        return false;
> >>>>> +        return 0;
> >>>>>  
> >>>>>      case RAM_TYPE_CONVENTIONAL:
> >>>>>          if ( iommu_hwdom_strict )
> >>>>> -            return false;
> >>>>> +            return 0;
> >>>>>          break;
> >>>>>  
> >>>>>      default:
> >>>>>          if ( type & RAM_TYPE_RESERVED )
> >>>>>          {
> >>>>>              if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
> >>>>> -                return false;
> >>>>> +                perms = 0;
> >>>>>          }
> >>>>> -        else if ( is_hvm_domain(d) || !iommu_hwdom_inclusive || pfn > max_pfn )
> >>>>> -            return false;
> >>>>> +        else if ( is_hvm_domain(d) )
> >>>>> +            return 0;
> >>>>> +        else if ( !iommu_hwdom_inclusive || pfn > max_pfn )
> >>>>> +            perms = 0;
> >>>>>      }
> >>>>>  
> >>>>>      /* Check that it doesn't overlap with the Interrupt Address Range. */
> >>>>>      if ( pfn >= 0xfee00 && pfn <= 0xfeeff )
> >>>>> -        return false;
> >>>>> +        return 0;
> >>>>>      /* ... or the IO-APIC */
> >>>>> -    for ( i = 0; has_vioapic(d) && i < d->arch.hvm.nr_vioapics; i++ )
> >>>>> -        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
> >>>>> -            return false;
> >>>>> +    if ( has_vioapic(d) )
> >>>>> +    {
> >>>>> +        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
> >>>>> +            if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
> >>>>> +                return 0;
> >>>>> +    }
> >>>>> +    else if ( is_pv_domain(d) )
> >>>>> +    {
> >>>>> +        /*
> >>>>> +         * Be consistent with CPU mappings: Dom0 is permitted to establish r/o
> >>>>> +         * ones there, so it should also have such established for IOMMUs.
> >>>>> +         */
> >>>>> +        for ( i = 0; i < nr_ioapics; i++ )
> >>>>> +            if ( pfn == PFN_DOWN(mp_ioapics[i].mpc_apicaddr) )
> >>>>> +                return rangeset_contains_singleton(mmio_ro_ranges, pfn)
> >>>>> +                       ? IOMMUF_readable : 0;
> >>>>
> >>>> If we really are after consistency with CPU side mappings, we should
> >>>> likely take the whole contents of mmio_ro_ranges and d->iomem_caps
> >>>> into account, not just the pages belonging to the IO-APIC?
> >>>>
> >>>> There could also be HPET pages mapped as RO for PV.
> >>>
> >>> Hmm. This would be a yet bigger functional change, but indeed would further
> >>> improve consistency. But shouldn't we then also establish r/w mappings for
> >>> stuff in ->iomem_caps but not in mmio_ro_ranges? This would feel like going
> >>> too far ...
> >>
> >> FTAOD I didn't mean to say that I think such mappings shouldn't be there;
> >> I have been of the opinion that e.g. I/O directly to/from the linear
> >> frame buffer of a graphics device should in principle be permitted. But
> >> which specific mappings to put in place can imo not be derived from
> >> ->iomem_caps, as we merely subtract certain ranges after initially having
> >> set all bits in it. Besides ranges not mapping any MMIO, even something
> >> like the PCI ECAM ranges (parts of which we may also force to r/o, and
> >> which we would hence cover here if I followed your suggestion) are
> >> questionable in this regard.
> > 
> > Right, ->iomem_caps is indeed too wide for our purpose.  What
> > about using something like:
> > 
> > else if ( is_pv_domain(d) )
> > {
> >     if ( !iomem_access_permitted(d, pfn, pfn) )
> >         return 0;
> 
> We can't return 0 here (as RAM pages also make it here when
> !iommu_hwdom_strict), so I can at best take this as a vague outline
> of what you really mean. And I don't want to rely on RAM pages being
> (imo wrongly) represented by set bits in Dom0's iomem_caps.

Well, yes, my suggestion took into account that ->iomem_caps for dom0
mostly just has holes for the things that shouldn't be mapped, but
otherwise marks everything else as allowed (including RAM).

We could instead do:

else if ( is_pv_domain(d) && type != RAM_TYPE_CONVENTIONAL )
{
    ...

So that we don't rely on RAM being 'allowed' in ->iomem_caps?
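
Spelled out (just a sketch, not even compile-tested), the whole chunk
I have in mind would look something like:

else if ( is_pv_domain(d) && type != RAM_TYPE_CONVENTIONAL )
{
    /* Don't map anything Dom0 can't also access from the CPU side. */
    if ( !iomem_access_permitted(d, pfn, pfn) )
        return 0;

    /* Mirror CPU side r/o mappings (IO-APIC, HPET, ...) on the IOMMU side. */
    if ( rangeset_contains_singleton(mmio_ro_ranges, pfn) )
        return IOMMUF_readable;
}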

> >     if ( rangeset_contains_singleton(mmio_ro_ranges, pfn) )
> >         return IOMMUF_readable;
> > }
> > 
> > That would get us a bit closer to allowed CPU side mappings, and we
> > don't need to special case IO-APIC or HPET addresses as those are
> > already added to ->iomem_caps or mmio_ro_ranges respectively by
> > dom0_setup_permissions().
> 
> This won't fit in a region of code framed by a (split) comment
> saying "Check that it doesn't overlap with ...". Hence if anything
> I could put something like this further down. Yet even then the
> question remains what to do with ranges which pass
> iomem_access_permitted() but
> - aren't really MMIO,
> - are inside MMCFG,
> - are otherwise special.
> 
> Or did you perhaps mean to suggest something like
> 
> else if ( is_pv_domain(d) && iomem_access_permitted(d, pfn, pfn) &&
>           rangeset_contains_singleton(mmio_ro_ranges, pfn) )
>     return IOMMUF_readable;

I don't think this would be fully correct: by not handling them
explicitly, we would still allow mappings of IO-APIC pages that have
been banned in ->iomem_caps?
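
To illustrate with your snippet (the comment is mine, just spelling
out the concern):

else if ( is_pv_domain(d) && iomem_access_permitted(d, pfn, pfn) &&
          rangeset_contains_singleton(mmio_ro_ranges, pfn) )
    return IOMMUF_readable;
/*
 * An IO-APIC page removed from ->iomem_caps fails iomem_access_permitted(),
 * so it merely skips the return above instead of being rejected, and may
 * still end up mapped further down.
 */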

> ? Then there would only remain the question of whether mapping r/o
> MMCFG pages is okay (I don't think it is), but that could then be
> special-cased similar to what's done further down for vPCI (by not
> returning in the "else if", but merely updating "perms").

Well, part of the point of this is to make CPU and device mappings
more similar.  I don't think devices have any business poking at the
MMCFG range, so it's fine to explicitly ban that range.  But I would
have said the same for IO-APIC pages, so I'm unsure why IO-APIC pages
are fine to be mapped RO, but the MMCFG range is not.

Thanks, Roger.



 

