Re: [PATCH v2 02/18] VT-d: have callers specify the target level for page table walks


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 24 Sep 2021 16:45:13 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Fri, 24 Sep 2021 14:45:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Sep 24, 2021 at 11:42:13AM +0200, Jan Beulich wrote:
> In order to be able to insert/remove super-pages we need to allow
> callers of the walking function to specify at which point to stop the
> walk.
> 
> For intel_iommu_lookup_page() integrate the last level access into
> the main walking function.
> 
> dma_pte_clear_one() gets only partly adjusted for now: Error handling
> and order parameter get put in place, but the order parameter remains
> ignored (just like intel_iommu_map_page()'s order part of the flags).
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I have to admit that I don't understand why domain_pgd_maddr() wants to
> populate all page table levels for DFN 0.

I think it would be enough to create page tables up to the level
requested by the caller?

It seems like a lazy way of asserting that the level requested by the
caller is always present.
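
Something along these lines is what I had in mind (untested sketch,
assuming domain_pgd_maddr() knows the level its caller needs):

    if ( !hd->arch.vtd.pgd_maddr )
    {
        /*
         * Populate page tables just down to the level the caller needs,
         * instead of unconditionally down to the leaf for DFN 0.
         */
        if ( addr_to_dma_page_maddr(d, 0, level, NULL, true) < PAGE_SIZE )
            return 0;    /* allocation failure (small positive return) */
    }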

> 
> I was actually wondering whether it wouldn't make sense to integrate
> dma_pte_clear_one() into its only caller intel_iommu_unmap_page(), for
> better symmetry with intel_iommu_map_page().
> ---
> v2: Fix build.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -264,63 +264,116 @@ static u64 bus_to_context_maddr(struct v
>      return maddr;
>  }
>  
> -static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
> +/*
> + * This function walks (and if requested allocates) page tables to the
> + * designated target level. It returns
> + * - 0 when a non-present entry was encountered and no allocation was
> + *   requested,
> + * - a small positive value (the level, i.e. below PAGE_SIZE) upon allocation
> + *   failure,
> + * - for target > 0 the address of the page table holding the leaf PTE for
                          ^ physical

I think it's clearer, as the return type could be ambiguous.

> + *   the requested address,
> + * - for target == 0 the full PTE.

Could this create confusion if, for example, a PTE maps physical page
0? A caller getting back a full PTE with address 0 and some of the low
bits set could interpret the result as an error, since such a value is
also a "small positive value below PAGE_SIZE" (e.g. a leaf PTE mapping
page 0 with just DMA_PTE_PROT set would come back as 3, which reads
like a level 3 allocation failure).

I think we already had this discussion on other occasions, but I would
rather add a parameter to be used as a return placeholder (ie: a
dma_pte * maybe?) and use the function return value just for errors,
because IMO that's clearer. I know you don't usually like this
approach though, so I'm not going to insist.
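
For reference, the shape I'm thinking of would be something like this
(hypothetical, only to illustrate the interface; the extra parameter
is made up):

    static int addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
                                      unsigned int target,
                                      unsigned int *flush_flags, bool alloc,
                                      struct dma_pte *entry);

    /* Caller side: */
    struct dma_pte pte;
    int rc = addr_to_dma_page_maddr(domain, addr, 0, NULL, false, &pte);

    if ( rc )
        return rc;    /* errors stay unambiguous */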

> + */
> +static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> +                                       unsigned int target,
> +                                       unsigned int *flush_flags, bool alloc)
>  {
>      struct domain_iommu *hd = dom_iommu(domain);
>      int addr_width = agaw_to_width(hd->arch.vtd.agaw);
>      struct dma_pte *parent, *pte = NULL;
> -    int level = agaw_to_level(hd->arch.vtd.agaw);
> -    int offset;
> +    unsigned int level = agaw_to_level(hd->arch.vtd.agaw), offset;
>      u64 pte_maddr = 0;
>  
>      addr &= (((u64)1) << addr_width) - 1;
>      ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +    ASSERT(target || !alloc);

Might it be better to use an if with ASSERT_UNREACHABLE() and return an
error (ie: level itself)?
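
i.e. something like (sketch):

    if ( !target && alloc )
    {
        ASSERT_UNREACHABLE();
        return level;    /* reuse the allocation-failure convention */
    }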

> +
>      if ( !hd->arch.vtd.pgd_maddr )
>      {
>          struct page_info *pg;
>  
> -        if ( !alloc || !(pg = iommu_alloc_pgtable(domain)) )
> +        if ( !alloc )
> +            goto out;
> +
> +        pte_maddr = level;
> +        if ( !(pg = iommu_alloc_pgtable(domain)) )
>              goto out;
>  
>          hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
>      }
>  
> -    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
> -    while ( level > 1 )
> +    pte_maddr = hd->arch.vtd.pgd_maddr;
> +    parent = map_vtd_domain_page(pte_maddr);
> +    while ( level > target )
>      {
>          offset = address_level_offset(addr, level);
>          pte = &parent[offset];
>  
>          pte_maddr = dma_pte_addr(*pte);
> -        if ( !pte_maddr )
> +        if ( !dma_pte_present(*pte) || (level > 1 && dma_pte_superpage(*pte)) )
>          {
>              struct page_info *pg;
> +            /*
> +             * Higher level tables always set r/w, last level page table
> +             * controls read/write.
> +             */
> +            struct dma_pte new_pte = { DMA_PTE_PROT };
>  
>              if ( !alloc )
> -                break;
> +            {
> +                pte_maddr = 0;
> +                if ( !dma_pte_present(*pte) )
> +                    break;
> +
> +                /*
> +                 * When the leaf entry was requested, pass back the full PTE,
> +                 * with the address adjusted to account for the residual of
> +                 * the walk.
> +                 */
> +                pte_maddr = pte->val +

Wouldn't it be better to use dma_pte_addr(*pte) rather than accessing
pte->val, and then you could drop the PAGE_MASK?

Or is the addr parameter not guaranteed to be page aligned?
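
i.e. (sketch; only equivalent if the low PAGE_SHIFT bits of addr are
clear, and it drops the flag bits from the returned value):

    pte_maddr = dma_pte_addr(*pte) +
        (addr & ((1UL << level_to_offset_bits(level)) - 1));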

> +                    (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> +                     PAGE_MASK);
> +                if ( !target )
> +                    break;

I'm confused by the conditional break here: why calculate pte_maddr
unconditionally, only for it to be overwritten just below when
target != 0?
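
FWIW, I would have expected something like (sketch):

    if ( !target )
    {
        /* Only compute the full PTE when the leaf entry was requested. */
        pte_maddr = pte->val +
            (addr & ((1UL << level_to_offset_bits(level)) - 1) &
             PAGE_MASK);
        break;
    }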

Thanks, Roger.