Re: [Xen-devel] [RFC 18/22] xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry
On Thu, 28 Jul 2016, Julien Grall wrote:
> The ARM architecture mandates the use of a break-before-make sequence
> when changing translation entries if the page table is shared between
> multiple CPUs whenever a valid entry is replaced by another valid entry
> (see D4.7.1 in ARM DDI 0487A.j for more details).
>
> The break-before-make sequence can be divided in the following steps:
> 1) Invalidate the old entry in the page table
> 2) Issue a TLB invalidation instruction for the address associated
> to this entry
> 3) Write the new entry
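(For reference, the three steps above map onto roughly this shape of code; a
minimal sketch only, reusing the helpers that appear further down in this
patch: p2m_remove_pte, p2m_flush_tlb_sync and p2m_write_pte, with new_pte
standing in for whatever entry is being installed.)

    /* 1) Invalidate the old entry in the page table */
    p2m_remove_pte(entry, p2m->clean_pte);

    /* 2) Invalidate the TLB entries associated with the old mapping */
    p2m_flush_tlb_sync(p2m);

    /* 3) Write the new entry */
    p2m_write_pte(entry, new_pte, p2m->clean_pte);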
>
> The current P2M code implemented in apply_one_level does not respect
> this sequence and may break coherency on some processors.
>
> Adapting the current implementation to use the break-before-make
> sequence would imply some code duplication and more TLB invalidations
> than necessary. For instance, if we are replacing a 4KB page and the
> current mapping in the P2M is using a 1GB superpage, the following steps
> will happen:
> 1) Shatter the 1GB superpage into a series of 2MB superpages
> 2) Shatter the 2MB superpage into a series of 4KB pages
> 3) Replace the 4KB page
>
> As the current implementation shatters while descending and installs
> the mapping, Xen would need to issue 3 TLB invalidation instructions,
> which is clearly inefficient.
>
> Furthermore, all the operations which modify the page table are using
> the same skeleton. It is more complicated to maintain different code paths
> than having a generic function that sets an entry and takes care of the
> break-before-make sequence.
>
> The new implementation is based on the x86 EPT one which, I think,
> fits quite well for the break-before-make sequence whilst keeping
> the code simple.
>
> The main function of the new implementation is __p2m_set_entry. It will
> only work on mappings that are aligned to a block entry in the page table
> (i.e. 1GB, 2MB, 4KB when using a 4KB granularity).
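(Concretely, with a 4KB granularity the target level is derived from the page
order as target = 3 - page_order / LPAE_SHIFT, with LPAE_SHIFT = 9: page_order
0 -> level 3 (4KB), page_order 9 -> level 2 (2MB), page_order 18 -> level 1
(1GB).)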
>
> Another function, p2m_set_entry, is provided to break down a region
> into mappings that are aligned to a block entry.
>
> Note that to keep this patch "small", no callers of these functions
> are added in this patch (they will be added in follow-up patches).
>
> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
>
> ---
> I need to find the impact of this new implementation on ARM32 because
> the domheap is not always mapped. This means that Xen needs to map/unmap
> the page associated with the page table every time. It might be possible
> to re-use some caching structure as the current implementation does,
> or to rework the way the domheap is mapped/unmapped.
>
> Also, this code still contains a few TODOs, mostly to add sanity
> checks and a few optimizations. The IOMMU is not yet supported.
> ---
> xen/arch/arm/p2m.c | 335 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 335 insertions(+)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index c93e554..297b176 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -750,6 +750,341 @@ static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
> }
> }
>
> +#if 0
> +/* Free lpae sub-tree behind an entry */
> +static void p2m_free_entry(struct p2m_domain *p2m,
> + lpae_t entry, unsigned int level)
> +{
> + unsigned int i;
> + lpae_t *table;
> + mfn_t mfn;
> +
> + /* Nothing to do if the entry is invalid or a super-page */
Why are we not freeing superpages? It looks like this function could be
correctly called on a superpage (with level == target).
> + if ( !p2m_valid(entry) || p2m_is_superpage(entry, level) )
> + return;
> +
> + if ( level == 3 )
> + {
> + p2m_put_l3_page(_mfn(entry.p2m.base), entry.p2m.type);
> + return;
> + }
> +
> + table = map_domain_page(_mfn(entry.p2m.base));
> + for ( i = 0; i < LPAE_ENTRIES; i++ )
> + p2m_free_entry(p2m, *(table + i), level + 1);
> +
> + unmap_domain_page(table);
> +
> + /*
> + * Make sure all the references in the TLB have been removed before
> + * freeing the intermediate page table.
> + * XXX: Should we defer the free of the page table to avoid the
> + * flush?
Yes, I would probably move the p2m flush out of this function.
> + */
> + if ( p2m->need_flush )
> + p2m_flush_tlb_sync(p2m);
> +
> + mfn = _mfn(entry.p2m.base);
> + ASSERT(mfn_valid(mfn_x(mfn)));
> +
> + free_domheap_page(mfn_to_page(mfn_x(mfn)));
> +}
> +
> +static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
> + unsigned int level, unsigned int target,
> + const unsigned int *offsets)
> +{
> + struct page_info *page;
> + unsigned int i;
> + lpae_t pte, *table;
> + bool rv = true;
> +
> + /* Convenience aliases */
> + p2m_type_t t = entry->p2m.type;
> + mfn_t mfn = _mfn(entry->p2m.base);
> +
> + /* Convenience aliases */
> + unsigned int next_level = level + 1;
> + unsigned int level_order = level_orders[next_level];
> +
> + /*
> + * This should only be called with target != level and the entry is
> + * a superpage.
> + */
> + ASSERT(level < target);
> + ASSERT(p2m_is_superpage(*entry, level));
> +
> + page = alloc_domheap_page(NULL, 0);
> + if ( !page )
> + return false;
> +
> + page_list_add(page, &p2m->pages);
> + table = __map_domain_page(page);
> +
> + /*
> + * We are either splitting a first level 1G page into 512 second level
> + * 2M pages, or a second level 2M page into 512 third level 4K pages.
> + */
> + for ( i = 0; i < LPAE_ENTRIES; i++ )
> + {
> + lpae_t *new_entry = table + i;
> +
> + pte = mfn_to_p2m_entry(mfn, t, p2m->default_access);
> +
> + mfn = mfn_add(mfn, (1UL << level_order));
> +
> + /*
> + * First and second level pages set p2m.table = 0, but third
> + * level entries set p2m.table = 1.
> + */
> + if ( next_level < 3 )
> + pte.p2m.table = 0;
> +
> + write_pte(new_entry, pte);
> + }
> +
> + /*
> + * Shatter the superpage down to the level at which we want to make the
> + * changes.
> + * This is done outside the loop to avoid checking the offset to
> + * know whether the entry should be shattered for every entry.
> + */
> + if ( next_level != target )
> + rv = p2m_split_superpage(p2m, table + offsets[next_level],
> + level + 1, target, offsets);
> +
> + if ( p2m->clean_pte )
> + clean_dcache_va_range(table, PAGE_SIZE);
> +
> + unmap_domain_page(table);
> +
> + pte = mfn_to_p2m_entry(_mfn(page_to_mfn(page)), p2m_invalid,
> + p2m->default_access);
> +
> + p2m_write_pte(entry, pte, p2m->clean_pte);
> +
> + /*
> + * Even if we failed, we should install the newly allocated LPAE
> + * entry. The caller will be in charge to free the sub-tree.
> + * XXX: See if we can free entry here.
> + */
> + *entry = pte;
Isn't the call to p2m_write_pte just above enough?
> +
> + return rv;
> +}
> +
> +/*
> + * Insert an entry in the p2m. This should be called with a mapping
> + * equal to a page/superpage (4K, 2M, 1G).
> + */
> +static int __p2m_set_entry(struct p2m_domain *p2m,
> + gfn_t sgfn,
> + unsigned int page_order,
> + mfn_t smfn,
> + p2m_type_t t,
> + p2m_access_t a)
> +{
> + paddr_t addr = pfn_to_paddr(gfn_x(sgfn));
> + unsigned int level = 0;
> + unsigned int target = 3 - (page_order / LPAE_SHIFT);
> + lpae_t *entry, *table, orig_pte;
> + int rc;
> +
> + /* Convenience aliases */
> + const unsigned int offsets[4] = {
> + zeroeth_table_offset(addr),
> + first_table_offset(addr),
> + second_table_offset(addr),
> + third_table_offset(addr)
> + };
> +
> + /* TODO: Check the validity for the address */
> +
> + ASSERT(p2m_is_write_locked(p2m));
> +
> + /*
> + * Check if the level target is valid: we only support
> + * 4K - 2M - 1G mapping.
> + */
> + ASSERT(target > 0 && target <= 3);
> +
> + table = p2m_get_root_pointer(p2m, sgfn);
> + if ( !table )
> + return -EINVAL;
> +
> + for ( level = P2M_ROOT_LEVEL; level < target; level++ )
> + {
> + rc = p2m_next_level(p2m, false, &table, offsets[level]);
If smfn is INVALID_MFN, does it make sense to create intermediate table
entries?
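If not, one possible shape (a sketch only, assuming the second argument of
p2m_next_level means "do not allocate missing tables", as the later call
passing "true" suggests):

    /* Don't allocate intermediate tables if the mapping is being removed */
    rc = p2m_next_level(p2m, mfn_eq(smfn, INVALID_MFN),
                        &table, offsets[level]);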
> + if ( rc == GUEST_TABLE_MAP_FAILED )
> + {
> + rc = -ENOENT;
> + goto out;
> + }
> + else if ( rc != GUEST_TABLE_NORMAL_PAGE )
> + break;
> + }
> +
> + entry = table + offsets[level];
> +
> + /*
> + * If we are here with level < target, we must be at a leaf node,
> + * and we need to break up the superpage.
> + */
> + if ( level < target )
> + {
> + /* We need to split the original page. */
> + lpae_t split_pte = *entry;
> +
> + ASSERT(p2m_is_superpage(*entry, level));
> +
> + if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
> + {
> + p2m_free_entry(p2m, split_pte, level);
> + rc = -ENOMEM;
> + goto out;
> + }
> +
> + /*
> + * Follow the break-before-make sequence to update the entry.
> + * For more details see (D4.7.1 in ARM DDI 0487A.j).
> + * XXX: Can we flush by address?
> + */
> + p2m_remove_pte(entry, p2m->clean_pte);
> + p2m_flush_tlb_sync(p2m);
> +
> + p2m_write_pte(entry, split_pte, p2m->clean_pte);
> +
> + /* then move to the level we want to make real changes */
> + for ( ; level < target; level++ )
> + {
> + rc = p2m_next_level(p2m, true, &table, offsets[level]);
> +
> + /*
> + * The entry should be found and either be a table
> + * or a superpage if level 3 is not targeted
> + */
> + ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
> + (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
> + }
> + entry = table + offsets[level];
> + }
> +
> + /*
> + * We should always be there with the correct level because
> + * all the intermediate tables have been installed if necessary.
> + */
> + ASSERT(level == target);
> +
> + orig_pte = *entry;
> +
> + /*
> + * The radix-tree can only work on 4KB. This is only used when
> + * memaccess is enabled.
"This" what? This function? The radix-tree?
> + */
> + ASSERT(!p2m->mem_access_enabled || page_order == 0);
> + /*
> + * The access type should always be p2m_access_rwx when the mapping
> + * is removed.
> + */
> + ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
> + /*
> + * Update the mem access permission before updating the P2M, so we
> + * don't have to revert the mapping if it fails.
> + */
> + rc = p2m_mem_access_radix_set(p2m, sgfn, a);
> + if ( rc )
> + goto out;
> +
> + /*
> + * Always remove the entry in order to follow the break-before-make
> + * sequence when updating the translation table (D4.7.1 in ARM DDI
> + * 0487A.j).
> + */
> + if ( p2m_valid(orig_pte) )
> + p2m_remove_pte(entry, p2m->clean_pte);
> +
> + if ( mfn_eq(smfn, INVALID_MFN) )
> + /* Flush can be deferred if the entry is removed */
> + p2m->need_flush |= !!p2m_valid(orig_pte);
> + else
> + {
> + lpae_t pte;
> +
> + /*
> + * Flush the TLB before writing the new entry to keep coherency.
> + * XXX: Can we flush by address?
I was asking myself the same question
> + */
> + if ( p2m_valid(orig_pte) )
> + p2m_flush_tlb_sync(p2m);
> +
> + pte = mfn_to_p2m_entry(smfn, t, a);
> + if ( level < 3 )
> + pte.p2m.table = 0; /* Superpage entry */
> +
> + p2m_write_pte(entry, pte, p2m->clean_pte);
> +
> + p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
> + gfn_add(sgfn, 1 << page_order));
> + p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
> + }
> +
> + /*
> + * Free the entry only if the original pte was valid and the base
> + * is different (to avoid freeing when permission is changed).
> + */
> + if ( p2m_valid(orig_pte) && entry->p2m.base != orig_pte.p2m.base )
> + p2m_free_entry(p2m, orig_pte, level);
> +
> + /* XXX: Flush iommu */
Please do, it would be best if we had iommu flushing in this patch
before committing it.
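Something along these lines would probably do (a sketch only, assuming the
existing iommu_iotlb_flush() interface is appropriate here, that need_iommu()
is the right gate, and that the p2m keeps a back-pointer to its domain):

    if ( need_iommu(p2m->domain) &&
         (p2m_valid(orig_pte) || p2m_valid(*entry)) )
        /* Flush the IOMMU TLBs for the range just (re)written or removed */
        rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order);
    else
        rc = 0;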
> + rc = 0;
> +
> +out:
> + unmap_domain_page(table);
> +
> + return rc;
> +}
> +
> +static int p2m_set_entry(struct p2m_domain *p2m,
> + gfn_t sgfn,
> + unsigned long todo,
> + mfn_t smfn,
> + p2m_type_t t,
> + p2m_access_t a)
> +{
> + int rc = 0;
> +
> + while ( todo )
> + {
> + unsigned long mask = gfn_x(sgfn) | mfn_x(smfn) | todo;
> + unsigned long order;
> +
> + /* Always map 4k by 4k when memaccess is enabled */
> + if ( unlikely(p2m->mem_access_enabled) )
> + order = THIRD_ORDER;
Missing an "else" here
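Presumably the intended shape is something like (sketch):

    if ( unlikely(p2m->mem_access_enabled) )
        order = THIRD_ORDER;
    else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
        order = FIRST_ORDER;
    else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
        order = SECOND_ORDER;
    else
        order = THIRD_ORDER;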
> + if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
> + order = FIRST_ORDER;
> + else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
> + order = SECOND_ORDER;
> + else
> + order = THIRD_ORDER;
> +
> + rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
> + if ( rc )
> + break;
Shouldn't we be checking that "(1<<order)" doesn't exceed "todo" before
calling __p2m_set_entry? Otherwise we risk creating a mapping bigger
than requested.
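One way to make that explicit would be to add a size check to each branch of
the order selection, e.g. (a sketch, on top of the "else" fix noted above):

    else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) &&
              (todo >= (1UL << FIRST_ORDER)) )
        order = FIRST_ORDER;
    else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) &&
              (todo >= (1UL << SECOND_ORDER)) )
        order = SECOND_ORDER;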
> + sgfn = gfn_add(sgfn, (1 << order));
> + if ( !mfn_eq(smfn, INVALID_MFN) )
> + smfn = mfn_add(smfn, (1 << order));
> +
> + todo -= (1 << order);
> + }
> +
> + return rc;
> +}
> +#endif
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel