
Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN



On 17.09.2025 23:55, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
>  
>      return rc;
>  }
> +
> +

Nit: No double blank lines please.

> +/*
> + * p2m_get_entry() should always return the correct order value, even if an
> + * entry is not present (i.e. the GFN is outside the range):
> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
> + *
> + * This ensures that callers of p2m_get_entry() can determine what range of
> + * address space would be altered by a corresponding p2m_set_entry().
> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
> + *
> + * Therefore, this function returns true for GFNs outside range (1), and in
> + * that case the corresponding level is returned via the level_out argument.
> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
> + * find the proper entry.
> + */
> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
> +                                   unsigned int *level_out)
> +{
> +    unsigned int level;
> +
> +    if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
> +         (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )

I understand people write things this way, but personally I find it confusing
to read. Why not simply use a conditional operator here (and again below):

    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
                  : gfn_x(gfn) > gfn_x(boundary) )

> +    {
> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
> +        {
> +            unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));

Don't you need to accumulate the mask to use across loop iterations here
(or calculate it accordingly)? Else ...

> +            if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
> +                 (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )

... here you'll compare some middle part of the original GFN against the
boundary.

> +            {
> +                *level_out = level;
> +                return true;
> +            }
> +        }
> +    }
> +
> +    return false;
> +}
> +
> +/*
> + * Get the details of a given gfn.
> + *
> + * If the entry is present, the associated MFN will be returned and the
> + * p2m type of the mapping.
> + * The page_order will correspond to the order of the mapping in the page
> + * table (i.e it could be a superpage).
> + *
> + * If the entry is not present, INVALID_MFN will be returned and the
> + * page_order will be set according to the order of the invalid range.
> + *
> + * valid will contain the value of bit[0] (e.g valid bit) of the
> + * entry.
> + */
> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
> +                           p2m_type_t *t,
> +                           unsigned int *page_order,
> +                           bool *valid)
> +{
> +    unsigned int level = 0;
> +    pte_t entry, *table;
> +    int rc;
> +    mfn_t mfn = INVALID_MFN;
> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
> +
> +    ASSERT(p2m_is_locked(p2m));
> +
> +    if ( valid )
> +        *valid = false;

Wouldn't you better similarly set *t to some "default" value?

> +    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
> +        goto out;
> +
> +    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
> +        goto out;
> +
> +    table = p2m_get_root_pointer(p2m, gfn);
> +
> +    /*
> +     * The table should always be non-NULL because the gfn is below
> +     * p2m->max_mapped_gfn and the root table pages are always present.
> +     */
> +    if ( !table )
> +    {
> +        ASSERT_UNREACHABLE();
> +        level = P2M_ROOT_LEVEL;
> +        goto out;
> +    }
> +
> +    for ( level = P2M_ROOT_LEVEL; level; level-- )
> +    {
> +        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
> +            goto out_unmap;

Getting back P2M_TABLE_MAP_NOMEM here is a bug, not really a loop exit
condition.

> +        if ( rc != P2M_TABLE_NORMAL )
> +            break;
> +    }
> +
> +    entry = table[offsets[level]];
> +
> +    if ( pte_is_valid(entry) )
> +    {
> +        if ( t )
> +            *t = p2m_get_type(entry);
> +
> +        mfn = pte_get_mfn(entry);
> +        /*
> +         * The entry may point to a superpage. Find the MFN associated
> +         * to the GFN.
> +         */
> +        mfn = mfn_add(mfn,
> +                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));

May want to assert that the respective bits of "mfn" are actually clear
before this calculation.

> +        if ( valid )
> +            *valid = pte_is_valid(entry);
> +    }
> +
> + out_unmap:
> +    unmap_domain_page(table);
> +
> + out:
> +    if ( page_order )
> +        *page_order = P2M_LEVEL_ORDER(level);
> +
> +    return mfn;
> +}
> +
> +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
> +{
> +    mfn_t mfn;
> +
> +    p2m_read_lock(p2m);
> +    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);

Seeing the two NULLs here I wonder: What use is the "valid" parameter of that
function? And what use is the function here when it doesn't also return the
order? IOW I'm not sure having this helper is actually worthwhile. This is
even more so that ...

> +    p2m_read_unlock(p2m);
> +
> +    return mfn;
> +}
> +
> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
> +                                        p2m_type_t *t)
> +{
> +    struct page_info *page;
> +    p2m_type_t p2mt = p2m_invalid;
> +    mfn_t mfn;
> +
> +    p2m_read_lock(p2m);
> +    mfn = p2m_lookup(p2m, gfn, t);

... there's a locking problem here: You cannot acquire a read lock in a
nested fashion - that's a recipe for a deadlock when between the first
acquire and the 2nd acquire attempt another CPU tries to acquire the
lock for writing (which will result in no further readers being allowed
in). It wasn't all that long ago that in the security team we actually
audited the code base for the absence of such a pattern.

Jan
