[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 7 Jul 2025 14:53:51 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 07 Jul 2025 12:54:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.07.2025 13:46, Oleksii Kurochko wrote:
> On 7/7/25 9:20 AM, Jan Beulich wrote:
>> On 04.07.2025 17:01, Oleksii Kurochko wrote:
>>> On 7/1/25 3:49 PM, Jan Beulich wrote:
>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>> This patch introduces p2m_set_entry() and its core helper 
>>>>> __p2m_set_entry() for
>>>>> RISC-V, based loosely on the Arm implementation, with several 
>>>>> RISC-V-specific
>>>>> modifications.
>>>>>
>>>>> Key differences include:
>>>>> - TLB Flushing: RISC-V allows caching of invalid PTEs and does not require
>>>>>     break-before-make (BBM). As a result, the flushing logic is 
>>>>> simplified.
>>>>>     TLB invalidation can be deferred until p2m_write_unlock() is called.
>>>>>     Consequently, the p2m->need_flush flag is always considered true and 
>>>>> is
>>>>>     removed.
>>>>> - Page Table Traversal: The order of walking the page tables differs from 
>>>>> Arm,
>>>>>     and this implementation reflects that reversed traversal.
>>>>> - Macro Adjustments: The macros P2M_ROOT_LEVEL, P2M_ROOT_ORDER, and
>>>>>     P2M_ROOT_PAGES are updated to align with the new RISC-V 
>>>>> implementation.
>>>>>
>>>>> The main functionality is in __p2m_set_entry(), which handles mappings 
>>>>> aligned
>>>>> to page table block entries (e.g., 1GB, 2MB, or 4KB with 4KB granularity).
>>>>>
>>>>> p2m_set_entry() breaks a region down into block-aligned mappings and calls
>>>>> __p2m_set_entry() accordingly.
>>>>>
>>>>> Stub implementations (to be completed later) include:
>>>>> - p2m_free_entry()
>>>> What would a function of this name do?
>>> Recursively visiting all leaf PTE's for sub-tree behind an entry, then calls
>>> put_page() (which will free if there is no any reference to this page),
>>> freeing intermediate page table (after all entries were freed) by removing
>>> it from d->arch.paging.freelist, and removes correspondent page of 
>>> intermediate page
>>> table from p2m->pages list.
>>>
>>>> You can clear entries, but you can't
>>>> free them, can you?
>>> Is is a question regarding terminology?
>> Yes. If one sees a call to a function, it should be possible to at least
>> roughly know what it does without needing to go look at the implementation.
>>
>>> I can't free entry itself, but a page table or
>>> a page (if it is a leaf entry) on which it points could free.
>> Then e.g. pte_free_subtree() or some such?
> 
> It sounds fine to me. I'll use suggested name.
> 
> Just want to notice that other arches also have the same function
> for the same purpose with the same name.

As to x86, it's not general P2M code which uses this odd (for the purpose)
name, but only p2m-pt.c.

> Does it make sense then to change a name for all arches?

I think so.

>>>>> +{
>>>>> +    panic("%s: isn't implemented for now\n", __func__);
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>> For this function in particular, though: Besides the "p2me" in the name
>>>> being somewhat odd (supposedly page table entries here are simply pte_t),
>>>> how is this going to be different from pte_is_valid()?
>>> pte_is_valid() is checking a real bit of PTE, but p2me_is_valid() is 
>>> checking
>>> what is a type stored in the radix tree (p2m->p2m_types):
>>>     /*
>>>      * In the case of the P2M, the valid bit is used for other purpose. Use
>>>      * the type to check whether an entry is valid.
>>>      */
>>>     static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
>>>     {
>>>         return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>>     }
>>>
>>> It is done to track which page was modified by a guest.
>> But then (again) the name doesn't convey what the function does.
> 
> Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte) would 
> better.

For P2M type checks please don't invent new naming, but use what both x86
and Arm are already using. Note how we already have p2m_is_valid() in that
set. Just that it's not doing what you want here.

>>   Plus
>> can't a guest also arrange for an entry's type to move to p2m_invalid?
>> That's then still an entry that was modified by the guest.
> 
> I am not really sure that I fully understand the question.
> Do you ask if a guest can do something which will lead to a call of 
> p2m_set_entry()
> with p2m_invalid argument?

That I'm not asking, but rather stating. I.e. I expect such is possible.

> If yes, then it seems like it will be done only in case of 
> p2m_remove_mapping() what
> will mean that alongside with p2m_invalid INVALID_MFN will be also passed, 
> what means
> this entry isn't expected to be used anymore.

Right. But such an entry would still have been "modified" by the guest.

>> Overall I think I'm lacking clarity what you mean to use this predicate
>> for.
> 
> By using of "p2me_" predicate I wanted to express that not PTE's valid bit 
> will be
> checked, but the type saved in radix tree will be used.
> As suggested above probably it will be better drop "e" too and just use 
> p2m_type_is_valid().

See above regarding that name.

>>>>> +        /*
>>>>> +         * Don't take into account the MFN when removing mapping (i.e
>>>>> +         * MFN_INVALID) to calculate the correct target order.
>>>>> +         *
>>>>> +         * XXX: Support superpage mappings if nr is not aligned to a
>>>>> +         * superpage size.
>>>>> +         */
>>>> Does this really need leaving as a to-do?
>>> I think so, yes. It won’t break the current workflow if|nr| isn’t aligned,
>>> a smaller order will simply be chosen.
>> Well, my question was more like "Isn't it simple enough to cover the case
>> right away?"
>>
>>>>> +        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
>>>>> +        mask |= gfn_x(sgfn) | nr;
>>>>> +
>>>>> +        for ( ; i != 0; i-- )
>>>>> +        {
>>>>> +            if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) )
>>>>> +            {
>>>>> +                    order = XEN_PT_LEVEL_ORDER(i);
>>>>> +                    break;
>>>> Nit: Style.
>>>>
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
>>>>> +        if ( rc )
>>>>> +            break;
>>>>> +
>>>>> +        sgfn = gfn_add(sgfn, (1 << order));
>>>>> +        if ( !mfn_eq(smfn, INVALID_MFN) )
>>>>> +           smfn = mfn_add(smfn, (1 << order));
>>>>> +
>>>>> +        nr -= (1 << order);
>>>> Throughout maybe better be safe right away and use 1UL?
>>>>
>>>>> +    }
>>>>> +
>>>>> +    return rc;
>>>>>    }
>>>> How's the caller going to know how much of the range was successfully
>>>> mapped?
>>> There is no such option. Do other arches do that? I mean returns somehow
>>> the number of successfully mapped (sgfn,smfn).
>> On x86 we had to introduce some not very nice code to cover for the absence
>> of proper handling there. For a new port I think it wants at least seriously
>> considering not to repeat such a potentially unhelpful pattern.
>>
>>>> That part may need undoing (if not here, then in the caller),
>>>> or a caller may want to retry.
>>> So the caller in the case if rc != 0, can just undoing the full range
>>> (by using the same sgfn, nr, smfn).
>> Can it? How would it know what the original state was?
> 
> You're right — blindly unmapping the range assumes that no entries were valid
> beforehand and I missed that it could be that something valid was mapped 
> before
> p2m_set_entry(sgfn,...,smfn) was called.
> But then I am not really understand why it won't be an issue if will know
> how many GFNs were successfully mapped.

The caller may know what that range's state was. But what I really wanted to
convey is: Updating multiple entries in one go is complicated in some of the
corner cases. You will want to think this through now, in order to avoid the
need to re-write everything later again.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.