
Re: [PATCH v2 12/17] xen/riscv: Implement p2m_free_entry() and related helpers


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Jul 2025 09:15:22 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 14 Jul 2025 07:15:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.07.2025 17:56, Oleksii Kurochko wrote:
> On 7/1/25 4:23 PM, Jan Beulich wrote:
>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>> +/*
>>> + * In the case of the P2M, the valid bit is used for other purposes. Use
>>> + * the type to check whether an entry is valid.
>>> + */
>>>   static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
>>>   {
>>> -    panic("%s: isn't implemented for now\n", __func__);
>>> +    return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>> +}
>> No checking of the valid bit?
> 
> As mentioned in the comment, only the P2M type should be checked, since the
> valid bit is used for other purposes we discussed earlier, for example, to
> track whether pages were accessed by a guest domain, or to support certain
> table invalidation optimizations (1) and (2).
> So, in this case, we only need to consider whether the entry is invalid
> from the P2M perspective.
> 
> (1)https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1245
> (2)https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1386

And there can be e.g. entries with the valid bit set and the type being
p2m_invalid? IOW there's no short-circuiting possible in any of the
possible cases, avoiding the radix tree lookup in at least some of the
cases?
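
For illustration, a minimal sketch of the kind of short-circuit in question,
under the assumption that an all-zero PTE can never have a type recorded in
the radix tree (treating pte.pte as the raw-value field is an assumption
about the RISC-V pte_t layout):

static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
{
    /*
     * Hypothetical short-circuit: an entry which was never populated
     * at all (not even for access tracking) cannot have a type in the
     * radix tree, so the lookup can be skipped for it.
     */
    if ( !pte.pte )
        return false;

    return p2m_type_radix_get(p2m, pte) != p2m_invalid;
}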

>>> @@ -404,11 +426,127 @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
>>>       return GUEST_TABLE_MAP_NONE;
>>>   }
>>>   
>>> +static void p2m_put_foreign_page(struct page_info *pg)
>>> +{
>>> +    /*
>>> +     * It's safe to do the put_page here because page_alloc will
>>> +     * flush the TLBs if the page is reallocated before the end of
>>> +     * this loop.
>>> +     */
>>> +    put_page(pg);
>> Is the comment really true? The page allocator will flush the normal
>> TLBs, but not the stage-2 ones. Yet those are what you care about here,
>> aiui.
> 
> In alloc_heap_pages():
>   ...
>       if ( need_tlbflush )
>          filtered_flush_tlb_mask(tlbflush_timestamp);
>   ...
>   
> filtered_flush_tlb_mask() calls arch_flush_tlb_mask().
> 
> and arch_flush_tlb_mask(), at least on Arm (I haven't checked x86), is
> implemented as:
>    void arch_flush_tlb_mask(const cpumask_t *mask)
>    {
>        /* No need to IPI other processors on ARM, the processor takes care of it. */
>        flush_all_guests_tlb();
>    }
> 
> So it flushes the stage-2 TLB.

Hmm, okay. And I take it you have the same plan on RISC-V? What I'd like to
ask for, though, is that the comment (also) mentions where that (guest)
flushing actually happens. That's not in page_alloc.c, and it also wasn't
originally intended for guest TLBs to also be flushed from there (as x86 is
where the flush avoidance machinery originates, which Arm and now also
RISC-V don't really use).
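
Perhaps something along these lines (a sketch only; whether RISC-V's
arch_flush_tlb_mask() will indeed flush guest TLBs is exactly the
assumption such a comment would rest on):

    /*
     * put_page() is safe here: if the page gets re-allocated before
     * the end of this loop, alloc_heap_pages() will invoke
     * filtered_flush_tlb_mask() -> arch_flush_tlb_mask(), which on
     * RISC-V (as on Arm) is expected to also flush stage-2 (guest)
     * TLBs, not just Xen's own.
     */
    put_page(pg);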

>>> +/* Put any references on the single 4K page referenced by mfn. */
>>> +static void p2m_put_4k_page(mfn_t mfn, p2m_type_t type)
>>> +{
>>> +    /* TODO: Handle other p2m types */
>>> +
>>> +    /* Detect the xenheap page and mark the stored GFN as invalid. */
>>> +    if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
>>> +        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
>> Is this a valid thing to do? How do you make sure the respective uses
>> (in gnttab's shared and status page arrays) are / were also removed?
> 
> As the grant table frame GFN is stored directly in struct page_info
> instead of in standalone status/shared arrays, there is no need for
> such arrays.

I fear I don't follow. Looking at Arm's header (which I understand you
derive from), I see

#define gnttab_shared_page(t, i)   virt_to_page((t)->shared_raw[i])

#define gnttab_status_page(t, i)   virt_to_page((t)->status[i])

Are you intending to do things differently?

>>> +/* Put any references on the superpage referenced by mfn. */
>>> +static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type)
>>> +{
>>> +    struct page_info *pg;
>>> +    unsigned int i;
>>> +
>>> +    ASSERT(mfn_valid(mfn));
>>> +
>>> +    pg = mfn_to_page(mfn);
>>> +
>>> +    for ( i = 0; i < XEN_PT_ENTRIES; i++, pg++ )
>>> +        p2m_put_foreign_page(pg);
>>> +}
>>> +
>>> +/* Put any references on the page referenced by pte. */
>>> +static void p2m_put_page(struct p2m_domain *p2m, const pte_t pte,
>>> +                         unsigned int level)
>>> +{
>>> +    mfn_t mfn = pte_get_mfn(pte);
>>> +    p2m_type_t p2m_type = p2m_type_radix_get(p2m, pte);
>> This gives you the type of the 1st page. What guarantees that all other pages
>> in a superpage are of the exact same type?
> 
> Doesn't a superpage mean that all the 4KB pages within it have the same
> type and are contiguous in memory?

If the mapping is a super-page one - yes. Yet I see nothing super-page-ish
here.
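
To illustrate, a guard of roughly this shape would make the super-page
assumption explicit (pte_is_superpage() is a made-up helper here, not
something the patch provides):

    /* Only treat the entry as a 2M mapping when the PTE really is a
     * leaf at this level, rather than inferring that from the level
     * alone. */
    if ( level == 1 && pte_is_superpage(pte, level) )
        return p2m_put_2m_superpage(mfn, p2m_type);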

>>> +    if ( level == 1 )
>>> +        return p2m_put_2m_superpage(mfn, p2m_type);
>>> +    else if ( level == 0 )
>>> +        return p2m_put_4k_page(mfn, p2m_type);
>> Use switch() right away?
> 
> It could be; I think it makes no big difference at the moment, at least.
> But I am okay with reworking it.

If you don't want to use switch() here, then my other style nit would
need giving: Please avoid "else" in situations like this.
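
I.e. (merely sketching the shape, reusing the patch's own helpers):

    switch ( level )
    {
    case 1:
        p2m_put_2m_superpage(mfn, p2m_type);
        break;

    case 0:
        p2m_put_4k_page(mfn, p2m_type);
        break;
    }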

>>> +static void p2m_free_page(struct domain *d, struct page_info *pg)
>>> +{
>>> +    if ( is_hardware_domain(d) )
>>> +        free_domheap_page(pg);
>> Why's the hardware domain different here? It should have a pool just like
>> all other domains have.
> 
> The hardware domain (dom0) should have no limit on the number of pages
> that can be allocated, so p2m pages for the hardware domain are allocated
> from the heap.
> 
> The idea of the p2m pool is to provide a way to put a clear limit on the
> amount of p2m allocation.

Well, we had been there on another thread, and I outlined how I think
Dom0 may want handling.

>>>   /* Free pte sub-tree behind an entry */
>>>   static void p2m_free_entry(struct p2m_domain *p2m,
>>>                              pte_t entry, unsigned int level)
>>>   {
>>> -    panic("%s: hasn't been implemented yet\n", __func__);
>>> +    unsigned int i;
>>> +    pte_t *table;
>>> +    mfn_t mfn;
>>> +    struct page_info *pg;
>>> +
>>> +    /* Nothing to do if the entry is invalid. */
>>> +    if ( !p2me_is_valid(p2m, entry) )
>>> +        return;
>> Does this actually apply to intermediate page tables (which you handle
>> later in the function), when that's (only) a P2M type check?
> 
> Yes, any PTE should have the V bit set to 1, so from the P2M perspective
> its type should also, at least, not be equal to p2m_invalid.

I don't follow. Where would that type be set? The radix tree being GFN-
indexed, you would need to "invent" a GFN for every intermediate page table,
just to be able to (legitimately) invoke the type retrieval function. Maybe
you mean to leverage that (now, i.e. post-v2) you encode some of the types
directly in the PTE, and p2m_invalid may be one of them. But that wasn't
the case in the v2 submission, and hence the code looked wrong to me. Which
in turn suggests that at least some better commentary is going to be needed,
maybe even some BUILD_BUG_ON().
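
For example (just a sketch; P2M_TYPE_PTE_BITS is a made-up name for however
many type bits end up directly encoded in the PTE):

    /*
     * If p2m_invalid is to be usable without a radix tree lookup, it
     * has to be one of the types encoded directly in the PTE.
     */
    BUILD_BUG_ON(p2m_invalid >= (1U << P2M_TYPE_PTE_BITS));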

Jan