[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 7 Jul 2025 17:15:31 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 07 Jul 2025 15:15:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.07.2025 17:00, Oleksii Kurochko wrote:
> On 7/7/25 2:53 PM, Jan Beulich wrote:
>> On 07.07.2025 13:46, Oleksii Kurochko wrote:
>>> On 7/7/25 9:20 AM, Jan Beulich wrote:
>>>> On 04.07.2025 17:01, Oleksii Kurochko wrote:
>>>>> On 7/1/25 3:49 PM, Jan Beulich wrote:
>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>> +{
>>>>>>> +    panic("%s: isn't implemented for now\n", __func__);
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>> For this function in particular, though: Besides the "p2me" in the name
>>>>>> being somewhat odd (supposedly page table entries here are simply pte_t),
>>>>>> how is this going to be different from pte_is_valid()?
>>>>> pte_is_valid() is checking a real bit of PTE, but p2me_is_valid() is 
>>>>> checking
>>>>> what is a type stored in the radix tree (p2m->p2m_types):
>>>>>      /*
>>>>>       * In the case of the P2M, the valid bit is used for other purpose. 
>>>>> Use
>>>>>       * the type to check whether an entry is valid.
>>>>>       */
>>>>>      static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
>>>>>      {
>>>>>          return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>>>>      }
>>>>>
>>>>> It is done to track which page was modified by a guest.
>>>> But then (again) the name doesn't convey what the function does.
>>> Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte) would 
>>> better.
>> For P2M type checks please don't invent new naming, but use what both x86
>> and Arm are already using. Note how we already have p2m_is_valid() in that
>> set. Just that it's not doing what you want here.
> 
> Hm, why not doing what I want? p2m_is_valid() verifies if P2M entry is valid.
> And in here it is checked if P2M pte is valid from P2M point of view by 
> checking
> the type in radix tree and/or in reserved PTEs bits (just to remind we have 
> only 2
> free bits for type).

Because this is how it's defined on x86:

#define p2m_is_valid(_t)    (p2m_to_mask(_t) & \
                             (P2M_RAM_TYPES | p2m_to_mask(p2m_mmio_direct)))

I.e. more strict that simply "!= p2m_invalid". And I think such predicates
would better be uniform across architectures, such that in principle they
might also be usable in common code (as we already do with p2m_is_foreign()).

>>>>    Plus
>>>> can't a guest also arrange for an entry's type to move to p2m_invalid?
>>>> That's then still an entry that was modified by the guest.
>>> I am not really sure that I fully understand the question.
>>> Do you ask if a guest can do something which will lead to a call of 
>>> p2m_set_entry()
>>> with p2m_invalid argument?
>> That I'm not asking, but rather stating. I.e. I expect such is possible.
>>
>>> If yes, then it seems like it will be done only in case of 
>>> p2m_remove_mapping() what
>>> will mean that alongside with p2m_invalid INVALID_MFN will be also passed, 
>>> what means
>>> this entry isn't expected to be used anymore.
>> Right. But such an entry would still have been "modified" by the guest.
> 
> Yes, but nothing then is needed to do with it.

I understand that. Maybe I'm overly picky, but all of the above was in response
to you saying "It is done to track which page was modified by a guest." And I'm
simply trying to get you to use precise wording, both in code comments and in
discussions. In a case like the one here I simply can't judge whether you simply
expressed yourself not clear enough, or whether you indeed meant what you said.

>>>>>>> +        /*
>>>>>>> +         * Don't take into account the MFN when removing mapping (i.e
>>>>>>> +         * MFN_INVALID) to calculate the correct target order.
>>>>>>> +         *
>>>>>>> +         * XXX: Support superpage mappings if nr is not aligned to a
>>>>>>> +         * superpage size.
>>>>>>> +         */
>>>>>> Does this really need leaving as a to-do?
>>>>> I think so, yes. It won’t break the current workflow if|nr| isn’t aligned,
>>>>> a smaller order will simply be chosen.
>>>> Well, my question was more like "Isn't it simple enough to cover the case
>>>> right away?"
>>>>
>>>>>>> +        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
>>>>>>> +        mask |= gfn_x(sgfn) | nr;
>>>>>>> +
>>>>>>> +        for ( ; i != 0; i-- )
>>>>>>> +        {
>>>>>>> +            if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) )
>>>>>>> +            {
>>>>>>> +                    order = XEN_PT_LEVEL_ORDER(i);
>>>>>>> +                    break;
>>>>>> Nit: Style.
>>>>>>
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
>>>>>>> +        if ( rc )
>>>>>>> +            break;
>>>>>>> +
>>>>>>> +        sgfn = gfn_add(sgfn, (1 << order));
>>>>>>> +        if ( !mfn_eq(smfn, INVALID_MFN) )
>>>>>>> +           smfn = mfn_add(smfn, (1 << order));
>>>>>>> +
>>>>>>> +        nr -= (1 << order);
>>>>>> Throughout maybe better be safe right away and use 1UL?
>>>>>>
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return rc;
>>>>>>>     }
>>>>>> How's the caller going to know how much of the range was successfully
>>>>>> mapped?
>>>>> There is no such option. Do other arches do that? I mean returns somehow
>>>>> the number of successfully mapped (sgfn,smfn).
>>>> On x86 we had to introduce some not very nice code to cover for the absence
>>>> of proper handling there. For a new port I think it wants at least 
>>>> seriously
>>>> considering not to repeat such a potentially unhelpful pattern.
>>>>
>>>>>> That part may need undoing (if not here, then in the caller),
>>>>>> or a caller may want to retry.
>>>>> So the caller in the case if rc != 0, can just undoing the full range
>>>>> (by using the same sgfn, nr, smfn).
>>>> Can it? How would it know what the original state was?
>>> You're right — blindly unmapping the range assumes that no entries were 
>>> valid
>>> beforehand and I missed that it could be that something valid was mapped 
>>> before
>>> p2m_set_entry(sgfn,...,smfn) was called.
>>> But then I am not really understand why it won't be an issue if will know
>>> how many GFNs were successfully mapped.
>> The caller may know what that range's state was. But what I really wanted to
>> convey is: Updating multiple entries in one go is complicated in some of the
>> corner cases. You will want to think this through now, in order to avoid the
>> need to re-write everything later again.
> 
> I can add one more argument to return the number of successfully mapped GFNs.
> Fortunately, that's very easy to do.
> 
> The problem for me is that I don’t really understand what the caller is 
> supposed
> to do with that information.

That's only the 2nd step to take. The first is: What behavior do you want, 
overall?

> The only use case I can think of is that the caller
> might try to map the remaining GFNs again. But that doesn’t seem very useful,
> if|p2m_set_entry()| wasn’t able to map the full range, it likely indicates a 
> serious
> issue, and retrying would probably result in the same error.
> 
> The same applies to rolling back the state. It wouldn’t be difficult to add a 
> local
> array to track all modified PTEs and then use it to revert the state if 
> needed.
> But again, what would the caller do after the rollback? At this point, it 
> still seems
> like the best option is simply to|panic(). |
> 
> Basically, I don’t see or understand the cases where knowing how many GFNs 
> were
> successfully mapped, or whether a rollback was performed, would really help — 
> because
> in most cases, I don’t have a better option than just calling|panic()| at the 
> end.

panic()-ing is of course only a last resort. Anything related to domain handling
would better crash only the domain in question. And even that only if suitable
error handling isn't possible.

> For example, if I call|map_regions_p2mt()| for an MMIO region described in a 
> device
> tree node, and the mapping fails partway through, I’m left with two options: 
> either
> ignore the device (if it's not essential for Xen or guest functionality) and 
> continue
>   booting; in which case I’d need to perform a rollback, and simply knowing 
> the number
> of successfully mapped GFNs may not be enough or, more likely, just panic.

Well, no. For example, before even trying to map you could check that the range
of P2M entries covered is all empty. _Then_ you know how to correctly roll back.
And yes, doing so may not even require passing back information on how much of
a region was successfully mapped.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.