[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()

To: Jan Beulich <jbeulich@xxxxxxxx>
From: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
Date: Mon, 7 Jul 2025 18:10:41 +0200
Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 07 Jul 2025 16:10:49 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 7/7/25 5:15 PM, Jan Beulich wrote:

On 07.07.2025 17:00, Oleksii Kurochko wrote:

On 7/7/25 2:53 PM, Jan Beulich wrote:

On 07.07.2025 13:46, Oleksii Kurochko wrote:

On 7/7/25 9:20 AM, Jan Beulich wrote:

On 04.07.2025 17:01, Oleksii Kurochko wrote:

On 7/1/25 3:49 PM, Jan Beulich wrote:

On 10.06.2025 15:05, Oleksii Kurochko wrote:

+{
+    panic("%s: isn't implemented for now\n", __func__);
+
+    return false;
+}

For this function in particular, though: Besides the "p2me" in the name
being somewhat odd (supposedly page table entries here are simply pte_t),
how is this going to be different from pte_is_valid()?

pte_is_valid() is checking a real bit of PTE, but p2me_is_valid() is checking
what is a type stored in the radix tree (p2m->p2m_types):
     /*
      * In the case of the P2M, the valid bit is used for other purpose. Use
      * the type to check whether an entry is valid.
      */
     static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
     {
         return p2m_type_radix_get(p2m, pte) != p2m_invalid;
     }

It is done to track which page was modified by a guest.

But then (again) the name doesn't convey what the function does.

Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte) would better.

For P2M type checks please don't invent new naming, but use what both x86
and Arm are already using. Note how we already have p2m_is_valid() in that
set. Just that it's not doing what you want here.

Hm, why not doing what I want? p2m_is_valid() verifies if P2M entry is valid.
And in here it is checked if P2M pte is valid from P2M point of view by checking
the type in radix tree and/or in reserved PTEs bits (just to remind we have only 2
free bits for type).

Because this is how it's defined on x86:

#define p2m_is_valid(_t)    (p2m_to_mask(_t) & \
                             (P2M_RAM_TYPES | p2m_to_mask(p2m_mmio_direct)))

I.e. more strict that simply "!= p2m_invalid". And I think such predicates
would better be uniform across architectures, such that in principle they
might also be usable in common code (as we already do with p2m_is_foreign()).

Yeah, Arm isn't so strict in definition of p2m_is_valid() and it seems like
x86 and Arm have different understanding what is valid.

Except what mentioned in the comment that grant types aren't considered valid
for x86 (and shouldn't be the same then for Arm?), it isn't clear why x86's
p2m_is_valid() is stricter then Arm's one and if other arches should be also
so strict.
It seems like from the point of view of mapping/unmapping it is enough just
to verify a "copy" of PTE's valid bit (in terms of P2M it is p2m_invalid type).

   Plus
can't a guest also arrange for an entry's type to move to p2m_invalid?
That's then still an entry that was modified by the guest.

I am not really sure that I fully understand the question.
Do you ask if a guest can do something which will lead to a call of p2m_set_entry()
with p2m_invalid argument?

That I'm not asking, but rather stating. I.e. I expect such is possible.

If yes, then it seems like it will be done only in case of p2m_remove_mapping() what
will mean that alongside with p2m_invalid INVALID_MFN will be also passed, what means
this entry isn't expected to be used anymore.

Right. But such an entry would still have been "modified" by the guest.

Yes, but nothing then is needed to do with it.

I understand that. Maybe I'm overly picky, but all of the above was in response
to you saying "It is done to track which page was modified by a guest." And I'm
simply trying to get you to use precise wording, both in code comments and in
discussions. In a case like the one here I simply can't judge whether you simply
expressed yourself not clear enough, or whether you indeed meant what you said.

+        /*
+         * Don't take into account the MFN when removing mapping (i.e
+         * MFN_INVALID) to calculate the correct target order.
+         *
+         * XXX: Support superpage mappings if nr is not aligned to a
+         * superpage size.
+         */

Does this really need leaving as a to-do?

I think so, yes. It won’t break the current workflow if|nr| isn’t aligned,
a smaller order will simply be chosen.

Well, my question was more like "Isn't it simple enough to cover the case
right away?"

+        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
+        mask |= gfn_x(sgfn) | nr;
+
+        for ( ; i != 0; i-- )
+        {
+            if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) )
+            {
+                    order = XEN_PT_LEVEL_ORDER(i);
+                    break;

Nit: Style.

+            }
+        }
+
+        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
+        if ( rc )
+            break;
+
+        sgfn = gfn_add(sgfn, (1 << order));
+        if ( !mfn_eq(smfn, INVALID_MFN) )
+           smfn = mfn_add(smfn, (1 << order));
+
+        nr -= (1 << order);

Throughout maybe better be safe right away and use 1UL?

+    }
+
+    return rc;
    }

How's the caller going to know how much of the range was successfully
mapped?

There is no such option. Do other arches do that? I mean returns somehow
the number of successfully mapped (sgfn,smfn).

On x86 we had to introduce some not very nice code to cover for the absence
of proper handling there. For a new port I think it wants at least seriously
considering not to repeat such a potentially unhelpful pattern.

That part may need undoing (if not here, then in the caller),
or a caller may want to retry.

So the caller in the case if rc != 0, can just undoing the full range
(by using the same sgfn, nr, smfn).

Can it? How would it know what the original state was?

You're right — blindly unmapping the range assumes that no entries were valid
beforehand and I missed that it could be that something valid was mapped before
p2m_set_entry(sgfn,...,smfn) was called.
But then I am not really understand why it won't be an issue if will know
how many GFNs were successfully mapped.

The caller may know what that range's state was. But what I really wanted to
convey is: Updating multiple entries in one go is complicated in some of the
corner cases. You will want to think this through now, in order to avoid the
need to re-write everything later again.

I can add one more argument to return the number of successfully mapped GFNs.
Fortunately, that's very easy to do.

The problem for me is that I don’t really understand what the caller is supposed
to do with that information.

That's only the 2nd step to take. The first is: What behavior do you want, overall?

My initial idea was that if something went wrong ( rc != 0 ) then just panic(). But
based on your questions it seems like it isn't the best one idea.

The only use case I can think of is that the caller
might try to map the remaining GFNs again. But that doesn’t seem very useful,
if|p2m_set_entry()| wasn’t able to map the full range, it likely indicates a serious
issue, and retrying would probably result in the same error.

The same applies to rolling back the state. It wouldn’t be difficult to add a local
array to track all modified PTEs and then use it to revert the state if needed.
But again, what would the caller do after the rollback? At this point, it still seems
like the best option is simply to|panic(). |

Basically, I don’t see or understand the cases where knowing how many GFNs were
successfully mapped, or whether a rollback was performed, would really help — because
in most cases, I don’t have a better option than just calling|panic()| at the end.

panic()-ing is of course only a last resort. Anything related to domain handling
would better crash only the domain in question. And even that only if suitable
error handling isn't possible.

And if there is no still any runnable domain available, for example, we are creating
domain and some p2m mapping is called? Will it be enough just ignore to boot this domain?
If yes, then it is enough to return only error code without returning how many GFNs were
mapped or rollbacking as domain won't be ran anyway.
(just to mention, I am not trying to convince you that rollback or returning of an amount
of GFNs isn't necessary, I just trying to understand what is the best implementation of
handling none-fully mapped mappings you mentioned)

For example, if I call|map_regions_p2mt()| for an MMIO region described in a device
tree node, and the mapping fails partway through, I’m left with two options: either
ignore the device (if it's not essential for Xen or guest functionality) and continue
  booting; in which case I’d need to perform a rollback, and simply knowing the number
of successfully mapped GFNs may not be enough or, more likely, just panic.

Well, no. For example, before even trying to map you could check that the range
of P2M entries covered is all empty.

Could it be that they aren't all empty? Then it seems like we have overlapping and we can't
just do a mapping, right?
Won't be this procedure consume a lot of time as it is needed to go through each page
tables for each entry.

 _Then_ you know how to correctly roll back.
And yes, doing so may not even require passing back information on how much of
a region was successfully mapped.

If P2M entries were empty before start of the mapping then it is enough to just go
through the same range (sgfn,nr,smfn) and just clean them, right?

Thanks.

~ Oleksii

Follow-Ups:
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Jan Beulich

References:
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Jan Beulich
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Oleksii Kurochko
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Jan Beulich
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Oleksii Kurochko
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Jan Beulich
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Oleksii Kurochko
- Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
  - From: Jan Beulich

Prev by Date: Re: [PATCH] xen/efi: Fix crash with initial empty EFI options
Next by Date: Re: [PATCH v1 1/5] vpci: const-ify some pdev instances
Previous by thread: Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
Next by thread: Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.