[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 12/16] xen: implement new foreign copy hypercall


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Frediano Ziglio <freddy77@xxxxxxxxx>
  • Date: Tue, 23 Jun 2026 22:18:34 +0100
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=Cnc2fuoXsJcrWpZAtUBKnM72S4LMmkdFSFVFY0TpXoA=; fh=6TZj3pGG48U2F8n7KqE0Z4H7JONILxc7zEHghnyScxc=; b=kQKKtxsNGaoy/nIDlSrhenL/t5+kEVhJxZfoaGYgAE+Wz8+1HBr9SzoKtfyU4DqiDE WvVTBAWzGjVhI3lIe409rfti+ZtMbrn+//25jh9udJhjzYPtjsPxbRM6P6HV0dg/OLR1 u6IBTVCALqjk5yEmH6prg04q1Kl+6nTr+mJPVWl0lW/C8eHBdkwJJHbhtZLT5dy3sgLm 3QLJFETWjbZ6BLUDb+dHTHE4Dosp6OBme6OV3jh9E1ip/u7/KH0fwrBbFX1p/5tzmHVP 8nQM55EilX4OMynZw8M2GVmJU9yR2PHpVbEAbHhiFgYn5fMXLIFHCvhE31X6JXeZfyuD yHFw==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1782249526; cv=none; d=google.com; s=arc-20240605; b=G0efw4ipINLZ4bpQU8f9GSzvmSHjX8zGo3X0D+CvAW2Pjr2cBXSXxwnR27oGBacqV7 rk/Fbu/sv4PuNPtb9s3RRKQIdhmhZjCBiKoLzvgpSq+gJu90UXxjRVe4lyTwhBm2E0Yf XwP8Mmv1z7dK8slQ4JLA3VBzKIuMsb+KAvPvsPNG4BT3Gs7SZjvJi8k2nit4MJrZXkk5 UpOmQhbMbDuQUCoh6qya5pgSGX+svy4JBZr5llRj9GriHwn1140Ip7LqELPvQjNZqMhz 7jbuug19m98eLWKQdJDZ8qgxmBH7n2K0tVgIb/P5TCyTc1VWr8zPVtTEVU0vZCPvVRoE pVdg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: Frediano Ziglio <frediano.ziglio@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 23 Jun 2026 21:18:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, 23 Jun 2026 at 14:21, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 23.06.2026 12:55, Frediano Ziglio wrote:
> > On Mon, 22 Jun 2026 at 11:34, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 19.06.2026 15:04, Frediano Ziglio wrote:
> >>> --- a/xen/common/memory.c
> >>> +++ b/xen/common/memory.c
> >>> @@ -1545,6 +1545,139 @@ static int acquire_resource(
> >>>      return rc;
> >>>  }
> >>>
> >>> +/*
> >>> + * The "noinline" qualifier avoids the compiler to create a large 
> >>> function
> >>> + * consuming quite a lot of stack.
> >>> + */
> >>> +static int noinline mem_foreigncopy(
> >>> +    XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
> >>> +{
> >>> +    struct domain *d, *const currd = current->domain;
> >>> +    xen_foreigncopy_t copy;
> >>> +    int rc, direction;
> >>> +
> >>> +    if ( copy_from_guest(&copy, arg, 1) )
> >>> +        return -EFAULT;
> >>> +
> >>> +    if ( copy.flags & ~XENMEM_foreigncopy_direction )
> >>> +        return -EINVAL;
> >>> +
> >>> +    direction = copy.flags & XENMEM_foreigncopy_direction;
> >>> +
> >>> +    rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
> >>
> >> Iirc I did ask before why this isn't ..._by_any_id().
> >
> > I probably was confused by the question about MMUEXT and the 2 domains.
> > There are different similar hypercalls (like the mentioned MMUEXT but
> > also hypercalls to map foreign domain memory) that have this check
> > (not the same domain). Any domain has, obviously, access to its own
> > memory, so it should not have to use hypercall to access its own
> > memory. If it does it looks like a mistake causing performance issues
> > or an attempt to circumvent security; in either case you would like to
> > avoid it.
>
> No. Self-grants are possible as well, for example, and for a good reason.
> Allowing normally-remote operations on oneself helps with testing, for
> example. It may also help avoid needing to special-case "self" in code
> which needs to cover both cases.
>

But this is not a grant, it's a copy.

> >>> +    if ( rc )
> >>> +        return rc;
> >>> +
> >>> +    if ( copy.nr_frames == 0 )
> >>> +    {
> >>> +        rcu_unlock_domain(d);
> >>> +        return 0;
> >>> +    }
> >>
> >> Any reason this cannot also be "goto out"? The more that now that you have
> >> moved this past the domid validity check, imo it should further move to ...
> >
> > The only reason was style and to avoid a memory copy, but it's not a
> > hot case so I'll change to "goto out" (no strong about it).
> >
> >>> +    /*
> >>> +     * Check we are allowed to map and access these foreign pages.
> >>> +     */
> >>> +    rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
> >>> +    if ( rc )
> >>> +        goto out;
> >>
> >> ... below here. Perhaps simply as
> >>
> >>     if ( rc || !copy.nr_frames )
> >>         goto out;
> >>
> >
> > I think this would be confusing with the above "Check we are allowed
> > to map and access these foreign pages" comment.
> > Are you okay with just the change above to "goto out" ?
>
> I do want the order adjusted as indicated. I won't insist on (but I would
> prefer) folding both if()-s.
>

What about

    /*
     * Check we are allowed to map and access these foreign pages.
     */
    rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
    if ( rc )
        goto out;

    while ( copy.nr_frames )
    {
        /*
         * Arbitrary size.  Not too much stack space, and a reasonable stride
         * for continuation checks.
         */


> > Also moving here would potentially change the result and do a useless check.
>
> Affecting the result is the goal of the re-ordering.
>

Changed.

> >>> +    do {
> >>> +        /*
> >>> +         * Arbitrary size.  Not too much stack space, and a reasonable 
> >>> stride
> >>> +         * for continuation checks.
> >>> +         */
> >>> +        xen_pfn_t gfn_list[32];
> >>> +        unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
> >>> +
> >>> +        rc = -EFAULT;
> >>> +        if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
> >>> +            goto out;
> >>> +
> >>> +        for ( unsigned int i = 0; i < todo; i++ )
> >>> +        {
> >>> +            struct page_info *foreign_page;
> >>> +            mfn_t foreign_mfn;
> >>> +            void *foreign;
> >>> +            p2m_type_t p2mt;
> >>> +            const unsigned long valid_mask =
> >>> +#ifdef CONFIG_X86
> >>> +                p2m_to_mask(p2m_ram_rw) | p2m_to_mask(p2m_ram_logdirty);
> >>> +#else
> >>> +                p2m_to_mask(p2m_ram_rw);
> >>> +#endif
> >>
> >> The set of permitted types didn't change, yet a justification for the 
> >> resulting
> >> limitation also didn't appear.
> >>
> >
> > Yes, that's missing, indeed.
> > Should the set of types be different for reading and writing? For
> > instance do not allow writing to read-only memory?
>
> Of course.
>
> > Given that it looks like different architectures have different
> > meanings and definitions for these constants, should it not be better
> > to define some new constants for this specific usage? For instance
> > P2M_READ_TYPES and P2M_WRITE_TYPES?
>
> Perhaps, yes. The suggested names look overly generic to me, though.
>

I suppose P2M_READABLE_TYPES and P2M_WRITABLE_TYPES are more correct
but still too generic.
P2M_EXPORTABLE_TYPES and P2M_IMPORTABLE_TYPES ?

> >>> +            foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, 
> >>> P2M_ALLOC);
> >>> +
> >>> +            if ( unlikely(!(p2m_to_mask(p2mt) & valid_mask)) && 
> >>> foreign_page )
> >>> +            {
> >>> +                put_page(foreign_page);
> >>> +                foreign_page = NULL;
> >>> +            }
> >>> +            if ( unlikely(!foreign_page) )
> >>> +            {
> >>> +                gdprintk(XENLOG_WARNING,
> >>> +                         "Error accessing foreign gfn %" PRI_gfn "\n",
> >>> +                         gfn_list[i]);
> >>> +                rc = -EINVAL;
> >>> +                copy.nr_frames -= i;
> >>> +                guest_handle_add_offset(copy.frame_list, i);
> >>> +                goto out;
> >>> +            }
> >>> +
> >>> +            foreign_mfn = page_to_mfn(foreign_page);
> >>> +
> >>> +            /* A page is dirtied when it's being copied to. */
> >>> +            if ( direction == XENMEM_foreigncopy_to )
> >>> +                paging_mark_dirty(d, foreign_mfn);
> >>> +
> >>> +            foreign = map_domain_page(foreign_mfn);
> >>> +            if ( direction == XENMEM_foreigncopy_from )
> >>> +                rc = copy_to_guest(copy.buffer, foreign, PAGE_SIZE);
> >>> +            else
> >>> +                rc = copy_from_guest(foreign, copy.buffer, PAGE_SIZE);
> >>
> >> You cannot validly write to the page without holding a PGT_writable ref.
> >> Else you might overwrite a page table or a descriptor table in a PV guest.
> >>
> >
> > Given that this code was "inspired" by other hypercalls I'll also
> > check the other code.
> >
> >> Once again - can you please make sure you have addressed earlier review
> >> comments, before sending a new version? I did point this out before.
> >
> > Apparently not.
>
> https://lists.xen.org/archives/html/xen-devel/2026-06/msg00850.html
>

The "apparently not" is the reply to "can you please make sure you
have addressed earlier review comments, before sending a new version".

> >>> +                copy.nr_frames -= i;
> >>> +                guest_handle_add_offset(copy.frame_list, i);
> >>> +                goto out;
> >>> +            }
> >>> +
> >>> +            guest_handle_add_offset(copy.buffer, PAGE_SIZE);
> >>> +        }
> >>> +
> >>> +        copy.nr_frames -= todo;
> >>> +        guest_handle_add_offset(copy.frame_list, todo);
> >>
> >> Don't you need to also update copy.buffer?
> >
> > It's updated some lines above inside the loop.
>
> Oh, sorry. Yet then - not doing all updates together is, as you can see,
> potentially confusing.
>
> >>> @@ -2012,6 +2145,18 @@ long do_memory_op(unsigned long cmd, 
> >>> XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>              start_extent);
> >>>          break;
> >>>
> >>> +    case XENMEM_foreigncopy:
> >>> +        /*
> >>> +         * Instead of using "start_extent" we update the structure back,
> >>> +         * we update it back in anyway to tell caller were the copy
> >>> +         * stopped.
> >>> +         */
> >>> +        if ( unlikely(start_extent) )
> >>> +            return -EINVAL;
> >>
> >> As before - please be precise with comments like this. We update it back 
> >> also
> >> when encoding a continuation. Perhaps instead "..., to indicate the point 
> >> of
> >> failure to the caller as well as to encode continuations without being
> >> constrained by MEMOP_EXTENT_SHIFT".
> >>
> >
> > What about (trying to include your suggestion, to be fixed for line length):
> >
> >         /*
> >          * Instead of using "start_extent" for the continuation, we
> > update the structure back,
> >          * we update the xen_foreigncopy structure back, so we are not
> > constrained
> >          * by MEMOP_EXTENT_SHIFT.
> >          * We copy it back also to tell the caller where the copy stopped.
> >          */
>
> One of the things I take issue with (because it's hard to read that way,
> at least for me) is the repeated use of "update ... back", effectively
> saying the same things twice. The last sentence also wants disambiguating
> towards the "stopped" possibly being a non-error situation as well.
>

Changed to

        /*
         * Instead of using "start_extent" for the continuation, we update
         * the xen_foreigncopy structure back, so we are not constrained by
         * MEMOP_EXTENT_SHIFT.
         * We copy it back also to tell the caller where the copy stopped
         * (either for error or because all frames were copied).
         */

> >>> --- a/xen/include/public/memory.h
> >>> +++ b/xen/include/public/memory.h
> >>> @@ -740,7 +740,49 @@ struct xen_vnuma_topology_info {
> >>>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> >>>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
> >>>
> >>> -/* Next available subop number is 29 */
> >>> +/*
> >>> + * Copy memory from/to a given domain.
> >>> + * As this call requires target access and guest with target access 
> >>> won't be
> >>> + * compat guests supported for compat guests this is not implemented.
> >>
> >> As before - I question this. You simply can't know. (I'm also struggling 
> >> with
> >> wording / grammar.)
> >
> > I was trying to code the compatibility layer. Is there a way to have
> > 64 bit PFN even for compatibility guests instead of having to limit
> > and convert PFN numbers?
>
> compat_pfn_t is a typedef of unsigned int (since a 32-bit guest seeing
> "typedef unsigned long xen_pfn_t;" results in xen_pfn_t being a 32-bit
> quantity for it), so 32-bit guests can only supply 32-bit frame numbers.
> There's also no value in trying to be clever and using uint64_t instead
> for the frame_list handle, as 32-bit guests won't ever own pages with
> MFNs wider than 32 bits.
>

They don't need to own such pages. But probably they would also need
other hypercalls to support larger frame numbers. To be honest you
just point out a reason to not support this for compatible guests.

> >>> + */
> >>> +#define XENMEM_foreigncopy 29
> >>> +struct xen_foreigncopy {
> >>> +    /* IN - The domain whose memory is to be copied. */
> >>> +    domid_t domid;
> >>> +
> >>> +    /* IN - Flags. */
> >>> +#define XENMEM_foreigncopy_from 0
> >>> +#define XENMEM_foreigncopy_to 1
> >>> +#define XENMEM_foreigncopy_direction 1
> >>> +    uint16_t flags;
> >>> +
> >>> +    /*
> >>> +     * IN/OUT
> >>> +     *
> >>> +     * As an IN parameter number of frames of the domain to be copied.
> >>> +     * On output on error updated number of frames left.
> >>> +     */
> >>> +    uint32_t nr_frames;
> >>> +
> >>> +    /*
> >>> +     * IN/OUT
> >>> +     *
> >>> +     * Frames to be copied.
> >>> +     * On output on error updated to point to first frame unhandled.
> >>
> >> Is "on error" really correct / meaningful? The field can be updated at
> >> any intermediate point, when a continuation is scheduled. Perhaps:
> >>
> >>      * On output:
> >>      *  - on error updated to point to first frame which couldn't be 
> >> handled,
> >>      *  - on success undefined.
> >>
> >> Along these lines for nr_frames then as well (if needed at all, seeing
> >> that it could as well be undefined in both cases, as the information is
> >> redundant with the frame_list update).
> >>
> >>> +     */
> >>> +    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> >>> +
> >>> +    /*
> >>> +     * IN/OUT
> >>> +     *
> >>> +     * Userspace buffer to read/write from.
> >>
> >> s/Userspace/Guest/ ?
> >>
> >> Also still no mention of when / how this field is updated.
> >>
> >
> > What about:
> >
> > /*
> >  * Copy memory from/to a given domain.
> >  */
> > #define XENMEM_foreigncopy 29
> > struct xen_foreigncopy {
> >     /* IN - The domain whose memory is to be copied. */
> >     domid_t domid;
> >
> >     /* IN - Flags. */
> > #define XENMEM_foreigncopy_from 0
> > #define XENMEM_foreigncopy_to 1
> > #define XENMEM_foreigncopy_direction 1
> >     uint16_t flags;
> >
> >     /*
> >      * IN/OUT
> >      *
> >      * As an IN parameter number of frames of the domain to be copied.
> >      * On output updated number of frames left (0 if success).
> >      */
> >     uint32_t nr_frames;
> >
> >     /*
> >      * IN/OUT
> >      *
> >      * Frames to be copied.
> >      * On output updated to point to the first frame unhandled.
>
> There may be no such frame, so at the very least add "..., if any"?
>

Added.

> >      */
> >     XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> >
> >     /*
> >      * IN/OUT
> >      *
> >      * Guest buffer to read/write from.
> >      * On output updated to point to the first frame unhandled.
>
> There's no frame here, as long as you don't switch to using two frame
> lists (for source and destination).
>

Changed to

    /*
     * IN/OUT
     *
     * Guest buffer to read/write from.
     * On output updated to point to the first page pointer unhandled.
     */

> >      */
> >     XEN_GUEST_HANDLE(uint8) buffer;
> > };
> > typedef struct xen_foreigncopy xen_foreigncopy_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_foreigncopy_t);
> >
> >>> +     */
> >>> +    XEN_GUEST_HANDLE(uint8) buffer;
> >>> +};
> >>
> >> What was (again) left unaddressed is the question towards using GFNs on 
> >> both
> >> sides of the copy. This would eliminate the need for the flags field, taken
> >> by a 2nd domid_t one then.
> >>
> >
> > This was addressed in
> > https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00567.html
>
> Well, yes, but not in a satisfactory way. Back channels tell me that you
> actually got the same feedback already on internal review. Which makes it
> all the more puzzling that you insist on doing it differently. Multiple
> maintainers asking for the same thing may be an indication of something.
>

Not needing to have backchannel feedback, I already wrote that a
similar approach was tried and made the code more complicated.
Both maintainers didn't comment on my replies so I assume they were
fine with it.
And you are failing to provide positive feedback.
I asked (that one internally) for examples of guest buffers provided
as frame numbers but I got no answer (or better the answer was more
"currently there are not").
Also note that the location of xen_foreigncopy_t structure is also
provided using a guest pointer.
I remember there were some discussions about ABI changes (2/3 years
ago) to address this and other issues but I cannot see much progress.
That's why I say this is out of scope.

> > and in minor way by
> > https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00847.html.
> > It was considered but more complicated and worse from a performance 
> > perspective.
>
> Okay, performance-wise worse would of course be relevant. But that would
> need supporting by numbers (for both PV and PVH Dom0, as the latter
> incurs extra overhead for virtual-address-based hypercall buffer operands).
>

I'm more concerned about the PV case than PVH to be honest.

I was having an idea about solving the pointer/frames issue but, as I
said, it's out of scope here.

> Jan

Frediano



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.