[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 12/16] xen: implement new foreign copy hypercall


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Frediano Ziglio <freddy77@xxxxxxxxx>
  • Date: Tue, 23 Jun 2026 11:55:58 +0100
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=wdGGmGll3Ht5YC6J3ISLcFBULbyrMD2jWIH0eULPVYk=; fh=q6YmllG+4Nuq2jB/7YclHQodtflgkS2EWtGkU+nirko=; b=XQlpvLpQZibWoRcldZvH40i80I0jcpFmFVjxTrihbKRyLK/4jp3Jfxc8ZirHCKogct 8+y32LIiY5mh5xG/ALfZyq340EXiEKWs3YWkLcjBxRkvuX4cWb8ZqwvlpoWBexyPlhw6 klvTERzI2Q2cFQU2jB9mN+v10iOcfMvuO5UsljrT5qaXEtAY66NpmmloIeMLYxR6LTV0 pLFu1RGXS0iuqEC1K1HFWbEmblKhBZPIeLPnef4aNUgJsme3NisLjGbXSco3dEd+rFuk Zcok3espNpHyxqUrydxk4nCb5m0I4kFeHgwGSciNqBCIsIhTHE23uEt9GxTcCFJ2YfIv dZdQ==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1782212171; cv=none; d=google.com; s=arc-20240605; b=g5H6PZkkAPyYFL3ge3tHcpL7YMfWlz6Qxcj49bru4EsAqHYB+qysFmHCfGlV3NANk8 kl2tjR4UOebsrs1IsCMTV4d81iTCfxDMaOZ0FRE7avAXnRba+mH6xuAhp5VC8R4kZJwE vsKisZL3bz2DZt8rjc0BeceZTOmDtnAk8wTkHkjFyNhExB1rfAFGeB6FpNXMaOxuuch5 yHWhUt6JsUu+RkzrDUA4Pi0kCS4DdL1znShjaMpo6zogaQCU4xCdSINvjgk229haXvNz HRUPvFrAcagbo/17SkIVEgbfcRjO5ErJvFvGmhX/A5YyYJEGWZXMqc6a3CoWPsK6bkPA nSJw==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: Frediano Ziglio <frediano.ziglio@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 23 Jun 2026 10:56:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, 22 Jun 2026 at 11:34, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 19.06.2026 15:04, Frediano Ziglio wrote:
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -1545,6 +1545,139 @@ static int acquire_resource(
> >      return rc;
> >  }
> >
> > +/*
> > + * The "noinline" qualifier avoids the compiler to create a large function
> > + * consuming quite a lot of stack.
> > + */
> > +static int noinline mem_foreigncopy(
> > +    XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
> > +{
> > +    struct domain *d, *const currd = current->domain;
> > +    xen_foreigncopy_t copy;
> > +    int rc, direction;
> > +
> > +    if ( copy_from_guest(&copy, arg, 1) )
> > +        return -EFAULT;
> > +
> > +    if ( copy.flags & ~XENMEM_foreigncopy_direction )
> > +        return -EINVAL;
> > +
> > +    direction = copy.flags & XENMEM_foreigncopy_direction;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
>
> Iirc I did ask before why this isn't ..._by_any_id().
>

I probably was confused by the question about MMUEXT and the 2 domains.
There are different similar hypercalls (like the mentioned MMUEXT but
also hypercalls to map foreign domain memory) that have this check
(not the same domain). Any domain has, obviously, access to its own
memory, so it should not have to use hypercall to access its own
memory. If it does it looks like a mistake causing performance issues
or an attempt to circumvent security; in either case you would like to
avoid it.

> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( copy.nr_frames == 0 )
> > +    {
> > +        rcu_unlock_domain(d);
> > +        return 0;
> > +    }
>
> Any reason this cannot also be "goto out"? The more that now that you have
> moved this past the domid validity check, imo it should further move to ...
>

The only reason was style and to avoid a memory copy, but it's not a
hot case so I'll change to "goto out" (no strong about it).

> > +    /*
> > +     * Check we are allowed to map and access these foreign pages.
> > +     */
> > +    rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
> > +    if ( rc )
> > +        goto out;
>
> ... below here. Perhaps simply as
>
>     if ( rc || !copy.nr_frames )
>         goto out;
>

I think this would be confusing with the above "Check we are allowed
to map and access these foreign pages" comment.
Are you okay with just the change above to "goto out" ?
Also moving here would potentially change the result and do a useless check.

> > +    do {
> > +        /*
> > +         * Arbitrary size.  Not too much stack space, and a reasonable 
> > stride
> > +         * for continuation checks.
> > +         */
> > +        xen_pfn_t gfn_list[32];
> > +        unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
> > +
> > +        rc = -EFAULT;
> > +        if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
> > +            goto out;
> > +
> > +        for ( unsigned int i = 0; i < todo; i++ )
> > +        {
> > +            struct page_info *foreign_page;
> > +            mfn_t foreign_mfn;
> > +            void *foreign;
> > +            p2m_type_t p2mt;
> > +            const unsigned long valid_mask =
> > +#ifdef CONFIG_X86
> > +                p2m_to_mask(p2m_ram_rw) | p2m_to_mask(p2m_ram_logdirty);
> > +#else
> > +                p2m_to_mask(p2m_ram_rw);
> > +#endif
>
> The set of permitted types didn't change, yet a justification for the 
> resulting
> limitation also didn't appear.
>

Yes, that's missing, indeed.
Should the set of types be different for reading and writing? For
instance do not allow writing to read-only memory?
Given that it looks like different architectures have different
meanings and definitions for these constants, should it not be better
to define some new constants for this specific usage? For instance
P2M_READ_TYPES and P2M_WRITE_TYPES?

> > +            foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, 
> > P2M_ALLOC);
> > +
> > +            if ( unlikely(!(p2m_to_mask(p2mt) & valid_mask)) && 
> > foreign_page )
> > +            {
> > +                put_page(foreign_page);
> > +                foreign_page = NULL;
> > +            }
> > +            if ( unlikely(!foreign_page) )
> > +            {
> > +                gdprintk(XENLOG_WARNING,
> > +                         "Error accessing foreign gfn %" PRI_gfn "\n",
> > +                         gfn_list[i]);
> > +                rc = -EINVAL;
> > +                copy.nr_frames -= i;
> > +                guest_handle_add_offset(copy.frame_list, i);
> > +                goto out;
> > +            }
> > +
> > +            foreign_mfn = page_to_mfn(foreign_page);
> > +
> > +            /* A page is dirtied when it's being copied to. */
> > +            if ( direction == XENMEM_foreigncopy_to )
> > +                paging_mark_dirty(d, foreign_mfn);
> > +
> > +            foreign = map_domain_page(foreign_mfn);
> > +            if ( direction == XENMEM_foreigncopy_from )
> > +                rc = copy_to_guest(copy.buffer, foreign, PAGE_SIZE);
> > +            else
> > +                rc = copy_from_guest(foreign, copy.buffer, PAGE_SIZE);
>
> You cannot validly write to the page without holding a PGT_writable ref.
> Else you might overwrite a page table or a descriptor table in a PV guest.
>

Given that this code was "inspired" by other hypercalls I'll also
check the other code.

> Once again - can you please make sure you have addressed earlier review
> comments, before sending a new version? I did point this out before.
>

Apparently not.

> > +            unmap_domain_page(foreign);
> > +            put_page(foreign_page);
> > +
> > +            if ( unlikely(rc) )
> > +            {
> > +                gdprintk(XENLOG_WARNING,
> > +                         "Error %d copying gfn %" PRI_gfn "\n",
> > +                         -rc, gfn_list[i]);
>
> Why "-rc"? (See other log messages including error codes.)
>

Because the errors are positive but for ABI we return them as negative.
But I suppose if for other messages we use the negated value this
should be just "rc".
I'll change.

> > +                copy.nr_frames -= i;
> > +                guest_handle_add_offset(copy.frame_list, i);
> > +                goto out;
> > +            }
> > +
> > +            guest_handle_add_offset(copy.buffer, PAGE_SIZE);
> > +        }
> > +
> > +        copy.nr_frames -= todo;
> > +        guest_handle_add_offset(copy.frame_list, todo);
>
> Don't you need to also update copy.buffer?
>

It's updated some lines above inside the loop.

> > @@ -2012,6 +2145,18 @@ long do_memory_op(unsigned long cmd, 
> > XEN_GUEST_HANDLE_PARAM(void) arg)
> >              start_extent);
> >          break;
> >
> > +    case XENMEM_foreigncopy:
> > +        /*
> > +         * Instead of using "start_extent" we update the structure back,
> > +         * we update it back in anyway to tell caller were the copy
> > +         * stopped.
> > +         */
> > +        if ( unlikely(start_extent) )
> > +            return -EINVAL;
>
> As before - please be precise with comments like this. We update it back also
> when encoding a continuation. Perhaps instead "..., to indicate the point of
> failure to the caller as well as to encode continuations without being
> constrained by MEMOP_EXTENT_SHIFT".
>

What about (trying to include your suggestion, to be fixed for line length):

        /*
         * Instead of using "start_extent" for the continuation, we
update the structure back,
         * we update the xen_foreigncopy structure back, so we are not
constrained
         * by MEMOP_EXTENT_SHIFT.
         * We copy it back also to tell the caller where the copy stopped.
         */

> > --- a/xen/include/public/memory.h
> > +++ b/xen/include/public/memory.h
> > @@ -740,7 +740,49 @@ struct xen_vnuma_topology_info {
> >  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
> >
> > -/* Next available subop number is 29 */
> > +/*
> > + * Copy memory from/to a given domain.
> > + * As this call requires target access and guest with target access won't 
> > be
> > + * compat guests supported for compat guests this is not implemented.
>
> As before - I question this. You simply can't know. (I'm also struggling with
> wording / grammar.)
>

I was trying to code the compatibility layer. Is there a way to have
64 bit PFN even for compatibility guests instead of having to limit
and convert PFN numbers?

> > + */
> > +#define XENMEM_foreigncopy 29
> > +struct xen_foreigncopy {
> > +    /* IN - The domain whose memory is to be copied. */
> > +    domid_t domid;
> > +
> > +    /* IN - Flags. */
> > +#define XENMEM_foreigncopy_from 0
> > +#define XENMEM_foreigncopy_to 1
> > +#define XENMEM_foreigncopy_direction 1
> > +    uint16_t flags;
> > +
> > +    /*
> > +     * IN/OUT
> > +     *
> > +     * As an IN parameter number of frames of the domain to be copied.
> > +     * On output on error updated number of frames left.
> > +     */
> > +    uint32_t nr_frames;
> > +
> > +    /*
> > +     * IN/OUT
> > +     *
> > +     * Frames to be copied.
> > +     * On output on error updated to point to first frame unhandled.
>
> Is "on error" really correct / meaningful? The field can be updated at
> any intermediate point, when a continuation is scheduled. Perhaps:
>
>      * On output:
>      *  - on error updated to point to first frame which couldn't be handled,
>      *  - on success undefined.
>
> Along these lines for nr_frames then as well (if needed at all, seeing
> that it could as well be undefined in both cases, as the information is
> redundant with the frame_list update).
>
> > +     */
> > +    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> > +
> > +    /*
> > +     * IN/OUT
> > +     *
> > +     * Userspace buffer to read/write from.
>
> s/Userspace/Guest/ ?
>
> Also still no mention of when / how this field is updated.
>

What about:

/*
 * Copy memory from/to a given domain.
 */
#define XENMEM_foreigncopy 29
struct xen_foreigncopy {
    /* IN - The domain whose memory is to be copied. */
    domid_t domid;

    /* IN - Flags. */
#define XENMEM_foreigncopy_from 0
#define XENMEM_foreigncopy_to 1
#define XENMEM_foreigncopy_direction 1
    uint16_t flags;

    /*
     * IN/OUT
     *
     * As an IN parameter number of frames of the domain to be copied.
     * On output updated number of frames left (0 if success).
     */
    uint32_t nr_frames;

    /*
     * IN/OUT
     *
     * Frames to be copied.
     * On output updated to point to the first frame unhandled.
     */
    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;

    /*
     * IN/OUT
     *
     * Guest buffer to read/write from.
     * On output updated to point to the first frame unhandled.
     */
    XEN_GUEST_HANDLE(uint8) buffer;
};
typedef struct xen_foreigncopy xen_foreigncopy_t;
DEFINE_XEN_GUEST_HANDLE(xen_foreigncopy_t);

> > +     */
> > +    XEN_GUEST_HANDLE(uint8) buffer;
> > +};
>
> What was (again) left unaddressed is the question towards using GFNs on both
> sides of the copy. This would eliminate the need for the flags field, taken
> by a 2nd domid_t one then.
>

This was addressed in
https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00567.html
and in minor way by
https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00847.html.
It was considered but more complicated and worse from a performance perspective.

> Jan

Frediano



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.