Xen project Mailing List

Re: [PATCH v6 12/16] xen: implement new foreign copy hypercall

From: Frediano Ziglio <freddy77@xxxxxxxxx>

Date: Fri, 26 Jun 2026 15:14:12 +0100

Arc-authentication-results: i=1; mx.google.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20260327; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=oM/zylGJKaVcnbd5CR3L2Xo39J/1wOlY7y+FkLODX6Y=; fh=r/bSQHOXqKreMRuH+rT/9CKiQXFwCXJK2kdrgt7oTag=; b=Q5FWjDSkYR9hWAzpXrEblT55ZTulZLWgoO7z7mR5kGdjvATBZuo9pHiVlquG6lxYPX 4c32wd61M9wAB0ZgD1SU7KszPeXW1Aeg3kMMz3tYrZqDNpwoCM3MOM4QSzunmT9CPtyH U+l1tR3JDl5fXNX+DnT5RSLEAHbLG1qPy+DzcQ0JuzkGyNAtGB+3sII+atv/0dVkHfBC ge88VE5+vAisgL8oJW4k1nKG6WKkMIbwBfe9oUclLaSFB7bHp39R23gA7a3BHgJswhUN b/EmYdne7HeSX8WN6EEnfEYNmLG0MP6W5S2k9ATFDmA29kgLbLa8j3vBWOCvWazxgz9v 5C0A==; darn=lists.xenproject.org

Arc-seal: i=1; a=rsa-sha256; t=1782483263; cv=none; d=google.com; s=arc-20260327; b=cMhzJk1Jr6q+LK9fVPcM2xEy3sZfd6vkPglwNaVZx04hic3Qc1wmosMvzK1otGywoL GiD12nf7Nf6zKPNjX7FaE0vbVGbGMQhkutG1q4RGrlLjQl6ix/W89WVrqmVX6o9k8a0y V1nogOSB3OraXIkjmUsgFHfT8HNdaov2ryqiC8tBc0a8encZkpM6HI6hVbQpj7l6IRql aankjjOYpjgPfFnC3gAPlNnfcox6sAje9lPVwveZMlSEYAffCCf5S63WkJi9nZFlGh+d zuqMR5wqD4tdGidNhzOwmEKRAp1f8i4HJIqMn1Jbeo0wlTIdqTENM520nh8kgSismuQr PzbQ==

Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"

Cc: Frediano Ziglio <frediano.ziglio@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Fri, 26 Jun 2026 14:14:43 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, 24 Jun 2026 at 07:44, Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 23.06.2026 23:18, Frediano Ziglio wrote: > > On Tue, 23 Jun 2026 at 14:21, Jan Beulich <jbeulich@xxxxxxxx> wrote: > >> On 23.06.2026 12:55, Frediano Ziglio wrote: > >>> On Mon, 22 Jun 2026 at 11:34, Jan Beulich <jbeulich@xxxxxxxx> wrote: > >>>> On 19.06.2026 15:04, Frediano Ziglio wrote: > >>>>> --- a/xen/common/memory.c > >>>>> +++ b/xen/common/memory.c > >>>>> @@ -1545,6 +1545,139 @@ static int acquire_resource( > >>>>> return rc; > >>>>> } > >>>>> > >>>>> +/* > >>>>> + * The "noinline" qualifier avoids the compiler to create a large > >>>>> function > >>>>> + * consuming quite a lot of stack. > >>>>> + */ > >>>>> +static int noinline mem_foreigncopy( > >>>>> + XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg) > >>>>> +{ > >>>>> + struct domain *d, *const currd = current->domain; > >>>>> + xen_foreigncopy_t copy; > >>>>> + int rc, direction; > >>>>> + > >>>>> + if ( copy_from_guest(&copy, arg, 1) ) > >>>>> + return -EFAULT; > >>>>> + > >>>>> + if ( copy.flags & ~XENMEM_foreigncopy_direction ) > >>>>> + return -EINVAL; > >>>>> + > >>>>> + direction = copy.flags & XENMEM_foreigncopy_direction; > >>>>> + > >>>>> + rc = rcu_lock_remote_domain_by_id(copy.domid, &d); > >>>> > >>>> Iirc I did ask before why this isn't ..._by_any_id(). > >>> > >>> I probably was confused by the question about MMUEXT and the 2 domains. > >>> There are different similar hypercalls (like the mentioned MMUEXT but > >>> also hypercalls to map foreign domain memory) that have this check > >>> (not the same domain). Any domain has, obviously, access to its own > >>> memory, so it should not have to use hypercall to access its own > >>> memory. If it does it looks like a mistake causing performance issues > >>> or an attempt to circumvent security; in either case you would like to > >>> avoid it. > >> > >> No. Self-grants are possible as well, for example, and for a good reason. > >> Allowing normally-remote operations on oneself helps with testing, for > >> example. It may also help avoid needing to special-case "self" in code > >> which needs to cover both cases. > > > > But this is not a grant, it's a copy. > > Sure, but the underlying principle is what matters. Plus you don't prevent > self-copy by using ..._by_id(), you only preclude the use of DOMID_SELF. > Sure about this? The current implementation is int rcu_lock_remote_domain_by_id(domid_t dom, struct domain **d) { if ( (*d = rcu_lock_domain_by_id(dom)) == NULL ) return -ESRCH; if ( *d == current->domain ) { rcu_unlock_domain(*d); return -EPERM; } return 0; } rcu_lock_domain_by_id returns NULL if the domain is DOMID_SELF so it would be a -ESRCH, if it's the current domain it would be excluded by *d == current->domain returning -EPERM. > >>>>> + if ( rc ) > >>>>> + return rc; > >>>>> + > >>>>> + if ( copy.nr_frames == 0 ) > >>>>> + { > >>>>> + rcu_unlock_domain(d); > >>>>> + return 0; > >>>>> + } > >>>> > >>>> Any reason this cannot also be "goto out"? The more that now that you > >>>> have > >>>> moved this past the domid validity check, imo it should further move to > >>>> ... > >>> > >>> The only reason was style and to avoid a memory copy, but it's not a > >>> hot case so I'll change to "goto out" (no strong about it). > >>> > >>>>> + /* > >>>>> + * Check we are allowed to map and access these foreign pages. > >>>>> + */ > >>>>> + rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d); > >>>>> + if ( rc ) > >>>>> + goto out; > >>>> > >>>> ... below here. Perhaps simply as > >>>> > >>>> if ( rc || !copy.nr_frames ) > >>>> goto out; > >>>> > >>> > >>> I think this would be confusing with the above "Check we are allowed > >>> to map and access these foreign pages" comment. > >>> Are you okay with just the change above to "goto out" ? > >> > >> I do want the order adjusted as indicated. I won't insist on (but I would > >> prefer) folding both if()-s. > >> > > > > What about > > > > /* > > * Check we are allowed to map and access these foreign pages. > > */ > > rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d); > > if ( rc ) > > goto out; > > > > while ( copy.nr_frames ) > > { > > /* > > * Arbitrary size. Not too much stack space, and a reasonable > > stride > > * for continuation checks. > > */ > > That's fine. > Changed > >>>>> + do { > >>>>> + /* > >>>>> + * Arbitrary size. Not too much stack space, and a reasonable > >>>>> stride > >>>>> + * for continuation checks. > >>>>> + */ > >>>>> + xen_pfn_t gfn_list[32]; > >>>>> + unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames); > >>>>> + > >>>>> + rc = -EFAULT; > >>>>> + if ( copy_from_guest(gfn_list, copy.frame_list, todo) ) > >>>>> + goto out; > >>>>> + > >>>>> + for ( unsigned int i = 0; i < todo; i++ ) > >>>>> + { > >>>>> + struct page_info *foreign_page; > >>>>> + mfn_t foreign_mfn; > >>>>> + void *foreign; > >>>>> + p2m_type_t p2mt; > >>>>> + const unsigned long valid_mask = > >>>>> +#ifdef CONFIG_X86 > >>>>> + p2m_to_mask(p2m_ram_rw) | > >>>>> p2m_to_mask(p2m_ram_logdirty); > >>>>> +#else > >>>>> + p2m_to_mask(p2m_ram_rw); > >>>>> +#endif > >>>> > >>>> The set of permitted types didn't change, yet a justification for the > >>>> resulting > >>>> limitation also didn't appear. > >>>> > >>> > >>> Yes, that's missing, indeed. > >>> Should the set of types be different for reading and writing? For > >>> instance do not allow writing to read-only memory? > >> > >> Of course. > >> > >>> Given that it looks like different architectures have different > >>> meanings and definitions for these constants, should it not be better > >>> to define some new constants for this specific usage? For instance > >>> P2M_READ_TYPES and P2M_WRITE_TYPES? > >> > >> Perhaps, yes. The suggested names look overly generic to me, though. > > > > I suppose P2M_READABLE_TYPES and P2M_WRITABLE_TYPES are more correct > > but still too generic. > > P2M_EXPORTABLE_TYPES and P2M_IMPORTABLE_TYPES ? > > First: Do you foresee uses of those constants anywhere else? If not (I > don't), tie the names to this particular operation. That'll make them > entirely non-generic. > > >>>>> @@ -2012,6 +2145,18 @@ long do_memory_op(unsigned long cmd, > >>>>> XEN_GUEST_HANDLE_PARAM(void) arg) > >>>>> start_extent); > >>>>> break; > >>>>> > >>>>> + case XENMEM_foreigncopy: > >>>>> + /* > >>>>> + * Instead of using "start_extent" we update the structure > >>>>> back, > >>>>> + * we update it back in anyway to tell caller were the copy > >>>>> + * stopped. > >>>>> + */ > >>>>> + if ( unlikely(start_extent) ) > >>>>> + return -EINVAL; > >>>> > >>>> As before - please be precise with comments like this. We update it back > >>>> also > >>>> when encoding a continuation. Perhaps instead "..., to indicate the > >>>> point of > >>>> failure to the caller as well as to encode continuations without being > >>>> constrained by MEMOP_EXTENT_SHIFT". > >>>> > >>> > >>> What about (trying to include your suggestion, to be fixed for line > >>> length): > >>> > >>> /* > >>> * Instead of using "start_extent" for the continuation, we > >>> update the structure back, > >>> * we update the xen_foreigncopy structure back, so we are not > >>> constrained > >>> * by MEMOP_EXTENT_SHIFT. > >>> * We copy it back also to tell the caller where the copy stopped. > >>> */ > >> > >> One of the things I take issue with (because it's hard to read that way, > >> at least for me) is the repeated use of "update ... back", effectively > >> saying the same things twice. The last sentence also wants disambiguating > >> towards the "stopped" possibly being a non-error situation as well. > >> > > > > Changed to > > > > /* > > * Instead of using "start_extent" for the continuation, we update > > * the xen_foreigncopy structure back, so we are not constrained by > > * MEMOP_EXTENT_SHIFT. > > * We copy it back also to tell the caller where the copy stopped > > * (either for error or because all frames were copied). > > */ > > Thanks. > > >>>>> + XEN_GUEST_HANDLE(uint8) buffer; > >>>>> +}; > >>>> > >>>> What was (again) left unaddressed is the question towards using GFNs on > >>>> both > >>>> sides of the copy. This would eliminate the need for the flags field, > >>>> taken > >>>> by a 2nd domid_t one then. > >>>> > >>> > >>> This was addressed in > >>> https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00567.html > >> > >> Well, yes, but not in a satisfactory way. Back channels tell me that you > >> actually got the same feedback already on internal review. Which makes it > >> all the more puzzling that you insist on doing it differently. Multiple > >> maintainers asking for the same thing may be an indication of something. > > > > Not needing to have backchannel feedback, I already wrote that a > > similar approach was tried and made the code more complicated. > > Even if indeed so: Yet at the same time more flexible. > > > Both maintainers didn't comment on my replies so I assume they were > > fine with it. > > And you are failing to provide positive feedback. > > I asked (that one internally) for examples of guest buffers provided > > as frame numbers but I got no answer (or better the answer was more > > "currently there are not"). > > Also note that the location of xen_foreigncopy_t structure is also > > provided using a guest pointer. > > I remember there were some discussions about ABI changes (2/3 years > > ago) to address this and other issues but I cannot see much progress. > > And it's that (very slowly progressing effort) which made me ask. The > fewer virtual addresses we bake into new sub-ops, the better for that > effort. And no, that doesn't go as far as completely eliminating > handles (presently representing virtual addresses) - that needs to be > part of the new ABI. > In other words, you want me to code something temporary that you already know that needs to be changed. > To preempt the argument towards "fewer virtual addresses" not really > being true when changing from handle-to-uint8 to handle-to-pfn: The > former won't be able to express a buffer mapped contiguously in VA > space, but discontiguous in PA space. The latter will, simply be > avoiding buffer VAs in the first place (the array of frame numbers > can e.g. be placed in a dedicated hypercall argument area known to be > physically contiguous). > If it's mapped continuously in VA and you pass the VA I don't understand the problem. From the way I see it's more the latter that's the problem. > > That's why I say this is out of scope. > > There's nothing scope related here. We're discussing how to shape the > new sub-op interface. > > >>> and in minor way by > >>> https://lists.xenproject.org/archives/html/xen-devel/2026-06/msg00847.html. > >>> It was considered but more complicated and worse from a performance > >>> perspective. > >> > >> Okay, performance-wise worse would of course be relevant. But that would > >> need supporting by numbers (for both PV and PVH Dom0, as the latter > >> incurs extra overhead for virtual-address-based hypercall buffer operands). > > > > I'm more concerned about the PV case than PVH to be honest. > > For your (immediate) internal purposes that may be fine, but PVH Dom0 > more likely being the future, for upstream both need considering > equally. > >From an internal discussion: -------------------- About HVM and VAs/PFNs, I was thinking. We pass a single PFN for a page containing either - a single list of PFNs fitting into the page (plus number of entries) for small hypercalls (most of them) - a page containing a list of PFNs pointing to arrays of PFNs for large hypercalls (like kexec load and few others) Now the GUEST_HANDLEs are treated differently, instead of VAs they contain offset into array above plus page offset. Okay, let's do an example. I want to call a DOMCTL which have a handle to a small array, so I need to pass the domctl structure and the array, I suppose I need 2 PFNs (unless domct or the array span multiple pages) so kernel would need to build and pass an array of 2 PFNs, let's say 0xabcabc01 and 0xabcabc02. I build a page with 1- 2 (number of PFNs) 2- 0xabcabc01 3- 0xabcabc02 I pass the PFN of the above page (how it's a detail, probably an additional register), number of hypercall and... a GUEST_HANDLE. Here the guest handle would be something like 0x0000ppp where ppp is the page offset inside 0xabcabc01 page. Why 0xabcabc01 ? Because the upper part of the guest handle is 0 (so the first entry in the PFNs array above). Inside the domctl structure the guest handle to the array will be something like 0x0001ppp where the 0x0001 means 0xabcabc02 while ppp here is the page offset into 0xabcabc02. What if the array is bigger than a single page? Let's say it spans into 3 pages, you would have something like 1- 4 (number of PFNs) 2- 0xabcabc01 3- 0xabcabc02 4- 0xabcabc03 5- 0xabcabc04 (okay, in all example pages are contiguous but not important) The guest handle for the array would still be 0x0001ppp but the code will continue on array entry 2 (0xabcabc03) and 3 (0xabcabc04) -------------------- The distinction for the above could be a flag added to the hypercall number or automatic on type of VM (PV/HVM) but for compatibility I would go for the first. But these are details. This would work without much API changes and minor hypervisor changes (mainly copy_from_guest and similar macros). Okay, the above is a bit OT here but the point is that the change you are asking me won't help with this in the future, basically you are asking me an implementation based on an implementation that is not currently even on paper. > Jan Frediano

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.