Xen project Mailing List

Re: [Xen-devel] [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains

From: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>

Date: Fri, 13 May 2016 08:50:18 -0600

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 13 May 2016 14:50:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 12.05.16 at 17:25, <tamas@xxxxxxxxxxxxx> wrote:
>> --- a/xen/arch/x86/mm/mem_sharing.c
>> +++ b/xen/arch/x86/mm/mem_sharing.c
>> @@ -1294,6 +1294,43 @@ int relinquish_shared_pages(struct domain *d)
>>      return rc;
>>  }
>>
>> +static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
>> +                      struct mem_sharing_op_bulk *bulk)
>> +{
>> +    int rc;
>> +    shr_handle_t sh, ch;
>> +
>> +    while( bulk->start <= max )
>> +    {
>> +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
>> +        if ( rc == -ENOMEM )
>> +            break;
>> +        if ( !rc )
>> +        {
>> +            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
>> +            if ( rc == -ENOMEM )
>> +                break;
>
> If we get to this break, how will the caller know that the first
> nomination succeeded but the second didn't? Or perhaps there
> is some undo logic missing here?

No, there is not. There really is no "unnominate" feature in memshare. So
even a user calling nominate manually from userspace has no way to
unnominate the page, which makes that error condition currently useless to
the user.

>
>> +            if ( !rc )
>> +                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
>
> You shouldn't be ignoring errors here.

The only error this function returns is for invalid sh/ch handles. However,
we obtained those just now from successful nominations, so the handles are
guaranteed to be valid. This error checking only matters when nomination
and sharing happen independently, where a handle may go stale in between.
Here that is not possible.

>
>> +        }
>> +
>> +        ++(bulk->start);
>
> Pointless parentheses.

Pointless, but I prefer this style.

>
>> +        /* Check for continuation if it's not the last iteration. */
>> +        if ( bulk->start < max && hypercall_preempt_check() )
>
> The loop head has <=; why < here?

Because we only do the preempt check if there is more than one page left
(as the comment states).
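To make the `<` versus `<=` question concrete, here is a small userspace simulation of the loop shape quoted above; it is not Xen code. `always_preempt()` is a hypothetical stand-in for `hypercall_preempt_check()` that always reports pending work, and `run_once()` / `count_invocations()` are illustrative names. It counts how many invocations a gfn range needs under each comparison, assuming every eligible check triggers a continuation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for hypercall_preempt_check(): always reports
 * pending work, so every eligible check requests a continuation. */
static bool always_preempt(void)
{
    return true;
}

/*
 * One invocation of the quoted loop shape: handle gfns from *start up to
 * and including max, stopping early when the preemption check fires.
 * strict == true uses `*start < max`, strict == false uses `*start <= max`.
 * Returns 1 when a continuation is requested, 0 once the range is done.
 */
static int run_once(unsigned long *start, unsigned long max, bool strict,
                    unsigned long *processed)
{
    assert(start != NULL && processed != NULL);

    while ( *start <= max )
    {
        ++*processed;                 /* "share" the page at *start */
        ++*start;

        bool more = strict ? (*start < max) : (*start <= max);
        if ( more && always_preempt() )
            return 1;                 /* continuation requested */
    }
    return 0;
}

/* Re-invokes run_once() until it completes, as a continuation would. */
static unsigned int count_invocations(unsigned long start, unsigned long max,
                                      bool strict, unsigned long *processed)
{
    unsigned int calls = 0;

    *processed = 0;
    do
        ++calls;
    while ( run_once(&start, max, strict, processed) );

    return calls;
}
```

Tracing gfns 0 to 3 by hand: the `<` variant needs three invocations (the last one handles gfns 2 and 3), while `<=` needs four; both process all four pages. In other words, with `<` the final page is never split off into a continuation of its own.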
>> +        {
>> +            rc = 1;
>
> I'd recommend using -ERESTART here, as we do elsewhere.
>

Ack.

>> +            break;
>> +        }
>> +    }
>> +
>> +    /* We only propagate -ENOMEM so reset rc here */
>> +    if ( rc < 0 && rc != -ENOMEM )
>> +        rc = 0;
>
> What's the rationale for discarding all other errors? At least the
> patch description, but perhaps even the comment (which btw
> is lacking a full stop) should be explaining this.

The reason we swallow errors other than ENOMEM here is that it's quite
possible that, for example, the max_gpfn page is unsharable, so rc would
end up with EINVAL as its final value. However, we don't care about the
success or failure of individual pages, only about the overall state, and
for that only ENOMEM is critical.

>
>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>          }
>>          break;
>>
>> +    case XENMEM_sharing_op_bulk_share:
>> +    {
>> +        unsigned long max_sgfn, max_cgfn;
>> +        struct domain *cd;
>> +
>> +        rc = -EINVAL;
>> +        if ( !mem_sharing_enabled(d) )
>> +            goto out;
>> +
>> +        rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>> +                                               &cd);
>> +        if ( rc )
>> +            goto out;
>> +
>> +        rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>
> Either you pass XENMEM_sharing_op_share here, or you need to
> update xen/xsm/flask/policy/access_vectors (even if it's only a
> comment which needs updating).

Right, it should actually be sharing_op_share here.

>
> That said - are this and the similar pre-existing XSM checks actually
> correct? I.e. is one of the two domains here really controlling the
> other? I would have expected that a tool stack domain initiates the
> sharing between two domains it controls...

I'm not sure what the original rationale behind it was either.
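The ENOMEM-only policy described in the reply about discarding errors can be illustrated in isolation. The sketch below is plain userspace C, not the hypervisor code: `fold_page_results()` is a hypothetical helper, and the `page_rc` array stands in for the per-page nomination results. It keeps the quoted rule: stop on `-ENOMEM`, otherwise keep going and drop any other error, so an unsharable last page (for example at max_gpfn) does not leak `-EINVAL` into the overall result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the quoted error policy: walk the
 * per-page results, abort on -ENOMEM, and drop every other error.
 * *done receives the index at which processing stopped (n on success).
 */
static int fold_page_results(const int *page_rc, size_t n, size_t *done)
{
    int rc = 0;

    assert(done != NULL && (page_rc != NULL || n == 0));

    for ( *done = 0; *done < n; ++*done )
    {
        rc = page_rc[*done];
        if ( rc == -ENOMEM )
            break;                    /* the only error worth reporting */
    }

    /* Individual unsharable pages (e.g. -EINVAL) are not fatal. */
    if ( rc < 0 && rc != -ENOMEM )
        rc = 0;

    return rc;
}
```

For results `{ 0, 0, -EINVAL }` the fold returns 0 after visiting all three pages, even though the last one failed; for `{ 0, -ENOMEM, 0 }` it stops at index 1 and returns `-ENOMEM`.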
>
>> +        if ( rc )
>> +        {
>> +            rcu_unlock_domain(cd);
>> +            goto out;
>> +        }
>> +
>> +        if ( !mem_sharing_enabled(cd) )
>> +        {
>> +            rcu_unlock_domain(cd);
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        if ( !atomic_read(&d->pause_count) ||
>> +             !atomic_read(&cd->pause_count) )
>> +        {
>> +            rcu_unlock_domain(cd);
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        max_sgfn = domain_get_maximum_gpfn(d);
>> +        max_cgfn = domain_get_maximum_gpfn(cd);
>> +
>> +        if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>
> Why would the two domains need to agree in their maximum
> GPFN? There's nothing similar in this file so far. Nor does the
> right side of the || match anything pre-existing...

The use case for this function is to deduplicate identical VMs, not to
blindly share pages across arbitrary domains. So this is a safety check to
avoid accidentally running this function on domains that are obviously not
identical. The right-hand side is a safety check against improperly
initialized input structs where the start point is obviously outside the
domain's memory.

>
>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>              domid_t  client_domain; /* IN: the client domain id */
>> -        } share;
>> +        } share;
>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>> +                                                full deduplication. Field is
>> +                                                used internally and may change
>> +                                                when the hypercall returns. */
>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>> +                                                that are shared after this
>> +                                                operation including pages
>> +                                                already shared before */
>> +            domid_t client_domain;           /* IN: the client domain id */
>> +        } bulk;
>
> Let's not repeat pre-existing mistakes: There is explicit padding
> missing here, which then also ought to be checked to be zero on
> input.
This struct is part of a union and is smaller than the largest struct in the
union, even with padding. So how would padding have any effect on anything?

Thanks,
Tamas
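Both sides of the padding exchange can be checked with a small layout experiment. The sketch below uses hypothetical stand-in types, not the real public header: `_Alignas(8)` approximates Xen's `uint64_aligned_t`, `share_like` is an assumed larger union member (its `source_handle` field is an assumption; only the `client_*` fields appear in the quoted hunk), and `bulk_padded` adds an explicit `pad` field like the one requested in review. `bulk_pad_is_zero()` is a hypothetical helper sketching the "checked to be zero on input" part; a common motivation for such a check is that rejecting non-zero padding keeps those bytes available for a later interface revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;             /* domid_t is 16 bits in Xen */

/* Assumed larger union member, standing in for the existing "share" op. */
struct share_like {
    _Alignas(8) uint64_t source_handle;   /* assumed extra field */
    _Alignas(8) uint64_t client_gfn;
    _Alignas(8) uint64_t client_handle;
    domid_t client_domain;
};

/* Bulk op as posted: the compiler inserts implicit tail padding. */
struct bulk_implicit {
    _Alignas(8) uint64_t start;
    _Alignas(8) uint64_t shared;
    domid_t client_domain;
};

/* Same op with the padding spelled out. */
struct bulk_padded {
    _Alignas(8) uint64_t start;
    _Alignas(8) uint64_t shared;
    domid_t client_domain;
    uint16_t pad[3];                  /* 2 + 6 bytes fill the 8-byte slot */
};

union op_implicit { struct share_like share; struct bulk_implicit bulk; };
union op_padded   { struct share_like share; struct bulk_padded   bulk; };

/* Explicit padding does not change the union size in this layout. */
static_assert(sizeof(union op_implicit) == sizeof(union op_padded),
              "union size unchanged by explicit padding");

/* Hypothetical input check: reject requests with non-zero padding. */
static bool bulk_pad_is_zero(const struct bulk_padded *b)
{
    for ( size_t i = 0; i < sizeof(b->pad) / sizeof(b->pad[0]); i++ )
        if ( b->pad[i] != 0 )
            return false;
    return true;
}
```

Under these assumed layouts both bulk variants occupy 24 bytes and both unions 32, so the explicit field does not grow the union; what it adds is a named location whose contents the hypervisor can validate.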
