
Re: [PATCH v4 5/9] x86/mem-sharing: copy GADDR based shared guest areas


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 28 Sep 2023 13:07:52 +0200
  • Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>
  • Delivery-date: Thu, 28 Sep 2023 11:08:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Sep 28, 2023 at 12:11:02PM +0200, Jan Beulich wrote:
> On 28.09.2023 11:51, Roger Pau Monné wrote:
> > On Thu, Sep 28, 2023 at 09:16:20AM +0200, Jan Beulich wrote:
> >> --- a/xen/arch/x86/mm/mem_sharing.c
> >> +++ b/xen/arch/x86/mm/mem_sharing.c
> >> @@ -1641,6 +1641,68 @@ static void copy_vcpu_nonreg_state(struc
> >>      hvm_set_nonreg_state(cd_vcpu, &nrs);
> >>  }
> >>  
> >> +static int copy_guest_area(struct guest_area *cd_area,
> >> +                           const struct guest_area *d_area,
> >> +                           struct vcpu *cd_vcpu,
> >> +                           const struct domain *d)
> >> +{
> >> +    mfn_t d_mfn, cd_mfn;
> >> +
> >> +    if ( !d_area->pg )
> >> +        return 0;
> >> +
> >> +    d_mfn = page_to_mfn(d_area->pg);
> >> +
> >> +    /* Allocate & map a page for the area if it hasn't been already. */
> >> +    if ( !cd_area->pg )
> >> +    {
> >> +        gfn_t gfn = mfn_to_gfn(d, d_mfn);
> >> +        struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
> >> +        p2m_type_t p2mt;
> >> +        p2m_access_t p2ma;
> >> +        unsigned int offset;
> >> +        int ret;
> >> +
> >> +        cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
> >> +        if ( mfn_eq(cd_mfn, INVALID_MFN) )
> >> +        {
> >> +            struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
> >> +
> >> +            if ( !pg )
> >> +                return -ENOMEM;
> >> +
> >> +            cd_mfn = page_to_mfn(pg);
> >> +            set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
> >> +
> >> +            ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> >> +                                 p2m->default_access, -1);
> >> +            if ( ret )
> >> +                return ret;
> >> +        }
> > 
> > I'm still unsure why map_guest_area() shouldn't be able to deal with a
> > forked child needing the page to be mapped.  What happens when a
> > forked child executes the hypercall to map such areas against not yet
> > populated gfns?
> > 
> > Shouldn't map_guest_area() be capable of handling those calls and
> > populating on demand?
> 
> Funny you should use this wording: P2M_ALLOC will deal with populating
> PoD slots, yes, but aiui forks don't fill their p2m with all PoD slots,
> but rather leave them empty. Yet again I need to leave it to Tamas to
> confirm or prove me wrong.

If populating the child's p2m is only triggered by guest accesses, then
a lot of hypercalls are likely not to work correctly on children.
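To illustrate the alternative being suggested, here is a toy, self-contained
sketch (not Xen code: the p2m is reduced to a plain array and the
p2m_get_or_populate() helper name is made up) of what on-demand population
inside a hypercall path such as map_guest_area() could look like:

```c
#include <stdint.h>

#define INVALID_MFN  UINT64_MAX
#define P2M_SLOTS    8

/*
 * Toy p2m: gfn -> mfn, all slots initially empty, mimicking a fork
 * whose p2m is left unpopulated rather than filled with PoD entries.
 */
static uint64_t p2m[P2M_SLOTS];
static uint64_t next_free_mfn = 0x100;

static void p2m_init(void)
{
    for ( unsigned int i = 0; i < P2M_SLOTS; i++ )
        p2m[i] = INVALID_MFN;
}

/*
 * Hypothetical populate-on-demand step that a hypercall path could
 * perform before touching the gfn: allocate and insert a fresh page
 * only if the slot is still empty.
 */
static uint64_t p2m_get_or_populate(uint64_t gfn)
{
    if ( p2m[gfn] == INVALID_MFN )
        p2m[gfn] = next_free_mfn++;
    return p2m[gfn];
}
```

With something along these lines in the common path, the special-casing in
copy_guest_area() would not need to populate the gfn itself.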

> 
> >> +        else if ( p2mt != p2m_ram_rw )
> >> +            return -EBUSY;
> >> +
> >> +        /*
> >> +         * Map the area into the guest. For simplicity specify the entire range
> >> +         * up to the end of the page: All the function uses it for is to check
> >> +         * that the range doesn't cross page boundaries. Having the area mapped
> >> +         * in the original domain implies that it fits there and therefore will
> >> +         * also fit in the clone.
> >> +         */
> >> +        offset = PAGE_OFFSET(d_area->map);
> >> +        ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
> >> +                             PAGE_SIZE - offset, cd_area, NULL);
> >> +        if ( ret )
> >> +            return ret;
> >> +    }
> >> +    else
> >> +        cd_mfn = page_to_mfn(cd_area->pg);
> >> +
> >> +    copy_domain_page(cd_mfn, d_mfn);
> > 
> > I think the page copy should be done only once, when the page is
> > populated on the child p2m.  Otherwise areas smaller than a page size
> > (like vpcu_time_info_t) that share the same page will get multiple
> > copies of the same data for no reason.
> 
> I think you're right, but this would then be another issue in the original
> code that I merely didn't spot (and it's not just "copy for no reason",
> we'd actually corrupt what was put there before). IOW the copying needs to
> move ahead of map_guest_area() (or yet more precisely after the error
> checking for p2m->set_entry()), and in the original code it would have
> needed to live ahead of map_vcpu_info(). Once again I'd like Tamas to
> confirm (or otherwise) before making that change, though.

Yes, it's already an issue in the current code.  I wonder whether logic
in the guest or Xen could malfunction given that map_vcpu_info()
unconditionally sets evtchn_upcall_pending and injects an event channel
upcall, while the later call to copy_domain_page() might clear
evtchn_upcall_pending even though the vector has already been injected.
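The ordering hazard can be shown with a tiny stand-alone model (not Xen
code: vcpu_info is reduced to the single field under discussion, and the
map/copy steps are modelled as plain assignments):

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the guest-visible vcpu_info area, one field only. */
struct vcpu_info { uint8_t evtchn_upcall_pending; };

/*
 * Current ordering: map_vcpu_info() sets the pending flag on the
 * child's freshly mapped page, then copy_domain_page() clobbers it
 * with the parent's copy.  Returns the child's final flag value.
 */
static uint8_t map_then_copy(void)
{
    struct vcpu_info parent = { 0 }, child = { 0 };

    child.evtchn_upcall_pending = 1;          /* map step sets the flag */
    memcpy(&child, &parent, sizeof(child));   /* copy clobbers it */
    return child.evtchn_upcall_pending;
}

/*
 * Proposed ordering: copy the parent page first, then perform the
 * mapping step that sets the flag.
 */
static uint8_t copy_then_map(void)
{
    struct vcpu_info parent = { 0 }, child = { 0 };

    memcpy(&child, &parent, sizeof(child));   /* copy first */
    child.evtchn_upcall_pending = 1;          /* then set the flag */
    return child.evtchn_upcall_pending;
}
```

In the first ordering the flag is lost even though the upcall vector may
already have been injected; moving the copy ahead of the mapping, as
suggested above, keeps the flag intact.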

Thanks, Roger.



 

