
Re: [PATCH v4 5/9] x86/mem-sharing: copy GADDR based shared guest areas


  • To: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 16 Oct 2023 11:50:51 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 16 Oct 2023 09:51:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.09.2023 15:01, Tamas K Lengyel wrote:
> On Thu, Sep 28, 2023 at 9:19 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>
>> On 28.09.2023 14:57, Tamas K Lengyel wrote:
>>>> On Thu, Sep 28, 2023 at 7:08 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>>> On Thu, Sep 28, 2023 at 12:11:02PM +0200, Jan Beulich wrote:
>>>>> On 28.09.2023 11:51, Roger Pau Monné wrote:
>>>>>> On Thu, Sep 28, 2023 at 09:16:20AM +0200, Jan Beulich wrote:
>>>>>>> +        /*
>>>>>>> +         * Map the area into the guest. For simplicity specify the entire range
>>>>>>> +         * up to the end of the page: All the function uses it for is to check
>>>>>>> +         * that the range doesn't cross page boundaries. Having the area mapped
>>>>>>> +         * in the original domain implies that it fits there and therefore will
>>>>>>> +         * also fit in the clone.
>>>>>>> +         */
>>>>>>> +        offset = PAGE_OFFSET(d_area->map);
>>>>>>> +        ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
>>>>>>> +                             PAGE_SIZE - offset, cd_area, NULL);
>>>>>>> +        if ( ret )
>>>>>>> +            return ret;
>>>>>>> +    }
>>>>>>> +    else
>>>>>>> +        cd_mfn = page_to_mfn(cd_area->pg);
>>>>>>> +
>>>>>>> +    copy_domain_page(cd_mfn, d_mfn);
>>>>>>
>>>>>> I think the page copy should be done only once, when the page is
>>>>>> populated on the child p2m.  Otherwise areas smaller than a page size
>>>>>> (like vcpu_time_info_t) that share the same page will get multiple
>>>>>> copies of the same data for no reason.
>>>>>
>>>>> I think you're right, but this would then be another issue in the original
>>>>> code that I merely didn't spot (and it's not just "copy for no reason",
>>>>> we'd actually corrupt what was put there before). IOW the copying needs to
>>>>> move ahead of map_guest_area() (or yet more precisely after the error
>>>>> checking for p2m->set_entry()), and in the original code it would have
>>>>> needed to live ahead of map_vcpu_info(). Once again I'd like Tamas to
>>>>> confirm (or otherwise) before making that change, though.
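
Roughly, with the identifiers from the quoted fragment (the populate step
and its guarding conditions aren't visible in the quote, so this is a
schematic sketch, not the final code):

    /* 1) Page freshly populated in the child p2m (p2m->set_entry() error
     *    checking already done): copy the parent's page exactly once,
     *    before any area living in it is registered. */
    copy_domain_page(cd_mfn, d_mfn);

    /* 2) Only then register the area; no further whole-page copy, so
     *    updates the hypervisor already made for other areas in the same
     *    page aren't clobbered. */
    offset = PAGE_OFFSET(d_area->map);
    ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
                         PAGE_SIZE - offset, cd_area, NULL);
    if ( ret )
        return ret;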
>>>>
>>>> Yes, it's already an issue in the current code.  I wonder whether
>>>> logic in the guest or Xen could malfunction because
>>>> map_vcpu_info() unconditionally sets evtchn_upcall_pending and injects
>>>> an event channel upcall, but the later call to copy_domain_page()
>>>> might unset evtchn_upcall_pending while the vector is already injected.
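
Spelled out as a timeline (using the current function names; a schematic
illustration of the hazard, not a patch):

    map_vcpu_info(v, gfn, offset);    /* maps the new vcpu_info, sets
                                       * evtchn_upcall_pending and injects
                                       * an event channel upcall */
    /* ... */
    copy_domain_page(cd_mfn, d_mfn);  /* overwrites the whole page with the
                                       * parent's contents; if the parent had
                                       * evtchn_upcall_pending clear, the flag
                                       * set above is lost while the vector is
                                       * already injected */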
>>>
>>> Sorry but I really don't follow the discussion here. My understanding
>>> was that map_vcpu_info, as its name suggests, maps the page. We use it
>>> to map a new page into that position in case the fork hasn't set it up
>>> yet but the parent has one. Then we follow with the copy from the
>>> parent so the page content is matching. If there is already a
>>> vcpu_info page in the fork, we just do the copy.
>>>
>>> Now, if map_vcpu_info does more than mapping, then I don't know what
>>> it does, why it does it, and what happens if we skip it when the fork
>>> is reset, for example. Is the suggestion to call map_vcpu_info every
>>> time the page content is reset (i.e. after the copy)?
>>
>> The vCPU info area (already prior to this series) and the two other areas
>> can be updated by the hypervisor at any time. Once one such area is
>> registered within a certain page, if another such area happens to live in
>> the same page, copying the entire page again would overwrite all updates
>> that might already have been made for the first area. IOW copying ought
>> to - imo - happen exactly once, when the new page is allocated.
>>
>> As to map_vcpu_info() - just look at the function: It writes to the newly
>> registered area. Even if the function name says just "map", that's an
>> integral part of the operation. We can't just map it, but leave the area
>> untouched.
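
To make the clobbering concrete, here is a stand-alone illustration in
plain C (not Xen code; the byte at offset 0 stands in for something like
evtchn_upcall_pending, and the buffers stand in for the parent's and
child's page):

    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    int main(void)
    {
        unsigned char parent[PAGE_SIZE] = { 0 };
        unsigned char child[PAGE_SIZE];

        /* First area registered: child page populated from the parent. */
        memcpy(child, parent, PAGE_SIZE);

        /* Hypervisor updates the first area in the child. */
        child[0] = 1;

        /* Second area in the same page registered: re-copying the whole
         * page silently discards the update made above. */
        memcpy(child, parent, PAGE_SIZE);

        printf("first area's flag: %u\n", child[0]); /* prints 0, not 1 */
        return 0;
    }

Which is why the copy ought to happen exactly once, when the fresh page is
put in place.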
> 
> The domains are paused (both the parent and the child) when a new fork
> is being made and also during fork reset. If Xen really does write to
> these memory areas "any time", even if the domain is paused, we will
> need a lock to tell Xen not to touch it because it hasn't finished being
> constructed/reset yet.

It's not really "any" time, but any time the vCPU is running or about to run.

> Moreover, I'm not sure what Xen is writing to these
> areas, but if it's anything "stateful" it should be discarded or
> copied from the parent because the guest state must match the parent
> after the fork/reset op.

How so? Both run independently once unpaused, and the data
written is derived from various bits of system and guest state. I see
no reason why the initial populating would need to clone the parent's
when immediately afterwards the data would be overwritten by child-
specific values.

Jan



 

