[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 5/7] libxl: support unmapping static shared memory areas during domain destruction



Hi Julien an Wei,

2018-02-08 0:54 GMT+08:00 Julien Grall <julien.grall@xxxxxxx>:
> On 07/02/18 16:27, Zhongze Liu wrote:
>>
>> Hi Wei and Julien,
>
>
> Hi,
>
>
>> 2018-02-07 2:06 GMT+08:00 Wei Liu <wei.liu2@xxxxxxxxxx>:
>>>
>>> On Tue, Feb 06, 2018 at 01:24:30PM +0000, Julien Grall wrote:
>>>>>
>>>>>        if (libxl__device_pci_destroy_all(gc, domid) < 0)
>>>>>            LOGD(ERROR, domid, "Pci shutdown failed");
>>>>>        rc = xc_domain_pause(ctx->xch, domid);
>>>>> diff --git a/tools/libxl/libxl_internal.h
>>>>> b/tools/libxl/libxl_internal.h
>>>>> index 2cfe4c08a7..c398b6a6b8 100644
>>>>> --- a/tools/libxl/libxl_internal.h
>>>>> +++ b/tools/libxl/libxl_internal.h
>>>>> @@ -4424,6 +4424,8 @@ static inline bool libxl__string_is_default(char
>>>>> **s)
>>>>>    _hidden int libxl__sshm_add(libxl__gc *gc, uint32_t domid,
>>>>>                                libxl_static_shm *sshm, int len);
>>>>> +_hidden int libxl__sshm_del(libxl__gc *gc, uint32_t domid);
>>>>> +
>>>>>    _hidden int libxl__sshm_check_overlap(libxl__gc *gc, uint32_t domid,
>>>>>                                          libxl_static_shm *sshms, int
>>>>> len);
>>>>>    _hidden int libxl__sshm_setdefault(libxl__gc *gc, uint32_t domid,
>>>>> diff --git a/tools/libxl/libxl_sshm.c b/tools/libxl/libxl_sshm.c
>>>>> index 562f46f299..1bf4d4c2dc 100644
>>>>> --- a/tools/libxl/libxl_sshm.c
>>>>> +++ b/tools/libxl/libxl_sshm.c
>>>>> @@ -86,6 +86,112 @@ int libxl__sshm_check_overlap(libxl__gc *gc,
>>>>> uint32_t domid,
>>>>>        return 0;
>>>>>    }
>>>>> +/* Decrease the refcount of an sshm. When refcount reaches 0,
>>>>
>>>>
>>>> NIT: Libxl coding style regarding the comment seems to be uncleared
>>>> (Ian,
>>>> Wei?). But I feel keep /* and */ in separate line is nicer.
>>>
>>>
>>> I don't have an opinion here.
>>>
>>>>
>>>>> + * clean up the whole sshm path.
>>>>> + */
>>>>> +static void libxl__sshm_decref(libxl__gc *gc, xs_transaction_t xt,
>>>>> +                               const char *sshm_path)
>>>>> +static void libxl__sshm_del_slave(libxl__gc *gc, xs_transaction_t xt,
>>>>> +                                  uint32_t domid, const char *id, bool
>>>>> isretry)
>>>>> +{
>>>>> +    const char *slave_path, *begin_str, *end_str;
>>>>> +    uint64_t begin, end;
>>>>> +
>>>>> +    slave_path = GCSPRINTF("%s/slaves/%"PRIu32, SSHM_PATH(id), domid);
>>>>> +
>>>>> +    begin_str = libxl__xs_read(gc, xt, GCSPRINTF("%s/begin",
>>>>> slave_path));
>>>>> +    end_str = libxl__xs_read(gc, xt, GCSPRINTF("%s/end", slave_path));
>>>>> +    begin = strtoull(begin_str, NULL, 16);
>>>>> +    end = strtoull(end_str, NULL, 16);
>>>>> +
>>>>> +    /* Avoid calling do_unmap many times in case of xs transaction
>>>>> retry */
>>>>> +    if (!isretry)
>>>>> +        libxl__sshm_do_unmap(gc, domid, id, begin, end);
>>>>
>>>>
>>>> IHMO, by unmapping the regions in middle of the transaction, you
>>>> increase
>>>> the potential failure of it. I would move that out of the transaction
>>>> path.
>>>>
>>>> I would be interested to hear the opinion of the tools maintainers here.
>>>>
>>>
>>> If you move the unmap after the loop you create a window in which
>>> the pages are still mapped but the toolstack thinks they are unmapped.
>>>
>>> While the code as-is now makes sure (assuming no error in unmap) the
>>> pages are unmapped no later than the transaction is committed. I think
>>> this can be done by moving unmap before the transaction.
>>>
>>> Zhongze, do you think the unmap must be done inside the loop? What kind
>>> of invariants do you have in mind?
>>>
>>> Then there is the question of "what do we do if unmap fails". Honestly I
>>> don't have an answer. It seems rather screwed up in that case and I
>>> doubt there is much libxl can do to rectify things.
>>>
>>
>> I put the unmap inside the transaction because I want to make the whole
>> read_mapping_info->unmap->update_mapping_info process atomic. If
>> I put unmap outside the transaction:  after I read out the information
>> that I need to do the unmap, and before I do the unmap and decrease the
>> refcnt, there could be another instance of this code trying to do the same
>> thing, which might lead to race condition.
>
>
> AFAIU, the transaction is not a "global" lock. You will just not see the the
> change from the others during the transactions. Your changes will be only be
> visible at the end. So two transaction can be happily started concurrently,
> and try to do the unmap together. Not even your code will protect against
> that.
>
> So can you give a bit more details here?
>

It seems that I mistakenly use transaction as a global lock. Now I don't have
any reasons not putting the unmap out of the transaction, but this will break
the original transaction into two, and I do think that we need some explicit
locking here.

@Wei. Do you have any suggestions here?

Cheers,

Zhongze Liu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.