[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains


  • To: Edwin Torok <edvin.torok@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 9 Nov 2022 10:58:10 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=yzCST69uBHrwmMjnwqRwykHt4WvodA0N+auO43+NDAg=; b=k8gbVXEnigynhGz7LFUPLecJNoEX6F+5VszUErLXJu5aIwYrH5Zg6qSLzTPIkCp7MGb4rPVSQgd2CaLnwFu13w/KWs1nrtlfIcbeHB21NKZcOeK6k69LPhW2BlR8RuBoxWJeOerZWsiHaBdJrM8eF9hWofnZq2g/TaKsJLHW9gh7KvVCkovVsJd89JItTjFoOyKyqwVLOOJNr92TuxoQ/2rEj5DCsZRvZR2l1FI1bol78IKzbH0rn6Xwebot4qtfkbMHIdRxN6kf6g5ZQjUvz6D4w9Bj7UZ6zQa6GuPdrIsm3xbyk02pkPN5k2PRKNqF1XtjxCDdC9YvZdG2FJcq+Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fLfYGIdTNKwgRpN6o1fEyYCIRrN3McCYqbAvUlE/cz7X3ZjRwvT3s2XHBseKtfpgMMdBKQ+K4zAl3bneSEo8wawd3ijhiiUUOzNFscfgDKzw6lNcP8gmOWYp2ut7otpRHt1GYbqi2n+OJYihRwSHtSifnulaV//+XMTgjXLgmc7IsqEAhnlVeFh/oUWoZI/TbDN633Z8SWznWq7VwN6gHRfNomJRjsYwyNREe5Z/pKNipBpWA7XdqQ93mQWecrmOoFZTuyIeT5yy81mhS/MOrb+/BiEveN+rIMyZnRdanCagJlx2ISWr+FpFnsiCjOlZndV/WoYs4SgCxiaENGyNjw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, "Henry.Wang@xxxxxxx" <Henry.Wang@xxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 09 Nov 2022 09:58:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.11.2022 10:45, Edwin Torok wrote:
>> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 08.11.2022 18:15, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>>> operation on dying domains.
>>>>>>>
>>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>>> of view, because there's no indication that the data hasn't been
>>>>>>> fetched.
>>>>>>
>>>>>> While I can see how the present behavior is problematic when it comes
>>>>>> to consuming supposedly returned data, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct 
>>>>>>> xen_domctl_shadow_op *sc,
>>>>>>>
>>>>>>>     if ( unlikely(d->is_dying) )
>>>>>>>     {
>>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain 
>>>>>>> %u\n",
>>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>>                  d->domain_id);
>>>>>>> -        return 0;
>>>>>>> +        return -EINVAL;
>>>>>>>     }
>>>>>>
>>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>>> to that decision).
>>>>>
>>>>> I would assume that was the original intention, yes, albeit the commit
>>>>> message doesn't go into details about why mimicking success is
>>>>> required, it's very well possible the code relying on this was xend.
>>>>
>>>> Quite possible, but you never know who else has cloned code from there.
>>>>
>>>>>> Can you instead fill the data to be returned in
>>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>>> any input-only fields, so it may not be necessary to preserve
>>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>>>
>>>>> Hm, it still feels kind of wrong.  We do return errors elsewhere for
>>>>> operations attempted against dying domains, and that seems all fine,
>>>>> not sure why paging operations need to be different in this regard.
>>>>> Arm does also return -EINVAL in that case.
>>>>>
>>>>> So what about postponing this change to 4.18 in order to avoid
>>>>> surprises, but then taking it in its current form at the start of the
>>>>> development window, as to have time to detect any issues?
>>>>
>>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>>> for comparison, since the op is pretty new there iirc.
>>>
>>> Indeed, but the tools code paths are likely shared between x86 and
>>> Arm, as the hypercalls are the same.
>>
>> On x86 we have both xc_shadow_control() and (functional)
>> xc_logdirty_control(); on Arm only the former is used, while the latter
>> would also be impacted by your change. Plus you're not accounting for
>> external tool stacks (like xend would be if anyone had cared to forward
>> port it, when - as you said earlier - the suspicion is that the original
>> change was made to "please" xend).
> 
> I don't see how returning random uninitialised data (current behaviour) or 
> wrong data (all zeroes) is better than returning an explicit error code.

I didn't say anything like that. What I did say is that we cannot lightly
move from "success" to "failure".

> And even this change happens to break some code, it'd do so with a clear 
> error code and in a reproducible way, and then such breakage can be easily 
> fixed in the affected piece of code
> (e.g. a toolstack other than XAPI which was not looking for errors from this 
> domctl call)

My experience with error messages (including the conveying of error codes)
by the tool stack is rather poor. I also dare to question the
"reproducible" aspect: The main risk I see here is with a multi-threaded
tool stack where one thread hasn't become / been made properly aware of
another one being in the process of cleaning up after a domain. Its racy
issuing of further domctl-s may very well not be nicely reproducible.

> AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs 
> (i.e. used XAPI/xenopsd) which always checks return value
> and raises OCaml exception and in python lowlevel libs, which checks return 
> value and raises python exception.

But XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION isn't the only sub-op affected here,
is it?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.