Re: [PATCH for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains
On 09.11.2022 10:45, Edwin Torok wrote:
>> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>
>> On 08.11.2022 18:15, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>>> operation on dying domains.
>>>>>>>
>>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>>> of view, because there's no indication that the data hasn't been
>>>>>>> fetched.
>>>>>>
>>>>>> While I can see how the present behavior is problematic when it comes
>>>>>> to consuming supposedly returned data, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>>>
>>>>>>>      if ( unlikely(d->is_dying) )
>>>>>>>      {
>>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>>                   d->domain_id);
>>>>>>> -        return 0;
>>>>>>> +        return -EINVAL;
>>>>>>>      }
>>>>>>
>>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>>> to that decision).
>>>>>
>>>>> I would assume that was the original intention, yes, albeit the commit
>>>>> message doesn't go into details about why mimicking success is
>>>>> required, it's very well possible the code relying on this was xend.
>>>>
>>>> Quite possible, but you never know who else has cloned code from there.
>>>>
>>>>>> Can you instead fill the data to be returned in
>>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>>> any input-only fields, so it may not be necessary to preserve
>>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>>>
>>>>> Hm, it still feels kind of wrong. We do return errors elsewhere for
>>>>> operations attempted against dying domains, and that seems all fine,
>>>>> not sure why paging operations need to be different in this regard.
>>>>> Arm does also return -EINVAL in that case.
>>>>>
>>>>> So what about postponing this change to 4.18 in order to avoid
>>>>> surprises, but then taking it in its current form at the start of the
>>>>> development window, as to have time to detect any issues?
>>>>
>>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>>> for comparison, since the op is pretty new there iirc.
>>>
>>> Indeed, but the tools code paths are likely shared between x86 and
>>> Arm, as the hypercalls are the same.
>>
>> On x86 we have both xc_shadow_control() and (functional)
>> xc_logdirty_control(); on Arm only the former is used, while the latter
>> would also be impacted by your change. Plus you're not accounting for
>> external tool stacks (like xend would be if anyone had cared to forward
>> port it, when - as you said earlier - the suspicion is that the original
>> change was made to "please" xend).
>
> I don't see how returning random uninitialised data (current behaviour) or
> wrong data (all zeroes) is better than returning an explicit error code.

I didn't say anything like that. What I did say is that we cannot lightly
move from "success" to "failure".
> And even if this change happens to break some code, it'd do so with a clear
> error code and in a reproducible way, and then such breakage can be easily
> fixed in the affected piece of code
> (e.g. a toolstack other than XAPI which was not looking for errors from this
> domctl call)

My experience with error messages (including the conveying of error codes)
by the tool stack is rather poor. I also dare to question the "reproducible"
aspect: The main risk I see here is with a multi-threaded tool stack where
one thread hasn't become / been made properly aware of another one being in
the process of cleaning up after a domain. Its racy issuing of further
domctl-s may very well not be nicely reproducible.

> AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs
> (i.e. used by XAPI/xenopsd), which always check the return value and raise an
> OCaml exception, and in the Python low-level libs, which check the return
> value and raise a Python exception.

But XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION isn't the only sub-op affected here,
is it?

Jan
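[For illustration only: the middle ground Jan floats above - keep mimicking success for dying domains, but zero the output fields so callers never consume uninitialised data - could be sketched as below. The struct and function definitions here are cut-down stand-ins for demonstration, not the real declarations from xen/include/public/domctl.h or xen/arch/x86/mm/paging.c.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the relevant output fields of struct xen_domctl_shadow_op;
 * the real structure in public/domctl.h has more members. */
struct xen_domctl_shadow_op_stats {
    uint32_t fault_count;
    uint32_t dirty_count;
};

struct xen_domctl_shadow_op {
    uint32_t op;                               /* sub-op, input only */
    uint64_t mb;                               /* shadow memory, in MiB */
    struct xen_domctl_shadow_op_stats stats;
};

struct domain {
    unsigned int domain_id;
    int is_dying;
};

/* Hypothetical handling of the dying-domain case along the lines Jan
 * suggests: preserve the historical "success" return so existing callers
 * don't regress, but zero ->mb and ->stats so parameter-fetching sub-ops
 * like GET_ALLOCATION at least return well-defined data. */
static int paging_domctl_dying(struct domain *d,
                               struct xen_domctl_shadow_op *sc)
{
    if ( d->is_dying )
    {
        sc->mb = 0;
        memset(&sc->stats, 0, sizeof(sc->stats));
        return 0;                              /* mimic success, as today */
    }
    /* ... normal sub-op dispatch would continue here ... */
    return 0;
}
```

Whether this is preferable to the patch's -EINVAL depends on the regression risk discussed above: it keeps old callers working but still hands the toolstack "wrong" (all-zero) data rather than an explicit error.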