[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()



Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

> 17.03.2020 13:39, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
>>
>>> 16.03.2020 11:21, Markus Armbruster wrote:
>>>> Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
>>>>
>>>>> On 14.03.2020 00:54, Markus Armbruster wrote:
>>>>>> Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
>>>>>>
>>>>>>> 13.03.2020 18:42, Markus Armbruster wrote:
>>>>>>>> Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
>>>>>>>>
>>>>>>>>> 12.03.2020 19:36, Markus Armbruster wrote:
>>>>>>>>>> I may have a second look tomorrow with fresher eyes, but let's get 
>>>>>>>>>> this
>>>>>>>>>> out now as is.
>>>>>>>>>>
>>>>>>>>>> Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
>> [...]
>>>>>>>>>>> +@@
>>>>>>>>>>> +
>>>>>>>>>>> + fn(..., Error ** ____, ...)
>>>>>>>>>>> + {
>>>>>>>>>>> +     ...
>>>>>>>>>>> +     Error *local_err = NULL;
>>>>>>>>>>> +     ... when any
>>>>>>>>>>> +     Error *local_err2 = NULL;
>>>>>>>>>>> +     ... when any
>>>>>>>>>>> + }
>>>>>>
>>>>>> This flags functions that have more than one declaration along any
>>>>>> control flow path.  It doesn't flag this one:
>>>>>>
>>>>>>        void gnat(bool b, Error **errp)
>>>>>>        {
>>>>>>            if (b) {
>>>>>>                Error *local_err = NULL;
>>>>>>                foo(arg, &local_err);
>>>>>>                error_propagate(errp, local_err);
>>>>>>            } else {
>>>>>>                Error *local_err = NULL;
>>>>>>                bar(arg, &local_err);
>>>>>>                error_propagate(errp, local_err);
>>>>>>            }
>>>>>>        }
>>>>>>
>>>>>> The Coccinelle script does the right thing for this one regardless.
>>>>>>
>>>>>> I'd prefer to have such functions flagged, too.  But spending time on
>>>>>> convincing Coccinelle to do it for me is not worthwhile; I can simply
>>>>>> search the diff produced by Coccinelle for deletions of declarations
>>>>>> that are not indented exactly four spaces.
>>>>>>
>>>>>> But if we keep this rule, we should adjust its comment
>>>>>>
>>>>>>        // Warn several Error * definitions.
>>>>>>
>>>>>> because it sure suggests it also catches functions like the one I gave
>>>>>> above.
>>>>>
>>>>> Hmm, yes.. We can write "Warn several Error * definitions in _one_
>>>>> control flow (it's not so trivial to match _any_ case with several
>>>>> definitions with coccinelle)" or something like this.
>>>>
>>>> Ha, "trivial" reminds me of a story.  The math professor, after having
>>>> spent a good chunk of his lecture developing a proof on the blackboad
>>>> turns to the audience to explain why this little part doesn't require
>>>> proof with the words familiar to any math student "and this is trivial."
>>>> Pause, puzzled look...  "Is it trivial?"  Pause, storms out of the
>>>> lecture hall.  A minute or three pass.  Professor comes back beaming,
>>>> "it is trivial!", and proceeds with the proof.
>>>>
>>>> My point is: it might be trivial with Coccinelle once you know how to do
>>>> it.  We don't.
>>>>
>>>> Suggest "(can't figure out how to match several definitions regardless
>>>> of control flow)".
>>>
>>> Wrong too, because I can:) for example, chaining two rules, catching the
>>> positions of definition and check that they are different.. Or, some
>>> cheating with python script.. That's why I wrote "not trivial",
>>>
>>> So, most correct would be "(can't figure out how to simply match several 
>>> definitions regardless
>>>> of control flow)".
>>
>> Works for me.
>>
>>> But again, coccinelle is for matching control flows, so its probably 
>>> impossible to match such thing..
>> [...]
>>>>> OK, I almost OK with it, the only thing I doubt a bit is the following:
>>>>>
>>>>> We want to keep rule1.local_err inheritance to keep connection with
>>>>> local_err definition.
>>>>
>>>> Yes.
>>>>
>>>>> Interesting, when we have both rule1.fn and rule1.local_err inherited,
>>>>> do we inherit them in separate (i.e. all possible combinations of fn
>>>>> and local_err symbols from rule1) or do we inherit a pair, i.e. only
>>>>> fn/local_err pairs, found by rule1? If the latter is correct, that
>>>>> with your script we loss this pair inheritance, and go to all possible
>>>>> combinations of fn and local_err from rule1, possibly adding some wrong
>>>>> conversion (OK, you've checked that no such cases in current code tree).
>>>>
>>>> The chaining "identifier rule1.FOO" is by name.  It's reliable only as
>>>> long as there is exactly one instance of the name.
>>>>
>>>> We already discussed the case of the function name: if there are two
>>>> instances of foo(), and rule1 matches only one of them, then we
>>>> nevertheless apply the rules chained to rule1 to both.  Because that can
>>>> be wrong, you came up with the ___ trick, which chains reliably.
>>>>
>>>> The same issue exists with the variable name: if there are two instances
>>>> of @local_err, and rule1 matches only one of them, then we nevertheless
>>>> apply the rules chained to rule1 to both.  Can also be wrong.
>>>>
>>>> What are the conditions for "wrong"?
>>>>
>>>> Because the ___ chaining is reliable, we know rule1 matched the
>>>> function, i.e. it has a parameter Error **errp, and it has a automatic
>>>> variable Error *local_err = NULL.
>>>>
>>>> We're good as long as *all* identifiers @local_err in this function are
>>>> declared that way.  This seems quite likely.  It's not certain, though.
>>>>
>>>> Since nested declarations of Error ** variables are rare, we can rely on
>>>> review to ensure we transform these functions correctly.
>>>>
>>>>> So, dropping inheritance in check-rules makes sence, as it may match
>>>>> (and warn) more interesting cases.
>>>>>
>>>>> But for other rules, I'd prefere to be safer, and explictly inherit all
>>>>> actually inherited identifiers..
>>>>
>>>> I still can't see what chaining by function name in addition to the ___
>>>> chaining buys us.
>>>
>>> I'll check this thing soon. And resend today.
>
> Checked.
>
> Yes, it inherits pair of fn and local_err, and it definitely makes sense. It 
> more stable.
>
> Consider the following example:
>
> # cat a.c
> int f1(Error **errp)
> {
>     Error *err1 = NULL;
>     int err2 = 0;
>
>     error_propagate(errp, err1);
>
>     return err2;
> }
>
> int f2(Error **errp)
> {
>     Error *err2 = NULL;
>     int err1 = 0;
>
>     error_propagate(errp, err2);
>
>     return err1;
> }
>
>
> My script works correct and produces this change:
> --- a.c
> +++ /tmp/cocci-output-1753-10842a-a.c
> @@ -1,19 +1,15 @@
>  int f1(Error **errp)
>  {
> -    Error *err1 = NULL;
> +    ERRP_AUTO_PROPAGATE();
>      int err2 = 0;
>
> -    error_propagate(errp, err1);
> -
>      return err2;
>  }
>
>  int f2(Error **errp)
>  {
> -    Error *err2 = NULL;
> +    ERRP_AUTO_PROPAGATE();
>      int err1 = 0;
>
> -    error_propagate(errp, err2);
> -
>      return err1;
>  }
>
>
> But yours script is caught:
> --- a.c
> +++ /tmp/cocci-output-1814-b9b681-a.c
> @@ -1,19 +1,15 @@
>  int f1(Error **errp)
>  {
> -    Error *err1 = NULL;
> +    ERRP_AUTO_PROPAGATE();
>      int err2 = 0;
>
> -    error_propagate(errp, err1);
> -
> -    return err2;
> +    return *errp;
>  }
>
>  int f2(Error **errp)
>  {
> -    Error *err2 = NULL;
> +    ERRP_AUTO_PROPAGATE();
>      int err1 = 0;
>
> -    error_propagate(errp, err2);
> -
> -    return err1;
> +    return *errp;
>  }
>
>
> - see, it touches err1, which is unrelated to Error in f2. Hmm,
> interesting that it doesn't want to convert err1 declaration:)
>
> - this is because relation between local_err and fn is lost.

Let me try to think this through.

rule1 matches functions that propagate from a local variable @local_err
to parameter @errp.  It uses the ___ hack to reliably tag the function.
Later rules that should only apply to these functions can match ___.

These later rules each provide a part of the total error propagation
transformation.  They must transform exactly the @local_err and @errp
matched by rule1 in each function.

Your solution is to constrain the identifiers, i.e.

    identifier rule1.fn, rule1.local_err;

If rule1 matches only one function named foo(), and within that foo()
the local variable @local_err rule1 matches actually binds all
occurences of the identifier @local_err, the constraint is reliable.

Else, the constraint may still accept occurences of @local_err not bound
to the variable matched by rule1.

Example 1:

    int bar(Error **errp)
    {
        if (pred()) {
            Error *local_err = NULL;

            error_setg(&local_err, "zzzt");
            error_propagate(errp, local_err);
        } else {
            int local_err = 0;
            return local_err;
        }
        return 0;
    }

rule1 matches the first @local_err variable, and not the second one.  We
must transform occurences of the first one, and not occurences of the
second one.  We do transform all:

     int bar(Error **errp)
     {
    +    ERRP_AUTO_PROPAGATE();
         if (pred()) {
    -        Error *local_err = NULL;
    -
    -        error_setg(&local_err, "zzzt");
    -        error_propagate(errp, local_err);
    +        error_setg(errp, "zzzt");
         } else {
             int local_err = 0;
    -        return local_err;
    +        return *errp;
         }
         return 0;
     }

Example 2:

    int foo(Error **errp)
    {
        Error *local_err = NULL;

        error_setg(&local_err, "zzzt");
        error_propagate(errp, local_err);
        return 0;
    }

    int foo(Error **errp)
    {
        Error *err = NULL;
        int local_err = 0;

        error_setg(&local_err, "zzzt");
        error_propagate(errp, err);
        return local_err;
    }

rule1 matches @local_err in the first foo(), and @err in the second one.
We must transform @local_err in the first one(), and @err in the second
one.  We do transform both in both:

     int foo(Error **errp)
     {
    -    Error *local_err = NULL;
    +    ERRP_AUTO_PROPAGATE();

    -    error_setg(&local_err, "zzzt");
    -    error_propagate(errp, local_err);
    +    error_setg(errp, "zzzt");
         return 0;
     }

     int foo(Error **errp)
     {
    -    Error *err = NULL;
    +    ERRP_AUTO_PROPAGATE();
         int local_err = 0;

    -    error_setg(&local_err, "zzzt");
    -    error_propagate(errp, err);
    -    return local_err;
    +    error_setg(errp, "zzzt");
    +    return *errp;
     }

Constraining only the variable identifier like I proposed is even less
reliable, as you demonstrated: then the issue in example 2 exists even
for differently named functions.

For a reliable solution, we could use perhaps use the ___ hack again:
have rule1 rename @local_err it actually matches.  But to be honest, my
appetite for another round of wrestling with Coccinelle isn't what it
used to be.

I think we can do without as long as we're well aware of the script's
limitations, and we're confident we can detect problematic cases.

Detecting transformation of multiple functions with the same name should
be easy.

Detecting occurences of identifiers not bound by a certain variable
should be feasible: we find and review every transformed function that
doesn't declare the variable in its outermost scope.

Since "well aware" is going to erode with time, we may want to delete
the script when we're done converting.

> So, understanding that there no such cases in the whole tree, and even
> if your patch works faster on the whole tree, I still don't want to
> drop inheritance, because it's just a correct thing to do. Yes, we've
> added ____ helper. It helps to avoid some problems. Pair-inheritance
> helps to avoid another problems. I understand, that there still may
> other, not-covered problems, but better to be as safe as possible. And
> inheritance here is native and correct thing to do, even with our ____
> additional helper. What do you think?

I wouldn't call it correct.  It's still unreliable, but less so than
without the function name constraint.  That makes it less wrong.

100% reliable would be nice, but not at any cost.  Something we're
reasonably confident to get right should be good enough.

To be confident, we need to understand the script's limitations, and how
to compensate for them.  I figure we do now.  You too?

[...]


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.