
Re: [Xen-devel] [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()


  • To: Markus Armbruster <armbru@xxxxxxxxxx>
  • From: Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx>
  • Date: Thu, 19 Mar 2020 15:12:07 +0300
  • Cc: Kevin Wolf <kwolf@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, qemu-block@xxxxxxxxxx, Paul Durrant <paul@xxxxxxx>, Philippe Mathieu-Daudé <philmd@xxxxxxxxxx>, Christian Schoenebeck <qemu_oss@xxxxxxxxxxxxx>, Michael Roth <mdroth@xxxxxxxxxxxxxxxxxx>, qemu-devel@xxxxxxxxxx, Greg Kurz <groug@xxxxxxxx>, Gerd Hoffmann <kraxel@xxxxxxxxxx>, Stefan Hajnoczi <stefanha@xxxxxxxxxx>, Anthony Perard <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Max Reitz <mreitz@xxxxxxxxxx>, Laszlo Ersek <lersek@xxxxxxxxxx>, Stefan Berger <stefanb@xxxxxxxxxxxxx>
  • Delivery-date: Thu, 19 Mar 2020 12:12:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

19.03.2020 13:45, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

17.03.2020 13:39, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

16.03.2020 11:21, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

On 14.03.2020 00:54, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

13.03.2020 18:42, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:

12.03.2020 19:36, Markus Armbruster wrote:
I may have a second look tomorrow with fresher eyes, but let's get this
out now as is.

Vladimir Sementsov-Ogievskiy <vsementsov@xxxxxxxxxxxxx> writes:
[...]
+@@
+
+ fn(..., Error ** ____, ...)
+ {
+     ...
+     Error *local_err = NULL;
+     ... when any
+     Error *local_err2 = NULL;
+     ... when any
+ }

This flags functions that have more than one declaration along any
control flow path.  It doesn't flag this one:

        void gnat(bool b, Error **errp)
        {
            if (b) {
                Error *local_err = NULL;
                foo(arg, &local_err);
                error_propagate(errp, local_err);
            } else {
                Error *local_err = NULL;
                bar(arg, &local_err);
                error_propagate(errp, local_err);
            }
        }

The Coccinelle script does the right thing for this one regardless.

I'd prefer to have such functions flagged, too.  But spending time on
convincing Coccinelle to do it for me is not worthwhile; I can simply
search the diff produced by Coccinelle for deletions of declarations
that are not indented exactly four spaces.
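
The diff search described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name is hypothetical, it expects one line of unified diff at a time, and it treats any deleted line whose text starts with "Error *" as a declaration, which is a heuristic rather than a parse.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if a unified-diff line deletes an "Error *" declaration whose
 * indentation is not exactly four spaces, 0 otherwise.
 * Hypothetical helper for illustration; not part of the patch series.
 */
int is_nested_error_decl_deletion(const char *line)
{
    size_t indent;

    assert(line != NULL);

    /* Only deletion lines count; skip the "--- file" header. */
    if (line[0] != '-' || strncmp(line, "---", 3) == 0) {
        return 0;
    }

    /* Count the spaces that follow the '-' marker. */
    indent = strspn(line + 1, " ");

    /* The deleted text must start with an Error pointer declaration. */
    if (strncmp(line + 1 + indent, "Error *", 7) != 0) {
        return 0;
    }

    /* Four spaces means outermost function scope; anything else is nested. */
    return indent != 4;
}
```

Feeding every line of the Coccinelle output through such a check would list the functions that still need manual review, such as gnat() above.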

But if we keep this rule, we should adjust its comment

        // Warn several Error * definitions.

because it sure suggests it also catches functions like the one I gave
above.

Hmm, yes.. We can write "Warn several Error * definitions in _one_
control flow (it's not so trivial to match _any_ case with several
definitions with coccinelle)" or something like this.

Ha, "trivial" reminds me of a story.  The math professor, after having
spent a good chunk of his lecture developing a proof on the blackboard,
turns to the audience to explain why this little part doesn't require
proof with the words familiar to any math student "and this is trivial."
Pause, puzzled look...  "Is it trivial?"  Pause, storms out of the
lecture hall.  A minute or three pass.  Professor comes back beaming,
"it is trivial!", and proceeds with the proof.

My point is: it might be trivial with Coccinelle once you know how to do
it.  We don't.

Suggest "(can't figure out how to match several definitions regardless
of control flow)".

Wrong too, because I can :) For example, by chaining two rules, catching the
positions of the definitions and checking that they are different. Or with some
cheating via a Python script. That's why I wrote "not trivial".

So, the most correct would be "(can't figure out how to simply match several
definitions regardless of control flow)".

Works for me.

But again, Coccinelle is for matching control flows, so it's probably impossible
to match such a thing.
[...]
OK, I'm almost OK with it; the only thing I doubt a bit is the following:

We want to keep rule1.local_err inheritance to keep connection with
local_err definition.

Yes.

Interesting: when we have both rule1.fn and rule1.local_err inherited,
do we inherit them separately (i.e. all possible combinations of fn
and local_err symbols from rule1), or do we inherit a pair, i.e. only
fn/local_err pairs found by rule1? If the latter is correct, then
with your script we lose this pair inheritance and go to all possible
combinations of fn and local_err from rule1, possibly adding some wrong
conversions (OK, you've checked that there are no such cases in the current
code tree).

The chaining "identifier rule1.FOO" is by name.  It's reliable only as
long as there is exactly one instance of the name.

We already discussed the case of the function name: if there are two
instances of foo(), and rule1 matches only one of them, then we
nevertheless apply the rules chained to rule1 to both.  Because that can
be wrong, you came up with the ___ trick, which chains reliably.

The same issue exists with the variable name: if there are two instances
of @local_err, and rule1 matches only one of them, then we nevertheless
apply the rules chained to rule1 to both.  Can also be wrong.

What are the conditions for "wrong"?

Because the ___ chaining is reliable, we know rule1 matched the
function, i.e. it has a parameter Error **errp, and it has an automatic
variable Error *local_err = NULL.

We're good as long as *all* identifiers @local_err in this function are
declared that way.  This seems quite likely.  It's not certain, though.

Since nested declarations of Error ** variables are rare, we can rely on
review to ensure we transform these functions correctly.

So, dropping inheritance in check-rules makes sense, as it may match
(and warn about) more interesting cases.

But for other rules, I'd prefer to be safer, and explicitly inherit all
actually inherited identifiers.

I still can't see what chaining by function name in addition to the ___
chaining buys us.

I'll check this thing soon. And resend today.

Checked.

Yes, it inherits pairs of fn and local_err, and it definitely makes sense. It
is more stable.

Consider the following example:

# cat a.c
int f1(Error **errp)
{
     Error *err1 = NULL;
     int err2 = 0;

     error_propagate(errp, err1);

     return err2;
}

int f2(Error **errp)
{
     Error *err2 = NULL;
     int err1 = 0;

     error_propagate(errp, err2);

     return err1;
}


My script works correctly and produces this change:
--- a.c
+++ /tmp/cocci-output-1753-10842a-a.c
@@ -1,19 +1,15 @@
  int f1(Error **errp)
  {
-    Error *err1 = NULL;
+    ERRP_AUTO_PROPAGATE();
      int err2 = 0;

-    error_propagate(errp, err1);
-
      return err2;
  }

  int f2(Error **errp)
  {
-    Error *err2 = NULL;
+    ERRP_AUTO_PROPAGATE();
      int err1 = 0;

-    error_propagate(errp, err2);
-
      return err1;
  }


But your script gets caught out:
--- a.c
+++ /tmp/cocci-output-1814-b9b681-a.c
@@ -1,19 +1,15 @@
  int f1(Error **errp)
  {
-    Error *err1 = NULL;
+    ERRP_AUTO_PROPAGATE();
      int err2 = 0;

-    error_propagate(errp, err1);
-
-    return err2;
+    return *errp;
  }

  int f2(Error **errp)
  {
-    Error *err2 = NULL;
+    ERRP_AUTO_PROPAGATE();
      int err1 = 0;

-    error_propagate(errp, err2);
-
-    return err1;
+    return *errp;
  }


- see, it touches err1, which is unrelated to Error in f2. Hmm,
interesting that it doesn't want to convert err1 declaration:)

- this is because relation between local_err and fn is lost.
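
The difference can be modelled in a few lines of C. This is a toy model of the two inheritance styles for the a.c example above, not a description of how Coccinelle is implemented; the struct, the table and both function names are hypothetical, and the "some rule1 match has this function" test merely stands in for the ____ tag.

```c
#include <stddef.h>
#include <string.h>

/* One (function, variable) match as rule1 would report it. */
struct rule1_match {
    const char *fn;
    const char *local_err;
};

/* The matches rule1 finds in the a.c example. */
static const struct rule1_match matches[] = {
    { "f1", "err1" },
    { "f2", "err2" },
};
static const size_t n_matches = sizeof(matches) / sizeof(matches[0]);

/* Pair inheritance: fn and variable must come from the same rule1 match. */
int accepted_by_pair(const char *fn, const char *var)
{
    for (size_t i = 0; i < n_matches; i++) {
        if (strcmp(matches[i].fn, fn) == 0 &&
            strcmp(matches[i].local_err, var) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Variable-only inheritance: the function must have been matched by rule1
 * at all, and the variable name may come from any rule1 match.
 */
int accepted_by_variable_only(const char *fn, const char *var)
{
    int fn_tagged = 0;
    int var_known = 0;

    for (size_t i = 0; i < n_matches; i++) {
        fn_tagged |= strcmp(matches[i].fn, fn) == 0;
        var_known |= strcmp(matches[i].local_err, var) == 0;
    }
    return fn_tagged && var_known;
}
```

In this model, accepted_by_pair("f2", "err1") is 0 while accepted_by_variable_only("f2", "err1") is 1, which corresponds to the unwanted "return *errp;" in f2.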

Let me try to think this through.

rule1 matches functions that propagate from a local variable @local_err
to parameter @errp.  It uses the ___ hack to reliably tag the function.
Later rules that should only apply to these functions can match ___.

These later rules each provide a part of the total error propagation
transformation.  They must transform exactly the @local_err and @errp
matched by rule1 in each function.

Your solution is to constrain the identifiers, i.e.

     identifier rule1.fn, rule1.local_err;

If rule1 matches only one function named foo(), and within that foo()
the local variable @local_err rule1 matches actually binds all
occurrences of the identifier @local_err, the constraint is reliable.

Else, the constraint may still accept occurrences of @local_err not bound
to the variable matched by rule1.

Example 1:

     int bar(Error **errp)
     {
         if (pred()) {
             Error *local_err = NULL;

             error_setg(&local_err, "zzzt");
             error_propagate(errp, local_err);
         } else {
             int local_err = 0;
             return local_err;
         }
         return 0;
     }

rule1 matches the first @local_err variable, and not the second one.  We
must transform occurrences of the first one, and not occurrences of the
second one.  We do transform all:

      int bar(Error **errp)
      {
     +    ERRP_AUTO_PROPAGATE();
          if (pred()) {
     -        Error *local_err = NULL;
     -
     -        error_setg(&local_err, "zzzt");
     -        error_propagate(errp, local_err);
     +        error_setg(errp, "zzzt");
          } else {
              int local_err = 0;
     -        return local_err;
     +        return *errp;
          }
          return 0;
      }


Aha, good example. And we do not even warn about it.

Example 2:

     int foo(Error **errp)
     {
         Error *local_err = NULL;

         error_setg(&local_err, "zzzt");
         error_propagate(errp, local_err);
         return 0;
     }

     int foo(Error **errp)
     {
         Error *err = NULL;
         int local_err = 0;

         error_setg(&local_err, "zzzt");
         error_propagate(errp, err);
         return local_err;
     }

rule1 matches @local_err in the first foo(), and @err in the second one.
We must transform @local_err in the first one, and @err in the second
one.  We do transform both in both:

      int foo(Error **errp)
      {
     -    Error *local_err = NULL;
     +    ERRP_AUTO_PROPAGATE();

     -    error_setg(&local_err, "zzzt");
     -    error_propagate(errp, local_err);
     +    error_setg(errp, "zzzt");
          return 0;
      }

      int foo(Error **errp)
      {
     -    Error *err = NULL;
     +    ERRP_AUTO_PROPAGATE();
          int local_err = 0;

     -    error_setg(&local_err, "zzzt");
     -    error_propagate(errp, err);
     -    return local_err;
     +    error_setg(errp, "zzzt");
     +    return *errp;
      }

Constraining only the variable identifier like I proposed is even less
reliable, as you demonstrated: then the issue in example 2 exists even
for differently named functions.

For a reliable solution, we could perhaps use the ___ hack again:
have rule1 rename the @local_err it actually matches.  But to be honest, my
appetite for another round of wrestling with Coccinelle isn't what it
used to be.

I think we can do without as long as we're well aware of the script's
limitations, and we're confident we can detect problematic cases.

Detecting transformation of multiple functions with the same name should
be easy.
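
One way to do that check, sketched under assumptions: the names of the transformed functions have already been collected into an array (how they are extracted from the Coccinelle output is left open here), and the helper names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort the names in place and return how many distinct names occur
 * more than once.  Each such name needs a manual look at the diff.
 */
size_t count_duplicate_names(const char **names, size_t n)
{
    size_t dups = 0;

    qsort(names, n, sizeof(names[0]), cmp_names);
    for (size_t i = 1; i < n; i++) {
        /* Count a name once, at its second occurrence in the sorted run. */
        if (strcmp(names[i], names[i - 1]) == 0 &&
            (i < 2 || strcmp(names[i], names[i - 2]) != 0)) {
            dups++;
        }
    }
    return dups;
}
```

A non-zero result means at least two transformed functions share a name, as with the two foo() in Example 2.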

Detecting occurrences of identifiers not bound by a certain variable
should be feasible: we find and review every transformed function that
doesn't declare the variable in its outermost scope.

Since "well aware" is going to erode with time, we may want to delete
the script when we're done converting.

So, understanding that there are no such cases in the whole tree, and even
if your patch works faster on the whole tree, I still don't want to
drop inheritance, because it's just the correct thing to do. Yes, we've
added the ____ helper. It helps to avoid some problems. Pair inheritance
helps to avoid other problems. I understand that there may still be
other, uncovered problems, but it is better to be as safe as possible. And
inheritance here is the native and correct thing to do, even with our ____
additional helper. What do you think?

I wouldn't call it correct.  It's still unreliable, but less so than
without the function name constraint.  That makes it less wrong.

Agree.


100% reliable would be nice, but not at any cost.  Something we're
reasonably confident to get right should be good enough.

To be confident, we need to understand the script's limitations, and how
to compensate for them.  I figure we do now.  You too?


I would not be surprised if we missed some more interesting cases :)
But we should proceed. What is our plan? Will you queue v10 for 5.1?

--
Best regards,
Vladimir

