[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal for deviations in static analyser findings


  • To: Luca Fancellu <Luca.Fancellu@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 13 Oct 2022 09:50:17 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hCdhuOo8PfwjO9SwCbYCLwTfZq8gL2IBG0UP5nKIdTo=; b=jSsNjFd3+Zn/EbeTWB18odtwvtc6U/9FNwJW7q50EWYz9jSj8EfZs8bs9BpyNKx0AMyETjtZ5oSg5Lz/cIUDiOZsH+rSNQtHF3CfQ0Jmrayqx/IQGiE2m6tKisc1g+nRBfQVwzftUq/59SuWcMwjHkV0X2P0LY0Xog7IRHGnC6ecgFboJ5jrjKxiNkuIfJN9f9L+quTVe8aSRwR30PKfWav+adYY6aoCG/OFdsVgbhQFdPvEDDiEqphZp1vHulaYndCLcDrIvvdf2LPnC9HNxt1crUljjq8WG58WU9GRkx+QXvtxMKNZYUAakiVSmbaTQ1SEOGEAa2yuEu4D2Ta2YQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OWGbKS6FTBqsdMQntUDdzWCgCYsfurK8YQ7iYhHQTOdL3Pr/UCqO1s3dInYuQwrNsVVoXZO4jrxpzMw2hVhZgGugfYU7egSQiCaq+NPxptwRbTyaf4PJg5Z3cBQbo2APnM70k3apyxILQJ/dz6MEXnmw2PB1N/7ntOB0gy2nM6Ol/FOGCmfV2WryvIJyn/P8AOfmWDAKIRmAefSJLDK5EusG91MiIh13de3N4lVUnat8TKXR3FnvXYGkecHyRlgBByNrYDbGVcpJ4DaB2MQDTIb2+s78zdCfwJrKgj1ShYwGbWK4UxZ66z/kfibgj7dzzWQdQ0QzmS3Khf/+wH42/g==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 13 Oct 2022 07:50:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12.10.2022 18:00, Luca Fancellu wrote:
> Documenting violations
> ======================

I expect this is mean to become an in-tree document at some point?

> Static analysers are used on the Xen codebase for both static analysis and 
> MISRA
> compliance.
> There might be the need to suppress some findings instead of fixing them and
> many tools permit the usage of in-code comments that suppress findings so that
> they are not shown in the final report.
> 
> Xen includes a tool capable of translating a specific comment used in its
> codebase to the right proprietary in-code comment understandable by the 
> selected
> analyser that suppress its finding.

Is that tool in the tree already?

> In the Xen codebase, these tags will be used to document and suppress 
> findings:
> 
> - SAF-X-safe: This tag means that the next line of code contains a finding, 
> but
>   the non compliance to the checker is analysed and demonstrated to be safe.
> - SAF-X-false-positive: This tag means that the next line of code contains a
>   finding, but the finding is a bug of the tool.

We did discuss this: False positives are often specific to just one of the
tools used. I think this wants mentioning here, including the implications
(iirc the plan was to keep the tag generic but make the table entry express
which tool it is that is affected).

> SAF stands for Static Analyser Finding, the X is a placeholder for a positive
> number that starts from zero, the number after SAF- shall be incremental and
> unique.

Nit: "positive number" and "starts from zero" don't really fit together.
I guess you also want to spell out the radix to be used as well as whether
leading zeros are expected (and if so, to pad to how many digits total).

> Entries in the database should never be removed, even if they are not used
> anymore in the code (if a patch is removing or modifying the faulty line).
> This is to make sure that numbers are not reused which could lead to conflicts
> with old branches or misleading justifications.

Can we add provisions for shrinking such entries to e.g. just their "id"
line? Or is the intention to be able to re-use such an entry if a matching
instance appears again later?

> The files where to store all the justifications are in xen/docs/misra/ and are
> named as safe.json and false-positive.json, they have JSON format.

And both use independent ID numbering?

> Here is an example to add a new justification::
> 
> |{
> |    "version": "1.0",
> |    "content": [
> |        {
> |            "id":"SAF-0-safe",

Is the SAF- prefix really needed here? And with "safe" and "false positive"
being in separate files, is the respective suffix here needed either? I
think the file should be as clutter free as possible, considering that it'll
grow to quite significant size from all I can tell right now.

> |            "analyser": {
> |                "cppcheck": "misra-c2012-20.7",
> |                "coverity": "misra_c_2012_rule_20_7_violation",
> |                "eclair": "MC3R1.R20.7"
> |            },
> |            "name": “R20.7 C macro parameters not used as expression",

What is the second string here supposed to express? The explanation below
doesn't really clarify that, and the example here doesn't seem to fit the
R20.7 subject. Maybe it would have helped if ...

> |            "text": "The macro parameters used in this […]"

... you hadn't left this abridged.

Furthermore both this and ...

> |        },
> |        {
> |            "id":”SAF-1-safe",
> |            "analyser": {
> |                "cppcheck": "unreadVariable",
> |                "coverity": "UNUSED_VALUE"
> |            },
> |            "name": “Variable set but not used",
> |            "text": “It is safe because […]"

... this reminds me that there might be multiple items on the related
subsequent source line, only one of which is actually affected. In the
SAF-0-safe case for example only some of the "_var" uses inside the macro
are not (directly) expressions, whereas "_name" is unaffected.

Taking this example I also dare to ask: Shouldn't tools be aware that
token concatenation necessarily means no use of parentheses? See also
below.

Also two formatting remarks: In the example above you intermix ", ”, and “.
Are these fine to use at will? And can we aim at being consistent with the
use of padding blanks (you have them after some : but not after others)?

> |        },
> |        {
> |            "id":”SAF-2-safe",
> |            "analyser": {},
> |            "name": "Sentinel",
> |            "text": ""
> |        }
> |    ]
> |}
> 
> To document a finding, just add another block {[...]} before the sentinel 
> block,
> using the id contained in the sentinel block and increment by one the number
> contained in the id of the sentinel block.
> 
> Here a brief explanation of the field inside an object of the "content" array:
> - id: it is a unique string that is used to refer to the finding, many finding
>   can be tagged with the same id, if the justification holds for any applied
>   case.
>   It tells the tool to substitute a Xen in-code comment having this structure:
>   /* SAF-0-safe [...] \*/
> - analyser: it is an object containing pair of key-value strings, the key is
>   the analyser, so it can be cppcheck, coverity or eclair. The value is the
>   proprietary id corresponding on the finding, for example when coverity is
>   used as analyser, the tool will translate the Xen in-code coment in this 
> way:
>   /* SAF-0-safe [...] \*/ -> /* coverity[coverity-id] \*/

In here, where would coverity-id come from? And how does the transformation
here match up with the value of the "coverity": field in the table?

>   if the object doesn't have a key-value, then the corresponding in-code
>   comment won't be translated.

Iirc at least Coverity ignores certain instances of what it might consider
violations (fall-through in switch() statements in particular) in case
_any_ comment is present. Therefore may I suggest that such comments be
deleted (really: replaced by a blank line, to maintain correct line
numbering) if there's no matching key-value pair?

> - name: a simple name for the finding
> - text: a proper justification to turn off the finding.

The distinction between the last two doesn't really become clear. Taking
your “Variable set but not used" example above: Such a "name" will fit
many cases, yet the justification for each might be different. Hence
the question is how unique "name" should be and - if it doesn't need to
be unique - what information it is intended to convey.

> Here an example of the usage of the in-code comment tags:
> 
> /* SAF-0-safe [eventual developer message that shall not exceeds line char 
> max count, don’t break the line!] */
> #define string_param(_name, _var) \
>     __setup_str __setup_str_##_var[] = _name; \
>     __kparam __setup_##_var = \
>         { .name = __setup_str_##_var, \
>           .type = OPT_STR, \
>           .len = sizeof(_var), \
>           .par.var = &_var }
> 
> In the example above, the tool finding for this macro is suppressed. When 
> there are multiple findings for
> the same line, multiple in-code comments needs to be inserted, every one on a 
> different line.

Since this is about parenthesization, would

#define string_param(_name, _var) \
    __setup_str (__setup_str_##_var)[] = _name; \
    __kparam (__setup_##_var) = \
        { .name = (__setup_str_##_var), \
          .type = OPT_STR, \
          .len = sizeof(_var), \
          .par.var = &(_var) }

satisfy the tools? And wouldn't we better not mask detection on this
construct anyway, since the last of the uses of "_var" indeed does
violate the rule (without parentheses added)?

As to the placement of the label: It was repeatedly said that analysis
occurs on pre-processed sources. Is placing a label ahead of a macro
definition therefore going to have any effect at all? Wouldn't the thing
rather need to look like this (assuming a pre-processing mode is used
which retains comments and respects line splits despite the use of line
continuations in the macro definition):

#define string_param(_name, _var) \
    /* SAF-0-safe ... */ \
    __setup_str __setup_str_##_var[] = _name; \
    /* SAF-0-safe ... */ \
    __kparam __setup_##_var = \
        /* SAF-0-safe ... */ \
        { .name = __setup_str_##_var, \
          .type = OPT_STR, \
          .len = sizeof(_var), \
          .par.var = &(_var) }

Finally: Except for its mere mentioning, I didn't spot any word towards
multiple uses (in the sources) of a single label. To avoid long
discussions about whether a new tag is needed vs an existing one to be
re-used, the rules for this need to be as clear and obvious as possible.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.