[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal for deviations in static analyser findings


  • To: Luca Fancellu <Luca.Fancellu@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 13 Oct 2022 13:34:54 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5YaMyZ/Sm76bOlCWIFmpzJUDd91GPdZ/WxGp7jwyDls=; b=MH2gXdSp3451argX+wgYp1RITa6X2+nbVvaAcOdN5ZLq55BuMqo7cuioZHPwagLwPP7xMiq1xAxcaJ9qrRNy/K8lF3UTvgsMzcMvEJfy6x3jJiRDlkbpNx1TfelSjSlCtqrE3uWAKpKwyZdNYwe9bv9OSvKCgVImH4wNM9q7zUKHh/ltvaKJTb0W1XmsSjqwvakRjUGAcqFO/Wvqga2BPEcxLht/BzoKv5RG0+A7jqs1O2ZsQ8fgfPKUk5N8wK5+SnOd/ABiZHIeHmVk6sDUgAPTNGNlK+B+KxCPN959FJjF+6/99rZRQVA/2E+qvYd3O0uuVwgG/qRp2V5JC/p6hg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=EN1NC+YQTGmjcnD9/NVcC2FlVvpIoxQViYkM6ib+8wKgRkj7i7trYH4NnnVBOl/czWJZ8+re+XkM20AocMmbKARgDDJBS+39UuahDrG7NsWudQcyYRA6E/FL993OmOnNt0+v7mjilwHVGyWDDTVuxJEK0/bkX36bXDoeKsB5BmZKeI3TXUkiEFF72FPyGAd/JT8mZxHY9RQnFJ8RnQq0WE6IJGtPI1q9oC4fBc3+Bb04wbV1RCuS2+arVQiZli1V+oZamPSzUFg0SfmyUpPCjN+8YeGGRFW9u/3RI8kS1jhfnfOPt/i7tOyzA9KjtNKpmGrpaJNCjoshXQwBjB6hxA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 13 Oct 2022 11:35:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.10.2022 12:11, Luca Fancellu wrote:
>> On 13 Oct 2022, at 08:50, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 12.10.2022 18:00, Luca Fancellu wrote:
>>> In the Xen codebase, these tags will be used to document and suppress 
>>> findings:
>>>
>>> - SAF-X-safe: This tag means that the next line of code contains a finding, 
>>> but
>>>  the non compliance to the checker is analysed and demonstrated to be safe.
>>> - SAF-X-false-positive: This tag means that the next line of code contains a
>>>  finding, but the finding is a bug of the tool.
>>
>> We did discuss this: False positives are often specific to just one of the
>> tools used. I think this wants mentioning here, including the implications
>> (iirc the plan was to keep the tag generic but make the table entry express
>> which tool it is that is affected).
> 
> Yes, in the database format below, a false positive entry will have its 
> key-value item
> in the “analyser” dictionary. Moreover, a false positive entry could be 
> written for
> example as the line below, to suppress a cppcheck false positive for MISRA 
> rule 20.7:
> 
> /* SAF-0-false-positive cppcheck false-positive for rule 20.7 */
> 
> Clearly this comment wants the proper entry in false-positive.json with the 
> correct internal ID
> for the rule 20.7 given by cppcheck, that is “misra-c2012-20.7”, and a proper 
> justification that
> explains why it’s a bug of the tool and not a non-compliance of the code.

All of your response doesn't really seem to fit my request of making more
explicit that in the common case false positives are expected to be limited
to just one tool. (In fact I was wondering whether, other than for the
"safe" table, there wouldn't better be per-tool false-positives tables. Not
the least because false positives are also liable to be version dependent,
which currently you have no way to express.)

>>> Entries in the database should never be removed, even if they are not used
>>> anymore in the code (if a patch is removing or modifying the faulty line).
>>> This is to make sure that numbers are not reused which could lead to 
>>> conflicts
>>> with old branches or misleading justifications.
>>
>> Can we add provisions for shrinking such entries to e.g. just their "id"
>> line? Or is the intention to be able to re-use such an entry if a matching
>> instance appears again later?
> 
> I prefer to don’t shrink it, the name itself is not very long, even using 
> many digits of the incremental
> number, it removes also the dependency on the file name.

Name length isn't relevant here, and I have no idea what dependency on a
file name you're thinking of. My question is a scalability one: Over time
the table will grow large. If all entries remain there in full forever,
table size may become unwieldy.

>> Taking this example I also dare to ask: Shouldn't tools be aware that
>> token concatenation necessarily means no use of parentheses? See also
>> below.
> 
> Yes the tool should be aware, in the example below, the tool is complaining 
> just
> for the lines 75 and 80, in that particular example I would have fixed the 
> finding
> instead of using a justification, but I’ve reported that example just to show 
> how
> the finding can be suppressed.
> 
> Here the link to eclair: 
> https://eclairit.com:3787/fs/var/lib/jenkins/jobs/XEN/configurations/axis-Target/ARM64/axis-agent/public/builds/541/archive/ECLAIR/out/PROJECT.ecd;/sources/xen/include/xen/param.h.html#L75_violation
> 
> The coding standard wants just to have this:
> 
> #define string_param(_name, _var) \
>  __setup_str __setup_str_##_var[] = (_name); \
> __kparam __setup_##_var = \
>      { .name = __setup_str_##_var, \
>        .type = OPT_STR, \
>        .len = sizeof(_var), \
>        .par.var = &(_var) }

May I suggest that you pick a real example then rather than one where we
actually want to fix the code? People may derive more than just the
intended information from any examples given here.

>>> Here a brief explanation of the field inside an object of the "content" 
>>> array:
>>> - id: it is a unique string that is used to refer to the finding, many 
>>> finding
>>>  can be tagged with the same id, if the justification holds for any applied
>>>  case.
>>>  It tells the tool to substitute a Xen in-code comment having this 
>>> structure:
>>>  /* SAF-0-safe [...] \*/
>>> - analyser: it is an object containing pair of key-value strings, the key is
>>>  the analyser, so it can be cppcheck, coverity or eclair. The value is the
>>>  proprietary id corresponding on the finding, for example when coverity is
>>>  used as analyser, the tool will translate the Xen in-code coment in this 
>>> way:
>>>  /* SAF-0-safe [...] \*/ -> /* coverity[coverity-id] \*/
>>
>> In here, where would coverity-id come from? And how does the transformation
>> here match up with the value of the "coverity": field in the table?
> 
> I can put an example of that, as you pointed out it could be difficult to get 
> where
> this proprietary tool ID comes from.
> 
> The proprietary ID (Coverity in this case) comes from the report it produces:
> 
> […]
> <file path>:<line number>:
>   1. proprietary_ID: […]
> […]
> 
> after we see the finding, we take that ID, we put it in the “analyser” 
> dictionary as:
> 
> […]
> "id":”SAF-2-safe",
> “analyser”: {
>       “coverity”: “proprietary_ID"
> },
> […]
> 
> So in the source code we will have:
> 
> /* SAF-2-safe [optional text] */
> C code affected line;
> 
> And when the analysis will be performed, the tool (coverity for example) will 
> run on this source code:
> 
> /* coverity[proprietary_ID] */
> C code affected line;
> 
> The tool will write a report and will suppress the finding with 
> “proprietary_ID” that comes in the C code
> line after the comment.

Let me add some background to my earlier comment:

If we wanted to add such IDs to the table, then I guess this would result in
a proliferation of entries. If my observations haven't misguided me,
Coverity might re-use the same ID for multiple similar new issues found in a
single run, but it would not re-use them across runs. Hence irrespective of
their similarity, multiple table entries would be needed just because of the
different Coverity IDs.

> After the analysis, the source code will return as the original (with the 
> SAF-* tag).

While you mention something similar also as step 3 in the original document
near the top, I'm afraid I don't understand what this "return as the original"
means. If you want to run the tool on an altered (comments modified) source
tree, what I'd expect you to do is clone the sources into a throw-away tree,
massage the comments, run the tool, and delete the massaged tree.

>>>  if the object doesn't have a key-value, then the corresponding in-code
>>>  comment won't be translated.
>>
>> Iirc at least Coverity ignores certain instances of what it might consider
>> violations (fall-through in switch() statements in particular) in case
>> _any_ comment is present. Therefore may I suggest that such comments be
>> deleted (really: replaced by a blank line, to maintain correct line
>> numbering) if there's no matching key-value pair?
> 
> Yes the line won’t be altered if there is no match. This to ensure the 
> correct line
> numbering is not affected.

"won't be altered" is the opposite of what I've been asking to consider:
Observing that comments _regardless_ of their contents may silence findings,
the suggestion is to remove comments (leaving a blank line) when there's no
entry for the targeted tool in the table entry.

>>> - name: a simple name for the finding
>>> - text: a proper justification to turn off the finding.
>>
>> The distinction between the last two doesn't really become clear. Taking
>> your “Variable set but not used" example above: Such a "name" will fit
>> many cases, yet the justification for each might be different. Hence
>> the question is how unique "name" should be and - if it doesn't need to
>> be unique - what information it is intended to convey.
> 
> Name is not enforced to be unique, it’s convenient to get a subject for the 
> particular justification.
> If the name of two justification is the same, but the justification is 
> different, then it won’t require
> much effort to write a different name to quickly recall and differentiate the 
> one from the other.
> 
> However if no one finds the “name” field necessary, we can remove it. It was 
> introduced having
> In mind that at some point a document will be created with all the 
> justifications together.
> 
> If others are against it just reply to that.

I can't say whether I'm pro or con as long as it's not really clear what
information is to be conveyed by both. If "name" is somewhat like the
subject of an email and identical names are deemed fine, then so be it.
Question though is whether having perhaps dozens (or hundreds) of
identically named entries is very useful.

>>> Here an example of the usage of the in-code comment tags:
>>>
>>> /* SAF-0-safe [eventual developer message that shall not exceeds line char 
>>> max count, don’t break the line!] */
>>> #define string_param(_name, _var) \
>>>    __setup_str __setup_str_##_var[] = _name; \
>>>    __kparam __setup_##_var = \
>>>        { .name = __setup_str_##_var, \
>>>          .type = OPT_STR, \
>>>          .len = sizeof(_var), \
>>>          .par.var = &_var }
>>>
>>> In the example above, the tool finding for this macro is suppressed. When 
>>> there are multiple findings for
>>> the same line, multiple in-code comments needs to be inserted, every one on 
>>> a different line.
>>
>> Since this is about parenthesization, would
>>
>> #define string_param(_name, _var) \
>>    __setup_str (__setup_str_##_var)[] = _name; \
>>    __kparam (__setup_##_var) = \
>>        { .name = (__setup_str_##_var), \
>>          .type = OPT_STR, \
>>          .len = sizeof(_var), \
>>          .par.var = &(_var) }
>>
>> satisfy the tools? And wouldn't we better not mask detection on this
>> construct anyway, since the last of the uses of "_var" indeed does
>> violate the rule (without parentheses added)?
> 
> Yes this was just an example of how to suppress a finding, in this particular
> case, I would have fixed the error instead of suppressing it.
> The changes to fix the finding is above.
> 
>>
>> As to the placement of the label: It was repeatedly said that analysis
>> occurs on pre-processed sources. Is placing a label ahead of a macro
>> definition therefore going to have any effect at all? Wouldn't the thing
>> rather need to look like this (assuming a pre-processing mode is used
>> which retains comments and respects line splits despite the use of line
>> continuations in the macro definition):
>>
>> #define string_param(_name, _var) \
>>    /* SAF-0-safe ... */ \
>>    __setup_str __setup_str_##_var[] = _name; \
>>    /* SAF-0-safe ... */ \
>>    __kparam __setup_##_var = \
>>        /* SAF-0-safe ... */ \
>>        { .name = __setup_str_##_var, \
>>          .type = OPT_STR, \
>>          .len = sizeof(_var), \
>>          .par.var = &(_var) }
> 
> From the experience on cppcheck and coverity, it is enough to place the
> In-code comment above the first line of the macro to suppress the finding.

Interesting. How is the comment then propagated to all expansions of the
macro (in the course of pre-processing)?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.