Xen project Mailing List

Re: Proposal for deviations in static analyser findings

From: Luca Fancellu <Luca.Fancellu@xxxxxxx>

Date: Thu, 13 Oct 2022 10:11:36 +0000

Accept-language: en-GB, en-US

Arc-authentication-results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com])

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none

Arc-message-signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4D/J+gHHfOy7VYgI8Ai3xwgX82b4PFxb4xxT+U7GhkA=; b=YxS/wAjt08JydcQykBe8pWmxa663k0D5K+JgT/LAE8BA2wOFIQbNGIvblmuOUrCE8W6qOCxkxSrQQWCBkPtrmioDfTFusJn3jNQmmfuexk1q4VWZd0AfVjk8s7zPFDwZaXS1MP0F23VU/MbnIBMG/YWIkq1x2/hPBmtBGv10C3Fm/o88iH4XvdjBWLvnIV6spnlaYf/D3nmaVb0F0mv0LGBDLLyhzLPUdU5nP83Qlsz7r8nzsUcVd7BaD3a8CGkMtdDiXAcsqvU/MnNtTtniMZpOQVsT21m82JZPwp+ah/CWpJz9sZTe3D52WwM5gOBzgeuaJu/PrEbvG+1fFbOb+g==

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4D/J+gHHfOy7VYgI8Ai3xwgX82b4PFxb4xxT+U7GhkA=; b=KEmkCHXAp/5aPcHoamuwxnzjjHOLIzgLymxJHwZr3yiB8VKaNPAGBjUqvegURcsabOqE02iD9gxA51Ru0X8lE6jhP11Dc9vxIO0ifpy/xoHhNKX0rjWv8v5ae/ShM5QU4WimELbZr42TzBPUxiHp82RV3IS/u8TCY/BkxQ6A3CEB0J7y2IfFttcnAnrrnLbaizZn/1rHY0iD4WHRDTRhDsUOmUKTlXlcluQkqq3wg+vmXxXZEAVsvXOjm9YI+QK/4t4Lr6GpTUMN9b0m+/EVOWcVujZErMbpcdcy5eJeZWOfGagFignZSycJp8sTOpwMGafg9p0aWt3Xqi+Jf4bH+Q==

Arc-seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=Kzrnl+C+dulUs5ODa3Igh4XHQk2r+Emxa1hr0jqQyLC8zigDKwNJujthudJ8cUb0CldVu8qwRVfLL/091zGnBQdO444Rurp0YFOcLkMtW6IVxNhsRzYSQZMh8tgoBPMAb77R8h6CJHfpTqIyDSvCHY8rYpQaNdbS7nbR0Pm1a1LQZMjMjCoP7jrsJa/L81zeJqAyvf1WtFZCIWcuY4jcclvX32c7yaXEjDJRqOmKPoOiL7Emyf1q2G3rKzjq8h08+pmryC48ozaSxtKck+SiusZbjAlTqCcbourLQ3sqH2j2p0PJZ4hhmbn/Fpz9a1c9l/n5uAZiIbZT4GJpJzeD/g==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=RVT9UMH8fkWiW+Ci8U/50cIhGhlANN/3Qc0YnKuIHRGfQKQ6t0CZHkAe+L9Ufta50L1qDDTDjM28/ogmzn+0XFr44OaMJeXDaUxR6Tma602z0PmkQqDG3Rbd5oVg83GVIaTfcIP+Sng1V/K+Uc/JJlDhY84CMFAwNwystWfr/VPaccXtgrMlq1VSemU9RMapNUkFuxVGlCLe09Wv4/bypS8VCmmjU2dHxEiyx7sYqDWVOkJ0W5EfX9Ey8LyfYP6oBVUQdNrUIgfeHvpY8oEaEMWIa+++emTR4fOfWVDxZC4/w59sqWoht1IQ7V1Tb5UCovarClXBHkg8/jaU4ukSbw==

Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;

Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 13 Oct 2022 10:12:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Nodisclaimer: true

Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;

Thread-index: AQHY3lPHTZwvWtdyCU+0K6Hw9nMyM64L9DeAgAAneoA=

Thread-topic: Proposal for deviations in static analyser findings

Hi Jan, > On 13 Oct 2022, at 08:50, Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 12.10.2022 18:00, Luca Fancellu wrote: >> Documenting violations >> ====================== > > I expect this is mean to become an in-tree document at some point? Yes, this will become part of the documentation at some point. > >> Static analysers are used on the Xen codebase for both static analysis and >> MISRA >> compliance. >> There might be the need to suppress some findings instead of fixing them and >> many tools permit the usage of in-code comments that suppress findings so >> that >> they are not shown in the final report. >> >> Xen includes a tool capable of translating a specific comment used in its >> codebase to the right proprietary in-code comment understandable by the >> selected >> analyser that suppress its finding. > > Is that tool in the tree already? Not yet, it will be part of the tree when I will send the implementation that will be shaped by this discussion. Here it was my fault to write sentence with present verbs instead of future. > >> In the Xen codebase, these tags will be used to document and suppress >> findings: >> >> - SAF-X-safe: This tag means that the next line of code contains a finding, >> but >> the non compliance to the checker is analysed and demonstrated to be safe. >> - SAF-X-false-positive: This tag means that the next line of code contains a >> finding, but the finding is a bug of the tool. > > We did discuss this: False positives are often specific to just one of the > tools used. I think this wants mentioning here, including the implications > (iirc the plan was to keep the tag generic but make the table entry express > which tool it is that is affected). Yes, in the database format below, a false positive entry will have its key-value item in the “analyser” dictionary. Moreover, a false positive entry could be written for example as the line below, to suppress a cppcheck false positive for MISRA rule 20.7: /* SAF-0-false-positive cppcheck false-positive for rule 20.7 */ Clearly this comment wants the proper entry in false-positive.json with the correct internal ID for the rule 20.7 given by cppcheck, that is “misra-c2012-20.7”, and a proper justification that explains why it’s a bug of the tool and not a non-compliance of the code. > >> SAF stands for Static Analyser Finding, the X is a placeholder for a positive >> number that starts from zero, the number after SAF- shall be incremental and >> unique. > > Nit: "positive number" and "starts from zero" don't really fit together. > I guess you also want to spell out the radix to be used as well as whether > leading zeros are expected (and if so, to pad to how many digits total). Since we did put a sentinel starting from zero, I guess we can just say “positive number”. A good point is the leading zeros, I don’t expect leading zeros and I will write it down clearly. > >> Entries in the database should never be removed, even if they are not used >> anymore in the code (if a patch is removing or modifying the faulty line). >> This is to make sure that numbers are not reused which could lead to >> conflicts >> with old branches or misleading justifications. > > Can we add provisions for shrinking such entries to e.g. just their "id" > line? Or is the intention to be able to re-use such an entry if a matching > instance appears again later? I prefer to don’t shrink it, the name itself is not very long, even using many digits of the incremental number, it removes also the dependency on the file name. > >> The files where to store all the justifications are in xen/docs/misra/ and >> are >> named as safe.json and false-positive.json, they have JSON format. > > And both use independent ID numbering? Yes, I will add this. > >> Here is an example to add a new justification:: >> >> |{ >> | "version": "1.0", >> | "content": [ >> | { >> | "id":"SAF-0-safe", > > Is the SAF- prefix really needed here? And with "safe" and "false positive" > being in separate files, is the respective suffix here needed either? I > think the file should be as clutter free as possible, considering that it'll > grow to quite significant size from all I can tell right now. > >> | "analyser": { >> | "cppcheck": "misra-c2012-20.7", >> | "coverity": "misra_c_2012_rule_20_7_violation", >> | "eclair": "MC3R1.R20.7" >> | }, >> | "name": “R20.7 C macro parameters not used as expression", > > What is the second string here supposed to express? The explanation below > doesn't really clarify that, and the example here doesn't seem to fit the > R20.7 subject. Maybe it would have helped if ... > >> | "text": "The macro parameters used in this […]" > > ... you hadn't left this abridged. The justification text could be long, so the name is supposed to be just the subject of the justification, a very short introduction that could be recalled, the text instead is the analysis that leads to the decision to suppress the finding. Those informations will be taken, in the future, to produce a report of every deviation in the code. > > Furthermore both this and ... > >> | }, >> | { >> | "id":”SAF-1-safe", >> | "analyser": { >> | "cppcheck": "unreadVariable", >> | "coverity": "UNUSED_VALUE" >> | }, >> | "name": “Variable set but not used", >> | "text": “It is safe because […]" > > ... this reminds me that there might be multiple items on the related > subsequent source line, only one of which is actually affected. In the > SAF-0-safe case for example only some of the "_var" uses inside the macro > are not (directly) expressions, whereas "_name" is unaffected. > > Taking this example I also dare to ask: Shouldn't tools be aware that > token concatenation necessarily means no use of parentheses? See also > below. Yes the tool should be aware, in the example below, the tool is complaining just for the lines 75 and 80, in that particular example I would have fixed the finding instead of using a justification, but I’ve reported that example just to show how the finding can be suppressed. Here the link to eclair: https://eclairit.com:3787/fs/var/lib/jenkins/jobs/XEN/configurations/axis-Target/ARM64/axis-agent/public/builds/541/archive/ECLAIR/out/PROJECT.ecd;/sources/xen/include/xen/param.h.html#L75_violation The coding standard wants just to have this: #define string_param(_name, _var) \ __setup_str __setup_str_##_var[] = (_name); \ __kparam __setup_##_var = \ { .name = __setup_str_##_var, \ .type = OPT_STR, \ .len = sizeof(_var), \ .par.var = &(_var) } > > Also two formatting remarks: In the example above you intermix ", ”, and “. > Are these fine to use at will? And can we aim at being consistent with the > use of padding blanks (you have them after some : but not after others)? For consistency it is expected to use the same format, here was my mistake, I tend to agree to leave a space after the the colon character. > >> | }, >> | { >> | "id":”SAF-2-safe", >> | "analyser": {}, >> | "name": "Sentinel", >> | "text": "" >> | } >> | ] >> |} >> >> To document a finding, just add another block {[...]} before the sentinel >> block, >> using the id contained in the sentinel block and increment by one the number >> contained in the id of the sentinel block. >> >> Here a brief explanation of the field inside an object of the "content" >> array: >> - id: it is a unique string that is used to refer to the finding, many >> finding >> can be tagged with the same id, if the justification holds for any applied >> case. >> It tells the tool to substitute a Xen in-code comment having this structure: >> /* SAF-0-safe [...] \*/ >> - analyser: it is an object containing pair of key-value strings, the key is >> the analyser, so it can be cppcheck, coverity or eclair. The value is the >> proprietary id corresponding on the finding, for example when coverity is >> used as analyser, the tool will translate the Xen in-code coment in this >> way: >> /* SAF-0-safe [...] \*/ -> /* coverity[coverity-id] \*/ > > In here, where would coverity-id come from? And how does the transformation > here match up with the value of the "coverity": field in the table? I can put an example of that, as you pointed out it could be difficult to get where this proprietary tool ID comes from. The proprietary ID (Coverity in this case) comes from the report it produces: […] <file path>:<line number>: 1. proprietary_ID: […] […] after we see the finding, we take that ID, we put it in the “analyser” dictionary as: […] "id":”SAF-2-safe", “analyser”: { “coverity”: “proprietary_ID" }, […] So in the source code we will have: /* SAF-2-safe [optional text] */ C code affected line; And when the analysis will be performed, the tool (coverity for example) will run on this source code: /* coverity[proprietary_ID] */ C code affected line; The tool will write a report and will suppress the finding with “proprietary_ID” that comes in the C code line after the comment. After the analysis, the source code will return as the original (with the SAF-* tag). > >> if the object doesn't have a key-value, then the corresponding in-code >> comment won't be translated. > > Iirc at least Coverity ignores certain instances of what it might consider > violations (fall-through in switch() statements in particular) in case > _any_ comment is present. Therefore may I suggest that such comments be > deleted (really: replaced by a blank line, to maintain correct line > numbering) if there's no matching key-value pair? Yes the line won’t be altered if there is no match. This to ensure the correct line numbering is not affected. > >> - name: a simple name for the finding >> - text: a proper justification to turn off the finding. > > The distinction between the last two doesn't really become clear. Taking > your “Variable set but not used" example above: Such a "name" will fit > many cases, yet the justification for each might be different. Hence > the question is how unique "name" should be and - if it doesn't need to > be unique - what information it is intended to convey. Name is not enforced to be unique, it’s convenient to get a subject for the particular justification. If the name of two justification is the same, but the justification is different, then it won’t require much effort to write a different name to quickly recall and differentiate the one from the other. However if no one finds the “name” field necessary, we can remove it. It was introduced having In mind that at some point a document will be created with all the justifications together. If others are against it just reply to that. > >> Here an example of the usage of the in-code comment tags: >> >> /* SAF-0-safe [eventual developer message that shall not exceeds line char >> max count, don’t break the line!] */ >> #define string_param(_name, _var) \ >> __setup_str __setup_str_##_var[] = _name; \ >> __kparam __setup_##_var = \ >> { .name = __setup_str_##_var, \ >> .type = OPT_STR, \ >> .len = sizeof(_var), \ >> .par.var = &_var } >> >> In the example above, the tool finding for this macro is suppressed. When >> there are multiple findings for >> the same line, multiple in-code comments needs to be inserted, every one on >> a different line. > > Since this is about parenthesization, would > > #define string_param(_name, _var) \ > __setup_str (__setup_str_##_var)[] = _name; \ > __kparam (__setup_##_var) = \ > { .name = (__setup_str_##_var), \ > .type = OPT_STR, \ > .len = sizeof(_var), \ > .par.var = &(_var) } > > satisfy the tools? And wouldn't we better not mask detection on this > construct anyway, since the last of the uses of "_var" indeed does > violate the rule (without parentheses added)? Yes this was just an example of how to suppress a finding, in this particular case, I would have fixed the error instead of suppressing it. The changes to fix the finding is above. > > As to the placement of the label: It was repeatedly said that analysis > occurs on pre-processed sources. Is placing a label ahead of a macro > definition therefore going to have any effect at all? Wouldn't the thing > rather need to look like this (assuming a pre-processing mode is used > which retains comments and respects line splits despite the use of line > continuations in the macro definition): > > #define string_param(_name, _var) \ > /* SAF-0-safe ... */ \ > __setup_str __setup_str_##_var[] = _name; \ > /* SAF-0-safe ... */ \ > __kparam __setup_##_var = \ > /* SAF-0-safe ... */ \ > { .name = __setup_str_##_var, \ > .type = OPT_STR, \ > .len = sizeof(_var), \ > .par.var = &(_var) } From the experience on cppcheck and coverity, it is enough to place the In-code comment above the first line of the macro to suppress the finding. > > Finally: Except for its mere mentioning, I didn't spot any word towards > multiple uses (in the sources) of a single label. To avoid long > discussions about whether a new tag is needed vs an existing one to be > re-used, the rules for this need to be as clear and obvious as possible. I will write it down properly that if a justification holds for multiple finding of the same non-compliance to the coding standard, the tag can and must be reused. > > Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.