[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] xen/misra: diff-report.py: add report patching feature



On Thu, 4 May 2023, Luca Fancellu wrote:
> Add a feature to the diff-report.py script that improves the comparison
> between two analysis report, one from a baseline codebase and the other
> from the changes applied to the baseline.
> 
> The comparison between reports of different codebase is an issue because
> entries in the baseline could have been moved in position due to addition
> or deletion of unrelated lines or can disappear because of deletion of
> the interested line, making the comparison between two revisions of the
> code harder.
> 
> Having a baseline report, a report of the codebase with the changes
> called "new report" and a git diff format file that describes the
> changes happened to the code from the baseline, this feature can
> understand which entries from the baseline report are deleted or shifted
> in position due to changes to unrelated lines and can modify them as
> they will appear in the "new report".
> 
> Having the "patched baseline" and the "new report", now it's simple
> to make the diff between them and print only the entry that are new.
> 
> Signed-off-by: Luca Fancellu <luca.fancellu@xxxxxxx>

This is an amazing work!! Thanks Luca!

I am having issues trying the new patch feature. After applying this
patch I get:

sstabellini@ubuntu-linux-20-04-desktop:/local/repos/xen-upstream/xen$ 
./scripts/diff-report.py
Traceback (most recent call last):
  File "./scripts/diff-report.py", line 5, in <module>
    from xen_analysis.diff_tool.debug import Debug
  File "/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/debug.py", 
line 4, in <module>
    from .report import Report
  File 
"/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/report.py", line 
4, in <module>
    from .unified_format_parser import UnifiedFormatParser, ChangeSet
  File 
"/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py",
 line 56, in <module>
    class UnifiedFormatParser:
  File 
"/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py",
 line 57, in UnifiedFormatParser
    def __init__(self, args: str | list) -> None:
TypeError: unsupported operand type(s) for |: 'type' and 'type'

Also got a similar error elsewhere:

sstabellini@ubuntu-linux-20-04-desktop:/local/repos/xen-upstream/xen$ 
./scripts/diff-report.py --patch ~/p/1 -b /tmp/1 -r /tmp/1
Traceback (most recent call last):
  File "./scripts/diff-report.py", line 127, in <module>
    main(sys.argv[1:])
  File "./scripts/diff-report.py", line 102, in main
    diffs = UnifiedFormatParser(diff_source)
  File 
"/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py",
 line 79, in __init__
    self.__parse()
  File 
"/local/repos/xen-upstream/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py",
 line 94, in __parse
    def parse_diff_header(line: str) -> ChangeSet | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

My Python is 2.7.18


Am I understanding correctly that one should run the scan for the
baseline (saving the result somewhere), then apply the patch, run the
scan again. Finally, one should call diff-report.py passing -b
baseline-report -r new-report --patch the-patch-applied?



> ---
>  xen/scripts/diff-report.py                    |  55 ++++-
>  xen/scripts/xen_analysis/diff_tool/debug.py   |  19 ++
>  xen/scripts/xen_analysis/diff_tool/report.py  |  84 ++++++++
>  .../diff_tool/unified_format_parser.py        | 202 ++++++++++++++++++
>  4 files changed, 358 insertions(+), 2 deletions(-)
>  create mode 100644 
> xen/scripts/xen_analysis/diff_tool/unified_format_parser.py
> 
> diff --git a/xen/scripts/diff-report.py b/xen/scripts/diff-report.py
> index 4913fb43a8f9..17f707f5d34e 100755
> --- a/xen/scripts/diff-report.py
> +++ b/xen/scripts/diff-report.py
> @@ -5,6 +5,10 @@ from argparse import ArgumentParser
>  from xen_analysis.diff_tool.debug import Debug
>  from xen_analysis.diff_tool.report import ReportError
>  from xen_analysis.diff_tool.cppcheck_report import CppcheckReport
> +from xen_analysis.diff_tool.unified_format_parser import \
> +    (UnifiedFormatParser, UnifiedFormatParseError)
> +from xen_analysis.utils import invoke_command
> +from xen_analysis.settings import repo_dir
>  
>  
>  def log_info(text, end='\n'):
> @@ -32,9 +36,32 @@ def main(argv):
>                               "against the baseline.")
>      parser.add_argument("-v", "--verbose", action='store_true',
>                          help="Print more informations during the run.")
> +    parser.add_argument("--patch", type=str,
> +                        help="The patch file containing the changes to the "
> +                             "code, from the baseline analysis result to the 
> "
> +                             "'check report' analysis result.\n"
> +                             "Do not use with --baseline-rev/--report-rev")
> +    parser.add_argument("--baseline-rev", type=str,
> +                        help="Revision or SHA of the codebase analysed to "
> +                             "create the baseline report.\n"
> +                             "Use together with --report-rev")
> +    parser.add_argument("--report-rev", type=str,
> +                        help="Revision or SHA of the codebase analysed to "
> +                             "create the 'check report'.\n"
> +                             "Use together with --baseline-rev")
>  
>      args = parser.parse_args()
>  
> +    if args.patch and (args.baseline_rev or args.report_rev):
> +        print("ERROR: '--patch' argument can't be used with '--baseline-rev'"
> +              " or '--report-rev'.")
> +        sys.exit(1)
> +
> +    if bool(args.baseline_rev) != bool(args.report_rev):
> +        print("ERROR: '--baseline-rev' must be used together with "
> +              "'--report-rev'.")
> +        sys.exit(1)
> +
>      if args.out == "stdout":
>          file_out = sys.stdout
>      else:
> @@ -59,11 +86,35 @@ def main(argv):
>          new_rep.parse()
>          debug.debug_print_parsed_report(new_rep)
>          log_info(" [OK]")
> -    except ReportError as e:
> +        diff_source = None
> +        if args.patch:
> +            diff_source = os.path.realpath(args.patch)
> +        elif args.baseline_rev:
> +            git_diff = invoke_command(
> +                "git diff --git-dir={} -C -C {}..{}".format(repo_dir,
> +                                                            
> args.baseline_rev,
> +                                                            args.report_rev),
> +                True, "Error occured invoking:\n{}\n\n{}"
> +            )
> +            diff_source = git_diff.splitlines(keepends=True)
> +        if diff_source:
> +            log_info("Parsing changes...", "")
> +            diffs = UnifiedFormatParser(diff_source)
> +            debug.debug_print_parsed_diff(diffs)
> +            log_info(" [OK]")
> +    except (ReportError, UnifiedFormatParseError) as e:
>          print("ERROR: {}".format(e))
>          sys.exit(1)
>  
> -    output = new_rep - baseline
> +    if args.patch or args.baseline_rev:
> +        log_info("Patching baseline...", "")
> +        baseline_patched = baseline.patch(diffs)
> +        debug.debug_print_patched_report(baseline_patched)
> +        log_info(" [OK]")
> +        output = new_rep - baseline_patched
> +    else:
> +        output = new_rep - baseline
> +
>      print(output, end="", file=file_out)
>  
>      if len(output) > 0:
> diff --git a/xen/scripts/xen_analysis/diff_tool/debug.py 
> b/xen/scripts/xen_analysis/diff_tool/debug.py
> index d46df3300d21..c314edbc8e38 100644
> --- a/xen/scripts/xen_analysis/diff_tool/debug.py
> +++ b/xen/scripts/xen_analysis/diff_tool/debug.py
> @@ -2,6 +2,7 @@
>  
>  import os
>  from .report import Report
> +from .unified_format_parser import UnifiedFormatParser
>  
>  
>  class Debug:
> @@ -34,3 +35,21 @@ class Debug:
>          if not self.args.debug:
>              return
>          self.__debug_print_report(report, ".parsed")
> +
> +    def debug_print_patched_report(self, report: Report) -> None:
> +        if not self.args.debug:
> +            return
> +        # The patched report contains already .patched in its name
> +        self.__debug_print_report(report, "")
> +
> +    def debug_print_parsed_diff(self, diff: UnifiedFormatParser) -> None:
> +        if not self.args.debug:
> +            return
> +        diff_filename = diff.get_diff_path()
> +        out_pathname = self.__get_debug_out_filename(diff_filename, 
> ".parsed")
> +        try:
> +            with open(out_pathname, "wt") as outfile:
> +                for change_obj in diff.get_change_sets().values():
> +                    print(change_obj, end="", file=outfile)
> +        except OSError as e:
> +            print("ERROR: Issue opening file {}: {}".format(out_pathname, e))
> diff --git a/xen/scripts/xen_analysis/diff_tool/report.py 
> b/xen/scripts/xen_analysis/diff_tool/report.py
> index d958d1816eb4..312d59682329 100644
> --- a/xen/scripts/xen_analysis/diff_tool/report.py
> +++ b/xen/scripts/xen_analysis/diff_tool/report.py
> @@ -1,6 +1,7 @@
>  #!/usr/bin/env python3
>  
>  import os
> +from .unified_format_parser import UnifiedFormatParser, ChangeSet
>  
>  
>  class ReportError(Exception):
> @@ -65,6 +66,89 @@ class Report:
>              self.__entries[entry_path] = [entry]
>          self.__last_line_order += 1
>  
> +    def remove_entries(self, entry_file_path: str) -> None:
> +        del self.__entries[entry_file_path]
> +
> +    def remove_entry(self, entry_path: str, line_id: int) -> None:
> +        if entry_path in self.__entries.keys():
> +            len_entry_path = len(self.__entries[entry_path])
> +            if len_entry_path == 1:
> +                del self.__entries[entry_path]
> +            else:
> +                if line_id in self.__entries[entry_path]:
> +                    self.__entries[entry_path].remove(line_id)
> +
> +    def patch(self, diff_obj: UnifiedFormatParser) -> 'Report':
> +        filename, file_extension = os.path.splitext(self.__path)
> +        patched_report = self.__class__(filename + ".patched" + 
> file_extension)
> +        remove_files = []
> +        rename_files = []
> +        remove_entry = []
> +        ChangeMode = ChangeSet.ChangeMode
> +
> +        # Copy entries from this report to the report we are going to patch
> +        for entries in self.__entries.values():
> +            for entry in entries:
> +                patched_report.add_entry(entry.file_path, entry.line_number,
> +                                         entry.text)
> +
> +        # Patch the output report
> +        patched_rep_entries = patched_report.get_report_entries()
> +        for file_diff, change_obj in diff_obj.get_change_sets().items():
> +            if change_obj.is_change_mode(ChangeMode.COPY):
> +                # Copy the original entry pointed by change_obj.orig_file 
> into
> +                # a new key in the patched report named change_obj.dst_file,
> +                # that here is file_diff variable content, because this
> +                # change_obj is pushed into the change_sets with the
> +                # change_obj.dst_file key
> +                if change_obj.orig_file in self.__entries.keys():
> +                    for entry in self.__entries[change_obj.orig_file]:
> +                        patched_report.add_entry(file_diff,
> +                                                 entry.line_number,
> +                                                 entry.text)
> +
> +            if file_diff in patched_rep_entries.keys():
> +                if change_obj.is_change_mode(ChangeMode.DELETE):
> +                    # No need to check changes here, just remember to delete
> +                    # the file from the report
> +                    remove_files.append(file_diff)
> +                    continue
> +                elif change_obj.is_change_mode(ChangeMode.RENAME):
> +                    # Remember to rename the file entry on this report
> +                    rename_files.append(change_obj)
> +
> +                for line_num, change_type in change_obj.get_change_set():
> +                    len_rep = len(patched_rep_entries[file_diff])
> +                    for i in range(len_rep):
> +                        rep_item = patched_rep_entries[file_diff][i]
> +                        if change_type == ChangeSet.ChangeType.REMOVE:
> +                            if rep_item.line_number == line_num:
> +                                # This line is removed with this changes,
> +                                # append to the list of entries to be removed
> +                                remove_entry.append(rep_item)
> +                            elif rep_item.line_number > line_num:
> +                                rep_item.line_number -= 1
> +                        elif change_type == ChangeSet.ChangeType.ADD:
> +                            if rep_item.line_number >= line_num:
> +                                rep_item.line_number += 1
> +                    # Remove deleted entries from the list
> +                    if len(remove_entry) > 0:
> +                        for entry in remove_entry:
> +                            patched_report.remove_entry(entry.file_path,
> +                                                        entry.line_id)
> +                        remove_entry.clear()
> +
> +        if len(remove_files) > 0:
> +            for file_name in remove_files:
> +                patched_report.remove_entries(file_name)
> +
> +        if len(rename_files) > 0:
> +            for change_obj in rename_files:
> +                patched_rep_entries[change_obj.dst_file] = \
> +                    patched_rep_entries.pop(change_obj.orig_file)
> +
> +        return patched_report
> +
>      def to_list(self) -> list:
>          report_list = []
>          for _, entries in self.__entries.items():
> diff --git a/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py 
> b/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py
> new file mode 100644
> index 000000000000..e34cc8ac063f
> --- /dev/null
> +++ b/xen/scripts/xen_analysis/diff_tool/unified_format_parser.py
> @@ -0,0 +1,202 @@
> +#!/usr/bin/env python3
> +
> +import re
> +from enum import Enum
> +from typing import Tuple
> +
> +
> +class UnifiedFormatParseError(Exception):
> +    pass
> +
> +
> +class ParserState(Enum):
> +    FIND_DIFF_HEADER = 0
> +    REGISTER_CHANGES = 1
> +    FIND_HUNK_OR_DIFF_HEADER = 2
> +
> +
> +class ChangeSet:
> +    class ChangeType(Enum):
> +        REMOVE = 0
> +        ADD = 1
> +
> +    class ChangeMode(Enum):
> +        NONE = 0
> +        CHANGE = 1
> +        RENAME = 2
> +        DELETE = 3
> +        COPY = 4
> +
> +    def __init__(self, a_file: str, b_file: str) -> None:
> +        self.orig_file = a_file
> +        self.dst_file = b_file
> +        self.change_mode = ChangeSet.ChangeMode.NONE
> +        self.__changes = []
> +
> +    def __str__(self) -> str:
> +        str_out = "{}: {} -> {}:\n{}\n".format(
> +            str(self.change_mode), self.orig_file, self.dst_file,
> +            str(self.__changes)
> +        )
> +        return str_out
> +
> +    def set_change_mode(self, change_mode: ChangeMode) -> None:
> +        self.change_mode = change_mode
> +
> +    def is_change_mode(self, change_mode: ChangeMode) -> bool:
> +        return self.change_mode == change_mode
> +
> +    def add_change(self, line_number: int, change_type: ChangeType) -> None:
> +        self.__changes.append((line_number, change_type))
> +
> +    def get_change_set(self) -> dict:
> +        return self.__changes
> +
> +
> +class UnifiedFormatParser:
> +    def __init__(self, args: str | list) -> None:
> +        if isinstance(args, str):
> +            self.__diff_file = args
> +            try:
> +                with open(self.__diff_file, "rt") as infile:
> +                    self.__diff_lines = infile.readlines()
> +            except OSError as e:
> +                raise UnifiedFormatParseError(
> +                    "Issue with reading file {}: {}"
> +                    .format(self.__diff_file, e)
> +                )
> +        elif isinstance(args, list):
> +            self.__diff_file = "git-diff-local.txt"
> +            self.__diff_lines = args
> +        else:
> +            raise UnifiedFormatParseError(
> +                "UnifiedFormatParser constructor called with wrong 
> arguments")
> +
> +        self.__git_diff_header = re.compile(r'^diff --git a/(.*) b/(.*)$')
> +        self.__git_hunk_header = \
> +            re.compile(r'^@@ -\d+,(\d+) \+(\d+),(\d+) @@.*$')
> +        self.__diff_set = {}
> +        self.__parse()
> +
> +    def get_diff_path(self) -> str:
> +        return self.__diff_file
> +
> +    def add_change_set(self, change_set: ChangeSet) -> None:
> +        if not change_set.is_change_mode(ChangeSet.ChangeMode.NONE):
> +            if change_set.is_change_mode(ChangeSet.ChangeMode.COPY):
> +                # Add copy change mode items using the dst_file key, because
> +                # there might be other changes for the orig_file in this diff
> +                self.__diff_set[change_set.dst_file] = change_set
> +            else:
> +                self.__diff_set[change_set.orig_file] = change_set
> +
> +    def __parse(self) -> None:
> +        def parse_diff_header(line: str) -> ChangeSet | None:
> +            change_item = None
> +            diff_head = self.__git_diff_header.match(line)
> +            if diff_head and diff_head.group(1) and diff_head.group(2):
> +                change_item = ChangeSet(diff_head.group(1), 
> diff_head.group(2))
> +
> +            return change_item
> +
> +        def parse_hunk_header(line: str) -> Tuple[int, int, int]:
> +            file_linenum = -1
> +            hunk_a_linemax = -1
> +            hunk_b_linemax = -1
> +            hunk_head = self.__git_hunk_header.match(line)
> +            if hunk_head and hunk_head.group(1) and hunk_head.group(2) \
> +               and hunk_head.group(3):
> +                file_linenum = int(hunk_head.group(2))
> +                hunk_a_linemax = int(hunk_head.group(1))
> +                hunk_b_linemax = int(hunk_head.group(3))
> +
> +            return (file_linenum, hunk_a_linemax, hunk_b_linemax)
> +
> +        file_linenum = 0
> +        hunk_a_linemax = 0
> +        hunk_b_linemax = 0
> +        diff_elem = None
> +        parse_state = ParserState.FIND_DIFF_HEADER
> +        ChangeMode = ChangeSet.ChangeMode
> +        ChangeType = ChangeSet.ChangeType
> +
> +        for line in self.__diff_lines:
> +            if parse_state == ParserState.FIND_DIFF_HEADER:
> +                diff_elem = parse_diff_header(line)
> +                if diff_elem:
> +                    # Found the diff header, go to the next stage
> +                    parse_state = ParserState.FIND_HUNK_OR_DIFF_HEADER
> +            elif parse_state == ParserState.FIND_HUNK_OR_DIFF_HEADER:
> +                # Here only these change modalities will be registered:
> +                # deleted file mode <mode>
> +                # rename from <path>
> +                # rename to <path>
> +                # copy from <path>
> +                # copy to <path>
> +                #
> +                # These will be ignored:
> +                # old mode <mode>
> +                # new mode <mode>
> +                # new file mode <mode>
> +                #
> +                # Also these info will be ignored
> +                # similarity index <number>
> +                # dissimilarity index <number>
> +                # index <hash>..<hash> <mode>
> +                if line.startswith("deleted file"):
> +                    # If the file is deleted, register it but don't go 
> through
> +                    # the changes that will be only a set of lines removed
> +                    diff_elem.set_change_mode(ChangeMode.DELETE)
> +                    parse_state = ParserState.FIND_DIFF_HEADER
> +                elif line.startswith("new file"):
> +                    # If the file is new, skip it, as it doesn't give any
> +                    # useful information on the report translation
> +                    parse_state = ParserState.FIND_DIFF_HEADER
> +                elif line.startswith("rename to"):
> +                    # Renaming operation can be a pure renaming or a rename
> +                    # and a set of change, so keep looking for the hunk
> +                    # header
> +                    diff_elem.set_change_mode(ChangeMode.RENAME)
> +                elif line.startswith("copy to"):
> +                    # This is a copy operation, mark it
> +                    diff_elem.set_change_mode(ChangeMode.COPY)
> +                else:
> +                    # Look for the hunk header
> +                    (file_linenum, hunk_a_linemax, hunk_b_linemax) = \
> +                        parse_hunk_header(line)
> +                    if file_linenum >= 0:
> +                        if diff_elem.is_change_mode(ChangeMode.NONE):
> +                            # The file has only changes
> +                            diff_elem.set_change_mode(ChangeMode.CHANGE)
> +                        parse_state = ParserState.REGISTER_CHANGES
> +                    else:
> +                        # ... or there could be a diff header
> +                        new_diff_elem = parse_diff_header(line)
> +                        if new_diff_elem:
> +                            # Found a diff header, register the last change
> +                            # item
> +                            self.add_change_set(diff_elem)
> +                            diff_elem = new_diff_elem
> +            elif parse_state == ParserState.REGISTER_CHANGES:
> +                if (hunk_b_linemax > 0) and line.startswith("+"):
> +                    diff_elem.add_change(file_linenum, ChangeType.ADD)
> +                    hunk_b_linemax -= 1
> +                elif (hunk_a_linemax > 0) and line.startswith("-"):
> +                    diff_elem.add_change(file_linenum, ChangeType.REMOVE)
> +                    hunk_a_linemax -= 1
> +                    file_linenum -= 1
> +                elif ((hunk_a_linemax + hunk_b_linemax) > 0) and \
> +                        line.startswith(" "):
> +                    hunk_a_linemax -= 1 if (hunk_a_linemax > 0) else 0
> +                    hunk_b_linemax -= 1 if (hunk_b_linemax > 0) else 0
> +
> +                if (hunk_a_linemax + hunk_b_linemax) <= 0:
> +                    parse_state = ParserState.FIND_HUNK_OR_DIFF_HEADER
> +
> +                file_linenum += 1
> +
> +        if diff_elem is not None:
> +            self.add_change_set(diff_elem)
> +
> +    def get_change_sets(self) -> dict:
> +        return self.__diff_set
> -- 
> 2.34.1
> 



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.