cedadev / cc-summariser

Summarize compliance-checker results on multiple datasets

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

cc-summariser

cc-summariser creates a summary of compliance-checker results from running checks on multiple datasets.

Installation

git clone https://github.com/cedadev/cc-summariser
pip install ./cc-summariser

# Multiple dataset output is not yet on master branch...
pip install git+https://github.com/ioos/compliance-checker@refactor-scoring

Usage

The most basic usage is as follows:

compliance-checker -f json_new -o results.json --test <your test(s)> <dataset>...
cc-summariser results.json

For more options see the output of cc-summariser --help:

usage: cc-summariser [-h] [-l FILE_LIMIT] [-f {text,json}] results_file

Summarise the results compliance-checker run on multiple datasets

positional arguments:
  results_file          File containing compliance-checker results in
                        'json_new' format

optional arguments:
  -h, --help            show this help message and exit
  -l FILE_LIMIT, --file-limit FILE_LIMIT
                        The maximum number of offending files to list in the
                        'Failure details' section in the text format output
  -f {text,json}, --format {text,json}
                        Format to print summary in

Output

The default format if not specified with --format is text.

JSON

JSON output is of the following form:

{
    "num_files": <total files checked>,
    "num_no_errors": <number of files with no check failures>,
    "summary": {
        "high_priorities": {
            <checker name>: [
                {
                    "name": <check name>,
                    "count": <number of files that failed this check>,
                    "files": [<file 1>, <file 2>, ...],
                    "msgs": [
                        {
                            "msg": <msg 1>,
                            "count": <n>,
                            "files": <files with this failure message>
                        },
                        ...
                    ]
                },
                ...
            ],
            ...
        },
        "medium_priorities": {...},
        "low_priorities": {...}
    }
}

Text

The text output contains the same information as the JSON format but in a more human-readable layout. It is split into two sections

  • The 'Failures summary' section lists only the number of files that failed each check. This section is easy to scan through to see where checks are failing the most.

  • The 'Failure details' section lists the filenames that failed each check and reasons for the check failing (if available).

    The --file-limit command line option limits the number of filenames listed here; e.g. if 10 files failed a check and --file-limit=4 the output would be

    file_1.nc
    file_2.nc
    ...
    file_9.nc
    file_10.nc
    

About

Summarize compliance-checker results on multiple datasets

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Python 100.0%