datalad / datalad

Keep code, data, containers under control with git and git-annex

Home Page: http://datalad.org

Use flake8 for linting

jwodder opened this issue

The DataLad source has accumulated a lot of cruft over the years that needs to be cleaned up. Currently, the only linter in use is pylint, which is only configured to run a single check (though there is some unused configuration for flake8 in tox.ini). Rather than expanding the use of pylint (which would require a lot of configuration to be in a practical & friendly state), I suggest linting with the much simpler flake8 along with some of its plugins.

(An alternative linter that I feel I should mention is ruff, the latest hotness in Python linting, which I have not used.)

Flake8 can be set up as follows:

  • Add the following to the repos list in .pre-commit-config.yaml so that flake8 failures are caught on commit (assuming the developer has run pre-commit install):

      - repo: https://github.com/PyCQA/flake8
        rev: 6.1.0
        hooks:
          - id: flake8
            additional_dependencies:
              - flake8-bugbear
              - flake8-builtins
              - flake8-unused-arguments
  • In tox.ini:

    • Remove the commented-out "flake8" item from the envlist at the top.

    • Remove the [testenv:flake8] section

    • Add flake8 to the [testenv:lint] section:

      • Add the following to the deps field:

        flake8
        flake8-bugbear
        flake8-builtins
        flake8-unused-arguments
        
      • Add flake8 to the commands field

    • Set the contents of the [flake8] section to:

      [flake8]
      doctests = True
      extend-exclude = build/,dist/
      unused-arguments-ignore-stub-functions = True
      extend-select = B901,B902,E121,E123,E126,E223,E224,E242,E301,E304
      extend-ignore = A003,B005,E2,E3,E402,E501,U101
  • Whoever prepares this PR will also have to fix all errors that flake8 reports. Alternatively, some checks of lesser concern can be added to the extend-ignore list, though I would recommend creating an issue for each such check to track re-enabling it and fixing all of its failures.

Explanations of flake8 Options Used

  • doctests = True — Enables linting of doctests
  • extend-exclude — Adds the given directories to the list of paths not checked by flake8
  • unused-arguments-ignore-stub-functions = True — Prevents flake8-unused-arguments from complaining about unused arguments on functions with ... as a body
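As a rough illustration of the first and last options, here is a minimal sketch. All function names are hypothetical and not taken from the DataLad source, and the comments describe what I would expect flake8 to report under the configuration above rather than verified output.

```python
def double(x: int) -> int:
    """Return twice ``x``.

    With ``doctests = True``, the ``>>>`` lines below are linted as well.

    >>> double(2)
    4
    """
    return 2 * x


def on_event(event: str) -> None:
    # Stub: the body is just ``...``, so with
    # ``unused-arguments-ignore-stub-functions = True`` the unused
    # ``event`` argument should not be reported.
    ...


def log_event(event: str, verbose: bool) -> str:
    # ``verbose`` is never used in a non-stub function, so
    # flake8-unused-arguments would be expected to report U100 here.
    return f"event: {event}"
```

Note that U101 (unused argument whose name starts with an underscore) is in extend-ignore above, so renaming an intentionally unused argument to something like `_verbose` would be one way to silence the report.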

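The extend-select and extend-ignore options configure which checks to enable & disable. Full listings of all possible checks can be found in the documentation of the tools that define them: pycodestyle (E and W codes), Pyflakes (F codes), flake8-bugbear (B codes), flake8-builtins (A codes), and flake8-unused-arguments (U codes).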

By default, all checks are enabled except the B9 checks and the checks in flake8's default ignore list (E121,E123,E126,E226,E24,E704,W503,W504). The options given above extend these lists, including un-ignoring some checks ignored by default. You may want to adjust the configuration further, depending on which of the more common errors you consider too much effort to fix in a single PR.

Most of the checks included in extend-ignore are for whitespace-related things that it'd be fiddly and unfun to fix, unless you're willing to apply black to the codebase in one fell swoop (which is something I'd be in favor of). I also ignored E402 (module level import not at top of file) because there were a lot of occurrences and it's not that big of a problem.
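For reference, E402 fires on a pattern like the one sketched below, where some setup has to run before a later import. The environment variable name is made up for illustration.

```python
import os

# Hypothetical setup step that must happen before the next import.
os.environ.setdefault("EXAMPLE_MODE", "1")

# Without E402 in extend-ignore, flake8 would report this import because it
# follows executable code. A per-line "noqa: E402" comment is the usual
# alternative to ignoring the check globally.
import json

print(json.dumps({"mode": os.environ["EXAMPLE_MODE"]}))
```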

When I tried the above configuration out on the DataLad source, one of the more common errors was F401 (unused import). While some of the unused imports legitimately should be removed, some others actually seemed to be done in order to enable exporting from alternative paths (e.g., the import of datalad.runner.gitrunner.GitWitlessRunner in datalad.cmd allows one to write from datalad.cmd import GitWitlessRunner). The latter kind should be addressed by adding __all__ variables to the re-exporting modules that are each assigned a list of the names of all re-exported items plus all locally-defined items also meant for export.

I personally would appreciate linting and lint-free code, but different opinions were voiced before, and I did not push the initiative forcefully. Let's see what other @datalad/developers think about it now.