iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page:http://iree.dev/

Migrate lint checks to pre-commit?

ScottTodd opened this issue · comments

Current lint checks

We have a lint workflow and a lint script, both of which require some manual setup and can get out of sync with each other.

We currently run:

| Tool | Description |
| --- | --- |
| bazel_to_cmake | Our custom tool for translating BUILD.bazel files to CMakeLists.txt files |
| buildifier | Formatting BUILD.bazel files |
| black | Formatting Python files |
| pytype | Analyzing types in Python files. This has enough false positives to remove; we could switch to mypy |
| clang-format | Formatting C/C++ files |
| tabs | Our custom script to check source files for tabs where spaces should be used |
| yamllint | Linting yaml files |
| markdownlint | Linting markdown files |
| path_lengths | Checking for long path lengths (problematic on Windows) |
| generated_cmake_files | Checking that our test suite generation scripts have been run |
| build_file_names | Checking that Bazel files are named BUILD.bazel instead of BUILD. Remove? Helps with integration into Google's downstream repo |
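
For reference, the tabs check listed above is simple enough to express as a per-file hook. The following is only a rough sketch of that idea, not the existing script: the function names `find_tab_lines` and `main` are made up for illustration, and it assumes the hook receives the candidate file paths as arguments (which is how pre-commit invokes hooks by default).

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_tab_lines(paths: Iterable[Path]) -> List[Tuple[Path, int]]:
    """Returns (path, 1-based line number) for every line containing a tab."""
    hits = []
    for path in paths:
        # Replace undecodable bytes so an unusual encoding does not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if "\t" in line:
                hits.append((path, number))
    return hits


def main(argv: List[str]) -> int:
    """Prints each offending line and returns a non-zero code if any were found."""
    offenders = find_tab_lines(Path(arg) for arg in argv)
    for path, number in offenders:
        print(f"{path}:{number}: tab character found")
    return 1 if offenders else 0
```

A real hook would finish with `sys.exit(main(sys.argv[1:]))` and would need an allowlist for files where tabs are legitimate (Makefiles, for example), which this sketch leaves out.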

Pre-commit

Pre-commit is "a multi-language package manager for pre-commit hooks" that can be run easily on developer machines (no need to separately install specific versions of each tool) and on GitHub Actions via either an action or a service.

We've started using pre-commit on a few related projects and it has been helpful. See also this recent discussion on Discord.

We can also update our required status checks to block on all lint checks using this (or we could use the 'summary' style that ci.yml has). Right now we only block on clang-format, bazel_to_cmake, and yamllint.

Started testing this out in IREE with #17534 and this .pre-commit-config.yaml file:

exclude: "third_party/"

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
        exclude_types: ["image", "jupyter"]
    -   id: check-yaml
        exclude: "mkdocs.yml"  # Extensions aren't included in the schema: https://github.com/squidfunk/mkdocs-material/issues/6378
-   repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
    -   id: black

So far I'm liking the ergonomics.

Setup on Windows:

py -m pip install --user pipx
py -m pipx ensurepath
pipx install pre-commit

pre-commit run --all-files

Buildifier has some annoying gotchas:

  • bazelbuild/buildtools#914 makes it replace Windows style line endings with Linux line endings, marking all files as dirty when using git's core.autocrlf=true. For some reason I can format Bazel files in VSCode without issues but when I run buildifier directly or via pre-commit I get a diff for every BUILD.bazel file:
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in compiler/plugins/target/LLVMCPU/Builtins/BUILD.bazel.
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in compiler/plugins/target/LLVMCPU/internal/BUILD.bazel.
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in compiler/plugins/target/LLVMCPU/test/BUILD.bazel.
  • despite buildifier being written in Go and pre-commit supporting package manager features, the community pre-commit hooks assume that it is already installed

git add . --renormalize fixes line ending stuff for me whenever I have issues - maybe could run that after

(git line ending handling confuses me, so I use python build_tools/bazel_to_cmake/bazel_to_cmake.py && git add . --renormalize as my bazel-to-cmake command)

Got this mostly done. Remaining work:

  • Land #17538
  • Switch "required checks" in protected branch settings from previous lint jobs to pre-commit job
  • Enable end-of-file-fixer check
  • Enable trailing-whitespace check
  • Figure out how to enable buildifier check in a way that doesn't break Windows
  • Figure out how to enable generate_cmake_files check in a way that doesn't break Windows
  • Remove old pytype check
  • Add new mypy (https://mypy-lang.org/) check and ensure repo passes it
  • Test git commit hook mode (ensure it doesn't add significant overhead, a few checks are running on all files instead of changed files - may want to refactor those checks)
  • Add documentation to https://iree.dev/developers/general/contributing/#coding-style-guidelines

Okay, this sort of works. A few ideas for working within the limits of pre-commit:

  • Add a run_buildifier.py script that runs on all files in a single step then runs git add . --renormalize
  • Add a run_buildifier.py script that runs on a single file then runs git add {FILENAME} --renormalize (watch for git process lock if running in parallel)
  • Make the buildifier check optional (https://pre-commit.com/#confining-hooks-to-run-at-certain-stages makes it manual, might prefer a way to just skip it on Windows) and have the CI opt-in
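
The second idea could look roughly like the sketch below. This is not a script that exists in the repo: `build_commands`, `run_buildifier`, and the injectable `runner` parameter are hypothetical names, and it assumes `buildifier` and `git` are both on `PATH`. It relies on buildifier's default behavior of rewriting the given files in place, then renormalizes just those files.

```python
import subprocess
from typing import Callable, List, Sequence

# Anything that takes an argv list and returns an exit code (hypothetical alias).
Runner = Callable[[List[str]], int]


def build_commands(files: Sequence[str]) -> List[List[str]]:
    """Returns the commands to format `files` and then renormalize them."""
    if not files:
        return []
    return [
        # buildifier rewrites the given files in place by default.
        ["buildifier", *files],
        # Re-apply git's line ending normalization so files whose only
        # difference is CRLF vs LF stop showing up as modified.
        ["git", "add", "--renormalize", "--", *files],
    ]


def run_buildifier(files: Sequence[str], runner: Runner = subprocess.call) -> int:
    """Runs each command in order, stopping at the first failure."""
    for command in build_commands(files):
        code = runner(command)
        if code != 0:
            return code
    return 0
```

Two caveats: `git add --renormalize` touches the index, so it also stages whatever formatting changes buildifier made, and concurrent invocations can collide on git's index lock. Setting `require_serial: true` on the hook in .pre-commit-config.yaml is one way to avoid the parallel case.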

Started looking at mypy on https://github.com/ScottTodd/iree/tree/pre-commit-mypy

Will need to either fix some issues or figure out how to limit how much of the repo it looks at

D:\dev\projects\iree (pre-commit-mypy)
λ pre-commit run --all-files mypy
mypy.....................................................................Failed
- hook id: mypy
- exit code: 2

build_tools\benchmarks\comparisons\common\__init__.py: error: Duplicate module named "common" (also at "build_tools\benchmarks\common\__init__.py")
build_tools\benchmarks\comparisons\common\__init__.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
build_tools\benchmarks\comparisons\common\__init__.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)

The git hook works, even via git gui.

Most of the hooks are fast and only run on individual files when relevant. The slowest right now appears to be bazel_to_cmake, taking several seconds. I just set bazel_to_cmake to always run regardless of which files changed since it does its own walk through to discover Bazel and CMake source files. Could speed that up by teaching the python script to run on individual source files (bazel source -> generate cmake, cmake source -> look for corresponding bazel and edit in-place) and then let pre-commit handle finding affected files.
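
As a starting point for that refactoring, here is a hedged sketch of the file-to-directory mapping step only. `directories_to_convert` is a made-up helper, not part of bazel_to_cmake.py, and it assumes pre-commit passes repo-relative paths with forward slashes. The actual conversion for each directory is left out.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

BAZEL_NAME = "BUILD.bazel"
CMAKE_NAME = "CMakeLists.txt"


def directories_to_convert(changed_files: Iterable[str]) -> List[str]:
    """Returns the sorted, de-duplicated directories whose build files changed."""
    directories = set()
    for name in changed_files:
        path = PurePosixPath(name)
        # A changed Bazel file means "regenerate the CMake file next to it";
        # a changed CMake file means "check it against the Bazel file next to it".
        # Either way, the unit of work is the containing directory.
        if path.name in (BAZEL_NAME, CMAKE_NAME):
            directories.add(str(path.parent))
    return sorted(directories)
```

With this in place, pre-commit's own file filtering would decide when the hook runs at all, and the script would only visit the directories returned here instead of walking the whole tree.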

Performance is particularly important when running as a git hook, since git commit should take milliseconds, not seconds.

A few devs specifically requested getting the buildifier check running seamlessly (all platforms, ideally no manual download) here on Discord.

Can also check if there is an automatic deps adder we can use in open source. Google had an internal one but it sometimes was wrong or added extra deps (mostly across selects, which we barely use, IIRC). If such a tool / run mode exists we could try having pre-commit run it automatically.

Classic... there are two hooks for running buildifier listed at https://pre-commit.com/hooks.html, both of which assume that buildifier is already installed (despite pre-commit being a package manager), then https://github.com/bazelbuild/buildtools itself uses a third hook: https://github.com/keith/pre-commit-buildifier. That one claims it downloads buildifier, but instead of working like other pre-commit hooks and building from source (again - package manager) it uses a bash wrapper script that only supports macOS and Linux 🤦
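
If we end up writing our own wrapper, the platform detection part is small in Python and would cover Windows too. The sketch below only picks a release asset name. The naming pattern (`buildifier-<os>-<arch>`, with `.exe` on Windows) is my assumption about how the bazelbuild/buildtools release binaries are named and should be checked against the release page, and `buildifier_asset_name` is a hypothetical helper.

```python
import platform
from typing import Optional

# Values returned by platform.system() / platform.machine() mapped to the
# names assumed to appear in release asset file names.
_OS_NAMES = {"Linux": "linux", "Darwin": "darwin", "Windows": "windows"}
_ARCH_NAMES = {
    "x86_64": "amd64",
    "AMD64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def buildifier_asset_name(
    system: Optional[str] = None, machine: Optional[str] = None
) -> str:
    """Returns the assumed release asset name for the given (or current) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        os_name = _OS_NAMES[system]
        arch = _ARCH_NAMES[machine]
    except KeyError:
        raise ValueError(f"unsupported platform: {system} {machine}") from None
    suffix = ".exe" if os_name == "windows" else ""
    return f"buildifier-{os_name}-{arch}{suffix}"
```

Downloading, checksum verification, and caching the binary are deliberately omitted. Those are the parts a real hook would need to get right.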

Except for one last change (#17619) and pytype (which will be replaced with mypy), this migration is complete 🥳

All done! (except for setting up mypy for python type checking)