semgrep / semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

Home page: https://semgrep.dev

Feature: Dedicated semgrep-precommit repo

meshy opened this issue

Suggestion: Please create a new GitHub repo containing only the pre-commit hooks.

Is your feature request related to a problem? Please describe.

Installing semgrep as a pre-commit hook takes several minutes. I would normally expect a pre-commit hook to install in seconds.

It takes this long because semgrep's repo has many git submodules (36 by my count), and when pre-commit clones a repo, it performs a recursive clone.

In the case of semgrep, this is wasted effort, because none of those submodules are needed for the pre-commit hook to work. In fact, the "hook does not even rely on the code in this repository". All that is required is the .pre-commit-hooks.yaml file and the setup.py.
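To reproduce a submodule count like the one above, one could tally the `[submodule "..."]` section headers in the repo's `.gitmodules` file. This is a minimal sketch, assuming a local checkout with `.gitmodules` at the repo root; `count_submodules` is a hypothetical helper, not necessarily how the count in this issue was obtained. It only sees top-level submodules, whereas a recursive clone also fetches any nested ones.

```python
import re
from pathlib import Path

# Each submodule in .gitmodules is declared by a section header such as
#   [submodule "example"]
SUBMODULE_HEADER = re.compile(r'^\s*\[submodule\s+"[^"]*"\]', re.MULTILINE)


def count_submodules(gitmodules_text: str) -> int:
    """Count top-level submodule declarations in .gitmodules content.

    Nested submodules (declared in each submodule's own .gitmodules)
    are not counted, although a recursive clone fetches them too.
    """
    return len(SUBMODULE_HEADER.findall(gitmodules_text))


if __name__ == "__main__":
    path = Path(".gitmodules")
    if path.exists():
        print(count_submodules(path.read_text()))
    else:
        print("no .gitmodules in the current directory")
```

Run from the root of a checkout, this prints the number of submodule sections it finds.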

Describe the solution you'd like

Installing semgrep would be much faster if the pre-commit hook were maintained in a separate repo. That repo needs to contain only the .pre-commit-hooks.yaml file and the setup.py from this repo. I've already trialled this with a private repo. It installed very swiftly and worked a treat.

(It would be very helpful if older versions of semgrep were also available through this pre-commit repo.)
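One way a dedicated repo could offer older versions is to keep one tagged commit per release whose setup.py pins that semgrep version. The sketch below renders such a file; `render_setup_py` is a hypothetical helper, the template copies the trial setup.py from the commit further down (including its package name), and nothing here describes how the official repo is actually maintained.

```python
import re

# Template mirroring the trial setup.py: the package contains no code and
# exists only to pull the pinned semgrep release from PyPI.
SETUP_TEMPLATE = '''from setuptools import setup


setup(
    name="semgrep_pre_commit_test",
    version="{version}",
    install_requires=["semgrep=={version}"],
    packages=[],
)
'''

# Accept plain MAJOR.MINOR.PATCH versions such as 1.69.0.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def render_setup_py(version: str) -> str:
    """Return setup.py text that pins the hook to one semgrep release."""
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"expected a version like 1.69.0, got {version!r}")
    return SETUP_TEMPLATE.format(version=version)


if __name__ == "__main__":
    print(render_setup_py("1.52.0"))
```

Committing each rendered file and tagging the commit (for example `v1.52.0`) would let users select that release through the `rev:` field of their pre-commit configuration.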

Describe alternatives you've considered

I did consider some alternative approaches to this issue:

  • Make pre-commit's clone behaviour configurable or faster. I feel this would address the wrong issue in this case.

  • Reduce the number of git submodules in this repo. They're presumably here for a good reason, and removing them wouldn't help with older versions.

  • Maintain a semgrep-precommit repo myself. I presume the semgrep org would prefer to be in charge of a repo like this.

  • Put up with the install time. I'm not keen on this if I can avoid it.

Use case

A reasonable install time for semgrep as a pre-commit hook.

Additional context

I have only tried this with the Python hook. I notice that there are also Docker-based hooks, which didn't seem applicable to my use case, so I didn't look into those.

For reference, here's the commit I made in a private repo to try this out:

commit (hash...) (HEAD -> main, tag: v1.69.0, origin/main)
Author: Charlie Denton <...@...>
Date:   Thu Apr 18 16:24:35 2024 +0100

    Add initial test for semgrep pre-commit hook
    
    Before now, we've been avoiding installing semgrep in a pre-commit
    because it takes 3 minutes to install. Investigation reveals that the
    majority of that time is taken up with installing submodules which turn
    out to be unused.
    
    By only having the pre-commit hook and the setup.py we keep checkout
    times short.

diff --git .pre-commit-hooks.yaml .pre-commit-hooks.yaml
new file mode 100644
index 0000000..6f5bf5a
--- /dev/null
+++ .pre-commit-hooks.yaml
@@ -0,0 +1,9 @@
+---
+# See https://pre-commit.com/#new-hooks for more information on this file.
+# It allows to call semgrep from pre-commit
+
+- id: semgrep
+  name: semgrep
+  entry: semgrep
+  language: python
+  args: ["--disable-version-check", "--quiet", "--skip-unknown-extensions"]
diff --git README.md README.md
new file mode 100644
index 0000000..986cb46
--- /dev/null
+++ README.md
@@ -0,0 +1,9 @@
+# semgrep-precommit
+
+A quick test to see if we can speed up pre-commit installs of semgrep.
+
+Semgrep's repo has a whole load of submodules which take ages for pre-commit to install.
+
+Those submodules aren't even used, because semgrep uses a dedicated `setup.py` to go on to install semgrep from PyPI.
+
+This repo includes a similar `setup.py` and a subset of the pre-commit config, but excludes the rest of semgrep's repo.
diff --git setup.py setup.py
new file mode 100644
index 0000000..4bbf959
--- /dev/null
+++ setup.py
@@ -0,0 +1,9 @@
+from setuptools import setup
+
+
+setup(
+    name="semgrep_pre_commit_test",
+    version="1.69.0",
+    install_requires=["semgrep==1.69.0"],
+    packages=[],
+)

(NB: I've changed the installed version in these commits, so the hashes are wrong.)

This seems like a good idea and a quick win.

Thanks for the detailed request! I've created this at https://github.com/semgrep/pre-commit. Let me know if there are any issues.

I haven't included releases prior to 1.70.0, but I'm happy to do so. From a quick glance, the last time we added a hook was in 1.28.0. Are there specific past releases you'd like included, or just enough that migrating probably won't require a version bump?

@kopecs Thank you very much for that!

If you could make v1.52.0 available, I would really appreciate it :)

Alright, I think I have set it up correctly over there for 1.28.0 onwards. If you run into any issues just open an issue on that repo and tag me!

Thank you!