semgrep / semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

Home Page: https://semgrep.dev

semgrep scan --validate fails because semgrep-core reports "No such file or directory"

sflanker opened this issue

Describe the bug

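It appears that semgrep scan --validate is not invoking semgrep-core correctly: in the debug output below, the ruleset name p/owasp-top-ten is passed to semgrep-core as if it were a file path.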

semgrep scan --validate --config="p/owasp-top-ten" --debug

[00.00][DEBUG]: setup_logging: highlight_setting=Std_msg.Auto, highlight=true
Downloading config from https://semgrep.dev/p/owasp-top-ten
Failed to decode JSON: KeyError('rule_config')
Downloaded config from https://semgrep.dev/p/owasp-top-ten
loaded 1 configs in 5.68034553527832
Downloading config from https://semgrep.dev/p/semgrep-rule-lints
Failed to decode JSON: KeyError('rule_config')
Downloaded config from https://semgrep.dev/p/semgrep-rule-lints
loaded 1 configs in 0.3521561622619629
[00.00][INFO](cli, Core_CLI): Executed as: /usr/local/lib/python3.12/site-packages/semgrep/bin/semgrep-core -json -check_rules /tmp/tmpi87nyugx.yaml p/owasp-top-ten
[00.00][INFO](cli, Core_CLI): Version: semgrep-core version: 1.62.0
Exception: Sys_error("p/owasp-top-ten: No such file or directory")
Raised by primitive operation at UFile.Legacy.files_of_dirs_or_files_no_vcs_nofilter.(fun) in file "libs/commons/UFile.ml", line 177, characters 14-33
Called from List_.fast_map in file "libs/commons/List_.ml", line 80, characters 17-20
Called from UFile.Legacy.files_of_dirs_or_files_no_vcs_nofilter in file "libs/commons/UFile.ml", line 175, characters 4-204
Called from UFile.files_of_dirs_or_files_no_vcs_nofilter in file "libs/commons/UFile.ml", line 191, characters 2-74
Called from File_type.files_of_dirs_or_files in file "libs/commons/File_type.ml", line 423, characters 2-52
Called from Check_rule.run_checks in file "src/metachecking/Check_rule.ml", line 239, characters 4-145
Called from Check_rule.check_files in file "src/metachecking/Check_rule.ml", line 286, characters 26-65
Called from Core_CLI.with_exception_trace in file "src/core_cli/Core_CLI.ml", line 743, characters 6-10


Configuration is invalid - found 1 configuration error(s), and 523 rule(s).
[ERROR] Error while running rules:
                    You are seeing this because the engine was killed.

                    The most common reason this happens is because it used too much memory.
                    If your repo is large (~10k files or more), you have three options:
                    1. Increase the amount of memory available to semgrep
                    2. Reduce the number of jobs semgrep runs with via `-j <jobs>`. We
                        recommend using 1 job if you are running out of memory.
                    3. Scan the repo in parts (contact us for help)

                    Otherwise, it is likely that semgrep is hitting the limit on only some
                    files. In this case, you can try to set the limit on the amount of memory
                    semgrep can use on each file with `--max-memory <memory>`. We recommend
                    lowering this to a limit 70% of the available memory. For CI runs with
                    interfile analysis, the default max-memory is 5000MB. Without, the default
                    is unlimited.

                    The last thing you can try if none of these work is to raise the stack
                    limit with `ulimit -s <limit>`.

                    If you have tried all these steps and still are seeing this error, please
                    contact us.

                       Error: semgrep-core exited with unexpected output

Sending pseudonymous metrics since metrics are configured to AUTO and registry usage is True

Maybe I'm misunderstanding the usage of --validate, but I would like to differentiate between semgrep bugs like #9617, invalid configuration (for example, a bad ruleset name), and actual issues with the files being scanned. I think it would be reasonable to be able to invoke scan --validate in this way.

To Reproduce

Create an empty folder.
Initialize a git repo.
Add a remote (doesn't have to exist).
Run semgrep scan --validate --config="p/owasp-top-ten" --debug
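
The steps above can be scripted roughly as follows. This is only a sketch, not part of the original report: the function names, the temporary directory, the remote name origin, and the placeholder remote URL are assumptions chosen for illustration, and it assumes git and semgrep are on the PATH.

```python
import subprocess
import tempfile

# Hypothetical placeholder URL; the report says the remote does not have to exist.
PLACEHOLDER_REMOTE = "https://example.invalid/repo.git"


def repro_commands(remote_url: str = PLACEHOLDER_REMOTE) -> list[list[str]]:
    """Return the commands from the reproduction steps, in order."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", remote_url],
        ["semgrep", "scan", "--validate", "--config=p/owasp-top-ten", "--debug"],
    ]


def run_repro() -> int:
    """Run the steps in a fresh, empty directory and return semgrep's exit code."""
    *setup, semgrep_cmd = repro_commands()
    with tempfile.TemporaryDirectory() as workdir:
        for cmd in setup:
            # The git setup steps are expected to succeed.
            subprocess.run(cmd, cwd=workdir, check=True)
        # Do not raise on failure here: the failing exit code is the point.
        return subprocess.run(semgrep_cmd, cwd=workdir).returncode
```

Calling run_repro() inside the semgrep/semgrep:1.62.0 image should exercise the same command line as in the log above.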

Expected behavior

Semgrep downloads and validates the ruleset specified in the --config arg. I would also expect the SEMGREP_RULES environment variable to work.

What is the priority of the bug to you?

  • P0: blocking your adoption of Semgrep or workflow
  • P1: important to fix or quite annoying
  • P2: regular bug that should get fixed

Environment

Docker: semgrep/semgrep:1.62.0

Use case

In a CI context, I want to differentiate between issues running semgrep and issues detected by semgrep.
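
A minimal sketch of how a CI wrapper might make that distinction. It assumes the scan is run with --error so that findings produce exit code 1, and it treats every other nonzero exit code as a problem with semgrep or its configuration. The Outcome names and both helper functions are hypothetical, and the exit-code mapping should be checked against the Semgrep version in use.

```python
import subprocess
from enum import Enum


class Outcome(Enum):
    CLEAN = "clean"            # scan ran, nothing reported
    FINDINGS = "findings"      # scan ran and reported findings
    TOOL_ERROR = "tool_error"  # semgrep or its configuration failed


def classify_exit_code(returncode: int) -> Outcome:
    """Map a semgrep exit code to a coarse outcome (assumed mapping)."""
    if returncode == 0:
        return Outcome.CLEAN
    if returncode == 1:
        # Assumption: exit code 1 means findings when --error is set.
        return Outcome.FINDINGS
    return Outcome.TOOL_ERROR


def run_scan(config: str) -> Outcome:
    """Run a scan with --error and classify the result."""
    proc = subprocess.run(["semgrep", "scan", "--error", f"--config={config}"])
    return classify_exit_code(proc.returncode)
```

A CI job could then fail the build on Outcome.FINDINGS and raise a separate infrastructure alert on Outcome.TOOL_ERROR.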

cc @aryx osemgrep related?