ejin-luo / sgrep

polyglot AST pattern matching

Home Page: https://sgrep.live

sgrep.live - Try it now

sgrep, for syntactical (and occasionally semantic) grep, is a tool to help find bugs by specifying code patterns using a familiar syntax. The idea is to mix the convenience of grep with the correctness and precision of a compiler frontend.

Quick Examples

pattern                     will match code like
$X == $X                    if (node.id == node.id): ...
foo(kwd1=1, kwd2=2, ...)    foo(kwd2=2, kwd1=1, kwd3=3)
subprocess.open(...)        import subprocess as s; s.open(['foo'])

See more examples in the sgrep-rules registry.
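
To make the first row concrete, here is a minimal Python sketch of what a pattern like $X == $X is looking for: an equality comparison whose two sides are the same expression. It uses Python's standard ast module and is only an illustration of the idea, not how sgrep itself is implemented; the function name find_self_comparisons is hypothetical.

```python
import ast


def find_self_comparisons(source: str) -> list:
    """Return line numbers of `expr == expr` comparisons (identical sides).

    Illustration only: this mimics the intent of the pattern `$X == $X`
    for Python code, it is not sgrep's matching engine.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump ignores line/column info by default, so two
            # structurally identical expressions produce the same string.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


code = "if node.id == node.id:\n    pass\nif a == b:\n    pass\n"
print(find_self_comparisons(code))  # [1]
```

Comparing syntax trees rather than text is what lets the same check ignore whitespace and formatting differences.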

Supported Languages by Sgrep Feature

go python js java php ml cpp c
'...' operator 🚧 🚧 🚧 🔶
Equivalences 🔶 🔶 🔶 🔶 🚧 🚧 🚧 🔶
Metavariables 🚧 🚧 🚧
Others 🔶 🔶 🔶 🚧 🚧 🚧 🚧
  • ✅ — Supported
  • 🔶 — Partial support
  • 🚧 — Under development

Meetups

Want to learn more about sgrep? Check out these slides from the r2c February meetup

Installation

Too lazy to install? Try out sgrep.live

Docker

sgrep is packaged as a Docker image, making installation as easy as installing Docker.

Quickstart

docker pull returntocorp/sgrep

cd /path/to/repo
# generate a template config file
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --generate-config

# look for findings
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep

Usage

Rule Development

To rapidly iterate on a single pattern, you can test it on a single file or folder. For example:

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep -l python -e '$X == $X' path/to/file.py

Here, sgrep searches the target for the pattern $X == $X (an expression compared with itself, which is almost always a mistake) and prints the results to stdout. The target can also be a directory; files that fail to parse are skipped. You can specify the language of the pattern with --lang, for example --lang javascript.
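
If you iterate on patterns from a script, the command above can be assembled programmatically. The following is a small sketch in Python that only uses the flags shown in this README (--rm, -v, -l, -e); the helper name sgrep_docker_command and its repo_dir parameter are assumptions made for illustration.

```python
import os


def sgrep_docker_command(pattern, language, target, repo_dir=None):
    """Build the `docker run` argument list for a single-pattern search.

    Hypothetical helper: it mirrors the command shown in this README and
    mounts `repo_dir` (default: current directory) at /home/repo.
    """
    repo_dir = os.path.abspath(repo_dir or os.getcwd())
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:/home/repo",
        "returntocorp/sgrep",
        "-l", language,
        "-e", pattern,
        target,
    ]


if __name__ == "__main__":
    cmd = sgrep_docker_command("$X == $X", "python", "path/to/file.py")
    # Passing a list to subprocess.run(cmd) avoids shell quoting issues
    # with characters such as `$` in the pattern.
    print(cmd)
```

Building the command as a list rather than a single string keeps the $ in metavariables from being expanded by a shell.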

To see more options:

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --help

Config Files

Format

See config.md for example configuration files and details on the syntax.

sgrep Registry

r2c provides a registry of config files tuned using our analysis platform on thousands of repositories. To use:

sgrep --config r2c

Default

Default configs are loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml, and can be overridden by using --config <file|folder|yaml_url|tarball_url|registry_name>
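
To check which files the default lookup would consider in a repository, a sketch like the following can help. It is written in Python with pathlib and reflects only the two locations named above; the function name find_default_configs is hypothetical, and collecting both locations at once (rather than stopping at the first) is an assumption, since the exact precedence is not stated here.

```python
from pathlib import Path


def find_default_configs(repo_root):
    """List candidate default config files under `repo_root`.

    Assumption: both `.sgrep.yml` and `.sgrep/**/*.yml` are collected;
    sgrep's actual precedence between the two may differ.
    """
    root = Path(repo_root)
    found = []
    single = root / ".sgrep.yml"
    if single.is_file():
        found.append(single)
    config_dir = root / ".sgrep"
    if config_dir.is_dir():
        # "**/*.yml" matches .yml files directly in .sgrep/ and in any
        # nested subdirectory.
        found.extend(sorted(config_dir.glob("**/*.yml")))
    return found


if __name__ == "__main__":
    for path in find_default_configs("."):
        print(path)
```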

Design

Sgrep has a design philosophy that emphasizes simplicity and a single pattern being as expressive as possible:

  1. Use concrete code syntax: easy to learn
  2. Metavariables ($X): abstract away code
  3. '...' operator: abstract away sequences
  4. Knows about code equivalences: one pattern can match many equivalent variations on the code
  5. Less is more: abstract away additional details

Patterns

Patterns are snippets of code, with metavariables and other operators, that are parsed into an AST for the target language and then used to search code for matches. See patterns.md for full documentation.

Metavariables

$X, $FOO, and $RETURNCODE are all examples of metavariables. You can reference them later in your pattern and sgrep will ensure they match. Metavariables can only contain uppercase ASCII letters; $x and $SOME_VALUE are not valid metavariables.
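
The naming rule can be expressed as a regular expression. The sketch below, in Python, encodes the rule exactly as stated above (a $ followed by one or more uppercase ASCII letters); the regex and the helper is_valid_metavariable are an illustration and are not taken from sgrep's source.

```python
import re

# `$` followed by one or more uppercase ASCII letters, nothing else.
METAVARIABLE = re.compile(r"\$[A-Z]+")


def is_valid_metavariable(name: str) -> bool:
    """Check a name against the rule described in this README."""
    return METAVARIABLE.fullmatch(name) is not None


for candidate in ["$X", "$FOO", "$RETURNCODE", "$x", "$SOME_VALUE"]:
    print(candidate, is_valid_metavariable(candidate))
# $X True
# $FOO True
# $RETURNCODE True
# $x False
# $SOME_VALUE False
```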

Operators

... is the primary "match anything" operator.
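
As a rough illustration of what ... means inside an argument list, consider the pattern foo(kwd1=1, kwd2=2, ...) from the Quick Examples: the named keyword arguments must be present with the given values, in any order, and anything else is allowed. The Python sketch below approximates that behavior for literal keyword values using the standard ast module. It is a simplified model, not sgrep's implementation, and call_has_keywords is a hypothetical name.

```python
import ast


def call_has_keywords(source: str, func_name: str, required: dict) -> bool:
    """True if `source` calls func_name with at least the `required`
    keyword arguments (literal values only); extra arguments are allowed,
    which is the role `...` plays in the pattern.
    """
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            continue
        actual = {}
        for kw in node.keywords:
            if kw.arg is None:  # a **kwargs expansion, skip it
                continue
            try:
                actual[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                pass  # non-literal value, cannot compare in this sketch
        if all(k in actual and actual[k] == v for k, v in required.items()):
            return True
    return False


print(call_has_keywords("foo(kwd2=2, kwd1=1, kwd3=3)", "foo", {"kwd1": 1, "kwd2": 2}))  # True
```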

Equivalences

sgrep automatically searches for code that is semantically equivalent. For example, a pattern for

subprocess.open(...)

will match

from subprocess import open as sub_open
result = sub_open("ls")

and other semantically equivalent configurations.
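
One way to picture this kind of equivalence is as import alias resolution: before comparing a call against the pattern, local names are mapped back to the module and function they were imported from. The Python sketch below does this for the two import forms shown in this README, using the standard ast module. It is a simplified model under that assumption, not sgrep's actual algorithm, and find_calls is a hypothetical name.

```python
import ast


def find_calls(source: str, module: str, func: str) -> list:
    """Return line numbers of calls that resolve to `module.func`,
    following `import module as m` and `from module import func as f`.
    """
    tree = ast.parse(source)
    module_aliases = set()  # local names bound to the module
    func_aliases = set()    # local names bound directly to module.func
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                if alias.name == func:
                    func_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name) and target.id in func_aliases:
            hits.append(node.lineno)
        elif (
            isinstance(target, ast.Attribute)
            and target.attr == func
            and isinstance(target.value, ast.Name)
            and target.value.id in module_aliases
        ):
            hits.append(node.lineno)
    return sorted(hits)


aliased = 'from subprocess import open as sub_open\nresult = sub_open("ls")\n'
print(find_calls(aliased, "subprocess", "open"))  # [2]
```

The source is only parsed, never executed, so the check works even when the imported name does not exist at runtime.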

Integrations

See integrations.md

Bug Reports

Reports are welcome! Please open an issue on this project.

Contributions

sgrep is LGPL-licensed and we would love your contributions. See docs/development.md

About

License: GNU Lesser General Public License v2.1
