streof / sp

A basic CLI for searching patterns/words in files

sp

sp (Search and Print) is a basic implementation of grep/ripgrep. It can be used to find patterns/words in files.

The main idea behind sp was to create a CLI whose feature set lies somewhere between plain substring matching and a full regex search. In its current state, sp is best seen as a limited extension of a substring search.

Options

USAGE:
    sp [OPTIONS] <PATTERN> <PATH>

ARGS:
    <PATTERN>    A pattern used for matching
    <PATH>       A file to search

OPTIONS:
    -c, --count              Suppress normal output and show number of matching lines
    -e, --ends-with          Only show matches containing fields ending with PATTERN
    -h, --help               Prints help information
    -i, --ignore-case        Case insensitive search
    -m, --max-count <NUM>    Limit number of shown matches
    -n, --no-line-number     Do not show line number which is enabled by default
    -s, --starts-with        Only show matches containing fields starting with PATTERN
    -V, --version            Prints version information
    -w, --words              Whole words search (i.e. non-word characters are stripped)

Fields are strings separated by contiguous whitespace (as defined by Unicode).
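To make the notion of fields concrete, here is a minimal sketch of how the --starts-with and --ends-with checks could work on a single line. It is not taken from sp's source: the function names are hypothetical, and it assumes valid UTF-8 input and relies on the standard library's `split_whitespace`, which splits on Unicode whitespace.

```rust
/// Hypothetical helper: true if any whitespace-separated field of `line`
/// starts with `pattern`.
fn any_field_starts_with(line: &str, pattern: &str) -> bool {
    // `split_whitespace` treats runs of Unicode whitespace as one separator,
    // so empty fields are never produced.
    line.split_whitespace().any(|field| field.starts_with(pattern))
}

/// Hypothetical helper: true if any whitespace-separated field of `line`
/// ends with `pattern`.
fn any_field_ends_with(line: &str, pattern: &str) -> bool {
    line.split_whitespace().any(|field| field.ends_with(pattern))
}
```

With these helpers, the line "foo  barbaz" would match "bar" under --starts-with (the field "barbaz" starts with it), while the line "foobar" would only match under --ends-with.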

Building

This is a Rust project, so first make sure that Rust is installed on your machine.

To build this project:

$ git clone https://github.com/streof/sp
$ cd sp
$ cargo build --release
$ ./target/release/sp --version
sp 0.1.2

Running tests

sp includes unit and integration tests. Run them as follows:

$ cargo test

Considerations

The main goal of this project was to reimplement a small number of grep/ripgrep-like features. Performance was not a strict requirement, although, in my opinion, CLIs should at least be perceived as fast by their users. Performance is obviously a trade-off, and for sp it depends on things like:

  • memory allocation
  • cpu utilization
  • heuristics
  • algorithm implementation (e.g. searching, encoding/decoding)
  • number of system calls

Here are some thoughts from the exploration phase:

  • A simple way to reduce memory allocation is to use an IO buffer. Rust's standard library provides, for example, the very convenient lines API, but also the lower-level read_line and read_until methods. Initially, this project used read_line, but then I read this Reddit thread where linereader was mentioned. I ended up using bstr, which offers a good balance between a rich, ergonomic API and performance (see this commit).
  • Counts in sp rely on a very naive implementation that does not take advantage of modern CPU capabilities (see, for example, bytecount).
  • The current matching algorithm relies on high-level APIs exposed by bstr. However, I also ran some simple benchmarks which suggested that switching to twoway would give a significant performance boost (>2x speedup). Rust's standard library uses the Two-Way algorithm for things like pattern matching, although its implementation differs from the one provided by the twoway crate.
  • In some cases the number of read syscalls used by sp is significantly higher than when using ripgrep.
  • Ripgrep uses encoding_rs for fast encoding/decoding.
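
The buffer-reuse idea from the first bullet and the naive counting from the second can be sketched with the standard library alone. This is an illustration under stated assumptions, not sp's actual code: sp uses bstr, whereas this sketch uses `BufRead::read_until` with a single reused `Vec<u8>`, and the function names `count_matching_lines` and `contains` are hypothetical.

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch: counts the lines of `reader` that contain `needle`,
/// reusing one buffer instead of allocating a new String per line.
fn count_matching_lines<R: BufRead>(mut reader: R, needle: &[u8]) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        // Clearing keeps the allocated capacity, so later lines reuse it.
        buf.clear();
        // `read_until` returns the number of bytes read; 0 means end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        if contains(&buf, needle) {
            count += 1;
        }
    }
    Ok(count)
}

/// Naive byte-wise substring check (no SIMD, no Two-Way algorithm).
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    // The empty needle matches everything; checking it first also avoids
    // calling `windows(0)`, which panics.
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}
```

Working on bytes rather than `String` also avoids UTF-8 validation on every line, which is one reason byte-string APIs are attractive for this kind of tool. Any type implementing `BufRead` can be passed in, for example a `BufReader` wrapping a file or a plain byte slice.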
