rhaeguard / rgx

a tiny regex engine written in go

Home Page: https://rhaeguard.github.io/posts/regex

rgx

A very simple regex engine written in Go. This library is experimental; use it at your own risk!

read the article: https://rhaeguard.github.io/posts/regex

to add the dependency:

go get github.com/rhaeguard/rgx

how to use:

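package main

import (
	"fmt"
	"log"

	"github.com/rhaeguard/rgx"
)

func main() {
	regexString := `(?<animal>dog)\k<animal>` // a named group plus a named backreference
	content := "hotdogdog"

	pattern, err := rgx.Compile(regexString)
	if err != nil {
		log.Fatal(err) // error handling
	}
	results := pattern.FindMatches(content)

	if results.Matches {
		fmt.Println(results.Groups["animal"]) // captured text, looked up by group name
	}
}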

todo

  • ^ beginning of the string
  • $ end of the string
  • . any single character/wildcard
  • bracket notation
    • [ ] bracket notation/ranges
    • [^ ] bracket negation notation
    • better handling of bracket expressions, e.g., [ab-exy12]
    • special characters in the bracket
      • support escape character
  • quantifiers
      • * zero or more times
    • + one or more times
    • ? optional
      • {m,n} at least m and at most n times
  • capturing group
    • ( ) capturing group or subexpression
    • \n backreference, where n is a single digit in [0, 9], e.g., (dog)\1
    • \k<name> named backreference, e.g., (?<animal>dog)\k<animal>
    • extracting the string that matches the regex
  • \ escape character
    • support special characters (context-dependent)
  • better error handling in the API
  • ability to work on multi-line strings (tested on Alice in Wonderland text corpus)
    • . should not match a newline (\n)
    • $ should match at a newline (\n)
    • multiple full matches
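To illustrate how a handful of the features in the todo list (^, $, ., [ ], [^ ], *, + and ?) can be matched, here is a small backtracking matcher. It is a standalone sketch, not code taken from rgx; the names Match, parse and matchItems are hypothetical, and it leaves out groups, backreferences, escapes and {m,n}.

```go
package main

import "fmt"

// item is one pattern element: a single-character test plus an optional quantifier.
type item struct {
	match func(c byte) bool
	quant byte // 0 (exactly once), '*', '+', or '?'
}

// parse turns an unanchored pattern into items. Supported: literals, '.',
// '[...]' and '[^...]' with ranges such as b-e, and the quantifiers * + ?.
func parse(p string) []item {
	var items []item
	for i := 0; i < len(p); i++ {
		var m func(byte) bool
		switch c := p[i]; c {
		case '.':
			m = func(b byte) bool { return b != '\n' } // '.' does not match a newline
		case '[':
			j := i + 1
			negate := j < len(p) && p[j] == '^'
			if negate {
				j++
			}
			var set [256]bool
			for ; j < len(p) && p[j] != ']'; j++ {
				if j+2 < len(p) && p[j+1] == '-' && p[j+2] != ']' { // a range like b-e
					for r := int(p[j]); r <= int(p[j+2]); r++ {
						set[r] = true
					}
					j += 2
				} else {
					set[p[j]] = true
				}
			}
			i = j // skip to the closing ']'
			m = func(b byte) bool { return set[b] != negate }
		default:
			m = func(b byte) bool { return b == c }
		}
		it := item{match: m}
		if i+1 < len(p) && (p[i+1] == '*' || p[i+1] == '+' || p[i+1] == '?') {
			it.quant = p[i+1]
			i++
		}
		items = append(items, it)
	}
	return items
}

// matchItems reports whether items match text starting at pos. With anchorEnd,
// the match must stop at the end of text or just before a '\n'.
func matchItems(items []item, text string, pos int, anchorEnd bool) bool {
	if len(items) == 0 {
		return !anchorEnd || pos == len(text) || text[pos] == '\n'
	}
	it, rest := items[0], items[1:]
	least, most := 1, 1 // repetition bounds implied by the quantifier
	switch it.quant {
	case '*':
		least, most = 0, len(text)
	case '+':
		most = len(text)
	case '?':
		least = 0
	}
	n := 0 // greedily count how many characters this item can consume
	for n < most && pos+n < len(text) && it.match(text[pos+n]) {
		n++
	}
	for ; n >= least; n-- { // backtrack from the longest run
		if matchItems(rest, text, pos+n, anchorEnd) {
			return true
		}
	}
	return false
}

// Match reports whether pattern matches anywhere in text. A leading '^'
// anchors to the start of text; a trailing '$' anchors to the end of a line.
func Match(pattern, text string) bool {
	anchorStart := len(pattern) > 0 && pattern[0] == '^'
	if anchorStart {
		pattern = pattern[1:]
	}
	anchorEnd := len(pattern) > 0 && pattern[len(pattern)-1] == '$'
	if anchorEnd {
		pattern = pattern[:len(pattern)-1]
	}
	items := parse(pattern)
	for start := 0; start <= len(text); start++ {
		if matchItems(items, text, start, anchorEnd) {
			return true
		}
		if anchorStart {
			break // '^' allows only the first starting position
		}
	}
	return false
}

func main() {
	fmt.Println(Match("^ab+c?$", "abbb"))    // true
	fmt.Println(Match("[^0-9]x", "1ax"))     // true: 'a' is not a digit
	fmt.Println(Match("a.c", "a\nc"))        // false: '.' skips newlines
	fmt.Println(Match("^d.g$", "dog\nfrog")) // true: '$' matches before '\n'
}
```

Every quantifier here reduces to a pair of repetition bounds, so supporting {m,n} on a single character would mostly be a matter of parsing m and n into those bounds.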

notes

  • the \ escape turns the next character into a literal; special sequences such as \d for digits or \b for backspace are not supported
  • numeric backreferences \n support only single-digit references, so \10 is interpreted as a reference to the first capture group followed by a literal 0

credits

About

License: MIT License

Languages

Language: Go 100.0%