unjs / magic-regexp

A compiled-away, type-safe, readable RegExp alternative

Home Page: https://regexp.dev

Allow for custom regex, and clarifying the usage of `anyOf()`

xRSquared opened this issue · comments

@didavid61202 @danielroe, as I was working on issue #7, two points of clarification for the API came to mind.

Allowing for custom regex patterns

Unless there is a function that I don't know of, a user can't add a custom regex pattern to an expression unless it is covered by one of the exported helpers for specific RegExp characters, such as digit, whitespace, or letter. For example, with the current API there is no way to include the pattern [1-9] without it being escaped, as it is when passed to exactly(), and ending up as \[1-9\].

import { exactly } from "magic-regexp";

const test = exactly("foo").and("[1-9]"); // foo\[1-9\]

Note: here the pattern is passed to .and(), which escapes it just as exactly() would.

When working within the package, we can create these arbitrary regex patterns using createInput(); however, this function isn't exported to end users.

Possible Solutions

  1. export an alias for createInput under a name such as input, rawInput, or regex (other suggestions welcome)
  2. leave as is and don't allow users to use custom regex patterns (I don't think this is an end goal of the package)

Providing an alias to createInput would allow for patterns such as:

import { exactly, input } from "magic-regexp";

const test = exactly("foo").and(input("[1-9]")); // foo[1-9]
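To make the difference between the two modes concrete, here is a minimal standalone sketch that does not use magic-regexp at all. The helper name escapeLiteral is hypothetical and only illustrates the general idea of escaping versus trusting raw RegExp source; it is not the library's implementation.

```typescript
// Hypothetical helper for illustration only; not part of magic-regexp.
// Escapes every character that has special meaning in RegExp source.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Literal mode: the argument is matched character for character.
const escapedSource = "foo" + escapeLiteral("[1-9]"); // foo\[1-9\]

// Raw mode: the argument is trusted as RegExp source.
const rawSource = "foo" + "[1-9]"; // foo[1-9]

const literalRe = new RegExp(escapedSource);
const rawRe = new RegExp(rawSource);

console.log(literalRe.test("foo[1-9]")); // true: matches the literal text
console.log(rawRe.test("foo5")); // true: matches foo followed by a digit 1-9
```

The raw mode is what an exported alias would enable; the trade-off is that the library can no longer reason about what the fragment contains.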

anyOf

@didavid61202 @danielroe, the documentation for anyOf states that it takes an array of inputs, but the function doesn't really take an array; it takes an arbitrary number of arguments. This mismatch can lead to confusion, and in fact, it confused me when I first started using this package.

Consider the following examples:

import { anyOf } from "magic-regexp";

const test1 = anyOf(...["a", "b", "c"]); // (?:a|b|c)
const test2 = anyOf("a", "b", "c"); // (?:a|b|c)
const test3 = anyOf(["a", "b", "c"]); // (?:a,b,c)
const test4 = anyOf("abc"); // (?:abc)

Possible Solutions

  1. leave as is, and change the documentation
  2. accept arrays, and do the array unpacking within the function
  3. overload the function to allow for the passing of arrays
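Options 2 and 3 could look roughly like the standalone sketch below. The names anyOfSketch and escapeLiteral are hypothetical and do not come from magic-regexp; the sketch only shows how one function can accept both call styles. It also shows one plausible explanation for the (?:a,b,c) output above: JavaScript joins array elements with commas when an array is coerced to a string.

```typescript
// Hypothetical helpers for illustration only; not magic-regexp's code.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Overloads: accept either separate arguments or a single array.
function anyOfSketch(...inputs: string[]): string;
function anyOfSketch(inputs: string[]): string;
function anyOfSketch(...args: (string | string[])[]): string {
  // concat flattens one level, so both call styles end up as string[].
  const flat = ([] as string[]).concat(...args);
  return `(?:${flat.map(escapeLiteral).join("|")})`;
}

const fromArgs = anyOfSketch("a", "b", "c"); // (?:a|b|c)
const fromArray = anyOfSketch(["a", "b", "c"]); // (?:a|b|c)
const single = anyOfSketch("abc"); // (?:abc)

// Array-to-string coercion joins elements with commas.
const coerced = String(["a", "b", "c"]); // a,b,c
```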

Nice suggestions!
I think we could support a helper function that inserts a custom raw RegExp pattern as an escape hatch for special cases. As for naming, maybe rawRegExp would be less confusing? Let's discuss further.
It would also be awesome to support both an array and an arbitrary number of arguments for anyOf. 👍

I'm also thinking about simplifying a chain of .and(...).and(...) calls into a single array passed to a new helper. What do you think?

rawRegExp

I like rawRegExp; let's go with that unless @danielroe has a better suggestion. I will note that I don't view it as an escape hatch for special cases. I think it should be a standard function that users regularly use, at least as long as there isn't an API to create custom character sets.

  • I can open a PR adding rawRegExp after I submit the PR for issue #7 in a couple of days. Otherwise, anyone else should feel free to submit the PR.

anyOf

I think we can go one of two ways:

  1. overload the anyOf function
  2. create a separate function, anyOfArray

I'm indifferent to either choice, so we can go with whichever @didavid61202 chooses.

array wrapper for .and

This was actually on my list of things to suggest. Creating a function similar to anyOf for .and(...).and(...) should improve readability substantially.

Side Note: in the Python package I'm working on, I used operator overloading of + to improve the readability of adding patterns. Sadly, there is no operator overloading in TypeScript/JavaScript.

The main issue with custom regexp is that it bypasses the type safety of this library, which is why I've hitherto intentionally made it difficult to add custom chunks of regexp.

Instead, I think it would make sense to focus on providing any missing pieces (e.g. the API to create custom character sets you mention). Ranges (meant to be implemented in #162) and greedy/non-greedy globs are two other pieces to add in, and I'd be up for thinking about more also.

Agreed, now that I think about it, the primary use cases that I envisioned for the escape hatch were for character sets and the lazy quantifier. It is probably best to natively implement that functionality and not implement the escape hatch.

Edit (04/01/2023): I removed the requirements to close this issue and explained them below in a separate comment.

Hi @xRSquared, @danielroe
I've created a PR (#284) proposing an update to improve the readability of chaining multiple .and(...).and(...) by updating all input helpers to variadic functions.

Any suggestions or input are welcome! If everything looks good, maybe we can update or add some examples in the docs?
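As a rough illustration of what variadic helpers buy in readability, here is a standalone sketch. PatternSketch, exactlySketch, and digitSketch are hypothetical names made up for this example; the code is not taken from PR #284 or from magic-regexp.

```typescript
// Hypothetical builder for illustration only; not magic-regexp's code.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

class PatternSketch {
  constructor(private readonly source: string) {}

  // Variadic: strings are escaped, nested patterns are appended as-is.
  and(...inputs: Array<string | PatternSketch>): PatternSketch {
    const appended = inputs
      .map((input) => (typeof input === "string" ? escapeLiteral(input) : input.source))
      .join("");
    return new PatternSketch(this.source + appended);
  }

  toString(): string {
    return this.source;
  }
}

function exactlySketch(...inputs: Array<string | PatternSketch>): PatternSketch {
  return new PatternSketch("").and(...inputs);
}

const digitSketch = new PatternSketch("\\d");

// Chained style versus variadic style: both build the same source, foo\dbar.
const chained = exactlySketch("foo").and(digitSketch).and("bar");
const variadic = exactlySketch("foo", digitSketch, "bar");
```

Both styles stay available in this sketch, so existing chained code would not need to change.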

List of features needed to close this issue:

  • Clarify usage of anyOf()
    • Fixed with PR #284
  • Implement an API to create custom character sets

Hi. How far are you from implementing the custom character sets? Have you considered the API for them and for negative character sets as well?

I find myself using regexps like href="[^"]+" quite often, and as much as I'd like to use magic-regexp everywhere in the projects I'm working on, I'm sometimes forced to fall back to plain regexps.
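For discussion, one possible shape for such an API is sketched below as standalone code. The names charSetSketch and negatedCharSetSketch are hypothetical and are not magic-regexp exports. Ranges such as 1-9 are deliberately out of scope here, so a hyphen is treated as a literal character.

```typescript
// Hypothetical helpers for illustration only; not magic-regexp's API.
// Inside a character class only \, ], ^ and - need escaping.
function escapeForClass(chars: string): string {
  return chars.replace(/[\\\]^-]/g, "\\$&");
}

// Matches any single character listed in chars.
function charSetSketch(chars: string): string {
  return `[${escapeForClass(chars)}]`;
}

// Matches any single character NOT listed in chars.
function negatedCharSetSketch(chars: string): string {
  return `[^${escapeForClass(chars)}]`;
}

// Rebuilds the href="[^"]+" pattern from the comment above.
const hrefSource = 'href="' + negatedCharSetSketch('"') + '+"';
const hrefRe = new RegExp(hrefSource);

console.log(hrefRe.test('<a href="https://regexp.dev">')); // true
console.log(hrefRe.test('<a href="">')); // false: at least one character is required
```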