sblom / RegExtract

Clean & simple idiomatic C# RegEx-based line parser that emits strongly typed results.

Support for named groups into a single value?

Marvin-Brouwer opened this issue

Hi, I see you have an option for mapping named groups to properties but it's not clear to me if I can do something like this:

const string expression = @"C:/SomePath/(?<ParentDir>.*)/(?<FileName>.*)\.txt";
var result = line.Extract<string>(expression, "FileName");

This is probably a bad example, but I find myself mostly using regex to match a single group, and I like having the groups named for readability.

Judging by the tuple example, it appears your library may support this, but I'm not sure.

Thanks!

There's not a way to do that currently.

Today, RegExtract only binds named capture groups to named properties, and non-named groups to positional arguments.

I understand your scenario—I'll put some thought into it.

Possible approaches that come to mind:

  1. As with your example, accept an argument for which named capture group(s) to extract
  2. Accept a flag that ignores names on capture groups and binds positionally instead
  3. Return a dictionary or something with an indexer
  4. Return some generated type for use as dynamic
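For illustration, approach 1 can be sketched with plain System.Text.RegularExpressions. This is only a sketch of the idea, not RegExtract's API: the class NamedGroupSketch and the method TryExtractGroup are hypothetical names, and the sample path in the usage note is invented.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of RegExtract: extracts one named capture group.
public static class NamedGroupSketch
{
    public static bool TryExtractGroup(
        this string input, string pattern, string groupName, out string value)
    {
        value = string.Empty;

        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return false;
        }

        // Groups[name] does not throw for an unknown name; it returns a Group
        // whose Success is false, so a single check covers both failure cases.
        Group group = match.Groups[groupName];
        if (!group.Success)
        {
            return false;
        }

        value = group.Value;
        return true;
    }
}
```

With the pattern from the question, calling "C:/SomePath/docs/report.txt".TryExtractGroup(expression, "FileName", out var fileName) should return true and set fileName to "report". A Try-style signature is one possible design; a real implementation could just as well throw or return a nullable value.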

Ah thanks, I guess for now I can just do something like this:

const string expression = @"C:/SomePath/(?<ParentDir>.*)/(?<FileName>.*)\.txt";
var (_, fileName) = line.Extract<(string ParentDir, string FileName)>(expression);

This also kind of makes it apparent I'm not using the first group.
However, this approach fails when there is only one named group, because C# has no syntax for single-value tuples.
Or would that case work by simply doing:

const string expression = @"C:/SomePath/.*/(?<FileName>.*)\.txt";
var fileName = line.Extract<string>(expression);

So I tried something similar to this:

var expression = new Regex("\\.\\/(?<filePath>(?!https?\\\\:).*?)[\\)\\\"]", RegexOptions.ECMAScript);
var filePaths = line.Extract<IEnumerable<string>>(expression);

And I get an error saying it can't find the named expression.
I guess even if you don't support my initial request, perhaps it's a good idea to match single captures to the only/first named capture groups?

I also tried this, just for completeness' sake:

var expression = new Regex("\\.\\/(?<filePath>(?!https?\\\\:).*?)[\\)\\\"]", RegexOptions.ECMAScript);
var filePaths = line.Extract<(IEnumerable<string> filePath, object _)>(expression);

But I still get an exception telling me it can't find a named binding for "filePath".

I hope this helps, and I'd understand if it's out of scope for this library.
I just figured I'd give you some additional context.

Created Issue #14 to give you a flag that should support most of the examples above.

I totally understand your point about wanting to use Named Capture Groups just for readability, not for binding.

I did think of a possible workaround. If you don't use the capture group names anywhere else, you can still document your captures with regex comments, written (?#comment text here). So instead of (?<fileName>.*), you could label your captures like this: ((?#fileName).*). That forces positional binding semantics until I add a flag to ignore capture group names.
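To make the comment-label idea concrete, here is a minimal sketch using plain System.Text.RegularExpressions rather than RegExtract itself. It only shows that the .NET regex engine treats groups labelled with (?#...) as unnamed, numbered groups. The class CommentLabelSketch and its members are hypothetical names, and the sample path in the usage note is invented.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, NOT part of RegExtract.
public static class CommentLabelSketch
{
    // The path pattern from this thread, with the group names moved into
    // inline comments. Each (?#...) comment is ignored by the regex engine.
    public const string Pattern =
        @"C:/SomePath/((?#ParentDir).*)/((?#FileName).*)\.txt";

    // Reads the second positional group, the one labelled FileName above.
    public static string FileNameOf(string line)
    {
        Match match = Regex.Match(line, Pattern);
        return match.Success ? match.Groups[2].Value : string.Empty;
    }

    // The group names the engine actually sees for this pattern.
    public static string[] GroupNames() => new Regex(Pattern).GetGroupNames();
}
```

For an input such as "C:/SomePath/docs/report.txt", FileNameOf should return "report", and GroupNames() should contain only the numeric names "0", "1" and "2", so nothing is left for a name-based binder to match against.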

Ah thanks, that's a viable workaround.
However, I was planning on suggesting this library as a refactor, so I'd rather hold off until #14 is added.