tidyverse / stringr

A fresh approach to string manipulation in R

Home Page: https://stringr.tidyverse.org

New feature to easily capture text before or after the nth instance of a delimiter without regex

jhtrico1850 opened this issue

Microsoft just released the TEXTAFTER and TEXTBEFORE functions in Excel, which make it easy to extract the text before or after the Nth space, colon, etc. They are similar to what Power Query has had for a while with Text.BeforeDelimiter, but exposed in the main Excel formula interface rather than buried within Power Query.

Of course, it's possible today to build a regex with the existing stringr functions. Something I deal with often is importing PDFs and then parsing them to extract the relevant portions. It's quite tedious and error-prone to get what I want with regex (say, the 3rd pair within 5 pairs of numbers like 10 3 4 5 4). With the old Power Query formula, and now the regular Excel formula, I can easily describe exactly what I want: just get the text before/after the Nth occurrence of the specified pattern. I hope this comes to stringr, or let me know if I'm missing something that already exists and is similar to TEXTAFTER/TEXTBEFORE or Text.BeforeDelimiter/Text.AfterDelimiter.
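For reference, the requested behavior can be approximated today by combining existing stringr functions. The following is only a minimal sketch: `text_before()` and `text_after()` are hypothetical helper names, not part of stringr. They assume a single input string and a literal (non-regex) delimiter, and return `NA` when the string contains fewer than `n` delimiters.

```r
library(stringr)

# Hypothetical helpers (not part of stringr): return the text before or after
# the n-th occurrence of a literal delimiter in a single string.
text_before <- function(string, delim, n = 1) {
  # One row per match, with integer columns "start" and "end"
  loc <- str_locate_all(string, fixed(delim))[[1]]
  if (nrow(loc) < n) return(NA_character_)
  str_sub(string, 1, loc[n, "start"] - 1)
}

text_after <- function(string, delim, n = 1) {
  loc <- str_locate_all(string, fixed(delim))[[1]]
  if (nrow(loc) < n) return(NA_character_)
  str_sub(string, loc[n, "end"] + 1)
}

x <- "10 3 4 5 4"
text_before(x, " ", 2)  # "10 3"
text_after(x, " ", 2)   # "4 5 4"

# Third value: everything after the 2nd space, then up to the next space
text_before(text_after(x, " ", 2), " ", 1)  # "4"
```

Wrapping the delimiter in `fixed()` keeps it literal, so characters such as `.` or `|` need no escaping. Extending the helpers to vectors of strings or to counting from the end, as the Excel functions allow, is left out of this sketch.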

Unfortunately that's out of scope for stringr, because we use stringi, which in turn uses the ICU regular expression engine, which doesn't support this feature.