imranpollob / learn-regex

Example-based regex learning

Learn Regex

Example-based regex cheatsheet. The examples assume the /g flag is on.

Character Handling

basic match

Matches exact characters.

man => The man and woman.

[abc] character set

Matches any character in the set.

[tT]he => The man and the woman.
[bc]at => The cat and the bat and the rat are having a chat.

[^abc] negated set

Matches any character that is not in the set.

[^bc]at => The cat and the bat and the rat are having a chat.
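
To see which parts of the sentence a set and its negation pick out, here is a minimal sketch. It assumes JavaScript as the host language (the cheatsheet names none), and the variable names are only illustrative.

```javascript
const text = "The cat and the bat and the rat are having a chat.";

// [bc]at: "at" preceded by b or c
const inSet = text.match(/[bc]at/g);

// [^bc]at: "at" preceded by any character other than b or c
const notInSet = text.match(/[^bc]at/g);

console.log(inSet);    // [ 'cat', 'bat' ]
console.log(notInSet); // [ 'rat', 'hat' ]
```

Note that the negated set still consumes one character, so it finds "hat" inside "chat".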

[a-z] range set

Matches any character between the two specified characters (inclusive).

[0-9] => My ID number is 0123456789.

. any character

Matches any character except linebreaks.

.at => The cat and the bat and the rat are having a chat.

Equivalent: [^\n\r] (JavaScript also excludes the line separators \u2028 and \u2029)

\w word

Matches any word character (alphanumeric or underscore).

\w => PI (π) = 3.1416

Equivalent: [a-zA-Z0-9_]

\W not word

Matches any non-word character (anything other than alphanumerics and underscore).

\W => PI (π) = 3.1416

Equivalent: [^a-zA-Z0-9_]
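
The ASCII-only equivalents above matter for non-English text. A small sketch, assuming JavaScript's regex engine, where π counts as a non-word character:

```javascript
const text = "PI (π) = 3.1416";

// \w: ASCII letters, digits and underscore only
const wordChars = text.match(/\w/g);

// \W: everything else, including spaces, punctuation and π
const nonWordChars = text.match(/\W/g);

console.log(wordChars.join(""));         // PI31416
console.log(nonWordChars.includes("π")); // true
console.log(nonWordChars.length);        // 8
```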

\d digit

Matches any digit.

\d => PI (π) = 3.1416

Equivalent: [0-9]

\D not digit

Matches any non digit.

\D => PI (π) = 3.1416

Equivalent: [^0-9]

\s whitespace

Matches any whitespace.

\s => PI (π) = 3.1416

\S not whitespace

Matches any non whitespace.

\S => PI (π) = 3.1416

Anchors

^ beginning

Matches at the beginning.

^P[iI] => PI (π) = 3.1416

Don't confuse this with [^iI], which means any character other than i and I.

$ end

Matches at the end.

\d$ => PI (π) = 3.1416
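
A short sketch of both anchors, again assuming JavaScript. It also contrasts ^ as an anchor with ^ inside a character set; the variable names are illustrative.

```javascript
const text = "PI (π) = 3.1416";

// ^ outside a set anchors the match to the start of the input
const atStart = text.match(/^P[iI]/g);

// $ anchors the match to the end of the input
const atEnd = text.match(/\d$/g);

// ^ as the first character inside [] negates the set instead
const negated = "Pi".match(/[^iI]/g);

console.log(atStart); // [ 'PI' ]
console.log(atEnd);   // [ '6' ]
console.log(negated); // [ 'P' ]
```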

\b word boundary

Matches the position between a word character and a non-word character, i.e. the start or end of a word.

an\b => The man and woman.

\B not word boundary

Matches any position that is not a word boundary.

an\B => The man and woman.
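
Because \b and \B match positions rather than characters, it helps to mark what they select. A minimal sketch assuming JavaScript; the bracket markers are only for illustration.

```javascript
const text = "The man and woman.";

// "an" followed by a word boundary: the end of "man" and "woman"
const atBoundary = text.replace(/an\b/g, "[an]");

// "an" followed by another word character: the start of "and"
const notAtBoundary = text.replace(/an\B/g, "[an]");

// \b also matches at the start of a word, so this skips "woman"
const wholeWord = text.match(/\bman\b/g);

console.log(atBoundary);    // The m[an] and wom[an].
console.log(notAtBoundary); // The man [an]d woman.
console.log(wholeWord);     // [ 'man' ]
```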

Escaped characters

\ reserved character

Escapes a reserved character (+*?^$\.[]{}()|/) so that it matches literally.

\(\S\) => PI (π) = 3.1416
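
The difference between an escaped and an unescaped reserved character can be sketched as follows, assuming JavaScript:

```javascript
const text = "PI (π) = 3.1416";

// Escaped parentheses match literal "(" and ")"
const parenthesized = text.match(/\(\S\)/g);

// An escaped dot matches only a literal "."
const literalDots = text.match(/\./g);

// An unescaped dot matches every character on the line
const anyChars = text.match(/./g);

console.log(parenthesized);   // [ '(π)' ]
console.log(literalDots);     // [ '.' ]
console.log(anyChars.length); // 15
```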

\t tab

Matches a TAB character.

\t => Tab1    Tab2

others

\n = Matches line feed character 
\v = Matches vertical tab character 
\f = Matches form feed character 
\r = Matches carriage return character 
\0 = Matches null character

octal, hexadecimal, unicode

© in octal is 251

\251 => RegExr is ©2014

© in hexadecimal is A9

\xA9 => RegExr is ©2014

© in unicode is 00A9

\u00A9 => RegExr is ©2014
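
All three escapes refer to the same character. A sketch assuming JavaScript, where the octal form is a legacy escape that is rejected once the u flag is set:

```javascript
const text = "RegExr is ©2014";

// Octal 251, hexadecimal A9 and Unicode 00A9 all denote ©
const byOctal = /\251/.test(text);   // legacy form, not valid with the u flag
const byHex = /\xA9/.test(text);
const byUnicode = /\u00A9/.test(text);

console.log(byOctal, byHex, byUnicode); // true true true
```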

Quantifiers

* zero or more

Matches 0 or more of the preceding token.

\s*man\s* => The man and woman.

+ one or more

Matches 1 or more of the preceding token.

w\w+n => The man and woman.

? optional

Matches 0 or 1 of the preceding token.

[cC]olou?r => Color or colour?

{} quantifier

Matches the specific quantity of the preceding character.
{3} = matches exactly 3
{1,3} = matches 1 to 3
{3,} = matches 3 or more

\d{3} => PI (π) = 3.1416
\d{1,4} => PI (π) = 3.1416
\d{2,} => PI (π) = 3.1416
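
The quantifier examples can be checked with a short sketch, assuming JavaScript; the variable names are illustrative.

```javascript
const pi = "PI (π) = 3.1416";

const exactlyThree = pi.match(/\d{3}/g);  // runs of exactly 3 digits
const oneToFour = pi.match(/\d{1,4}/g);   // runs of 1 to 4 digits
const twoOrMore = pi.match(/\d{2,}/g);    // runs of 2 or more digits

// ? makes the preceding "u" optional
const optionalU = "Color or colour?".match(/[cC]olou?r/g);

console.log(exactlyThree); // [ '141' ]
console.log(oneToFour);    // [ '3', '1416' ]
console.log(twoOrMore);    // [ '1416' ]
console.log(optionalU);    // [ 'Color', 'colour' ]
```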

| alternation

Acts like an OR statement. Unlike [], which chooses between single characters, it chooses between whole expressions.

[cbr]at|and => The cat and the bat and the rat are having a chat.
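
Because | splits the whole expression, a group is needed to limit its reach. A minimal sketch assuming JavaScript:

```javascript
const text = "The cat and the bat and the rat are having a chat.";

// Either side of | is a complete alternative
const either = text.match(/[cbr]at|and/g);

// Grouped: "the " followed by "cat" or "bat"
const grouped = text.match(/the (cat|bat)/g);

// Ungrouped: "the cat" or just "bat"
const ungrouped = text.match(/the cat|bat/g);

console.log(either);    // [ 'cat', 'and', 'bat', 'and', 'rat' ]
console.log(grouped);   // [ 'the bat' ]
console.log(ungrouped); // [ 'bat' ]
```

The sentence starts with a capital "The", so the case-sensitive "the cat" alternative never matches here.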

Groups

() capturing group

Groups characters and captures (stores) them for further processing.

(\d{4})-(\d{2})-(\d{2}) => Today is 2023-11-02.

(?<>) named capturing group

Assigns a name to the group for further processing.

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) => Today is 2023-11-02.
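
One way to read the captured values, sketched in JavaScript (an assumption; other engines expose groups differently):

```javascript
const text = "Today is 2023-11-02.";
const match = text.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

// Named groups are available by name...
const { year, month, day } = match.groups;

// ...and still by position, like unnamed capturing groups
const firstGroup = match[1];

console.log(year, month, day); // 2023 11 02
console.log(firstGroup);       // 2023
```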

() \1 group reference

Matches the same text that the numbered group captured.

(\d{4})-(\d{2})-\2 => Today is 2023-11-11.
# \2 refers to the second group, so the day must repeat the month
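
A backreference matches whatever text its group captured, not the group's pattern again. A minimal sketch assuming JavaScript, with sample strings chosen for illustration:

```javascript
// \1 must repeat the exact two digits captured by group 1
const repeatedPair = /(\d{2})-\1/;

const same = repeatedPair.test("2023-11-11");      // "11-11" repeats
const different = repeatedPair.test("2023-11-02"); // "11-02" does not

// A common use: finding an accidentally doubled word
const doubled = "between the the two".match(/\b(\w+) \1\b/);

console.log(same);       // true
console.log(different);  // false
console.log(doubled[0]); // the the
```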

(?:) non capturing group

Groups characters but doesn't capture.

(?:\d{4})-(?<month>\d{2})-(\d{2}) => Today is 2023-11-02.
# captures two groups: one named "month", the other unnamed

Lookaround

(?=) positive lookahead

Gets all the matches that are followed by a specific pattern. The lookahead itself is not part of the match.

\d(?=px) => 1pt 2px 3em 4px

(?!) negative lookahead

Gets all the matches that are not followed by a specific pattern.

\d(?!px) => 1pt 2px 3em 4px

(?<=) positive lookbehind

Gets all the matches that are preceded by a specific pattern. The lookbehind itself is not part of the match.

(?<=name)\d => name1 age1 name2 sex1

(?<!) negative lookbehind

Gets all the matches that are not preceded by a specific pattern.

(?<!name)\d => name1 age1 name2 sex1
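
The four lookarounds side by side, as a sketch assuming a JavaScript engine with lookbehind support. Only the digit is returned in each case because lookarounds consume no characters.

```javascript
const sizes = "1pt 2px 3em 4px";
const followedByPx = sizes.match(/\d(?=px)/g);    // positive lookahead
const notFollowedByPx = sizes.match(/\d(?!px)/g); // negative lookahead

const fields = "name1 age1 name2 sex1";
const afterName = fields.match(/(?<=name)\d/g);    // positive lookbehind
const notAfterName = fields.match(/(?<!name)\d/g); // negative lookbehind

console.log(followedByPx);    // [ '2', '4' ]
console.log(notFollowedByPx); // [ '1', '3' ]
console.log(afterName);       // [ '1', '2' ]
console.log(notAfterName);    // [ '1', '1' ]
```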

Flags

g global search

Finds all matches instead of stopping after the first one.

# default: finds only the first match
/at/ => The cat and the bat and the rat are having a chat.

# adding the /g flag finds all matches
/at/g => The cat and the bat and the rat are having a chat.
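
The effect of the g flag can be sketched as follows, assuming JavaScript's String.prototype.match:

```javascript
const text = "The cat and the bat and the rat are having a chat.";

// Without g: only the first match, with its position
const first = text.match(/at/);

// With g: every match
const all = text.match(/at/g);

console.log(first[0], first.index); // at 5
console.log(all.length);            // 4
```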

i case insensitive

Case insensitive search.

/the/gi => The man and the woman.

m multiline

Makes the anchors ^ and $ match at the start and end of each line instead of only the whole input. \b and \B are not affected.

# default
/at$/g =>  The cat
        and the bat
        and the rat

# adding the /m flag makes $ match at the end of each line
/at$/gm =>  The cat
        and the bat
        and the rat
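
A minimal sketch of the same three-line input, assuming JavaScript, with the line breaks written as \n:

```javascript
const text = "The cat\nand the bat\nand the rat";

// Without m, $ matches only at the end of the whole input
const endOfInput = text.match(/at$/g);

// With m, $ also matches before each line break
const endOfLines = text.match(/at$/gm);

console.log(endOfInput.length); // 1
console.log(endOfLines.length); // 3
```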

s dotall

Dot (.) will match any character, including newline.

# default
/.+/ => The man
        and the woman.

# adding the /s flag lets . match newlines
/.+/s => The man
        and the woman.
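
The same comparison as a sketch, assuming a JavaScript engine that supports the s flag:

```javascript
const text = "The man\nand the woman.";

// Without s, . stops at the line break
const firstLine = text.match(/.+/)[0];

// With s, . also matches the line break
const everything = text.match(/.+/s)[0];

console.log(firstLine);           // The man
console.log(everything === text); // true
```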

Greedy vs Lazy Matching

Quantifiers are greedy by default: they match as many characters as possible.

.+at => The cat and the bat and the rat are having a chat.

Adding ? after a quantifier makes it lazy: it matches as few characters as possible.

.+?at => The cat and the bat and the rat are having a chat.
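
The difference shows up in how far each match extends. A sketch assuming JavaScript, with the g flag on as in the other examples:

```javascript
const text = "The cat and the bat and the rat are having a chat.";

// Greedy: .+ runs to the end, then backs up to the last "at"
const greedy = text.match(/.+at/g);

// Lazy: .+? stops at the first "at" it can reach, then matching resumes
const lazy = text.match(/.+?at/g);

console.log(greedy);
// [ 'The cat and the bat and the rat are having a chat' ]
console.log(lazy);
// [ 'The cat', ' and the bat', ' and the rat', ' are having a chat' ]
```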
