book example
https://docs.microsoft.com/ko-kr/dotnet/standard/base-types/character-classes-in-regular-expressions
https://www.unicode.org/versions/Unicode8.0.0/
- Regular expressions are case sensitive by default. Turn on case-insensitive matching when needed. Option 1: specify the inline directive as part of the pattern, e.g. (?i)A. Option 2: pass the option as an argument, e.g. Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase)
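The two options can be tried out with a minimal sketch. It assumes Python's `re` module rather than .NET's `Regex` class; `re.IGNORECASE` plays the role of `RegexOptions.IgnoreCase`, and the sample text is made up for illustration.

```python
import re

# Hypothetical sample text, not from the notes.
text = "An apple a day"

# Default matching is case sensitive: only the uppercase 'A' matches.
case_sensitive = re.findall(r"A", text)

# Option 1: inline directive at the start of the pattern.
inline_flag = re.findall(r"(?i)A", text)

# Option 2: pass the flag as an argument.
flag_argument = re.findall(r"A", text, re.IGNORECASE)

print(case_sensitive)  # ['A']
print(inline_flag)     # ['A', 'a', 'a', 'a']
print(flag_argument)   # ['A', 'a', 'a', 'a']
```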
- Single Character "OR" condition
- Problem: Find all occurrences of letter 'a' or 'b'
- Pattern: a|b
- Text: this is a big text
- String literal
- Problem: Find all occurrences of string 'ab'
- Pattern: ab
- Text: this is absolute test
- Set based - Square brackets [set membership]
- Problem: Find all occurrences of a or b
- Pattern: [ab]
- Text: this is a big test
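The three patterns so far (alternation, string literal, set membership) can be compared side by side. This is a small sketch that assumes Python's `re` module; these basic constructs should behave the same way in .NET, but that is not verified here.

```python
import re

# Alternation: 'a' OR 'b'.
alternation = re.findall(r"a|b", "this is a big text")

# String literal: the two characters 'ab' in sequence.
literal = re.findall(r"ab", "this is absolute test")

# Set membership: any single character that is 'a' or 'b'.
char_set = re.findall(r"[ab]", "this is a big test")

print(alternation)  # ['a', 'b']
print(literal)      # ['ab']
print(char_set)     # ['a', 'b']
```

For single characters, `a|b` and `[ab]` find the same matches; the set form is usually the shorter way to write it.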
- Set based - Negation '^'
- Problem: Find all occurrences of characters that are NOT (a or b)
- Pattern: [^ab]
- Text: this is a big test
- ^ negates the set only when it is the first character inside the brackets
- Pattern: [a^b] => indicates a set with members(a, b, ^) and will match a literal ^
- Text: this is a ^ big test
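The effect of the caret's position can be checked with a short sketch, again assuming Python's `re` module as a stand-in for the .NET engine.

```python
import re

# '^' first inside the brackets: match everything EXCEPT 'a' and 'b'.
negated = re.findall(r"[^ab]", "this is a big test")

# '^' anywhere else inside the brackets: it is just another set member.
caret_member = re.findall(r"[a^b]", "this is a ^ big test")

print("".join(negated))  # every character of the text except 'a' and 'b'
print(caret_member)      # ['a', '^', 'b']
```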
- Range of characters
- Problem: Find all occurrences of (a, b, c, d)
- Pattern: [a-d] (is equal to [abcd])
- Text: this is a definitive test
- Multiple ranges of characters
- Problem: Find all occurrences of (a, b, c, d, x, y, z, 0, 1, 2, 3)
- Pattern: [a-dx-z0-3]
- Text: x-ray 3 won't work for this test
- Negate the whole range with ^
- Problem: Find all occurrences of characters not in (a, b, c, d, x, y, z, 0, 1, 2, 3)
- Pattern: [^a-dx-z0-3]
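A quick way to see ranges in action, sketched with Python's `re` module (an assumption; the notes target .NET). The short text `x-ray 3` used for the negated case is a trimmed version of the sample above.

```python
import re

# Single range: [a-d] is the same as [abcd].
single_range = re.findall(r"[a-d]", "this is a definitive test")

# Several ranges in one set: a-d, x-z and 0-3.
multi_range = re.findall(r"[a-dx-z0-3]", "x-ray 3 won't work for this test")

# Negated ranges: anything NOT in a-d, x-z or 0-3.
negated_range = re.findall(r"[^a-dx-z0-3]", "x-ray 3")

print(single_range)   # ['a', 'd']
print(multi_range)    # ['x', 'a', 'y', '3']
print(negated_range)  # ['-', 'r', ' ']
```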
- Wildcard character . (dot)
- The dot (full stop) matches any character except the newline \n
- Overusing the dot (for example .* on long input) can hurt performance because of backtracking. Use it carefully.
- Escape with \
- Problem: Find all occurrences of '.' (dot)
- Pattern: \.
- Text: This. Is a Test.
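The difference between the wildcard dot and an escaped dot can be sketched as follows, assuming Python's `re` module (where the unescaped dot also skips `\n` by default).

```python
import re

# Unescaped dot: matches every character except the newline.
wildcard = re.findall(r".", "a\nb")

# Escaped dot: matches only a literal '.'.
literal_dots = re.findall(r"\.", "This. Is a Test.")

print(wildcard)      # ['a', 'b']
print(literal_dots)  # ['.', '.']
```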
- Control Characters(tab, newline, carriage return and so forth)
- Problem: Find all occurrences of tab
- Pattern: \t
- Text: One .Two
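A small sketch of matching a control character, assuming Python's `re` module. The text with embedded tabs is a made-up example, since the tab in the sample text above does not survive as a visible character.

```python
import re

# Hypothetical text containing two tab characters.
text = "One\tTwo\tThree"

# \t in the pattern matches a tab; collect the position of each one.
tab_positions = [m.start() for m in re.finditer(r"\t", text)]

print(tab_positions)  # [3, 7]
```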
- Anchors are special syntax for specifying a position rather than a character:
- Start of string or line
- End of string or line
- Word boundary
- And so forth...
- Search for text
- Problem: Find all occurrences of word 'log'
- Pattern: log
- Text: catalog of log
- Word boundary
- Pattern: \blog\b => \b tells the engine to match 'log' only at word boundaries
- Text: catalog of log
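The plain search and the word-boundary search can be compared by looking at where each one matches. This sketch assumes Python's `re` module, whose `\b` should behave like the .NET one for this ASCII text.

```python
import re

text = "catalog of log"

# Plain search: also finds 'log' inside 'catalog'.
plain = [m.start() for m in re.finditer(r"log", text)]

# Word boundary on both sides: only the standalone word 'log'.
whole_word = [m.start() for m in re.finditer(r"\blog\b", text)]

print(plain)       # [4, 11]
print(whole_word)  # [11]
```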
- Start of string or line ^
- Problem: Find occurrences of 'apple' at beginning of string or line
- Pattern: ^apple
- Text: apple Grows on apple tree
- ^ - Multi-line Text
- Pattern: ^apple
- Text: apple 1 grows on apple tree apple 2 grows on apple tree
- Internal string: "apple 1 grows on apple tree\r\napple 2 grows on apple tree\r\n". Windows uses \r\n to represent a new line, so multi-line mode must be turned on for ^ to match at the start of each embedded line
- ^ - Turn on multi-line mode (?m)
- Problem: Find occurrences of 'apple' at beginning of string or line
- Pattern: (?m)^apple
- Text: apple 1 grows on apple tree apple 2 grows on apple tree
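The effect of multi-line mode on ^ can be sketched with the internal string shown above. The example assumes Python's `re` module, where the inline `(?m)` directive has the same meaning.

```python
import re

text = "apple 1 grows on apple tree\r\napple 2 grows on apple tree\r\n"

# Without multi-line mode, ^ matches only at the start of the whole string.
single_line = [m.start() for m in re.finditer(r"^apple", text)]

# With (?m), ^ also matches right after each \n.
multi_line = [m.start() for m in re.finditer(r"(?m)^apple", text)]

print(single_line)  # [0]
print(multi_line)   # [0, 29]
```

The 'apple' in the middle of each line is never matched, because ^ anchors the match to the start of a line.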
- End of string or line $ (matches end of string or \n)
- Problem: Find occurrences of 'apple' at end of string or line
- Pattern: apple$
- Text: apple apple
- $ - Multi-line text
- Problem: Find occurrences of 'apple' at end of string or line
- Pattern: apple$
- Text: apple apple
- Internal String: "apple\r\napple"
- $ - Turn on multi-line mode (?m) and include \r as optional character
- Problem: Find occurrences of 'apple' at end of string or line
- Pattern: (?m)apple\r?$ (the $ sign only matches before \n or at the end of the string, but Windows uses \r\n as its line break, so \r? is needed)
- Text: apple apple
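Why the optional \r is needed becomes visible when the three variants are run against the internal string "apple\r\napple". This is a sketch assuming Python's `re` module, where $ treats \r\n the same way for this input.

```python
import re

text = "apple\r\napple"

# No multi-line mode: $ matches only at the end of the string.
end_only = [m.start() for m in re.finditer(r"apple$", text)]

# Multi-line mode alone: the first 'apple' is followed by \r, not \n,
# so $ still does not match there.
multi_line = [m.start() for m in re.finditer(r"(?m)apple$", text)]

# Multi-line mode plus an optional \r: both lines match.
with_cr = [m.start() for m in re.finditer(r"(?m)apple\r?$", text)]

print(end_only)    # [7]
print(multi_line)  # [7]
print(with_cr)     # [0, 7]
```

Note that the first match of the last pattern includes the trailing \r, so it may need to be stripped before the matched text is used.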
- Character classes are ready-made shortcuts that represent a set of characters
- Decimal Digit \d
- Problem: Check if valid decimal digit(0-9)
- Pattern 1: [0123456789]
- Pattern 2: [0-9]
- Pattern 3: \d
- Not a decimal digit: \D
- Word Character \w
- Problem: Check if a character is a valid letter of an alphabet (any language) or a digit (digits are included too)
- Pattern: \w
- Text: F16, F18, ㄱ, ㄴ
- Not a word character: \W
- White space character \s
- Matches space, tab, carriage return, new line and so forth
- Problem: Check for white space character
- Pattern: \s
- Text: One tab space Two tab
- Not a white space character: \S
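The shortcut classes and their negations can be tried out as below. The sketch assumes Python's `re` module, which is also Unicode-aware for `str` patterns; the short sample strings are adapted from the texts above.

```python
import re

# \d: decimal digits; \D: everything that is not a digit.
digits = re.findall(r"\d", "F16, F18")
non_digits = re.findall(r"\D", "F16")

# \w: letters of any language and digits (the Hangul letter matches too).
word_chars = re.findall(r"\w", "F16, ㄱ")

# \s: space, tab, carriage return, newline and so forth.
spaces = re.findall(r"\s", "One\t Two\r\n")

print(digits)      # ['1', '6', '1', '8']
print(non_digits)  # ['F']
print(word_chars)  # ['F', '1', '6', 'ㄱ']
print(spaces)      # ['\t', ' ', '\r', '\n']
```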
- Unicode category or block \p{category}
- Problem: Find occurrences of punctuation characters
- Pattern: \p{P} => all punctuation characters
- Text: "one,two;three!FOUR?Five*" => ",;!?*"
- Problem: Find uppercase characters
- Pattern: \p{Lu} => uppercase letters (in any script, not only English)
- Text: "one,two;three!FOUR?Five*" => "FOURF"
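Python's built-in `re` module does not support `\p{...}`, so the sketch below does not use a regex at all. It emulates the two category tests with the standard `unicodedata` module. This is a swapped-in technique for illustration, not how .NET evaluates `\p{P}` or `\p{Lu}`.

```python
import unicodedata

text = "one,two;three!FOUR?Five*"

# Rough equivalent of \p{P}: any general category starting with 'P'
# (Po, Pd, Ps, Pe and so on).
punctuation = [ch for ch in text if unicodedata.category(ch).startswith("P")]

# Rough equivalent of \p{Lu}: the 'uppercase letter' category.
uppercase = [ch for ch in text if unicodedata.category(ch) == "Lu"]

print("".join(punctuation))  # ,;!?*
print("".join(uppercase))    # FOURF
```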