Regular Expressions

Pattern Matching Reference

A comprehensive guide to regex syntax, patterns, and best practices across different flavors and implementations. Master the art of pattern matching.

01

Quick Reference

Essential regex syntax at a glance. Your go-to cheat sheet for common patterns and metacharacters.

Character Classes

.       Any character except newline
\d      Digit [0-9]
\D      Not digit [^0-9]
\w      Word character [a-zA-Z0-9_]
\W      Not word character
\s      Whitespace [ \t\r\n\f\v]
\S      Not whitespace
[abc]   Any of a, b, or c
[^abc]  Not a, b, or c
[a-z]   Range a through z

Quantifiers

*       0 or more (greedy)
+       1 or more (greedy)
?       0 or 1 (greedy)
{n}     Exactly n times
{n,}    n or more times
{n,m}   Between n and m times
*?      0 or more (lazy)
+?      1 or more (lazy)
??      0 or 1 (lazy)
*+      Possessive (PCRE/Java)

Anchors & Boundaries

^       Start of string/line
$       End of string/line
\b      Word boundary
\B      Not word boundary
\A      Start of string (always)
\Z      End of string (before a final newline in PCRE; absolute in Python)
\z      Absolute end (PCRE/Java/.NET)
\G      Start of search (Perl/PCRE)

Groups & Capturing

(...)           Capturing group
(?:...)         Non-capturing group
(?<name>...)    Named group (PCRE/.NET)
(?P<name>...)   Named group (Python)
\1, \2, ...     Backreference
\k<name>        Named backreference
(?>...)         Atomic group

Lookaround

(?=...)     Positive lookahead
(?!...)     Negative lookahead
(?<=...)    Positive lookbehind
(?<!...)    Negative lookbehind

Flags/Modifiers

i   Case-insensitive
g   Global (match all)
m   Multiline (^ $ match lines)
s   Dotall (. matches newline)
x   Extended (ignore whitespace)
u   Unicode
02

Basic Patterns

Foundational regex concepts: literals, metacharacters, and character classes.

Literals

Most characters match themselves literally. Regular text is interpreted character by character:

cat         Matches "cat"
hello       Matches "hello"
123         Matches "123"

Metacharacters

Special characters with reserved meanings. Must be escaped with backslash to match literally:

. ^ $ * + ? { } [ ] \ | ( )

To match these literally, prefix with backslash:

\.          Matches a literal dot
\$          Matches a dollar sign
\(hello\)   Matches "(hello)" literally
\*          Matches a literal asterisk

The Dot Metacharacter

The dot . matches any single character except newline:

.           Matches any single character except newline
a.c         Matches "abc", "a9c", "a c", etc.
.....       Matches any 5 characters
With the /s flag (dotall mode), the dot matches newlines too.
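
As a quick check of the dot's newline behavior, here is a minimal Python sketch using the standard re module, where re.DOTALL plays the role of the /s flag. The sample string is made up for illustration.

```python
import re

text = "a\nc"  # hypothetical sample: 'a', newline, 'c'

# By default the dot refuses to match the newline.
default_match = re.search(r"a.c", text)

# With re.DOTALL (the "s" flag) the dot matches the newline as well.
dotall_match = re.search(r"a.c", text, re.DOTALL)

print(default_match)                 # None
print(repr(dotall_match.group(0)))   # 'a\nc'
```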

Character Classes

Define a set of characters to match. Use square brackets to create a class:

[abc]           Matches 'a', 'b', or 'c'
[aeiou]         Matches any vowel
[0-9]           Matches any digit
[a-z]           Matches any lowercase letter
[a-zA-Z]        Matches any letter
[a-zA-Z0-9]     Matches any alphanumeric character

Negated Character Classes

Use ^ at the start to negate:

[^abc]          Matches anything except 'a', 'b', or 'c'
[^0-9]          Matches anything except digits
[^\s]           Matches any non-whitespace character

Predefined Character Classes

\d      Digit [0-9]
\D      Non-digit [^0-9]
\w      Word character [a-zA-Z0-9_]
\W      Non-word character [^a-zA-Z0-9_]
\s      Whitespace [ \t\r\n\f\v]
\S      Non-whitespace [^ \t\r\n\f\v]
\h      Horizontal whitespace (PCRE)
\v      Vertical whitespace (PCRE)
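
The shorthand classes can be tried out with a short Python sketch. The input string is hypothetical. Note that for str patterns in Python 3, \d, \w, and \s are Unicode-aware unless re.ASCII is passed, so they can match more than the ASCII sets listed above.

```python
import re

sample = "Order 66: 3 items\tshipped_ok"  # hypothetical input

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of word characters
non_space = re.findall(r"\S+", sample)   # runs of non-whitespace

print(digits)     # ['66', '3']
print(words)      # ['Order', '66', '3', 'items', 'shipped_ok']
print(non_space)  # ['Order', '66:', '3', 'items', 'shipped_ok']
```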
03

Quantifiers

Control how many times an element repeats. Understanding greedy vs lazy is essential.

Basic Quantifiers

Quantifier   Meaning           Example   Matches
*            0 or more         a*        "", "a", "aa", "aaa", ...
+            1 or more         a+        "a", "aa", "aaa", ...
?            0 or 1            a?        "", "a"
{n}          Exactly n         a{3}      "aaa"
{n,}         n or more         a{2,}     "aa", "aaa", "aaaa", ...
{n,m}        Between n and m   a{2,4}    "aa", "aaa", "aaaa"
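
The table above can be verified mechanically. This is a small Python sketch, assuming re.fullmatch so that each sample must be matched in its entirety; the cases dictionary is just an illustrative structure.

```python
import re

# pattern -> strings that should match in full
cases = {
    r"a*": ["", "a", "aaa"],
    r"a+": ["a", "aaa"],
    r"a?": ["", "a"],
    r"a{3}": ["aaa"],
    r"a{2,}": ["aa", "aaaa"],
    r"a{2,4}": ["aa", "aaa", "aaaa"],
}

for pattern, samples in cases.items():
    for s in samples:
        assert re.fullmatch(pattern, s), (pattern, s)

# Counter-examples: too few or too many repetitions
print(re.fullmatch(r"a+", ""))            # None
print(re.fullmatch(r"a{2,4}", "aaaaa"))   # None
```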

Greedy vs Lazy Quantifiers

Greedy (Default)

Match as much as possible. Quantifiers are greedy by default:

<.+>        In "<p>Hello</p>", matches entire "<p>Hello</p>"
\d+         In "12345", matches "12345"

Lazy (Non-Greedy)

Match as little as possible. Add ? after quantifier:

<.+?>       In "<p>Hello</p>", matches "<p>" and "</p>" separately
\d+?        In "12345" with global flag, matches "1", "2", "3", "4", "5"

Common lazy quantifiers:

*?      0 or more (lazy)
+?      1 or more (lazy)
??      0 or 1 (lazy)
{n,}?   n or more (lazy)
{n,m}?  Between n-m (lazy)
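
The greedy and lazy behaviors described above can be observed with re.findall. This is a minimal Python sketch reusing the sample strings from this section.

```python
import re

html = "<p>Hello</p>"

greedy = re.findall(r"<.+>", html)    # takes as much as possible
lazy = re.findall(r"<.+?>", html)     # takes as little as possible

print(greedy)  # ['<p>Hello</p>']
print(lazy)    # ['<p>', '</p>']

print(re.findall(r"\d+", "12345"))    # ['12345']
print(re.findall(r"\d+?", "12345"))   # ['1', '2', '3', '4', '5']
```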
Practical Example:

Extract text between quotes:

// Greedy - matches from first to last quote
"(.*)"      In '"Hello" and "World"' → '"Hello" and "World"'

// Lazy - matches each quoted string separately
"(.*?)"     In '"Hello" and "World"' → '"Hello"' and '"World"'

Possessive Quantifiers

Available in PCRE, Java, and Python 3.11+ (.NET lacks them but offers atomic groups). Never backtrack once matched:

*+      0 or more (possessive)
++      1 or more (possessive)
?+      0 or 1 (possessive)
{n,}+   n or more (possessive)
{n,m}+  Between n-m (possessive)
\d++\d never matches: the possessive ++ consumes every digit and will not backtrack to leave one for the final \d. Use possessive quantifiers to limit backtracking and to guard against catastrophic backtracking.
04

Anchors & Boundaries

Match positions, not characters. Zero-width assertions that anchor patterns to specific locations.

Line Anchors

^       Start of string (or line in multiline mode)
$       End of string (or line in multiline mode)

Examples:

^Hello      Matches "Hello" only at start of string
world$      Matches "world" only at end of string
^Hello$     Matches entire string "Hello" (nothing before or after)

Multiline Mode

Without /m flag:

^       Matches start of entire string
$       Matches end of entire string

With /m flag:

^       Matches start of string AND start of each line
$       Matches end of string AND end of each line
Multiline Text Example:
Line 1
Line 2
Line 3
/^Line/     Matches "Line" at the start of the string only (1 match)
/^Line/m    Matches "Line" at start of each line (3 matches)
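
The same comparison in Python, as a minimal sketch: re.MULTILINE corresponds to the /m flag.

```python
import re

text = "Line 1\nLine 2\nLine 3"

without_m = re.findall(r"^Line", text)
with_m = re.findall(r"^Line", text, re.MULTILINE)

print(len(without_m))  # 1
print(len(with_m))     # 3
```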

Absolute Anchors

\A      Start of string (always, ignores multiline mode)
\Z      End of string, or before a final newline (PCRE/Java/.NET); absolute end in Python
\z      Absolute end of string (PCRE/Java/.NET)

Word Boundaries

\b      Word boundary (between \w and \W)
\B      Not word boundary

Examples:

\bcat\b     Matches "cat" in "the cat sat" but not in "concatenate"
\Bcat\B     Matches "cat" in "concatenate" but not in "the cat sat"
\bcat       Matches "cat" at start of word: "cat", "caterpillar"
cat\b       Matches "cat" at end of word: "cat", "bobcat"
Word boundary positions occur:
  • Between \w and \W
  • Between start of string and \w
  • Between \w and end of string
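
The four boundary examples above can be checked against a few sample words. This is a Python sketch; the word list is chosen only to mirror the examples.

```python
import re

words = ["the cat sat", "concatenate", "caterpillar", "bobcat"]

whole = [w for w in words if re.search(r"\bcat\b", w)]
inside = [w for w in words if re.search(r"\Bcat\B", w)]
prefix = [w for w in words if re.search(r"\bcat", w)]
suffix = [w for w in words if re.search(r"cat\b", w)]

print(whole)   # ['the cat sat']
print(inside)  # ['concatenate']
print(prefix)  # ['the cat sat', 'caterpillar']
print(suffix)  # ['the cat sat', 'bobcat']
```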
05

Groups & Capturing

Extract and reference matched substrings. Essential for complex pattern matching and replacements.

Capturing Groups

Parentheses create capturing groups that store matched text:

(abc)           Captures "abc"
(\d{4})         Captures 4 digits
([a-z]+)        Captures one or more lowercase letters
Date Matching:
(\d{4})-(\d{2})-(\d{2})

Matching "2026-02-09" captures:
  Group 1: "2026"
  Group 2: "02"
  Group 3: "09"

Access captured groups:

  • In replacement: $1, $2 (JavaScript, Perl) or \1, \2 (sed, vim, Python)
  • In code: match.groups(), match[1], etc.
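
As an illustration of both access styles, here is a short Python sketch using the date pattern from above. The surrounding sentence is a made-up sample.

```python
import re

date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = date_re.search("Released on 2026-02-09.")
print(match.group(0))   # '2026-02-09' (whole match)
print(match.groups())   # ('2026', '02', '09')
print(match.group(2))   # '02'

# In a replacement string, Python refers to groups as \1, \2, ... (or \g<1>).
reordered = date_re.sub(r"\3/\2/\1", "Released on 2026-02-09.")
print(reordered)        # 'Released on 09/02/2026.'
```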

Non-Capturing Groups

Use (?:...) when you need grouping but don't need to capture:

(?:abc)+        Matches "abc", "abcabc", "abcabcabc", ... (doesn't capture)
(?:\d{4})-      Matches year followed by dash, doesn't capture year
Benefits of non-capturing groups:
  • Better performance (no capturing overhead)
  • Cleaner code when you don't need the value
  • Group numbering isn't affected

Named Capturing Groups

Python Syntax

(?P<name>pattern)       Define named group
(?P=name)               Backreference to named group

# Example
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
# Access: match.group('year'), match.group('month'), match.group('day')

PCRE/JavaScript/.NET Syntax

(?<name>pattern)        Define named group
\k<name>                Backreference to named group

// JavaScript Example
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
// Access: match.groups.year, match.groups.month, match.groups.day

Backreferences

Reference previously captured groups within the same pattern:

\1, \2, \3, ...         Reference groups 1, 2, 3, ...
\k<name>                Reference named group (PCRE/JavaScript)
(?P=name)               Reference named group (Python)

Examples:

(\w+)\s+\1              Matches repeated words: "the the", "hello hello"
(["'])(.*?)\1           Matches quoted strings with same quote type
<([a-z]+)>.*?</\1>      Matches HTML tags: <p>...</p>, <div>...</div>
Matching Repeated Words:
\b(\w+)\s+\1\b          Matches "word word", "test test"
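
A minimal Python sketch of the repeated-word pattern, applied to a hypothetical sentence, both to find the duplicates and to collapse them:

```python
import re

text = "This is is a test test of the the parser"

# \1 must repeat exactly what group 1 captured.
repeated = re.findall(r"\b(\w+)\s+\1\b", text)
print(repeated)  # ['is', 'test', 'the']

# Collapse each doubled word to a single occurrence.
deduped = re.sub(r"\b(\w+)\s+\1\b", r"\1", text)
print(deduped)   # 'This is a test of the parser'
```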

Atomic Groups

Prevent backtracking once matched (PCRE, Java, .NET):

(?>pattern)             Atomic group

(?>a+)ab                Never matches "aaab" (the atomic group keeps all 'a's, leaving none for the literal "a")
a+ab                    Matches "aaab" (a+ backtracks to give one 'a' back)
Use atomic groups for:
  • Performance optimization
  • Preventing catastrophic backtracking
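
Python's re module only gained native (?>...) support in version 3.11. The sketch below therefore uses a commonly cited emulation that also works on older versions: a lookahead captures the run, and a backreference consumes it. Because the engine does not backtrack into a completed lookahead, the group behaves atomically. Treat this as an illustration of the idea rather than a drop-in replacement for real atomic groups.

```python
import re

text = "aaab"

# Ordinary greedy quantifier: a+ first takes "aaa", then gives one 'a' back
# so that the trailing "ab" can match.
backtracking = re.search(r"a+ab", text)

# Atomic-style group emulated with a lookahead plus backreference:
# the lookahead captures the run of 'a's and \1 consumes it; nothing is given back.
atomic_like = re.search(r"(?=(a+))\1ab", text)

print(backtracking.group(0))  # 'aaab'
print(atomic_like)            # None
```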
06

Lookahead & Lookbehind

Zero-width assertions that match positions without consuming characters. Essential for complex validation.

Positive Lookahead (?=...)

Assert that pattern CAN be matched ahead:

\d+(?=px)               Matches digits followed by "px" (doesn't include "px")
                        In "10px", matches "10"

q(?=u)                  Matches "q" followed by "u"
                        In "question", matches "q"

Negative Lookahead (?!...)

Assert that pattern CANNOT be matched ahead:

\d(?!px)                Matches digit NOT followed by "px"
                        In "10px 20em", matches "1", "2", and the "0" in "20"

q(?!u)                  Matches "q" NOT followed by "u"
                        In "Iraq", matches "q"

Positive Lookbehind (?<=...)

Assert that pattern CAN be matched behind:

(?<=\$)\d+              Matches digits preceded by "$"
                        In "$100", matches "100"

(?<=[a-z])[A-Z]         Matches uppercase letter after lowercase
                        In "testCase", matches "C"

Negative Lookbehind (?<!...)

Assert that pattern CANNOT be matched behind:

(?<!\$)\b\d+            Matches whole numbers NOT preceded by "$"
                        In "$10 20", matches "20" (without \b, the "0" in "$10" would also match)

(?<![a-z])[A-Z]         Matches uppercase letter NOT after lowercase
                        In "TestCase", matches "T"

Practical Examples

Password Validation:

Minimum 8 chars, requires uppercase, lowercase, digit, special:

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$

Breakdown:
  ^                             Start of string
  (?=.*[A-Z])                   Must contain uppercase
  (?=.*[a-z])                   Must contain lowercase
  (?=.*\d)                      Must contain digit
  (?=.*[^A-Za-z0-9\s])          Must contain special character
  .{8,}                         At least 8 characters total
  $                             End of string
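
The breakdown above can be wrapped in a small helper. This is a Python sketch; the function name is_strong and the sample passwords are hypothetical.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$"
)

def is_strong(password: str) -> bool:
    """Return True if the password satisfies every lookahead rule."""
    return PASSWORD_RE.match(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False (no uppercase)
print(is_strong("Pw0!"))        # False (too short)
```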
Extract Price Without Currency Symbol:
(?<=\$)\d+(?:\.\d{2})?
Matches: "$10.53" → "10.53"
Add Spaces to camelCase:
(?<=[a-z])(?=[A-Z])
Position between: "testCase" → "test" + "Case"
Username Validation:

Alphanumeric, 3-16 chars, not all digits:

^(?!^\d+$)[a-zA-Z0-9]{3,16}$
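
The price, camelCase, and username examples can be tried together in Python. This is a sketch with made-up inputs; the zero-width substitution relies on Python 3.7+ handling of empty matches.

```python
import re

# Price without the currency symbol (lookbehind for "$").
price = re.search(r"(?<=\$)\d+(?:\.\d{2})?", "Total: $10.53")
print(price.group(0))  # '10.53'

# Insert a space at each lowercase-to-uppercase position in camelCase.
spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", "testCaseName")
print(spaced)  # 'test Case Name'

# Username: 3-16 alphanumerics, but not digits only.
username_re = re.compile(r"^(?!^\d+$)[a-zA-Z0-9]{3,16}$")
print(bool(username_re.match("user123")))  # True
print(bool(username_re.match("123456")))   # False
```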

Limitations

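Fixed-width lookbehind: Several engines (Python's built-in re module, PCRE, and to a lesser degree Java) restrict the length of lookbehind patterns, typically requiring a fixed width:
(?<=\d{4})      OK (fixed width: 4)
(?<=\d+)        May fail (variable width)
JavaScript (ES2018+) and .NET support variable-width lookbehind. JavaScript before ES2018 has no lookbehind at all, and Python needs the third-party regex module for variable-width lookbehind.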
07

Flags & Modes

Modifiers that change regex behavior. Syntax varies by language.

Common Flags

i - Case Insensitive
/hello/i
Matches "hello", "Hello", "HELLO", "HeLLo"

g - Global
/test/g
Match all occurrences, not just the first

m - Multiline
/^Line/m
^ and $ match line boundaries, not just string boundaries

s - Dotall
/.+/s
Dot matches newlines too (single-line mode)

x - Extended
/\d{4} # year/x
Ignore whitespace, allow comments

u - Unicode
/^\u{1F600}$/u
Enable Unicode features and proper code point matching

Flag Syntax by Language

JavaScript

/pattern/flags
new RegExp('pattern', 'flags')

/hello/i
/\d+/g
/^line/m

Python

import re
re.compile(r'pattern', re.FLAG)

re.IGNORECASE or re.I
re.MULTILINE or re.M
re.DOTALL or re.S
re.VERBOSE or re.X
re.UNICODE or re.U      # already the default for str patterns in Python 3

# Example
re.compile(r'hello', re.I | re.M)

Perl/PCRE

/pattern/imsxg
m/pattern/imsxg

/hello/i
/\d+/g

Inline Modifiers

Set flags within the pattern itself:

(?i)case-insensitive    Turn on case-insensitive
(?-i)case-sensitive     Turn off case-insensitive
(?i:pattern)            Case-insensitive for this group only
(?ims)                  Multiple flags

Examples:

(?i)hello               Matches "hello", "Hello", "HELLO"
hello(?i)world          Only "world" is case-insensitive (PCRE/Java/.NET; Python 3.11+ only allows global flags at the start)
(?i:hello) world        Only "hello" is case-insensitive
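
A short Python sketch of the two forms that Python's re module accepts: a global (?i) at the very start of the pattern, and the scoped (?i:...) group (available since Python 3.6). The sample strings are illustrative.

```python
import re

# (?i) at the very start applies to the whole pattern.
whole = re.fullmatch(r"(?i)hello", "HELLO")
print(bool(whole))                             # True

# (?i:...) limits case-insensitivity to the group.
scoped = re.compile(r"(?i:hello) world")
print(bool(scoped.fullmatch("HeLLo world")))   # True
print(bool(scoped.fullmatch("hello WORLD")))   # False
```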
08

Flavor Differences

Different regex engines have varying features and syntax. Know your environment.

POSIX BRE

Used by: grep, sed, vi
Oldest flavor. Most metacharacters require backslash to be special. Very limited feature set.
\(group\)       Grouping
\{n,m\}         Quantifiers
\|              Alternation (GNU extension, not part of strict POSIX BRE)

POSIX ERE

Used by: egrep, awk, grep -E
Metacharacters don't need escaping. More modern syntax.
(group)         Grouping
{n,m}           Quantifiers
|               Alternation

PCRE

Used by: PHP, R, grep -P (syntax modeled on Perl's own regex engine)
Most powerful and feature-rich. Industry standard for advanced regex. Supports all modern features including lookaround, atomic groups, recursion.

JavaScript

Used by: Browsers, Node.js
ES2018+ added lookbehind, named groups, dotAll, Unicode properties. No atomic groups or possessive quantifiers.

Python

Used by: Python scripts
Excellent Unicode support, named groups, fixed-width lookbehind only (in the built-in re module). Different named group syntax: (?P<name>...)

Java

Used by: Java applications
Supports possessive quantifiers, atomic groups, lookaround. Good Unicode support.

Feature Comparison

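Feature                  BRE      ERE      PCRE   JavaScript     Python       Java
Basic matching           ✓        ✓        ✓      ✓              ✓            ✓
Character classes        ✓        ✓        ✓      ✓              ✓            ✓
Lazy quantifiers         ✗        ✗        ✓      ✓              ✓            ✓
Possessive quantifiers   ✗        ✗        ✓      ✗              ✓ (3.11+)    ✓
Non-capturing groups     ✗        ✗        ✓      ✓              ✓            ✓
Named groups             ✗        ✗        ✓      ✓ (ES2018+)    ✓            ✓
Lookahead                ✗        ✗        ✓      ✓              ✓            ✓
Lookbehind               ✗        ✗        ✓      ✓ (ES2018+)    ✓            ✓
Atomic groups            ✗        ✗        ✓      ✗              ✓ (3.11+)    ✓
Unicode                  locale   locale   ✓      ✓ (u flag)     ✓            ✓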

Choosing the Right Flavor

  • For portability: Use ERE (works in most Unix tools)
  • For power: Use PCRE (most features, widely supported)
  • For web: JavaScript (browser compatibility)
  • For scripting: Python or Perl (excellent string manipulation)
  • For performance-critical: Consider RE2 (Google's linear-time engine)
09

Common Patterns

Battle-tested regex patterns for everyday tasks. Copy, paste, adapt.

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

  • [a-zA-Z0-9._%+-]+ - Local part (username)
  • @ - Literal @
  • [a-zA-Z0-9.-]+ - Domain name
  • \. - Literal dot
  • [a-zA-Z]{2,} - TLD (2+ letters)
Email validation is complex. For production, use a dedicated library. RFC 5322-compliant regex is hundreds of characters long.
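
For a loose syntactic check, the pattern can be wrapped as follows. This is a Python sketch; the helper name looks_like_email and the sample addresses are hypothetical, and a match says nothing about whether the mailbox exists.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Loose syntactic check only; does not prove the address exists."""
    return EMAIL_RE.match(value) is not None

print(looks_like_email("jane.doe+news@example.co.uk"))  # True
print(looks_like_email("jane@localhost"))               # False (no dot + TLD)
print(looks_like_email("jane doe@example.com"))         # False (space)
```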

URL Validation

^(https?:\/\/)?([\w-]+\.)+[a-zA-Z]{2,}(\/[\w\-\.~:/?#\[\]@!\$&'()\*\+,;=%.]*)?$

Breakdown:

  • (https?:\/\/)? - Optional protocol
  • ([\w-]+\.)+ - Domain parts (subdomains)
  • [a-zA-Z]{2,} - TLD
  • (\/...)? - Optional path, query, fragment

Match URLs in text:

https?:\/\/[^\s]+
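
A minimal Python sketch of the in-text pattern. Because [^\s]+ also swallows trailing punctuation, the example strips it in a second step; the set of characters stripped is an assumption to adapt to your data.

```python
import re

text = "Docs at https://example.com/guide?x=1 and http://example.org."

urls = re.findall(r"https?://[^\s]+", text)
print(urls)     # ['https://example.com/guide?x=1', 'http://example.org.']

# The simple pattern keeps trailing punctuation; strip it afterwards.
cleaned = [u.rstrip(".,;:!?)") for u in urls]
print(cleaned)  # ['https://example.com/guide?x=1', 'http://example.org']
```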

IP Address

IPv4:

^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Breakdown:

  • 25[0-5] - 250-255
  • 2[0-4][0-9] - 200-249
  • [01]?[0-9][0-9]? - 0-199

Simpler (less strict):

^(\d{1,3}\.){3}\d{1,3}$
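
The difference between the strict and loose patterns shows up on out-of-range octets. The Python sketch below compares both and, as an alternative to regex, uses the standard library's ipaddress module. The helper name is_ipv4 is hypothetical.

```python
import ipaddress
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4_STRICT = re.compile(rf"^({OCTET}\.){{3}}{OCTET}$")
IPV4_LOOSE = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")

for candidate in ["192.168.1.1", "256.1.1.1", "1.2.3"]:
    print(candidate,
          bool(IPV4_STRICT.match(candidate)),
          bool(IPV4_LOOSE.match(candidate)))
# 192.168.1.1 True True
# 256.1.1.1 False True
# 1.2.3 False False

def is_ipv4(value: str) -> bool:
    """Validate with the standard library instead of a regex."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False

print(is_ipv4("256.1.1.1"))  # False
```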

Date Formats

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
// YYYY-MM-DD (ISO 8601)
^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
// MM/DD/YYYY
^(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-\d{4}$
// DD-MM-YYYY
These validate format only, not actual date validity (e.g., Feb 30). Use date libraries for real validation.
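
One way to combine the format check with a real calendar check, sketched in Python with the standard datetime module. The helper name is_real_date is hypothetical.

```python
import re
from datetime import datetime

ISO_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def is_real_date(value: str) -> bool:
    """Format check via regex, then calendar check via datetime."""
    if not ISO_DATE_RE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(bool(ISO_DATE_RE.match("2026-02-30")))  # True  (format is fine)
print(is_real_date("2026-02-30"))             # False (no Feb 30)
print(is_real_date("2026-02-09"))             # True
```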

Phone Numbers

^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$

Matches:

  • 555-1234
  • (555) 123-4567
  • +1-555-123-4567
  • 555.123.4567

International E.164 format:

^\+[1-9]\d{1,14}$

Password Validation

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$
// Min 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special
^(?=.*[A-Za-z])(?=.*\d).{8,}$
// Min 8 chars, at least 1 letter and 1 digit
^\S{6,20}$
// No whitespace, 6-20 characters

Username Validation

^[a-zA-Z0-9_]{3,16}$
// Alphanumeric, 3-16 characters, underscores allowed
^[a-zA-Z][a-zA-Z0-9_-]{2,15}$
// Start with letter, alphanumeric + underscore + hyphen, 3-16 chars

HTML/XML

<([a-z]+)([^>]*)>(.*?)<\/\1>
// HTML tags with backreference
<!--.*?-->
// HTML comments
<[^>]*>
// Strip HTML tags

File Paths & Extensions

\.([a-zA-Z0-9]+)$
// File extension
^\/(?:[^\/\0]+\/)*[^\/\0]*$
// Unix absolute path
^[a-zA-Z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$
// Windows path

Hex Color

^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
// 3 or 6 digit hex color

Slug/URL-friendly String

^[a-z0-9]+(?:-[a-z0-9]+)*$
// Lowercase letters, numbers, hyphens
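
A sketch of how the slug pattern might be paired with a generator, in Python. The slugify helper is hypothetical and deliberately simple (ASCII only, no transliteration).

```python
import re

SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def slugify(title: str) -> str:
    """Hypothetical helper: lowercase, then collapse other runs into hyphens."""
    lowered = title.lower()
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    return hyphenated.strip("-")

slug = slugify("Hello, World! Regex 101")
print(slug)                               # 'hello-world-regex-101'
print(bool(SLUG_RE.match(slug)))          # True
print(bool(SLUG_RE.match("bad--slug")))   # False
```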
10

Performance

Optimize regex for speed and prevent catastrophic backtracking. Security matters.

Catastrophic Backtracking

Certain patterns cause exponential time complexity, making them vulnerable to ReDoS (Regular Expression Denial of Service) attacks.

Dangerous patterns - nested quantifiers:
(a+)+              Dangerous
(a*)*              Dangerous
(a+)*              Dangerous
(a|a)*             Dangerous
(a|b)*a            Not exponential, but can go quadratic on long non-matching input
Example Attack:
^(a+)+$

Input: "aaaaaaaaaaaaaaaaaaaX" (19 a's, then X)

The engine tries exponential combinations:
  - 19 groups of 1 'a' each
  - 1 group of 19 'a's
  - 9 groups of 2, 1 of 1
  - ... (2^19 combinations)

Result: Exponential time O(2^n)
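
The growth can be observed directly with a small timing experiment. This is a Python sketch: the helper time_match is hypothetical, the input lengths are kept small on purpose, and absolute timings will vary by machine. Each extra 'a' should roughly double the time for the nested pattern, while the simple pattern stays flat.

```python
import re
import time

def time_match(pattern: str, text: str) -> float:
    """Return the seconds taken by one match attempt."""
    start = time.perf_counter()
    re.match(pattern, text)
    return time.perf_counter() - start

for n in (14, 16, 18):
    text = "a" * n + "X"   # almost matches, then fails at X
    nested = time_match(r"^(a+)+$", text)
    simple = time_match(r"^a+$", text)
    print(f"n={n}: nested={nested:.4f}s simple={simple:.6f}s")
```

Keep n small when experimenting: with backtracking engines such as Python's re, a few dozen characters can be enough to stall the process.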

Detecting Vulnerable Patterns

Red flags:

  • Nested quantifiers: (a+)+, (a*)*
  • Overlapping alternatives: (a|a)*, (a|aa)*
  • Optional groups repeated: (a?)*
  • Alternation with shared prefix: (abc|abd)* (extra backtracking, though usually not exponential on its own)

Prevention Strategies

1. Use Atomic Groups

// Vulnerable
^(a+)+$

// Fixed with atomic group
^(?>a+)+$

2. Use Possessive Quantifiers

// Vulnerable
^(a+)+$

// Fixed with possessive quantifier (PCRE/Java)
^a++$

3. Rewrite Pattern

// Vulnerable
^(a+)+$

// Fixed - simplified
^a+$

// Vulnerable - quantified groups that themselves contain a quantifier
^([a-zA-Z0-9]+)+@([a-zA-Z0-9]+)+$

// Fixed - removed the nested quantifiers
^[a-zA-Z0-9]+@[a-zA-Z0-9]+$

4. Be Specific with Quantifiers

// Risky - unbounded repetition can backtrack across the whole input
.*

// Better - bounded
.{0,100}

// Better - specific character class
[a-zA-Z0-9]{0,100}

Performance Best Practices

1. Anchor Your Regex

// Slow - engine must try every position
\d{4}-\d{2}-\d{2}

// Faster - engine knows to start at beginning
^\d{4}-\d{2}-\d{2}

2. Use Specific Character Classes

// Slower - . matches anything, more backtracking
.*?@.*

// Faster - specific character classes
[^@]+@[^@]+

3. Put More Specific Patterns First

// Risky - the short alternative is tried first; "abc" is only reached by backtracking
(a|abc)

// Better - longer/more specific first
(abc|a)

4. Use Non-Capturing Groups When Possible

// Slower - unnecessary capturing
(\d+)\.(\d+)\.(\d+)\.(\d+)

// Faster - if you don't need captures
(?:\d+)\.(?:\d+)\.(?:\d+)\.(?:\d+)

// Fastest - no groups if structure is simple
\d+\.\d+\.\d+\.\d+

5. Avoid Greedy Matching When Possible

// Slower - greedy, lots of backtracking
<.*>

// Faster - lazy
<.*?>

// Fastest - specific negated class
<[^>]*>

6. Compile Regex Once, Reuse

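import re

lines = ["abc", "123", "42 items"]  # sample input

# Slower - re.match() must look the pattern up (and possibly compile it) on every call
for line in lines:
    if re.match(r'\d+', line):
        ...

# Faster - compile once, reuse the pattern object
pattern = re.compile(r'\d+')
for line in lines:
    if pattern.match(line):
        ...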

Linear-Time Regex Engines

RE2 (Google's engine):
  • Guarantees O(n) time complexity
  • No backtracking
  • Limitations: No backreferences, no lookahead/lookbehind
  • Used in: Go, Google Code Search
  • Safe from ReDoS attacks

When to use RE2:

  • User-provided regex patterns
  • Performance-critical applications
  • DoS attack prevention
11

Pro Tips

Advanced techniques, debugging strategies, and real-world usage patterns.

Regex in Command-Line Tools

grep

# Basic regex (BRE)
grep 'pattern' file.txt

# Extended regex (ERE)
grep -E 'pattern' file.txt
egrep 'pattern' file.txt        # deprecated alias for grep -E

# Perl regex (PCRE) - if supported
grep -P 'pattern' file.txt

# Case-insensitive
grep -i 'pattern' file.txt

# Invert match (lines NOT matching)
grep -v 'pattern' file.txt

# Show line numbers
grep -n 'pattern' file.txt

# Recursive search
grep -r 'pattern' directory/

# Show only matching part
grep -o 'pattern' file.txt

sed

# Basic substitution (BRE)
sed 's/pattern/replacement/' file.txt

# Extended regex (ERE)
sed -E 's/pattern/replacement/' file.txt

# Global replacement (all occurrences per line)
sed 's/pattern/replacement/g' file.txt

# Case-insensitive (GNU sed)
sed 's/pattern/replacement/i' file.txt

# Backreferences
sed 's/\(word\)/[\1]/g' file.txt              # BRE
sed -E 's/(word)/[\1]/g' file.txt             # ERE

# Delete matching lines
sed '/pattern/d' file.txt

awk

# Match lines with ERE
awk '/pattern/' file.txt

# Pattern in condition
awk '$1 ~ /pattern/' file.txt

# Negation
awk '$1 !~ /pattern/' file.txt

# Substitution
awk '{gsub(/pattern/, "replacement"); print}' file.txt

Regex in Programming Languages

Python

import re

# Basic matching
match = re.search(r'pattern', text)
if match:
    print(match.group(0))

# Find all matches
matches = re.findall(r'\d+', text)

# Substitution
result = re.sub(r'pattern', 'replacement', text)

# Split
parts = re.split(r'[,;]', text)

# Compiled regex (better performance)
pattern = re.compile(r'\d+')
for line in lines:
    match = pattern.search(line)

# Named groups
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2026-02-09')
print(match.group('year'))  # '2026'
print(match.groupdict())    # {'year': '2026', ...}

JavaScript

// Sample input (hypothetical)
const text = 'Released 2026-02-09, build 42';

// Literal syntax: /pattern/flags
const literalRegex = /pattern/gi;

// Constructor (useful for dynamic patterns)
const dynamicRegex = new RegExp('pattern', 'gi');

// Test (returns boolean)
if (/\d+/.test(text)) {
  console.log('Contains digits');
}

// Match (first occurrence only)
const firstMatch = text.match(/\d+/);
if (firstMatch) {
  console.log(firstMatch[0]);     // Matched text: '2026'
  console.log(firstMatch.index);  // Position: 9
}

// Global match
const allMatches = text.match(/\d+/g);  // ['2026', '02', '09', '42']

// Replace
const replaced = text.replace(/pattern/g, 'replacement');

// Replace with function (the return value is converted to a string)
const doubled = text.replace(/\d+/g, (m) => String(parseInt(m, 10) * 2));

// Named groups (ES2018+)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = text.match(dateRegex);
console.log(dateMatch.groups.year);  // '2026'

Debugging Regex

Essential Tools:

  • regex101.com - Live testing, explanation, debugger, performance warnings
  • regexr.com - Visual highlighting, community patterns
  • regexper.com - Visualize regex as railroad diagram
  • debuggex.com - Visual regex tester

Debugging Techniques

# Python - verbose mode with comments
import re

pattern = re.compile(r'''
    ^                   # Start of string
    (?P<year>\d{4})     # Year (4 digits)
    -                   # Separator
    (?P<month>\d{2})    # Month (2 digits)
    -                   # Separator
    (?P<day>\d{2})      # Day (2 digits)
    $                   # End of string
''', re.VERBOSE)

Test Incrementally:

Start simple, add complexity step by step:

patterns = [
    r'\d{4}',                    # Just year
    r'\d{4}-\d{2}',              # Year-month
    r'\d{4}-\d{2}-\d{2}',        # Full date
    r'^\d{4}-\d{2}-\d{2}$',      # With anchors
]
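
The list above can be run against a sample input to see where a pattern stops matching. This is a minimal Python sketch; the sample string is made up.

```python
import re

sample = "2026-02-09"   # hypothetical test input

patterns = [
    r"\d{4}",                 # just year
    r"\d{4}-\d{2}",           # year-month
    r"\d{4}-\d{2}-\d{2}",     # full date
    r"^\d{4}-\d{2}-\d{2}$",   # with anchors
]

results = []
for pattern in patterns:
    match = re.search(pattern, sample)
    results.append(match.group(0) if match else None)
    print(f"{pattern!r:28} -> {results[-1]!r}")
```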

Common Debugging Mistakes

Forgetting to Escape Metacharacters
example.com
Matches "exampleXcom" (. is any char). Use: example\.com

Greedy vs Lazy Quantifiers
<.+>
Matches entire "<p>Hello</p>". Use <.+?> for each tag separately.

Anchors in Multiline Text
/^line/
Only matches at the start of the string. Use /^line/m to match at the start of each line.

Forgetting Groups for Alternation
file\.txt|jpg
Matches "file.txt" OR "jpg". Use file\.(txt|jpg) for both extensions.
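
Two of these mistakes are easy to reproduce in Python. The sketch below also shows re.escape, which escapes metacharacters in literal text so they do not have to be escaped by hand.

```python
import re

# Unescaped dot matches any character.
print(bool(re.fullmatch(r"example.com", "exampleXcom")))    # True (surprise)
print(bool(re.fullmatch(r"example\.com", "exampleXcom")))   # False

# re.escape() escapes metacharacters in literal text for you.
literal = re.escape("example.com")
print(bool(re.fullmatch(literal, "example.com")))           # True
print(bool(re.fullmatch(literal, "exampleXcom")))           # False

# Alternation has the lowest precedence, so group it explicitly.
print(bool(re.fullmatch(r"file\.txt|jpg", "file.jpg")))     # False
print(bool(re.fullmatch(r"file\.(txt|jpg)", "file.jpg")))   # True
```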

When to Use What

  • Use BRE when: Working with traditional Unix tools, maximum portability required
  • Use ERE when: Working with awk, grep -E, need basic modern features
  • Use PCRE when: Need advanced features (lookaround, atomic groups), performance critical
  • Use JavaScript regex when: Browser/Node.js environment, modern ES2018+ features available
  • Use Python regex when: Scripting and automation, need excellent Unicode support
  • Use RE2 when: User-provided patterns (security), performance guarantees required