Regular Expressions // Pattern Matching Reference

01

Quick Reference

Essential regex syntax at a glance. Your go-to cheat sheet for common patterns and metacharacters.

Character Classes

.       Any character except newline
\d      Digit [0-9]
\D      Not digit [^0-9]
\w      Word character [a-zA-Z0-9_]
\W      Not word character
\s      Whitespace [ \t\r\n\f\v]
\S      Not whitespace
[abc]   Any of a, b, or c
[^abc]  Not a, b, or c
[a-z]   Range a through z

Quantifiers

*       0 or more (greedy)
+       1 or more (greedy)
?       0 or 1 (greedy)
{n}     Exactly n times
{n,}    n or more times
{n,m}   Between n and m times
*?      0 or more (lazy)
+?      1 or more (lazy)
??      0 or 1 (lazy)
*+      Possessive (PCRE/Java/Python 3.11+)

Anchors & Boundaries

^       Start of string/line
$       End of string/line
\b      Word boundary
\B      Not word boundary
\A      Start of string (always)
\Z      End of string or before final newline (Python: absolute end)
\z      Absolute end (PCRE/Perl/Java/.NET)
\G      Start of search (Perl/PCRE)

Groups & Capturing

(...)           Capturing group
(?:...)         Non-capturing group
(?<name>...)    Named group (PCRE/.NET)
(?P<name>...)   Named group (Python)
\1, \2, ...     Backreference
\k<name>        Named backreference
(?>...)         Atomic group

Lookaround

(?=...)     Positive lookahead
(?!...)     Negative lookahead
(?<=...)    Positive lookbehind
(?<!...)    Negative lookbehind

Flags/Modifiers

i   Case-insensitive
g   Global (match all)
m   Multiline (^ $ match lines)
s   Dotall (. matches newline)
x   Extended (ignore whitespace)
u   Unicode

02

Basic Patterns

Foundational regex concepts: literals, metacharacters, and character classes.

Literals

Most characters match themselves literally. Regular text is interpreted character by character:

cat         Matches "cat"
hello       Matches "hello"
123         Matches "123"

Metacharacters

Special characters with reserved meanings. Must be escaped with backslash to match literally:

. ^ $ * + ? { } [ ] \ | ( )

To match these literally, prefix with backslash:

\.          Matches a literal dot
\$          Matches a dollar sign
\(hello\)   Matches "(hello)" literally
\*          Matches a literal asterisk
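
When the literal text comes from a variable, escaping by hand is easy to get wrong. A minimal sketch in Python, assuming the standard re module; the sample string `literal` is invented for illustration:

```python
import re

# Hypothetical literal text that happens to contain metacharacters.
literal = "price (USD): $9.99*"

# re.escape() backslash-escapes the characters that could be special.
pattern = re.compile(re.escape(literal))

# The escaped pattern matches the original text exactly...
exact = pattern.fullmatch(literal) is not None

# ...and the dot no longer behaves as a wildcard.
wildcard = pattern.fullmatch("price (USD): $9X99*") is not None

print(exact, wildcard)
```

Most languages ship a comparable helper; where none exists, escaping each character from the metacharacter list above achieves the same result.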

The Dot Metacharacter

The dot . matches any single character except newline:

.           Matches any single character except newline
a.c         Matches "abc", "a9c", "a c", etc.
.....       Matches any 5 characters

With the /s flag (dotall mode), the dot matches newlines too.

Character Classes

Define a set of characters to match. Use square brackets to create a class:

[abc]           Matches 'a', 'b', or 'c'
[aeiou]         Matches any vowel
[0-9]           Matches any digit
[a-z]           Matches any lowercase letter
[a-zA-Z]        Matches any letter
[a-zA-Z0-9]     Matches any alphanumeric character

Negated Character Classes

Use ^ at the start to negate:

[^abc]          Matches anything except 'a', 'b', or 'c'
[^0-9]          Matches anything except digits
[^\s]           Matches any non-whitespace character

Predefined Character Classes

\d      Digit [0-9]
\D      Non-digit [^0-9]
\w      Word character [a-zA-Z0-9_]
\W      Non-word character [^a-zA-Z0-9_]
\s      Whitespace [ \t\r\n\f\v]
\S      Non-whitespace [^ \t\r\n\f\v]
\h      Horizontal whitespace (PCRE)
\v      Vertical whitespace (PCRE)
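
The bracket equivalents above describe ASCII behavior. As one illustration of how an engine can differ, Python 3 treats \d, \w, and \s as Unicode-aware for str patterns unless re.ASCII is passed. A small sketch (the sample text is made up):

```python
import re

# \u00e9 is "e with acute"; \u0663 is ARABIC-INDIC DIGIT THREE.
text = "room 42, caf\u00e9, \u0663 items"

# Default behavior for str patterns: Unicode-aware classes.
digits_unicode = re.findall(r"\d", text)
words_unicode = re.findall(r"\w+", text)

# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
digits_ascii = re.findall(r"\d", text, re.ASCII)
words_ascii = re.findall(r"\w+", text, re.ASCII)

print(digits_unicode, digits_ascii)
print(words_unicode, words_ascii)
```

If input must be plain ASCII digits (IDs, port numbers), writing [0-9] explicitly avoids the question altogether.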

03

Quantifiers

Control how many times an element repeats. Understanding greedy vs lazy is essential.

Basic Quantifiers

Quantifier	Meaning	Example	Matches
*	0 or more	a*	"", "a", "aa", "aaa", ...
+	1 or more	a+	"a", "aa", "aaa", ...
?	0 or 1	a?	"", "a"
{n}	Exactly n	a{3}	"aaa"
{n,}	n or more	a{2,}	"aa", "aaa", "aaaa", ...
{n,m}	Between n and m	a{2,4}	"aa", "aaa", "aaaa"

Greedy vs Lazy Quantifiers

Greedy (Default)

Match as much as possible. Quantifiers are greedy by default:

<.+>        In "<p>Hello</p>", matches entire "<p>Hello</p>"
\d+         In "12345", matches "12345"

Lazy (Non-Greedy)

Match as little as possible. Add ? after quantifier:

<.+?>       In "<p>Hello</p>", matches "<p>" and "</p>" separately
\d+?        In "12345" with global flag, matches "1", "2", "3", "4", "5"
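
The greedy and lazy examples above can be reproduced with a short sketch, assuming Python's re module, where findall plays the role of the global flag:

```python
import re

html = "<p>Hello</p>"

# Greedy: .+ runs to the end of the string, then backs up to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", html)

# Lazy \d+? is satisfied by a single digit each time.
lazy_digits = re.findall(r"\d+?", "12345")

print(greedy)       # ['<p>Hello</p>']
print(lazy)         # ['<p>', '</p>']
print(lazy_digits)  # ['1', '2', '3', '4', '5']
```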

Common lazy quantifiers:

*?      0 or more (lazy)
+?      1 or more (lazy)
??      0 or 1 (lazy)
{n,}?   n or more (lazy)
{n,m}?  Between n-m (lazy)

Practical Example:

Extract text between quotes:

// Greedy - matches from first to last quote
"(.*)"      In '"Hello" and "World"' → '"Hello" and "World"'

// Lazy - matches each quoted string separately
"(.*?)"     In '"Hello" and "World"' → '"Hello"' and '"World"'

Possessive Quantifiers

Available in PCRE, Java, and Python 3.11+ (not in JavaScript or .NET). Never backtrack once matched:

*+      0 or more (possessive)
++      1 or more (possessive)
?+      0 or 1 (possessive)
{n,}+   n or more (possessive)
{n,m}+  Between n-m (possessive)

\d++\d never matches because possessive ++ consumes all digits without backtracking. Use for performance optimization and preventing catastrophic backtracking.

04

Anchors & Boundaries

Match positions, not characters. Zero-width assertions that anchor patterns to specific locations.

Line Anchors

^       Start of string (or line in multiline mode)
$       End of string (or line in multiline mode)

Examples:

^Hello      Matches "Hello" only at start of string
world$      Matches "world" only at end of string
^Hello$     Matches entire string "Hello" (nothing before or after)

Multiline Mode

Without /m flag:

^       Matches start of entire string
$       Matches end of entire string

With /m flag:

^       Matches start of string AND start of each line
$       Matches end of string AND end of each line

Multiline Text Example:

Line 1
Line 2
Line 3

/^Line/     Matches "Line" at the start of the string only (1 match)
/^Line/m    Matches "Line" at start of each line (3 matches)
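
The same comparison as a runnable sketch, assuming Python's re module, where re.MULTILINE corresponds to the /m flag:

```python
import re

text = "Line 1\nLine 2\nLine 3"

# Without the flag, ^ matches only at the start of the string.
without_m = re.findall(r"^Line", text)

# With re.MULTILINE, ^ also matches after every newline.
with_m = re.findall(r"^Line", text, re.MULTILINE)

# $ behaves the same way at line ends.
line_ends = re.findall(r"\d$", text, re.MULTILINE)

print(len(without_m), len(with_m), line_ends)  # 1 3 ['1', '2', '3']
```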

Absolute Anchors

\A      Start of string (always, ignores multiline mode)
\Z      End of string, or just before a final newline (in Python, \Z is the absolute end)
\z      Absolute end of string (PCRE/Perl/Java/.NET)
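
Engines differ on these anchors, so it helps to test the one in use. The sketch below assumes Python's built-in re module, where $ also matches just before a trailing newline while \Z matches only at the very end:

```python
import re

text = "first\nlast\n"  # note the trailing newline

# $ (no flags) matches at the end, or just before a final newline.
dollar = re.search(r"last$", text) is not None

# Python's \Z matches only at the absolute end, so the newline blocks it.
upper_z = re.search(r"last\Z", text) is not None

# \A ignores multiline mode; ^ does not.
start_a = re.findall(r"\A\w+", text, re.MULTILINE)
caret_m = re.findall(r"^\w+", text, re.MULTILINE)

print(dollar, upper_z)   # True False
print(start_a, caret_m)  # ['first'] ['first', 'last']
```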

Word Boundaries

\b      Word boundary (between \w and \W)
\B      Not word boundary

Examples:

\bcat\b     Matches "cat" in "the cat sat" but not in "concatenate"
\Bcat\B     Matches "cat" in "concatenate" but not in "the cat sat"
\bcat       Matches "cat" at start of word: "cat", "caterpillar"
cat\b       Matches "cat" at end of word: "cat", "bobcat"

Word boundary positions occur:

Between \w and \W
Between start of string and \w
Between \w and end of string
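
A quick way to check the four boundary patterns is to run them over one string. A sketch assuming Python's re module; the sample sentence is invented:

```python
import re

text = "the cat sat; concatenate; bobcat; caterpillar"

# Whole word only: start offsets of standalone "cat".
whole = [m.start() for m in re.finditer(r"\bcat\b", text)]

# "cat" with word characters on both sides (inside "concatenate").
inside = re.findall(r"\Bcat\B", text)

# Words that start with "cat" / words that end with "cat".
prefix = re.findall(r"\bcat\w*", text)
suffix = re.findall(r"\w*cat\b", text)

print(whole)   # [4]
print(inside)  # ['cat']
print(prefix)  # ['cat', 'caterpillar']
print(suffix)  # ['cat', 'bobcat']
```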

05

Groups & Capturing

Extract and reference matched substrings. Essential for complex pattern matching and replacements.

Capturing Groups

Parentheses create capturing groups that store matched text:

(abc)           Captures "abc"
(\d{4})         Captures 4 digits
([a-z]+)        Captures one or more lowercase letters

Date Matching:

(\d{4})-(\d{2})-(\d{2})

Matching "2026-02-09" captures:
  Group 1: "2026"
  Group 2: "02"
  Group 3: "09"

Access captured groups:

In replacement: $1, $2 (JavaScript, Perl) or \1, \2 (Python, sed, vim)
In code: match.groups(), match[1], etc.
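
As an illustration of both access styles, here is a minimal sketch assuming Python's re module (the sentence being searched is made up):

```python
import re

date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Reading groups from a match object.
m = date_re.search("Released on 2026-02-09.")
year, month, day = m.groups()
whole = m.group(0)  # group 0 is always the entire match

# Using groups in a replacement string (Python uses \1-style references).
reordered = date_re.sub(r"\3/\2/\1", "2026-02-09")

print(year, month, day)  # 2026 02 09
print(whole)             # 2026-02-09
print(reordered)         # 09/02/2026
```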

Non-Capturing Groups

Use (?:...) when you need grouping but don't need to capture:

(?:abc)+        Matches "abc", "abcabc", "abcabcabc", ... (doesn't capture)
(?:\d{4})-      Matches year followed by dash, doesn't capture year

Benefits of non-capturing groups:

Better performance (no capturing overhead)
Cleaner code when you don't need the value
Group numbering isn't affected

Named Capturing Groups

Python Syntax

(?P<name>pattern)       Define named group
(?P=name)               Backreference to named group

# Example
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
# Access: match.group('year'), match.group('month'), match.group('day')

PCRE/JavaScript/.NET Syntax

(?<name>pattern)        Define named group
\k<name>                Backreference to named group

// JavaScript Example
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
// Access: match.groups.year, match.groups.month, match.groups.day

Backreferences

Reference previously captured groups within the same pattern:

\1, \2, \3, ...         Reference groups 1, 2, 3, ...
\k<name>                Reference named group (PCRE/JavaScript)
(?P=name)               Reference named group (Python)

Examples:

(\w+)\s+\1              Matches repeated words: "the the", "hello hello"
(["'])(.*?)\1           Matches quoted strings with same quote type
<([a-z]+)>.*?</\1>      Matches HTML tags: <p>...</p>, <div>...</div>

Matching Repeated Words:

\b(\w+)\s+\1\b          Matches "word word", "test test"
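
The repeated-word and matching-quote patterns can be tried out as follows. This is a sketch assuming Python's re module, with invented sample strings:

```python
import re

# Repeated words: \1 must equal whatever group 1 captured.
repeat_re = re.compile(r"\b(\w+)\s+\1\b")
text = "the the cat sat sat down"
repeated = repeat_re.findall(text)
cleaned = repeat_re.sub(r"\1", text)  # keep one copy of each repeated word

# Quoted strings: the closing quote must be the same character as the opener.
quote_re = re.compile(r"""(["'])(.*?)\1""")
quoted = [m.group(2) for m in quote_re.finditer("say \"it's\" or 'ok'")]

print(repeated)  # ['the', 'sat']
print(cleaned)   # the cat sat down
print(quoted)    # ["it's", 'ok']
```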

Atomic Groups

Prevent backtracking once matched (PCRE, Java, .NET, Python 3.11+):

(?>pattern)             Atomic group

(?>a+)ab                Never matches "aaab": the atomic a+ keeps all three a's, leaving none for the literal "a"
a+ab                    Matches "aaab": a+ gives back one "a" through backtracking

Use atomic groups for:

Performance optimization
Preventing catastrophic backtracking
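
Not every engine has (?>...); Python's built-in re module, for instance, only added it in 3.11. A widely used workaround, which is an emulation and not the atomic-group syntax itself, pairs a capturing lookahead with a backreference. A minimal sketch, assuming Python's re:

```python
import re

text = "aaab"

# Ordinary greedy a+ gives one "a" back so the trailing "ab" can match.
backtracking = re.search(r"a+ab", text)

# Emulated atomic group: the lookahead captures the longest run of a's and
# \1 consumes exactly that run. The engine does not re-enter a lookahead to
# try a shorter capture, so no "a" is given back and "ab" can never follow.
atomic_like = re.search(r"(?=(a+))\1ab", text)

print(backtracking.group(0))  # aaab
print(atomic_like)            # None
```

The emulation costs one extra capturing group, which shifts the numbering of any groups that follow it.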

06

Lookahead & Lookbehind

Zero-width assertions that match positions without consuming characters. Essential for complex validation.

Positive Lookahead (?=...)

Assert that pattern CAN be matched ahead:

\d+(?=px)               Matches digits followed by "px" (doesn't include "px")
                        In "10px", matches "10"

q(?=u)                  Matches "q" followed by "u"
                        In "question", matches "q"

Negative Lookahead (?!...)

Assert that pattern CANNOT be matched ahead:

\d(?!px)                Matches digit NOT followed by "px"
                        In "10px 20em", matches "1", "2", and "0" (the "0" of "20")

q(?!u)                  Matches "q" NOT followed by "u"
                        In "Iraq", matches "q"

Positive Lookbehind (?<=...)

Assert that pattern CAN be matched behind:

(?<=\$)\d+              Matches digits preceded by "$"
                        In "$100", matches "100"

(?<=[a-z])[A-Z]         Matches uppercase letter after lowercase
                        In "testCase", matches "C"

Negative Lookbehind (?<!...)

Assert that pattern CANNOT be matched behind:

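(?<!\$)\d+              Matches digits NOT preceded by "$"
                        In "$10 20", matches "0" and "20" (the "0" follows "1", not "$")
(?<![\$\d])\d+          Also excludes a preceding digit; in "$10 20", matches only "20"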

(?<![a-z])[A-Z]         Matches uppercase letter NOT after lowercase
                        In "TestCase", matches "T"

Practical Examples

Password Validation:

Minimum 8 chars, requires uppercase, lowercase, digit, special:

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$

Breakdown:
  ^                             Start of string
  (?=.*[A-Z])                   Must contain uppercase
  (?=.*[a-z])                   Must contain lowercase
  (?=.*\d)                      Must contain digit
  (?=.*[^A-Za-z0-9\s])          Must contain special character
  .{8,}                         At least 8 characters total
  $                             End of string
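
A minimal sketch of the password check in use, assuming Python's re module. The helper name `is_strong` and the sample passwords are hypothetical:

```python
import re

# Each lookahead scans from the start of the string without consuming anything.
PASSWORD_RE = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$"
)

def is_strong(password: str) -> bool:
    """Return True if the password satisfies all four lookaheads and the length."""
    return PASSWORD_RE.match(password) is not None

samples = ["Passw0rd!", "password", "Sh0rt!", "NoDigits!!"]
results = {p: is_strong(p) for p in samples}

for p, ok in results.items():
    print(f"{p!r}: {ok}")
```

Because the lookaheads are independent, each requirement can be added or removed without touching the others.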

Extract Price Without Currency Symbol:

(?<=\$)\d+(?:\.\d{2})?
Matches: "$10.53" → "10.53"

Add Spaces to camelCase:

(?<=[a-z])(?=[A-Z])
Position between: "testCase" → "test" + "Case"
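
Both lookbehind examples can be exercised in a few lines. A sketch assuming Python's re module; the input strings are invented:

```python
import re

# Lookbehind keeps the "$" out of the match itself.
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "Total: $10.53, shipping $7")

# A pure zero-width match: substitute a space at each lower/upper boundary.
spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", "parseHttpResponse")

print(prices)  # ['10.53', '7']
print(spaced)  # parse Http Response
```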

Username Validation:

Alphanumeric, 3-16 chars, not all digits:

^(?!\d+$)[a-zA-Z0-9]{3,16}$

Limitations

Fixed-width lookbehind: Many engines (Python's built-in re module, Perl, PCRE) generally require lookbehind to be fixed-width. JavaScript had no lookbehind at all before ES2018:

(?<=\d{4})      OK (fixed width: 4)
(?<=\d+)        May fail (variable width)

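Modern JavaScript (ES2018+) and .NET support variable-width lookbehind, and Java accepts bounded widths such as (?<=\d{1,4}). In Python, the third-party regex module supports it, but the built-in re module does not.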
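
Support varies enough that it is worth checking the engine in use. The sketch below assumes CPython's built-in re module specifically (the third-party regex package behaves differently) and shows the compile-time error along with a capture-based workaround:

```python
import re

# Fixed-width lookbehind compiles and matches the "-" after four digits.
fixed = re.compile(r"(?<=\d{4})-")
fixed_hit = fixed.search("2026-02").start()

# Variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=\d+)-")
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print("compile error:", exc)

# Workaround: match the prefix normally and capture only the part you need.
workaround_pos = re.search(r"\d+(-)", "2026-02").start(1)

print(fixed_hit, variable_ok, workaround_pos)
```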

07

Flags & Modes

Modifiers that change regex behavior. Syntax varies by language.

Common Flags

i - Case Insensitive

/hello/i

Matches "hello", "Hello", "HELLO", "HeLLo"

g - Global

/test/g

Match all occurrences, not just the first

m - Multiline

/^Line/m

^ and $ match line boundaries, not just string boundaries

s - Dotall

/.+/s

Dot matches newlines too (single-line mode)

x - Extended

/\d{4} # year/x

Ignore whitespace, allow comments

u - Unicode

/^\u{1F600}$/u

Enable Unicode features and proper code point matching

Flag Syntax by Language

JavaScript

/pattern/flags
new RegExp('pattern', 'flags')

/hello/i
/\d+/g
/^line/m

Python

import re
re.compile(r'pattern', re.FLAG)

re.IGNORECASE or re.I
re.MULTILINE or re.M
re.DOTALL or re.S
re.VERBOSE or re.X
re.UNICODE or re.U

# Example
re.compile(r'hello', re.I | re.M)

Perl/PCRE

/pattern/imsxg
m/pattern/imsxg

/hello/i
/\d+/g

Inline Modifiers

Set flags within the pattern itself:

(?i)case-insensitive    Turn on case-insensitive
(?-i)case-sensitive     Turn off case-insensitive
(?i:pattern)            Case-insensitive for this group only
(?ims)                  Multiple flags

Examples:

(?i)hello               Matches "hello", "Hello", "HELLO"
hello(?i)world          Only "world" is case-insensitive (PCRE/Perl/Java; Python requires (?i) at the start of the pattern)
(?i:hello) world        Only "hello" is case-insensitive
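
A short sketch of scoped versus whole-pattern flags, assuming Python's re module (which accepts the scoped (?i:...) form and a leading (?i)):

```python
import re

# Scoped: only "hello" ignores case.
scoped = re.compile(r"(?i:hello) world")
scoped_mixed = scoped.fullmatch("HELLO world") is not None
scoped_upper = scoped.fullmatch("HELLO WORLD") is not None

# Whole pattern: a leading (?i) is equivalent to passing re.IGNORECASE.
inline = re.compile(r"(?i)hello world")
flagged = re.compile(r"hello world", re.IGNORECASE)
inline_upper = inline.fullmatch("HELLO WORLD") is not None
flagged_upper = flagged.fullmatch("HELLO WORLD") is not None

print(scoped_mixed, scoped_upper)   # True False
print(inline_upper, flagged_upper)  # True True
```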

08

Flavor Differences

Different regex engines have varying features and syntax. Know your environment.

POSIX BRE

Used by: grep, sed, vi

Oldest flavor. Most metacharacters require backslash to be special. Very limited feature set.

\(group\)       Grouping
\{n,m\}         Quantifiers
\|              Alternation (GNU extension, not part of POSIX BRE)

POSIX ERE

Used by: egrep, awk, grep -E

Metacharacters don't need escaping. More modern syntax.

(group)         Grouping
{n,m}           Quantifiers
|               Alternation

PCRE

Used by: PHP, R (perl = TRUE), grep -P; modeled on Perl's own regex engine

Most powerful and feature-rich. Industry standard for advanced regex. Supports all modern features including lookaround, atomic groups, recursion.

JavaScript

Used by: Browsers, Node.js

ES2018+ added lookbehind, named groups, dotAll, Unicode properties. No atomic groups or possessive quantifiers.

Python

Used by: Python scripts

Excellent Unicode support and named groups. The built-in re module requires fixed-width lookbehind; atomic groups and possessive quantifiers arrived in Python 3.11. Different named group syntax: (?P<name>...)

Java

Used by: Java applications

Supports possessive quantifiers, atomic groups, lookaround. Good Unicode support.

Feature Comparison

Feature	BRE	ERE	PCRE	JavaScript	Python	Java
Basic matching	✓	✓	✓	✓	✓	✓
Character classes	✓	✓	✓	✓	✓	✓
Lazy quantifiers	✗	✗	✓	✓	✓	✓
Possessive quantifiers	✗	✗	✓	✗	✓ (3.11+)	✓
Non-capturing groups	✗	✗	✓	✓	✓	✓
Named groups	✗	✗	✓	✓ (ES2018+)	✓	✓
Lookahead	✗	✗	✓	✓	✓	✓
Lookbehind	✗	✗	✓	✓ (ES2018+)	✓	✓
Atomic groups	✗	✗	✓	✗	✓ (3.11+)	✓
Unicode	✗	✗	✓	✓	✓	✓

Choosing the Right Flavor

For portability: Use ERE (works in most Unix tools)
For power: Use PCRE (most features, widely supported)
For web: JavaScript (browser compatibility)
For scripting: Python or Perl (excellent string manipulation)
For performance-critical: Consider RE2 (Google's linear-time engine)

09

Common Patterns

Battle-tested regex patterns for everyday tasks. Copy, paste, adapt.

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

[a-zA-Z0-9._%+-]+ - Local part (username)
@ - Literal @
[a-zA-Z0-9.-]+ - Domain name
\. - Literal dot
[a-zA-Z]{2,} - TLD (2+ letters)

Email validation is complex. For production, use a dedicated library. RFC 5322-compliant regex is hundreds of characters long.

URL Validation

^(https?:\/\/)?([\w-]+\.)+[a-zA-Z]{2,}(\/[\w\-\.~:/?#\[\]@!\$&'()\*\+,;=%.]*)?$

Breakdown:

(https?:\/\/)? - Optional protocol
([\w-]+\.)+ - Domain parts (subdomains)
[a-zA-Z]{2,} - TLD
(\/...)? - Optional path, query, fragment

Match URLs in text:

https?:\/\/[^\s]+

IP Address

IPv4:

^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Breakdown:

25[0-5] - 250-255
2[0-4][0-9] - 200-249
[01]?[0-9][0-9]? - 0-199

Simpler (less strict):

^(\d{1,3}\.){3}\d{1,3}$
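
One way to sanity-check the strict pattern is to compare it with a parser. A sketch assuming Python; the helper names are hypothetical, and ipaddress is the standard-library module:

```python
import ipaddress
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed by this pattern).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4_RE = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

def is_ipv4_regex(s: str) -> bool:
    """Shape-and-range check using the strict pattern."""
    return IPV4_RE.match(s) is not None

def is_ipv4_stdlib(s: str) -> bool:
    """Same question answered by the standard-library parser."""
    try:
        ipaddress.IPv4Address(s)
        return True
    except ValueError:
        return False

for candidate in ["192.168.1.1", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_stdlib(candidate))
```

Note that in Python $ also matches before a trailing newline; fullmatch() or \Z avoids that if it matters.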

Date Formats

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
// YYYY-MM-DD (ISO 8601)

^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
// MM/DD/YYYY

^(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-\d{4}$
// DD-MM-YYYY

These validate format only, not actual date validity (e.g., Feb 30). Use date libraries for real validation.
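
The two-step approach (regex for shape, date library for validity) might look like this. A sketch assuming Python's re and datetime modules; the function names are hypothetical:

```python
import re
from datetime import datetime

ISO_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def has_iso_shape(s: str) -> bool:
    """Format check only: YYYY-MM-DD with plausible month and day ranges."""
    return ISO_DATE_RE.match(s) is not None

def is_real_date(s: str) -> bool:
    """Format check, then calendar validation via strptime."""
    if not has_iso_shape(s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False

for candidate in ["2026-02-09", "2026-02-30", "2026-13-01"]:
    print(candidate, has_iso_shape(candidate), is_real_date(candidate))
```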

Phone Numbers

^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$

Matches:

555-1234
(555) 123-4567
+1-555-123-4567
555.123.4567

International E.164 format:

^\+[1-9]\d{1,14}$

Password Validation

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$
// Min 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special

^(?=.*[A-Za-z])(?=.*\d).{8,}$
// Min 8 chars, at least 1 letter and 1 digit

^\S{6,20}$
// No whitespace, 6-20 characters

Username Validation

^[a-zA-Z0-9_]{3,16}$
// Alphanumeric, 3-16 characters, underscores allowed

^[a-zA-Z][a-zA-Z0-9_-]{2,15}$
// Start with letter, alphanumeric + underscore + hyphen, 3-16 chars

HTML/XML

<([a-z]+)([^>]*)>(.*?)<\/\1>
// HTML tags with backreference

<!--.*?-->
// HTML comments (add the s flag to match comments that span lines)

<[^>]*>
// Strip HTML tags

File Paths & Extensions

\.([a-zA-Z0-9]+)$
// File extension

^\/(?:[^\/\0]+\/)*[^\/\0]*$
// Unix absolute path

^[a-zA-Z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$
// Windows path

Hex Color

^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
// 3 or 6 digit hex color

Slug/URL-friendly String

^[a-z0-9]+(?:-[a-z0-9]+)*$
// Lowercase letters, numbers, hyphens

10

Performance

Optimize regex for speed and prevent catastrophic backtracking. Security matters.

Catastrophic Backtracking

Certain patterns cause exponential time complexity, making them vulnerable to ReDoS (Regular Expression Denial of Service) attacks.

Dangerous patterns - nested quantifiers:

(a+)+              Dangerous
(a*)*              Dangerous
(a+)*              Dangerous
(a|a)*             Dangerous
(a|b)*a            Not exponential, but can still be slow on long non-matching input

Example Attack:

^(a+)+$

Input: "aaaaaaaaaaaaaaaaaaaX" (19 a's, then X)

The engine tries exponential combinations:
  - 19 groups of 1 'a' each
  - 1 group of 19 'a's
  - 9 groups of 2, 1 of 1
  - ... (2^19 combinations)

Result: Exponential time O(2^n)

Detecting Vulnerable Patterns

Red flags:

Nested quantifiers: (a+)+, (a*)*
Overlapping alternatives: (a|a)*, (a|ab)*
Optional groups repeated: (a?)*
Alternation with shared prefix: (abc|abd)*

Prevention Strategies

1. Use Atomic Groups

// Vulnerable
^(a+)+$

// Fixed with atomic group
^(?>a+)+$

2. Use Possessive Quantifiers

// Vulnerable
^(a+)+$

// Fixed with possessive quantifier (PCRE/Java)
^a++$

3. Rewrite Pattern

// Vulnerable
^(a+)+$

// Fixed - simplified
^a+$

// Vulnerable - quantified group repeated again
^([a-zA-Z0-9]+)+@([a-zA-Z0-9]+)+$

// Fixed - removed the nested quantifiers
^[a-zA-Z0-9]+@[a-zA-Z0-9]+$
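
To see why the rewrite is safe, one can check that both patterns accept the same strings and then time them on non-matching input. A rough sketch assuming Python's backtracking re module; the helper `seconds_to_reject` is hypothetical, n is kept small on purpose, and absolute timings depend on the machine:

```python
import re
import time

vulnerable = re.compile(r"^(a+)+$")
rewritten = re.compile(r"^a+$")

# Both patterns make the same accept/reject decision on these samples.
samples = ["a", "aaaa", "", "aab", "b"]
agree = all(
    (vulnerable.match(s) is None) == (rewritten.match(s) is None)
    for s in samples
)

def seconds_to_reject(pattern, n):
    """Time how long the pattern takes to reject n a's followed by 'X'."""
    text = "a" * n + "X"
    start = time.perf_counter()
    assert pattern.match(text) is None
    return time.perf_counter() - start

print("patterns agree:", agree)
for n in (10, 14, 18):
    slow = seconds_to_reject(vulnerable, n)
    fast = seconds_to_reject(rewritten, n)
    print(f"n={n:2d}  nested: {slow:.6f}s  rewritten: {fast:.6f}s")
```

On a backtracking engine the nested version's time should grow sharply with each step of n, while the rewritten pattern stays flat.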

4. Be Specific with Quantifiers

// Risky inside larger patterns - unbounded repetition
.*

// Better - bounded
.{0,100}

// Better - specific character class
[a-zA-Z0-9]{0,100}

Performance Best Practices

1. Anchor Your Regex

// Slow - engine must try every position
\d{4}-\d{2}-\d{2}

// Faster - engine knows to start at beginning
^\d{4}-\d{2}-\d{2}

2. Use Specific Character Classes

// Slower - . matches anything, more backtracking
.*?@.*

// Faster - specific character classes
[^@]+@[^@]+

3. Put More Specific Patterns First

// Slower
(a|abc)

// Faster - longer/more specific first
(abc|a)

4. Use Non-Capturing Groups When Possible

// Slower - unnecessary capturing
(\d+)\.(\d+)\.(\d+)\.(\d+)

// Faster - if you don't need captures
(?:\d+)\.(?:\d+)\.(?:\d+)\.(?:\d+)

// Fastest - no groups if structure is simple
\d+\.\d+\.\d+\.\d+

5. Avoid Greedy Matching When Possible

// Slower - greedy, lots of backtracking
<.*>

// Faster - lazy
<.*?>

// Fastest - specific negated class
<[^>]*>

6. Compile Regex Once, Reuse

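import re

lines = ["42 apples", "no digits"]  # sample input

# Slower - the pattern string is looked up (and possibly recompiled) on every iteration
for line in lines:
    if re.match(r'\d+', line):
        ...

# Faster - compile once, reuse the compiled pattern
pattern = re.compile(r'\d+')
for line in lines:
    if pattern.match(line):
        ...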

Linear-Time Regex Engines

RE2 (Google's engine):

Guarantees O(n) time complexity
No backtracking
Limitations: No backreferences, no lookahead/lookbehind
Used in: Go, Google Code Search
Safe from ReDoS attacks

When to use RE2:

User-provided regex patterns
Performance-critical applications
DoS attack prevention

11

Pro Tips

Advanced techniques, debugging strategies, and real-world usage patterns.

Regex in Command-Line Tools

grep

# Basic regex (BRE)
grep 'pattern' file.txt

# Extended regex (ERE)
grep -E 'pattern' file.txt
egrep 'pattern' file.txt

# Perl regex (PCRE) - if supported
grep -P 'pattern' file.txt

# Case-insensitive
grep -i 'pattern' file.txt

# Invert match (lines NOT matching)
grep -v 'pattern' file.txt

# Show line numbers
grep -n 'pattern' file.txt

# Recursive search
grep -r 'pattern' directory/

# Show only matching part
grep -o 'pattern' file.txt

sed

# Basic substitution (BRE)
sed 's/pattern/replacement/' file.txt

# Extended regex (ERE)
sed -E 's/pattern/replacement/' file.txt

# Global replacement (all occurrences per line)
sed 's/pattern/replacement/g' file.txt

# Case-insensitive (GNU sed)
sed 's/pattern/replacement/i' file.txt

# Backreferences
sed 's/\(word\)/[\1]/g' file.txt              # BRE
sed -E 's/(word)/[\1]/g' file.txt             # ERE

# Delete matching lines
sed '/pattern/d' file.txt

awk

# Match lines with ERE
awk '/pattern/' file.txt

# Pattern in condition
awk '$1 ~ /pattern/' file.txt

# Negation
awk '$1 !~ /pattern/' file.txt

# Substitution
awk '{gsub(/pattern/, "replacement"); print}' file.txt

Regex in Programming Languages

Python

import re

# Basic matching
match = re.search(r'pattern', text)
if match:
    print(match.group(0))

# Find all matches
matches = re.findall(r'\d+', text)

# Substitution
result = re.sub(r'pattern', 'replacement', text)

# Split
parts = re.split(r'[,;]', text)

# Compiled regex (better performance)
pattern = re.compile(r'\d+')
for line in lines:
    match = pattern.search(line)

# Named groups
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2026-02-09')
print(match.group('year'))  # '2026'
print(match.groupdict())    # {'year': '2026', ...}

JavaScript

// Literal syntax
const regex = /pattern/gi;  // flags follow the closing slash

// Constructor (useful for dynamic patterns)
const dynamicRegex = new RegExp('pattern', 'gi');

// Test (returns boolean)
if (/\d+/.test(text)) {
  console.log('Contains digits');
}

// Match
const match = text.match(/\d+/);
if (match) {
  console.log(match[0]);  // Matched text
  console.log(match.index);  // Position
}

// Global match
const matches = text.match(/\d+/g);  // Array of all matches

// Replace
const result = text.replace(/pattern/g, 'replacement');

// Replace with function
const doubled = text.replace(/\d+/g, (m) => String(parseInt(m, 10) * 2));

// Named groups (ES2018+)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = '2026-02-09'.match(dateRegex);
console.log(dateMatch.groups.year);  // '2026'

Debugging Regex

Essential Tools:

regex101.com - Live testing, explanation, debugger, performance warnings
regexr.com - Visual highlighting, community patterns
regexper.com - Visualize regex as railroad diagram
debuggex.com - Visual regex tester

Debugging Techniques

# Python - verbose mode with comments
import re

pattern = re.compile(r'''
    ^                   # Start of string
    (?P<year>\d{4})     # Year (4 digits)
    -                   # Separator
    (?P<month>\d{2})    # Month (2 digits)
    -                   # Separator
    (?P<day>\d{2})      # Day (2 digits)
    $                   # End of string
''', re.VERBOSE)

Test Incrementally:

Start simple, add complexity step by step:

patterns = [
    r'\d{4}',                    # Just year
    r'\d{4}-\d{2}',              # Year-month
    r'\d{4}-\d{2}-\d{2}',        # Full date
    r'^\d{4}-\d{2}-\d{2}$',      # With anchors
]
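
One way to run the incremental list is to apply each stage to a couple of inputs and watch where the result changes. A sketch assuming Python's re module; the sample strings and the helper `first_match` are invented for illustration:

```python
import re

patterns = [
    r"\d{4}",                # Just year
    r"\d{4}-\d{2}",          # Year-month
    r"\d{4}-\d{2}-\d{2}",    # Full date
    r"^\d{4}-\d{2}-\d{2}$",  # With anchors
]

samples = ["2026-02-09", "due 2026-02-09"]

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

report = {s: [first_match(p, s) for p in patterns] for s in samples}

for s, hits in report.items():
    for p, hit in zip(patterns, hits):
        print(f"{s!r:18} {p!r:26} -> {hit!r}")
```

The anchored stage is the only one that rejects the second sample, which pinpoints the step that introduced the stricter behavior.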

Common Debugging Mistakes

Forgetting to Escape Metacharacters

example.com

Matches "exampleXcom" (. is any char). Use: example\.com

Greedy vs Lazy Quantifiers

<.+>

Matches entire "<p>Hello</p>". Use <.+?> for each tag separately.

Anchors in Multiline Text

/^line/

Only matches first line. Use /^line/m to match each line.

Forgetting Groups for Alternation

file.txt|jpg

Matches "file.txt" OR "jpg". Use file\.(txt|jpg) for both extensions (note the escaped dot).
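
The alternation and unescaped-dot mistakes are easy to confirm with a handful of inputs. A sketch assuming Python's re module; the file names are made up:

```python
import re

names = ["file.txt", "file.jpg", "jpg", "fileXtxt"]

# Ungrouped alternation with an unescaped dot: "file<any char>txt" OR "jpg".
loose = re.compile(r"file.txt|jpg")

# Grouped alternation with an escaped dot: "file." followed by txt or jpg.
grouped = re.compile(r"file\.(?:txt|jpg)")

loose_hits = [n for n in names if loose.fullmatch(n)]
grouped_hits = [n for n in names if grouped.fullmatch(n)]

print(loose_hits)    # ['file.txt', 'jpg', 'fileXtxt']
print(grouped_hits)  # ['file.txt', 'file.jpg']
```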

When to Use What

Use BRE when: Working with traditional Unix tools, maximum portability required
Use ERE when: Working with awk, grep -E, need basic modern features
Use PCRE when: Need advanced features (lookaround, atomic groups), performance critical
Use JavaScript regex when: Browser/Node.js environment, modern ES2018+ features available
Use Python regex when: Scripting and automation, need excellent Unicode support
Use RE2 when: User-provided patterns (security), performance guarantees required