Regular expressions are hard for humans. They are worse for LLMs. If your AI agent needs to validate input, extract patterns, or generate regex, it needs reliable tools. This guide shows you why LLMs fail at regex and how to equip your agent with deterministic regex capabilities via MCP.
Why LLMs Struggle with Regex
Regex is a formal language with precise syntax. LLMs produce regex by predicting likely tokens, not by applying the grammar, which leads to several recurring failure modes:
Escaping Errors
LLMs frequently get escaping wrong. An unescaped period (.) matches any character (except a newline, by default), but LLMs often forget to escape it when a literal period is intended:
// LLM output for "match email domains"
[a-z]+.[a-z]+ // Wrong: . matches ANY character
[a-z]+\.[a-z]+ // Correct: \. matches a literal period
// The wrong pattern also accepts the domain in "test@gmailXcom"
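A quick local check makes the difference concrete. The sketch below uses Python's `re` module (an illustrative choice; the behavior of an unescaped dot is the same in other common engines) to run both patterns against the malformed domain.

```python
import re

# Unescaped dot: "." matches any character, so "X" is accepted between the labels.
loose = re.compile(r"[a-z]+.[a-z]+")
# Escaped dot: "\." matches only a literal period.
strict = re.compile(r"[a-z]+\.[a-z]+")

bad_domain = "gmailXcom"
good_domain = "gmail.com"

loose_accepts_bad = loose.fullmatch(bad_domain) is not None
strict_accepts_bad = strict.fullmatch(bad_domain) is not None
strict_accepts_good = strict.fullmatch(good_domain) is not None

print(loose_accepts_bad, strict_accepts_bad, strict_accepts_good)
```

Only the escaped version rejects the malformed domain while still accepting the real one.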
Engine Incompatibility
Different regex engines support different features. LLMs may generate patterns that work in PCRE but fail in another engine, such as a JavaScript runtime that predates ES2018 lookbehind support:
// LLM might generate
(?<=@)\w+ // Lookbehind - works in PCRE, Python, modern JS
// Fails in older JavaScript
// May need to use
@(\w+) // Capture group - universal compatibility
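To see that the two forms extract the same text, here is a minimal sketch in Python, whose `re` module supports both constructs. The sample string is made up for illustration.

```python
import re

text = "contact: user@example.com"

# Lookbehind: asserts that "@" precedes the match without consuming it.
with_lookbehind = re.search(r"(?<=@)\w+", text).group(0)

# Capture group: consumes "@" and exposes the following word as group 1.
with_capture = re.search(r"@(\w+)", text).group(1)

print(with_lookbehind, with_capture)
```

The capture-group form requires reading group 1 instead of the whole match, which is the price of portability.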
Edge Case Failures
LLMs generate patterns that work for common cases but fail on edge cases:
// LLM-generated email pattern (simplified)
\w+@\w+\.\w+
// Matches: test@gmail.com
// Does not match in full (unanchored, it matches only a fragment):
// - test.name@gmail.com (period in local part)
// - test+filter@gmail.com (plus addressing)
// - test@sub.domain.com (multiple domain levels)
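The following sketch, using Python's `re` module as an illustrative engine, shows how the simplified pattern behaves on each address. It separates a full-string match from what an unanchored search would silently pick up.

```python
import re

simple = re.compile(r"\w+@\w+\.\w+")

addresses = [
    "test@gmail.com",
    "test.name@gmail.com",
    "test+filter@gmail.com",
    "test@sub.domain.com",
]

# fullmatch requires the whole string to fit the pattern.
full_results = {a: simple.fullmatch(a) is not None for a in addresses}
# search returns the first fragment that fits, which may drop part of the address.
fragments = {a: simple.search(a).group(0) for a in addresses}

for a in addresses:
    print(f"{a}: full match={full_results[a]}, search finds {fragments[a]!r}")
```

An unanchored search is arguably the worse outcome: it reports success while quietly truncating the address.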
Regex Generation
TinyFn provides pre-built, tested regex patterns for common use cases:
Tool: regex/generate
Input: {
"type": "email",
"engine": "javascript"
}
Result: {
"type": "email",
"pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
"flags": "i",
"description": "Validates email addresses (RFC 5321 compliant)",
"examples": {
"valid": ["user@example.com", "test.name+tag@sub.domain.co.uk"],
"invalid": ["invalid", "@nodomain.com", "no@tld"]
}
}
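Once a result like this comes back, it is worth confirming that the pattern agrees with its own examples in your target engine. The sketch below does that locally with Python's `re` module; it does not call TinyFn, and the JSON is simply the result above pasted in (with the description omitted). Mapping the `"i"` flag to `re.IGNORECASE` is an assumption about how the flags field is meant to be read.

```python
import json
import re

# The result shown above, pasted as JSON. The raw string keeps the backslashes intact.
result = json.loads(r'''
{
  "type": "email",
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  "flags": "i",
  "examples": {
    "valid": ["user@example.com", "test.name+tag@sub.domain.co.uk"],
    "invalid": ["invalid", "@nodomain.com", "no@tld"]
  }
}
''')

# Assumed mapping: "i" means case-insensitive. Other flags are ignored in this sketch.
flags = re.IGNORECASE if "i" in result["flags"] else 0
compiled = re.compile(result["pattern"], flags)

valid_ok = all(compiled.search(s) for s in result["examples"]["valid"])
invalid_ok = not any(compiled.search(s) for s in result["examples"]["invalid"])
print(valid_ok, invalid_ok)
```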
Supported Pattern Types
| Type | Description |
|---|---|
| email | Email address validation |
| url | URL validation (HTTP/HTTPS) |
| phone | Phone number (international format) |
| ip | IPv4 address |
| ipv6 | IPv6 address |
| date | Date (ISO 8601 format) |
| time | Time (24-hour format) |
| uuid | UUID v4 format |
| hex_color | Hex color code |
| credit_card | Credit card number format |
| ssn | US Social Security Number |
| zip_code | US ZIP code |
Pattern Testing
Always test regex patterns before using them. TinyFn provides pattern testing tools:
Tool: regex/test
Input: {
"pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
"test_strings": [
"user@example.com",
"invalid.email",
"test@sub.domain.co.uk",
"@missing-local.com"
]
}
Result: {
"pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
"results": [
{ "input": "user@example.com", "matches": true },
{ "input": "invalid.email", "matches": false },
{ "input": "test@sub.domain.co.uk", "matches": true },
{ "input": "@missing-local.com", "matches": false }
]
}
Match Extraction
Tool: regex/match
Input: {
"pattern": "(\\d{3})-(\\d{3})-(\\d{4})",
"text": "Call us at 555-123-4567 or 555-987-6543",
"global": true
}
Result: {
"matches": [
{
"full": "555-123-4567",
"groups": ["555", "123", "4567"],
"index": 11
},
{
"full": "555-987-6543",
"groups": ["555", "987", "6543"],
"index": 27
}
]
}
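The result fields map directly onto what most regex libraries expose. As a rough local equivalent, the sketch below uses Python's `re.finditer`; `extract_matches` is a hypothetical helper name, not part of any TinyFn client.

```python
import re

def extract_matches(pattern: str, text: str) -> list[dict]:
    """Hypothetical helper returning the full match, capture groups, and start index."""
    return [
        {"full": m.group(0), "groups": list(m.groups()), "index": m.start()}
        for m in re.finditer(pattern, text)
    ]

matches = extract_matches(
    r"(\d{3})-(\d{3})-(\d{4})",
    "Call us at 555-123-4567 or 555-987-6543",
)
for match in matches:
    print(match)
```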
Common Validation Patterns
Here are reliable patterns for common validation tasks:
Email Validation
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Matches:
user@example.com
first.last@company.co.uk
user+tag@domain.com
Does not match:
@no-local.com
user@.com
user@domain
URL Validation
Pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Matches:
https://example.com
http://www.example.com/path?query=value
https://sub.domain.com:8080/path
Does not match:
ftp://not-http.com
example.com (missing protocol)
https://no-tld
IP Address Validation
Pattern: ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Matches:
192.168.1.1
10.0.0.255
0.0.0.0
Does not match:
256.1.1.1 (octet > 255)
192.168.1 (incomplete)
192.168.1.1.1 (too many octets)
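The example lists above double as a small regression suite. The sketch below runs all three patterns against them with Python's `re` module, which accepts these patterns unchanged, and collects any case that disagrees with the documented outcome. The table layout and names are illustrative.

```python
import re

PATTERNS = {
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "url": r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$",
    "ip": r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$",
}

# (pattern name, input, expected result), taken from the lists above.
CASES = [
    ("email", "user@example.com", True),
    ("email", "first.last@company.co.uk", True),
    ("email", "user+tag@domain.com", True),
    ("email", "@no-local.com", False),
    ("email", "user@.com", False),
    ("email", "user@domain", False),
    ("url", "https://example.com", True),
    ("url", "http://www.example.com/path?query=value", True),
    ("url", "https://sub.domain.com:8080/path", True),
    ("url", "ftp://not-http.com", False),
    ("url", "example.com", False),
    ("url", "https://no-tld", False),
    ("ip", "192.168.1.1", True),
    ("ip", "10.0.0.255", True),
    ("ip", "0.0.0.0", True),
    ("ip", "256.1.1.1", False),
    ("ip", "192.168.1", False),
    ("ip", "192.168.1.1.1", False),
]

# Collect every case whose actual result differs from the documented one.
mismatches = [
    (name, value)
    for name, value, expected in CASES
    if (re.search(PATTERNS[name], value) is not None) != expected
]
print("mismatches:", mismatches)
```

Keeping the cases in a table makes it easy to add the edge cases your own data contains.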
Validation Workflows
Here is how an agent uses regex tools for input validation:
User: "Validate this user signup data:
Email: john.doe@gmail.com
Phone: 555-123-4567
Website: www.johndoe.com"
Agent workflow:
1. Get validation patterns
2. Test each field
3. Report results
Tool calls:
- regex/test {
"pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
"test_strings": ["john.doe@gmail.com"]
}
Result: { "matches": true }
- regex/test {
"pattern": "^\\+?[1-9]\\d{1,14}$|^\\d{3}-\\d{3}-\\d{4}$",
"test_strings": ["555-123-4567"]
}
Result: { "matches": true }
- regex/test {
"pattern": "^https?:\\/\\/(www\\.)?...",
"test_strings": ["www.johndoe.com"]
}
Result: { "matches": false }
Agent response:
"**Validation Results:**
| Field | Value | Valid |
|---------|-------------------|-------|
| Email | john.doe@gmail.com | Yes |
| Phone | 555-123-4567 | Yes |
| Website | www.johndoe.com | No |
**Issue:** The website URL is missing the protocol (http:// or https://).
Suggested fix: `https://www.johndoe.com`"
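The three steps of this workflow can be sketched locally as follows. This is an illustration in Python, not the agent's actual implementation. The URL pattern is truncated in the tool call above, so the sketch substitutes the URL pattern from the Common Validation Patterns section, which is an assumption about what was elided.

```python
import re

# Email and phone patterns come from the tool calls above.
# The website pattern is an assumption (see the note above).
FIELD_PATTERNS = {
    "Email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "Phone": r"^\+?[1-9]\d{1,14}$|^\d{3}-\d{3}-\d{4}$",
    "Website": r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$",
}

signup = {
    "Email": "john.doe@gmail.com",
    "Phone": "555-123-4567",
    "Website": "www.johndoe.com",
}

# Step 2: test each field against its pattern.
results = {
    field: re.search(FIELD_PATTERNS[field], value) is not None
    for field, value in signup.items()
}

# Step 3: report the results.
for field, value in signup.items():
    print(f"{field:<8} {value:<20} {'Yes' if results[field] else 'No'}")

# Check that the suggested fix passes the same pattern.
suggested = "https://" + signup["Website"]
fix_is_valid = re.search(FIELD_PATTERNS["Website"], suggested) is not None
```

Checking the suggested fix against the same pattern keeps the agent from recommending a value that would fail validation again.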
Data Extraction
User: "Find all phone numbers in this text:
Contact us at 555-123-4567, or reach our support at
(800) 555-0199. International: +1-555-867-5309"
Tool call:
- regex/match {
"pattern": "(\\+?1?[-.]?)?\\(?\\d{3}\\)?[-.\\s]?\\d{3}[-.]?\\d{4}",
"text": "...",
"global": true
}
Result: {
"matches": [
{ "full": "555-123-4567", "index": 14 },
{ "full": "(800) 555-0199", "index": 52 },
{ "full": "+1-555-867-5309", "index": 83 }
]
}
Agent response:
"Found 3 phone numbers:
1. 555-123-4567
2. (800) 555-0199
3. +1-555-867-5309"
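A local reproduction is a useful sanity check for extraction patterns like this one. The sketch below uses Python's `re` module and assumes the line break in the user's text is a single newline character. The pattern permits whitespace after the area code so that the parenthesized number can match.

```python
import re

text = (
    "Contact us at 555-123-4567, or reach our support at\n"
    "(800) 555-0199. International: +1-555-867-5309"
)

# Optional country prefix, optional parentheses around the area code,
# and an optional "-", ".", or whitespace separator after the area code.
phone = re.compile(r"(\+?1?[-.]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.]?\d{4}")

found = [(m.group(0), m.start()) for m in phone.finditer(text)]
for number, index in found:
    print(index, number)
```

Without the whitespace in the separator class, the space in "(800) 555-0199" would stop the match, which is exactly the kind of edge case worth testing before trusting a pattern.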
MCP Setup
Add TinyFn regex tools to your AI agent:
{
"mcpServers": {
"tinyfn-regex": {
"url": "https://api.tinyfn.io/mcp/regex",
"headers": {
"X-API-Key": "your-api-key"
}
}
}
}
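If you would rather not paste the key into a file by hand, the same configuration can be generated from an environment variable. This is a minimal sketch: `TINYFN_API_KEY` is a hypothetical variable name chosen for illustration, and where the resulting JSON belongs depends on your MCP client.

```python
import json
import os

# TINYFN_API_KEY is a hypothetical variable name. The fallback is the placeholder above.
api_key = os.environ.get("TINYFN_API_KEY", "your-api-key")

config = {
    "mcpServers": {
        "tinyfn-regex": {
            "url": "https://api.tinyfn.io/mcp/regex",
            "headers": {"X-API-Key": api_key},
        }
    }
}

print(json.dumps(config, indent=2))
```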
Available Regex Tools
| Tool | Description |
|---|---|
| regex/generate | Get pre-built pattern for common types |
| regex/test | Test if strings match a pattern |
| regex/match | Extract matches and capture groups |
| regex/validate | Check if a pattern is syntactically valid |
| regex/replace | Replace pattern matches in text |
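To give a feel for what syntax validation and replacement involve, here is a rough local analogue in Python. The helper name `is_valid_pattern` is hypothetical, and the sketch says nothing about how the TinyFn tools are implemented. Note that a pattern that compiles in one engine may still be rejected by another.

```python
import re

def is_valid_pattern(pattern: str) -> bool:
    """Return True when Python's engine can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

balanced = is_valid_pattern(r"(\d{3})-(\d{4})")
unbalanced = is_valid_pattern(r"(\d{3}-(\d{4})")  # missing a closing parenthesis

# Replacement with a backreference: keep the last four digits, mask the rest.
masked = re.sub(r"\d{3}-\d{3}-(\d{4})", r"***-***-\1", "Call 555-123-4567")

print(balanced, unbalanced, masked)
```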
Best Practices
- Use pre-built patterns: For common validation (email, URL, phone), use TinyFn's tested patterns instead of generating new ones
- Always test: Before presenting a regex to users, test it with edge cases
- Specify engine: Different regex engines have different features; specify your target
- Handle escaping: When embedding patterns in JSON or string literals, double each backslash (for example, \d becomes \\d)
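The escaping point is the one that most often trips people up when passing patterns to a tool, so here is a short sketch of the round trip in Python. Using `json.dumps` rather than hand-written escapes is a suggestion, not a requirement.

```python
import json

# Raw string: backslashes are kept exactly as written.
pattern = r"^\d{3}-\d{3}-\d{4}$"

# Serializing to JSON doubles each backslash, which is why tool inputs show "\\d".
encoded = json.dumps({"pattern": pattern})
print(encoded)

# Decoding restores the single-backslash form the regex engine expects.
decoded = json.loads(encoded)["pattern"]
print(decoded == pattern)
```

Letting a JSON library do the escaping avoids the common mistake of doubling backslashes twice or not at all.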
Add Regex Tools to Your AI Agent
Get your free API key and give your agent reliable regex capabilities.