Regex Tools for AI Agents via MCP

Regular expressions are hard for humans. They are worse for LLMs. If your AI agent needs to validate input, extract patterns, or generate regex, it needs reliable tools. This guide shows you why LLMs fail at regex and how to equip your agent with deterministic regex capabilities via MCP.

Why LLMs Struggle with Regex

Regex is a formal language with precise syntax. LLMs generate regex by statistical pattern matching, not formal grammar understanding. This leads to several failure modes:

Escaping Errors

LLMs frequently get escaping wrong. A period (.) matches any character in regex, but LLMs often forget to escape it when matching literal periods:

Common LLM Mistake
// LLM output for the request "match email domains"
[a-z]+.[a-z]+           // Wrong: . matches ANY character
[a-z]+\.[a-z]+          // Correct: \. matches a literal period

// The wrong pattern also accepts "gmailXcom" in "test@gmailXcom"
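The difference is easy to check locally. The following is a minimal sketch using Python's standard re module; the variable and function names are illustrative only, and the same behavior applies to the dot in other mainstream engines.

```python
import re

# Unescaped dot: matches any single character between the two letter runs.
loose = re.compile(r"[a-z]+.[a-z]+")
# Escaped dot: matches only a literal period.
strict = re.compile(r"[a-z]+\.[a-z]+")


def matches_whole(pattern: re.Pattern, text: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


for candidate in ("gmail.com", "gmailXcom"):
    print(candidate, matches_whole(loose, candidate), matches_whole(strict, candidate))

# re.escape builds a safely escaped pattern from literal text,
# which avoids hand-escaping mistakes altogether.
literal = re.escape("gmail.com")
print(literal)
```

When the text to match is known in advance, escaping it programmatically is usually safer than writing the backslashes by hand.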

Engine Incompatibility

Different regex engines support different features. LLMs may generate patterns that work in PCRE but fail in JavaScript:

Engine Differences
// LLM might generate
(?<=@)\w+               // Lookbehind - works in PCRE, Python, and ES2018+ JavaScript
                        // Fails in JavaScript engines that predate ES2018 lookbehind support

// May need to use
@(\w+)                  // Capture group - universal compatibility
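Both forms extract the same text where lookbehind is available. A small sketch in Python (chosen only for illustration; the sample address is an assumption) shows that the capture group gives an equivalent result without relying on the newer feature:

```python
import re

text = "user@example.com"

# Lookbehind: asserts that "@" precedes the match without consuming it.
with_lookbehind = re.search(r"(?<=@)\w+", text)

# Capture group: consumes "@" and returns only the parenthesized part.
with_group = re.search(r"@(\w+)", text)

domain_a = with_lookbehind.group(0) if with_lookbehind else None
domain_b = with_group.group(1) if with_group else None
print(domain_a, domain_b)
```

The only difference for the caller is reading group 1 instead of the whole match.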

Edge Case Failures

LLMs generate patterns that work for common cases but fail on edge cases:

Edge Case Issues
// LLM-generated email pattern (simplified)
\w+@\w+\.\w+

// Matches in full: test@gmail.com
// Fails to match the whole address for:
//   - test.name@gmail.com (period in local part)
//   - test+filter@gmail.com (plus addressing)
//   - test@sub.domain.com (multiple domain levels)

The Solution: Use pre-validated, tested regex patterns from TinyFn instead of generating them from scratch. When generation is needed, always test the output against edge cases.
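Testing against edge cases can be as simple as a table of inputs. The sketch below, written in Python as an illustration, runs the naive pattern and the email pattern used later in this guide against the same addresses, requiring a match of the whole string. The helper name is hypothetical.

```python
import re

NAIVE = re.compile(r"\w+@\w+\.\w+")
TESTED = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

EDGE_CASES = [
    "test@gmail.com",
    "test.name@gmail.com",    # period in local part
    "test+filter@gmail.com",  # plus addressing
    "test@sub.domain.com",    # multiple domain levels
]


def full_match_report(pattern: re.Pattern, samples: list) -> dict:
    """Map each sample to whether the pattern matches the entire string."""
    return {sample: pattern.fullmatch(sample) is not None for sample in samples}


naive_report = full_match_report(NAIVE, EDGE_CASES)
tested_report = full_match_report(TESTED, EDGE_CASES)

for sample in EDGE_CASES:
    print(f"{sample:24} naive={naive_report[sample]} tested={tested_report[sample]}")
```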

Regex Generation

TinyFn provides pre-built, tested regex patterns for common use cases:

MCP Tool Call
Tool: regex/generate
Input: {
  "type": "email",
  "engine": "javascript"
}

Result: {
  "type": "email",
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  "flags": "i",
  "description": "Validates email addresses (RFC 5321 compliant)",
  "examples": {
    "valid": ["user@example.com", "test.name+tag@sub.domain.co.uk"],
    "invalid": ["invalid", "@nodomain.com", "no@tld"]
  }
}
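One way to consume such a response is to compile the pattern and confirm that the bundled examples behave as advertised before relying on it. This is a sketch in Python, assuming the response shape shown above; the flag mapping and function names are hypothetical, and a pattern requested for the JavaScript engine is only safe to run in another engine when, as here, it sticks to syntax both share.

```python
import json
import re

# The response body from the example above, kept as raw JSON text.
response_text = r'''
{
  "type": "email",
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  "flags": "i",
  "examples": {
    "valid": ["user@example.com", "test.name+tag@sub.domain.co.uk"],
    "invalid": ["invalid", "@nodomain.com", "no@tld"]
  }
}
'''

# Assumed mapping from single-letter flags to Python's re flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_from_response(resp: dict) -> re.Pattern:
    """Compile the returned pattern, applying any recognized flags."""
    flags = 0
    for letter in resp.get("flags", ""):
        flags |= FLAG_MAP[letter]
    return re.compile(resp["pattern"], flags)


def examples_agree(resp: dict) -> bool:
    """Check that valid examples match and invalid examples do not."""
    compiled = compile_from_response(resp)
    valid_ok = all(compiled.search(s) for s in resp["examples"]["valid"])
    invalid_ok = not any(compiled.search(s) for s in resp["examples"]["invalid"])
    return valid_ok and invalid_ok


response = json.loads(response_text)
print(examples_agree(response))
```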

Supported Pattern Types

| Type        | Description                         |
|-------------|-------------------------------------|
| email       | Email address validation            |
| url         | URL validation (HTTP/HTTPS)         |
| phone       | Phone number (international format) |
| ip          | IPv4 address                        |
| ipv6        | IPv6 address                        |
| date        | Date (ISO 8601 format)              |
| time        | Time (24-hour format)               |
| uuid        | UUID v4 format                      |
| hex_color   | Hex color code                      |
| credit_card | Credit card number format           |
| ssn         | US Social Security Number           |
| zip_code    | US ZIP code                         |

Pattern Testing

Always test regex patterns before using them. TinyFn provides pattern testing tools:

MCP Tool Call - Test Pattern
Tool: regex/test
Input: {
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  "test_strings": [
    "user@example.com",
    "invalid.email",
    "test@sub.domain.co.uk",
    "@missing-local.com"
  ]
}

Result: {
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  "results": [
    { "input": "user@example.com", "matches": true },
    { "input": "invalid.email", "matches": false },
    { "input": "test@sub.domain.co.uk", "matches": true },
    { "input": "@missing-local.com", "matches": false }
  ]
}
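For intuition, the same check can be reproduced locally. The sketch below is a stand-in written with Python's re module that mirrors the result shape shown above; it is not TinyFn's implementation, and the function name is hypothetical.

```python
import re


def check_pattern(pattern: str, test_strings: list) -> dict:
    """Report, for each input string, whether the pattern matches it."""
    compiled = re.compile(pattern)
    return {
        "pattern": pattern,
        "results": [
            {"input": s, "matches": compiled.search(s) is not None}
            for s in test_strings
        ],
    }


report = check_pattern(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    ["user@example.com", "invalid.email", "test@sub.domain.co.uk", "@missing-local.com"],
)
for row in report["results"]:
    print(row)
```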

Match Extraction

MCP Tool Call - Extract Matches
Tool: regex/match
Input: {
  "pattern": "(\\d{3})-(\\d{3})-(\\d{4})",
  "text": "Call us at 555-123-4567 or 555-987-6543",
  "global": true
}

Result: {
  "matches": [
    {
      "full": "555-123-4567",
      "groups": ["555", "123", "4567"],
      "index": 11
    },
    {
      "full": "555-987-6543",
      "groups": ["555", "987", "6543"],
      "index": 27
    }
  ]
}
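The fields in this result map directly onto what most regex libraries expose. As a rough local equivalent (a sketch with Python's re module, not the tool's actual implementation; the function and parameter names are assumptions), each match yields its full text, its capture groups, and its zero-based start offset:

```python
import re


def extract_matches(pattern: str, text: str, find_all: bool = True) -> dict:
    """Return matches with full text, capture groups, and start index."""
    compiled = re.compile(pattern)
    if find_all:
        found = list(compiled.finditer(text))
    else:
        first = compiled.search(text)
        found = [first] if first else []
    return {
        "matches": [
            {"full": m.group(0), "groups": list(m.groups()), "index": m.start()}
            for m in found
        ]
    }


result = extract_matches(
    r"(\d{3})-(\d{3})-(\d{4})",
    "Call us at 555-123-4567 or 555-987-6543",
)
for match in result["matches"]:
    print(match)
```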

Common Validation Patterns

Here are reliable patterns for common validation tasks:

Email Validation

Email Pattern
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Matches:
  user@example.com
  first.last@company.co.uk
  user+tag@domain.com

Does not match:
  @no-local.com
  user@.com
  user@domain

URL Validation

URL Pattern
Pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$

Matches:
  https://example.com
  http://www.example.com/path?query=value
  https://sub.domain.com:8080/path

Does not match:
  ftp://not-http.com
  example.com (missing protocol)
  https://no-tld
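The listed cases can be verified with a short script. This sketch uses Python's re module, which accepts the pattern as written; the function name is illustrative.

```python
import re

URL_PATTERN = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
    r"([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$"
)


def is_http_url(text: str) -> bool:
    """Return True if the text matches the HTTP/HTTPS URL pattern."""
    return URL_PATTERN.search(text) is not None


samples = [
    "https://example.com",
    "http://www.example.com/path?query=value",
    "https://sub.domain.com:8080/path",
    "ftp://not-http.com",
    "example.com",
    "https://no-tld",
]
for sample in samples:
    print(f"{sample:42} {is_http_url(sample)}")
```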

IP Address Validation

IPv4 Pattern
Pattern: ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Matches:
  192.168.1.1
  10.0.0.255
  0.0.0.0

Does not match:
  256.1.1.1 (octet > 255)
  192.168.1 (incomplete)
  192.168.1.1.1 (too many octets)
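A quick local check of these cases, sketched in Python with an illustrative function name, also surfaces one behavior worth knowing: the pattern accepts octets written with leading zeros, such as 010, which some parsers reject or interpret differently.

```python
import re

IPV4_PATTERN = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)


def is_ipv4(text: str) -> bool:
    """Return True if the text is four dot-separated octets in 0-255."""
    return IPV4_PATTERN.search(text) is not None


for sample in ("192.168.1.1", "256.1.1.1", "192.168.1", "192.168.1.1.1", "010.0.0.1"):
    print(f"{sample:16} {is_ipv4(sample)}")
```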

Validation Workflows

Here is how an agent uses regex tools for input validation:

Example: Form Validation
User: "Validate this user signup data:
       Email: john.doe@gmail.com
       Phone: 555-123-4567
       Website: www.johndoe.com"

Agent workflow:
1. Get validation patterns
2. Test each field
3. Report results

Tool calls:
- regex/test {
    "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "test_strings": ["john.doe@gmail.com"]
  }
  Result: { "matches": true }

- regex/test {
    "pattern": "^\\+?[1-9]\\d{1,14}$|^\\d{3}-\\d{3}-\\d{4}$",
    "test_strings": ["555-123-4567"]
  }
  Result: { "matches": true }

- regex/test {
    "pattern": "^https?:\\/\\/(www\\.)?...",
    "test_strings": ["www.johndoe.com"]
  }
  Result: { "matches": false }

Agent response:
"**Validation Results:**

| Field   | Value              | Valid |
|---------|--------------------|-------|
| Email   | john.doe@gmail.com | Yes   |
| Phone   | 555-123-4567       | Yes   |
| Website | www.johndoe.com    | No    |

**Issue:** The website URL is missing the protocol (http:// or https://).
Suggested fix: `https://www.johndoe.com`"
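The workflow above can be sketched as a small local routine. This Python example is an illustration rather than the agent's actual code: the field names and function are hypothetical, and the website check uses the pattern from the URL Validation section on the assumption that it is the one abbreviated in the tool call.

```python
import re

# Step 1: validation patterns, keyed by field name.
PATTERNS = {
    "Email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "Phone": r"^\+?[1-9]\d{1,14}$|^\d{3}-\d{3}-\d{4}$",
    "Website": (
        r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
        r"([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$"
    ),
}


def validate_fields(fields: dict) -> dict:
    """Step 2: test each field value against its pattern."""
    return {
        name: re.search(PATTERNS[name], value) is not None
        for name, value in fields.items()
    }


signup = {
    "Email": "john.doe@gmail.com",
    "Phone": "555-123-4567",
    "Website": "www.johndoe.com",
}

# Step 3: report results.
results = validate_fields(signup)
for name, ok in results.items():
    print(f"{name:8} {signup[name]:20} {'Yes' if ok else 'No'}")
```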

Data Extraction

Example: Extract Phone Numbers
User: "Find all phone numbers in this text:
       Contact us at 555-123-4567, or reach our support at
       (800) 555-0199. International: +1-555-867-5309"

Tool call:
- regex/match {
    "pattern": "(\\+?1?[-.]?)?\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]?\\d{4}",
    "text": "...",
    "global": true
  }

Result: {
  "matches": [
    { "full": "555-123-4567", "index": 14 },
    { "full": "(800) 555-0199", "index": 52 },
    { "full": "+1-555-867-5309", "index": 83 }
  ]
}

Agent response:
"Found 3 phone numbers:
1. 555-123-4567
2. (800) 555-0199
3. +1-555-867-5309"
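The extraction can be reproduced locally to see what the pattern must tolerate. In this Python sketch (function name hypothetical), the separators after the area code and the exchange allow a space as well as a hyphen or period, because a number written as (800) 555-0199 has a space after the closing parenthesis. The text is assumed to be the user's message joined onto a single line.

```python
import re

PHONE = re.compile(r"(\+?1?[-.]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")

text = (
    "Contact us at 555-123-4567, or reach our support at "
    "(800) 555-0199. International: +1-555-867-5309"
)


def find_phone_numbers(source: str) -> list:
    """Return each phone number found, with its start offset in the text."""
    return [
        {"full": m.group(0), "index": m.start()}
        for m in PHONE.finditer(source)
    ]


found = find_phone_numbers(text)
for position, item in enumerate(found, start=1):
    print(f"{position}. {item['full']}")
```

The optional country-code prefix deliberately does not accept a space, so a match cannot begin with the blank that precedes a number.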

MCP Setup

Add TinyFn regex tools to your AI agent:

mcp.json
{
  "mcpServers": {
    "tinyfn-regex": {
      "url": "https://api.tinyfn.io/mcp/regex",
      "headers": {
        "X-API-Key": "your-api-key"
      }
    }
  }
}
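To keep the API key out of version control, the file can be generated from an environment variable instead of being edited by hand. This is only a sketch: the variable name TINYFN_API_KEY and both function names are assumptions, not part of TinyFn's documentation.

```python
import json
import os


def build_mcp_config(api_key: str) -> dict:
    """Build the mcp.json structure shown above for a given API key."""
    return {
        "mcpServers": {
            "tinyfn-regex": {
                "url": "https://api.tinyfn.io/mcp/regex",
                "headers": {"X-API-Key": api_key},
            }
        }
    }


def write_mcp_config(path: str = "mcp.json") -> None:
    """Write the config, reading the key from a hypothetical env variable."""
    api_key = os.environ["TINYFN_API_KEY"]  # assumed variable name
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_mcp_config(api_key), handle, indent=2)
```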

Available Regex Tools

| Tool           | Description                               |
|----------------|-------------------------------------------|
| regex/generate | Get pre-built pattern for common types    |
| regex/test     | Test if strings match a pattern           |
| regex/match    | Extract matches and capture groups        |
| regex/validate | Check if a pattern is syntactically valid |
| regex/replace  | Replace pattern matches in text           |
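The last two tools have no worked example in this guide, so here is a rough idea of what syntax validation and replacement involve. This is a local sketch using Python's re module with hypothetical function names; the request and response formats of the actual tools are not shown here and may differ.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Return True if the pattern compiles without a syntax error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def replace_matches(pattern: str, replacement: str, text: str) -> str:
    """Replace every match of the pattern in the text."""
    return re.sub(pattern, replacement, text)


print(is_valid_pattern(r"(\d{3})-(\d{4})"))  # balanced group
print(is_valid_pattern(r"(\d{3}"))           # unterminated group
print(replace_matches(r"\d{3}-\d{3}-\d{4}", "[phone]", "Call us at 555-123-4567"))
```

Note that a pattern can be valid in one engine and invalid in another, so a syntax check is only meaningful for the engine that ran it.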

Best Practices

  1. Use pre-built patterns: For common validation (email, URL, phone), use TinyFn's tested patterns instead of generating new ones.
  2. Always test: Before presenting a regex to users, test it against edge cases.
  3. Specify the engine: Regex engines differ in the features they support, so name your target engine.
  4. Handle escaping: When embedding a pattern in a string literal or JSON, double each backslash (\d becomes \\d).

Pro Tip: Combine regex validation with format-specific validators. For example, a regex can check email format, but TinyFn's email validator also checks for disposable domains and DNS records.
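The escaping advice in the best practices above is why the tool calls in this guide show \\d where the regex itself contains \d. A short Python sketch makes the two layers visible: a raw string holds the pattern as the regex engine sees it, and a JSON encoder adds the second backslash.

```python
import json

# Raw string: the pattern exactly as the regex engine should receive it.
pattern = r"(\d{3})-(\d{3})-(\d{4})"

# JSON encoding doubles each backslash automatically.
payload = json.dumps({"pattern": pattern})
print(payload)  # {"pattern": "(\\d{3})-(\\d{3})-(\\d{4})"}

# Decoding restores the original single-backslash pattern.
restored = json.loads(payload)["pattern"]
print(restored == pattern)  # True
```

Letting a serializer do the escaping avoids the off-by-one-backslash errors that come from writing the doubled form by hand.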

Add Regex Tools to Your AI Agent

Get your free API key and give your agent reliable regex capabilities.
