Need to control how search engines crawl your site? This guide covers everything you need to know about generating robots.txt files via API, including directives, user-agent rules, and implementation examples.
What is Robots.txt?
Robots.txt is a text file placed in a website's root directory that tells web crawlers which pages or sections of the site they should not crawl. It follows the Robots Exclusion Protocol and is respected by most major search engines. Note that it controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.
A robots.txt file is always located at: https://example.com/robots.txt
Robots.txt Directives
The main directives used in robots.txt are:
| Directive | Description | Example |
|---|---|---|
| `User-agent` | Specifies which crawler the rules apply to | `User-agent: Googlebot` |
| `Disallow` | Blocks access to specified paths | `Disallow: /admin/` |
| `Allow` | Explicitly allows access (overrides Disallow) | `Allow: /public/` |
| `Sitemap` | Points to your XML sitemap (full URL) | `Sitemap: https://example.com/sitemap.xml` |
| `Crawl-delay` | Sets a delay between requests (honored by some bots) | `Crawl-delay: 10` |
Use `User-agent: *` to apply rules to all crawlers, or specify individual bots for more granular control.
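Put together, a file that uses each of these directives might look like the following (the paths and sitemap URL are illustrative):

```
User-agent: *
Disallow: /admin/
Crawl-delay: 10

User-agent: Googlebot
Allow: /public/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```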
Using the Robots.txt API
TinyFn provides an endpoint to generate robots.txt files:
```
POST https://api.tinyfn.io/v1/generate/robots
X-API-Key: your-api-key
Content-Type: application/json
```
Example request body:

```json
{
  "rules": [
    {"user_agent": "*", "disallow": ["/admin/", "/private/"]},
    {"user_agent": "Googlebot", "allow": ["/"], "disallow": ["/tmp/"]}
  ],
  "sitemap": "https://example.com/sitemap.xml"
}
```
Example response:

```json
{
  "robots_txt": "User-agent: *\nDisallow: /admin/\nDisallow: /private/\n\nUser-agent: Googlebot\nAllow: /\nDisallow: /tmp/\n\nSitemap: https://example.com/sitemap.xml"
}
```
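Written to a file, the escaped `robots_txt` string above expands to:

```
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Allow: /
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```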
Parameters
| Parameter | Type | Description |
|---|---|---|
| `rules` | array | Array of rule objects (required) |
| `rules[].user_agent` | string | User agent the rules apply to (required) |
| `rules[].allow` | array | Paths to allow crawling |
| `rules[].disallow` | array | Paths to block from crawling |
| `sitemap` | string | URL of your sitemap |
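Only `rules` and each rule's `user_agent` are required. Assuming the optional fields can simply be omitted, a minimal request body looks like this:

```json
{
  "rules": [
    {"user_agent": "*", "disallow": ["/admin/"]}
  ]
}
```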
Code Examples
JavaScript / Node.js
```javascript
// Node ESM (top-level await); use require('fs') in CommonJS
import { writeFileSync } from 'node:fs';

const response = await fetch('https://api.tinyfn.io/v1/generate/robots', {
  method: 'POST',
  headers: {
    'X-API-Key': 'your-api-key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    rules: [
      { user_agent: '*', disallow: ['/api/', '/admin/', '/_next/'] }
    ],
    sitemap: 'https://mysite.com/sitemap.xml'
  })
});

const { robots_txt } = await response.json();

// Save the generated content to a robots.txt file
writeFileSync('robots.txt', robots_txt);
```
Python
```python
import requests

response = requests.post(
    'https://api.tinyfn.io/v1/generate/robots',
    json={
        'rules': [
            {'user_agent': '*', 'disallow': ['/private/']},
            {'user_agent': 'Googlebot', 'allow': ['/public/']}
        ],
        'sitemap': 'https://mysite.com/sitemap.xml'
    },
    headers={'X-API-Key': 'your-api-key'}
)

data = response.json()
with open('robots.txt', 'w') as f:
    f.write(data['robots_txt'])
```
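For a quick sanity check before deploying, Python's standard-library `urllib.robotparser` can parse the generated text and confirm that the paths you meant to block are actually blocked. A minimal sketch, continuing from the `data` variable above (the URLs are illustrative):

```python
from urllib import robotparser

# Parse the generated robots.txt content in memory
rp = robotparser.RobotFileParser()
rp.parse(data['robots_txt'].splitlines())

# The '*' rules should block /private/ but leave other paths crawlable
print(rp.can_fetch('*', 'https://mysite.com/private/page'))  # False
print(rp.can_fetch('*', 'https://mysite.com/blog/post'))     # True
```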
Common Patterns
- Block all bots: `User-agent: *` with `Disallow: /`
- Allow all: `User-agent: *` with an empty `Disallow:`
- Block admin areas: `Disallow: /admin/`, `/dashboard/`, `/wp-admin/` (see the example request after this list)
- Block search results: `Disallow: /search`, `/?s=`, `/?q=`
- Block staging/dev sites: consider noindex meta tags instead
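Each of these patterns maps directly onto the API's `rules` array. For example, the admin-area pattern above could be generated with a request body like this (paths taken from the list; adjust them to your site):

```json
{
  "rules": [
    {"user_agent": "*", "disallow": ["/admin/", "/dashboard/", "/wp-admin/"]}
  ]
}
```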
Best Practices
- Test your robots.txt: Use Google Search Console's robots.txt tester
- Don't hide sensitive data: Robots.txt is public; use authentication instead
- Include sitemap URL: Helps search engines find your sitemap
- Keep it simple: Too many rules can cause confusion
Use via MCP
Your AI agent can call this tool directly via Model Context Protocol — no HTTP code needed. Add TinyFn to Claude Desktop, Cursor, or any MCP client:
```json
{
  "mcpServers": {
    "tinyfn-generate": {
      "url": "https://api.tinyfn.io/mcp/generate/",
      "headers": {
        "X-API-Key": "your-api-key"
      }
    }
  }
}
```
See all generator tools available via MCP in our Generator MCP Tools for AI Agents guide.
Try the Robots.txt Generator API
Get your free API key and start creating robots.txt files in seconds.
Get Free API Key