Free Robots.txt Generator — Beginner Friendly, Pro Powerful
Build a perfect robots.txt file in seconds. Tick what to allow or block — including 18 AI crawlers like GPTBot, ClaudeBot and PerplexityBot. Includes platform install guides for WordPress, Shopify, Webflow, Astro, Next.js and more.
Step 1 — Pick a starting point
Quick Start Presets
Each preset applies a sensible set of rules to every section below. You can fine-tune anything afterwards — picking a preset is a starting point, not a commitment.
How to read this builder
- Green dot — we recommend a specific action for most sites (block or allow). Hover the chip to see which.
- Amber dot — optional. Depends on your situation. No strong default either way.
- "?" button on each chip — opens a panel with the full reasons to block vs allow that specific bot or path.
Standard Search Engines
Default: All allowed
These are the search engines that drive your traffic. Leave them all ticked unless you have a specific reason to block one.
AI Crawlers
Tick to block
Tick a bot to block it from training on or scraping your content. Hover any chip for what that specific bot does. Use the master toggle below for maximum protection in one click.
Common Paths to Block
Tick the URL patterns you want to block from crawling. The preset above has already ticked the most common ones for your chosen starting point.
Do not block /wp-content/plugins/ or /wp-includes/ on WordPress sites: many themes load CSS / JS from there, and blocking them breaks Google's ability to render your pages. Tap each chip's "?" for the full reasoning.
Admin & CMS
Commerce
Content archives
Query parameters
File types
System
Crawl Delay (advanced — optional)
Googlebot ignores crawl-delay entirely, though some crawlers such as Bingbot still honour it. Only set this if your server is genuinely struggling under crawl load.
Sitemap URLs
Tell crawlers where to find your sitemap. We strongly recommend adding at least one: it helps Google discover every important page on your site.
The basics
What is robots.txt?
What it is
robots.txt is a plain text file that lives at the root of your website (yourdomain.com/robots.txt). It tells search-engine crawlers and other bots which parts of your site they can and cannot access. Created in 1994, it is the oldest and most widely respected protocol for controlling crawler behaviour, and every major search engine plus most AI companies honour it.
What it does (and doesn’t do)
- Tells well-behaved bots which URLs to skip
- Conserves your crawl budget
- Reduces server load
- Keeps well-behaved crawlers out of admin areas
- Controls AI training data access
- Does NOT prevent indexing (use a noindex meta tag for that)
- Does NOT block malicious bots (they ignore it)
- Does NOT secure private content (use authentication)
- Does NOT remove already-indexed pages from Google
Common mistakes
- Blocking CSS or JS files (hurts rankings — Google needs to render your page)
- Using robots.txt to hide private content (use authentication instead)
- Typos in user-agent names (matching is not case-sensitive, but a misspelled name silently matches nothing)
- Missing the file entirely (recommended even if empty)
- Conflicting Allow / Disallow rules
- Forgetting to update after a site migration
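On the conflicting Allow / Disallow point: under RFC 9309, and in Google's documented behaviour, the rule with the longest matching path wins and Allow wins a tie. The sketch below shows that resolution for plain path prefixes only. `Rule` and `isAllowed` are illustrative names rather than part of any library, the example paths are placeholders, and wildcards are deliberately left out.

```typescript
// Hypothetical types and helper for illustration; not from any library.
type Rule = { type: "allow" | "disallow"; path: string };

// Returns true when the URL path may be crawled under the given rules.
// Only plain prefix matching is handled here (no * or $ wildcards).
function isAllowed(rules: Rule[], urlPath: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    // An empty path (e.g. "Disallow:") matches nothing.
    if (rule.path === "" || !urlPath.startsWith(rule.path)) continue;
    const longer = best === undefined || rule.path.length > best.path.length;
    const tieWonByAllow =
      best !== undefined &&
      rule.path.length === best.path.length &&
      rule.type === "allow";
    if (longer || tieWonByAllow) best = rule;
  }
  // No matching rule means the URL is allowed by default.
  return best === undefined || best.type === "allow";
}

const rules: Rule[] = [
  { type: "disallow", path: "/wp-admin/" },
  { type: "allow", path: "/wp-admin/admin-ajax.php" },
];

console.log(isAllowed(rules, "/wp-admin/options.php"));    // false
console.log(isAllowed(rules, "/wp-admin/admin-ajax.php")); // true
console.log(isAllowed(rules, "/blog/"));                   // true
```

Because the longer Allow path beats the shorter Disallow, the order in which the two lines appear in the file does not matter.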
Step 3 — Install it
Platform Deployment Guides
Exact step-by-step instructions for every major platform. Pick yours, follow the steps, and verify at yourdomain.com/robots.txt.
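The verification step that closes each guide can be scripted. This is a minimal sketch, assuming you already have the HTTP status, Content-Type header and body from a request to yourdomain.com/robots.txt. `checkRobotsResponse` is a hypothetical helper, and its checks are common-sense ones rather than an official validator.

```typescript
// Hypothetical helper: returns a list of problems found in a robots.txt response.
function checkRobotsResponse(
  status: number,
  contentType: string | null,
  body: string,
): string[] {
  const problems: string[] = [];
  if (status !== 200) {
    problems.push(`expected HTTP 200, got ${status}`);
  }
  if (contentType === null || !contentType.toLowerCase().startsWith("text/plain")) {
    problems.push(`expected text/plain, got ${contentType ?? "no Content-Type"}`);
  }
  // A CMS that swallows the request often returns an HTML page instead.
  if (/<html[\s>]/i.test(body)) {
    problems.push("body looks like HTML, not a robots.txt file");
  }
  if (!/^\s*user-agent\s*:/im.test(body)) {
    problems.push("no User-agent line found");
  }
  return problems;
}

const ok = checkRobotsResponse(
  200,
  "text/plain; charset=utf-8",
  "User-agent: *\nDisallow: /admin/\n",
);
console.log(ok); // []
```

In practice the three arguments would come from a `fetch` call against your live domain after deployment.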
WordPress sites have three good options depending on which SEO plugin you run. Pick one method — do not combine them.
Method 1 — Yoast SEO plugin
- WordPress Admin → Yoast SEO → Tools
- Click File editor
- If no robots.txt exists, click Create robots.txt file
- Paste the generated content from the preview panel above
- Click Save changes to robots.txt and verify at yourdomain.com/robots.txt
Method 2 — Rank Math plugin
- WordPress Admin → Rank Math → General Settings
- Click Edit robots.txt in the sidebar
- Paste the generated content
- Click Save Changes
Method 3 — Manual via FTP / cPanel
- Click Download robots.txt in the preview panel above
- Connect via FTP or cPanel File Manager
- Upload to the WordPress root folder (same level as wp-config.php)
- Verify at yourdomain.com/robots.txt
Shopify auto-generates a sensible robots.txt for every store. To customise it, you must create a Liquid template that extends the default.
- Shopify Admin → Online Store → Themes
- Click the three-dot menu on your live theme → Edit code
- Under Templates, click Add a new template
- Choose robots.txt from the dropdown and click Create template
- Replace the default content with the Liquid snippet below (it loops Shopify's defaults then adds your custom rules)
- Click Save and verify at yourdomain.com/robots.txt
# Generated by SEO First Web — seofirstweb.co.uk/tools/robots-txt-generator
{%- for group in robots.default_groups -%}
{{- group.user_agent }}
{%- for rule in group.rules -%}
{{ rule }}
{%- endfor -%}
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /your-custom-path/' }}
{%- endif -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif -%}
{%- endfor -%}
Wix exposes a basic robots.txt editor on Business / Premium plans.
- Site Dashboard → Marketing & SEO → SEO Tools
- Click Robots.txt Editor
- Paste the generated content
- Click Save
Squarespace auto-generates robots.txt and intentionally limits customisation. You have two real options:
Option A — built-in crawler settings
- Settings → Crawlers
- Toggle each crawler on or off (basic allow/block per major bot)
- Save
Option B — Developer Mode (full custom robots.txt)
To deploy a fully custom robots.txt you must switch your site to Developer Mode and add a robots.txt file at the site root in the Git repository. This is a one-way change for the underlying site files — back up first.
Webflow gives you a clean robots.txt editor on every paid plan.
- Project Settings → SEO tab
- Scroll to Robots.txt
- Paste the generated content
- Click Save Changes
- Click Publish to push the change to your live domain — robots.txt updates do NOT go live until you publish
Ghost serves a default robots.txt automatically. To replace it:
Self-hosted Ghost
- SFTP into your Ghost host
- Navigate to /content/static/ (create it if it does not exist)
- Upload your generated robots.txt here
- Restart Ghost to clear any cached default
Ghost(Pro) hosted
You cannot upload arbitrary files. The supported workaround is a routes.yaml override that maps /robots.txt to a route on your theme. See the official Ghost routing docs for the syntax.
The classic deployment — works on every shared host, every static host, and every CDN.
- Click Download robots.txt in the preview panel above
- Connect to your host via FTP, SFTP, or the cPanel File Manager
- Upload robots.txt to the web root (the same folder that contains index.html)
- Verify at yourdomain.com/robots.txt
For sites running on Apache (the most common Linux web server).
- Save the generated file as robots.txt
- Place it in your document root, typically /var/www/html/
- No .htaccess changes are needed; Apache serves robots.txt automatically
- Verify: curl https://yourdomain.com/robots.txt
For sites running on Nginx — common for VPS and high-traffic deployments.
- Save the generated file as robots.txt
- Place it in your document root, typically /usr/share/nginx/html/ or /var/www/
- Optional but recommended: add an explicit location block so robots.txt requests do not clutter your logs and the correct Content-Type is set
- Reload Nginx: sudo nginx -s reload
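location = /robots.txt {
    # .txt normally maps to text/plain via mime.types; default_type is the fallback.
    # (add_header Content-Type would send a second, duplicate Content-Type header.)
    default_type text/plain;
    log_not_found off;
    access_log off;
}
Static method (recommended for most projects)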
- Save the generated file as robots.txt
- Place it at /public/robots.txt in your Next.js project
- Commit and push (Vercel auto-deploys)
- Verify at yourdomain.com/robots.txt
Dynamic method — Next.js App Router
Use this when your robots.txt needs to vary between staging and production environments. Create app/robots.ts:
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
      { userAgent: 'GPTBot', disallow: '/' },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  }
}
Static method
- Save the generated file as robots.txt
- Place it at /public/robots.txt in your Astro project
- Commit and push; your Astro host (Vercel, Netlify, Cloudflare Pages) auto-deploys
- Verify at yourdomain.com/robots.txt
Dynamic method — Astro endpoint
Create an endpoint at src/pages/robots.txt.ts for environment-dependent output:
import type { APIRoute } from 'astro';

const robotsTxt = `
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
Sitemap: ${import.meta.env.SITE}/sitemap-index.xml
`.trim();

export const GET: APIRoute = () =>
  new Response(robotsTxt, {
    headers: { 'Content-Type': 'text/plain' },
  });
For sites proxied through Cloudflare (orange-cloud DNS).
Default — origin serves it
Your origin server's robots.txt is served as normal through Cloudflare's proxy. No additional configuration required.
Override at the edge — Cloudflare Workers
- Cloudflare dashboard → Workers & Pages → Create application
- Add a Worker route for yourdomain.com/robots.txt
- Return the generated content as a text/plain response
- This lets you serve a different robots.txt to specific country regions or bot user-agents
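A minimal sketch of such a Worker, written in the ES module style. `ROBOTS_TXT` holds placeholder rules and the `robotsResponse` helper name is an assumption for illustration. The route itself is still configured in the dashboard as described in the steps above.

```typescript
// Placeholder content: paste the generated robots.txt here.
const ROBOTS_TXT = [
  "User-agent: *",
  "Disallow: /admin/",
  "",
  "Sitemap: https://yourdomain.com/sitemap.xml",
].join("\n");

// Hypothetical helper: builds the response for a given request path.
// Anything other than /robots.txt gets a 404 from this Worker.
function robotsResponse(pathname: string): { status: number; body: string } {
  if (pathname === "/robots.txt") {
    return { status: 200, body: ROBOTS_TXT + "\n" };
  }
  return { status: 404, body: "Not found\n" };
}

// Module Worker entry point. The route in the dashboard decides which
// requests reach this handler.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    const { status, body } = robotsResponse(pathname);
    return new Response(body, {
      status,
      headers: { "Content-Type": "text/plain; charset=utf-8" },
    });
  },
};

export default worker;
```

Keeping the path logic in a plain function makes it easy to extend later, for example by also passing in the request's User-Agent header to vary the output per bot.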
Frequently Asked Questions
The questions we hear most from clients setting up robots.txt for the first time.
Where does robots.txt go on my server?
At the root of your domain, accessible at yourdomain.com/robots.txt. The filename must be lowercase and named exactly robots.txt — Google will not look for it anywhere else.
Does robots.txt actually block bots?
Only bots that choose to honour it. All major search engines (Google, Bing, DuckDuckGo) and most reputable AI companies (OpenAI, Anthropic, Perplexity) respect it. Malicious scrapers, spam bots and most copy-scrapers ignore it entirely — for those, use authentication, rate limiting or a WAF rule.
Is robots.txt case-sensitive?
The filename must be lowercase: robots.txt. URL paths inside the file ARE case-sensitive (/Admin/ and /admin/ are treated as different paths). User-agent names are NOT case-sensitive, but it is conventional to write them in their canonical case (Googlebot, GPTBot, ClaudeBot) for readability.
How long until Google sees my robots.txt changes?
Google generally caches robots.txt for up to 24 hours, so a change can take about a day to be picked up. To speed this up, open the robots.txt report in Google Search Console and request a recrawl.
What is the difference between robots.txt and a noindex meta tag?
robots.txt blocks crawling (the bot never fetches the page). A noindex meta tag blocks indexing (the bot fetches the page but is told not to add it to search results). To remove a page from Google, use noindex — robots.txt alone will NOT remove an already-indexed page, because Google cannot fetch the page to see the noindex.
What is the difference between robots.txt and .htaccess?
robots.txt is a polite request to bots — it relies on the bot choosing to obey. .htaccess (or your Nginx config) is server-level enforcement — it can actually block requests, redirect them, or require authentication. Use robots.txt for crawl-budget control and use .htaccess / Nginx rules for genuine access control.
Why should I block AI crawlers?
To prevent your content being used to train AI models without compensation, to protect proprietary content, or to control how your brand is represented in AI-generated answers. The middle path many businesses now take: block training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) but allow live-search bots (ChatGPT-User, Perplexity-User, OAI-SearchBot) so you still appear in AI answers.
Should I block AI crawlers or allow them?
Allow them if you want visibility in AI-generated answers — a fast-growing traffic source. Block them if you want to protect proprietary content from being absorbed into training datasets. A middle path: block training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) but allow live-search crawlers (ChatGPT-User, PerplexityBot, OAI-SearchBot). This tool makes that split a one-click toggle.
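The middle path can be sketched as a small generator. The bot names are the ones listed in the answer above; `aiCrawlerRules` is a hypothetical helper, not this tool's actual implementation.

```typescript
// Bot names taken from the answer above; adjust the lists to your own policy.
const TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"];
const LIVE_SEARCH_BOTS = ["ChatGPT-User", "PerplexityBot", "OAI-SearchBot"];

// Hypothetical helper: emits one group per bot, blocking training crawlers
// and explicitly allowing live-search crawlers.
function aiCrawlerRules(blockTraining: string[], allowLive: string[]): string {
  const groups = [
    ...blockTraining.map((bot) => `User-agent: ${bot}\nDisallow: /`),
    ...allowLive.map((bot) => `User-agent: ${bot}\nAllow: /`),
  ];
  // A blank line between groups keeps the file readable.
  return groups.join("\n\n") + "\n";
}

console.log(aiCrawlerRules(TRAINING_BOTS, LIVE_SEARCH_BOTS));
```

The explicit Allow groups are optional, since a bot with no matching group is allowed by default, but they document the intent for whoever edits the file next.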
Can I have multiple sitemap URLs?
Yes. Add multiple Sitemap: lines, one per line. Useful for large sites with separate sitemaps for pages, posts, products, images, video, news, etc. Google reads them all on the same crawl.
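As a small illustration, the helper below renders one Sitemap: line per URL. `sitemapLines` is a hypothetical name and the URLs are placeholders. It relies on the standard `URL` constructor to reject relative paths, since sitemap entries need to be fully qualified URLs.

```typescript
// Hypothetical helper: renders one "Sitemap:" line per URL.
function sitemapLines(urls: string[]): string {
  return urls
    .map((url) => {
      // new URL() throws on relative paths such as "/sitemap.xml".
      const absolute = new URL(url);
      return `Sitemap: ${absolute.href}`;
    })
    .join("\n");
}

// Example URLs are placeholders.
const lines = sitemapLines([
  "https://yourdomain.com/sitemap-pages.xml",
  "https://yourdomain.com/sitemap-posts.xml",
  "https://yourdomain.com/sitemap-products.xml",
]);
console.log(lines);
```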
What does the asterisk (*) mean in robots.txt?
* is a wildcard. User-agent: * means "all bots". In paths, * matches any sequence of characters — /*?sort= blocks any URL containing ?sort=. The $ symbol at the end of a path means "URL ends here" — *.pdf$ matches only URLs that finish with .pdf, not URLs containing .pdf in the middle.
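To make the wildcard rules concrete, here is a minimal sketch that turns a robots.txt path pattern into a regular expression. `patternToRegExp` is a hypothetical helper; it assumes the matching described above (patterns match from the start of the URL path, query string included) and ignores percent-encoding details.

```typescript
// Hypothetical helper: converts a robots.txt path pattern to a RegExp.
// "*" matches any run of characters; a trailing "$" anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const core = anchored ? pattern.slice(0, -1) : pattern;
  const source = core
    .split("*")
    // Escape regex metacharacters in the literal pieces between wildcards.
    .map((piece) => piece.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Patterns always match from the start of the path.
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

const sortRule = patternToRegExp("/*?sort=");
const pdfRule = patternToRegExp("*.pdf$");

console.log(sortRule.test("/shop/shoes?sort=price")); // true
console.log(pdfRule.test("/files/guide.pdf"));        // true
console.log(pdfRule.test("/files/guide.pdf?page=2")); // false
```

Without the trailing $, the PDF pattern would also match the third URL, because an unanchored pattern only has to match a prefix of the URL.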
Other Free SEO Tools
SEO Audit
Full-site SEO audit in 30 seconds — Lighthouse scores, security grade, on-page checks.
Schema Validator
Paste any URL, see every schema block, plain-English fixes for what is broken.
Content Brief Generator
Ahrefs-level SEO content brief in 30 seconds with Content Score and SERP feature matrix.
Need help with your SEO strategy?
Our London-based team has delivered over 1,500 SEO projects with a 90% client success rate. Book a free 30-minute consultation — no obligation, no sales pitch, just an honest look at what your site needs.
Book a Free Consultation