Free Tool · No Signup · 18 AI Bot Controls

Free Robots.txt Generator: Beginner Friendly, Pro Powerful

Build a perfect robots.txt file in seconds. Tick what to allow or block, including 18 AI crawlers like GPTBot, ClaudeBot and PerplexityBot. Includes platform install guides for WordPress, Shopify, Webflow, Astro, Next.js and more.


1,500+ businesses helped · 100% free · No signup required

Step 1: Pick a starting point

Quick Start Presets

Each preset applies a sensible set of rules to every section below. You can fine-tune anything afterwards. Picking a preset is a starting point, not a commitment.

How to read this builder

Green dot: we recommend a specific action for most sites (block or allow). Hover the chip to see which.
Amber dot: optional. Depends on your situation. No strong default either way.
The "?" button on each chip opens a panel with the full reasons to block vs allow that specific bot or path.

Standard Search Engines

Default: All allowed

These are the search engines that drive your traffic. Leave them all ticked unless you have a specific reason to block one.

Recommended approach: Allow all 8 by default. Only block a search engine if you serve a specific market and have zero traffic from elsewhere (e.g. UK-only B2B sites can safely block Baiduspider).

AI Crawlers

Tick to block

Tick a bot to block it from training on or scraping your content. Hover any chip for what that specific bot does. Use the master toggle below for maximum protection in one click.

Recommended approach (the middle path): Block training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider) to protect IP, but allow live-search crawlers (ChatGPT-User, Perplexity-User, OAI-SearchBot) so you stay visible in AI answers. Tap any "?" to see the pros / cons for that specific bot.
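Here is a minimal TypeScript sketch of the robots.txt the middle path produces. The helper name buildMiddlePathRobots is hypothetical, and the bot lists simply mirror the recommendation above. This is not the builder's internal code.

```typescript
// Hypothetical helper: blocks AI training crawlers, explicitly allows
// live-search crawlers, and leaves every other bot on the * group.
const TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"];
const LIVE_SEARCH_BOTS = ["ChatGPT-User", "Perplexity-User", "OAI-SearchBot"];

function buildMiddlePathRobots(sitemapUrl: string): string {
  const lines: string[] = [
    // RFC 9309 lets several User-agent lines share one rule group.
    ...TRAINING_BOTS.map((bot) => `User-agent: ${bot}`),
    "Disallow: /",
    "",
    ...LIVE_SEARCH_BOTS.map((bot) => `User-agent: ${bot}`),
    "Allow: /",
    "",
    // Search engines and all other bots fall through to this group.
    "User-agent: *",
    "Allow: /",
    "",
    `Sitemap: ${sitemapUrl}`,
  ];
  return lines.join("\n") + "\n";
}

console.log(buildMiddlePathRobots("https://yourdomain.com/sitemap.xml"));
```

A compliant crawler follows only the most specific group that names it. GPTBot therefore obeys its own "Disallow: /" and never falls back to the * group.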

Common Paths to Block

Tick the URL patterns you want to block from crawling. The preset above has already ticked the most common ones for your chosen starting point.

CRITICAL: Do NOT block /wp-content/plugins/ or /wp-includes/ on WordPress sites. Many themes and plugins load CSS / JS from there, and blocking these folders breaks Google's ability to render your pages. Tap each chip's "?" for the full reasoning.

Admin & CMS

Commerce

Content archives

Query parameters

File types

System

Add a custom path

Crawl Delay (advanced, optional)

Google ignores the Crawl-delay directive entirely, and many other modern bots do too (Bing is a notable exception that honours it). Only set this if your server is genuinely struggling under crawl load.
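A crawl delay can backfire on larger sites. Here is the arithmetic as a short TypeScript helper. It assumes a bot that honours Crawl-delay fetches at most one URL per delay interval, and the function name is illustrative.

```typescript
// Upper bound on URLs a compliant bot can fetch per day for a given Crawl-delay,
// assuming strictly one request per delay interval.
function maxFetchesPerDay(crawlDelaySeconds: number): number {
  if (!(crawlDelaySeconds > 0)) {
    throw new RangeError("Crawl-delay must be a positive number of seconds");
  }
  const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400
  return Math.floor(SECONDS_PER_DAY / crawlDelaySeconds);
}

console.log(maxFetchesPerDay(10)); // 8640
```

At 10 seconds, that caps the bot at 8,640 URLs a day. A 50,000-page site would need roughly six days for one full recrawl by that bot.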

Sitemap URLs

Tell crawlers where to find your sitemap. We strongly recommend adding at least one: a sitemap helps Google find every important page on your site, including pages with few internal links.

The basics

What is robots.txt?

What it is

robots.txt is a plain text file that lives at the root of your website (yourdomain.com/robots.txt). It tells search-engine crawlers and other bots which parts of your site they can and cannot access. Created in 1994 and formalised as RFC 9309 in 2022, it is the oldest and most widely respected protocol for controlling crawler behaviour. Every major search engine and most AI companies honour it.

What it does (and doesn’t do)

Tells well-behaved bots which URLs to skip
Conserves your crawl budget
Reduces server load
Keeps compliant crawlers out of admin areas
Controls AI training data access (for AI crawlers that honour it)

Does NOT prevent indexing (use a noindex meta tag for that)
Does NOT block malicious bots (they ignore it)
Does NOT secure private content (use authentication)
Does NOT remove already-indexed pages from Google

Common mistakes

Blocking CSS or JS files (hurts rankings: Google needs to render your page)
Using robots.txt to hide private content (use authentication instead)
Typos in user-agent names (a misspelled name matches no bot, so its rules are silently ignored)
Having no file at all (a robots.txt file is recommended, even an empty one)
Conflicting Allow / Disallow rules
Forgetting to update after a site migration
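Conflicting rules are a mistake because the outcome is easy to misread. Compliant crawlers do not apply rules top to bottom. Under RFC 9309, the longest matching path wins, and Allow wins a tie. Below is a minimal TypeScript sketch of that precedence. It handles plain path prefixes only (no * or $ wildcards), and the isAllowed helper is hypothetical.

```typescript
type Rule = { type: "allow" | "disallow"; path: string };

// RFC 9309 precedence for plain path prefixes: the longest matching rule wins,
// and Allow wins a tie. Lengths are compared in UTF-16 code units, which equal
// octets for percent-encoded (ASCII) paths.
function isAllowed(rules: Rule[], urlPath: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    // An empty path (e.g. "Disallow:") matches nothing.
    if (rule.path === "" || !urlPath.startsWith(rule.path)) continue;
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === "allow")
    ) {
      best = rule;
    }
  }
  // No matching rule means the URL may be crawled.
  return best === undefined || best.type === "allow";
}

const rules: Rule[] = [
  { type: "disallow", path: "/shop/" },
  { type: "allow", path: "/shop/sale/" },
];
console.log(isAllowed(rules, "/shop/cart"));       // false: only /shop/ matches
console.log(isAllowed(rules, "/shop/sale/shoes")); // true: /shop/sale/ is longer
console.log(isAllowed(rules, "/blog/"));           // true: no rule matches
```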

Step 3: Install it

Platform Deployment Guides

Exact step-by-step instructions for every major platform. Pick yours, follow the steps, then verify at yourdomain.com/robots.txt.

WordPress

WordPress sites have three good options depending on which SEO plugin you run. Pick one method and do not combine them.

Method 1: Yoast SEO plugin

WordPress Admin → Yoast SEO → Tools
Click File editor
If no robots.txt exists, click Create robots.txt file
Paste the generated content from the preview panel above
Click Save changes to robots.txt and verify at yourdomain.com/robots.txt

Method 2: Rank Math plugin

WordPress Admin → Rank Math → General Settings
Click Edit robots.txt in the sidebar
Paste the generated content
Click Save Changes

Method 3: Manual via FTP / cPanel

Click Download robots.txt in the preview panel above
Connect via FTP or cPanel File Manager
Upload to the WordPress root folder (same level as wp-config.php)
Verify at yourdomain.com/robots.txt

WordPress generates a virtual robots.txt by default. As soon as you upload a real file, the virtual one is overridden. There is no setting to toggle.

Shopify

Shopify auto-generates a sensible robots.txt for every store. To customise it, you must create a Liquid template that extends the default.

Shopify Admin → Online Store → Themes
Click the three-dot menu on your live theme → Edit code
Under Templates, click Add a new template
Choose robots.txt from the dropdown and click Create template
Replace the default content with the Liquid snippet below (it loops through Shopify's default rules, then adds your custom rules)
Click Save and verify at yourdomain.com/robots.txt

# Generated by SEO First Web: seofirstweb.co.uk/tools/robots-txt-generator
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {%- comment -%} Replace with your own rules, one output line per rule. {%- endcomment -%}
    {{ 'Disallow: /your-custom-path/' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

Shopify strongly advises against removing its built-in Disallow rules for /cart, /checkout and /policies/, and the snippet above keeps them in place. Your custom rules sit alongside them.

Squarespace

Squarespace auto-generates robots.txt and intentionally limits customisation. You have two real options:

Option A: built-in crawler settings

Settings → Crawlers
Toggle each crawler on or off (basic allow/block per major bot)
Save

Option B: Developer Mode (full custom robots.txt)

To deploy a fully custom robots.txt you must switch your site to Developer Mode and add a robots.txt file at the site root in the Git repository. This is a one-way change for the underlying site files, so back up first.

Ghost

Ghost serves a default robots.txt automatically. To replace it:

Self-hosted Ghost

SFTP into your Ghost host
Navigate to /content/static/ (create it if it does not exist)
Upload your generated robots.txt here
Restart Ghost to clear any cached default

Ghost(Pro) hosted

You cannot upload arbitrary files. The supported workaround is a routes.yaml override that maps /robots.txt to a route on your theme. See the official Ghost routing docs for the syntax.

Custom HTML / Static

The classic deployment. Works on every shared host, every static host, and every CDN.

Click Download robots.txt in the preview panel above
Connect to your host via FTP, SFTP, or the cPanel File Manager
Upload robots.txt to the web root (the same folder that contains index.html)
Verify at yourdomain.com/robots.txt

Apache Server

For sites running on Apache, one of the most common Linux web servers.

Save the generated file as robots.txt
Place in your document root, typically /var/www/html/
No .htaccess changes are needed; Apache serves robots.txt automatically
Verify: curl https://yourdomain.com/robots.txt
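If you want a scripted version of the "verify" step that every guide on this page ends with, here is a minimal TypeScript sketch. It assumes Node 18+ or another runtime with fetch and Response. The helper names and the specific checks (HTTP 200, a text/plain Content-Type, a body that is not an HTML error page) are illustrative choices, not an official test.

```typescript
type RobotsCheck = { ok: boolean; problems: string[] };

// Checks an already-fetched robots.txt response for common deployment problems.
async function checkRobotsResponse(res: Response): Promise<RobotsCheck> {
  const problems: string[] = [];
  if (res.status !== 200) problems.push(`expected HTTP 200, got ${res.status}`);

  const type = res.headers.get("content-type") ?? "";
  if (!type.toLowerCase().startsWith("text/plain")) {
    problems.push(`expected text/plain, got "${type || "none"}"`);
  }

  // A CMS or host error page is a common cause of a "missing" robots.txt.
  const body = await res.text();
  if (body.trimStart().startsWith("<")) {
    problems.push("body looks like HTML, not a robots.txt file");
  }
  return { ok: problems.length === 0, problems };
}

// Fetches https://<your domain>/robots.txt and runs the checks above.
async function verifyRobots(origin: string): Promise<RobotsCheck> {
  const res = await fetch(new URL("/robots.txt", origin));
  return checkRobotsResponse(res);
}
```

For example, verifyRobots("https://yourdomain.com") resolves to { ok: true, problems: [] } when the file is served correctly.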

Nginx Server

For sites running on Nginx, common for VPS and high-traffic deployments.

Save the generated file as robots.txt
Place in your document root, typically /usr/share/nginx/html/ or /var/www/
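Optional but recommended: add an explicit location block so requests for robots.txt do not clutter your logs and the file is always served as text/plain
Test the config with sudo nginx -t, then reload Nginx: sudo nginx -s reload

location = /robots.txt {
  # Clear the MIME map here so default_type always applies
  # (add_header Content-Type would send a second, duplicate header).
  types { }
  default_type text/plain;
  log_not_found off;
  access_log off;
}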

Vercel / Next.js

Static method (recommended for most projects)

Save the generated file as robots.txt
Place it at /public/robots.txt in your Next.js project
Commit and push, and Vercel auto-deploys
Verify at yourdomain.com/robots.txt

Dynamic method: Next.js App Router

Use this when your robots.txt needs to vary between staging and production environments. Create app/robots.ts:

import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
      { userAgent: 'GPTBot', disallow: '/' },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  }
}
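The example above returns the same rules in every environment. Here is a minimal sketch of a staging/production split, assuming you deploy on Vercel, which sets the VERCEL_ENV system variable to "production", "preview" or "development". The robotsFor helper is hypothetical. The RobotsConfig type is a hand-written subset mirroring MetadataRoute.Robots, so the sketch runs without Next.js installed.

```typescript
// Hand-written subset of Next.js's MetadataRoute.Robots shape (assumption).
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};
type RobotsConfig = { rules: RobotsRule[]; sitemap?: string };

// Block everything outside production so preview deployments are not crawled.
function robotsFor(vercelEnv: string | undefined): RobotsConfig {
  if (vercelEnv !== "production") {
    return { rules: [{ userAgent: "*", disallow: "/" }] };
  }
  return {
    rules: [
      { userAgent: "*", allow: "/", disallow: ["/admin/", "/api/"] },
      { userAgent: "GPTBot", disallow: "/" },
    ],
    sitemap: "https://yourdomain.com/sitemap.xml",
  };
}

// In app/robots.ts this would become:
//   export default function robots(): MetadataRoute.Robots {
//     return robotsFor(process.env.VERCEL_ENV)
//   }
```

Treating "anything but production" as blocked is the safer default. A missing or misspelled environment variable then hides the site from crawlers rather than exposing a staging copy.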

Astro

Static method

Save the generated file as robots.txt
Place it at /public/robots.txt in your Astro project
Commit and push, and your Astro host (Vercel, Netlify, Cloudflare Pages) auto-deploys
Verify at yourdomain.com/robots.txt

Dynamic method: Astro endpoint

Create an endpoint at src/pages/robots.txt.ts for environment-dependent output:

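import type { APIRoute } from 'astro';

// sitemap-index.xml is the file the @astrojs/sitemap integration generates.
const getRobotsTxt = (sitemapURL: URL) => `
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: ${sitemapURL.href}
`.trim();

export const GET: APIRoute = ({ site }) => {
  // `site` comes from astro.config.mjs; new URL() throws if it is not set.
  const sitemapURL = new URL('sitemap-index.xml', site);
  return new Response(getRobotsTxt(sitemapURL), {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
};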

Cloudflare

For sites proxied through Cloudflare (orange-cloud DNS).

Default: origin serves it

Your origin server's robots.txt is served as normal through Cloudflare's proxy. No additional configuration required.

Override at the edge: Cloudflare Workers

Cloudflare dashboard → Workers & Pages → Create application
Add a Worker route for yourdomain.com/robots.txt
Return the generated content as a text/plain response
This lets you serve a different robots.txt to specific country regions or bot user-agents
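Here is a minimal sketch of the Worker described in the steps above, in TypeScript module syntax. ROBOTS_TXT is a placeholder for your generated file. Every request other than /robots.txt is passed through to your origin unchanged. Region- or bot-specific variations are left out.

```typescript
// Placeholder: paste the generated robots.txt content here.
const ROBOTS_TXT = `User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
`;

const worker = {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === "/robots.txt") {
      // Answer at the edge without touching the origin.
      return new Response(ROBOTS_TXT, {
        headers: { "content-type": "text/plain; charset=utf-8" },
      });
    }
    // Everything else goes to the origin unchanged.
    return fetch(request);
  },
};

export default worker;
```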

For most sites, do not bother with a Worker. Let the origin serve robots.txt and use Cloudflare's built-in bot management for AI/bot blocking instead.


Frequently Asked Questions

The questions we hear most from clients setting up robots.txt for the first time.

Where does robots.txt go on my server?

At the root of your domain, accessible at yourdomain.com/robots.txt. The filename must be lowercase and named exactly robots.txt. Google will not look for it anywhere else.
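A robots.txt file only governs the exact scheme, host and port it is served from. This means shop.yourdomain.com needs its own file. Here is a short TypeScript sketch (the helper name is illustrative) that derives which robots.txt applies to any page URL:

```typescript
// The robots.txt that governs a page lives at "/robots.txt" on the same origin.
function robotsUrlFor(pageUrl: string): string {
  return new URL("/robots.txt", pageUrl).href;
}

console.log(robotsUrlFor("https://yourdomain.com/blog/post?id=1"));
// https://yourdomain.com/robots.txt
console.log(robotsUrlFor("https://shop.yourdomain.com/products/"));
// https://shop.yourdomain.com/robots.txt
```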

Does robots.txt actually block bots?

Only bots that choose to honour it. All major search engines (Google, Bing, DuckDuckGo) and most reputable AI companies (OpenAI, Anthropic, Perplexity) respect it. Malicious scrapers and spam bots typically ignore it entirely. For those, use authentication, rate limiting or a WAF rule.

Is robots.txt case-sensitive?

The filename must be lowercase: robots.txt. URL paths inside the file ARE case-sensitive (/Admin/ and /admin/ are treated as different paths). User-agent names are NOT case-sensitive, but it is conventional to write them in their canonical case (Googlebot, GPTBot, ClaudeBot) for readability.
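User-agent matching also decides which group a bot follows. Here is a simplified TypeScript sketch of the RFC 9309 behaviour. Groups naming the bot's product token, compared case-insensitively, are merged and used. Only if none match does the bot fall back to the * group. The Group shape and rulesFor helper are hypothetical, and real crawlers may apply extra matching rules of their own.

```typescript
type Group = { userAgents: string[]; rules: string[] };

// Named groups (case-insensitive match) win; otherwise use the * group(s).
// All matching groups are merged into one rule list.
function rulesFor(groups: Group[], productToken: string): string[] {
  const token = productToken.toLowerCase();
  const named = groups.filter((g) =>
    g.userAgents.some((ua) => ua.toLowerCase() === token),
  );
  const chosen =
    named.length > 0 ? named : groups.filter((g) => g.userAgents.includes("*"));
  return chosen.flatMap((g) => g.rules);
}

const groups: Group[] = [
  { userAgents: ["*"], rules: ["Disallow: /admin/"] },
  { userAgents: ["GPTBot"], rules: ["Disallow: /"] },
];
console.log(rulesFor(groups, "gptbot"));    // [ 'Disallow: /' ]
console.log(rulesFor(groups, "Googlebot")); // [ 'Disallow: /admin/' ]
```

Once a bot is named in its own group, it no longer inherits anything from the * group. In this example, the /admin/ rule does not apply to GPTBot.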

How long until Google sees my robots.txt changes?

Google generally caches robots.txt for up to 24 hours, so changes usually take effect within a day. To speed this up, open the robots.txt report in Google Search Console (Settings → robots.txt) and request a recrawl.

What is the difference between robots.txt and a noindex meta tag?

robots.txt blocks crawling (the bot never fetches the page). A noindex meta tag (<meta name="robots" content="noindex">) blocks indexing (the bot fetches the page but is told not to add it to search results). To remove a page from Google, add noindex and leave the page crawlable. Do not also block it in robots.txt: if Google cannot fetch the page, it never sees the noindex, and the page can stay in the index.

What is the difference between robots.txt and .htaccess?

robots.txt is a polite request to bots: it relies on the bot choosing to obey. .htaccess (or your Nginx config) is server-level enforcement, so it can actually block requests, redirect them, or require authentication. Use robots.txt for crawl-budget control and use .htaccess / Nginx rules for genuine access control.

Why should I block AI crawlers?

To prevent your content being used to train AI models without compensation, to protect proprietary content, or to control how your brand is represented in AI-generated answers. The middle path many businesses now take: block training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) but allow live-search bots (ChatGPT-User, Perplexity-User, OAI-SearchBot) so you still appear in AI answers.

Should I block AI crawlers or allow them?

Allow them if you want visibility in AI-generated answers, a fast-growing traffic source. Block them if you want to protect proprietary content from being absorbed into training datasets. A middle path: block training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) but allow live-search crawlers (ChatGPT-User, Perplexity-User, OAI-SearchBot). This tool makes that split a one-click toggle.

Can I have multiple sitemap URLs?

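Yes. Add multiple Sitemap: lines, one per line, each with a full absolute URL. This is useful for large sites with separate sitemaps for pages, posts, products, images, video, news, etc. Sitemap lines are not tied to any User-agent group, so crawlers that support them read every one, wherever it appears in the file.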
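Here is a small TypeScript sketch that turns a list of sitemap URLs into Sitemap: lines. It rejects relative or malformed URLs, since each line needs a full absolute URL. The helper name is hypothetical.

```typescript
// Builds one "Sitemap:" line per URL; new URL() throws on relative or malformed input.
function sitemapLines(urls: string[]): string {
  return urls
    .map((u) => {
      const parsed = new URL(u);
      if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
        throw new Error(`Sitemap URL must use http(s): ${u}`);
      }
      return `Sitemap: ${parsed.href}`;
    })
    .join("\n");
}

console.log(
  sitemapLines([
    "https://yourdomain.com/sitemap-pages.xml",
    "https://yourdomain.com/sitemap-posts.xml",
  ]),
);
```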

What does the asterisk (*) mean in robots.txt?

* is a wildcard. User-agent: * means "all bots". In paths, * matches any sequence of characters, so /*?sort= blocks any URL containing ?sort=. The $ symbol at the end of a path means "URL ends here", so /*.pdf$ matches only URLs that end with .pdf, not URLs containing .pdf in the middle.
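To make the wildcard rules concrete, here is a minimal TypeScript sketch that compiles a robots.txt path pattern into a regular expression. Matching starts at the beginning of the path (robots.txt rules are prefix matches) and is case-sensitive. The compilePattern helper is hypothetical.

```typescript
// "*" becomes ".*", a trailing "$" anchors the end, everything else is literal.
function compilePattern(pattern: string): RegExp {
  const anchoredEnd = pattern.endsWith("$");
  const body = anchoredEnd ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    // Escape regex metacharacters in the literal parts (e.g. ?, ., +).
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchoredEnd ? "$" : ""));
}

const sort = compilePattern("/*?sort=");
console.log(sort.test("/shop?sort=price")); // true
console.log(sort.test("/shop?page=2"));     // false

const pdf = compilePattern("/*.pdf$");
console.log(pdf.test("/files/report.pdf"));       // true
console.log(pdf.test("/files/report.pdf?dl=1"));  // false
```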

Other Free SEO Tools

SEO Audit

Full-site SEO audit in 30 seconds: Lighthouse scores, security grade, on-page checks.

Open tool →

Schema Validator

Paste any URL, see every schema block, plain-English fixes for what is broken.

Open tool →

Content Brief Generator

Ahrefs-level SEO content brief in 30 seconds with Content Score and SERP feature matrix.

Open tool →

Need help with your SEO strategy?

Our London-based team has delivered over 1,500 SEO projects with a 90% client success rate. Book a free 30-minute consultation: no obligation, no sales pitch, just an honest look at what your site needs.

Book a Free Consultation