What is Robots.txt?
Quick Answer
Robots.txt is a text file that tells search engine crawlers which pages or sections of your website they can or cannot access and crawl.
Overview
Robots.txt is a simple text file placed in your website's root directory that provides instructions to search engine crawlers (also called bots or spiders) about which parts of your site they should or shouldn't access. It's part of the Robots Exclusion Protocol, a standard that websites use to communicate with web crawlers.
While robots.txt is a powerful tool for managing crawler behavior, it's important to understand what it can and cannot do. It's a directive, not a security measure—it tells well-behaved crawlers what to avoid, but malicious bots can ignore it entirely. Pages blocked by robots.txt can still appear in search results if other sites link to them.
Understanding how to properly configure robots.txt is essential for technical SEO. A misconfigured file can accidentally block important content from being indexed, while a well-optimized one can help preserve crawl budget and keep low-value pages out of search results.
How Robots.txt Works
Robots.txt uses simple syntax to control crawler access:
Basic Commands:
- User-agent: Specifies which crawler the rules apply to
- Disallow: Blocks access to specified URLs or directories
- Allow: Permits access (overrides Disallow for specific paths)
- Sitemap: Points to your XML sitemap location
Example Robots.txt File:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
```
User-Agent Targeting:
- Use * for all crawlers
- Use specific names for individual bots: Googlebot, Bingbot, etc.
- Different rules can apply to different crawlers (see the sketch below)
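As a sketch (the paths are hypothetical), note that a crawler obeys only the most specific group that names it, so a bot with its own group ignores the * group entirely:

```
# Default rules for crawlers without a group of their own
User-agent: *
Disallow: /private/

# Googlebot matches this group and ignores the * group above,
# so any shared rules must be repeated here
User-agent: Googlebot
Disallow: /private/
Disallow: /experiments/
```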
Path Matching:
- Disallow: /folder/ blocks all URLs starting with /folder/
- Disallow: /*.pdf$ blocks all PDF files
- Wildcards (*) and end-of-URL markers ($) enable pattern matching
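A few pattern-matching examples in one place (the paths and parameter names are illustrative assumptions):

```
User-agent: *
Disallow: /folder/          # everything under /folder/
Disallow: /*.pdf$           # any URL ending in .pdf
Disallow: /*?*session=      # any URL containing a session parameter
Allow: /folder/public/      # re-open a subpath inside a blocked folder
```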
Important Notes:
- Robots.txt must be at the root: example.com/robots.txt
- It's case-sensitive: /Admin/ is different from /admin/
- Crawlers check robots.txt before crawling any page
When to Use Robots.txt
Strategic use of robots.txt helps manage how search engines interact with your site:
Good Uses for Robots.txt:
- Blocking admin and login pages
- Preventing crawling of internal search results pages
- Blocking development or staging environments
- Managing crawl budget on large sites
- Blocking duplicate content from filters or parameters
- Keeping crawlers out of thank-you or confirmation pages (pair with noindex if they must also stay out of results)
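A hedged sketch of what a few of these rules could look like (all paths and parameter names here are assumptions; adapt them to your own URL structure):

```
User-agent: *
Disallow: /admin/           # admin and login area
Disallow: /search           # internal search results
Disallow: /*?*ref=          # tracking-parameter duplicates
Disallow: /thank-you/       # confirmation pages
```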
When NOT to Use Robots.txt:
- Hiding sensitive information (use authentication instead)
- Removing pages from the index (use noindex instead)
- Trying to prevent pages from appearing in search (doesn't work)
- Blocking CSS/JavaScript files needed for rendering
Robots.txt vs. Noindex:
- Robots.txt: Prevents crawling entirely
- Noindex meta tag: Allows crawling but prevents indexing
- If you block with robots.txt, crawlers can't see noindex tags
- Use noindex when you want content crawled but not indexed
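For reference, the noindex directive lives in the page itself rather than in robots.txt:

```html
<!-- In the page's <head>: the page can be crawled but won't be indexed -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same directive can be sent as an X-Robots-Tag: noindex HTTP response header.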
Common Mistakes:
- Accidentally blocking entire sites with Disallow: /
- Blocking CSS/JS files Google needs to render pages
- Using robots.txt to hide content (not secure)
- Forgetting to update after site restructuring
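The first mistake is a one-character slip worth memorizing: a bare slash blocks everything, while an empty Disallow blocks nothing:

```
# Blocks the entire site:
User-agent: *
Disallow: /

# Blocks nothing (an empty Disallow permits all crawling):
User-agent: *
Disallow:
```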
Testing and Maintaining Robots.txt
Regular testing ensures your robots.txt works as intended:
Testing Tools:
- Google Search Console: robots.txt report (the standalone robots.txt Tester has been retired)
- Bing Webmaster Tools: robots.txt testing feature
- Technical SEO tools: Screaming Frog and Sitebulb validate robots.txt
What to Test:
- Verify important pages aren't accidentally blocked
- Confirm unwanted pages are properly blocked
- Check for syntax errors that might cause issues
- Test with specific user-agents if using targeted rules (a scripted check is sketched below)
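One way to automate these checks is Python's standard-library urllib.robotparser. This is a minimal sketch, assuming placeholder URLs and rules; note that the stdlib parser implements the original exclusion protocol and may not honor wildcard (* and $) patterns the way Google does, so test wildcard rules with a dedicated tool:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content for illustration; in practice, call
# rp.set_url("https://yoursite.com/robots.txt") and rp.read() to fetch
# the live file instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Pages that must remain crawlable, and pages that must stay blocked
# (all hypothetical URLs).
must_allow = ["https://yoursite.com/", "https://yoursite.com/products/widget"]
must_block = ["https://yoursite.com/admin/", "https://yoursite.com/checkout/"]

for url in must_allow:
    assert rp.can_fetch("Googlebot", url), f"Important page blocked: {url}"
for url in must_block:
    assert not rp.can_fetch("Googlebot", url), f"Unwanted page crawlable: {url}"

print("robots.txt checks passed")
```

Running a script like this in a CI pipeline catches an accidental Disallow before it ships.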
Maintenance Best Practices:
- Review robots.txt quarterly or after major site changes
- Document why each rule exists
- Test changes in a staging environment first
- Monitor Google Search Console for crawl errors
Troubleshooting Blocked Pages:
1. Check if robots.txt is blocking the page
2. Use Search Console's URL Inspection tool
3. Verify no parent directories are blocked
4. Check for conflicting Allow/Disallow rules
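On the last point, Google resolves conflicts by applying the most specific (longest) matching rule, and prefers the less restrictive rule when lengths tie, so a longer Allow can reopen part of a blocked directory (paths here are hypothetical):

```
User-agent: *
Disallow: /docs/          # blocks the whole directory...
Allow: /docs/public/      # ...but this longer, more specific rule wins for /docs/public/
```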
Robots.txt Template for Most Sites:

```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?*sort=
Disallow: /*?*filter=
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```