SEO Technical: robots.txt

Guides configuration and auditing of robots.txt for search engine and AI crawler control.

When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

  • Robots.txt: Review Disallow/Allow; avoid blocking important pages
  • Crawler access: Ensure crawlers (including AI crawlers) can access key pages
  • Indexing: A misconfigured robots.txt can keep pages from being crawled (and thus ranked); verify there are no accidental blocks
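
One way to check for accidental blocks is Python's standard-library urllib.robotparser. A minimal sketch; the site URL, key pages, and crawler names below are placeholder assumptions, substitute your own:

```python
from urllib import robotparser

# Placeholders: substitute your real site, key pages, and crawlers of interest.
SITE = "https://example.com"
KEY_PAGES = ["/", "/pricing", "/blog/"]
AGENTS = ["Googlebot", "Bingbot", "OAI-SearchBot"]

rp = robotparser.RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for agent in AGENTS:
    for path in KEY_PAGES:
        if not rp.can_fetch(agent, SITE + path):
            print(f"BLOCKED: {agent} cannot fetch {path}")
```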

Initial Assessment

Check for product marketing context first: If .claude/product-marketing-context.md or .cursor/product-marketing-context.md exists, read it for site URL and indexing goals.

Identify:

  1. Site URL: Base domain (e.g., https://example.com)
  2. Indexing scope: Full site, partial, or specific paths to exclude
  3. AI crawler strategy: Allow search/answer crawlers vs. block training-data crawlers
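
These answers map directly onto a robots.txt skeleton. A hypothetical starting point, assuming full-site indexing with a few internal paths excluded (the domain and paths are placeholders):

```
User-agent: *
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
```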

Best Practices

Purpose and Limitations

| Point | Note |
| --- | --- |
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search results without a snippet) |
| No-index | Use a noindex meta tag or authentication for sensitive content; robots.txt is publicly readable |
| Indexed vs. non-indexed | Not all content should be indexed. robots.txt and noindex complement each other: robots.txt for path-level crawl control, noindex for page-level indexing control. See the indexing skill |
| Advisory | Rules are advisory; malicious crawlers may ignore them |
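
For page-level control, the standard mechanism is a robots meta tag (or, for non-HTML files such as PDFs, an X-Robots-Tag: noindex response header). A minimal sketch; note the page must stay crawlable for the directive to be seen:

```html
<!-- In the page's <head>. Do not also Disallow this path in robots.txt, -->
<!-- or crawlers will never fetch the page and see the noindex. -->
<meta name="robots" content="noindex">
```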

Location and Format

| Item | Requirement |
| --- | --- |
| Path | Site root: https://example.com/robots.txt |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |
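
A quick way to verify both the path and the content type (example.com is a placeholder):

```
curl -sI https://example.com/robots.txt
# Expect: HTTP 200 and Content-Type: text/plain (ideally with charset=utf-8)
```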

Core Directives

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent: | Target a crawler | User-agent: Googlebot, User-agent: * |
| Disallow: | Block a path prefix | Disallow: /admin/ |
| Allow: | Allow a path (can override Disallow) | Allow: /public/ |
| Sitemap: | Declare the sitemap's absolute URL | Sitemap: https://example.com/sitemap.xml |
| Clean-param: | Strip query params (Yandex only) | See below |
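
A hypothetical file combining these directives (domain and paths are invented for illustration). Under RFC 9309, the most specific (longest) matching rule wins, which is how Allow overrides Disallow:

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/   # longer match than /admin/, so this subtree stays crawlable

Sitemap: https://example.com/sitemap.xml
```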

Critical: Do Not Block Rendering Resources

  • Do not block CSS, JS, or image files; Google needs them to render pages
  • Block only paths that don't need crawling: admin areas, internal APIs, temp files
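
A sketch of the contrast (all paths are hypothetical):

```
# Anti-pattern: "Disallow: /assets/" would hide CSS/JS/images and can break rendering.
# Block only non-content paths instead:
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
```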

AI Crawler Strategy

| User-agent | Purpose | Typical policy |
| --- | --- | --- |
| OAI-SearchBot | ChatGPT search | Allow |
| GPTBot | OpenAI training | Disallow |
| Claude-SearchBot | Claude search | Allow |
| ClaudeBot | Anthropic training | Disallow |
| PerplexityBot | Perplexity search | Allow |
| Google-Extended | Gemini training | Disallow |
| CCBot | Common Crawl | Disallow |
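
A sketch implementing the typical policy above; per RFC 9309, several User-agent lines can share one rule group:

```
# AI search/answer crawlers: allow
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /

# Training-data crawlers: block
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /
```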

Clean-param (Yandex)

Tells Yandex to ignore the listed query parameters, so URL variants are consolidated rather than crawled as duplicates. This directive is Yandex-specific (other crawlers ignore it) and can appear anywhere in the file:

```
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
```

Output Format

  • Current state (if auditing)
  • Recommended robots.txt (full file)
  • Compliance checklist
  • References: Google's robots.txt documentation

Related Skills

  • xml-sitemap: Sitemap URL to reference in robots.txt
  • site-crawlability: Broader crawl and structure guidance