SEO Technical: Sitemap

Guides sitemap creation, auditing, and optimization for search engine discovery.

When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

  • Sitemap: Create XML sitemap; submit to Google Search Console
  • URL discovery: Help search engines find pages; especially important for large sites or poor internal linking

Task

Generate an XML Sitemap that complies with the sitemaps.org protocol from the project's page list, and declare it in robots.txt.

Initial Assessment

Check for project context first: If .claude/project-context.md or .cursor/project-context.md exists, read it for site URL and page structure.

Identify:

  1. Site URL: Base domain (e.g., https://example.com)
  2. URL count: Total indexable pages (single sitemap vs. sitemap index)
  3. Data source: Static config, CMS, file system, or hybrid

Precondition: Does the site need a sitemap?

Before generating, assess whether a sitemap is warranted. A sitemap is most valuable when:

  • Large site (hundreds+ pages) or new site with few backlinks
  • Deep or orphaned pages that internal links don't reach well
  • Rich media (images, videos, news) needing extension metadata
  • Google Search Console (GSC) shows growing "Discovered – currently not indexed" or "Not discovered" counts

Small sites (< 50 pages) with strong internal linking may not strictly need one, but creating a sitemap has near-zero cost and provides future-proof infrastructure. If in doubt, err on the side of creating.

1. Protocol Essentials

| Item | Spec |
| --- | --- |
| Single sitemap limit | 50,000 URLs, 50 MB (uncompressed) |
| Sitemap index | When exceeding limit, split and have main index reference sub-sitemaps |
| Encoding | UTF-8 |
| URL format | Full URL, same host, include https:// |
| Required tags | <loc> |
| Optional tags | <lastmod>, <changefreq>, <priority> |

2. Field Requirements

| Field | Description | Recommendation |
| --- | --- | --- |
| url | Full URL | https://example.com/path |
| lastModified | Page last modified time | Use page metadata, ISO 8601; use YYYY-MM-DD or omit when no data |
| changeFrequency | Update frequency | Home daily, list pages weekly, content pages monthly |
| priority | Relative importance | Home 1.0, aggregate pages 0.9, content pages 0.7–0.8, others 0.5–0.6 |

lastmod (Critical)

  • Must be accurate: Reflect actual page modification time, not sitemap generation time. Google uses lastmod only when it is consistently and verifiably accurate; Bing reports ~18% of sitemaps have incorrect lastmod values.
  • Format: W3C Datetime (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssTZD, where TZD is Z or ±hh:mm), e.g. 2025-01-15, 2025-01-15T14:30:00+08:00.
  • Avoid: Using new Date() (the generation time) for lastmod. Every URL then shares the same timestamp, and search engines may ignore the field.
  • Update lastmod when: Content updates, structured data changes, or important link changes.
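
A minimal sketch of the lastmod rules above. It assumes each page record carries an optional `modifiedDate` field; the `LastmodPage` type and the `toLastmod` helper are hypothetical names used for illustration only. The helper emits the date-only W3C form (in UTC) and returns `undefined` when there is no usable date, so the caller can omit the tag instead of falling back to the generation time.

```typescript
// Hypothetical page record; `modifiedDate` is an assumed field name.
interface LastmodPage {
  path: string;
  modifiedDate?: string | Date;
}

// Returns YYYY-MM-DD (UTC) for a known modification date, or undefined
// so the caller can omit <lastmod> rather than invent a timestamp.
function toLastmod(page: LastmodPage): string | undefined {
  if (page.modifiedDate === undefined) return undefined;
  const date = new Date(page.modifiedDate);
  if (Number.isNaN(date.getTime())) return undefined; // unparseable input
  return date.toISOString().slice(0, 10);
}
```

Note that the helper never calls `new Date()` without an argument, so a missing date cannot silently become "now".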

changefreq / priority

  • changefreq: A hint only; it does not directly determine crawl frequency, and Google ignores it. Values: always, hourly, daily, weekly, monthly, yearly, never.
  • priority: 0.0–1.0, relative to other URLs on the same site; it does not affect ranking, and Google ignores it. Set higher for important pages and avoid identical values for all.
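
The recommendations in the Field Requirements table could be encoded as defaults per page type, roughly as sketched here. The `PageKind` categories, the `DEFAULT_HINTS` table, and the `hintsFor` helper are hypothetical; the `monthly` value for `other` is an assumption, since the table gives no frequency for that group.

```typescript
type ChangeFreq =
  | "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";

// Hypothetical page categories for illustration.
type PageKind = "home" | "list" | "content" | "other";

interface CrawlHints {
  changeFrequency: ChangeFreq;
  priority: number;
}

const DEFAULT_HINTS: Record<PageKind, CrawlHints> = {
  home: { changeFrequency: "daily", priority: 1.0 },
  list: { changeFrequency: "weekly", priority: 0.9 },
  content: { changeFrequency: "monthly", priority: 0.8 },
  other: { changeFrequency: "monthly", priority: 0.5 }, // frequency assumed
};

// Merges per-page overrides over the defaults and clamps priority to 0.0-1.0.
function hintsFor(kind: PageKind, override: Partial<CrawlHints> = {}): CrawlHints {
  const merged = { ...DEFAULT_HINTS[kind], ...override };
  return { ...merged, priority: Math.min(1, Math.max(0, merged.priority)) };
}
```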

3. Architecture & Split

Single Sitemap

  • When the site has 50,000 URLs or fewer (and the file stays under 50 MB uncompressed), generate /sitemap.xml directly.

Sitemap Index (Multiple Sub-sitemaps)

  • When exceeding limit, split by type or language; main index references sub-sitemaps.
  • Example splits: /sitemap/posts.xml, /sitemap/pages.xml, /sitemap/zh.xml, /sitemap/en.xml.
  • Main index outputs /sitemap.xml or /sitemap-index.xml, each entry as <sitemap><loc>...</loc></sitemap>.
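
As a rough sketch of the split rule above, the following chunks a URL list at the 50,000-URL limit and renders an index that references the resulting sub-sitemaps. The helper names (`chunkUrls`, `buildSitemapIndexXml`, `escapeLoc`) are hypothetical, and splitting by count rather than by type or language is an assumption made to keep the example short.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// Splits a list into consecutive chunks of at most `size` items.
function chunkUrls<T>(urls: T[], size: number = MAX_URLS_PER_SITEMAP): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Escapes the characters that are not allowed raw inside XML text.
function escapeLoc(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Renders a sitemap index with one <sitemap> entry per sub-sitemap URL.
function buildSitemapIndexXml(sitemapUrls: string[]): string {
  const entries = sitemapUrls.map(
    (url) => `  <sitemap><loc>${escapeLoc(url)}</loc></sitemap>`,
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
}
```

A caller would typically chunk the full URL list first, write each chunk to its own file, and pass the public URLs of those files to `buildSitemapIndexXml`.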

Multilingual Sites

  • Split by locale: /sitemap/zh.xml, /sitemap/en.xml.
  • Or by content type + language: /sitemap/zh-posts.xml, /sitemap/en-posts.xml.

Multi-Language Sitemap (hreflang in Sitemap)

For multilingual sites, add xhtml:link hreflang alternates inside each <url> entry. Recommended for large sites (100+ multilingual pages); centralizes hreflang management.

Rules:

  • Every language version must link to ALL others, including itself (self-reference).
  • Include x-default pointing to default locale.
  • Use xmlns:xhtml="http://www.w3.org/1999/xhtml" namespace.
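  • Give each language version its own <url> entry whose <loc> is that version's URL; x-default points to the default-locale (clean) URL.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/page" />
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/page" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/page" />
  </url>
  <url>
    <loc>https://example.com/zh/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/page" />
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/page" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/page" />
  </url>
</urlset>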

List all language sitemaps in the sitemap index, and declare that index in robots.txt.
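
The hreflang rules above can be generated rather than hand-written. This is a sketch under the assumption that the alternates for one logical page are available as a map from hreflang code to absolute URL; `buildHreflangEntries` and `xmlSafe` are hypothetical helper names. It emits one <url> entry per language version, each carrying the full set of alternates plus x-default.

```typescript
// Map of hreflang code -> absolute URL for one logical page.
type LocaleAlternates = Record<string, string>;

// Escapes characters that are unsafe in XML text and attribute values.
function xmlSafe(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function hreflangLink(lang: string, href: string): string {
  return `    <xhtml:link rel="alternate" hreflang="${xmlSafe(lang)}" href="${xmlSafe(href)}" />`;
}

// Emits one <url> entry per language version; every entry lists all
// versions (including itself) plus x-default for the default locale.
function buildHreflangEntries(alternates: LocaleAlternates, defaultLocale: string): string {
  const defaultUrl = alternates[defaultLocale];
  if (defaultUrl === undefined) {
    throw new Error(`No URL for default locale "${defaultLocale}"`);
  }
  const links = [
    ...Object.entries(alternates).map(([lang, href]) => hreflangLink(lang, href)),
    hreflangLink("x-default", defaultUrl),
  ].join("\n");
  return Object.values(alternates)
    .map((loc) => `  <url>\n    <loc>${xmlSafe(loc)}</loc>\n${links}\n  </url>`)
    .join("\n");
}
```

The returned string is meant to be placed inside a <urlset> element that declares the xhtml namespace.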

4. Implementation

| Tech Stack | Implementation |
| --- | --- |
| Next.js App Router | app/sitemap.ts default-exporting a function that returns MetadataRoute.Sitemap; generateSitemaps for splitting into multiple sitemaps |
| Next.js Pages Router | pages/sitemap.xml.ts with getServerSideProps writing XML to the response |
| Astro | src/pages/sitemap-index.xml.ts or @astrojs/sitemap |
| Vite / Static build | Build script generates public/sitemap.xml |
| Other | Generate static /sitemap.xml or return dynamically via API |

Route Exclusion

  • If the project has i18n / middleware redirects, exclude sitemap paths to avoid redirect.
  • Example (Next.js matcher): '/((?!api|_next|sitemap|sitemap-index|.*\\..*).*)'.
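
To sanity-check an exclusion pattern like the one above before deploying, it can be approximated as an anchored JavaScript regular expression and run against sample paths. This is only an approximation: Next.js compiles matcher strings itself, so the `middlewareMatcher` constant and the `middlewareRunsOn` helper below are illustrative assumptions, not the framework's own matching code.

```typescript
// Anchored RegExp approximation of the matcher string shown above.
const middlewareMatcher = /^\/((?!api|_next|sitemap|sitemap-index|.*\..*).*)$/;

// True when the middleware (and its redirects) would run for this path.
function middlewareRunsOn(pathname: string): boolean {
  return middlewareMatcher.test(pathname);
}

// Paths that must bypass the middleware so they are never redirected.
const mustBypass = ["/sitemap.xml", "/sitemap/posts.xml", "/sitemap-index.xml"];
const redirected = mustBypass.filter(middlewareRunsOn);
if (redirected.length > 0) {
  throw new Error(`Middleware would intercept: ${redirected.join(", ")}`);
}
```

Because the negative lookahead matches prefixes, any path that merely starts with api, _next, or sitemap is also skipped, which is worth keeping in mind when naming regular pages.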

5. Page Scope

Include

  • Home: /
  • Locale/region home pages (e.g. /zh, /en)
  • All indexable content pages, list pages, category pages

Exclude

  • /api/*, /admin/*, /_next/*
  • Static assets (images, JS, CSS, etc.). For image discovery, use image sitemap extension—see image-optimization. For video discovery, use video sitemap extension—see video-optimization
  • Login, admin, drafts, and other pages not intended for indexing
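
The include and exclude rules above might be applied as a single filter over the candidate page list. In this sketch the `CandidatePage` shape, its `draft` and `noindex` flags, and the asset extension list are all assumptions; pages such as login are expected to be marked `noindex` in the data source.

```typescript
// Hypothetical candidate record; flag names are assumptions.
interface CandidatePage {
  path: string;
  draft?: boolean;
  noindex?: boolean; // e.g. login or account pages
}

const EXCLUDED_PREFIXES = ["/api", "/admin", "/_next"];
const STATIC_ASSET = /\.(?:png|jpe?g|gif|svg|webp|ico|js|css|map|woff2?)$/i;

// True when the page belongs in the sitemap.
function isSitemapEligible(page: CandidatePage): boolean {
  if (page.draft || page.noindex) return false;
  if (STATIC_ASSET.test(page.path)) return false;
  // Match the prefix itself or anything beneath it, but not look-alikes
  // such as /administrators-guide.
  return !EXCLUDED_PREFIXES.some(
    (prefix) => page.path === prefix || page.path.startsWith(`${prefix}/`),
  );
}
```

Usage would be along the lines of `pages.filter(isSitemapEligible)` before handing the list to the sitemap generator.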

6. Data Source & Maintenance (Single Source of Truth)

  • Single source of truth: Read URL list from config, CMS, or metadata; avoid hardcoding in sitemap.
  • Multiple page types: Tools, blog, marketing pages can be merged into one array for unified generation.
  • New pages: Add only to data source; sitemap updates automatically; avoid maintaining multiple places.

Central Config (Recommended)

Create a config (e.g., site-pages-config.ts) that exports:

  • Page slugs/paths by section (tools, blog, marketing, etc.)
  • Optional: modifiedDate per page for accurate lastmod
  • Function: getAllPageUrls(baseUrl) for sitemap and IndexNow

Why: Sitemap, IndexNow, and feed can all import from the same config—no duplicate URL maintenance. IndexNow should use the same URL list; avoid separate hardcoded lists.
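
A central config along these lines might look like the sketch below. Only the file name site-pages-config.ts and the function name getAllPageUrls come from the text above; the `SitePage` shape, the `SITE_PAGES` sections, and the sample paths are placeholders.

```typescript
// site-pages-config.ts (sketch; sections and paths are placeholders)
export interface SitePage {
  path: string;
  modifiedDate?: string; // YYYY-MM-DD, used for lastmod when present
}

export const SITE_PAGES: Record<string, SitePage[]> = {
  marketing: [{ path: "/", modifiedDate: "2025-01-15" }, { path: "/pricing" }],
  blog: [{ path: "/blog/hello-world", modifiedDate: "2025-01-10" }],
  tools: [{ path: "/tools/example-tool" }],
};

// Flattens all sections into one list for unified generation.
export function getAllPages(): SitePage[] {
  return Object.values(SITE_PAGES).flat();
}

// Absolute URLs for sitemap and IndexNow; tolerates a trailing slash on baseUrl.
export function getAllPageUrls(baseUrl: string): string[] {
  const origin = baseUrl.replace(/\/+$/, "");
  return getAllPages().map((page) => `${origin}${page.path}`);
}
```

Adding a page then means adding one entry to `SITE_PAGES`; the sitemap, IndexNow submission, and feed all pick it up from the same list.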

7. robots.txt

Add to robots.txt:

Sitemap: https://example.com/sitemap.xml

With multiple sitemaps, only declare the main index.

8. Output Format

Single Sitemap Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2025-01-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
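
For stacks without a built-in sitemap feature, output in the format above could be produced by a small build-time function such as this sketch. The `SitemapUrl` type and the `buildSitemapXml` and `escapeXml` helpers are hypothetical names; writing the returned string to disk (for example to public/sitemap.xml) is left to the build script.

```typescript
interface SitemapUrl {
  loc: string;
  lastmod?: string; // already formatted, e.g. 2025-01-15
  changefreq?: string;
  priority?: number;
}

// Escapes the five XML special characters.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Renders a single sitemap; optional tags are omitted when no value is given.
function buildSitemapXml(urls: SitemapUrl[]): string {
  if (urls.length > 50000) {
    throw new RangeError("More than 50,000 URLs: split and use a sitemap index");
  }
  const entries = urls.map((url) => {
    const tags = [`    <loc>${escapeXml(url.loc)}</loc>`];
    if (url.lastmod) tags.push(`    <lastmod>${escapeXml(url.lastmod)}</lastmod>`);
    if (url.changefreq) tags.push(`    <changefreq>${escapeXml(url.changefreq)}</changefreq>`);
    if (url.priority !== undefined) {
      tags.push(`    <priority>${url.priority.toFixed(1)}</priority>`);
    }
    return `  <url>\n${tags.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```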

Sitemap Index Example

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap/pages.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap/posts.xml</loc>
    <lastmod>2025-01-14</lastmod>
  </sitemap>
</sitemapindex>

9. Submission & Verification

After generating the sitemap, guide the user through submission:

  1. robots.txt: Add Sitemap: https://example.com/sitemap.xml (or the index URL)
  2. Google Search Console: Navigate to Indexing → Sitemaps, enter the sitemap URL, submit
  3. Verify: In GSC Sitemaps report, check "Discovered pages" count vs. expected; if large gap, investigate
  4. Monitor: After 24–48h, confirm status shows "Success"; check Indexing → Pages report filtered by sitemap for "Indexed" vs. "Not indexed" breakdown

Common submission errors to flag:

  • Sitemap URL 404 — check build output and deployment
  • "Could not fetch" — verify robots.txt doesn't block the sitemap URL
  • "HTML page" error — see §10 HTML Diagnosis below

10. HTML Diagnosis (Sitemap Returns HTML Instead of XML)

When /sitemap.xml returns HTTP 200 but Content-Type: text/html with homepage content, Google silently rejects the sitemap — no GSC alert because the status code is 200. This is worse than 404.

Common causes:

  • Catch-all page routes (pages/[...slug].vue, Next.js catch-all) intercepting before the sitemap handler
  • i18n modules running at a lower level than routeRules/middleware
  • Geo-redirects or redirect plugins matching sitemap paths
  • CDN/cache layers stripping Content-Type headers

Diagnose with:

curl -I https://example.com/sitemap.xml          # Check Content-Type
curl -A "Googlebot" -I https://example.com/sitemap.xml  # Googlebot's view
curl -s https://example.com/sitemap.xml | head -5       # First lines must be XML
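
The same check can be automated, for example in a post-deploy test. The sketch below only covers the decision logic and assumes the Content-Type header and response body have already been fetched with whatever HTTP client the project uses; `looksLikeXmlSitemap` is a hypothetical helper name.

```typescript
// True when the response looks like a real XML sitemap rather than an
// HTML page served with status 200.
function looksLikeXmlSitemap(contentType: string | null, body: string): boolean {
  const type = (contentType ?? "").toLowerCase();
  const hasXmlType = type.includes("application/xml") || type.includes("text/xml");

  // Inspect only the start of the body; leading whitespace is ignored.
  const head = body.trimStart().slice(0, 200).toLowerCase();
  const hasXmlBody =
    head.startsWith("<?xml") ||
    head.startsWith("<urlset") ||
    head.startsWith("<sitemapindex");

  return hasXmlType && hasXmlBody;
}
```

Requiring both the header and the body to look like XML catches the case described above, where the homepage HTML is returned for /sitemap.xml.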

Fix (Nuxt): Use a server route at server/routes/sitemap.xml.ts (highest priority, bypasses the page catch-all). For i18n, add excludePatterns: [/^\/sitemap.*\.xml$/]. Renaming the sitemap file (e.g., to sitemap_index.xml) can bypass Google's cache of the failed state.

11. Common Issues

| Issue | Cause / Fix |
| --- | --- |
| Sitemap 404 | Build failure, wrong path, incorrect export; check routes and deployment |
| Missing pages | URLs not in data source, filtered or excluded |
| lastmod anomaly | Avoid new Date(); use modifiedDate from page metadata |
| Google not indexing | Submit sitemap in GSC; check Coverage (google-search-console) and robots |
| EN/ZH URL mismatch | Use unified data source; share same list when generating by locale |
| Sitemap returns HTML | Catch-all route or i18n intercepting before sitemap handler; diagnose with curl -I + Googlebot UA; see §10 |

References

Related Skills

  • website-structure: Plan page structure and URL list; sitemap reflects planned/indexable pages
  • google-search-console: Sitemap status, indexed URL count, Coverage
  • robots-txt: Reference sitemap in robots.txt
  • indexnow: Share same URL list from config
  • image-optimization: Image sitemap extension for image discovery
  • video-optimization: Video sitemap extension for video discovery