`, or ``)
- Strips navigation, headers, footers, sidebars, scripts, styles, SVGs, and forms
- Converts headings, lists, links, bold, italic, code, and blockquotes to markdown syntax
- Preserves code blocks intact
- Normalizes whitespace and deduplicates the page title
- Injects a `<link>` tag into the HTML for discovery

The result is a clean markdown file that an agent can read without wading through layout chrome.

## Cloudflare's approach: runtime readability extraction

Cloudflare offers a readability extraction feature that strips HTML to readable content at request time. It is based on Mozilla's Readability library and runs on Cloudflare's edge network.

The key difference is runtime versus build time. Cloudflare processes pages on every request. You do not control the exact output. The extraction algorithm decides what is content and what is noise using heuristics.

## Build-time vs runtime: why it matters

| | agentmarkup (build-time) | Cloudflare (runtime) |
| --- | --- | --- |
| When it runs | Once, during build | Every request |
| Output control | You see the .md files in your build output | Opaque, algorithm decides |
| Consistency | Deterministic, same output every build | May vary with algorithm updates |
| Performance cost | Zero runtime cost | Added latency per request |
| Works with SPAs | Yes, uses noscript fallback or pre-rendered HTML | Depends on SSR availability |
| Discovery | Link tag in HTML head + static .md URL | Special URL parameter or header |
| Vendor lock-in | None, output is static files | Requires Cloudflare |
| Customization | Choose which pages, preserve existing .md files | All or nothing |

## Why build-time can be a good fit for your own content

Cloudflare's runtime extraction makes sense for consuming other people's content, like a reader mode. For your own website, build-time generation can be a better fit because:

- **You control the output.** If the markdown is wrong, you can debug it. You see the actual `.md` files in your build directory.
- **It works with client-rendered apps.** agentmarkup checks for noscript fallback content in SPAs and uses it when the rendered body is thin. Runtime extractors often get empty content from JavaScript-rendered pages.
- **No vendor dependency.** The markdown files are static. Deploy them anywhere. They work on Cloudflare Pages, Netlify, Vercel, S3, or any static host.
- **Integrated with the rest of the stack.** Markdown mirrors work alongside llms.txt, JSON-LD, and robots.txt. One config, one build, everything consistent.

## How agentmarkup reduces the downside

Public markdown mirrors do create tradeoffs. The main risks are duplicate fetches, indexing ambiguity, and output drift if the markdown becomes a second source of truth. agentmarkup tries to keep those risks contained by generating the mirrors from final built HTML, preserving HTML as the canonical page, and writing canonical headers that point each `.md` file back to its HTML route. If your raw HTML is already substantial, you can also keep `llms.txt` pointing at HTML by setting `llmsTxt.preferMarkdownMirrors` to `false`.

## What the output looks like

For a blog post with a title, description, headings, and paragraphs, the generated markdown looks like:

```
# Why llms.txt matters

> LLMs answer questions by synthesizing web content. llms.txt gives them a structured overview.

Source: https://example.com/blog/why-llms-txt-matters/

## The shift from search engines to AI answers

For two decades, the path to online visibility was clear: optimize for Google...

## What is llms.txt?

llms.txt is a proposed standard that gives LLMs a structured overview of your website...
```

Clean, readable, no HTML artifacts. An AI agent reading this file understands the page quickly.

## Getting started

Add `markdownPages: { enabled: true }` to your agentmarkup config when your raw HTML needs a cleaner machine-facing fetch path. On the next build, every HTML page in your output gets a companion `.md` file.
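To make the interaction between the two settings concrete, here is a minimal sketch of how `markdownPages.enabled` and `llmsTxt.preferMarkdownMirrors` could decide which URL an `llms.txt` entry points at. Only those two option names come from this article. The `MirrorConfigSketch` interface, the `mirrorPathFor` and `llmsTxtTargetFor` helpers, and the route-to-`.md` mapping are assumptions made for illustration, not agentmarkup's actual API or implementation.

```typescript
// Hypothetical config shape for illustration. Only `markdownPages.enabled` and
// `llmsTxt.preferMarkdownMirrors` are named in the article; the interface and
// helper names below are assumptions, not the agentmarkup API.
interface MirrorConfigSketch {
  markdownPages?: { enabled?: boolean };
  llmsTxt?: { preferMarkdownMirrors?: boolean };
}

// Assumed route-to-mirror mapping: "/blog/my-post/" -> "/blog/my-post.md".
// The "/index.md" name for the site root is a guess made for this sketch.
function mirrorPathFor(route: string): string {
  const trimmed = route.replace(/\/+$/, "");
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}

// Decide which URL a same-site llms.txt entry would point at.
function llmsTxtTargetFor(route: string, config: MirrorConfigSketch): string {
  const mirrorsEnabled = config.markdownPages?.enabled === true;
  // Markdown URLs are treated as the default once mirrors are on;
  // setting the flag to `false` opts back into HTML links.
  const preferMirrors = config.llmsTxt?.preferMarkdownMirrors !== false;
  return mirrorsEnabled && preferMirrors ? mirrorPathFor(route) : route;
}

const mirrorsOn: MirrorConfigSketch = { markdownPages: { enabled: true } };
const htmlFirst: MirrorConfigSketch = {
  markdownPages: { enabled: true },
  llmsTxt: { preferMarkdownMirrors: false },
};

const mirrorTarget = llmsTxtTargetFor("/blog/my-post/", mirrorsOn); // "/blog/my-post.md"
const htmlTarget = llmsTxtTargetFor("/blog/my-post/", htmlFirst); // "/blog/my-post/"
```

The point of the sketch is that the opt-out only changes what `llms.txt` links to. The `.md` files would still be generated whenever mirrors are enabled.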
When markdown mirrors are enabled, same-site page entries in `llms.txt` also default to the generated markdown URLs so cold agents discover the cleaner fetch path first. Check the [llms.txt guide](/docs/llms-txt/) for the opt-out if you want HTML-first links instead.

If your site already serves rich raw HTML, you do not need to treat markdown mirrors as mandatory. They are a tactical option, not the whole product.

```
pnpm add -D @agentmarkup/next
# or @agentmarkup/vite or @agentmarkup/astro
```

## Verify the protective headers in production

agentmarkup generates two sets of headers for markdown mirrors in the `_headers` file. Both are important for keeping search engines and agents on the right page.

**Canonical Link headers** tell search engines that the `.md` file is a mirror of the HTML page, not a separate indexable URL. Each mirror gets its own entry:

```
# from the generated _headers file
/blog/my-post.md
  Link: ; rel="canonical"
```

**Content-Signal headers** tell agents whether they are allowed to use the content for training, search, and input. agentmarkup generates a wildcard rule that covers all paths including `.md` files:

```
/*
  Content-Signal: ai-train=yes, search=yes, ai-input=yes
```

These headers only work if your hosting platform actually serves them. Cloudflare Pages and Netlify read `_headers` files natively; Vercel configures headers through `vercel.json` instead, so the behavior can vary by host. After deploying, verify that the headers are present on a live `.md` URL:

```
curl -I https://yoursite.com/blog/my-post.md
# look for these in the response:
# Link: ; rel="canonical"
# Content-Signal: ai-train=yes, search=yes, ai-input=yes
```

If the `Link` header is missing, your host may not be applying path-specific `_headers` rules to `.md` files. Check your platform documentation or add equivalent headers through server configuration.
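If you would rather script this check than read `curl` output by eye, the sketch below shows one way to inspect a set of response headers. It is not part of agentmarkup: `canonicalFromLink`, `parseContentSignal`, and `checkMirrorHeaders` are hypothetical helper names, and the example canonical URL is an assumed placeholder, since the real value depends on your site. The `Link` parsing splits on commas, which is a simplification that would break on URLs containing commas.

```typescript
// Hypothetical helpers for checking the headers on a markdown mirror response.
// Feed them header names and values copied from `curl -I` or from an HTTP client.

// Return the target of the first rel="canonical" entry in a Link header value.
// Simplification: entries are split on commas, so URLs with commas are not handled.
function canonicalFromLink(linkHeader: string): string | null {
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]*)>\s*;(.*)/);
    if (match && /\brel\s*=\s*"?canonical\b/i.test(match[2] ?? "")) {
      return match[1] ?? null;
    }
  }
  return null;
}

// Parse "ai-train=yes, search=yes" into { "ai-train": "yes", search: "yes" }.
function parseContentSignal(value: string): Record<string, string> {
  const signals: Record<string, string> = {};
  for (const pair of value.split(",")) {
    const [key, val] = pair.split("=").map((s) => s.trim());
    if (key && val) signals[key] = val;
  }
  return signals;
}

interface MirrorHeaderReport {
  canonical: string | null;
  canonicalOk: boolean;
  contentSignal: Record<string, string>;
}

// Compare response headers against the canonical HTML URL you expect.
function checkMirrorHeaders(
  headers: Record<string, string>,
  expectedCanonical: string,
): MirrorHeaderReport {
  // Header names are case-insensitive, so normalize before looking them up.
  const lower: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = value;
  }
  const canonical = canonicalFromLink(lower["link"] ?? "");
  return {
    canonical,
    canonicalOk: canonical === expectedCanonical,
    contentSignal: parseContentSignal(lower["content-signal"] ?? ""),
  };
}

// Example with an assumed canonical URL (placeholder, not from the article).
const report = checkMirrorHeaders(
  {
    Link: '<https://example.com/blog/my-post/>; rel="canonical"',
    "Content-Signal": "ai-train=yes, search=yes, ai-input=yes",
  },
  "https://example.com/blog/my-post/",
);
```

A report with `canonicalOk` set to `false` or an empty `contentSignal` object corresponds to the missing-header case described above.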
## Make your website machine-readable

agentmarkup is an open-source build-time toolkit for Vite, Astro, and Next.js that generates llms.txt, injects JSON-LD structured data, creates optional markdown mirrors from final HTML when raw pages need a cleaner agent-facing fetch path, manages AI crawler robots.txt rules, patches optional Content-Signal and canonical mirror headers, and validates everything at build time. Zero runtime cost.

```
pnpm add -D @agentmarkup/vite
# or @agentmarkup/astro or @agentmarkup/next
```

Written by [Sebastian Cochinescu](/authors/sebastian-cochinescu/) · Developer of agentmarkup

Builder of developer tools for machine-readable websites. Founder of Anima Felix.