Make Your Website Agent-Ready: Crawlers, llms.txt, Structured Data, and Commerce Protocols
A practical, hype-free guide to preparing websites for AI agents with correct crawler policies, llms.txt, structured data, MCP, ACP, UCP, and measurable outcomes.

The question businesses ask about visibility is changing. It used to be, "How do we rank higher on Google?" Now it is often, "Why does an AI assistant recommend our competitor instead of us?" For commerce sites, the stakes are higher: an agent may compare products, prices, availability, delivery terms, and policies before a person ever visits a homepage.
That does not make traditional SEO obsolete. It adds another reader to the website: software acting on a person's behalf. An agent does not care about a hero animation. It cares whether it can access the page, understand the data, identify the canonical source, and complete the requested action safely.
The practical definition of an agent-ready website: a machine can discover trustworthy information, interpret it without guessing, and use supported actions without bypassing business rules.
There is plenty of hype around this subject. Some vendors reduce "agent readiness" to publishing one text file. The real work is more structural. A few changes are worth making now; others only make sense when agent-driven commerce is a genuine acquisition channel for the business.
What "Agentic" Actually Changes
For years, the dominant journey was predictable: a person searched, opened several links, compared the options, and made a decision. The website had to earn the click and persuade the visitor after arrival.
Agents compress that journey. A person can ask for "a VPS under thirty dollars per month, hosted in Europe, with clear backup terms" or "a condolence arrangement available for delivery today." The agent may inspect several sources, reject incomplete offers, and return a shortlist. The human sees the summary first and may visit only the final candidates.
This changes the optimization question. Traffic still matters, but so do machine-readable accuracy, source authority, and action readiness. A page that looks excellent but hides its price in an image, contradicts its structured data, or has no clear delivery policy is difficult for both agents and customers to trust.
Step One: Audit Crawler Access Correctly
Before adding new protocols, inspect robots.txt, CDN bot controls, firewall rules, and server logs. A crawler cannot use a page it cannot fetch. But do not treat every AI-related user agent as if it serves the same purpose.
OpenAI documents separate controls for OAI-SearchBot and GPTBot. OAI-SearchBot relates to surfacing websites in ChatGPT search, while GPTBot crawls content that may be used to train foundation models. A site can allow the former and disallow the latter. These are independent policy choices.
Google's Google-Extended control also needs careful wording. It is a robots.txt control token, not a separate HTTP crawler user agent, and Google states that it does not affect inclusion or ranking in Google Search.
An intentional policy might look like this:
User-agent: OAI-SearchBot
Allow: /
User-agent: GPTBot
Disallow: /
That example is not a universal recommendation. Legal, licensing, privacy, and commercial requirements differ. The important thing is to make the decision deliberately instead of inheriting an old blanket rule from a security plugin.
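One way to confirm that a policy like this behaves as intended is to parse it with Python's standard-library urllib.robotparser and query it per user agent. This is a minimal sketch: the robots.txt text is inlined rather than fetched, example.com is a placeholder, and the helper name allowed_agents is made up for illustration. Real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The example policy from this section, inlined so the check runs offline.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder URL, not a real target.
result = allowed_agents(
    ROBOTS_TXT, "https://example.com/products", ["OAI-SearchBot", "GPTBot"]
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Running the same check against the live robots.txt after every plugin or CDN change is a cheap way to catch a blanket rule that crept back in.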
What to verify
- The important public pages return 200 without requiring cookies or JavaScript.
- robots.txt reflects the business's actual search and AI policies.
- The CDN does not challenge legitimate crawlers with an interactive CAPTCHA.
- Canonical URLs are crawlable and do not redirect through unnecessary tracking links.
- Server logs confirm whether the relevant bots reach product, service, and policy pages.
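The log check in the last item can be sketched as a small script that counts responses per crawler and status code. It assumes the common "combined" access log format; the sample lines, IP addresses, and simplified user agent strings are hypothetical. User agent strings can be spoofed, so treat this as a first pass rather than proof of who visited.

```python
import re
from collections import Counter

# Substrings that identify the crawlers discussed above. Extend as needed.
BOT_TOKENS = ("OAI-SearchBot", "GPTBot")

# Matches the request, status, and user agent of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_status_counts(lines):
    """Count (bot, status) pairs for log lines whose user agent names a known bot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines in an unexpected format
        for token in BOT_TOKENS:
            if token in match["agent"]:
                counts[(token, int(match["status"]))] += 1
                break
    return counts


# Hypothetical sample lines; real logs would be read from a file.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /products/widget HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:10:00:05 +0000] "GET /delivery HTTP/1.1" '
    '403 312 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:09 +0000] "GET / HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

counts = bot_status_counts(SAMPLE)
for (bot, status), n in sorted(counts.items()):
    print(bot, status, n)
```

A crawler that is allowed in robots.txt but shows mostly 403 or 503 responses here usually points to a CDN or firewall rule rather than a robots policy.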
The Description Layer: llms.txt Without the Magic Claims
The llms.txt proposal describes a Markdown file at the root of a domain that gives language models a curated map of useful content. It can identify the organization, explain what the site provides, and point to authoritative documentation, policies, products, or API references.
It is useful because websites often contain many pages with overlapping messages. A concise map can direct an agent toward the sources the business considers canonical. It is especially sensible for technical products, documentation-heavy services, and sites with APIs.
It is not, however, a proven ranking shortcut for AI citations. Publishing /llms.txt does not repair inaccessible pages, weak product data, contradictory prices, or missing structured data. Treat it as low-cost machine-oriented documentation, not as a replacement for technical SEO.
A minimal file can be simple:
# Example Company
> A short, factual description of the business and its market.
## Products
- [Product catalog](https://example.com/products)
## Policies
- [Delivery](https://example.com/delivery)
- [Returns](https://example.com/returns)
## Support
- [Contact](https://example.com/contact)
Write it by hand or review generated output carefully. A sitemap generator knows which URLs exist; it does not know which pages are commercially important, legally authoritative, or safe for an agent to rely on.
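A hand-written file still benefits from a quick automated review. The sketch below checks a draft for a leading title, a summary line, and links that stay on the site's own host. The title and summary checks follow the structure of the example above; the same-host and duplicate-link rules are house rules assumed for illustration, not requirements of the llms.txt proposal, and check_llms_txt is a made-up name.

```python
import re

# Matches Markdown links of the form [label](url).
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def check_llms_txt(text: str, allowed_host: str) -> list[str]:
    """Return a list of human-readable problems found in an llms.txt draft."""
    problems = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be a '# Title' heading")
    if not any(line.startswith("> ") for line in lines):
        problems.append("missing '> ' summary line")
    seen = set()
    for label, url in LINK.findall(text):
        # Assumed house rule: only link to pages on the site's own host.
        if not url.startswith(f"https://{allowed_host}/"):
            problems.append(f"link '{label}' points outside https://{allowed_host}/")
        if url in seen:
            problems.append(f"duplicate link: {url}")
        seen.add(url)
    if not seen:
        problems.append("no links found")
    return problems


DRAFT = """\
# Example Company
> A short, factual description of the business and its market.
## Products
- [Product catalog](https://example.com/products)
## Policies
- [Delivery](https://example.com/delivery)
- [Returns](https://example.com/returns)
## Support
- [Contact](https://example.com/contact)
"""

print(check_llms_txt(DRAFT, "example.com"))
```

A check like this catches structural slips only. Whether each linked page is commercially important or legally authoritative remains a human decision.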
What About agents.md?
agents.md is useful in software repositories as a convention for giving coding agents project instructions. On public commerce websites, it is not a universally adopted discovery standard comparable to robots.txt or Schema.org.
A business can still publish machine-facing action documentation, but it should not assume that external agents will automatically discover or obey a root-level /agents.md. Action capabilities are better described through the protocol or API that actually exposes them, with authentication, permissions, and error behavior defined there. If you keep an agents.md file, treat it as supplementary documentation rather than the foundation of the integration.
The Data Layer: Structured Data Must Match Reality
The description layer tells a machine where to look. Structured data helps it interpret what it finds. For commerce pages, that usually means appropriate Schema.org types such as Product, Offer, AggregateRating, and BreadcrumbList, using fields that genuinely match the rendered page and backend state.
The key phrase is "match reality." Price, currency, availability, condition, delivery information, and review totals should not disagree across visible HTML, JSON-LD, feeds, and checkout. An agent that sees conflicting facts cannot reliably recommend or transact.
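A consistency check of this kind can be sketched with the standard library alone. The example extracts JSON-LD from a rendered page and compares the first Product offer with a backend record. It covers only the JSON-LD and backend sides, not feeds or checkout, and it assumes the backend stores price, currency, and availability as strings in the same format the page uses. The page, the record, and the function name offer_mismatches are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def offer_mismatches(html: str, backend: dict) -> list[str]:
    """Compare the first Product offer in the page's JSON-LD with a backend record."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    product = next(
        (i for i in extractor.items if isinstance(i, dict) and i.get("@type") == "Product"),
        None,
    )
    if product is None:
        return ["no Product JSON-LD found"]
    offer = product.get("offers", {})
    if isinstance(offer, list):  # "offers" may be a list; this sketch checks the first
        offer = offer[0] if offer else {}
    mismatches = []
    for key in ("price", "priceCurrency", "availability"):
        if str(offer.get(key)) != str(backend.get(key)):
            mismatches.append(f"{key}: page={offer.get(key)!r} backend={backend.get(key)!r}")
    return mismatches


# Hypothetical rendered page and backend record.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "EUR",
            "availability": "https://schema.org/InStock"}}
</script>
</head><body><p>Widget</p></body></html>
"""
BACKEND = {"price": "24.99", "priceCurrency": "EUR",
           "availability": "https://schema.org/InStock"}

print(offer_mismatches(PAGE, BACKEND))
```

Here the page still advertises an old price, which is exactly the kind of drift that caching layers and plugins tend to introduce.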
Structured data is also not limited to stores. Service businesses can clarify organization details, service areas, FAQs, contact points, and page relationships. The goal is not to add every possible property. It is to make the important facts explicit, current, and internally consistent.
A reliable product-data checklist
- Stable product identifiers and canonical URLs
- Current price and currency
- Variant-specific availability
- Accurate images and descriptive alternative text
- Delivery, return, and cancellation terms
- Review counts that match visible reviews
- Consistent data across HTML, schema, feeds, and APIs
The Action Layer: MCP, ACP, UCP, and AP2
Structured pages help an agent understand an offer. Protocols and APIs let it perform controlled actions. These technologies overlap, but they are not interchangeable.
MCP: tools and context, not a commerce system by itself
Model Context Protocol is a general protocol for connecting AI applications to tools and data sources. A commerce implementation can expose tools for product search, stock checks, cart creation, or support lookup, but MCP itself does not define the complete commercial lifecycle. The business still owns authentication, authorization, validation, pricing rules, and audit logs.
ACP: commerce infrastructure for ChatGPT
OpenAI describes the Agentic Commerce Protocol as infrastructure between merchants and shoppers in ChatGPT. Its merchant integration model covers product discovery and commerce flows while leaving the merchant responsible for authoritative catalog data and order handling. It matters when ChatGPT is an intentional sales channel, not merely because a site wants to appear in AI answers.
UCP: a broader commerce lifecycle
The Universal Commerce Protocol defines building blocks for agentic commerce across discovery, cart, checkout, identity linking, orders, and post-purchase support. Its specification is designed to work with established transports and related standards, including MCP and AP2.
Shopify's current agentic commerce documentation describes UCP-based experiences and UCP-compliant MCP servers for discovery, cart, checkout, and order workflows. That is a platform capability, not permission to assume every storefront is automatically configured, eligible, and exposed in every agent channel. Merchants still need to verify their actual setup and data quality.
AP2: verifiable payment authorization
Agent Payments Protocol (AP2) focuses on the authorization layer: how a user can provide verifiable intent for an agent-mediated payment. It complements commerce protocols; it does not replace the merchant's checkout, fraud controls, payment processor, or order system.
Do not implement a protocol because its acronym is fashionable. Implement it when a supported agent channel can create measurable value and the business can operate the resulting orders safely.
What Is Realistic on Shopify, WooCommerce, and Custom Builds?
Shopify
Shopify is moving quickly on agentic commerce and provides documented building blocks for product discovery and transaction flows. Merchants should first ensure that product, inventory, market, shipping, and policy data inside Shopify is complete. Platform support is valuable only when the underlying catalog is reliable.
WooCommerce
WooCommerce gives the site owner control over the web root and REST infrastructure, so publishing llms.txt, improving schema, or building a dedicated integration is technically straightforward. The harder part is operational: plugin conflicts, caching, security rules, variant data, and extensions that each believe they own the same field.
For a small catalog, correct crawler access, schema, feeds, and policy pages may deliver more value than a custom transaction protocol. A custom endpoint becomes reasonable when product volume, order frequency, or a strategic partner channel justifies its maintenance cost.
Custom platforms
A custom application offers the most control: live catalog queries, purpose-built tools, precise permissions, and consistent observability. It also creates the most responsibility. Every endpoint needs authentication, rate limits, input validation, idempotency, audit logs, safe failure states, and a versioning policy.
The best custom architecture does not let an agent write directly to the database. It exposes narrow business actions such as search_products, check_inventory, create_cart, or request_quote, and applies the same rules used by the human-facing application.
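The idea of narrow, allowlisted actions can be illustrated without any particular protocol. The sketch below uses an in-memory stand-in for the catalog and cart services; the Store class, the dispatch function, and the idempotency_key parameter are assumptions made for illustration, and a production system would add authentication, rate limits, audit logging, and stock reservation on top.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for the real catalog and cart services (hypothetical)."""

    stock: dict[str, int]
    carts: dict[str, dict[str, int]] = field(default_factory=dict)

    def check_inventory(self, sku: str) -> int:
        if sku not in self.stock:
            raise ValueError(f"unknown sku: {sku}")
        return self.stock[sku]

    def create_cart(self, idempotency_key: str, items: dict[str, int]) -> dict[str, int]:
        # Replaying the same key returns the original cart instead of creating another.
        if idempotency_key in self.carts:
            return self.carts[idempotency_key]
        for sku, qty in items.items():
            if qty <= 0:
                raise ValueError(f"quantity must be positive for {sku}")
            if qty > self.check_inventory(sku):
                raise ValueError(f"insufficient stock for {sku}")
        self.carts[idempotency_key] = dict(items)
        return self.carts[idempotency_key]


# Only these names are callable by an agent; everything else is rejected.
ALLOWED_ACTIONS = {"check_inventory", "create_cart"}


def dispatch(store: Store, action: str, **params):
    """Route an agent request to an allowlisted business action."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action not exposed to agents: {action}")
    return getattr(store, action)(**params)


store = Store(stock={"WIDGET-1": 3})
print(dispatch(store, "check_inventory", sku="WIDGET-1"))
cart = dispatch(store, "create_cart", idempotency_key="req-001", items={"WIDGET-1": 2})
print(cart)
```

The agent never sees a table or a query. It sees two verbs, each of which validates its input and fails with a clear error, and a retried create_cart call cannot produce a duplicate cart.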
A Sensible Implementation Order
If we were preparing an existing site for agents, we would work in this order:
- Audit access. Check robots rules, CDN challenges, redirects, canonical pages, and server logs.
- Fix the source data. Make prices, availability, identifiers, policies, and contact details consistent.
- Validate rendered structured data. Test actual product and service pages, not only templates.
- Create a curated llms.txt. Point agents toward authoritative, commercially important pages.
- Document actions. Define what an agent may read or do, including authentication and failure behavior.
- Add protocols only for a real channel. Build ACP, UCP, MCP, or payment integrations when the distribution opportunity justifies production ownership.
- Monitor continuously. Track crawler access, tool errors, stale data, abandoned actions, and completed outcomes.
Notice what is not first: the fashionable file or protocol. Agent readiness starts with trustworthy pages and data. The machine-facing extras amplify that foundation; they cannot substitute for it.
How to Audit Whether the Work Is Paying Off
Do not measure success only by whether /llms.txt exists. Track outcomes that connect implementation work to visibility and revenue:
- AI crawler requests and response quality in server logs
- Mentions and citations for representative customer questions
- Referral traffic from AI search and assistant products
- Product-feed errors and schema validation failures
- Agent tool success, latency, and abandonment rates
- Assisted leads, carts, orders, and revenue
- Incorrect recommendations caused by stale or ambiguous data
This also creates a feedback loop. If agents repeatedly ask for information that the site does not expose cleanly, that is not just an AI problem. Human customers probably struggle to find it too.
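The tool-level metrics in that list (success, latency, abandonment) are simple to compute once each agent call is logged as an event. The following sketch assumes a hypothetical event shape with tool, outcome, and latency_ms fields; the records are invented for illustration, and a real system would read them from its audit log.

```python
from statistics import median

# Hypothetical event records; the field names are assumptions, not a standard.
EVENTS = [
    {"tool": "search_products", "outcome": "success", "latency_ms": 120},
    {"tool": "search_products", "outcome": "success", "latency_ms": 180},
    {"tool": "create_cart", "outcome": "error", "latency_ms": 900},
    {"tool": "create_cart", "outcome": "abandoned", "latency_ms": 300},
]


def tool_metrics(events):
    """Summarize success rate, abandonment rate, and median latency per tool."""
    by_tool = {}
    for event in events:
        by_tool.setdefault(event["tool"], []).append(event)
    summary = {}
    for tool, rows in by_tool.items():
        total = len(rows)
        summary[tool] = {
            "calls": total,
            "success_rate": sum(r["outcome"] == "success" for r in rows) / total,
            "abandonment_rate": sum(r["outcome"] == "abandoned" for r in rows) / total,
            "median_latency_ms": median(r["latency_ms"] for r in rows),
        }
    return summary


metrics = tool_metrics(EVENTS)
for tool, stats in metrics.items():
    print(tool, stats)
```

Reviewing these numbers per tool rather than in aggregate shows where agents get stuck, for example a search that works reliably feeding into a cart step that mostly fails.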
The Honest Bottom Line
The agentic web is real, but most businesses do not need every protocol today. They do need a website that machines and people can trust: accessible canonical pages, accurate structured data, explicit policies, and consistent backend facts.
Start there. Add llms.txt as curated documentation, not a ranking promise. Treat agents.md as an optional convention, not a universal web standard. Build transaction integrations only when a supported channel and business case exist.
The unglamorous foundation is what makes everything else possible. It improves search, accessibility, integrations, customer confidence, and future agent workflows at the same time.
If you want to see what an agent can actually understand and do on your website today, DigiForge can audit crawler access, structured data, machine-facing documentation, product feeds, and transaction readiness. You will get a prioritized implementation plan rather than a bundle of fashionable files.


