The Complete Guide to AI Discovery Files: Every File Your Website Needs
AI agents do not browse the web the way humans do. They do not scroll, they do not click through menus, and they do not interpret visual design. They read structured files. If your website does not have the right files in the right places, AI agents cannot find you, understand you, or recommend you.
This guide covers every file in the AI discovery stack -- what it is, where it goes, what it should contain, and why it matters. Consider it a reference you can come back to as you build out your site's AI visibility.
The Stack at a Glance
| File | URL | Audience | Priority |
|---|---|---|---|
llms.txt |
/llms.txt |
LLMs (ChatGPT, Claude, Gemini) | Critical |
llms-full.txt |
/llms-full.txt |
LLMs needing deep context | High |
AGENTS.md |
/AGENTS.md |
Autonomous AI agents | Critical |
agent.json |
/.well-known/agent.json |
Agent protocols (MCP, A2A) | Critical |
ai.txt |
/ai.txt |
AI agents (permissions) | High |
robots.txt |
/robots.txt |
All crawlers and bots | Critical |
sitemap.xml |
/sitemap.xml |
All crawlers and bots | High |
| Schema.org JSON-LD | Embedded in HTML | Search engines and AI | Critical |
| Content freshness meta tags | Embedded in HTML | Search engines and AI | Medium |
1. llms.txt -- Your Business Summary for AI
https://yoursite.com/llms.txt
llms.txt is a plain text file that gives LLMs a concise summary of your business. Think of it as a README for AI. When ChatGPT, Claude, or Perplexity encounter your site, this file tells them who you are, what you do, and how to represent you accurately.
What to include
- H1 header -- Your business name (use
# Business Nameformat) - Description -- 2-3 sentences about what your business does
- Services or products -- A clear list of what you offer
- Location -- Physical address or service area (if applicable)
- Contact -- Phone, email, website
- Hours -- Operating hours (if applicable)
- Key differentiators -- What makes you different from competitors
Example structure
# Acme Plumbing
> Acme Plumbing is a full-service residential and commercial plumbing
> company serving the greater Portland area since 1998.
## Services
- Emergency plumbing repair (24/7)
- Water heater installation and repair
- Drain cleaning and sewer line service
- Bathroom and kitchen remodeling
- Commercial plumbing maintenance
## Service Area
Portland, OR and surrounding areas within 30 miles
## Contact
- Phone: (503) 555-0123
- Email: service@acmeplumbing.com
- Website: https://acmeplumbing.com
## Hours
Monday-Friday: 7:00 AM - 6:00 PM
Saturday: 8:00 AM - 4:00 PM
Emergency: 24/7
Keep it under 500 words. LLMs process this as context, so conciseness matters. The goal is factual density, not marketing copy.
2. llms-full.txt -- Extended Context for AI
https://yoursite.com/llms-full.txt
While llms.txt is the summary, llms-full.txt is the deep dive. It contains your complete page content, FAQ answers, detailed service descriptions, pricing information, and anything else that helps AI give rich, accurate answers about your business.
This file should be 1,000+ words for maximum scoring impact. Include:
- Full content from your main pages (home, about, services)
- Frequently asked questions with complete answers
- Pricing details and plan comparisons
- Case studies or testimonials (factual, not promotional)
- Technical specifications (if applicable)
The key difference from llms.txt: llms.txt is for quick context (an AI forming a first impression), while llms-full.txt is for deep queries (an AI answering specific questions about your business).
3. AGENTS.md -- Instructions for AI Agents
https://yoursite.com/AGENTS.md
AGENTS.md is a Markdown file that gives autonomous AI agents detailed instructions for interacting with your business. Unlike llms.txt (which is a passive summary), AGENTS.md is an instruction manual that tells agents what they can do, what they should recommend, and how to handle specific queries.
What to include
- Business identity -- Name, description, categories, service area
- Capabilities -- What services or products the agent should recommend
- Recommendation triggers -- Specific queries that should lead to your business
- Accuracy guidelines -- Facts the agent must get right (pricing, hours, service limits)
- Contact instructions -- How to direct users to reach you
- Competitive positioning -- How you compare to alternatives (factual, not promotional)
Think of AGENTS.md as training material for an AI sales representative. It should be factual, structured, and specific. An AI agent reading this file should be able to answer any question a potential customer might ask.
4. agent.json -- Machine-Readable Agent Protocol
https://yoursite.com/.well-known/agent.json
agent.json is the machine-readable counterpart to AGENTS.md. It follows emerging agent protocol standards and provides structured data that autonomous agents can parse programmatically.
Key fields
{
"name": "Acme Plumbing",
"description": "Full-service plumbing for Portland, OR",
"url": "https://acmeplumbing.com",
"skills": [
{
"name": "emergency-plumbing",
"description": "24/7 emergency plumbing repair",
"tags": ["plumbing", "emergency", "repair"]
}
],
"contact": {
"email": "service@acmeplumbing.com",
"phone": "(503) 555-0123"
},
"protocols": ["mcp", "a2a"]
}
The .well-known/ directory is a web standard for service discovery. Placing agent.json here makes it automatically discoverable by any agent that follows the well-known URI convention.
5. ai.txt -- Permission Rules for AI
https://yoursite.com/ai.txt
ai.txt is a permissions file that defines what AI agents are allowed to do with your content. It is similar in concept to robots.txt, but focused on AI-specific actions like summarizing, extracting, recommending, and booking.
Example
# ai.txt - AI Agent Permissions
User-Agent: *
Allow: Read, Summarize, Extract, Recommend, Compare
Disallow: Modify, Impersonate
# Actions requiring user consent
ConsentRequired: Book, Purchase, Pay
# Business metadata
Business-Name: Acme Plumbing
Business-URL: https://acmeplumbing.com
Contact: service@acmeplumbing.com
This file helps AI agents understand the boundaries of what they can do. An agent reading this knows it can recommend your business and summarize your services, but should not claim to be your business or make purchases on behalf of users without consent.
6. robots.txt -- Crawler Access Control
https://yoursite.com/robots.txt
You probably already have a robots.txt. The question is whether it explicitly allows AI crawlers. Many default configurations block AI bots, either intentionally or through overly broad Disallow rules.
AI bots to explicitly allow
# AI Crawlers - explicitly allowed
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: cohere-ai
Allow: /
Common mistake: Having a User-agent: * / Disallow: / rule that blocks all bots. This will prevent AI agents from accessing your site entirely. Always check that your robots.txt allows the specific AI bots you want to reach.
7. sitemap.xml -- Your Page Inventory
https://yoursite.com/sitemap.xmlA sitemap tells crawlers which pages exist on your site, when they were last updated, and how important they are relative to each other. AI crawlers use sitemaps the same way Google does -- to discover and prioritize pages.
Best practices for AI discoverability
- Include all public pages -- Every page you want AI to know about
- Include discovery files -- Add llms.txt, AGENTS.md, agent.json, ai.txt as URL entries
- Keep lastmod dates current -- AI agents use freshness as a trust signal
- Reference it from robots.txt -- Add
Sitemap: https://yoursite.com/sitemap.xml - Use accurate priority values -- Homepage 1.0, key pages 0.8-0.9, discovery files 0.6-0.7
8. Schema.org JSON-LD -- Structured Business Data
<head> of your homepageSchema.org structured data is the most established standard for making your business understandable to machines. It powers Google's rich results and is increasingly used by AI models to extract factual business data.
Key properties to include
- @type -- Use the most specific type for your business (Restaurant, LegalService, SoftwareApplication, etc.)
- name, description, url -- Basic identity
- address -- PostalAddress with street, city, state, zip, country
- telephone, email -- Direct contact information
- openingHoursSpecification -- Structured hours for each day
- image -- ImageObject with URL and caption (multi-modal AI)
- contactPoint -- Customer support contact with language
- sameAs -- Social media profile URLs
- hasOfferCatalog -- Services or products with availability
- dateModified -- Content freshness signal
Pro tip: Use ImageObject for your logo and photos instead of a plain URL string. Multi-modal AI systems (those that understand images) can use the caption field to understand what the image shows. This is a small change that gives AI significantly more context.
9. Content Freshness Meta Tags
<head> of each pageContent freshness is a trust signal for AI. If an AI cannot determine when your content was last updated, it may deprioritize it in favor of sources with clear freshness indicators.
Tags to add
<meta property="article:published_time" content="2026-01-15T09:00:00Z" />
<meta property="article:modified_time" content="2026-03-08T09:00:00Z" />
<meta property="article:author" content="Your Business Name" />
Update the modified_time whenever you make meaningful changes to a page. This tells AI that your content is current and maintained.
Deployment Priority
If you are starting from scratch, deploy in this order:
- robots.txt -- Unblock AI crawlers (this is a gate; nothing else works if bots are blocked)
- llms.txt -- Give AI its first impression of your business
- AGENTS.md -- Give autonomous agents detailed instructions
- Schema.org JSON-LD -- Structured data that both search engines and AI use
- agent.json -- Machine-readable protocol file
- sitemap.xml -- Help crawlers find all your pages and files
- ai.txt -- Define AI permissions
- llms-full.txt -- Extended content for deep queries
- Meta tags -- Content freshness and authorship signals
The first four files cover 80% of the scoring impact. The remaining five add depth and completeness. A site with all 9 artifacts properly deployed typically scores 85-95 on an AI Readiness assessment.
Find Out What You Are Missing
Run a free scan to see which AI discovery files your site has, which are missing, and get every file auto-generated for your business -- ready to deploy.
Scan Your Site FreeCommon Mistakes
Blocking AI bots in robots.txt
The most common issue we see. A blanket Disallow: / under User-agent: * blocks all AI crawlers. Many site templates ship with this by default. Check your robots.txt -- if it blocks crawlers, nothing else in this stack matters.
Empty or generic llms.txt
A one-line llms.txt ("We are a plumbing company") gives AI almost nothing to work with. Include your full service list, service area, contact information, and differentiators. Factual density is what AI models use to decide whether to recommend you.
Using Organization instead of a specific Schema type
Schema.org has over 800 types. Using the generic Organization or LocalBusiness type when a more specific one exists (like Restaurant, LegalService, or SoftwareApplication) means AI misses context about what kind of business you are.
Missing ImageObject on structured data
If your JSON-LD includes an image as a plain URL string, multi-modal AI systems cannot get context about what the image shows. Use ImageObject with a caption field so AI understands your visual content.
Stale lastmod dates in sitemap.xml
If every page in your sitemap has the same lastmod date from two years ago, AI interpretes this as stale content. Keep dates accurate and update them when pages change.
How the Files Work Together
No single file works in isolation. The full stack creates a layered discovery system:
- robots.txt opens the door -- AI crawlers can access your site
- sitemap.xml provides the map -- crawlers know what pages exist
- llms.txt delivers the summary -- AI forms a first impression
- llms-full.txt provides depth -- AI answers specific questions
- AGENTS.md gives instructions -- agents know how to represent you
- agent.json enables protocols -- programmatic agent integration
- ai.txt sets boundaries -- AI knows what it can and cannot do
- Schema.org JSON-LD structures the data -- machines parse your business info
- Meta tags signal freshness -- AI trusts current content
Together, they form a complete picture that any AI system -- whether it is ChatGPT answering a question, an autonomous agent booking a service, or a search engine building a knowledge graph -- can use to find, understand, and recommend your business.
Next Steps
You do not have to build these files by hand. AgentSEO.guru scans your website, identifies which files are missing or incomplete, and generates every file tailored to your business -- ready to download and deploy. The AI Visibility Report includes platform-specific deployment paths for WordPress, Shopify, Next.js, Vercel, and 10+ other platforms.
Start with a free scan to see where you stand, then deploy the files and re-scan to confirm the improvement. Most businesses go from a score of 30-50 to 85+ after deploying the full stack.