March 8, 2026 · 12 min read

The Complete Guide to AI Discovery Files: Every File Your Website Needs

AI agents do not browse the web the way humans do. They do not scroll, they do not click through menus, and they do not interpret visual design. They read structured files. If your website does not have the right files in the right places, AI agents cannot find you, understand you, or recommend you.

This guide covers every file in the AI discovery stack -- what it is, where it goes, what it should contain, and why it matters. Consider it a reference you can come back to as you build out your site's AI visibility.

The Stack at a Glance

File URL Audience Priority
llms.txt /llms.txt LLMs (ChatGPT, Claude, Gemini) Critical
llms-full.txt /llms-full.txt LLMs needing deep context High
AGENTS.md /AGENTS.md Autonomous AI agents Critical
agent.json /.well-known/agent.json Agent protocols (MCP, A2A) Critical
ai.txt /ai.txt AI agents (permissions) High
robots.txt /robots.txt All crawlers and bots Critical
sitemap.xml /sitemap.xml All crawlers and bots High
Schema.org JSON-LD Embedded in HTML Search engines and AI Critical
Content freshness meta tags Embedded in HTML Search engines and AI Medium

1. llms.txt -- Your Business Summary for AI

llms.txt Critical
Deploy to: https://yoursite.com/llms.txt

llms.txt is a plain text file that gives LLMs a concise summary of your business. Think of it as a README for AI. When ChatGPT, Claude, or Perplexity encounter your site, this file tells them who you are, what you do, and how to represent you accurately.

What to include

Example structure

# Acme Plumbing

> Acme Plumbing is a full-service residential and commercial plumbing
> company serving the greater Portland area since 1998.

## Services
- Emergency plumbing repair (24/7)
- Water heater installation and repair
- Drain cleaning and sewer line service
- Bathroom and kitchen remodeling
- Commercial plumbing maintenance

## Service Area
Portland, OR and surrounding areas within 30 miles

## Contact
- Phone: (503) 555-0123
- Email: service@acmeplumbing.com
- Website: https://acmeplumbing.com

## Hours
Monday-Friday: 7:00 AM - 6:00 PM
Saturday: 8:00 AM - 4:00 PM
Emergency: 24/7

Keep it under 500 words. LLMs process this as context, so conciseness matters. The goal is factual density, not marketing copy.

2. llms-full.txt -- Extended Context for AI

llms-full.txt High
Deploy to: https://yoursite.com/llms-full.txt

While llms.txt is the summary, llms-full.txt is the deep dive. It contains your complete page content, FAQ answers, detailed service descriptions, pricing information, and anything else that helps AI give rich, accurate answers about your business.

This file should be 1,000+ words for maximum scoring impact. Include:

The key difference from llms.txt: llms.txt is for quick context (an AI forming a first impression), while llms-full.txt is for deep queries (an AI answering specific questions about your business).

3. AGENTS.md -- Instructions for AI Agents

AGENTS.md Critical
Deploy to: https://yoursite.com/AGENTS.md

AGENTS.md is a Markdown file that gives autonomous AI agents detailed instructions for interacting with your business. Unlike llms.txt (which is a passive summary), AGENTS.md is an instruction manual that tells agents what they can do, what they should recommend, and how to handle specific queries.

What to include

Think of AGENTS.md as training material for an AI sales representative. It should be factual, structured, and specific. An AI agent reading this file should be able to answer any question a potential customer might ask.

4. agent.json -- Machine-Readable Agent Protocol

agent.json Critical
Deploy to: https://yoursite.com/.well-known/agent.json

agent.json is the machine-readable counterpart to AGENTS.md. It follows emerging agent protocol standards and provides structured data that autonomous agents can parse programmatically.

Key fields

{
  "name": "Acme Plumbing",
  "description": "Full-service plumbing for Portland, OR",
  "url": "https://acmeplumbing.com",
  "skills": [
    {
      "name": "emergency-plumbing",
      "description": "24/7 emergency plumbing repair",
      "tags": ["plumbing", "emergency", "repair"]
    }
  ],
  "contact": {
    "email": "service@acmeplumbing.com",
    "phone": "(503) 555-0123"
  },
  "protocols": ["mcp", "a2a"]
}

The .well-known/ directory is a web standard for service discovery. Placing agent.json here makes it automatically discoverable by any agent that follows the well-known URI convention.

5. ai.txt -- Permission Rules for AI

ai.txt High
Deploy to: https://yoursite.com/ai.txt

ai.txt is a permissions file that defines what AI agents are allowed to do with your content. It is similar in concept to robots.txt, but focused on AI-specific actions like summarizing, extracting, recommending, and booking.

Example

# ai.txt - AI Agent Permissions
User-Agent: *
Allow: Read, Summarize, Extract, Recommend, Compare
Disallow: Modify, Impersonate

# Actions requiring user consent
ConsentRequired: Book, Purchase, Pay

# Business metadata
Business-Name: Acme Plumbing
Business-URL: https://acmeplumbing.com
Contact: service@acmeplumbing.com

This file helps AI agents understand the boundaries of what they can do. An agent reading this knows it can recommend your business and summarize your services, but should not claim to be your business or make purchases on behalf of users without consent.

6. robots.txt -- Crawler Access Control

robots.txt Critical
Deploy to: https://yoursite.com/robots.txt

You probably already have a robots.txt. The question is whether it explicitly allows AI crawlers. Many default configurations block AI bots, either intentionally or through overly broad Disallow rules.

AI bots to explicitly allow

# AI Crawlers - explicitly allowed
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

Common mistake: Having a User-agent: * / Disallow: / rule that blocks all bots. This will prevent AI agents from accessing your site entirely. Always check that your robots.txt allows the specific AI bots you want to reach.

7. sitemap.xml -- Your Page Inventory

sitemap.xml High
Deploy to: https://yoursite.com/sitemap.xml

A sitemap tells crawlers which pages exist on your site, when they were last updated, and how important they are relative to each other. AI crawlers use sitemaps the same way Google does -- to discover and prioritize pages.

Best practices for AI discoverability

8. Schema.org JSON-LD -- Structured Business Data

Schema.org JSON-LD Critical
Embed in: <head> of your homepage

Schema.org structured data is the most established standard for making your business understandable to machines. It powers Google's rich results and is increasingly used by AI models to extract factual business data.

Key properties to include

Pro tip: Use ImageObject for your logo and photos instead of a plain URL string. Multi-modal AI systems (those that understand images) can use the caption field to understand what the image shows. This is a small change that gives AI significantly more context.

9. Content Freshness Meta Tags

Meta Tags Medium
Embed in: <head> of each page

Content freshness is a trust signal for AI. If an AI cannot determine when your content was last updated, it may deprioritize it in favor of sources with clear freshness indicators.

Tags to add

<meta property="article:published_time" content="2026-01-15T09:00:00Z" />
<meta property="article:modified_time" content="2026-03-08T09:00:00Z" />
<meta property="article:author" content="Your Business Name" />

Update the modified_time whenever you make meaningful changes to a page. This tells AI that your content is current and maintained.

Deployment Priority

If you are starting from scratch, deploy in this order:

  1. robots.txt -- Unblock AI crawlers (this is a gate; nothing else works if bots are blocked)
  2. llms.txt -- Give AI its first impression of your business
  3. AGENTS.md -- Give autonomous agents detailed instructions
  4. Schema.org JSON-LD -- Structured data that both search engines and AI use
  5. agent.json -- Machine-readable protocol file
  6. sitemap.xml -- Help crawlers find all your pages and files
  7. ai.txt -- Define AI permissions
  8. llms-full.txt -- Extended content for deep queries
  9. Meta tags -- Content freshness and authorship signals

The first four files cover 80% of the scoring impact. The remaining five add depth and completeness. A site with all 9 artifacts properly deployed typically scores 85-95 on an AI Readiness assessment.

Find Out What You Are Missing

Run a free scan to see which AI discovery files your site has, which are missing, and get every file auto-generated for your business -- ready to deploy.

Scan Your Site Free

Common Mistakes

Blocking AI bots in robots.txt

The most common issue we see. A blanket Disallow: / under User-agent: * blocks all AI crawlers. Many site templates ship with this by default. Check your robots.txt -- if it blocks crawlers, nothing else in this stack matters.

Empty or generic llms.txt

A one-line llms.txt ("We are a plumbing company") gives AI almost nothing to work with. Include your full service list, service area, contact information, and differentiators. Factual density is what AI models use to decide whether to recommend you.

Using Organization instead of a specific Schema type

Schema.org has over 800 types. Using the generic Organization or LocalBusiness type when a more specific one exists (like Restaurant, LegalService, or SoftwareApplication) means AI misses context about what kind of business you are.

Missing ImageObject on structured data

If your JSON-LD includes an image as a plain URL string, multi-modal AI systems cannot get context about what the image shows. Use ImageObject with a caption field so AI understands your visual content.

Stale lastmod dates in sitemap.xml

If every page in your sitemap has the same lastmod date from two years ago, AI interpretes this as stale content. Keep dates accurate and update them when pages change.

How the Files Work Together

No single file works in isolation. The full stack creates a layered discovery system:

Together, they form a complete picture that any AI system -- whether it is ChatGPT answering a question, an autonomous agent booking a service, or a search engine building a knowledge graph -- can use to find, understand, and recommend your business.

Next Steps

You do not have to build these files by hand. AgentSEO.guru scans your website, identifies which files are missing or incomplete, and generates every file tailored to your business -- ready to download and deploy. The AI Visibility Report includes platform-specific deployment paths for WordPress, Shopify, Next.js, Vercel, and 10+ other platforms.

Start with a free scan to see where you stand, then deploy the files and re-scan to confirm the improvement. Most businesses go from a score of 30-50 to 85+ after deploying the full stack.