Crawlability shows you which AI bots are allowed or blocked by a site’s robots.txt file. Select a tracked domain and get results instantly with no setup or account connection required. Peec checks the domain’s robots.txt against 40+ AI bots from 20+ vendors, then shows you the status for each. We categorize bots based on publicly available data and their stated purpose. As real-world behavior becomes clearer, categories may be refined to ensure accuracy.
Crawlability table
The table shows the status of each individual bot:
- Bot: the user-agent identifier (e.g., GPTBot, ClaudeBot)
- Platform: the AI vendor behind the bot (e.g., OpenAI, Anthropic, Google)
- Bot type: Training, Search, User Query, or Other
- Status: Allowed, Partial (some paths are allowed and others blocked), or Blocked
- Reason: how the status was determined, either from explicit rules for this bot or inherited from the global wildcard (*) rules
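The Reason column follows from how robots.txt groups work under RFC 9309. A bot that has its own User-agent group follows only that group's rules. Only bots with no group of their own fall back to the wildcard (*) group. The following minimal Python sketch reproduces that lookup with simplified parsing. The function names are hypothetical. The Allowed/Partial/Blocked thresholds are an assumed heuristic, not Peec's documented logic: Blocked means the whole site is disallowed with no Allow exceptions, and Partial means only some paths are disallowed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str]                                           # lower-cased User-agent values
    rules: list[tuple[str, str]] = field(default_factory=list)  # ("allow" | "disallow", path)


def parse_robots(text: str) -> list[Group]:
    """Split robots.txt into User-agent groups (simplified RFC 9309 parsing)."""
    groups: list[Group] = []
    current: Group | None = None
    in_agent_block = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not in_agent_block:
                current = Group(agents=[])
                groups.append(current)
            current.agents.append(value.lower())
            in_agent_block = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
            in_agent_block = False
    return groups


def classify(robots_txt: str, bot: str) -> tuple[str, str]:
    """Return (status, reason) for one bot; the status thresholds are a hypothetical heuristic."""
    groups = parse_robots(robots_txt)
    token = bot.lower()
    if any(token in g.agents for g in groups):
        # The bot's own groups replace the wildcard group entirely.
        rules = [r for g in groups if token in g.agents for r in g.rules]
        reason = "explicit rules"
    elif any("*" in g.agents for g in groups):
        rules = [r for g in groups if "*" in g.agents for r in g.rules]
        reason = "wildcard (*)"
    else:
        return "Allowed", "no matching group"
    disallows = [p for d, p in rules if d == "disallow" and p]  # an empty Disallow restricts nothing
    allows = [p for d, p in rules if d == "allow" and p]
    if not disallows:
        return "Allowed", reason
    if any(p in ("/", "/*") for p in disallows) and not allows:
        return "Blocked", reason
    return "Partial", reason


ROBOTS = """
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
User-agent: PerplexityBot
Allow: /
"""

for bot in ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]:
    print(bot, *classify(ROBOTS, bot))
# GPTBot Blocked explicit rules
# ClaudeBot Allowed explicit rules
# PerplexityBot Allowed explicit rules
# CCBot Partial wildcard (*)
```

CCBot has no group of its own, so it inherits the wildcard group's `/private/` rule. ClaudeBot and PerplexityBot share one group because their User-agent lines are consecutive.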
URL Tester
Enter any URL on your domain to see which AI bots your robots.txt rules allow or block from crawling that page. You can then use this insight to decide whether allowing each bot to crawl that URL is beneficial.
Interpreting Crawlability
If a bot is blocked, it won’t crawl your content (assuming it honors robots.txt, which is a voluntary standard), so it can’t fetch your pages to use as a source in its responses. Use Crawlability to:
- Catch accidental blocking before it affects your AI visibility
- Understand which AI ecosystems can and can’t access your content
- Verify that changes to your robots.txt are working as expected
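You can also check a single URL yourself, or confirm that a robots.txt edit didn't block more than intended, by applying the RFC 9309 matching rule. Among the rules that apply to a bot, the longest matching path pattern wins, and Allow wins a tie. The following minimal sketch works under that assumption, and Peec's own matcher may differ in edge cases. `is_allowed` and `changed_bots` are hypothetical helpers, and the bot list is a hand-picked subset of the Bots table.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def group_rules(robots_txt: str, bot: str) -> list[tuple[str, str]]:
    """Return the Allow/Disallow rules that apply to `bot`, falling back to the * group."""
    groups: list[tuple[list[str], list[tuple[str, str]]]] = []  # (agents, rules)
    in_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not in_agents:
                groups.append(([], []))
            groups[-1][0].append(value.lower())
            in_agents = True
        elif key in ("allow", "disallow") and groups:
            groups[-1][1].append((key, value))
            in_agents = False
    for token in (bot.lower(), "*"):
        if any(token in agents for agents, _ in groups):
            return [rule for agents, rules in groups if token in agents for rule in rules]
    return []  # no applicable group: everything is allowed


def _to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern (with * and a trailing $) into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(robots_txt: str, bot: str, url: str) -> bool:
    """Longest matching rule wins (RFC 9309); on a tie, Allow wins."""
    parts = urlsplit(url)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in group_rules(robots_txt, bot):
        if not pattern:  # an empty rule matches nothing
            continue
        if _to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


def changed_bots(old: str, new: str, url: str, bots: list[str]) -> dict[str, tuple[bool, bool]]:
    """Map each bot whose access to `url` differs between two robots.txt versions to (before, after)."""
    result: dict[str, tuple[bool, bool]] = {}
    for bot in bots:
        before, after = is_allowed(old, bot, url), is_allowed(new, bot, url)
        if before != after:
            result[bot] = (before, after)
    return result


OLD = """
User-agent: *
Disallow: /admin/
"""
NEW = """
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""
BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot"]  # hand-picked subset of the Bots table
print(changed_bots(OLD, NEW, "https://example.com/blog/post", BOTS))
# {'GPTBot': (True, False), 'PerplexityBot': (True, False)}
```

The sketch does not use Python's built-in `urllib.robotparser`. That module applies the first matching rule in file order rather than the longest match, and it doesn't expand `*` inside paths, so its answers can differ from RFC 9309 crawlers.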
Bots
The Bots table shows which bots, from which vendors, are accessing and visiting your pages, and which type each one belongs to.
| AI bot | Platform | Type | Purpose / Note |
|---|---|---|---|
| YouBot | You.com | Other | Fetches pages to power You.com’s AI search results. |
| omgili | Webz.io | Training | Forum and discussion crawler for structured dataset building. |
| Perplexity-User | Perplexity | User Query | Used during a user’s Deep Research session. |
| Amazonbot | Amazon | Training | General training for Titan/Olympus models. |
| Google-Agent | Google | User Query | Used by Google agents to navigate the web and perform actions upon user request (e.g. Project Mariner). |
| cohere-training-data-crawler | Cohere | Training | Specialized crawler for raw training data. |
| ClaudeBot | Claude (Anthropic) | Training | Official training bot for Anthropic models. |
| Gemini-Deep-Research | Google | User Query | High-intensity agent for user-requested research. |
| Google-CloudVertexBot | Google | Search | Crawling for Google Cloud Vertex AI services. |
| Google-Extended | Google | Training | Opt-out token for Gemini training and AI product improvement. |
| PanguBot | PanGu (Huawei) | Training | Training for Huawei’s Pangu models. |
| ChatGPT-User | ChatGPT (OpenAI) | User Query | Visits links directly provided by a user. |
| CCBot | Common Crawl | Training | Massive open-source web archive for AI labs. |
| GrokBot | Grok (xAI) | Training | Real-time web search and training for Grok 3/4 models. |
| DuckAssistBot | DuckDuckGo | User Query | Summarizes pages for DuckDuckGo’s AI responses. |
| omgilibot | Webz.io | Other | Forum-specific crawler variant. Commercial data product. |
| Diffbot | Diffbot | Training | Structured data extraction as a service. |
| GoogleAgent-Mariner | Google | User Query | Action Agent: Can fill forms and click buttons. |
| TikTokSpider | ByteDance | Other | Specialized scraper for TikTok’s AI data. |
| Webzio-Extended | Webz.io | Training | Large-scale data scraping for AI providers. |
| Bytespider | ByteDance | Training | Training for TikTok and ByteDance AI. |
| Applebot-Extended | Apple | Training | Used for training Apple’s generative features. |
| OAI-SearchBot | ChatGPT (OpenAI) | Search | Real-time retriever for ChatGPT answers. |
| DeepSeekBot | DeepSeek | Training | Training for the DeepSeek model series. |
| PerplexityBot | Perplexity | Search | Fact-checking and retrieval for Perplexity. |
| Claude-Web | Claude (Anthropic) | Other | Legacy bot for web browsing during Claude interactions. |
| Grok-DeepSearch | Grok (xAI) | Search | Real-time web search for Grok’s deep research feature. |
| Ai2Bot-Dolma | Allen Institute | Training | Specifically builds the Dolma open dataset. |
| Manus-User | Meta | User Query | Action Agent: Navigates and interacts with sites. |
| FacebookBot | Meta | Training | Web crawler used by Meta for AI training data collection. |
| AzureAI-SearchBot | Microsoft | Search | Web retrieval for Azure AI services. |
| xAI-Grok | Grok (xAI) | Search | General-purpose web search bot for xAI/Grok. |
| Timpibot | Timpi | Training | Decentralized search engine for AI. |
| Claude-SearchBot | Claude (Anthropic) | Search | Anthropic’s specific bot for its search features. |
| MistralAI-User | Mistral | User Query | On-demand browser for Mistral users. |
| Claude-User | Claude (Anthropic) | User Query | Triggered when a user prompts with a specific link. |
| Amzn-SearchBot | Amazon | Search | Search bot for Amazon’s AI shopping features. |
| MyCentralAIScraperBot | Unknown | Other | Centralized AI data collection tool. |
| GPTBot | ChatGPT (OpenAI) | Training | Primary crawler for foundational training. |
| anthropic-ai | Claude (Anthropic) | Training | General data collection and model training. |
| meta-webindexer | Meta | Search | Search indexing for Meta’s AI assistants. |
| NovaAct | Amazon | User Query | Agent for automated web-based workflows. |
| meta-externalfetcher | Meta | User Query | Used for real-time link expansion on Meta. |
| CloudVertexBot | Google | Training | Cloud-based AI deployment and indexing. |
| Ai2Bot | Allen Institute | Training | General-purpose web crawler for Allen Institute AI research. |
| Meta-ExternalAgent | Meta | Training | High-velocity training crawler for Llama. |
| quillbot.com | QuillBot | User Query | Fetches content to power QuillBot’s AI writing tools. |
| Applebot | Apple | Search | Gathers data to power Spotlight, Siri, and Safari search functionality. |
| cohere-ai | Cohere | Training | Training for enterprise-grade LLMs. |
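Suppose you decide to act on this table, for example by blocking training crawlers while keeping search and user-query bots allowed. The Type column maps directly onto robots.txt groups. The following minimal sketch generates such a file from a hand-copied subset of the table. `BOT_TYPES` and `build_robots` are illustrative names and are not part of Peec.

```python
from __future__ import annotations

from collections import defaultdict

# Hand-copied subset of the Bots table: user-agent token -> bot type.
BOT_TYPES = {
    "GPTBot": "Training",
    "ClaudeBot": "Training",
    "CCBot": "Training",
    "Google-Extended": "Training",
    "OAI-SearchBot": "Search",
    "PerplexityBot": "Search",
    "Claude-SearchBot": "Search",
    "ChatGPT-User": "User Query",
    "Claude-User": "User Query",
}


def build_robots(bot_types: dict[str, str], blocked_types: set[str]) -> str:
    """Emit one group disallowing every bot of a blocked type and one group allowing the rest."""
    bots_by_rule: dict[str, list[str]] = defaultdict(list)
    for bot, bot_type in bot_types.items():
        rule = "Disallow: /" if bot_type in blocked_types else "Allow: /"
        bots_by_rule[rule].append(bot)
    blocks = []
    for rule, bots in bots_by_rule.items():
        # Consecutive User-agent lines form a single group sharing the rule below them.
        lines = [f"User-agent: {bot}" for bot in bots]
        blocks.append("\n".join(lines + [rule]))
    return "\n\n".join(blocks) + "\n"


print(build_robots(BOT_TYPES, {"Training"}))
```

A bot with its own group ignores the `*` group entirely. If you still want wildcard Disallow rules (such as a private directory) to apply to these bots, you must repeat those rules inside their groups.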
