How to Track Bot Crawls on Your Website: A Complete Guide

Have you ever wondered which bots are crawling your website? With the rise of AI and LLM bots, understanding your site's bot traffic has become more important than ever. In this guide, I'll show you how to build a comprehensive bot tracking system using Cloudflare Workers and a simple dashboard.
Why Track Bot Crawls?
Bot tracking helps you understand:
- Which AI models are crawling your content
- How search engines index your site
- When and how often bots visit
- Potential security threats
- Content optimization opportunities
The Solution: Cloudflare Worker + Dashboard
We'll build a system that:
- Tracks bot crawls in real-time
- Categorizes bots (LLM, Search, Other)
- Provides hourly and daily statistics
- Shows historical data
- Updates automatically
Step 1: Set Up the Cloudflare Worker
First, create a new Cloudflare Worker with this configuration:
# wrangler.toml
name = "bot-crawl-tracker"
main = "botcrawl.js"
compatibility_date = "2024-01-01"
[[kv_namespaces]]
binding = "BOT_CRAWLS"
id = "your-kv-id"
Step 2: Implement the Worker Code
Here's the core worker code that tracks bot crawls:
// botcrawl.js
// User-agent substrings grouped by category. Matching is done against the
// lowercased user agent, so every pattern here is lowercase.
const BOT_CATEGORIES = {
  llm: ['gptbot', 'claudebot'],
  search: ['googlebot', 'bingbot', 'yandexbot'],
  other: ['ccbot', 'amazonbot', 'bytespider', 'google-extended']
};
async function handleRequest(request) {
  // The URL is available here if you later want to record which paths bots hit
  const url = new URL(request.url);
  const userAgent = request.headers.get('user-agent') || '';
  const bot = identifyBot(userAgent);

  if (bot) {
    await trackBotCrawl(bot);
  }

  return new Response('OK', { status: 200 });
}
function identifyBot(userAgent) {
  const ua = userAgent.toLowerCase();
  for (const patterns of Object.values(BOT_CATEGORIES)) {
    // Return the specific user-agent token that matched, e.g. 'gptbot'
    const match = patterns.find(pattern => ua.includes(pattern));
    if (match) {
      return match;
    }
  }
  return null;
}
async function trackBotCrawl(bot) {
  const now = new Date();
  const hour = now.getUTCHours().toString().padStart(2, '0');
  const date = now.toISOString().split('T')[0];

  // Update hourly stats
  // Note: KV reads and writes are not atomic, so two crawls landing at the
  // same instant can occasionally drop a count. That's fine for trend data.
  const hourlyKey = `hourly:${date}:${hour}`;
  const hourlyData = await BOT_CRAWLS.get(hourlyKey, { type: 'json' }) || {};
  hourlyData[bot] = (hourlyData[bot] || 0) + 1;
  await BOT_CRAWLS.put(hourlyKey, JSON.stringify(hourlyData));

  // Update daily stats
  const dailyKey = `daily:${date}`;
  const dailyData = await BOT_CRAWLS.get(dailyKey, { type: 'json' }) || {};
  dailyData[bot] = (dailyData[bot] || 0) + 1;
  await BOT_CRAWLS.put(dailyKey, JSON.stringify(dailyData));
}
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});
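One thing the worker above doesn't include is a way for the dashboard to read the counters back out. A minimal approach, assuming you're happy to serve stats from the same worker, is to branch inside handleRequest on a path of your choosing (for example, if (url.pathname === '/stats') return handleStats();) and return the stored JSON. The /stats path and the response shape below are my own choices, not part of the original code:

// Sketch: return today's daily and hourly counters as JSON
async function handleStats() {
  const date = new Date().toISOString().split('T')[0];
  const daily = await BOT_CRAWLS.get(`daily:${date}`, { type: 'json' }) || {};

  // Collect the 24 hourly buckets for the same day
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    const key = `hourly:${date}:${h.toString().padStart(2, '0')}`;
    hourly[h] = await BOT_CRAWLS.get(key, { type: 'json' }) || {};
  }

  return new Response(JSON.stringify({ date, daily, hourly }), {
    headers: {
      'content-type': 'application/json',
      // Loosen CORS so a dashboard hosted elsewhere can fetch the data
      'access-control-allow-origin': '*'
    }
  });
}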
Step 3: Create the Dashboard
The dashboard provides a beautiful interface to view bot crawl data. Here's the key HTML structure:
<div class="container">
<!-- Stats Overview -->
<div class="row">
<div class="col-md-4">
<div class="stat-card">
<div class="stat-value" id="totalCrawls">0</div>
<div class="stat-label">Total Crawls Today</div>
</div>
</div>
<!-- More stat cards... -->
</div>
<!-- Charts -->
<div class="row">
<div class="col-md-8">
<canvas id="crawlsChart"></canvas>
</div>
<div class="col-md-4">
<canvas id="distributionChart"></canvas>
</div>
</div>
</div>
Step 4: Add Real-time Updates
The dashboard updates automatically every 5 minutes:
// Update dashboard every 5 minutes
setInterval(initializeDashboard, 5 * 60 * 1000);

async function initializeDashboard() {
  const data = await fetchBotData();
  updateStats(data);
  updateCharts(data);
  updateBotList(data);
}
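fetchBotData() and the update helpers aren't shown above. As a rough sketch, assuming the /stats endpoint from Step 2 and a placeholder WORKER_URL, the fetch side could look like this:

// Placeholder: replace with your deployed worker's URL
const WORKER_URL = 'https://bot-crawl-tracker.example.workers.dev';

async function fetchBotData() {
  const response = await fetch(`${WORKER_URL}/stats`);
  if (!response.ok) {
    throw new Error(`Stats request failed: ${response.status}`);
  }
  return response.json();
}

function updateStats(data) {
  // "Total Crawls Today" is just the sum of every per-bot count in the daily bucket
  const total = Object.values(data.daily).reduce((sum, count) => sum + count, 0);
  document.getElementById('totalCrawls').textContent = total;
}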
Understanding the Data
The dashboard shows several key metrics:
- Total Crawls: All bot visits in the last 24 hours
- Unique Bots: Different types of bots detected
- LLM Bots: AI model crawlers (GPT, Claude, etc.)
- Hourly Activity: Crawl patterns throughout the day
- Bot Distribution: Breakdown of bot types
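All of these metrics come straight from the per-bot counters the worker writes to KV. For a sense of the shape, a daily entry might look like this (the bot names come from the patterns in Step 2; the numbers are invented for illustration):

// Value stored under the key "daily:2024-01-15" (illustrative)
{
  "gptbot": 42,
  "claudebot": 17,
  "googlebot": 128,
  "bingbot": 35,
  "ccbot": 9
}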
Bot Categories
We track three main categories:
1. LLM Bots
   - GPTBot
   - ClaudeBot
   - Other AI crawlers
2. Search Bots
   - Googlebot
   - Bingbot
   - YandexBot
3. Other Bots
   - Amazonbot
   - CCBot
   - Bytespider
   - Google-Extended
Visualizing the Data
The dashboard includes three main charts:
- Crawls Over Time: Line chart showing hourly activity
- Bot Distribution: Doughnut chart of top 10 bots
- Activity by Type: Bar chart showing LLM vs Search vs Other
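The original charting code isn't reproduced here. As one way to render the hourly line chart, here's a sketch using Chart.js, which the canvas elements above suggest but which is my assumption; the helper name renderCrawlsChart is also mine, any charting library would work:

// Sketch: the hourly "Crawls Over Time" line chart with Chart.js.
// Chart.js won't reuse a canvas, so on each 5-minute refresh you'd either
// destroy the previous chart or call chart.update() with the new data.
function renderCrawlsChart(data) {
  const labels = Object.keys(data.hourly);                 // "0" .. "23"
  const totals = Object.values(data.hourly).map(bucket =>
    Object.values(bucket).reduce((sum, count) => sum + count, 0)
  );

  return new Chart(document.getElementById('crawlsChart'), {
    type: 'line',
    data: {
      labels,
      datasets: [{ label: 'Crawls per hour', data: totals, tension: 0.3 }]
    },
    options: { responsive: true }
  });
}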
Benefits of This System
- Real-time Monitoring: See bot activity as it happens
- Cost-effective: Fits within Cloudflare's free tier for most small and medium sites (keep an eye on KV write limits if bots hit you heavily)
- Easy to Deploy: Simple setup with minimal configuration
- Scalable: Workers run at Cloudflare's edge, so crawl spikes never touch your origin
- Insightful: Provides concrete data for SEO and content strategy
Next Steps
To enhance your bot tracking:
- Add more bot patterns to detect (a sketch follows this list)
- Implement rate limiting
- Add email notifications for unusual activity
- Create custom reports
- Track bot behavior patterns
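Extending detection is mostly a matter of widening BOT_CATEGORIES. The extra user-agent tokens below are ones I'd expect to see from current AI and search crawlers, but they're my additions, so verify them against your own logs before relying on them:

// Example of widening the net; the added tokens are assumptions, not part of
// the original tracker, so check them against real user agents in your logs.
const BOT_CATEGORIES = {
  llm: ['gptbot', 'claudebot', 'perplexitybot', 'anthropic-ai', 'cohere-ai'],
  search: ['googlebot', 'bingbot', 'yandexbot', 'duckduckbot', 'baiduspider'],
  other: ['ccbot', 'amazonbot', 'bytespider', 'google-extended', 'applebot-extended']
};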
Conclusion
Tracking bot crawls is essential for understanding how AI models and search engines interact with your content. This system provides a solid foundation that you can build upon based on your specific needs.
The code is available on GitHub if you want to implement it yourself or contribute improvements.