AnyCrawl

Webhooks

Receive real-time notifications for all AnyCrawl events including scraping, crawling, search, and scheduled tasks.

Introduction

Webhooks allow you to receive real-time HTTP notifications when events occur in your AnyCrawl account. Instead of polling for updates, AnyCrawl automatically sends POST requests to your specified endpoint when events happen.

Key Features: Subscribe to multiple event types, HMAC-SHA256 signature verification, automatic retry with exponential backoff, delivery history tracking, and private IP protection.

Core Features

  • Event Subscriptions: Subscribe to scraping, crawling, search, scheduled task, and system events
  • Secure Delivery: HMAC-SHA256 signature verification for authenticity
  • Automatic Retries: Exponential backoff retry mechanism for failed deliveries
  • Delivery Tracking: Complete history of all webhook deliveries
  • Scope Filtering: Subscribe to all events or specific tasks only
  • Custom Headers: Add custom HTTP headers to webhook requests
  • Private IP Protection: Built-in protection against SSRF attacks

API Endpoints

POST   /v1/webhooks                              # Create webhook subscription
GET    /v1/webhooks                              # List all webhooks
GET    /v1/webhooks/:webhookId                   # Get webhook details
PUT    /v1/webhooks/:webhookId                   # Update webhook
DELETE /v1/webhooks/:webhookId                   # Delete webhook
GET    /v1/webhooks/:webhookId/deliveries        # Get delivery history
POST   /v1/webhooks/:webhookId/test              # Send test webhook
PUT    /v1/webhooks/:webhookId/activate          # Activate webhook
PUT    /v1/webhooks/:webhookId/deactivate        # Deactivate webhook
POST   /v1/webhooks/:webhookId/deliveries/:deliveryId/replay  # Replay failed delivery
GET    /v1/webhook-events                        # List supported events

Supported Events

Job Events

Event             Description           Triggered When
scrape.created    Scrape job created    New scrape job is queued
scrape.started    Scrape job started    Job begins execution
scrape.completed  Scrape job completed  Job finishes successfully
scrape.failed     Scrape job failed     Job encounters an error
scrape.cancelled  Scrape job cancelled  Job is manually cancelled
crawl.created     Crawl job created     New crawl job is queued
crawl.started     Crawl job started     Job begins execution
crawl.completed   Crawl job completed   Job finishes successfully
crawl.failed      Crawl job failed      Job encounters an error
crawl.cancelled   Crawl job cancelled   Job is manually cancelled

Scheduled Task Events

Event          Description    Triggered When
task.executed  Task executed  Scheduled task runs
task.failed    Task failed    Scheduled task fails
task.paused    Task paused    Task is paused
task.resumed   Task resumed   Task is resumed

Search Events

Event             Description           Triggered When
search.created    Search job created    New search job is queued
search.started    Search job started    Job begins execution
search.completed  Search job completed  Job finishes successfully
search.failed     Search job failed     Job encounters an error
search.cancelled  Search job cancelled  Job is manually cancelled

Test Events

Event         Description  Triggered When
webhook.test  Test event   Manual test webhook is sent

Quick Start

Creating a Webhook

curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Notifications",
    "webhook_url": "https://your-domain.com/webhooks/anycrawl",
    "event_types": ["scrape.completed", "scrape.failed", "crawl.completed"],
    "scope": "all",
    "timeout_seconds": 10,
    "max_retries": 3
  }'

Response

{
  "success": true,
  "data": {
    "webhook_id": "webhook-uuid-here",
    "secret": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6",
    "message": "Webhook created successfully. Save the secret - it won't be shown again."
  }
}

Important: Save the secret immediately! It's only shown once during creation and is required for signature verification.

Request Parameters

Webhook Configuration

Parameter          Type      Required  Default  Description
name               string    Yes       -        Webhook name (1-255 characters)
description        string    No        -        Webhook description
webhook_url        string    Yes       -        Your endpoint URL (HTTPS recommended)
event_types        string[]  Yes       -        Array of event types to subscribe to
scope              string    No        "all"    Subscription scope: "all" or "specific"
specific_task_ids  string[]  No        -        Task IDs (required if scope is "specific")

Delivery Configuration

Parameter                    Type    Required  Default  Description
timeout_seconds              number  No        10       Request timeout (1-60 seconds)
max_retries                  number  No        3        Maximum retry attempts (0-10)
retry_backoff_multiplier     number  No        2        Retry backoff multiplier (1-10)
auto_disable_after_failures  number  No        10       Auto-disable after N consecutive failures
custom_headers               object  No        -        Custom HTTP headers

The webhook will be automatically disabled after auto_disable_after_failures consecutive failures to prevent excessive retries. You can reactivate it manually after fixing the issue.

Metadata

Parameter  Type      Required  Default  Description
tags       string[]  No        -        Tags for organization
metadata   object    No        -        Custom metadata
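
The ranges in the tables above can be checked on the client before a request is sent. The following is a minimal sketch in Python; the function name build_webhook_config is hypothetical, and the rule that event_types must be non-empty is an assumption (the table only marks the field as required).

```python
from typing import Any, Dict, List, Optional


def build_webhook_config(
    name: str,
    webhook_url: str,
    event_types: List[str],
    scope: str = "all",
    specific_task_ids: Optional[List[str]] = None,
    timeout_seconds: int = 10,
    max_retries: int = 3,
    retry_backoff_multiplier: int = 2,
) -> Dict[str, Any]:
    """Build a request body for POST /v1/webhooks, checking documented ranges."""
    if not 1 <= len(name) <= 255:
        raise ValueError("name must be 1-255 characters")
    if not event_types:
        # Assumption: an empty list is treated as invalid.
        raise ValueError("event_types must contain at least one event type")
    if scope not in ("all", "specific"):
        raise ValueError('scope must be "all" or "specific"')
    if scope == "specific" and not specific_task_ids:
        raise ValueError('specific_task_ids is required when scope is "specific"')
    if not 1 <= timeout_seconds <= 60:
        raise ValueError("timeout_seconds must be between 1 and 60")
    if not 0 <= max_retries <= 10:
        raise ValueError("max_retries must be between 0 and 10")
    if not 1 <= retry_backoff_multiplier <= 10:
        raise ValueError("retry_backoff_multiplier must be between 1 and 10")

    config: Dict[str, Any] = {
        "name": name,
        "webhook_url": webhook_url,
        "event_types": list(event_types),
        "scope": scope,
        "timeout_seconds": timeout_seconds,
        "max_retries": max_retries,
        "retry_backoff_multiplier": retry_backoff_multiplier,
    }
    if scope == "specific":
        config["specific_task_ids"] = list(specific_task_ids or [])
    return config
```

The returned dictionary can be serialized with json.dumps and sent as the body of the create request shown in Quick Start.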

Webhook Payload Format

HTTP Headers

Every webhook request includes these headers:

Content-Type: application/json
X-AnyCrawl-Signature: sha256=abc123...
X-Webhook-Event: scrape.completed
X-Webhook-Delivery-Id: delivery-uuid-1
X-Webhook-Timestamp: 2026-01-27T10:00:00.000Z
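
The X-Webhook-Timestamp header can serve as a sanity check against very old deliveries. The sketch below assumes the header always carries an ISO 8601 UTC timestamp like the one above; the five-minute window and the names MAX_AGE and is_fresh are illustrative, not AnyCrawl requirements. This page does not state whether the timestamp marks the event or the delivery attempt, nor whether it is covered by the signature, so a tight window may reject retried deliveries.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical tolerance; choose a window that suits your retry settings.
MAX_AGE = timedelta(minutes=5)


def parse_webhook_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2026-01-27T10:00:00.000Z into an aware datetime."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_fresh(header_value: str, now: Optional[datetime] = None) -> bool:
    """Return True if the timestamp is within MAX_AGE of the current time."""
    now = now or datetime.now(timezone.utc)
    try:
        sent_at = parse_webhook_timestamp(header_value)
    except ValueError:
        return False
    return abs(now - sent_at) <= MAX_AGE
```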

Payload Examples

scrape.completed

{
  "job_id": "job-uuid-1",
  "status": "completed",
  "url": "https://example.com",
  "total": 10,
  "completed": 10,
  "failed": 0,
  "credits_used": 5,
  "created_at": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T10:00:00.000Z"
}

scrape.failed

{
  "job_id": "job-uuid-1",
  "status": "failed",
  "url": "https://example.com",
  "error_message": "Connection timeout",
  "credits_used": 3,
  "created_at": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T10:00:00.000Z"
}

task.executed

{
  "task_id": "task-uuid-1",
  "task_name": "Daily News Scrape",
  "execution_id": "exec-uuid-1",
  "execution_number": 45,
  "status": "completed",
  "job_id": "job-uuid-1",
  "credits_used": 5,
  "scheduled_for": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T09:02:15.000Z"
}
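
Because each event type carries different fields, a handler usually routes on the X-Webhook-Event header before reading the payload. The following is a small sketch of such a dispatch table; the handler names and the summary strings are hypothetical, and only fields shown in the payload examples above are read.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


def on_scrape_completed(payload: Payload) -> str:
    return (
        f"Job {payload['job_id']} completed: "
        f"{payload['completed']}/{payload['total']} done, "
        f"{payload['credits_used']} credits used"
    )


def on_scrape_failed(payload: Payload) -> str:
    return f"Job {payload['job_id']} failed: {payload['error_message']}"


def on_task_executed(payload: Payload) -> str:
    return (
        f"Task {payload['task_name']} run #{payload['execution_number']} "
        f"finished with status {payload['status']}"
    )


# Map the value of the X-Webhook-Event header to a handler function.
HANDLERS: Dict[str, Callable[[Payload], str]] = {
    "scrape.completed": on_scrape_completed,
    "scrape.failed": on_scrape_failed,
    "task.executed": on_task_executed,
}


def dispatch(event_type: str, payload: Payload) -> str:
    """Route a payload to its handler; unknown events are acknowledged and skipped."""
    handler = HANDLERS.get(event_type)
    if handler is None:
        return f"Ignored unhandled event: {event_type}"
    return handler(payload)
```

Ignoring unknown event types, rather than failing, keeps the endpoint returning 2xx if new event types are added to the subscription later.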

Signature Verification

Why Verify Signatures?

Signature verification ensures webhook requests are genuinely from AnyCrawl and haven't been tampered with, protecting against malicious requests.

Verification Algorithm

AnyCrawl signs the request body with HMAC-SHA256, using the webhook secret as the key:

signature = HMAC-SHA256(payload, webhook_secret)
header_value = "sha256=" + hex(signature)
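
The two lines above translate directly into a pair of helpers. The sketch below uses only the Python standard library and can also produce a signed request body for testing a handler locally; the function names sign_payload and verify_signature and the secret "test-secret" are illustrative, and it assumes the secret is used as a UTF-8 key exactly as returned at creation.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_payload(raw_body: bytes, secret: str) -> str:
    """Return the header value "sha256=<hex digest>" for the given raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(raw_body: bytes, header_value: Optional[str], secret: str) -> bool:
    """Compare the received header with the expected value in constant time."""
    expected = sign_payload(raw_body, secret)
    # Comparing bytes avoids a TypeError on non-ASCII input; a missing header never matches.
    return hmac.compare_digest(expected.encode("utf-8"), (header_value or "").encode("utf-8"))


# Usage: sign a sample body, then verify it as a handler would.
body = json.dumps({"job_id": "job-uuid-1", "status": "completed"}).encode("utf-8")
header = sign_payload(body, "test-secret")
print(verify_signature(body, header, "test-secret"))
```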

Implementation Examples

Node.js / Express

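const crypto = require('crypto');
const express = require('express');

// Compare the signature header against an HMAC-SHA256 of the raw request body
function verifyWebhookSignature(rawBody, signature, secret) {
  // A missing header or body can never match
  if (typeof signature !== 'string' || !rawBody) {
    return false;
  }

  const hmac = crypto.createHmac('sha256', secret);
  hmac.update(rawBody);
  const expected = Buffer.from(`sha256=${hmac.digest('hex')}`);
  const received = Buffer.from(signature);

  // timingSafeEqual throws if the buffers differ in length, so check first
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}

// Placeholder: replace with your own event handling
async function processWebhookAsync(eventType, payload) {
  // e.g. enqueue a background job
}

const app = express();
// Keep the raw bytes so the signature is checked against exactly what was sent;
// re-serializing req.body with JSON.stringify may not reproduce them
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

app.post('/webhooks/anycrawl', (req, res) => {
  const signature = req.headers['x-anycrawl-signature'];
  const secret = process.env.WEBHOOK_SECRET;

  // Verify signature
  if (!verifyWebhookSignature(req.rawBody, signature, secret)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  // Extract event info
  const eventType = req.headers['x-webhook-event'];
  const deliveryId = req.headers['x-webhook-delivery-id'];

  console.log(`Received event: ${eventType}`);
  console.log(`Delivery ID: ${deliveryId}`);
  console.log('Payload:', req.body);

  // Respond quickly (< 5 seconds recommended)
  res.status(200).json({ received: true });

  // Process asynchronously
  processWebhookAsync(eventType, req.body).catch(console.error);
});

app.listen(3000);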

Python / Flask

import hashlib
import hmac
import os
from flask import Flask, request, jsonify

app = Flask(__name__)
# Load the secret from the environment rather than hard-coding it
WEBHOOK_SECRET = os.environ['WEBHOOK_SECRET']

def verify_webhook_signature(raw_body, signature, secret):
    # A missing header can never match
    if not signature:
        return False

    # Sign the raw request bytes; json.dumps on the parsed payload
    # can change whitespace and therefore the digest
    expected_signature = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        raw_body,
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(
        signature.encode('utf-8'),
        expected_signature.encode('utf-8')
    )

@app.route('/webhooks/anycrawl', methods=['POST'])
def webhook_handler():
    signature = request.headers.get('X-AnyCrawl-Signature')
    raw_body = request.get_data()

    # Verify signature
    if not verify_webhook_signature(raw_body, signature, WEBHOOK_SECRET):
        return jsonify({'error': 'Invalid signature'}), 401

    # Parse the body only after the signature has been verified
    payload = request.get_json()

    # Extract event info
    event_type = request.headers.get('X-Webhook-Event')
    delivery_id = request.headers.get('X-Webhook-Delivery-Id')

    print(f'Received event: {event_type}')
    print(f'Delivery ID: {delivery_id}')
    print(f'Payload: {payload}')

    # Respond quickly
    return jsonify({'received': True}), 200

if __name__ == '__main__':
    app.run(port=3000)

Go

package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)

func verifyWebhookSignature(payload []byte, signature, secret string) bool {
    mac := hmac.New(sha256.New, []byte(secret))
    mac.Write(payload)
    expectedSignature := "sha256=" + hex.EncodeToString(mac.Sum(nil))
    return hmac.Equal([]byte(signature), []byte(expectedSignature))
}

func webhookHandler(w http.ResponseWriter, r *http.Request) {
    signature := r.Header.Get("X-AnyCrawl-Signature")
    eventType := r.Header.Get("X-Webhook-Event")
    secret := os.Getenv("WEBHOOK_SECRET")

    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Error reading body", http.StatusBadRequest)
        return
    }

    if !verifyWebhookSignature(body, signature, secret) {
        http.Error(w, "Invalid signature", http.StatusUnauthorized)
        return
    }

    var payload map[string]interface{}
    if err := json.Unmarshal(body, &payload); err != nil {
        http.Error(w, "Invalid JSON", http.StatusBadRequest)
        return
    }

    fmt.Printf("Received event: %s\n", eventType)
    fmt.Printf("Payload: %+v\n", payload)

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]bool{"received": true})
}

func main() {
    http.HandleFunc("/webhooks/anycrawl", webhookHandler)
    http.ListenAndServe(":3000", nil)
}

Managing Webhooks

List All Webhooks

curl -X GET "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "webhook-uuid-1",
      "name": "Production Notifications",
      "webhook_url": "https://your-domain.com/webhooks/anycrawl",
      "webhook_secret": "***hidden***",
      "event_types": ["scrape.completed", "scrape.failed"],
      "scope": "all",
      "is_active": true,
      "consecutive_failures": 0,
      "total_deliveries": 145,
      "successful_deliveries": 142,
      "failed_deliveries": 3,
      "last_success_at": "2026-01-27T10:00:00.000Z",
      "last_failure_at": "2026-01-26T15:30:00.000Z",
      "created_at": "2026-01-01T00:00:00.000Z"
    }
  ]
}

The webhook_secret is always hidden in list and detail views for security.

Update Webhook

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "event_types": ["scrape.completed", "scrape.failed", "crawl.completed"]
  }'

You cannot update the webhook secret. To change it, delete and recreate the webhook.

Testing Webhooks

Send a test event to verify your webhook configuration:

curl -X POST "https://api.anycrawl.dev/v1/webhooks/:webhookId/test" \
  -H "Authorization: Bearer <your-api-key>"

Test Payload:

{
  "message": "This is a test webhook from AnyCrawl",
  "timestamp": "2026-01-27T10:00:00.000Z",
  "webhook_id": "webhook-uuid-1"
}

Deactivate/Activate Webhook

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/deactivate" \
  -H "Authorization: Bearer <your-api-key>"

To reactivate the webhook, send the same request to /v1/webhooks/:webhookId/activate instead.

Delete Webhook

curl -X DELETE "https://api.anycrawl.dev/v1/webhooks/:webhookId" \
  -H "Authorization: Bearer <your-api-key>"

Deleting a webhook also deletes all its delivery history.

Replay Failed Delivery

Manually retry a failed webhook delivery:

curl -X POST "https://api.anycrawl.dev/v1/webhooks/:webhookId/deliveries/:deliveryId/replay" \
  -H "Authorization: Bearer <your-api-key>"

Response:

{
  "success": true,
  "message": "Delivery replayed successfully",
  "data": {
    "delivery_id": "delivery-uuid-1",
    "status": "pending"
  }
}

Replaying a delivery creates a new delivery attempt with the same payload. This is useful for retrying failed deliveries after fixing endpoint issues.

Delivery History

View Deliveries

curl -X GET "https://api.anycrawl.dev/v1/webhooks/:webhookId/deliveries?limit=20" \
  -H "Authorization: Bearer <your-api-key>"

Query Parameters

Parameter  Type    Default  Description
limit      number  100      Number of deliveries to return
offset     number  0        Number of deliveries to skip
status     string  -        Filter by status: delivered, failed, retrying
from       string  -        Start date (ISO 8601)
to         string  -        End date (ISO 8601)
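
These parameters combine into a single query string. The sketch below builds the request URL with the standard library so that timestamps are percent-encoded correctly; the function name deliveries_url and its keyword names date_from and date_to are hypothetical, and the base URL is taken from the curl examples on this page.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://api.anycrawl.dev"
VALID_STATUSES = ("delivered", "failed", "retrying")


def deliveries_url(
    webhook_id: str,
    limit: int = 100,
    offset: int = 0,
    status: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> str:
    """Build the delivery-history URL for a webhook with optional filters."""
    params: Dict[str, Union[int, str]] = {"limit": limit, "offset": offset}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")
        params["status"] = status
    if date_from is not None:
        params["from"] = date_from  # ISO 8601 start date
    if date_to is not None:
        params["to"] = date_to  # ISO 8601 end date
    return f"{BASE_URL}/v1/webhooks/{webhook_id}/deliveries?{urlencode(params)}"
```

Paging through a long history then amounts to calling the function repeatedly with an increasing offset.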

Response

{
  "success": true,
  "data": [
    {
      "uuid": "delivery-uuid-1",
      "webhookSubscriptionUuid": "webhook-uuid-1",
      "eventType": "scrape.completed",
      "status": "delivered",
      "attempt_number": 1,
      "request_url": "https://your-domain.com/webhooks/anycrawl",
      "request_method": "POST",
      "response_status": 200,
      "response_duration_ms": 125,
      "created_at": "2026-01-27T10:00:00.000Z",
      "delivered_at": "2026-01-27T10:00:00.125Z"
    },
    {
      "uuid": "delivery-uuid-2",
      "status": "failed",
      "attempt_number": 3,
      "error_message": "Connection timeout",
      "error_code": "ETIMEDOUT",
      "created_at": "2026-01-27T09:00:00.000Z"
    }
  ],
  "meta": {
    "limit": 20,
    "offset": 0,
    "filters": {
      "status": null,
      "from": null,
      "to": null
    }
  }
}
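
A common follow-up to reading the history is replaying whatever failed. The sketch below picks the failed entries out of a response shaped like the one above and builds the matching replay paths from the API Endpoints list; the function name replay_paths is hypothetical, and sending the POST requests is left to your HTTP client.

```python
from typing import Any, Dict, List


def replay_paths(webhook_id: str, response: Dict[str, Any]) -> List[str]:
    """Return the replay endpoint path for every delivery with status "failed"."""
    paths = []
    for delivery in response.get("data", []):
        if delivery.get("status") == "failed":
            paths.append(
                f"/v1/webhooks/{webhook_id}/deliveries/{delivery['uuid']}/replay"
            )
    return paths


# Usage with a trimmed copy of the example response.
sample_response = {
    "success": True,
    "data": [
        {"uuid": "delivery-uuid-1", "status": "delivered", "attempt_number": 1},
        {"uuid": "delivery-uuid-2", "status": "failed", "attempt_number": 3},
    ],
}
print(replay_paths("webhook-uuid-1", sample_response))
```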

Retry Mechanism

When Retries Occur

Webhooks are retried when:

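  • HTTP status code is not 2xx (except 4xx responses, which are logged as failed without retry; see Return Appropriate Status Codes)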
  • Connection timeout occurs
  • Network errors happen

Retry Schedule

Using default settings (max_retries: 3, retry_backoff_multiplier: 2):

Attempt    Delay      Time After Initial
1st retry  1 minute   1 minute
2nd retry  2 minutes  3 minutes
3rd retry  4 minutes  7 minutes

The delay before retry n is: retry_backoff_multiplier ^ (n - 1) × 1 minute, where n is 1 for the first retry.
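
The schedule for other settings follows from the same formula. Here is a short sketch that computes each delay and the cumulative time since the initial attempt; the function name retry_schedule is illustrative, and it assumes the formula applies unchanged with no upper cap on the delay, which this page does not confirm.

```python
from typing import List, Tuple


def retry_schedule(
    max_retries: int = 3, multiplier: float = 2
) -> List[Tuple[int, float, float]]:
    """Return (retry_number, delay_minutes, minutes_after_initial) per retry."""
    schedule = []
    elapsed = 0.0
    for attempt in range(1, max_retries + 1):
        # multiplier ^ (attempt - 1) x 1 minute
        delay = float(multiplier) ** (attempt - 1)
        elapsed += delay
        schedule.append((attempt, delay, elapsed))
    return schedule


for attempt, delay, elapsed in retry_schedule():
    print(f"retry {attempt}: wait {delay:g} min, {elapsed:g} min after initial")
```

With the defaults this reproduces the table above. Under the same assumption, max_retries: 10 with a multiplier of 2 would place the final retry 1023 minutes (about 17 hours) after the initial attempt, which is worth keeping in mind when choosing these values.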

Automatic Disabling

Webhooks are automatically disabled after auto_disable_after_failures consecutive failures (10 by default) to prevent excessive retries.

To re-enable:

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/activate" \
  -H "Authorization: Bearer <your-api-key>"

Scope Filtering

All Events (scope: "all")

Receive notifications for all events of the subscribed types:

{
  "scope": "all",
  "event_types": ["scrape.completed", "crawl.completed"]
}

Specific Tasks (scope: "specific")

Only receive notifications for specific scheduled tasks:

{
  "scope": "specific",
  "specific_task_ids": ["task-uuid-1", "task-uuid-2"],
  "event_types": ["task.executed", "task.failed"]
}

Private IP Protection

Default Behavior

AnyCrawl blocks webhook deliveries to private IP addresses:

  • 10.0.0.0/8
  • 172.16.0.0/12
  • 192.168.0.0/16
  • 169.254.0.0/16 (link-local)
  • 127.0.0.1 / localhost
  • IPv6 private addresses
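
To catch a blocked address before registering a webhook, the list above can be mirrored locally. The sketch below is an approximation, not AnyCrawl's actual check: the IPv6 ranges and the use of the whole 127.0.0.0/8 loopback block are assumptions, and it only inspects literal IP addresses, not what a hostname resolves to.

```python
import ipaddress

# IPv4 ranges from the list above. The loopback block and the IPv6 entries
# are assumptions: the list names only 127.0.0.1 and "IPv6 private addresses".
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",  # link-local
        "127.0.0.0/8",     # loopback
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
        "fe80::/10",       # IPv6 link-local
    )
]


def is_blocked_address(address: str) -> bool:
    """Return True if a literal IP address falls in one of the blocked ranges."""
    ip = ipaddress.ip_address(address)
    # Membership is always False when the IP versions differ.
    return any(ip in network for network in BLOCKED_NETWORKS)
```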

Allow Local Webhooks (Testing Only)

For local development, set:

ALLOW_LOCAL_WEBHOOKS=true

Never enable this in production. It poses serious security risks.

Best Practices

1. Respond Quickly

Return a 2xx status code within 5 seconds:

app.post('/webhook', async (req, res) => {
  // Verify signature
  if (!verifySignature(req.body, req.headers['x-anycrawl-signature'])) {
    return res.status(401).send('Invalid signature');
  }

  // Quick acknowledgment
  res.status(200).json({ received: true });

  // Process asynchronously
  queue.add('process-webhook', req.body);
});

2. Implement Idempotency

Use X-Webhook-Delivery-Id to prevent duplicate processing:

// In-memory only: use a shared store (database or cache) if you run multiple instances
const processedDeliveries = new Set();

app.post('/webhook', (req, res) => {
  const deliveryId = req.headers['x-webhook-delivery-id'];

  if (processedDeliveries.has(deliveryId)) {
    return res.status(200).json({ received: true, duplicate: true });
  }

  processedDeliveries.add(deliveryId);

  // Process event...

  res.status(200).json({ received: true });
});
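
A plain set grows without bound in a long-running process. One possible refinement, sketched here in Python, is to remember only the most recent delivery IDs; the class name DeliveryDeduplicator and the 10,000-entry default are illustrative choices, and a shared store is still needed when several instances handle the same webhook.

```python
from collections import OrderedDict


class DeliveryDeduplicator:
    """Remember the most recent delivery IDs, evicting the oldest first."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._max_entries = max_entries

    def is_duplicate(self, delivery_id: str) -> bool:
        """Record delivery_id and return True if it was already recorded."""
        if delivery_id in self._seen:
            # Refresh its position so frequently repeated IDs are kept longer.
            self._seen.move_to_end(delivery_id)
            return True
        self._seen[delivery_id] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # drop the oldest entry
        return False
```

In a handler, a True result from is_duplicate(delivery_id) means the request can be acknowledged with a 2xx response without processing it again.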

3. Return Appropriate Status Codes

Status Code  Description      AnyCrawl Behavior
200-299      Success          No retry
400-499      Client error     No retry (logged as failed)
500-599      Server error     Retry with backoff
Timeout      Network timeout  Retry with backoff
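
When testing an endpoint, it can help to encode the table above as a function and assert on the outcome you expect for each response. This is a minimal sketch; the function name anycrawl_behavior and the returned labels are hypothetical, and status codes the table does not mention (such as 3xx) are reported as unspecified rather than guessed.

```python
from typing import Optional


def anycrawl_behavior(status_code: Optional[int]) -> str:
    """Map an endpoint response to the behavior listed in the table.

    Pass None to represent a network timeout.
    """
    if status_code is None:
        return "retry"
    if 200 <= status_code <= 299:
        return "no_retry"
    if 400 <= status_code <= 499:
        return "no_retry_logged_as_failed"
    if 500 <= status_code <= 599:
        return "retry"
    return "unspecified"  # not covered by the table
```

One practical consequence: a handler that hits a temporary problem should answer with a 5xx code rather than a 4xx code if it wants the delivery to be retried.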

4. Log All Webhook Activity

app.post('/webhook', (req, res) => {
  const deliveryId = req.headers['x-webhook-delivery-id'];
  const eventType = req.headers['x-webhook-event'];

  logger.info('Webhook received', {
    deliveryId,
    eventType,
    timestamp: req.headers['x-webhook-timestamp']
  });

  try {
    processWebhook(req.body, eventType);
    logger.info('Webhook processed', { deliveryId });
    res.status(200).json({ received: true });
  } catch (error) {
    logger.error('Webhook failed', {
      deliveryId,
      error: error.message
    });
    res.status(500).json({ error: 'Processing failed' });
  }
});

5. Security Checklist

  • ✅ Always verify signatures
  • ✅ Use HTTPS in production
  • ✅ Don't expose secrets in URLs
  • ✅ Implement rate limiting
  • ✅ Monitor for anomalies
  • ✅ Validate payload structure

Common Use Cases

Slack Notifications

Send scraping results to Slack:

app.post('/webhooks/anycrawl', async (req, res) => {
  const { job_id, status, url } = req.body;

  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `Job ${status}: ${url}\nJob ID: ${job_id}`
    })
  });

  res.status(200).json({ received: true });
});

Email Alerts

Send email notifications on failures:

app.post('/webhooks/anycrawl', async (req, res) => {
  const eventType = req.headers['x-webhook-event'];

  // Guard against a missing header before calling endsWith
  if (eventType && eventType.endsWith('.failed')) {
    await sendEmail({
      to: 'admin@example.com',
      subject: 'AnyCrawl Job Failed',
      body: JSON.stringify(req.body, null, 2)
    });
  }

  res.status(200).json({ received: true });
});

Database Logging

Store webhook events in a database:

app.post('/webhooks/anycrawl', async (req, res) => {
  const eventType = req.headers['x-webhook-event'];
  const deliveryId = req.headers['x-webhook-delivery-id'];

  await db.webhookEvents.create({
    deliveryId,
    eventType,
    payload: req.body,
    receivedAt: new Date()
  });

  res.status(200).json({ received: true });
});

Troubleshooting

Webhook Not Receiving Events

Check:

  • Is the webhook active? (is_active: true)
  • Are event types correctly configured?
  • Is the webhook URL accessible from the internet?
  • Is it blocked by private IP protection?
  • Check scope settings (all vs. specific)

Signature Verification Failing

Common issues:

  • Using the wrong secret (it is shown only in the webhook creation response)
  • Hashing a parsed object instead of the request body bytes
  • Re-serializing the JSON before hashing, which can change whitespace or formatting
  • Using the wrong HMAC algorithm (must be SHA-256)
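
The whitespace issue is easy to reproduce. The sketch below shows how parsing and re-serializing a body with Python's default json.dumps settings changes the bytes, and therefore the HMAC; the compact sample body and the secret "example-secret" are assumptions made for illustration, not a statement about AnyCrawl's exact wire format.

```python
import hashlib
import hmac
import json


def hex_hmac(body: bytes, secret: str) -> str:
    """Return the hex HMAC-SHA256 of body using secret as the key."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


# Assumed example of a compact JSON body as received over the wire.
raw_body = b'{"job_id":"job-uuid-1","status":"completed"}'

# json.dumps inserts a space after "," and ":" by default.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

print(raw_body == reserialized)  # False
print(hex_hmac(raw_body, "example-secret") == hex_hmac(reserialized, "example-secret"))  # False
```

Hashing the bytes exactly as received, as the Go example on this page does, avoids the problem regardless of how the sender formats its JSON.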

High Failure Rate

Solutions:

  • Check that your endpoint responds within 5 seconds
  • Return proper HTTP status codes
  • Review error messages in delivery history
  • Test locally with ngrok or similar tools

Webhook Auto-Disabled

Cause: auto_disable_after_failures consecutive failures (10 by default)

Solution:

  1. Fix the underlying issue (endpoint, signature verification, etc.)
  2. Test with the test endpoint
  3. Reactivate the webhook:
curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/activate" \
  -H "Authorization: Bearer <your-api-key>"

Debugging Tools

Testing Tools

Local Development

Use ngrok to expose a local server:

ngrok http 3000

Then use the ngrok URL as your webhook URL:

https://abc123.ngrok.io/webhooks/anycrawl

Limitations

Item                     Limit
Webhook name length      1-255 characters
Webhook URL              HTTPS recommended (production)
Timeout                  1-60 seconds
Max retries              0-10
Payload size             Maximum 1 MB
Custom headers           Maximum 20
Event types per webhook  No limit