AnyCrawl

Webhooks

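Receive real-time notifications for all AnyCrawl events, including scrape, crawl, map, search, and scheduled-task events.

Introduction

Webhooks let you receive real-time HTTP notifications when events occur in your AnyCrawl account. Instead of polling for updates, you register an endpoint and AnyCrawl sends it a POST request whenever an event occurs.

Key features: subscription to multiple event types, HMAC-SHA256 signature verification, automatic retries with exponential backoff, delivery history tracking, and private IP protection.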

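Core Features

  • Event subscription: subscribe to scrape, crawl, map, search, scheduled-task, and system events
  • Secure delivery: HMAC-SHA256 signature verification ensures authenticity
  • Automatic retries: failed deliveries are retried with exponential backoff
  • Delivery tracking: complete webhook delivery history
  • Scope filtering: subscribe to all events or only to events from specific tasks
  • Custom headers: add custom HTTP headers to webhook requests
  • Private IP protection: built-in protection against SSRF attacks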

API Endpoints

POST   /v1/webhooks                              # Create a webhook subscription
GET    /v1/webhooks                              # List all webhooks
GET    /v1/webhooks/:webhookId                   # Get webhook details
PUT    /v1/webhooks/:webhookId                   # Update a webhook
DELETE /v1/webhooks/:webhookId                   # Delete a webhook
GET    /v1/webhooks/:webhookId/deliveries        # Get delivery history
POST   /v1/webhooks/:webhookId/test              # Send a test webhook
PUT    /v1/webhooks/:webhookId/activate          # Activate a webhook
PUT    /v1/webhooks/:webhookId/deactivate        # Deactivate a webhook
POST   /v1/webhooks/:webhookId/deliveries/:deliveryId/replay  # Replay a failed delivery
GET    /v1/webhook-events                        # List supported events

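Supported Events

Job Events

| Event | Description | Triggered when |
| --- | --- | --- |
| scrape.created | Scrape job created | A new scrape job is queued |
| scrape.started | Scrape job started | The job begins executing |
| scrape.completed | Scrape job completed | The job finishes successfully |
| scrape.failed | Scrape job failed | The job encounters an error |
| scrape.cancelled | Scrape job cancelled | The job is cancelled manually |
| crawl.created | Crawl job created | A new crawl job is queued |
| crawl.started | Crawl job started | The job begins executing |
| crawl.completed | Crawl job completed | The job finishes successfully |
| crawl.failed | Crawl job failed | The job encounters an error |
| crawl.cancelled | Crawl job cancelled | The job is cancelled manually |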

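Scheduled Task Events

| Event | Description | Triggered when |
| --- | --- | --- |
| task.executed | Task executed | A scheduled task runs |
| task.failed | Task failed | A scheduled task fails |
| task.paused | Task paused | A task is paused |
| task.resumed | Task resumed | A task is resumed |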

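Search Events

| Event | Description | Triggered when |
| --- | --- | --- |
| search.created | Search job created | A new search job is queued |
| search.started | Search job started | The job begins executing |
| search.completed | Search job completed | The job finishes successfully |
| search.failed | Search job failed | The job encounters an error |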

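Map Events

| Event | Description | Triggered when |
| --- | --- | --- |
| map.created | Map job created | A new map job is queued |
| map.started | Map job started | The job begins executing |
| map.completed | Map job completed | The job finishes successfully |
| map.failed | Map job failed | The job encounters an error |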

Test Events

| Event | Description | Triggered when |
| --- | --- | --- |
| webhook.test | Test event | A test webhook is sent manually |

Quick Start

Create a Webhook

curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Notifications",
    "webhook_url": "https://your-domain.com/webhooks/anycrawl",
    "event_types": ["scrape.completed", "scrape.failed", "crawl.completed"],
    "scope": "all",
    "timeout_seconds": 10,
    "max_retries": 3
  }'

Response

{
  "success": true,
  "data": {
    "webhook_id": "webhook-uuid-here",
    "secret": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6",
    "message": "Webhook created successfully. Save the secret - it won't be shown again."
  }
}

Important: save the secret immediately. It is shown only once, at creation time, and you need it for signature verification.
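For readers who prefer Python over curl, the following is a minimal sketch of the same creation call using only the standard library. The helper names `build_create_webhook_request` and `extract_secret` are hypothetical and not part of any AnyCrawl SDK; the URL, headers, and fields are copied from the curl example and response above.

```python
import json
import urllib.request

API_BASE = "https://api.anycrawl.dev/v1"  # base URL taken from the curl example


def build_create_webhook_request(api_key, name, webhook_url, event_types):
    """Build (but do not send) the POST /v1/webhooks request."""
    body = {
        "name": name,
        "webhook_url": webhook_url,
        "event_types": event_types,
        "scope": "all",
        "timeout_seconds": 10,
        "max_retries": 3,
    }
    return urllib.request.Request(
        f"{API_BASE}/webhooks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_secret(response_body):
    """Return (webhook_id, secret) from a creation response like the one above."""
    parsed = json.loads(response_body)
    if not parsed.get("success"):
        raise RuntimeError("webhook creation failed")
    return parsed["data"]["webhook_id"], parsed["data"]["secret"]
```

To send the request, pass it to `urllib.request.urlopen(...)`, read the response body, and hand it to `extract_secret` so the secret can be stored right away (for example in a secrets manager).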

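Request Parameters

Webhook Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | - | Webhook name (1-255 characters) |
| description | string | - | Webhook description |
| webhook_url | string | - | Your endpoint URL (HTTPS recommended) |
| event_types | string[] | - | Array of event types to subscribe to |
| scope | string | "all" | Subscription scope: "all" or "specific" |
| specific_task_ids | string[] | - | Task IDs (required when scope is "specific") |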

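Delivery Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| timeout_seconds | number | 10 | Request timeout (1-60 seconds) |
| max_retries | number | 3 | Maximum number of retries (0-10) |
| retry_backoff_multiplier | number | 2 | Retry backoff multiplier (1-10) |
| custom_headers | object | - | Custom HTTP headers |

A webhook is automatically deactivated after 10 consecutive failures to prevent excessive retries. You can re-enable it manually once the underlying problem is fixed.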

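Metadata

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| tags | string[] | - | Tags for organization |
| metadata | object | - | Custom metadata |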
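The ranges in the parameter tables can be checked on the client before a request is sent. This is a minimal sketch, not AnyCrawl's own validation: `validate_webhook_config` is a hypothetical helper, it only checks fields that are present, and the 20-header cap is taken from the Limits table at the end of this page.

```python
# Numeric ranges as listed in the delivery configuration table
RANGES = {
    "timeout_seconds": (1, 60),
    "max_retries": (0, 10),
    "retry_backoff_multiplier": (1, 10),
}


def validate_webhook_config(config):
    """Return a list of problems; an empty list means nothing was flagged."""
    errors = []
    if "name" in config and not 1 <= len(config["name"]) <= 255:
        errors.append("name must be 1-255 characters")
    for field, (low, high) in RANGES.items():
        if field in config and not low <= config[field] <= high:
            errors.append(f"{field} must be between {low} and {high}")
    scope = config.get("scope", "all")  # "all" is the documented default
    if scope not in ("all", "specific"):
        errors.append('scope must be "all" or "specific"')
    elif scope == "specific" and not config.get("specific_task_ids"):
        errors.append('specific_task_ids is required when scope is "specific"')
    if len(config.get("custom_headers", {})) > 20:
        errors.append("at most 20 custom headers are allowed")
    return errors
```

The server remains the source of truth; a local check like this only catches obvious mistakes earlier.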

Webhook Payload Format

HTTP Headers

Every webhook request includes the following headers:

Content-Type: application/json
X-AnyCrawl-Signature: sha256=abc123...
X-Webhook-Event: scrape.completed
X-Webhook-Delivery-Id: delivery-uuid-1
X-Webhook-Timestamp: 2026-01-27T10:00:00.000Z

Payload Examples

scrape.completed

{
  "job_id": "job-uuid-1",
  "status": "completed",
  "url": "https://example.com",
  "total": 10,
  "completed": 10,
  "failed": 0,
  "credits_used": 5,
  "created_at": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T10:00:00.000Z"
}

scrape.failed

{
  "job_id": "job-uuid-1",
  "status": "failed",
  "url": "https://example.com",
  "error_message": "Connection timeout",
  "credits_used": 3,
  "created_at": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T10:00:00.000Z"
}

task.executed

{
  "task_id": "task-uuid-1",
  "task_name": "Daily News Scrape",
  "execution_id": "exec-uuid-1",
  "execution_number": 45,
  "status": "completed",
  "job_id": "job-uuid-1",
  "credits_used": 5,
  "scheduled_for": "2026-01-27T09:00:00.000Z",
  "completed_at": "2026-01-27T09:02:15.000Z"
}
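Because each event type carries different fields, a receiver usually routes payloads by the value of the X-Webhook-Event header. The sketch below shows one way to do that with a lookup table; the handler names and the returned strings are illustrative assumptions, while the field names come from the payload examples above.

```python
def on_scrape_completed(payload):
    return f"job {payload['job_id']} completed {payload['completed']}/{payload['total']}"


def on_scrape_failed(payload):
    return f"job {payload['job_id']} failed: {payload['error_message']}"


def on_task_executed(payload):
    return (
        f"task {payload['task_name']} run #{payload['execution_number']}: "
        f"{payload['status']}"
    )


# Map event types to handlers; unlisted events are ignored rather than rejected
HANDLERS = {
    "scrape.completed": on_scrape_completed,
    "scrape.failed": on_scrape_failed,
    "task.executed": on_task_executed,
}


def dispatch(event_type, payload):
    """Route a verified payload to the handler registered for its event type."""
    handler = HANDLERS.get(event_type)
    if handler is None:
        return f"ignored unhandled event {event_type}"
    return handler(payload)
```

Ignoring unknown event types, instead of returning an error, keeps the endpoint working if more events are subscribed later.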

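Signature Verification

Why verify signatures?

Signature verification confirms that a webhook request really came from AnyCrawl and was not tampered with in transit, which protects your endpoint from forged requests.

Verification algorithm

AnyCrawl signs the payload with HMAC-SHA256: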

signature = HMAC-SHA256(payload, webhook_secret)
header_value = "sha256=" + hex(signature)
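The two lines above translate almost directly into code. This sketch assumes the secret is used as UTF-8 bytes and that the signature covers the raw request body exactly as sent; `sign_payload` and `is_valid_signature` are hypothetical helper names. Being able to produce a signature locally is also handy for unit-testing a handler without waiting for a real delivery.

```python
import hashlib
import hmac


def sign_payload(raw_body: bytes, secret: str) -> str:
    """Compute the header value: "sha256=" followed by the hex HMAC-SHA256."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(raw_body: bytes, header_value, secret: str) -> bool:
    """Compare the received header with the expected value in constant time."""
    if not header_value:
        return False
    expected = sign_payload(raw_body, secret)
    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))
```

For example, a test can call `sign_payload(body, secret)`, attach the result as X-AnyCrawl-Signature, and post `body` to the handler under test.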

Implementation Examples

Node.js / Express

const crypto = require('crypto');
const express = require('express');

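function verifyWebhookSignature(rawBody, signature, secret) {
  // Reject requests with a missing signature, body, or configured secret
  if (typeof signature !== 'string' || !rawBody || !secret) return false;

  // Hash the exact bytes received; re-serializing req.body may change them
  const hmac = crypto.createHmac('sha256', secret);
  hmac.update(rawBody);
  const expectedSignature = `sha256=${hmac.digest('hex')}`;

  const received = Buffer.from(signature);
  const expected = Buffer.from(expectedSignature);
  // timingSafeEqual throws if the lengths differ, so check that first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// Placeholder for your own event handling (not part of the AnyCrawl API)
async function processWebhookAsync(eventType, payload) {}

const app = express();
// Parse JSON but keep the raw bytes on req.rawBody for signature verification
app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));

app.post('/webhooks/anycrawl', (req, res) => {
  const signature = req.headers['x-anycrawl-signature'];
  const secret = process.env.WEBHOOK_SECRET;

  // Verify signature
  if (!verifyWebhookSignature(req.rawBody, signature, secret)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }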

  // Extract event info
  const eventType = req.headers['x-webhook-event'];
  const deliveryId = req.headers['x-webhook-delivery-id'];

  console.log(`Received event: ${eventType}`);
  console.log(`Delivery ID: ${deliveryId}`);
  console.log('Payload:', req.body);

  // Respond quickly (< 5 seconds recommended)
  res.status(200).json({ received: true });

  // Process asynchronously
  processWebhookAsync(eventType, req.body).catch(console.error);
});

app.listen(3000);

Python / Flask

import hmac
import hashlib
import json
from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = 'your-webhook-secret-here'

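def verify_webhook_signature(raw_body, signature, secret):
    # Reject requests that arrive without a signature header
    if not signature:
        return False

    # Hash the exact bytes received; json.dumps() would change the spacing
    expected_signature = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        raw_body,
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(
        signature.encode('utf-8'),
        expected_signature.encode('utf-8')
    )

@app.route('/webhooks/anycrawl', methods=['POST'])
def webhook_handler():
    signature = request.headers.get('X-AnyCrawl-Signature')
    raw_body = request.get_data()  # raw request bytes, cached by Flask

    # Verify signature
    if not verify_webhook_signature(raw_body, signature, WEBHOOK_SECRET):
        return jsonify({'error': 'Invalid signature'}), 401

    # Parse the JSON only after the signature checks out
    payload = request.get_json()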

    # Extract event info
    event_type = request.headers.get('X-Webhook-Event')
    delivery_id = request.headers.get('X-Webhook-Delivery-Id')

    print(f'Received event: {event_type}')
    print(f'Delivery ID: {delivery_id}')
    print(f'Payload: {payload}')

    # Respond quickly
    return jsonify({'received': True}), 200

if __name__ == '__main__':
    app.run(port=3000)

Go

package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)

func verifyWebhookSignature(payload []byte, signature, secret string) bool {
    mac := hmac.New(sha256.New, []byte(secret))
    mac.Write(payload)
    expectedSignature := "sha256=" + hex.EncodeToString(mac.Sum(nil))
    return hmac.Equal([]byte(signature), []byte(expectedSignature))
}

func webhookHandler(w http.ResponseWriter, r *http.Request) {
    signature := r.Header.Get("X-AnyCrawl-Signature")
    eventType := r.Header.Get("X-Webhook-Event")
    secret := os.Getenv("WEBHOOK_SECRET")

    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Error reading body", http.StatusBadRequest)
        return
    }

    if !verifyWebhookSignature(body, signature, secret) {
        http.Error(w, "Invalid signature", http.StatusUnauthorized)
        return
    }

    var payload map[string]interface{}
    if err := json.Unmarshal(body, &payload); err != nil {
        http.Error(w, "Invalid JSON", http.StatusBadRequest)
        return
    }

    fmt.Printf("Received event: %s\n", eventType)
    fmt.Printf("Payload: %+v\n", payload)

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]bool{"received": true})
}

func main() {
    http.HandleFunc("/webhooks/anycrawl", webhookHandler)
    http.ListenAndServe(":3000", nil)
}

Managing Webhooks

List all webhooks

curl -X GET "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "webhook-uuid-1",
      "name": "Production Notifications",
      "webhook_url": "https://your-domain.com/webhooks/anycrawl",
      "webhook_secret": "***hidden***",
      "event_types": ["scrape.completed", "scrape.failed"],
      "scope": "all",
      "is_active": true,
      "consecutive_failures": 0,
      "total_deliveries": 145,
      "successful_deliveries": 142,
      "failed_deliveries": 3,
      "last_success_at": "2026-01-27T10:00:00.000Z",
      "last_failure_at": "2026-01-26T15:30:00.000Z",
      "created_at": "2026-01-01T00:00:00.000Z"
    }
  ]
}

For security reasons, webhook_secret is always hidden in list and detail views.

Update a Webhook

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "event_types": ["scrape.completed", "scrape.failed", "crawl.completed"]
  }'

You cannot update a webhook's secret. To change the secret, delete the webhook and create it again.

Testing Webhooks

Send a test event to verify your webhook configuration:

curl -X POST "https://api.anycrawl.dev/v1/webhooks/:webhookId/test" \
  -H "Authorization: Bearer <your-api-key>"

Test payload

{
  "message": "This is a test webhook from AnyCrawl",
  "timestamp": "2026-01-27T10:00:00.000Z",
  "webhook_id": "webhook-uuid-1"
}

Deactivate/Activate a Webhook

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/deactivate" \
  -H "Authorization: Bearer <your-api-key>"

Delete a Webhook

curl -X DELETE "https://api.anycrawl.dev/v1/webhooks/:webhookId" \
  -H "Authorization: Bearer <your-api-key>"

Deleting a webhook also deletes its entire delivery history.

Replay a Failed Delivery

Manually retry a failed webhook delivery:

curl -X POST "https://api.anycrawl.dev/v1/webhooks/:webhookId/deliveries/:deliveryId/replay" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "message": "Delivery replayed successfully",
  "data": {
    "delivery_id": "delivery-uuid-1",
    "status": "pending"
  }
}

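Replaying a delivery creates a new delivery attempt with the same payload. This is useful for retrying failed deliveries after you have fixed a problem with your endpoint.

Delivery History

View delivery records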

curl -X GET "https://api.anycrawl.dev/v1/webhooks/:webhookId/deliveries?limit=20" \
  -H "Authorization: Bearer <your-api-key>"

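Query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | number | 100 | Number of delivery records to return |
| offset | number | 0 | Number of delivery records to skip |
| status | string | - | Filter by status: delivered, failed, retrying |
| from | string | - | Start date (ISO 8601) |
| to | string | - | End date (ISO 8601) |

Response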

{
  "success": true,
  "data": [
    {
      "uuid": "delivery-uuid-1",
      "webhookSubscriptionUuid": "webhook-uuid-1",
      "eventType": "scrape.completed",
      "status": "delivered",
      "attempt_number": 1,
      "request_url": "https://your-domain.com/webhooks/anycrawl",
      "request_method": "POST",
      "response_status": 200,
      "response_duration_ms": 125,
      "created_at": "2026-01-27T10:00:00.000Z",
      "delivered_at": "2026-01-27T10:00:00.125Z"
    },
    {
      "uuid": "delivery-uuid-2",
      "status": "failed",
      "attempt_number": 3,
      "error_message": "Connection timeout",
      "error_code": "ETIMEDOUT",
      "created_at": "2026-01-27T09:00:00.000Z"
    }
  ],
  "meta": {
    "limit": 20,
    "offset": 0,
    "filters": {
      "status": null,
      "from": null,
      "to": null
    }
  }
}
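The query parameters and the response shape can be combined to find failed deliveries and prepare them for replay. The sketch below only builds URLs and performs no HTTP calls; `deliveries_url` and `replay_urls` are hypothetical helpers, and the paths follow the endpoint list earlier on this page.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.anycrawl.dev/v1"  # base URL taken from the curl examples


def deliveries_url(webhook_id, status=None, limit=100, offset=0):
    """Build the delivery-history URL with optional status filtering."""
    params = {"limit": limit, "offset": offset}
    if status is not None:
        params["status"] = status  # documented values: delivered, failed, retrying
    return f"{API_BASE}/webhooks/{quote(webhook_id, safe='')}/deliveries?{urlencode(params)}"


def replay_urls(webhook_id, response):
    """Return the replay endpoint for each failed delivery in a history response."""
    return [
        f"{API_BASE}/webhooks/{quote(webhook_id, safe='')}/deliveries/{d['uuid']}/replay"
        for d in response.get("data", [])
        if d.get("status") == "failed"
    ]
```

Each URL returned by `replay_urls` would then be called with POST and the same Authorization header used in the curl examples.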

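Retry Mechanism

When retries are triggered

A webhook delivery is retried when:

  • The HTTP status code is not 2xx
  • The connection times out
  • A network error occurs

Retry schedule

With the default settings (max_retries: 3, retry_backoff_multiplier: 2):

| Attempt | Delay | Time since first attempt |
| --- | --- | --- |
| Retry 1 | 1 minute | 1 minute |
| Retry 2 | 2 minutes | 3 minutes |
| Retry 3 | 4 minutes | 7 minutes |

The delay before each retry is: backoff_multiplier ^ (attempt - 1) × 1 minute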
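To see how other settings change the schedule, the formula can be evaluated directly. This is a small sketch of the arithmetic only; `retry_schedule` is a hypothetical helper and delays are expressed in minutes.

```python
def retry_schedule(max_retries=3, backoff_multiplier=2):
    """Return (retry number, delay in minutes, minutes since first attempt)."""
    schedule = []
    elapsed = 0
    for attempt in range(1, max_retries + 1):
        delay = backoff_multiplier ** (attempt - 1)  # delay formula from above
        elapsed += delay
        schedule.append((attempt, delay, elapsed))
    return schedule


print(retry_schedule())       # [(1, 1, 1), (2, 2, 3), (3, 4, 7)]
print(retry_schedule(4, 3))   # [(1, 1, 1), (2, 3, 4), (3, 9, 13), (4, 27, 40)]
```

The first call reproduces the default table; the second shows that a multiplier of 3 with 4 retries spreads attempts over 40 minutes.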

Automatic deactivation

A webhook is automatically deactivated after 10 consecutive failures to prevent excessive retries.

Re-enable

curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/activate" \
  -H "Authorization: Bearer <your-api-key>"

Scope Filtering

All events (scope: "all")

Receive notifications for every event of the subscribed types:

{
  "scope": "all",
  "event_types": ["scrape.completed", "crawl.completed"]
}

Specific tasks (scope: "specific")

Receive notifications only for specific scheduled tasks:

{
  "scope": "specific",
  "specific_task_ids": ["task-uuid-1", "task-uuid-2"],
  "event_types": ["task.executed", "task.failed"]
}

Private IP Protection

Default behavior

AnyCrawl blocks webhook delivery to private IP addresses:

  • 10.0.0.0/8
  • 172.16.0.0/12
  • 192.168.0.0/16
  • 169.254.0.0/16 (link-local)
  • 127.0.0.1 / localhost
  • IPv6 private addresses
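Before registering an endpoint, it can help to check whether its address falls in one of the blocked ranges. The sketch below is an approximation, not AnyCrawl's actual filter: it treats the whole 127.0.0.0/8 loopback range as blocked (the list names only 127.0.0.1), uses Python's `is_private` flag as a stand-in for "IPv6 private addresses", and does not resolve hostnames.

```python
import ipaddress

# IPv4 ranges from the list above; 127.0.0.0/8 is an assumption (see lead-in)
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
        "127.0.0.0/8",
    )
]


def is_blocked_address(host):
    """Return True if host is localhost or a literal IP in a blocked range."""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname: it would need DNS resolution to be checked
    if addr.version == 6:
        return addr.is_private  # assumption: approximates "IPv6 private addresses"
    return any(addr in network for network in BLOCKED_NETWORKS)
```

A public hostname that resolves to a private address would still be rejected by the service, so this check is only a quick first pass.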

Allowing local webhooks (testing only)

For local development, set:

ALLOW_LOCAL_WEBHOOKS=true

Never enable this option in production. It introduces a serious security risk.

Best Practices

1. Respond quickly

Return a 2xx status code within 5 seconds:

app.post('/webhook', async (req, res) => {
  // Verify signature
  if (!verifySignature(req.body, req.headers['x-anycrawl-signature'])) {
    return res.status(401).send('Invalid signature');
  }

  // Quick acknowledgment
  res.status(200).json({ received: true });

  // Process asynchronously
  queue.add('process-webhook', req.body);
});

2. Implement idempotency

Use X-Webhook-Delivery-Id to avoid processing the same delivery twice:

const processedDeliveries = new Set();

app.post('/webhook', (req, res) => {
  const deliveryId = req.headers['x-webhook-delivery-id'];

  if (processedDeliveries.has(deliveryId)) {
    return res.status(200).json({ received: true, duplicate: true });
  }

  processedDeliveries.add(deliveryId);

  // Process event...

  res.status(200).json({ received: true });
});
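An unbounded in-memory set grows forever in a long-running process. As one possible variation, the sketch below keeps only the most recent delivery IDs; `DeliveryDeduplicator` and its `max_entries` default are assumptions made for illustration, and a multi-process deployment would need a shared store (a database or cache) instead.

```python
from collections import OrderedDict


class DeliveryDeduplicator:
    """Remember recently seen delivery IDs, evicting the oldest when full."""

    def __init__(self, max_entries=10000):
        self._seen = OrderedDict()
        self._max_entries = max_entries

    def first_time(self, delivery_id):
        """Return True the first time an ID is seen, False for duplicates."""
        if delivery_id in self._seen:
            self._seen.move_to_end(delivery_id)  # keep recently repeated IDs longer
            return False
        self._seen[delivery_id] = True
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # drop the oldest entry
        return True
```

A handler would call `first_time(delivery_id)` with the X-Webhook-Delivery-Id value and skip processing, while still returning 200, when it returns False.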

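3. Return appropriate status codes

| Status code | Meaning | AnyCrawl behavior |
| --- | --- | --- |
| 200-299 | Success | No retry |
| 400-499 | Client error | No retry (recorded as failed) |
| 500-599 | Server error | Retried with exponential backoff |
| Timeout | Network timeout | Retried with exponential backoff |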
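The table can be read as a small decision function, which is useful when deciding which status code a handler should return. This sketch mirrors only the rows shown; `delivery_outcome` is a hypothetical helper, and codes outside the listed ranges are reported as unspecified rather than guessed.

```python
def delivery_outcome(status_code=None, timed_out=False):
    """Classify a delivery result according to the table above."""
    if timed_out or status_code is None:
        return "retry"        # timeout: retried with exponential backoff
    if 200 <= status_code <= 299:
        return "success"      # no retry
    if 400 <= status_code <= 499:
        return "failed"       # no retry, recorded as failed
    if 500 <= status_code <= 599:
        return "retry"        # retried with exponential backoff
    return "unspecified"      # 1xx and 3xx are not covered by the table
```

In practice this means returning 4xx for requests that should never be redelivered, such as an invalid signature, and 5xx when a later retry could succeed.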

4. Log all webhook activity

app.post('/webhook', (req, res) => {
  const deliveryId = req.headers['x-webhook-delivery-id'];
  const eventType = req.headers['x-webhook-event'];

  logger.info('Webhook received', {
    deliveryId,
    eventType,
    timestamp: req.headers['x-webhook-timestamp']
  });

  try {
    processWebhook(req.body, eventType);
    logger.info('Webhook processed', { deliveryId });
    res.status(200).json({ received: true });
  } catch (error) {
    logger.error('Webhook failed', {
      deliveryId,
      error: error.message
    });
    res.status(500).json({ error: 'Processing failed' });
  }
});

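5. Security checklist

  • ✅ Always verify signatures
  • ✅ Use HTTPS in production
  • ✅ Do not expose secrets in URLs
  • ✅ Implement rate limiting
  • ✅ Monitor for anomalies
  • ✅ Validate the payload structure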

Common Use Cases

Slack notifications

Send scrape results to Slack:

app.post('/webhooks/anycrawl', async (req, res) => {
  const { job_id, status, url } = req.body;

  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `Job ${status}: ${url}\nJob ID: ${job_id}`
    })
  });

  res.status(200).json({ received: true });
});

Email alerts

Send an email notification on failure:

app.post('/webhooks/anycrawl', async (req, res) => {
  const eventType = req.headers['x-webhook-event'];

  if (eventType && eventType.endsWith('.failed')) {
    await sendEmail({
      to: 'admin@example.com',
      subject: 'AnyCrawl Job Failed',
      body: JSON.stringify(req.body, null, 2)
    });
  }

  res.status(200).json({ received: true });
});

Database logging

Store webhook events in a database:

app.post('/webhooks/anycrawl', async (req, res) => {
  const eventType = req.headers['x-webhook-event'];
  const deliveryId = req.headers['x-webhook-delivery-id'];

  await db.webhookEvents.create({
    deliveryId,
    eventType,
    payload: req.body,
    receivedAt: new Date()
  });

  res.status(200).json({ received: true });
});

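Troubleshooting

Webhook is not receiving events

Check:

  • Is the webhook active? (is_active: true)
  • Are the event types configured correctly?
  • Is the webhook URL reachable from the internet?
  • Is it being blocked by private IP protection?
  • Check the scope setting (all or specific)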

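Signature verification fails

Common causes:

  • The wrong secret is used (check the webhook creation response)
  • The payload was not stringified before hashing
  • The hashed JSON contains extra whitespace or formatting that was not in the payload as sent
  • The wrong HMAC algorithm is used (it must be SHA-256)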
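The whitespace cause is easy to reproduce. The sketch below uses a made-up secret and body to show that parsing a payload and serializing it again with Python's default `json.dumps` settings produces different bytes, and therefore a different HMAC, than the bytes that were received; `signature_for` is a hypothetical helper.

```python
import hashlib
import hmac
import json


def signature_for(body: bytes, secret: str) -> str:
    """HMAC-SHA256 of the given bytes, formatted like the signature header."""
    return "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


secret = "example-secret"  # made-up value for illustration
raw_body = b'{"job_id":"job-uuid-1","status":"completed"}'  # bytes as received

# Default json.dumps adds a space after every ":" and ",", changing the bytes
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

# Compact separators restore the original bytes in this simple case
compact = json.dumps(json.loads(raw_body), separators=(",", ":")).encode("utf-8")

print(reserialized == raw_body)  # False
print(signature_for(reserialized, secret) == signature_for(raw_body, secret))  # False
print(compact == raw_body)       # True
```

Matching the separators works here only because the sample is plain ASCII with no special number or Unicode formatting, so hashing the raw request body is the more dependable approach.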

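High failure rate

Solutions:

  • Check that your endpoint responds within 5 seconds
  • Return the correct HTTP status codes
  • Review the error messages in the delivery history
  • Test locally with ngrok or a similar tool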

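Webhook was automatically deactivated

Cause: 10 consecutive failures

Solution:

  1. Fix the underlying problem (endpoint, signature verification, and so on)
  2. Verify the fix with the test endpoint
  3. Re-enable the webhook: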
curl -X PUT "https://api.anycrawl.dev/v1/webhooks/:webhookId/activate" \
  -H "Authorization: Bearer <your-api-key>"

Debugging Tools

Testing tools

Local development

Use ngrok to expose your local server:

ngrok http 3000

Then use the ngrok URL as your webhook URL:

https://abc123.ngrok.io/webhooks/anycrawl

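Limits

| Item | Limit |
| --- | --- |
| Webhook name length | 1-255 characters |
| Webhook URL | HTTPS recommended (production) |
| Timeout | 1-60 seconds |
| Maximum retries | 0-10 |
| Payload size | 1 MB maximum |
| Custom headers | 20 maximum |
| Event types per webhook | Unlimited |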

Related Documentation