AnyCrawl

Scheduled Tasks

Automate recurring scraping, crawling, and search operations with cron-based scheduling.

Introduction

Scheduled Tasks allow you to automate recurring web scraping, crawling, and data collection operations using standard cron expressions. They are ideal for monitoring websites, tracking price changes, archiving content, aggregating search results, and other periodic data collection needs.

Key Features: Cron-based scheduling with timezone support, multiple concurrency modes, automatic failure handling, credit management, and webhook integration for event notifications.

Core Features

  • Cron Expression Scheduling: Use standard cron syntax to define execution schedules
  • Timezone Support: Execute tasks in any timezone (e.g., Asia/Shanghai, America/New_York)
  • Concurrency Control: Two modes (skip or queue) for handling overlapping executions
  • Automatic Pause: Auto-pause after consecutive failures to protect resources
  • Credit Management: Daily execution limits and estimated credit tracking
  • Execution History: Track all executions with detailed status and metrics
  • Webhook Integration: Receive real-time notifications for task events

API Endpoints

POST   /v1/scheduled-tasks              # Create a scheduled task
GET    /v1/scheduled-tasks              # List all tasks
GET    /v1/scheduled-tasks/:taskId      # Get task details
PUT    /v1/scheduled-tasks/:taskId      # Update task
PATCH  /v1/scheduled-tasks/:taskId/pause   # Pause task
PATCH  /v1/scheduled-tasks/:taskId/resume  # Resume task
DELETE /v1/scheduled-tasks/:taskId      # Delete task
GET    /v1/scheduled-tasks/:taskId/executions  # Get execution history
DELETE /v1/scheduled-tasks/:taskId/executions/:executionId  # Cancel an execution

Quick Start

Creating a Daily Scraping Task

curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Tech News",
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
    "task_type": "scrape",
    "task_payload": {
      "url": "https://news.ycombinator.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 1
  }'

Response

{
  "success": true,
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "next_execution_at": "2026-01-28T01:00:00.000Z"
  }
}
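The same create-task request can be made from Python. The sketch below mirrors the curl example using only the standard library; the endpoint, headers, and fields come from the example above, while the helper name and structure are illustrative.

```python
import json
import urllib.request

API_BASE = "https://api.anycrawl.dev"

def create_scheduled_task(api_key: str, task: dict) -> dict:
    """POST a task definition to /v1/scheduled-tasks and return the parsed response."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/scheduled-tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The same task definition as the curl example above.
daily_news_task = {
    "name": "Daily Tech News",
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
    "task_type": "scrape",
    "task_payload": {
        "url": "https://news.ycombinator.com",
        "engine": "cheerio",
        "formats": ["markdown"],
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 1,
}
```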

Cron Expression Guide

Cron expressions use 5 fields to define the schedule:

* * * * *
│ │ │ │ │
│ │ │ │ └─ Day of week (0-7, 0 and 7 = Sunday)
│ │ │ └─── Month (1-12)
│ │ └───── Day of month (1-31)
│ └─────── Hour (0-23)
└───────── Minute (0-59)

Common Examples

| Expression | Description | Execution Time |
|---|---|---|
| 0 9 * * * | Every day at 9:00 AM | 09:00:00 daily |
| */15 * * * * | Every 15 minutes | :00, :15, :30, :45 |
| 0 */6 * * * | Every 6 hours | 00:00, 06:00, 12:00, 18:00 |
| 0 9 * * 1 | Every Monday at 9 AM | Monday 09:00:00 |
| 0 0 1 * * | First day of month | 1st day, 00:00:00 |
| 30 2 * * 0 | Every Sunday at 2:30 AM | Sunday 02:30:00 |

Use crontab.guru to validate and understand cron expressions.
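For a quick client-side sanity check before submitting a task, the 5-field format above can be validated with a short stdlib-only sketch. This is an illustrative validator, not the server's parser: it checks field count and numeric ranges for `*`, `*/step`, single values, and ranges, and does not expand names like `MON`.

```python
# Allowed numeric ranges per field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def is_valid_cron(expression: str) -> bool:
    fields = expression.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        # Each field may be a comma-separated list of "*", "*/step", "a", or "a-b".
        for part in field.split(","):
            value, _, step = part.partition("/")
            if step and not step.isdigit():
                return False
            if value == "*":
                continue
            start, _, end = value.partition("-")
            bounds = [start, end] if end else [start]
            if not all(b.isdigit() and lo <= int(b) <= hi for b in bounds):
                return False
    return True
```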

Request Parameters

Task Configuration

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Task name (1-255 characters) |
| description | string | No | - | Task description |
| cron_expression | string | Yes | - | Standard cron expression (5 fields) |
| timezone | string | No | "UTC" | Timezone for execution (e.g., "Asia/Shanghai") |
| task_type | string | Yes | - | Task type: "scrape", "crawl", "search", or "template" |
| task_payload | object | Yes | - | Task configuration (see below) |

Concurrency Control

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| concurrency_mode | string | No | "skip" | Mode: "skip" or "queue" |
| max_executions_per_day | number | No | - | Daily execution limit |

Execution Metadata (Read-only)

These fields are returned in task responses and are not accepted in create/update request bodies:

  • min_credits_required: Estimated minimum credits for one execution (server-calculated)
  • consecutive_failures: Consecutive failure count used for automatic pausing

Webhook Integration

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| webhook_ids | string[] | No | - | Array of webhook subscription UUIDs to trigger |
| webhook_url | string | No | - | Direct webhook URL (creates implicit subscription) |

You can either use webhook_ids to reference existing webhook subscriptions, or provide a webhook_url to create an implicit webhook subscription for this task.

Metadata

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| tags | string[] | No | - | Tags for organization |
| metadata | object | No | - | Custom metadata |

Task Payload

Scrape Task

{
  "url": "https://example.com/page",
  "engine": "cheerio",
  "formats": ["markdown"],
  "timeout": 60000,
  "wait_for": 2000,
  "include_tags": ["article", "main"],
  "exclude_tags": ["nav", "footer"]
}

Crawl Task

{
  "url": "https://example.com",
  "engine": "playwright",
  "options": {
    "max_depth": 3,
    "limit": 50,
    "strategy": "same-domain",
    "exclude_paths": ["/admin/*", "/api/*"],
    "scrape_options": {
      "formats": ["markdown"]
    }
  }
}

Search Task

{
  "query": "artificial intelligence news",
  "engine": "google",
  "limit": 20,
  "country": "US",
  "lang": "en",
  "timeRange": "day"
}

Template Task

{
  "template_id": "my-search-template",
  "query": "machine learning tutorials",
  "variables": {
    "lang": "en"
  }
}

For task_type: "template", task_payload must include:

  • template_id (required): template to execute
  • endpoint-specific input fields required by that template type (for example url for scrape/crawl templates, query for search templates)
  • optional variables to pass dynamic template inputs
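Those composition rules can be sketched as a small helper. This is a hypothetical convenience function (the name and signature are ours, not part of the API): it requires a template_id, merges in endpoint-specific inputs such as url or query, and attaches variables only when provided.

```python
def build_template_payload(template_id: str, variables=None, **inputs) -> dict:
    """Compose a task_payload for task_type "template".

    inputs carries the endpoint-specific fields (e.g. url=... or query=...);
    variables optionally carries dynamic template inputs.
    """
    payload = {"template_id": template_id, **inputs}
    if variables:
        payload["variables"] = variables
    return payload
```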

Concurrency Modes

skip

If a previous execution is still running, skip the current execution.

Best for: Tasks that may take longer than the interval between executions.

Example: A crawl job scheduled every hour that sometimes takes 90 minutes.

queue

Queue the new execution and wait for the previous one to complete.

Best for: Tasks where every execution must happen without skipping.

Example: Critical data collection that must not miss any scheduled runs.
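The scheduler's choice between the two modes can be modeled in a few lines. This is an illustrative decision function, not the actual implementation; the names are ours.

```python
def on_schedule_fire(mode: str, previous_still_running: bool) -> str:
    """Decide what happens when a cron occurrence fires."""
    if not previous_still_running:
        return "run"            # no overlap: execute immediately
    if mode == "skip":
        return "skip"           # drop this occurrence entirely
    if mode == "queue":
        return "queue"          # run after the previous execution finishes
    raise ValueError(f"unknown concurrency mode: {mode}")
```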

Managing Tasks

List All Tasks

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Daily Tech News",
      "task_type": "scrape",
      "cron_expression": "0 9 * * *",
      "timezone": "Asia/Shanghai",
      "is_active": true,
      "is_paused": false,
      "next_execution_at": "2026-01-28T01:00:00.000Z",
      "total_executions": 45,
      "successful_executions": 43,
      "failed_executions": 2,
      "consecutive_failures": 0,
      "last_execution_at": "2026-01-27T01:00:00.000Z",
      "created_at": "2026-01-01T00:00:00.000Z"
    }
  ]
}

Pause a Task

Pause a task to temporarily stop its execution. The reason parameter will be stored as pause_reason in the task record.

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/pause" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Maintenance period"
  }'

Response

{
  "success": true,
  "message": "Task paused successfully",
  "data": {
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "is_paused": true,
    "pause_reason": "Maintenance period"
  }
}

Resume a Task

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"

Update a Task

curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "cron_expression": "0 10 * * *",
    "description": "Updated description"
  }'

Response

{
  "success": true,
  "data": {
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Daily Tech News",
    "description": "Updated description",
    "task_type": "scrape",
    "cron_expression": "0 10 * * *",
    "timezone": "Asia/Shanghai",
    "task_payload": {
      "url": "https://news.ycombinator.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "is_active": true,
    "is_paused": false,
    "next_execution_at": "2026-01-28T02:00:00.000Z",
    "total_executions": 45,
    "successful_executions": 43,
    "failed_executions": 2,
    "consecutive_failures": 0,
    "last_execution_at": "2026-01-27T01:00:00.000Z",
    "created_at": "2026-01-01T00:00:00.000Z",
    "updated_at": "2026-01-27T12:00:00.000Z",
    "icon": "FileText"
  }
}

Only include fields you want to update. All other fields will remain unchanged.

Delete a Task

curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>"

Deleting a task also deletes all its execution history.

Execution History

Get Executions

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=20" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "exec-uuid-1",
      "scheduled_task_uuid": "task-uuid",
      "execution_number": 45,
      "status": "completed",
      "started_at": "2026-01-27T01:00:00.000Z",
      "completed_at": "2026-01-27T01:02:15.000Z",
      "duration_ms": 135000,
      "job_uuid": "job-uuid-1",
      "triggered_by": "scheduler",
      "scheduled_for": "2026-01-27T01:00:00.000Z",
      "error_message": null,
      "credits_used": 5,
      "items_processed": 1,
      "items_succeeded": 1,
      "items_failed": 0,
      "job_status": "completed",
      "job_success": true,
      "icon": "CircleCheck"
    }
  ]
}

Note: The credits_used, items_processed, items_succeeded, items_failed, job_status, and job_success fields are retrieved from the associated job record via JOIN. The duration_ms is calculated from started_at and completed_at.
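The duration_ms calculation can be reproduced client-side from the two ISO-8601 timestamps. This is a re-derivation of the documented arithmetic, not the server code:

```python
from datetime import datetime

def duration_ms(started_at: str, completed_at: str) -> int:
    """Milliseconds between two ISO-8601 UTC timestamps like those in the response."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    start = datetime.strptime(started_at.replace("Z", "+0000"), fmt)
    end = datetime.strptime(completed_at.replace("Z", "+0000"), fmt)
    return int((end - start).total_seconds() * 1000)

# Using the timestamps from the example execution above:
# duration_ms("2026-01-27T01:00:00.000Z", "2026-01-27T01:02:15.000Z") → 135000
```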

Cancel an Execution

Cancel a single execution that is currently pending or running. This is useful when you need to stop a specific execution without pausing the entire task.

curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions/:executionId" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "message": "Execution cancelled successfully"
}

Only executions with status pending or running can be cancelled. Completed, failed, or already cancelled executions will return an error.

Automatic Failure Handling

Auto-Pause on Consecutive Failures

Tasks are automatically paused after 5 consecutive failures to prevent resource waste and excessive API calls.

Resume: Manually resume the task after fixing the underlying issue.

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"

Monitoring Failures

Track failures using the consecutive_failures field in the task status:

{
  "consecutive_failures": 3,
  "failed_executions": 5,
  "successful_executions": 40
}

Monitor the consecutive_failures field. High values indicate recurring issues that need attention.
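The auto-pause bookkeeping described above can be sketched as follows. This is an illustrative model, not the server implementation; the field names mirror the task record, and the threshold of 5 comes from the auto-pause rule.

```python
AUTO_PAUSE_THRESHOLD = 5  # consecutive failures before the task is paused

def record_execution(task: dict, succeeded: bool) -> dict:
    """Update counters the way the docs describe: streak resets on success,
    and the task auto-pauses at the failure threshold."""
    if succeeded:
        task["successful_executions"] += 1
        task["consecutive_failures"] = 0
    else:
        task["failed_executions"] += 1
        task["consecutive_failures"] += 1
        if task["consecutive_failures"] >= AUTO_PAUSE_THRESHOLD:
            task["is_paused"] = True
            task["pause_reason"] = "auto-paused after consecutive failures"
    return task
```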

Integration with Webhooks

Subscribe to task events using webhooks:

curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Task Notifications",
    "webhook_url": "https://your-domain.com/webhook",
    "event_types": ["task.executed", "task.failed", "task.paused"],
    "scope": "all"
  }'

See Webhooks for more details.

Common Use Cases

Price Monitoring

Monitor product prices every hour:

{
  "name": "Hourly Price Tracker",
  "cron_expression": "0 * * * *",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://shop.example.com/product/12345",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "concurrency_mode": "skip"
}

Weekly Documentation Backup

Crawl and archive documentation weekly:

{
  "name": "Weekly Docs Backup",
  "cron_expression": "0 3 * * 0",
  "timezone": "UTC",
  "task_type": "crawl",
  "task_payload": {
    "url": "https://docs.example.com",
    "engine": "playwright",
    "options": {
      "max_depth": 10,
      "limit": 500
    }
  },
  "max_executions_per_day": 1
}

Daily News Aggregation

Scrape multiple news sources daily:

{
  "name": "Morning News Digest",
  "cron_expression": "0 6 * * *",
  "timezone": "America/New_York",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://news.example.com",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "max_executions_per_day": 1
}

Competitive Intelligence

Track competitor mentions and industry trends hourly:

{
  "name": "Competitor Monitoring",
  "cron_expression": "0 * * * *",
  "task_type": "search",
  "task_payload": {
    "query": "YourCompany OR CompetitorA OR CompetitorB",
    "engine": "google",
    "limit": 50,
    "timeRange": "day"
  },
  "concurrency_mode": "skip"
}

Best Practices

1. Choose Appropriate Cron Intervals

  • Avoid overly frequent schedules to prevent rate limiting (the minimum supported interval is 1 minute)
  • Consider website update frequency when setting intervals
  • Use timezone settings for location-specific timing

2. Set Concurrency Mode Correctly

  • Use "skip" for most tasks to avoid overlapping executions
  • Use "queue" only when every execution is critical

3. Manage Credits Wisely

  • Set max_executions_per_day for expensive tasks
  • Review min_credits_required in task details (server-calculated estimate)
  • Monitor credit usage in execution history

4. Monitor Task Health

  • Check consecutive_failures regularly
  • Review execution history for patterns
  • Set up webhook notifications for failures

5. Use Descriptive Names and Tags

  • Use clear, descriptive task names
  • Add tags for easy filtering and organization
  • Include relevant metadata for tracking

Limitations

| Item | Limit |
|---|---|
| Task name length | 1-255 characters |
| Minimum interval | 1 minute |
| Concurrency modes | skip, queue |
| Task payload size | 100KB |
| Cron expression format | Standard 5-field format |
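The 100KB payload cap can be checked before submitting a task. This sketch assumes the limit applies to the serialized JSON size, which is our reading of the table above:

```python
import json

MAX_PAYLOAD_BYTES = 100 * 1024  # 100KB cap from the Limitations table

def payload_within_limit(task_payload: dict) -> bool:
    """Return True if the JSON-serialized payload fits under the documented cap."""
    size = len(json.dumps(task_payload).encode("utf-8"))
    return size <= MAX_PAYLOAD_BYTES
```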

Troubleshooting

Task Not Executing

Check:

  • Is the task active? (is_active: true)
  • Is the task paused? (is_paused: false)
  • Is the cron expression valid?
  • Are there sufficient credits?

High Failure Rate

Possible causes:

  • Target website blocking requests
  • Invalid URLs in task payload
  • Insufficient timeout values
  • Network connectivity issues

Solutions:

  • Add delays with wait_for parameter
  • Use different engines (try Playwright for dynamic sites)
  • Increase timeout values
  • Check execution history for error messages

Credits Depleting Too Fast

Solutions:

  • Reduce execution frequency
  • Set max_executions_per_day limits
  • Use lower-cost payloads (fewer pages/results, lighter formats)
  • Review execution history to identify expensive runs