Scheduled Tasks

Automate recurring scraping, crawling, and search operations with cron-based scheduling.

Introduction

Scheduled Tasks allow you to automate recurring web scraping, crawling, and data collection operations using standard cron expressions. Perfect for monitoring websites, tracking price changes, archiving content, aggregating search results, or any periodic data collection needs.

Key Features: Cron-based scheduling with timezone support, multiple concurrency modes, automatic failure handling, credit management, and webhook integration for event notifications.

Core Features

  • Cron Expression Scheduling: Use standard cron syntax to define execution schedules
  • Timezone Support: Execute tasks in any timezone (e.g., Asia/Shanghai, America/New_York)
  • Concurrency Control: Three modes (skip, queue, or replace) for handling overlapping executions
  • Automatic Pause: Auto-pause after consecutive failures to protect resources
  • Credit Management: Set minimum credit requirements and daily execution limits
  • Execution History: Track all executions with detailed status and metrics
  • Webhook Integration: Receive real-time notifications for task events

API Endpoints

POST   /v1/scheduled-tasks              # Create a scheduled task
GET    /v1/scheduled-tasks              # List all tasks
GET    /v1/scheduled-tasks/:taskId      # Get task details
PUT    /v1/scheduled-tasks/:taskId      # Update task
PATCH  /v1/scheduled-tasks/:taskId/pause   # Pause task
PATCH  /v1/scheduled-tasks/:taskId/resume  # Resume task
DELETE /v1/scheduled-tasks/:taskId      # Delete task
GET    /v1/scheduled-tasks/:taskId/executions  # Get execution history

Quick Start

Creating a Daily Scraping Task

curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Tech News",
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
    "task_type": "scrape",
    "task_payload": {
      "url": "https://news.ycombinator.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 1
  }'

Response

{
  "success": true,
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "next_execution_at": "2026-01-28T01:00:00.000Z"
  }
}

Cron Expression Guide

Cron expressions use 5 fields to define the schedule:

* * * * *
│ │ │ │ │
│ │ │ │ └─ Day of week (0-7, 0 and 7 = Sunday)
│ │ │ └─── Month (1-12)
│ │ └───── Day of month (1-31)
│ └─────── Hour (0-23)
└───────── Minute (0-59)

Common Examples

Expression     Description               Execution Time
0 9 * * *      Every day at 9:00 AM      09:00:00 daily
*/15 * * * *   Every 15 minutes          :00, :15, :30, :45
0 */6 * * *    Every 6 hours             00:00, 06:00, 12:00, 18:00
0 9 * * 1      Every Monday at 9 AM      Monday 09:00:00
0 0 1 * *      First day of month        1st day, 00:00:00
30 2 * * 0     Every Sunday at 2:30 AM   Sunday 02:30:00

Use crontab.guru to validate and understand cron expressions.
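
For example, a request-body fragment for a task that should run at 9 AM New York time on weekdays only (assuming standard range syntax such as 1-5 for Monday through Friday is accepted):

{
  "cron_expression": "0 9 * * 1-5",
  "timezone": "America/New_York"
}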

Request Parameters

Task Configuration

Parameter         Type     Required   Default   Description
name              string   Yes        -         Task name (1-255 characters)
description       string   No         -         Task description
cron_expression   string   Yes        -         Standard cron expression (5 fields)
timezone          string   No         "UTC"     Timezone for execution (e.g., "Asia/Shanghai")
task_type         string   Yes        -         Task type: "scrape", "crawl", or "search"
task_payload      object   Yes        -         Task configuration (see below)

Concurrency Control

Parameter                   Type     Required   Default   Description
concurrency_mode            string   No         "skip"    Mode: "skip", "queue", or "replace"
max_concurrent_executions   number   No         1         Maximum concurrent executions
max_executions_per_day      number   No         -         Daily execution limit

Resource Management

Parameter                   Type      Required   Default   Description
min_credits_required        number    No         100       Minimum credits needed to execute
auto_pause_on_low_credits   boolean   No         true      Auto-pause when credits are insufficient

Webhook Integration

Parameter     Type       Required   Default   Description
webhook_ids   string[]   No         -         Array of webhook subscription UUIDs to trigger
webhook_url   string     No         -         Direct webhook URL (creates an implicit subscription)

You can either use webhook_ids to reference existing webhook subscriptions, or provide a webhook_url to create an implicit webhook subscription for this task.

Metadata

Parameter   Type       Required   Default   Description
tags        string[]   No         -         Tags for organization
metadata    object     No         -         Custom metadata
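
Putting several of these options together, a create request might look like the following sketch (the product URL, webhook endpoint, tags, and metadata values are placeholders):

curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Hourly Price Tracker",
    "cron_expression": "0 * * * *",
    "task_type": "scrape",
    "task_payload": {
      "url": "https://shop.example.com/product/12345",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 24,
    "min_credits_required": 100,
    "auto_pause_on_low_credits": true,
    "webhook_url": "https://your-domain.com/webhook",
    "tags": ["pricing", "hourly"],
    "metadata": { "team": "growth" }
  }'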

Task Payload

Scrape Task

{
  "url": "https://example.com/page",
  "engine": "cheerio",
  "formats": ["markdown"],
  "timeout": 60000,
  "wait_for": 2000,
  "include_tags": ["article", "main"],
  "exclude_tags": ["nav", "footer"]
}

Crawl Task

{
  "url": "https://example.com",
  "engine": "playwright",
  "options": {
    "max_depth": 3,
    "limit": 50,
    "strategy": "same-domain",
    "exclude_paths": ["/admin/*", "/api/*"],
    "scrape_options": {
      "formats": ["markdown"]
    }
  }
}

Search Task

{
  "query": "artificial intelligence news",
  "engine": "google",
  "limit": 20,
  "country": "US",
  "language": "en",
  "time_range": "day"
}

Concurrency Modes

skip

If a previous execution is still running, skip the current execution.

Best for: Tasks that may take longer than the interval between executions.

Example: A crawl job scheduled every hour that sometimes takes 90 minutes.

queue

Queue the new execution and wait for the previous one to complete.

Best for: Tasks where every execution must happen without skipping.

Example: Critical data collection that must not miss any scheduled runs.

replace

Stop the current execution and immediately start the new one.

Best for: Tasks where only the latest data matters.

Example: Real-time price monitoring where older data is irrelevant.

Managing Tasks

List All Tasks

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Daily Tech News",
      "task_type": "scrape",
      "cron_expression": "0 9 * * *",
      "timezone": "Asia/Shanghai",
      "is_active": true,
      "is_paused": false,
      "next_execution_at": "2026-01-28T01:00:00.000Z",
      "total_executions": 45,
      "successful_executions": 43,
      "failed_executions": 2,
      "consecutive_failures": 0,
      "last_execution_at": "2026-01-27T01:00:00.000Z",
      "created_at": "2026-01-01T00:00:00.000Z"
    }
  ]
}
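
If jq is available, you can filter this list for tasks that need attention, such as paused tasks or tasks with recent failures (a sketch that assumes the response shape shown above):

curl -s -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  | jq '.data[] | select(.is_paused or .consecutive_failures > 0)
        | {name, is_paused, consecutive_failures}'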

Pause a Task

Pause a task to temporarily stop its execution. The reason parameter will be stored as pause_reason in the task record.

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/pause" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Maintenance period"
  }'

Response

{
  "success": true,
  "message": "Task paused successfully",
  "data": {
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "is_paused": true,
    "pause_reason": "Maintenance period"
  }
}

Resume a Task

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"

Update a Task

curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "cron_expression": "0 10 * * *",
    "description": "Updated description"
  }'

Only include fields you want to update. All other fields will remain unchanged.

Delete a Task

curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>"

Deleting a task also deletes all its execution history.

Execution History

Get Executions

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=20" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "exec-uuid-1",
      "scheduled_task_uuid": "task-uuid",
      "execution_number": 45,
      "status": "completed",
      "started_at": "2026-01-27T01:00:00.000Z",
      "completed_at": "2026-01-27T01:02:15.000Z",
      "duration_ms": 135000,
      "job_uuid": "job-uuid-1",
      "credits_used": 5,
      "items_processed": 1,
      "items_succeeded": 1,
      "items_failed": 0
    }
  ]
}
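
To summarize recent executions, for example how many completed and how many credits they consumed, you can pipe the response through jq (a sketch that assumes the response shape shown above):

curl -s -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=100" \
  -H "Authorization: Bearer <your-api-key>" \
  | jq '{
      total: (.data | length),
      completed: ([.data[] | select(.status == "completed")] | length),
      credits_used: ([.data[] | .credits_used] | add)
    }'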

Automatic Failure Handling

Auto-Pause on Consecutive Failures

Tasks are automatically paused after 5 consecutive failures to prevent resource waste and excessive API calls.

Resume: Manually resume the task after fixing the underlying issue.

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"

Monitoring Failures

Track failures using the consecutive_failures field in the task status:

{
  "consecutive_failures": 3,
  "failed_executions": 5,
  "successful_executions": 40
}

Monitor the consecutive_failures field. High values indicate recurring issues that need attention.
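
A simple periodic check can warn you before a task reaches the auto-pause threshold. The sketch below assumes the task detail endpoint wraps the task object under data, as the list endpoint does:

#!/usr/bin/env bash
# Warn when a task is approaching the auto-pause threshold of 5 consecutive failures.
TASK_ID="550e8400-e29b-41d4-a716-446655440000"
API_KEY="<your-api-key>"

FAILURES=$(curl -s "https://api.anycrawl.dev/v1/scheduled-tasks/$TASK_ID" \
  -H "Authorization: Bearer $API_KEY" | jq -r '.data.consecutive_failures')

if [ "$FAILURES" -ge 3 ]; then
  echo "Task $TASK_ID has $FAILURES consecutive failures - check execution history"
fi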

Integration with Webhooks

Subscribe to task events using webhooks:

curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Task Notifications",
    "webhook_url": "https://your-domain.com/webhook",
    "event_types": ["task.executed", "task.failed", "task.paused"],
    "scope": "all"
  }'

See Webhooks for more details.

Common Use Cases

Price Monitoring

Monitor product prices every hour:

{
  "name": "Hourly Price Tracker",
  "cron_expression": "0 * * * *",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://shop.example.com/product/12345",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "concurrency_mode": "skip"
}

Weekly Documentation Backup

Crawl and archive documentation weekly:

{
  "name": "Weekly Docs Backup",
  "cron_expression": "0 3 * * 0",
  "timezone": "UTC",
  "task_type": "crawl",
  "task_payload": {
    "url": "https://docs.example.com",
    "engine": "playwright",
    "options": {
      "max_depth": 10,
      "limit": 500
    }
  },
  "max_executions_per_day": 1
}

Daily News Aggregation

Scrape multiple news sources daily:

{
  "name": "Morning News Digest",
  "cron_expression": "0 6 * * *",
  "timezone": "America/New_York",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://news.example.com",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "max_executions_per_day": 1
}

Competitive Intelligence

Track competitor mentions and industry trends hourly:

{
  "name": "Competitor Monitoring",
  "cron_expression": "0 * * * *",
  "task_type": "search",
  "task_payload": {
    "query": "YourCompany OR CompetitorA OR CompetitorB",
    "engine": "google",
    "limit": 50,
    "time_range": "hour"
  },
  "concurrency_mode": "skip"
}

Best Practices

1. Choose Appropriate Cron Intervals

  • Avoid overly frequent schedules (e.g., every minute) to prevent rate limiting
  • Consider website update frequency when setting intervals
  • Use timezone settings for location-specific timing

2. Set Concurrency Mode Correctly

  • Use "skip" for most tasks to avoid overlapping executions
  • Use "queue" only when every execution is critical
  • Use "replace" sparingly, only for real-time data needs

3. Manage Credits Wisely

  • Set min_credits_required based on typical job costs (see the example after this list)
  • Enable auto_pause_on_low_credits to prevent unexpected charges
  • Monitor credit usage in execution history
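
For example, to tighten the credit guardrails on an existing task without touching its schedule (the values are illustrative):

curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "min_credits_required": 200,
    "auto_pause_on_low_credits": true,
    "max_executions_per_day": 12
  }'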

4. Monitor Task Health

  • Check consecutive_failures regularly
  • Review execution history for patterns
  • Set up webhook notifications for failures

5. Use Descriptive Names and Tags

  • Use clear, descriptive task names
  • Add tags for easy filtering and organization
  • Include relevant metadata for tracking

Limitations

Item                            Limit
Task name length                1-255 characters
Minimum interval                1 minute
Maximum concurrent executions   1-50
Task payload size               100 KB
Cron expression format          Standard 5-field format

Troubleshooting

Task Not Executing

Check:

  • Is the task active? (is_active: true)
  • Is the task paused? (is_paused: false)
  • Is the cron expression valid?
  • Are there sufficient credits?

High Failure Rate

Possible causes:

  • Target website blocking requests
  • Invalid URLs in task payload
  • Insufficient timeout values
  • Network connectivity issues

Solutions:

  • Add delays with wait_for parameter
  • Use a different engine (try Playwright for dynamic sites; see the example after this list)
  • Increase timeout values
  • Check execution history for error messages
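
For example, a sketch of switching a failing scrape task to Playwright with a longer timeout and a render delay (the full task_payload is included, since the update may replace the existing payload rather than merge with it):

curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "task_payload": {
      "url": "https://example.com/page",
      "engine": "playwright",
      "formats": ["markdown"],
      "timeout": 120000,
      "wait_for": 5000
    }
  }'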

Credits Depleting Too Fast

Solutions:

  • Reduce execution frequency
  • Set max_executions_per_day limits
  • Use min_credits_required threshold
  • Enable auto_pause_on_low_credits