Scheduled Tasks
Automate recurring scraping, crawling, and search operations with cron-based scheduling.
Introduction
Scheduled Tasks allow you to automate recurring web scraping, crawling, and data collection operations using standard cron expressions. Perfect for monitoring websites, tracking price changes, archiving content, aggregating search results, or any periodic data collection needs.
Key Features: Cron-based scheduling with timezone support, multiple concurrency modes, automatic failure handling, credit management, and webhook integration for event notifications.
Core Features
- Cron Expression Scheduling: Use standard cron syntax to define execution schedules
- Timezone Support: Execute tasks in any timezone (e.g., Asia/Shanghai, America/New_York)
- Concurrency Control: Two modes - skip or queue concurrent executions
- Automatic Pause: Auto-pause after consecutive failures to protect resources
- Credit Management: Daily execution limits and estimated credit tracking
- Execution History: Track all executions with detailed status and metrics
- Webhook Integration: Receive real-time notifications for task events
API Endpoints
```text
POST /v1/scheduled-tasks # Create a scheduled task
GET /v1/scheduled-tasks # List all tasks
GET /v1/scheduled-tasks/:taskId # Get task details
PUT /v1/scheduled-tasks/:taskId # Update task
PATCH /v1/scheduled-tasks/:taskId/pause # Pause task
PATCH /v1/scheduled-tasks/:taskId/resume # Resume task
DELETE /v1/scheduled-tasks/:taskId # Delete task
GET /v1/scheduled-tasks/:taskId/executions # Get execution history
DELETE /v1/scheduled-tasks/:taskId/executions/:executionId # Cancel an execution
```
Quick Start
Creating a Daily Scraping Task
```bash
curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "Daily Tech News",
"description": "Scrape Hacker News daily at 9 AM",
"cron_expression": "0 9 * * *",
"timezone": "Asia/Shanghai",
"task_type": "scrape",
"task_payload": {
"url": "https://news.ycombinator.com",
"engine": "cheerio",
"formats": ["markdown"]
},
"concurrency_mode": "skip",
"max_executions_per_day": 1
}'
```
Response
```json
{
"success": true,
"data": {
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"next_execution_at": "2026-01-28T01:00:00.000Z"
}
}
```
Note that next_execution_at is returned in UTC: 9:00 AM in Asia/Shanghai (UTC+8) corresponds to 01:00 UTC.
Cron Expression Guide
Cron expressions use 5 fields to define the schedule:
```text
* * * * *
│ │ │ │ │
│ │ │ │ └─ Day of week (0-7, 0 and 7 = Sunday)
│ │ │ └─── Month (1-12)
│ │ └───── Day of month (1-31)
│ └─────── Hour (0-23)
└───────── Minute (0-59)
```
Common Examples
| Expression | Description | Execution Time |
|---|---|---|
| `0 9 * * *` | Every day at 9:00 AM | 09:00:00 daily |
| `*/15 * * * *` | Every 15 minutes | :00, :15, :30, :45 |
| `0 */6 * * *` | Every 6 hours | 00:00, 06:00, 12:00, 18:00 |
| `0 9 * * 1` | Every Monday at 9 AM | Monday 09:00:00 |
| `0 0 1 * *` | First day of month | 1st day, 00:00:00 |
| `30 2 * * 0` | Every Sunday at 2:30 AM | Sunday 02:30:00 |
Use crontab.guru to validate and understand cron expressions.
Request Parameters
Task Configuration
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Task name (1-255 characters) |
| description | string | No | - | Task description |
| cron_expression | string | Yes | - | Standard cron expression (5 fields) |
| timezone | string | No | "UTC" | Timezone for execution (e.g., "Asia/Shanghai") |
| task_type | string | Yes | - | Task type: "scrape", "crawl", "search", or "template" |
| task_payload | object | Yes | - | Task configuration (see below) |
Concurrency Control
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| concurrency_mode | string | No | "skip" | Mode: "skip" or "queue" |
| max_executions_per_day | number | No | - | Daily execution limit |
Execution Metadata (Read-only)
These fields are returned in task responses and are not accepted in create/update request bodies:
- min_credits_required: Estimated minimum credits for one execution (server-calculated)
- consecutive_failures: Consecutive failure count used for automatic pausing
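For example, an excerpt of a task-details response might show these fields alongside the others (a sketch; values are illustrative):
```json
{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "is_paused": false,
  "min_credits_required": 5,
  "consecutive_failures": 0
}
```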
Webhook Integration
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| webhook_ids | string[] | No | - | Array of webhook subscription UUIDs to trigger |
| webhook_url | string | No | - | Direct webhook URL (creates implicit subscription) |
You can either use webhook_ids to reference existing webhook subscriptions, or provide a webhook_url to create an implicit webhook subscription for this task.
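For example, a create request that attaches a direct webhook URL might look like the sketch below (the URL and schedule are placeholders):
```bash
curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Nightly Scrape with Notifications",
    "cron_expression": "0 2 * * *",
    "task_type": "scrape",
    "task_payload": {
      "url": "https://example.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "webhook_url": "https://your-domain.com/webhook"
  }'
```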
Metadata
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| tags | string[] | No | - | Tags for organization |
| metadata | object | No | - | Custom metadata |
Task Payload
Scrape Task
```json
{
"url": "https://example.com/page",
"engine": "cheerio",
"formats": ["markdown"],
"timeout": 60000,
"wait_for": 2000,
"include_tags": ["article", "main"],
"exclude_tags": ["nav", "footer"]
}
```
Crawl Task
```json
{
"url": "https://example.com",
"engine": "playwright",
"options": {
"max_depth": 3,
"limit": 50,
"strategy": "same-domain",
"exclude_paths": ["/admin/*", "/api/*"],
"scrape_options": {
"formats": ["markdown"]
}
}
}
```
Search Task
```json
{
"query": "artificial intelligence news",
"engine": "google",
"limit": 20,
"country": "US",
"lang": "en",
"timeRange": "day"
}
```
Template Task
```json
{
"template_id": "my-search-template",
"query": "machine learning tutorials",
"variables": {
"lang": "en"
}
}
```
For task_type: "template", task_payload must include:
- template_id (required): the template to execute
- the endpoint-specific input fields required by that template type (for example, url for scrape/crawl templates, query for search templates)
- optional variables to pass dynamic template inputs
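For instance, a scrape-type template task would pair the template_id with the url that template expects (the template_id below is hypothetical):
```json
{
  "template_id": "my-scrape-template",
  "url": "https://example.com/page",
  "variables": {
    "lang": "en"
  }
}
```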
Concurrency Modes
skip (Recommended)
If a previous execution is still running, skip the current execution.
Best for: Tasks that may take longer than the interval between executions.
Example: A crawl job scheduled every hour that sometimes takes 90 minutes.
queue
Queue the new execution and wait for the previous one to complete.
Best for: Tasks where every execution must happen without skipping.
Example: Critical data collection that must not miss any scheduled runs.
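As a sketch, a crawl that must never miss a run could combine "queue" mode with a daily cap (all values below are illustrative):
```json
{
  "name": "Hourly Inventory Crawl",
  "cron_expression": "0 * * * *",
  "task_type": "crawl",
  "task_payload": {
    "url": "https://example.com",
    "engine": "playwright",
    "options": { "limit": 100 }
  },
  "concurrency_mode": "queue",
  "max_executions_per_day": 24
}
```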
Managing Tasks
List All Tasks
```bash
curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
-H "Authorization: Bearer <your-api-key>"Response
{
"success": true,
"data": [
{
"uuid": "550e8400-e29b-41d4-a716-446655440000",
"name": "Daily Tech News",
"task_type": "scrape",
"cron_expression": "0 9 * * *",
"timezone": "Asia/Shanghai",
"is_active": true,
"is_paused": false,
"next_execution_at": "2026-01-28T01:00:00.000Z",
"total_executions": 45,
"successful_executions": 43,
"failed_executions": 2,
"consecutive_failures": 0,
"last_execution_at": "2026-01-27T01:00:00.000Z",
"created_at": "2026-01-01T00:00:00.000Z"
}
]
}
```
Pause a Task
Pause a task to temporarily stop its execution. The reason parameter will be stored as pause_reason in the task record.
```bash
curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/pause" \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"reason": "Maintenance period"
}'
```
Response
```json
{
"success": true,
"message": "Task paused successfully",
"data": {
"uuid": "550e8400-e29b-41d4-a716-446655440000",
"is_paused": true,
"pause_reason": "Maintenance period"
}
}
```
Resume a Task
```bash
curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
-H "Authorization: Bearer <your-api-key>"Update a Task
curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"cron_expression": "0 10 * * *",
"description": "Updated description"
}'
```
Response
```json
{
"success": true,
"data": {
"uuid": "550e8400-e29b-41d4-a716-446655440000",
"name": "Daily Tech News",
"description": "Updated description",
"task_type": "scrape",
"cron_expression": "0 10 * * *",
"timezone": "Asia/Shanghai",
"task_payload": {
"url": "https://news.ycombinator.com",
"engine": "cheerio",
"formats": ["markdown"]
},
"concurrency_mode": "skip",
"is_active": true,
"is_paused": false,
"next_execution_at": "2026-01-28T02:00:00.000Z",
"total_executions": 45,
"successful_executions": 43,
"failed_executions": 2,
"consecutive_failures": 0,
"last_execution_at": "2026-01-27T01:00:00.000Z",
"created_at": "2026-01-01T00:00:00.000Z",
"updated_at": "2026-01-27T12:00:00.000Z",
"icon": "FileText"
}
}
```
Only include fields you want to update. All other fields will remain unchanged.
Delete a Task
```bash
curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
-H "Authorization: Bearer <your-api-key>"Deleting a task also deletes all its execution history.
Execution History
Get Executions
```bash
curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=20" \
-H "Authorization: Bearer <your-api-key>"Response
{
"success": true,
"data": [
{
"uuid": "exec-uuid-1",
"scheduled_task_uuid": "task-uuid",
"execution_number": 45,
"status": "completed",
"started_at": "2026-01-27T01:00:00.000Z",
"completed_at": "2026-01-27T01:02:15.000Z",
"duration_ms": 135000,
"job_uuid": "job-uuid-1",
"triggered_by": "scheduler",
"scheduled_for": "2026-01-27T01:00:00.000Z",
"error_message": null,
"credits_used": 5,
"items_processed": 1,
"items_succeeded": 1,
"items_failed": 0,
"job_status": "completed",
"job_success": true,
"icon": "CircleCheck"
}
]
}
```
Note: The credits_used, items_processed, items_succeeded, items_failed, job_status, and job_success fields are retrieved from the associated job record via JOIN. The duration_ms is calculated from started_at and completed_at.
Cancel an Execution
Cancel a single execution that is currently pending or running. This is useful when you need to stop a specific execution without pausing the entire task.
```bash
curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions/:executionId" \
-H "Authorization: Bearer <your-api-key>"Response
{
"success": true,
"message": "Execution cancelled successfully"
}
```
Only executions with status pending or running can be cancelled. Completed, failed, or already cancelled executions will return an error.
Automatic Failure Handling
Auto-Pause on Consecutive Failures
Tasks are automatically paused after 5 consecutive failures to prevent resource waste and excessive API calls.
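When this happens, the task details reflect the paused state; a sketch of the relevant fields (the pause_reason text is illustrative):
```json
{
  "is_paused": true,
  "pause_reason": "Auto-paused after 5 consecutive failures",
  "consecutive_failures": 5
}
```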
Resume: Manually resume the task after fixing the underlying issue.
```bash
curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
-H "Authorization: Bearer <your-api-key>"Monitoring Failures
Track failures using the consecutive_failures field in the task status:
```json
{
"consecutive_failures": 3,
"failed_executions": 5,
"successful_executions": 40
}
```
Monitor the consecutive_failures field. High values indicate recurring issues that need attention.
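One way to spot troubled tasks is to list all tasks and filter on this field, for example with jq (a local sketch; field names follow the List All Tasks response above):
```bash
curl -s -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  | jq '.data[] | select(.consecutive_failures > 0) | {name, consecutive_failures, failed_executions}'
```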
Integration with Webhooks
Subscribe to task events using webhooks:
```bash
curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "Task Notifications",
"webhook_url": "https://your-domain.com/webhook",
"event_types": ["task.executed", "task.failed", "task.paused"],
"scope": "all"
}'
```
See Webhooks for more details.
Common Use Cases
Price Monitoring
Monitor product prices every hour:
```json
{
"name": "Hourly Price Tracker",
"cron_expression": "0 * * * *",
"task_type": "scrape",
"task_payload": {
"url": "https://shop.example.com/product/12345",
"engine": "cheerio",
"formats": ["markdown"]
},
"concurrency_mode": "skip"
}
```
Weekly Documentation Backup
Crawl and archive documentation weekly:
```json
{
"name": "Weekly Docs Backup",
"cron_expression": "0 3 * * 0",
"timezone": "UTC",
"task_type": "crawl",
"task_payload": {
"url": "https://docs.example.com",
"engine": "playwright",
"options": {
"max_depth": 10,
"limit": 500
}
},
"max_executions_per_day": 1
}
```
Daily News Aggregation
Scrape multiple news sources daily:
```json
{
"name": "Morning News Digest",
"cron_expression": "0 6 * * *",
"timezone": "America/New_York",
"task_type": "scrape",
"task_payload": {
"url": "https://news.example.com",
"engine": "cheerio",
"formats": ["markdown"]
},
"max_executions_per_day": 1
}
```
Competitive Intelligence
Track competitor mentions and industry trends hourly:
```json
{
"name": "Competitor Monitoring",
"cron_expression": "0 * * * *",
"task_type": "search",
"task_payload": {
"query": "YourCompany OR CompetitorA OR CompetitorB",
"engine": "google",
"limit": 50,
"timeRange": "day"
},
"concurrency_mode": "skip"
}
```
Best Practices
1. Choose Appropriate Cron Intervals
- Avoid overly frequent schedules (< 1 minute) to prevent rate limiting
- Consider website update frequency when setting intervals
- Use timezone settings for location-specific timing
2. Set Concurrency Mode Correctly
- Use "skip" for most tasks to avoid overlapping executions
- Use "queue" only when every execution is critical
3. Manage Credits Wisely
- Set max_executions_per_day for expensive tasks
- Review min_credits_required in task details (server-calculated estimate)
- Monitor credit usage in execution history (an example follows)
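For example, you can total the credits consumed by recent executions with a quick jq one-liner (a sketch; credits_used comes from the execution history response shown earlier):
```bash
curl -s -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=20" \
  -H "Authorization: Bearer <your-api-key>" \
  | jq '[.data[].credits_used] | add'
```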
4. Monitor Task Health
- Check consecutive_failures regularly
- Review execution history for patterns
- Set up webhook notifications for failures
5. Use Descriptive Names and Tags
- Use clear, descriptive task names
- Add tags for easy filtering and organization
- Include relevant metadata for tracking
Limitations
| Item | Limit |
|---|---|
| Task name length | 1-255 characters |
| Minimum interval | 1 minute |
| Concurrency modes | skip, queue |
| Task payload size | 100KB |
| Cron expression format | Standard 5-field format |
Troubleshooting
Task Not Executing
Check:
- Is the task active? (is_active: true)
- Is the task paused? (is_paused: false)
- Is the cron expression valid?
- Are there sufficient credits?
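You can verify the first two checks by fetching the task details and picking out the relevant fields with jq (a sketch using the task-details endpoint listed under API Endpoints):
```bash
curl -s -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  | jq '.data | {is_active, is_paused, cron_expression, next_execution_at}'
```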
High Failure Rate
Possible causes:
- Target website blocking requests
- Invalid URLs in task payload
- Insufficient timeout values
- Network connectivity issues
Solutions:
- Add delays with the wait_for parameter
- Use different engines (try Playwright for dynamic sites)
- Increase timeout values
- Check execution history for error messages
Credits Depleting Too Fast
Solutions:
- Reduce execution frequency
- Set max_executions_per_day limits
- Use lower-cost payloads (fewer pages/results, lighter formats); see the sketch below
- Review execution history to identify expensive runs
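For instance, a leaner version of the Competitive Intelligence search above requests fewer results per run (illustrative values):
```json
{
  "task_type": "search",
  "task_payload": {
    "query": "YourCompany OR CompetitorA",
    "engine": "google",
    "limit": 10,
    "timeRange": "day"
  },
  "max_executions_per_day": 6
}
```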
Related Documentation
- Webhooks - Event notifications for scheduled tasks
- Scrape API - Scraping task configuration
- Crawl API - Crawling task configuration