Scheduled Tasks

Introduction

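Scheduled Tasks automate recurring web scraping, crawling, and data collection using standard cron expressions. They suit any periodic collection job, such as site monitoring, price tracking, content archiving, and search result aggregation.

Key capabilities: timezone-aware cron schedules, two concurrency modes, automatic failure handling, credit management, and webhook integration for event notifications.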

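Key Features

Cron schedules: define run times with standard cron syntax
Timezones: run in the timezone of your choice, such as Asia/Shanghai or America/New_York
Concurrency control: two modes, skip (drop a run that would overlap) or queue (wait for the previous run)
Auto-pause: tasks pause automatically after consecutive failures to protect resources
Credit management: daily execution limits and estimated credit tracking
Execution history: a full record of every run, including status and metrics
Webhook integration: real-time notifications for task events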

API Endpoints

POST   /v1/scheduled-tasks              # Create a scheduled task
GET    /v1/scheduled-tasks              # List tasks
GET    /v1/scheduled-tasks/:taskId      # Get task details
PUT    /v1/scheduled-tasks/:taskId      # Update a task
PATCH  /v1/scheduled-tasks/:taskId/pause   # Pause a task
PATCH  /v1/scheduled-tasks/:taskId/resume  # Resume a task
DELETE /v1/scheduled-tasks/:taskId      # Delete a task
GET    /v1/scheduled-tasks/:taskId/executions  # List execution history
DELETE /v1/scheduled-tasks/:taskId/executions/:executionId  # Cancel an execution

Quick Start

Create a Daily Scrape Task

curl -X POST "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Tech News",
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
    "task_type": "scrape",
    "task_payload": {
      "url": "https://news.ycombinator.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 1
  }'

Response

{
  "success": true,
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "next_execution_at": "2026-01-28T01:00:00.000Z"
  }
}
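
For readers calling the API from code rather than curl, the request above might be assembled as in the following sketch. It uses only the Python standard library; the helper name `build_create_task_request` and the constant `API_BASE` are illustrative names, not part of any official SDK. The request is built but not sent, so the snippet runs without network access or a real key.

```python
import json
import urllib.request

# Base URL copied from the curl example above.
API_BASE = "https://api.anycrawl.dev/v1"


def build_create_task_request(api_key: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scheduled-tasks request."""
    return urllib.request.Request(
        url=f"{API_BASE}/scheduled-tasks",
        data=json.dumps(task).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Same body as the curl example.
daily_news_task = {
    "name": "Daily Tech News",
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
    "task_type": "scrape",
    "task_payload": {
        "url": "https://news.ycombinator.com",
        "engine": "cheerio",
        "formats": ["markdown"],
    },
    "concurrency_mode": "skip",
    "max_executions_per_day": 1,
}

request = build_create_task_request("<your-api-key>", daily_news_task)

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["task_id"])
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any credits are spent.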

Cron Expression Guide

A cron expression defines the schedule with five fields.

* * * * *
│ │ │ │ │
│ │ │ │ └─ Day of week (0-7; 0 and 7 both mean Sunday)
│ │ │ └─── Month (1-12)
│ │ └───── Day of month (1-31)
│ └─────── Hour (0-23)
└───────── Minute (0-59)

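Common Examples

Expression	Description	Runs at
`0 9 * * *`	Every day at 9:00 AM	Daily at 09:00:00
`*/15 * * * *`	Every 15 minutes	:00, :15, :30, :45
`0 */6 * * *`	Every 6 hours	00:00, 06:00, 12:00, 18:00
`0 9 * * 1`	Every Monday at 9:00 AM	Mondays at 09:00:00
`0 0 1 * *`	First day of every month	1st of the month at 00:00:00
`30 2 * * 0`	Every Sunday at 2:30 AM	Sundays at 02:30:00

You can use crontab.guru to validate and understand cron expressions.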
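
To see what a five-field expression matches before submitting it, a small local check can help. The sketch below is a simplified illustration, not the service's own parser: it handles `*`, single numbers, ranges, comma lists, and `/step`, but not names such as `MON` or `JAN`. The function names `expand_field` and `expand_cron` are made up for this example.

```python
from typing import Dict, List

# (field name, lowest value, highest value), in cron field order.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),
]


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # "5/15" means "from 5 to the end of the range, every 15".
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expression: str) -> Dict[str, List[int]]:
    """Expand a five-field cron expression into the values each field matches."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    expanded = {}
    for field, (name, low, high) in zip(fields, FIELD_RANGES):
        values = expand_field(field, low, high)
        if name == "day_of_week":
            # 0 and 7 both mean Sunday, so fold 7 onto 0.
            values = sorted({value % 7 for value in values})
        expanded[name] = values
    return expanded


print(expand_cron("*/15 * * * *")["minute"])  # [0, 15, 30, 45]
print(expand_cron("0 */6 * * *")["hour"])     # [0, 6, 12, 18]
```

An expression with the wrong number of fields raises `ValueError`, which catches a common copy-and-paste mistake early.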

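Request Parameters

Task Configuration

Parameter	Type	Required	Default	Description
`name`	string	Yes	-	Task name (1-255 characters)
`description`	string	No	-	Task description
`cron_expression`	string	Yes	-	Standard cron expression (5 fields)
`timezone`	string	No	`"UTC"`	Timezone the schedule runs in (e.g. "Asia/Shanghai")
`task_type`	string	Yes	-	`"scrape"`, `"crawl"`, `"search"`, or `"template"`
`task_payload`	object	Yes	-	Task configuration (see below)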
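
The cron expression is evaluated in the task's `timezone`, while timestamps such as `next_execution_at` in the example responses end in `Z`, meaning UTC. The following sketch shows the conversion for the Quick Start task (09:00 in Asia/Shanghai). It assumes a fixed UTC+8 offset, which is valid for Asia/Shanghai because that zone has no daylight saving time; for zones that do, the standard library's `zoneinfo.ZoneInfo` would be the more appropriate tool. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset is enough because Asia/Shanghai is UTC+8 all year.
SHANGHAI = timezone(timedelta(hours=8), name="Asia/Shanghai")


def local_run_to_utc(year: int, month: int, day: int, hour: int, minute: int, tz: timezone) -> datetime:
    """Interpret a wall-clock run time in `tz` and return the same instant in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


def to_api_timestamp(moment: datetime) -> str:
    """Format an aware datetime like the timestamps in the example responses."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


run = local_run_to_utc(2026, 1, 28, 9, 0, SHANGHAI)
print(to_api_timestamp(run))  # 2026-01-28T01:00:00.000Z
```

This matches the `next_execution_at` value shown for a `0 9 * * *` schedule in Asia/Shanghai.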

Concurrency Control

Parameter	Type	Required	Default	Description
`concurrency_mode`	string	No	`"skip"`	`"skip"` or `"queue"`
`max_executions_per_day`	number	No	-	Maximum number of executions per day

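Execution Metadata (Read-Only)

These fields appear only in responses and cannot be set in a create or update request body.

min_credits_required: estimated minimum credits needed for one execution (calculated by the server)
consecutive_failures: number of consecutive failures, used to trigger auto-pause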

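Webhook Integration

Parameter	Type	Required	Default	Description
`webhook_ids`	string[]	No	-	Array of webhook subscription UUIDs to trigger
`webhook_url`	string	No	-	Direct URL (creates an implicit subscription)

Reference existing webhooks with webhook_ids, or provide webhook_url to create an implicit subscription dedicated to this task.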

Metadata

Parameter	Type	Required	Default	Description
`tags`	string[]	No	-	Tags for categorization
`metadata`	object	No	-	User-defined metadata

Task Payload (task_payload)

Scrape Task

{
  "url": "https://example.com/page",
  "engine": "cheerio",
  "formats": ["markdown"],
  "timeout": 60000,
  "wait_for": 2000,
  "include_tags": ["article", "main"],
  "exclude_tags": ["nav", "footer"]
}

Crawl Task

{
  "url": "https://example.com",
  "engine": "playwright",
  "options": {
    "max_depth": 3,
    "limit": 50,
    "strategy": "same-domain",
    "exclude_paths": ["/admin/*", "/api/*"],
    "scrape_options": {
      "formats": ["markdown"]
    }
  }
}

Search Task

{
  "query": "artificial intelligence news",
  "engine": "google",
  "limit": 20,
  "country": "US",
  "lang": "en",
  "timeRange": "day"
}

Template Task

{
  "template_id": "my-search-template",
  "query": "machine learning tutorials",
  "variables": {
    "lang": "en"
  }
}

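When task_type is "template", task_payload must contain the following.

template_id (required): the template to run
The endpoint-specific inputs that the template type requires (for example, url for scrape and crawl templates, query for search templates)
Optional variables for passing dynamic inputs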
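
The requirements above can be checked on the client before a task is created. The following is a rough sketch only: `check_template_payload` and its `template_kind` argument are hypothetical names introduced for this example, and the server remains the authority on what a given template accepts.

```python
from typing import List

# Input each kind of template needs, as described in the list above.
REQUIRED_INPUT = {"scrape": "url", "crawl": "url", "search": "query"}


def check_template_payload(payload: dict, template_kind: str) -> List[str]:
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = []
    if not payload.get("template_id"):
        problems.append("template_id is required")

    required_input = REQUIRED_INPUT.get(template_kind)
    if required_input is None:
        problems.append(f"unknown template kind: {template_kind}")
    elif not payload.get(required_input):
        problems.append(f"{required_input} is required for {template_kind} templates")

    # variables is optional, but when present it should be a JSON object.
    if not isinstance(payload.get("variables", {}), dict):
        problems.append("variables must be an object")
    return problems


payload = {
    "template_id": "my-search-template",
    "query": "machine learning tutorials",
    "variables": {"lang": "en"},
}
print(check_template_payload(payload, "search"))  # []
```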

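Concurrency Modes

skip (recommended)

If the previous execution is still running, the new execution is skipped.

Best for: tasks whose run time can exceed the schedule interval.

Example: a crawl that runs every hour but occasionally takes 90 minutes.

queue

The new execution is queued and waits until the previous execution finishes.

Best for: tasks where no execution may be missed.

Example: critical data collection that must not miss a scheduled run.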
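
The difference between the two modes is easiest to see on a timeline. The sketch below is a toy model of the behavior described above, not the service's scheduler: `simulate` is a made-up function, times are plain minutes, and `durations[i]` is how long the run for tick `i` would take if it runs.

```python
from typing import List, Optional, Tuple


def simulate(mode: str, interval: int, durations: List[int]) -> List[Tuple[int, Optional[int]]]:
    """Return (scheduled_minute, start_minute) per tick; start is None when skipped."""
    if mode not in ("skip", "queue"):
        raise ValueError(f"unknown mode: {mode}")
    busy_until = 0  # minute at which the previous run finishes
    timeline = []
    for index, duration in enumerate(durations):
        scheduled = index * interval
        if scheduled >= busy_until:
            start = scheduled  # nothing is running: start on time
        elif mode == "skip":
            timeline.append((scheduled, None))  # overlap: drop this tick
            continue
        else:
            start = busy_until  # overlap: wait for the previous run to finish
        busy_until = start + duration
        timeline.append((scheduled, start))
    return timeline


# Hourly schedule whose first run takes 90 minutes.
print(simulate("skip", 60, [90, 30, 30]))   # [(0, 0), (60, None), (120, 120)]
print(simulate("queue", 60, [90, 30, 30]))  # [(0, 0), (60, 90), (120, 120)]
```

With skip, the 60-minute tick is dropped; with queue, it runs late at minute 90 and nothing is lost.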

Task Management

List All Tasks

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Daily Tech News",
      "task_type": "scrape",
      "cron_expression": "0 9 * * *",
      "timezone": "Asia/Shanghai",
      "is_active": true,
      "is_paused": false,
      "next_execution_at": "2026-01-28T01:00:00.000Z",
      "total_executions": 45,
      "successful_executions": 43,
      "failed_executions": 2,
      "consecutive_failures": 0,
      "last_execution_at": "2026-01-27T01:00:00.000Z",
      "created_at": "2026-01-01T00:00:00.000Z"
    }
  ]
}

Pause a Task

Temporarily stops executions. The reason is stored in the pause_reason field of the task record.

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/pause" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Maintenance period"
  }'

Response

{
  "success": true,
  "message": "Task paused successfully",
  "data": {
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "is_paused": true,
    "pause_reason": "Maintenance period"
  }
}

Resume a Task

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"

Update a Task

curl -X PUT "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "cron_expression": "0 10 * * *",
    "description": "Updated description"
  }'

Response

{
  "success": true,
  "data": {
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Daily Tech News",
    "description": "Updated description",
    "task_type": "scrape",
    "cron_expression": "0 10 * * *",
    "timezone": "Asia/Shanghai",
    "task_payload": {
      "url": "https://news.ycombinator.com",
      "engine": "cheerio",
      "formats": ["markdown"]
    },
    "concurrency_mode": "skip",
    "is_active": true,
    "is_paused": false,
    "next_execution_at": "2026-01-28T02:00:00.000Z",
    "total_executions": 45,
    "successful_executions": 43,
    "failed_executions": 2,
    "consecutive_failures": 0,
    "last_execution_at": "2026-01-27T01:00:00.000Z",
    "created_at": "2026-01-01T00:00:00.000Z",
    "updated_at": "2026-01-27T12:00:00.000Z",
    "icon": "FileText"
  }
}

Send only the fields you want to change. All other fields keep their current values.
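
Because only the submitted fields are changed, a client that holds both the current task and the desired settings can send just the difference. A minimal sketch, with `changed_fields` as a hypothetical helper name:

```python
import json


def changed_fields(current: dict, desired: dict) -> dict:
    """Return only the keys in `desired` that are new or differ from `current`."""
    return {
        key: value
        for key, value in desired.items()
        if key not in current or current[key] != value
    }


current_task = {
    "description": "Scrape Hacker News daily at 9 AM",
    "cron_expression": "0 9 * * *",
    "timezone": "Asia/Shanghai",
}
desired_task = {
    "description": "Updated description",
    "cron_expression": "0 10 * * *",
    "timezone": "Asia/Shanghai",
}

# This JSON string would be the body of the PUT request; timezone is left out
# because it did not change.
update_body = json.dumps(changed_fields(current_task, desired_task), sort_keys=True)
print(update_body)
# {"cron_expression": "0 10 * * *", "description": "Updated description"}
```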

Delete a Task

curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId" \
  -H "Authorization: Bearer <your-api-key>"

Deleting a task also deletes its entire execution history.

Execution History

List Executions

curl -X GET "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions?limit=20" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "data": [
    {
      "uuid": "exec-uuid-1",
      "scheduled_task_uuid": "task-uuid",
      "execution_number": 45,
      "status": "completed",
      "started_at": "2026-01-27T01:00:00.000Z",
      "completed_at": "2026-01-27T01:02:15.000Z",
      "duration_ms": 135000,
      "job_uuid": "job-uuid-1",
      "triggered_by": "scheduler",
      "scheduled_for": "2026-01-27T01:00:00.000Z",
      "error_message": null,
      "credits_used": 5,
      "items_processed": 1,
      "items_succeeded": 1,
      "items_failed": 0,
      "job_status": "completed",
      "job_success": true,
      "icon": "CircleCheck"
    }
  ]
}

Note: credits_used, items_processed, items_succeeded, items_failed, job_status, and job_success come from a JOIN with the linked job record. duration_ms is calculated from started_at and completed_at.
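
The duration calculation can be reproduced on the client, for example to cross-check a response. The sketch below assumes both timestamps use the millisecond UTC format shown in the example response; `duration_ms` here is a local helper that mirrors the field name.

```python
from datetime import datetime, timedelta

# Format of the timestamps in the example responses (UTC, millisecond precision).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def duration_ms(started_at: str, completed_at: str) -> int:
    """Milliseconds elapsed between two API timestamps."""
    started = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    completed = datetime.strptime(completed_at, TIMESTAMP_FORMAT)
    # Integer division by one millisecond avoids floating-point rounding.
    return (completed - started) // timedelta(milliseconds=1)


print(duration_ms("2026-01-27T01:00:00.000Z", "2026-01-27T01:02:15.000Z"))  # 135000
```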

Cancel a Single Execution

Cancels one pending or running execution. This is useful when you want to stop a specific run without stopping the whole task.

curl -X DELETE "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/executions/:executionId" \
  -H "Authorization: Bearer <your-api-key>"

Response

{
  "success": true,
  "message": "Execution cancelled successfully"
}

Only executions in the pending or running state can be cancelled. Executions that have completed, failed, or already been cancelled return an error.
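
A client can apply the same rule before sending a cancel request, which avoids a round trip that is bound to fail. The sketch below filters an execution list shaped like the history response; `cancellable_ids` is an illustrative helper name, and the sample records are invented.

```python
from typing import List

# The two states the text above says can be cancelled.
CANCELLABLE_STATUSES = {"pending", "running"}


def can_cancel(execution: dict) -> bool:
    """True when the execution is still pending or running."""
    return execution.get("status") in CANCELLABLE_STATUSES


def cancellable_ids(executions: List[dict]) -> List[str]:
    """UUIDs of the executions worth sending a cancel request for."""
    return [execution["uuid"] for execution in executions if can_cancel(execution)]


history = [
    {"uuid": "exec-uuid-1", "status": "completed"},
    {"uuid": "exec-uuid-2", "status": "running"},
    {"uuid": "exec-uuid-3", "status": "pending"},
]
print(cancellable_ids(history))  # ['exec-uuid-2', 'exec-uuid-3']
```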

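Automatic Failure Handling

Auto-Pause on Consecutive Failures

After 5 consecutive failures, the task is paused automatically to prevent wasted resources and excessive API calls.

To resume: fix the underlying cause, then resume the task manually.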

curl -X PATCH "https://api.anycrawl.dev/v1/scheduled-tasks/:taskId/resume" \
  -H "Authorization: Bearer <your-api-key>"
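
The auto-pause rule can be pictured as a counter. The sketch below is a simplified model of the behavior described in this section, not the service's implementation. It assumes that a successful run resets `consecutive_failures` to zero, which the word "consecutive" implies but the text does not state outright; `apply_result` is a hypothetical name.

```python
AUTO_PAUSE_THRESHOLD = 5  # from the text: pause after 5 consecutive failures


def apply_result(task: dict, succeeded: bool) -> dict:
    """Return a copy of the task counters updated with one execution result."""
    updated = dict(task)
    if succeeded:
        updated["successful_executions"] = updated.get("successful_executions", 0) + 1
        updated["consecutive_failures"] = 0  # assumption: success resets the streak
    else:
        updated["failed_executions"] = updated.get("failed_executions", 0) + 1
        updated["consecutive_failures"] = updated.get("consecutive_failures", 0) + 1
        if updated["consecutive_failures"] >= AUTO_PAUSE_THRESHOLD:
            updated["is_paused"] = True
    return updated


task = {"consecutive_failures": 0, "is_paused": False}
for _ in range(4):
    task = apply_result(task, succeeded=False)
print(task["is_paused"])  # False: four failures in a row are not enough

task = apply_result(task, succeeded=False)
print(task["is_paused"])  # True: the fifth consecutive failure pauses the task
```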

Monitoring Failures

Track failures with the consecutive_failures field in the task status.

{
  "consecutive_failures": 3,
  "failed_executions": 5,
  "successful_executions": 40
}

Monitor consecutive_failures. A high value indicates a recurring problem, and the task is paused once it reaches 5.

Webhook Integration

Subscribe to task events with webhooks.

curl -X POST "https://api.anycrawl.dev/v1/webhooks" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Task Notifications",
    "webhook_url": "https://your-domain.com/webhook",
    "event_types": ["task.executed", "task.failed", "task.paused"],
    "scope": "all"
  }'

See Webhooks for details.

Use Cases

Price Monitoring

Monitor a product price every hour:

{
  "name": "Hourly Price Tracker",
  "cron_expression": "0 * * * *",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://shop.example.com/product/12345",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "concurrency_mode": "skip"
}

Weekly Documentation Backup

Crawl and archive documentation every week:

{
  "name": "Weekly Docs Backup",
  "cron_expression": "0 3 * * 0",
  "timezone": "UTC",
  "task_type": "crawl",
  "task_payload": {
    "url": "https://docs.example.com",
    "engine": "playwright",
    "options": {
      "max_depth": 10,
      "limit": 500
    }
  },
  "max_executions_per_day": 1
}

Daily News Aggregation

Scrape a news source every day:

{
  "name": "Morning News Digest",
  "cron_expression": "0 6 * * *",
  "timezone": "America/New_York",
  "task_type": "scrape",
  "task_payload": {
    "url": "https://news.example.com",
    "engine": "cheerio",
    "formats": ["markdown"]
  },
  "max_executions_per_day": 1
}

Competitive Intelligence

Track competitors and industry trends every hour:

{
  "name": "Competitor Monitoring",
  "cron_expression": "0 * * * *",
  "task_type": "search",
  "task_payload": {
    "query": "YourCompany OR CompetitorA OR CompetitorB",
    "engine": "google",
    "limit": 50,
    "timeRange": "day"
  },
  "concurrency_mode": "skip"
}