AnyCrawl

Templates

Reusable scraping, crawling, and search recipes with variables and custom logic.

Introduction

Templates are reusable configurations for scraping, crawling, or searching. Instead of repeating the same options in every API call, you define them once in a template (or obtain one from the AnyCrawl Template Store) and reference it by template_id.

Benefits:

  • Simplicity: Call APIs with just template_id + minimal inputs
  • Consistency: Standardize behavior across your team or projects
  • Safety: Templates can restrict allowed domains and expose only necessary variables
  • Power: Optional custom handlers for advanced transformations

Supported types:

  • scrape: single-page extraction via /v1/scrape
  • crawl: multi-page crawling via /v1/crawl
  • search: search engine results via /v1/search

Template Marketplace

Browse ready-to-use templates at anycrawl.dev/template.

How to use:

  1. Browse the marketplace and find a template that fits your needs
  2. Copy the template_id from the template detail page
  3. Call the API with that template_id and required inputs

Example:

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "content-extractor",
    "url": "https://example.com"
  }'

Using Templates in API Calls

Request Parameters

When a request includes template_id, the body accepts only the following top-level fields:

Endpoint      Required field   Optional fields
/v1/scrape    template_id      url, variables
/v1/crawl     template_id      url, variables
/v1/search    template_id      query, variables

Important notes:

  • url or query may be optional if the template predefines them. Check the template description.
  • variables passes dynamic inputs the template expects (see Variables section below).
  • Other fields (like engine, formats, timeout, etc.) come from the template and cannot be overridden.
  • Providing disallowed fields returns a 400 validation error.
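
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any AnyCrawl SDK: the helper name findDisallowedFields and the lookup table are assumptions made for illustration, and the API remains the authority on what it accepts.

```javascript
// Fields accepted alongside template_id, taken from the table above.
const ALLOWED_FIELDS = {
    scrape: ["template_id", "url", "variables"],
    crawl: ["template_id", "url", "variables"],
    search: ["template_id", "query", "variables"],
};

// Hypothetical helper: returns the names of fields the API would reject.
function findDisallowedFields(type, body) {
    const allowed = ALLOWED_FIELDS[type];
    if (!allowed) {
        throw new Error(`Unknown template type: ${type}`);
    }
    return Object.keys(body).filter((key) => !allowed.includes(key));
}

const invalid = findDisallowedFields("scrape", {
    template_id: "my-scrape-template",
    url: "https://example.com",
    engine: "any-value", // placeholder value, only the field name matters here
    formats: ["markdown"],
});
console.log(invalid); // [ 'engine', 'formats' ]
```

An empty array means the body only uses fields that the table permits for that endpoint.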

Scrape with a Template

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "my-scrape-template",
    "url": "https://example.com",
    "variables": { "category": "tech" }
  }'

Crawl with a Template

curl -X POST "https://api.anycrawl.dev/v1/crawl" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "my-crawl-template",
    "url": "https://docs.example.com",
    "variables": { "maxPages": 50 }
  }'

Search with a Template

curl -X POST "https://api.anycrawl.dev/v1/search" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "my-search-template",
    "query": "machine learning tutorials",
    "variables": { "lang": "en" }
  }'
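
The three curl calls above differ only in the endpoint and the input field, so they can share one helper. This is a sketch under assumptions: buildTemplateRequest and callTemplate are hypothetical names, and the fetchImpl parameter exists only so the helper can be exercised without network access. The endpoint paths, headers, and body fields come from the examples above.

```javascript
const API_BASE = "https://api.anycrawl.dev";

// Build the fetch() arguments for a template call.
// `type` is "scrape", "crawl", or "search"; `input` carries url/query and variables.
function buildTemplateRequest(apiKey, type, templateId, input = {}) {
    return {
        url: `${API_BASE}/v1/${type}`,
        init: {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiKey}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({ template_id: templateId, ...input }),
        },
    };
}

// Send the request and parse the JSON response.
// `fetchImpl` defaults to the global fetch (Node 18+ and browsers).
async function callTemplate(apiKey, type, templateId, input, fetchImpl = fetch) {
    const { url, init } = buildTemplateRequest(apiKey, type, templateId, input);
    const response = await fetchImpl(url, init);
    return response.json();
}

// Inspect the request that would be sent for the search example.
const request = buildTemplateRequest("<YOUR_API_KEY>", "search", "my-search-template", {
    query: "machine learning tutorials",
    variables: { lang: "en" },
});
console.log(request.url); // https://api.anycrawl.dev/v1/search
```

To send the request for real, call callTemplate with the same arguments and handle the returned promise.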

Variables

Templates can declare variables to accept dynamic inputs at call time.

  • Each variable has a type: string, number, boolean, or url
  • Variables are either required, or optional with a defaultValue
  • Check the template description to see what variables it expects

Example request with variables:

{
    "template_id": "blog-scraper",
    "url": "https://example.com/blog/post-123",
    "variables": {
        "author": "john-doe",
        "includeComments": true,
        "maxComments": 50
    }
}

If you omit a required variable or provide the wrong type, you'll get a 400 validation error.
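
To make the type rules concrete, the sketch below resolves caller-supplied variables against a list of declarations. The declaration shape ({ name, type, required, defaultValue }) is an assumption for illustration, as are the helper names; the four types and the required/defaultValue behavior are the ones described above. The server-side validation may differ in detail.

```javascript
// Check a value against one of the four variable types.
function isValidType(type, value) {
    switch (type) {
        case "string":
            return typeof value === "string";
        case "number":
            return typeof value === "number" && Number.isFinite(value);
        case "boolean":
            return typeof value === "boolean";
        case "url":
            if (typeof value !== "string") return false;
            try {
                new URL(value); // throws on strings that are not absolute URLs
                return true;
            } catch {
                return false;
            }
        default:
            return false;
    }
}

// Returns { values, errors }: the resolved variables plus a list of problems.
function resolveVariables(declarations, provided = {}) {
    const values = {};
    const errors = [];
    for (const decl of declarations) {
        const value = provided[decl.name] ?? decl.defaultValue;
        if (value === undefined) {
            if (decl.required) errors.push(`Missing required variable: ${decl.name}`);
            continue;
        }
        if (!isValidType(decl.type, value)) {
            errors.push(`Variable ${decl.name} must be of type ${decl.type}`);
            continue;
        }
        values[decl.name] = value;
    }
    return { values, errors };
}

// Hypothetical declarations for the blog-scraper example.
const declarations = [
    { name: "author", type: "string", required: true },
    { name: "includeComments", type: "boolean", required: false, defaultValue: false },
    { name: "maxComments", type: "number", required: false, defaultValue: 20 },
];
const result = resolveVariables(declarations, { author: "john-doe", maxComments: "50" });
console.log(result.errors); // [ 'Variable maxComments must be of type number' ]
```

Here maxComments is passed as the string "50" rather than the number 50, which is the kind of mismatch that leads to a 400 response.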

Response Format

Template responses follow the same format as standard API calls:

{
    "success": true,
    "data": {
        "url": "https://example.com",
        "markdown": "# Page Title\n\nContent...",
        "metadata": { ... },
        // Additional fields from custom handlers (if any)
        "extractedData": { ... }
    }
}

Templates with custom handlers may add extra fields to the response.

Error Handling

Common errors when using templates:

Error                          HTTP status   Description
Template not found             404           template_id doesn't exist or you lack access
Validation error               400           Missing required variables or wrong types
Domain restriction violation   403           URL not allowed by template's domain policy
Invalid fields                 400           Extra top-level fields not permitted with templates
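
A caller will usually branch on the HTTP status. The sketch below maps each status from the table to a suggested next step; the function name and the wording of the suggestions are illustrative assumptions, not part of the API.

```javascript
// Map an HTTP status from a template call to a suggested action.
function describeTemplateError(status) {
    switch (status) {
        case 400:
            return "Fix the request: check required variables, their types, and extra top-level fields.";
        case 403:
            return "Use a URL that the template's domain policy allows.";
        case 404:
            return "Check the template_id and your access to the template.";
        default:
            return `Unexpected status ${status}; inspect the response body.`;
    }
}

console.log(describeTemplateError(404));
```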

Example error response:

{
    "success": false,
    "error": "Validation error",
    "message": "When using template_id, only template_id, url, variables are allowed. Invalid fields: engine, formats",
    "data": {
        "type": "validation_error",
        "issues": [
            {
                "field": "engine",
                "message": "Field 'engine' is not allowed when using template_id",
                "code": "invalid_field"
            }
        ],
        "status": "failed"
    }
}
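
The issues array in the error response above is convenient for logging. A minimal sketch, assuming error bodies follow the shape of that example (summarizeError is a hypothetical helper name):

```javascript
// Turn an error response body into a readable multi-line summary.
// Returns null when the body does not describe a failure.
function summarizeError(body) {
    if (body.success !== false) return null;
    const issues = body.data?.issues ?? [];
    const lines = issues.map(
        (issue) => `${issue.field}: ${issue.message} (${issue.code})`
    );
    return [body.message ?? body.error, ...lines].join("\n");
}

const summary = summarizeError({
    success: false,
    error: "Validation error",
    message: "When using template_id, only template_id, url, variables are allowed. Invalid fields: engine, formats",
    data: {
        type: "validation_error",
        issues: [
            {
                field: "engine",
                message: "Field 'engine' is not allowed when using template_id",
                code: "invalid_field",
            },
        ],
        status: "failed",
    },
});
console.log(summary);
```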

Best Practices

For API Callers

  • Always check the template description for required variables and allowed domains
  • Use marketplace templates when available to save time
  • Handle 404 errors (template may have been deleted or archived)
  • Don't try to override template settings (engine, formats, etc.); the request is rejected with a 400 validation error

For Template Authors

  • Keep templates focused on a single use case
  • Document all variables clearly with descriptions
  • Use domain restrictions to prevent misuse
  • Set appropriate pricing based on complexity
  • Test templates thoroughly before publishing

Creating Templates (Advanced)

If you're creating your own templates, you can configure:

Domain Restrictions

Limit where your template can be used:

{
    "allowedDomains": {
        "type": "glob",
        "patterns": ["*.example.com", "docs.mysite.com"]
    }
}

  • type: "exact" (exact match) or "glob" (pattern matching)
  • patterns: array of allowed domains or patterns
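
The sketch below shows one way such a policy could be evaluated against a request URL. The glob semantics are an assumption: here "*" matches any run of characters and everything else is literal, so "*.example.com" matches subdomains but not example.com itself. AnyCrawl's actual matching rules may differ, and the function names are hypothetical.

```javascript
// Convert a glob pattern to a RegExp.
// Assumption: "*" matches any run of characters; all other characters are literal.
function globToRegExp(pattern) {
    const escaped = pattern
        .split("*")
        .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*");
    return new RegExp(`^${escaped}$`, "i");
}

// Check a URL's hostname against an allowedDomains policy.
function isUrlAllowed(allowedDomains, url) {
    const host = new URL(url).hostname.toLowerCase();
    if (allowedDomains.type === "exact") {
        return allowedDomains.patterns.some((p) => p.toLowerCase() === host);
    }
    if (allowedDomains.type === "glob") {
        return allowedDomains.patterns.some((p) => globToRegExp(p).test(host));
    }
    throw new Error(`Unknown allowedDomains type: ${allowedDomains.type}`);
}

const policy = { type: "glob", patterns: ["*.example.com", "docs.mysite.com"] };
console.log(isUrlAllowed(policy, "https://api.example.com/users")); // true
console.log(isUrlAllowed(policy, "https://docs.mysite.com/intro")); // true
console.log(isUrlAllowed(policy, "https://evil.com/"));             // false
```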

Pricing

Set credit cost per call:

{
    "pricing": {
        "perCall": 10,
        "currency": "credits"
    }
}
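
With a flat per-call price, a rough budget is just the price multiplied by the number of calls. The sketch below assumes every call is billed at perCall credits; whether failed calls are charged is not covered here, and estimateCredits is a hypothetical helper.

```javascript
// Estimate the credits consumed by a number of template calls.
// Assumption: each call costs exactly pricing.perCall credits.
function estimateCredits(pricing, calls) {
    if (!Number.isInteger(calls) || calls < 0) {
        throw new Error("calls must be a non-negative integer");
    }
    return pricing.perCall * calls;
}

console.log(estimateCredits({ perCall: 10, currency: "credits" }, 250)); // 2500
```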

Custom Handlers

Write JavaScript/TypeScript code to:

  • requestHandler: Post-process scrape results and add custom fields
  • failedRequestHandler: Handle failures with custom retry logic
  • queryTransform (search only): Transform queries before searching
  • urlTransform (scrape/crawl only): Transform URLs before processing

Both transforms support:

  • Template mode with placeholders ({{query}} for queryTransform, {{url}} for urlTransform)
  • Append mode with prefix and suffix
  • Optional regexExtract to pre-extract a substring before applying the mode

Example regex extraction for TikTok profiles:

{
    "customHandlers": {
        "urlTransform": {
            "enabled": true,
            "mode": "append",
            "prefix": "",
            "suffix": "",
            "regexExtract": {
                "pattern": "^(https?:\\/\\/www\\.tiktok\\.com\\/@[^\\/?#]+)",
                "flags": "i",
                "group": 1
            }
        }
    }
}

This extracts https://www.tiktok.com/@piperrockelle from inputs like:

  • https://www.tiktok.com/@piperrockelle?abb=ccc
  • https://www.tiktok.com/@piperrockelle
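
The extraction can be reproduced locally with a plain RegExp built from the same pattern, flags, and group. This is a sketch: the extract function is hypothetical, and passing the input through unchanged when nothing matches is an assumption about the fallback behavior.

```javascript
// The regexExtract settings from the configuration above.
const regexExtract = {
    pattern: "^(https?:\\/\\/www\\.tiktok\\.com\\/@[^\\/?#]+)",
    flags: "i",
    group: 1,
};

// Return the configured capture group, or the input when nothing matches.
function extract(input, { pattern, flags, group }) {
    const match = new RegExp(pattern, flags).exec(input);
    // Assumption: unmatched inputs pass through unchanged.
    return match ? match[group] : input;
}

console.log(extract("https://www.tiktok.com/@piperrockelle?abb=ccc", regexExtract));
// https://www.tiktok.com/@piperrockelle
```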

Example requestHandler:

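// Handler body: `context` is supplied by the template runtime.
const title = context.data.title;
// Fall back to an empty string when the page produced no markdown.
const content = context.data.markdown ?? "";

// Placeholder metric (character count); replace with your own logic.
const calculateMetric = (text) => text.length;

return {
    extractedTitle: title,
    // Count non-empty, whitespace-separated tokens.
    wordCount: content.split(/\s+/).filter(Boolean).length,
    customMetric: calculateMetric(content),
};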

Security Model

  • Non-trusted templates: Run in a hardened VM sandbox with strict limitations
  • Trusted templates: Can use async functions with controlled browser page access

Only templates reviewed and approved by AnyCrawl can be marked as trusted.

FAQ

Can I override template settings like engine or formats?

No. Templates are designed to be immutable configurations. You can only provide url/query and variables.

What happens if I use a template from the marketplace?

Marketplace templates are publicly available. You pay the credits defined by the template author.

Can templates see my API key?

No. Templates run in isolated sandboxes and have no access to your credentials.

How do I create my own templates?

Visit the AnyCrawl playground to create and test templates. Once published, they can be used via API.
