JSON mode

Introduction

JSON mode enables you to extract structured data from any webpage using Large Language Models (LLMs).

AnyCrawl allows you to define a JSON schema and prompts to guide the extraction process, ensuring the output matches your desired format.

Usage Examples

JSON mode supports three structured extraction modes:

Prompt Only: Use a natural language prompt to describe what you want to extract. The LLM determines the output structure.
Schema Only: Define a JSON schema to strictly constrain the output structure. The LLM fills in the content.
Prompt + Schema: Combine both schema and prompt for precise structure and content guidance.

Tip: Set extract_source to control whether structured extraction reads from generated markdown or original html. Default is markdown; use html for cases where markdown loses important structure.

1. Prompt Only (Flexible Structure)

Steps:

Set the target webpage URL
Write a prompt describing the information to extract
Send the request

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "user_prompt": "Extract the company mission, open source status, and employee count."
    }
  }'

2. Schema Only (Strict Structure)

Steps:

Set the target webpage URL
Define a JSON schema for the desired fields and types
Send the request

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'

3. Prompt + Schema (Guided Structure & Content)

Steps:

Set the target webpage URL
Define a JSON schema for the output structure
Write a prompt to guide the extraction
Send the request

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      },
      "user_prompt": "Extract the company mission (string), open source status (boolean), and employee count (number)."
    }
  }'

JSON options object

The json_options parameter is an object that accepts the following parameters:

schema: The schema to use for the extraction.
user_prompt: The user prompt to use for the extraction.
schema_name: Optional name of the output that should be generated.
schema_description: Optional description of the output that should be generated.

JSON Schema

The schema to use for the extraction. Common fields include:

type: The type of the value, supported types show below.
properties: (for object) An object specifying the fields and their schemas.
required: (for object) An array of required field names.

Supported Types

Type	Description	Example
`string`	Text data	Company name, descriptions, titles
`number`	Numeric values	Prices, quantities, ratings
`boolean`	True/false values	Availability flags, feature indicators
`object`	Nested structures	Address details, product specifications
`array`	Lists of values or objects	Tags, categories, feature lists

Schema Example

{
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string"
        },
        "is_open_source": {
            "type": "object",
            "properties": {
                "answer": {
                    "type": "string"
                },
                "value": {
                    "type": "boolean"
                }
            }
        },
        "employee_count": {
            "type": "number"
        }
    },
    "required": ["company_name"]
}

JSON mode

On this page