---
# **🔥 Firecrawl**
**Turn websites into LLM-ready data.**
[**Firecrawl**](https://firecrawl.dev/?ref=github) is an API that scrapes, crawls, and extracts structured data from any website, powering AI agents and apps with real-time context from the web.
Looking for our MCP? Check out the repo [here](https://github.com/firecrawl/firecrawl-mcp-server).
*This repository is in development, and we're still integrating custom modules into the monorepo. It's not fully ready for self-hosted deployment yet, but you can run it locally.*
_Psst. Hey, you, join our stargazers :)_
---
## Why Firecrawl?
- **LLM-ready output**: Clean markdown, structured JSON, screenshots, HTML, and more
- **Industry-leading reliability**: >80% coverage on [benchmark evaluations](https://www.firecrawl.dev/blog/the-worlds-best-web-data-api-v25), outperforming every other provider tested
- **Handles the hard stuff**: Proxies, JavaScript rendering, and dynamic content that breaks other scrapers
- **Customization**: Exclude tags, crawl behind auth walls, set max depth, and more
- **Media parsing**: Automatic text extraction from PDFs, DOCX, and images
- **Actions**: Click, scroll, input, wait, and more before extracting
- **Batch processing**: Scrape thousands of URLs asynchronously
- **Change tracking**: Monitor website content changes over time
---
## Quick Start
Sign up at [firecrawl.dev](https://firecrawl.dev) to get your API key and start extracting data in seconds. Try the [playground](https://firecrawl.dev/playground) to test it out.
### Make Your First API Request
```bash
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
```
Response:
```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```
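Every endpoint returns plain JSON, so any HTTP client works. As a minimal sketch (Python standard library only), here is how you might unpack a response shaped like the sample above:

```python
import json

# A payload in the shape returned by /v2/scrape (mirrors the sample response above).
sample = '''{
  "success": true,
  "data": {
    "markdown": "# Example Domain\\nThis domain is for use in illustrative examples...",
    "metadata": {"title": "Example Domain", "sourceURL": "https://example.com"}
  }
}'''

payload = json.loads(sample)
if payload["success"]:
    doc = payload["data"]
    print(doc["metadata"]["title"])         # Example Domain
    print(doc["markdown"].splitlines()[0])  # # Example Domain
```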
---
## Feature Overview
| Feature | Description |
|---------|-------------|
| [**Scrape**](#scrape) | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| [**Search**](#search) | Search the web and get full page content from results |
| [**Map**](#map) | Discover all URLs on a website instantly |
| [**Crawl**](#crawling) | Scrape all URLs of a website with a single request |
| [**Agent**](#agent) | Automated data gathering: just describe what you need |
---
## Scrape
Convert any URL to clean markdown, HTML, or structured data.
```bash
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": ["markdown", "html"]
  }'
```
Response:
```json
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\nTurn websites into LLM-ready data...",
    "html": "...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
```
### Extract Structured Data (JSON Mode)
Extract structured data using a schema:
```python
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)
print(result.json)
```
```json
{"company_mission": "Turn websites into LLM-ready data", "is_open_source": true, "is_in_yc": true}
```
Or extract with just a prompt (no schema):
```python
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)
```
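If you prefer not to depend on Pydantic, the `formats` entry can also be built from a raw JSON Schema dict. This sketch hand-writes a schema matching the `CompanyInfo` model above (the field names mirror that example):

```python
# A hand-written JSON Schema equivalent to CompanyInfo.model_json_schema().
company_schema = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "is_in_yc": {"type": "boolean"},
    },
    "required": ["company_mission", "is_open_source", "is_in_yc"],
}

# Same shape the SDK sends for the json format.
formats = [{"type": "json", "schema": company_schema}]
# result = app.scrape('https://firecrawl.dev', formats=formats)
```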
### Scrape Formats
Available formats: `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `branding`
**Get a screenshot**
```python
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot) # Base64 encoded image
```
**Extract brand identity (colors, fonts, typography)**
```python
doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding) # {"colors": {...}, "fonts": [...], "typography": {...}}
```
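Since the screenshot comes back as a base64-encoded string, writing it to disk is a one-liner plus decoding. A small helper sketch (the data-URI stripping is a defensive assumption, not guaranteed to be needed):

```python
import base64

def save_screenshot(b64_data: str, path: str) -> None:
    """Decode a base64 screenshot string and write it to disk."""
    # Defensive: strip a "data:image/...;base64," prefix if one is present.
    if b64_data.startswith("data:") and "," in b64_data:
        b64_data = b64_data.split(",", 1)[1]
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# Usage with a scrape result:
# save_screenshot(doc.screenshot, "firecrawl.png")
```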
### Actions (Interact Before Scraping)
Click, type, scroll, and more before extracting:
```python
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "user@example.com"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"}
    ]
)
```
---
## Search
Search the web and optionally scrape the results.
```bash
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "firecrawl web scraping",
    "limit": 5
  }'
```
Response:
```json
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://www.firecrawl.dev/",
        "title": "Firecrawl - The Web Data API for AI",
        "description": "The web crawling, scraping, and search API for AI.",
        "position": 1
      }
    ],
    "images": [...],
    "news": [...]
  }
}
```
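Results come back grouped into `web`, `images`, and `news` arrays. As a small sketch, here is one way to flatten the web hits from a payload shaped like the sample above (the `images`/`news` arrays are left empty for illustration):

```python
# A response payload shaped like the sample above (images/news left empty here).
response = {
    "success": True,
    "data": {
        "web": [
            {
                "url": "https://www.firecrawl.dev/",
                "title": "Firecrawl - The Web Data API for AI",
                "description": "The web crawling, scraping, and search API for AI.",
                "position": 1,
            }
        ],
        "images": [],
        "news": [],
    },
}

# Order web hits by their search ranking and collect the URLs.
web_results = sorted(response["data"]["web"], key=lambda r: r["position"])
urls = [r["url"] for r in web_results]
print(urls)  # ['https://www.firecrawl.dev/']
```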
### Search with Content Scraping
Get the full content of search results:
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

results = firecrawl.search(
    "firecrawl web scraping",
    limit=3,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)
```
---
## Agent
**The easiest way to get data from the web.** Describe what you need, and our AI agent searches, navigates, and extracts it. No URLs required.
Agent is the evolution of our `/extract` endpoint: faster, more reliable, and doesn't require you to know the URLs upfront.
```bash
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Find the pricing plans for Notion"
  }'
```
Response:
```json
{
  "success": true,
  "data": {
    "result": "Notion offers the following pricing plans:\n1. Free - $0/month...\n2. Plus - $10/seat/month...\n3. Business - $18/seat/month...",
    "sources": ["https://www.notion.so/pricing"]
  }
}
```
### Agent with Structured Output
Use a schema to get structured data:
```python
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema
)
print(result.data)
```
```json
{
  "founders": [
    {"name": "Eric Ciarla", "role": "Co-founder"},
    {"name": "Nicolas Camara", "role": "Co-founder"},
    {"name": "Caleb Peffer", "role": "Co-founder"}
  ]
}
```
### Agent with URLs (Optional)
Focus the agent on specific pages:
```python
result = app.agent(
    urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
    prompt="Compare the features and pricing information"
)
```
### Model Selection
Choose between two models based on your needs:
| Model | Cost | Best For |
|-------|------|----------|
| `spark-1-mini` (default) | 60% cheaper | Most tasks |
| `spark-1-pro` | Standard | Complex research, critical extraction |
```python
result = app.agent(
    prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
    model="spark-1-pro"
)
```
**When to use Pro:**
- Comparing data across multiple websites
- Extracting from sites with complex navigation or auth
- Research tasks where the agent needs to explore multiple paths
- Critical data where accuracy is paramount
Learn more about Spark models in our Agent documentation.
---