browser-automation

Firecrawl

by mendableai

Unlock AI-ready web data with Firecrawl: scrape any website, handle dynamic content, and automate web scraping for research.

Unlock powerful web data extraction with Firecrawl, turning any website into clean markdown or structured data. Firecrawl lets you crawl all accessible pages, scrape content in multiple formats, and extract structured data using AI-driven prompts and schemas. Its advanced features handle dynamic content, proxies, anti-bot measures, and media parsing, ensuring reliable and customizable data output. Whether mapping site URLs or batch scraping thousands of pages asynchronously, Firecrawl streamlines data gathering for AI applications, research, or automation with simple API calls and SDK support across multiple languages. Empower your projects with high-quality, LLM-ready web data.

github stars

89.6K

  • Cloud browser automation
  • Handles JavaScript-heavy sites
  • Built-in rate limiting

best for

  • Data scientists gathering web datasets
  • Researchers collecting information from multiple sites
  • Developers building web scraping pipelines
  • Content creators aggregating online information

capabilities

  • Scrape dynamic websites with JavaScript rendering
  • Crawl entire websites for content discovery
  • Extract structured data from web pages
  • Perform batch scraping operations
  • Search and filter web content
  • Run automated browser sessions

what it does

Integrates with Firecrawl to scrape and extract structured data from complex websites using cloud browser sessions. Handles dynamic content and JavaScript rendering, and provides automatic retries with rate limiting.
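The "automatic retries with rate limiting" behavior can be pictured as an exponential-backoff loop around each request. The sketch below is an illustration of that general pattern, not Firecrawl's actual implementation; `request_with_backoff` and its parameters are hypothetical names, and the `send` callable is injected so the logic can run (and be tested) without network access:

```python
import time

def request_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a request on HTTP 429 (rate limited) with exponential backoff.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, e.g. `lambda: requests.post(url, json=payload)`.
    """
    response = None
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response  # success or a non-rate-limit error: stop retrying
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    return response  # still rate limited after all retries
```

In practice `send` would wrap a POST to the scrape endpoint with your API key; the injectable `sleep` just makes the backoff schedule observable in tests.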

about

Firecrawl is an official MCP server published by mendableai that provides AI assistants with tools and capabilities via the Model Context Protocol. It lets you scrape any website, handle dynamic content, and automate web scraping for research. It is categorized under browser automation.

how to install

You can install Firecrawl in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
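For clients without a one-click panel, MCP servers are typically registered in the client's JSON configuration. The entry below is a sketch following the pattern documented in the firecrawl-mcp-server repository; the package name `firecrawl-mcp`, the `FIRECRAWL_API_KEY` variable, and the config file location may differ by client and version, so check your client's MCP documentation:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}
```

Because the server uses the stdio transport, the client launches this command locally and communicates with it over stdin/stdout.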

license

AGPL-3.0

Firecrawl is released under the AGPL-3.0 license.

readme

---

# 🔥 Firecrawl

**Turn websites into LLM-ready data.**

[**Firecrawl**](https://firecrawl.dev/?ref=github) is an API that scrapes, crawls, and extracts structured data from any website, powering AI agents and apps with real-time context from the web.

Looking for our MCP? Check out the repo [here](https://github.com/firecrawl/firecrawl-mcp-server).

*This repository is in development, and we're still integrating custom modules into the mono repo. It's not fully ready for self-hosted deployment yet, but you can run it locally.*

_Pst. Hey, you, join our stargazers :)_

---

## Why Firecrawl?

- **LLM-ready output**: Clean markdown, structured JSON, screenshots, HTML, and more
- **Industry-leading reliability**: >80% coverage on [benchmark evaluations](https://www.firecrawl.dev/blog/the-worlds-best-web-data-api-v25), outperforming every other provider tested
- **Handles the hard stuff**: Proxies, JavaScript rendering, and dynamic content that breaks other scrapers
- **Customization**: Exclude tags, crawl behind auth walls, max depth, and more
- **Media parsing**: Automatic text extraction from PDFs, DOCX, and images
- **Actions**: Click, scroll, input, wait, and more before extracting
- **Batch processing**: Scrape thousands of URLs asynchronously
- **Change tracking**: Monitor website content changes over time

---

## Quick Start

Sign up at [firecrawl.dev](https://firecrawl.dev) to get your API key and start extracting data in seconds. Try the [playground](https://firecrawl.dev/playground) to test it out.

### Make Your First API Request

```bash
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
```

Response:

```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```

---

## Feature Overview

| Feature | Description |
|---------|-------------|
| [**Scrape**](#scraping) | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| [**Search**](#search) | Search the web and get full page content from results |
| [**Map**](#map) | Discover all URLs on a website instantly |
| [**Crawl**](#crawling) | Scrape all URLs of a website with a single request |
| [**Agent**](#agent) | Automated data gathering, just describe what you need |

---

## Scrape

Convert any URL to clean markdown, HTML, or structured data.

```bash
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": ["markdown", "html"]
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
    "html": "...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
```

### Extract Structured Data (JSON Mode)

Extract structured data using a schema:

```python
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)
print(result.json)
```

```json
{
  "company_mission": "Turn websites into LLM-ready data",
  "is_open_source": true,
  "is_in_yc": true
}
```

Or extract with just a prompt (no schema):

```python
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)
```

### Scrape Formats

Available formats: `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `branding`

**Get a screenshot**

```python
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot)  # Base64 encoded image
```

**Extract brand identity (colors, fonts, typography)**

```python
doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)  # {"colors": {...}, "fonts": [...], "typography": {...}}
```

### Actions (Interact Before Scraping)

Click, type, scroll, and more before extracting:

```python
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "user@example.com"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"}
    ]
)
```

---

## Search

Search the web and optionally scrape the results.

```bash
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "firecrawl web scraping",
    "limit": 5
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://www.firecrawl.dev/",
        "title": "Firecrawl - The Web Data API for AI",
        "description": "The web crawling, scraping, and search API for AI.",
        "position": 1
      }
    ],
    "images": [...],
    "news": [...]
  }
}
```

### Search with Content Scraping

Get the full content of search results:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

results = firecrawl.search(
    "firecrawl web scraping",
    limit=3,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)
```

---

## Agent

**The easiest way to get data from the web.** Describe what you need, and our AI agent searches, navigates, and extracts it. No URLs required.

Agent is the evolution of our `/extract` endpoint: faster, more reliable, and doesn't require you to know the URLs upfront.

```bash
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Find the pricing plans for Notion"
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "result": "Notion offers the following pricing plans: 1. Free - $0/month... 2. Plus - $10/seat/month... 3. Business - $18/seat/month...",
    "sources": ["https://www.notion.so/pricing"]
  }
}
```

### Agent with Structured Output

Use a schema to get structured data:

```python
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema
)
print(result.data)
```

```json
{
  "founders": [
    {"name": "Eric Ciarla", "role": "Co-founder"},
    {"name": "Nicolas Camara", "role": "Co-founder"},
    {"name": "Caleb Peffer", "role": "Co-founder"}
  ]
}
```

### Agent with URLs (Optional)

Focus the agent on specific pages:

```python
result = app.agent(
    urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
    prompt="Compare the features and pricing information"
)
```

### Model Selection

Choose between two models based on your needs:

| Model | Cost | Best For |
|-------|------|----------|
| `spark-1-mini` (default) | 60% cheaper | Most tasks |
| `spark-1-pro` | Standard | Complex research, critical extraction |

```python
result = app.agent(
    prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
    model="spark-1-pro"
)
```

**When to use Pro:**

- Comparing data across multiple websites
- Extracting from sites with complex navigation or auth
- Research tasks where the agent needs to explore multiple paths
- Critical data where accuracy is paramount

Learn more about Spark models in our [Agen

---