
Web Fetcher

by jae-jae

Fetches and extracts web content using Playwright's headless browser capabilities, delivering clean, readable content from JavaScript-heavy websites in HTML or Markdown format for research and information gathering.

GitHub stars: 1.0K


  • JavaScript execution support
  • Intelligent content extraction with Readability
  • Parallel URL processing

best for

  • Web scraping and content extraction
  • Research and information gathering
  • Content analysis from modern web apps
  • Batch processing of multiple websites

capabilities

  • Fetch content from JavaScript-rendered websites
  • Extract main content while removing ads and navigation
  • Process multiple URLs in parallel
  • Output content in HTML or Markdown format
  • Handle dynamic web applications and SPAs

what it does

Fetches web page content using Playwright's headless browser, extracting clean readable text from JavaScript-heavy websites. Outputs content in HTML or Markdown format for research and data gathering.

about

Web Fetcher is a community-built MCP server published by jae-jae that provides AI assistants with tools and capabilities via the Model Context Protocol. It uses Playwright for web scraping and data extraction from JavaScript-heavy websites. It is categorized under browser-automation and search-web. The server exposes 3 tools (`fetch_url`, `fetch_urls`, and `browser_install`) that AI clients can invoke during conversations and coding sessions.

how to install

You can install Web Fetcher in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. By default, the server runs locally on your machine via the stdio transport (`npx -y fetcher-mcp`). When started with `--transport=http`, it also serves Streamable HTTP and SSE endpoints, so remote clients can connect to an already-running instance without a local installation.
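For clients without one-click setup, the stdio entry can be added to the client's JSON config by hand. The following is a minimal Python sketch, assuming the `mcpServers` layout from the README's Claude Desktop example; the helper name `add_fetcher_server` is hypothetical and not part of Web Fetcher.

```python
import json


def add_fetcher_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a stdio `fetcher` entry added.

    Hypothetical helper: the entry mirrors the README's Claude Desktop example.
    Existing servers in the config are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["fetcher"] = {"command": "npx", "args": ["-y", "fetcher-mcp"]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Example: merge into a config that already declares another server.
    existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_fetcher_server(existing), indent=2))
```

The function copies the input instead of mutating it, so the original config can still be written back unchanged if the merge needs to be undone.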

license

MIT

Web Fetcher is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

readme

[中文](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=zh) | [Deutsch](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=de) | [Español](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=es) | [français](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=fr) | [日本語](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ja) | [한국어](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ko) | [Português](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=pt) | [Русский](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ru)

# Fetcher MCP

MCP server for fetching web page content using the Playwright headless browser.

> 🌟 **Recommended**: [OllaMan](https://ollaman.com/) - Powerful Ollama AI Model Manager.

## Advantages

- **JavaScript Support**: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
- **Intelligent Content Extraction**: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
- **Flexible Output Format**: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
- **Parallel Processing**: The `fetch_urls` tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
- **Resource Optimization**: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
- **Robust Error Handling**: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
- **Configurable Parameters**: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
## Quick Start

Run directly with npx:

```bash
npx -y fetcher-mcp
```

First time setup - install the required browser by running the following command in your terminal:

```bash
npx playwright install chromium
```

### HTTP and SSE Transport

Use the `--transport=http` parameter to start both Streamable HTTP endpoint and SSE endpoint services simultaneously:

```bash
npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000
```

After startup, the server provides the following endpoints:

- `/mcp` - Streamable HTTP endpoint (modern MCP protocol)
- `/sse` - SSE endpoint (legacy MCP protocol)

Clients can choose which method to connect based on their needs.

### Debug Mode

Run with the `--debug` option to show the browser window for debugging:

```bash
npx -y fetcher-mcp --debug
```

## Configuration MCP

Configure this MCP server in Claude Desktop:

On MacOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

On Windows: `%APPDATA%/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
```

## Docker Deployment

### Running with Docker

```bash
docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest
```

### Deploying with Docker Compose

Create a `docker-compose.yml` file:

```yaml
version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Then run:

```bash
docker-compose up -d
```

## Features

- `fetch_url` - Retrieve web page content from a specified URL
  - Uses Playwright headless browser to parse JavaScript
  - Supports intelligent extraction of main content and conversion to Markdown
  - Supports the following parameters:
    - `url`: The URL of the web page to fetch (required parameter)
    - `timeout`: Page loading timeout in milliseconds, default is 30000 (30 seconds)
    - `waitUntil`: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
    - `extractContent`: Whether to intelligently extract the main content, default is true
    - `maxLength`: Maximum length of returned content (in characters), default is no limit
    - `returnHtml`: Whether to return HTML content instead of Markdown, default is false
    - `waitForNavigation`: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
    - `navigationTimeout`: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
    - `disableMedia`: Whether to disable media resources (images, stylesheets, fonts, media), default is true
    - `debug`: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified

- `fetch_urls` - Batch retrieve web page content from multiple URLs in parallel
  - Uses multi-tab parallel fetching for improved performance
  - Returns combined results with clear separation between webpages
  - Supports the following parameters:
    - `urls`: Array of URLs to fetch (required parameter)
    - Other parameters are the same as `fetch_url`

- `browser_install` - Install Playwright Chromium browser binary automatically
  - Installs required Chromium browser binary when not available
  - Automatically suggested when browser installation errors occur
  - Supports the following parameters:
    - `withDeps`: Install system dependencies required by Chromium browser, default is false
    - `force`: Force installation even if Chromium is already installed, default is false

## Tips

### Handling Special Website Scenarios

#### Dealing with Anti-Crawler Mechanisms

- **Wait for Complete Loading**: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

  ```
  Please wait for the page to fully load
  ```

  This will use the `waitForNavigation: true` parameter.

- **Increase Timeout Duration**: For websites that load slowly:

  ```
  Please set the page loading timeout to 60 seconds
  ```

  This adjusts both `timeout` and `navigationTimeout` parameters accordingly.

#### Content Retrieval Adjustments

- **Preserve Original HTML Structure**: When content extraction might fail:

  ```
  Please preserve the original HTML content
  ```

  Sets `extractContent: false` and `returnHtml: true`.

- **Fetch Complete Page Content**: When extracted content is too limited:

  ```
  Please fetch the complete webpage content instead of just the main content
  ```

  Sets `extractContent: false`.

- **Return Content as HTML**: When HTML format is needed instead of default Markdown:

  ```
  Please return the content in HTML format
  ```

  Sets `returnHtml: true`.

### Debugging and Authentication

#### Enabling Debug Mode

- **Dynamic Debug Activation**: To display the browser window during a specific fetch operation:

  ```
  Please enable debug mode for this fetch operation
  ```

  This sets `debug: true` even if the server was started without the `--debug` flag.

#### Using Custom Cookies for Authentication

- **Manual Login**: To log in using your own credentials:

  ```
  Please run in debug mode so I can manually log in to the website
  ```

  Sets `debug: true` or uses the `--debug` flag, keeping the browser window open for manual login.

- **Interacting with Debug Browser**: When debug mode is enabled:

  1. The browser window remains open
  2. You can manually log into the website using your credentials
  3. After login is complete, content will be fetched with your authenticated session

- **Enable Debug for Specific Requests**: Even if the server is already running, you can enable debug mode for a specific request:

  ```
  Please enable debug mode for this authentication step
  ```

  Sets `debug: true` for this specific request only, opening the browser window for manual login.

## Development

### Install Dependencies

```bash
npm install
```

### Install Playwright Browser

Install the browsers needed for Playwright:

```bash
npm run install-browser
```

### Build the Server

```bash
npm run build
```

## Debugging

Use MCP Inspector for debugging:

```bash
npm run inspector
```

You can also enable visible browser mode for debugging:

```bash
node build/index.js --debug
```

## Related Projects

- [g-search-mcp](https://github.com/jae-jae/g-search-mcp): A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

## License

Licensed under the [MIT License](https://choosealicense.com/licenses/mit/)

[![Powered by DartNode](https://dartnode.com/branding/DN-Open-Source-sm.png)](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
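The prompts in the Tips section end up as arguments to a tool call. As a rough illustration, the Python sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send for `fetch_url`. The parameter names and `waitUntil` options come from the README's parameter list; the helper name `build_fetch_url_call` and the validation step are assumptions for illustration, not part of Fetcher MCP.

```python
import json

# Options documented in the README for the `waitUntil` parameter.
WAIT_UNTIL_OPTIONS = {"load", "domcontentloaded", "networkidle", "commit"}


def build_fetch_url_call(url: str, request_id: int = 1, **options) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the `fetch_url` tool.

    Hypothetical helper: `options` are passed through as tool arguments
    (for example `timeout`, `extractContent`, `returnHtml`).
    """
    wait_until = options.get("waitUntil", "load")
    if wait_until not in WAIT_UNTIL_OPTIONS:
        raise ValueError(f"unsupported waitUntil value: {wait_until!r}")
    arguments = {"url": url, **options}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch_url", "arguments": arguments},
    }


if __name__ == "__main__":
    # "Please preserve the original HTML content" with a 60-second timeout.
    request = build_fetch_url_call(
        "https://example.com",
        extractContent=False,
        returnHtml=True,
        timeout=60000,
    )
    print(json.dumps(request, indent=2))
```

In normal use the AI client assembles this request itself; building it by hand is mainly useful when testing the server with a script or the MCP Inspector.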

FAQ

What is the Web Fetcher MCP server?
Web Fetcher is a Model Context Protocol (MCP) server profile on explainx.ai. MCP lets AI hosts (e.g. Claude Desktop, Cursor) call tools and resources through a standard interface; this page summarizes categories, install hints, and community ratings.
How do MCP servers relate to agent skills?
Skills are reusable instruction packages (often SKILL.md); MCP servers expose live capabilities. Teams frequently combine both—skills for workflows, MCP for APIs and data. See explainx.ai/skills and explainx.ai/mcp-servers for parallel directories.
How are reviews shown for Web Fetcher?
This profile displays 35 aggregated ratings (sample rows for discoverability plus signed-in user reviews). Average score is about 4.8 out of 5—verify behavior in your own environment before production use.

Use Cases

Web Research & Information Gathering

Fetch and extract information from websites automatically

Example

Research competitor pricing, scrape product reviews, monitor news mentions

Automate 5-10 hours/week of manual web research

Content Monitoring & Alerts

Track website changes, new content, price updates

Example

Monitor competitor blog for new posts, track stock availability, watch for pricing changes

Stay informed without manual checking, never miss important updates

Data Extraction & Aggregation

Extract structured data from multiple websites

Example

Compile product listings from 10 e-commerce sites, aggregate job postings, collect real estate data

Build datasets 100x faster than manual copying

API-less Integration

Interact with services that don't offer APIs

Example

Check form submissions, validate website functionality, test user flows

Automate interactions with any website, even without API

Implementation Guide

Prerequisites

  • Claude Desktop or Cursor with MCP support
  • Understanding of web scraping ethics and robots.txt
  • Rate limiting awareness to avoid overwhelming target sites
  • Knowledge of legal restrictions on data collection

Time Estimate

20-40 minutes including configuration and testing

Installation Steps

  1. Install and run the server with npx (`npx -y fetcher-mcp`), then install the browser once with `npx playwright install chromium`
  2. Add the server to your MCP client config (for example, Claude Desktop's `claude_desktop_config.json`)
  3. Test with simple fetch: 'Get content from example.com'
  4. Progress to extraction: 'Extract all product prices from this page'
  5. Set up monitoring: 'Check this URL daily for changes'
  6. Parse structured data: 'Create CSV from this table'
  7. Respect robots.txt and rate limits always

Troubleshooting

  • 403 Forbidden: Website blocks bots—respect their wishes, use official API instead
  • Rate limit errors: Slow down requests, add delays between fetches
  • Stale data: Target site changed HTML structure—update selectors
  • Timeout errors: Site is slow or blocking—increase timeout, try different user agent
  • JavaScript-rendered content: Use headless browser MCP servers for dynamic sites

Best Practices

✓ Do

  • Check robots.txt and respect crawl rules
  • Rate limit requests: 1-2 requests/second maximum
  • Use official APIs when available instead of scraping
  • Identify your bot with descriptive user agent
  • Cache results to minimize repeated requests
  • Handle errors gracefully with retries and fallbacks
  • Validate extracted data for accuracy
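The 1-2 requests/second guideline above can be enforced on the calling side. The Python sketch below is one possible approach and is not a feature of Web Fetcher itself: the `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (0.5 s for 2 requests/second)."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(max_per_second=2)
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()  # call before each fetch request
        print("fetching", url)
```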

✗ Don't

  • Don't scrape sites that explicitly forbid it (robots.txt, ToS)
  • Don't overwhelm servers with rapid requests—use rate limiting
  • Don't scrape personal data without consent and legal basis
  • Don't ignore copyright on extracted content
  • Don't assume HTML structure is stable—handle changes
  • Don't use scraped data for commercial purposes without permission
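A robots.txt check can run before any URL is handed to the fetch tools. The sketch below uses Python's standard `urllib.robotparser`; the function name `is_allowed` and the user agent string are illustrative assumptions, and the robots.txt text is passed in directly so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    for candidate in ["https://example.com/public", "https://example.com/private/page"]:
        print(candidate, is_allowed(rules, "my-research-bot", candidate))
```

Terms of service are not machine-readable in the same way, so a check like this covers only the robots.txt part of the rule.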

💡 Pro Tips

  • Use CSS selectors or XPath for robust data extraction
  • Set up monitoring alerts for extraction failures (structure changed)
  • Implement exponential backoff for retries on failures
  • Store raw HTML for reprocessing if extraction logic changes
  • Combine with data analysis tools for insights from extracted data
  • Consider using official APIs or RSS feeds as more stable alternatives
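The exponential backoff tip can be sketched in a few lines of Python. The function names, the default base delay of 1 second, the doubling factor, and the 30-second cap are all illustrative assumptions; `fetch` stands in for whatever callable performs the actual request.

```python
import time


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0, cap: float = 30.0):
    """Return the wait before each retry: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def fetch_with_retry(fetch, url: str, retries: int = 3, sleep=time.sleep):
    """Call `fetch(url)`, retrying with exponential backoff on any exception.

    The last exception is re-raised once all retries are used up.
    """
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise
            sleep(delays[attempt])


if __name__ == "__main__":
    print(backoff_delays(4))
```

A production version would usually retry only on errors that are likely to be transient (timeouts, rate limits) and add random jitter to the delays.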

Technical Details

Architecture

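The MCP server launches a headless Chromium browser through Playwright, loads the page and executes its JavaScript, extracts the main content with the Readability algorithm, and returns the result to the MCP client as Markdown or HTML.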

Protocols

  • HTTP/HTTPS
  • WebSocket (for real-time sites)
  • Puppeteer/Playwright (for JavaScript sites)

Compatibility

  • Static HTML sites
  • JavaScript-rendered SPAs (with headless browser)
  • REST APIs
  • GraphQL endpoints

When to Use This

✓ Use When

Use for research automation, content monitoring, data aggregation from multiple sources, and when official APIs don't exist. Best for read-only information gathering.

✗ Avoid When

Avoid for sites with APIs (use API instead), sites that explicitly forbid scraping, when data is copyrighted, or for login-required content without proper authorization.

Integration

  • Scheduled monitoring with change detection
  • Multi-source data aggregation pipelines
  • Fallback to web scraping when API rate limits hit
  • Headless browser for JavaScript-heavy sites
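Change detection for scheduled monitoring can be as simple as comparing a hash of the fetched content between runs. The Python sketch below assumes the fetched page arrives as a Markdown string; the function names and the whitespace normalization are illustrative choices, not behaviour provided by Web Fetcher.

```python
import hashlib


def content_fingerprint(markdown: str) -> str:
    """Hash fetched content, ignoring differences in whitespace."""
    normalized = " ".join(markdown.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, markdown: str) -> bool:
    """Return True if the content differs from the stored fingerprint.

    `previous_fingerprint` may be None on the first run, which counts as changed.
    """
    return previous_fingerprint != content_fingerprint(markdown)


if __name__ == "__main__":
    stored = content_fingerprint("# Pricing\n\nPlan A: $10")
    print(has_changed(stored, "# Pricing\n\nPlan A: $12"))
```

Storing only the fingerprint keeps the monitor small; keeping the raw content as well makes it possible to show what changed.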

MCP server reviews

Ratings

4.8 (35 reviews)
  • Shikha Mishra· Dec 28, 2024

    We evaluated Web Fetcher against two servers with overlapping tools; this profile had the clearer scope statement.

  • Tariq Choi· Dec 28, 2024

    I recommend Web Fetcher for teams standardizing on MCP; the explainx.ai page compares cleanly with sibling servers.

  • Naina Martinez· Dec 28, 2024

    According to our notes, Web Fetcher benefits from clear Model Context Protocol framing — fewer ambiguous “AI plugin” claims.

  • Lucas Shah· Nov 27, 2024

    We evaluated Web Fetcher against two servers with overlapping tools; this profile had the clearer scope statement.

  • Yash Thakker· Nov 19, 2024

    Useful MCP listing: Web Fetcher is the kind of server we cite when onboarding engineers to host + tool permissions.

  • Lucas Abebe· Nov 19, 2024

    We wired Web Fetcher into a staging workspace; the listing’s GitHub and npm pointers saved time versus hunting across READMEs.

  • Dhruvi Jain· Oct 10, 2024

    Web Fetcher reduced integration guesswork — categories and install configs on the listing matched the upstream repo.

  • Diya Shah· Oct 10, 2024

    Web Fetcher is a well-scoped MCP server in the explainx.ai directory — install snippets and categories matched our Claude Code setup.

  • Amina Yang· Sep 25, 2024

    According to our notes, Web Fetcher benefits from clear Model Context Protocol framing — fewer ambiguous “AI plugin” claims.

  • Amina Jackson· Sep 9, 2024

    Web Fetcher is a well-scoped MCP server in the explainx.ai directory — install snippets and categories matched our Claude Code setup.
