productivity

webclaw

by 0xMassi

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API

github stars

425

best for

  • General-purpose MCP workflows

capabilities

  • scrape
  • crawl
  • map
  • batch
  • extract
  • summarize

what it does

A high-performance web scraper optimized for AI agents that extracts clean, structured content from URLs with 67% fewer tokens than raw HTML and sub-millisecond extraction speed.
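
To make the token-savings claim concrete, here is a minimal sketch of the general idea: strip markup, scripts, and styles, then compare sizes. It is not webclaw's extraction algorithm. It uses Python's standard `html.parser`, treats character count as a crude stand-in for tokens, and the `TextExtractor`, `extract_text`, and `reduction` names and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def reduction(html: str) -> float:
    """Fraction of characters removed (an assumption-laden proxy for tokens)."""
    return 1 - len(extract_text(html)) / len(html)


# Illustrative input only; not taken from any real page.
SAMPLE = (
    "<html><head><style>.ad{display:none}</style></head>"
    "<body><div class='ad'><script>track()</script></div>"
    "<h1>Breaking: AI Breakthrough</h1><p>Details follow.</p></body></html>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE))
    print(f"{reduction(SAMPLE):.0%} fewer characters")
```

A real comparison would count tokens with the target model's tokenizer rather than characters, and a production extractor would also keep headings, links, and metadata in a structured form.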

about

webclaw is a community-built MCP server published by 0xMassi that provides AI assistants with tools and capabilities via the Model Context Protocol. It offers fast, local-first web content extraction for LLMs: scrape, crawl, and extract structured data, all from Rust. The listed interfaces include a CLI and a REST API. It is categorized under productivity and exposes 10 tools that AI clients can invoke during conversations and coding sessions.

how to install

You can install webclaw in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
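
For readers curious what the stdio transport carries, the sketch below builds the newline-delimited JSON-RPC 2.0 requests that an MCP client writes to a server's standard input. The `tools/list` and `tools/call` method names come from the Model Context Protocol. The `jsonrpc_request` helper is hypothetical, and the `scrape` tool name and its `url` argument are assumptions borrowed from the capabilities list above, not confirmed names from webclaw.

```python
import json
from itertools import count
from typing import Optional

# Monotonic request ids; JSON-RPC pairs each response with its request id.
_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps emits no raw newlines by default, so each message is one line.
    return json.dumps(message) + "\n"


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Hypothetical call: "scrape" and "url" are assumed names, not confirmed ones.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "scrape", "arguments": {"url": "https://example.com"}},
)

if __name__ == "__main__":
    print(list_tools, end="")
    print(call_tool, end="")
```

In a real session the client first completes the protocol's `initialize` handshake, and an MCP client such as Cursor or Claude Desktop handles all of this for you. The actual tool names and argument schemas should be read from the server's `tools/list` response.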

license

AGPL-3.0

webclaw is released under the AGPL-3.0 license.

readme

webclaw

The fastest web scraper for AI agents.
67% fewer tokens. Sub-millisecond extraction. Zero browser overhead.

---

Claude Code's built-in web_fetch → 403 Forbidden. webclaw → clean markdown.

---

Your AI agent calls `fetch()` and gets a 403. Or 142KB of raw HTML that burns through your token budget.

**webclaw fixes both.** It extracts clean, structured content from any URL using Chrome-level TLS fingerprinting, with no headless browser, no Selenium, and no Puppeteer. Output is optimized for LLMs: **67% fewer tokens** than raw HTML, with metadata, links, and images preserved.

```
 Raw HTML                              webclaw
┌──────────────────────────────────┐  ┌──────────────────────────────────┐
│
│ │ # Breaking: AI Breakthrough │ │