What is Bumblebee and who created it?

Bumblebee is an open-source, read-only inventory scanner created by Perplexity AI for detecting vulnerable packages on developer endpoints. It scans lockfiles, package metadata, and extension manifests across macOS and Linux to identify exposure to known supply chain compromises.

How is Bumblebee different from SBOMs?

SBOMs (Software Bills of Materials) document what shipped in production builds, while Bumblebee scans the messy local development state across lockfiles, package-manager caches, editor extensions, and MCP configs. It answers 'which developer machines have this vulnerable package right now?' rather than 'what's in our production artifact?'

Does Bumblebee execute package managers like npm or pip?

No. Bumblebee only reads on-disk metadata files (package-lock.json, requirements.txt, go.sum, etc.). It never executes npm, pip, go list, or any other package manager command, so a scan cannot trigger install scripts or change what is installed.

Is Bumblebee safe to run on machines with sensitive credentials?

Yes. Bumblebee is read-only and does not emit environment variables or credentials found in MCP configs. While it parses MCP server configs for inventory, it explicitly does not include env blocks with credentials in its output records.

Bumblebee: Perplexity's Open-Source Supply Chain | explainx.ai Blog


On May 22, 2026, Perplexity AI released Bumblebee, an open-source supply chain security scanner designed to answer a deceptively simple question: when a security advisory names a vulnerable package, which developer machines in your organization have it installed right now?

Unlike SBOMs (Software Bills of Materials), which document production artifacts, or EDR tools, which monitor what's running, Bumblebee focuses on the messy local development state scattered across lockfiles, package manager caches, IDE extensions, browser add-ons, and Model Context Protocol (MCP) server configs.

In a landscape where supply chain attacks like the 2024 XZ Utils backdoor, the December 2022 PyTorch nightly dependency-confusion compromise (torchtriton), and ongoing npm malware campaigns dominate headlines, Bumblebee offers security teams a fast, read-only tool with no non-stdlib dependencies to assess exposure across their entire developer fleet.

The Supply Chain Visibility Gap

Modern software development involves dozens of package ecosystems, multiple language toolchains, and an ever-growing surface area of extensions and developer tools. When a critical advisory drops, security teams face a race against time:

Which developers have the vulnerable package installed?
Which projects are affected?
What versions are in use?

Traditional tools fall short:

SBOMs document production builds, not local dev environments
EDR monitors running processes, not dormant dependencies
Package manager queries (npm ls, pip show) are slow and resource-intensive, and they require executing the package manager on every machine
Vulnerability scanners focus on project-level analysis, not fleet-wide inventory

Bumblebee fills this gap with a single-purpose, surgical scanner that collects package metadata without execution overhead.

What Makes Bumblebee Different?

1. Read-Only by Design

Bumblebee never executes package managers or build tools. It only reads:

Lockfiles (package-lock.json, yarn.lock, go.sum, Gemfile.lock)
Package manager metadata (node_modules/*/package.json, .dist-info/METADATA)
Extension manifests (VS Code, Cursor, Chrome, Firefox)
MCP host configs (Claude Desktop, Cline, Gemini CLI)

This read-only approach ensures:

No side effects on developer workflows
Safe for CI/CD pipelines (won't trigger installs or builds)
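
To make the read-only idea concrete, here is a minimal Python sketch that lists packages from an npm `package-lock.json` using nothing but a file read. It is not Bumblebee's implementation (the tool itself is installed with `go install`); it assumes the lockfile v2/v3 layout with a top-level `packages` map, and the function name `packages_from_lockfile` is hypothetical.

```python
import json
from pathlib import Path


def packages_from_lockfile(lock_path):
    """Yield (name, version) pairs from an npm lockfile (v2/v3 layout).

    Only reads the file; npm is never invoked.
    """
    data = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    for key, meta in data.get("packages", {}).items():
        if not key:  # "" is the root project entry, not a dependency
            continue
        # Keys look like "node_modules/a/node_modules/@scope/b";
        # the package name is whatever follows the last "node_modules/".
        name = key.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version:
            yield name, version
```

Because nothing is executed, a malicious `postinstall` script in a listed package never gets a chance to run during the scan.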

Ecosystem	Sources Scanned
npm	`package-lock.json`, `npm-shrinkwrap.json`, `node_modules/**`
pnpm	`pnpm-lock.yaml`, `.pnpm/.../package.json`
Yarn	`yarn.lock` (Classic and Berry)
Bun	`bun.lock` (plus `bun.lockb` detection)
PyPI	`.dist-info/METADATA`, `.egg-info/PKG-INFO`, `INSTALLER`, `direct_url.json`
Go modules	`go.sum`, `go.mod`
RubyGems	`Gemfile.lock`, installed `*.gemspec`
Composer	`composer.lock`, `vendor/composer/installed.json`
MCP	`mcp.json`, `claude_desktop_config.json`, `cline_mcp_settings.json`, `~/.gemini/settings.json`
Editor extensions	VS Code, Cursor, Windsurf, VSCodium manifests
Browser extensions	Chrome, Chromium, Edge, Firefox (per-profile manifests)
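
As a rough illustration of how a scanner might route files to per-ecosystem parsers, the sketch below maps the file names from the table above to ecosystem labels. The labels `npm`, `pypi`, and `go` appear in Bumblebee's sample output; the others (`pnpm`, `yarn`, `bun`, `rubygems`, `composer`) and the `classify` helper are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Basename -> ecosystem label, taken from the table above.
LOCKFILE_ECOSYSTEMS = {
    "package-lock.json": "npm",
    "npm-shrinkwrap.json": "npm",
    "pnpm-lock.yaml": "pnpm",
    "yarn.lock": "yarn",
    "bun.lock": "bun",
    "bun.lockb": "bun",
    "go.sum": "go",
    "go.mod": "go",
    "Gemfile.lock": "rubygems",
    "composer.lock": "composer",
}


def classify(path):
    """Return the ecosystem label for a path, or None if it is not recognized."""
    p = PurePosixPath(path)
    if p.name in LOCKFILE_ECOSYSTEMS:
        return LOCKFILE_ECOSYSTEMS[p.name]
    # PyPI metadata lives in a file named METADATA inside a *.dist-info directory.
    if p.name == "METADATA" and p.parent.name.endswith(".dist-info"):
        return "pypi"
    return None
```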

json

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "cve-2026-12345",
      "name": "malicious-npm-package 3.1.4 (typosquatting attack)",
      "ecosystem": "npm",
      "package": "malicious-npm-package",
      "versions": ["3.1.4", "3.1.5"],
      "severity": "critical"
    },
    {
      "id": "gh-advisory-2026-0089",
      "name": "compromised-pypi-lib 2.0.0",
      "ecosystem": "pypi",
      "package": "compromised-pypi-lib",
      "versions": ["2.0.0"],
      "severity": "high"
    }
  ]
}
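
A catalog like the one above can be matched against package records with a few lines of Python. This is only a sketch: it assumes exact string matching on versions (whether Bumblebee also supports version ranges is not stated here), and `load_entries` and `match` are hypothetical helper names.

```python
import json


def load_entries(catalog_text):
    """Parse a catalog document and return its list of entries."""
    catalog = json.loads(catalog_text)
    return catalog.get("entries", [])


def match(record, entries):
    """Return the catalog entries whose ecosystem, package, and version all match."""
    return [
        e for e in entries
        if e["ecosystem"] == record["ecosystem"]
        and e["package"] == record["name"]
        and record["version"] in e["versions"]
    ]
```

Note that the catalog uses the key `package` while package records use `name`, so the comparison has to bridge the two.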

json

{
  "record_type": "finding",
  "record_id": "finding-sha256:abc123...",
  "scan_id": "scan-20260524T153045Z-hostname",
  "catalog_entry_id": "laravel-lang-typosquat-2026-05",
  "severity": "critical",
  "ecosystem": "npm",
  "package": "larave1-lang",
  "version": "14.3.0",
  "source_path": "/Users/dev/project-x/package-lock.json",
  "confidence": "high",
  "detected_at": "2026-05-24T15:30:52Z"
}

json

{
  "record_type": "package",
  "record_id": "pkg-npm-lodash-4.17.21-sha256:xyz789...",
  "scan_id": "scan-20260524T120000Z-hostname",
  "profile": "baseline",
  "root_path": "/Users/dev/.nvm/versions/node/v22.0.0",
  "root_kind": "language_toolchain",
  "ecosystem": "npm",
  "name": "lodash",
  "version": "4.17.21",
  "source_path": "/Users/dev/.nvm/versions/node/v22.0.0/lib/node_modules/lodash/package.json",
  "confidence": "high",
  "detected_at": "2026-05-24T12:00:15Z"
}
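
Because every package record carries `ecosystem`, `name`, and `version`, two NDJSON scans of the same machine can be compared to see what changed between them. The sketch below shows one way to do that; the helper names `package_keys` and `diff_scans` are hypothetical, and keying on those three fields is an assumption made for this example.

```python
import json


def package_keys(ndjson_text):
    """Collect (ecosystem, name, version) for every package record in NDJSON text."""
    keys = set()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("record_type") == "package":
            keys.add((rec["ecosystem"], rec["name"], rec["version"]))
    return keys


def diff_scans(previous, current):
    """Return (added, removed) package keys between two scans, each sorted."""
    before, after = package_keys(previous), package_keys(current)
    return sorted(after - before), sorted(before - after)
```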

json

{
  "record_type": "scan_summary",
  "scan_id": "scan-20260524T120000Z-hostname",
  "profile": "baseline",
  "start_time": "2026-05-24T12:00:00Z",
  "end_time": "2026-05-24T12:00:18Z",
  "duration_seconds": 18,
  "package_count": 1247,
  "finding_count": 3,
  "root_count": 12,
  "ecosystem_coverage": ["npm", "pypi", "go", "editor-extension", "mcp"],
  "version": "v0.1.1",
  "hostname": "dev-macbook-pro.local"
}
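
The summary record makes it possible to sanity-check an ingested scan. As a sketch, the following Python compares `duration_seconds` with the two timestamps and `package_count` with the number of package records actually received. The helper name `summary_problems` is hypothetical, and the count check only makes sense for scans that emit package records at all.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summary_problems(summary, package_records_seen):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    start = datetime.strptime(summary["start_time"], TIME_FORMAT)
    end = datetime.strptime(summary["end_time"], TIME_FORMAT)
    elapsed = int((end - start).total_seconds())
    if elapsed != summary["duration_seconds"]:
        problems.append(
            f"duration_seconds={summary['duration_seconds']} but timestamps span {elapsed}s"
        )
    if package_records_seen != summary["package_count"]:
        problems.append(
            f"package_count={summary['package_count']} but {package_records_seen} records received"
        )
    return problems
```

A mismatch in the second check is a cheap signal that a transport step truncated the stream.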

xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- ~/Library/LaunchAgents/com.company.bumblebee.plist -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.company.bumblebee</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/bumblebee</string>
    <string>scan</string>
    <string>--profile</string>
    <string>baseline</string>
  </array>
  <key>StandardOutPath</key>
  <string>/var/log/bumblebee/baseline.ndjson</string>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>6</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>

bash

# 1. Create exposure catalog
cat > cve-2026-xyz.json <<EOF
{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "cve-2026-xyz",
      "name": "some-npm-package 1.2.3 RCE",
      "ecosystem": "npm",
      "package": "some-npm-package",
      "versions": ["1.2.3"],
      "severity": "critical"
    }
  ]
}
EOF

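# 2. Push catalog to fleet management system
aws s3 cp cve-2026-xyz.json s3://company-security/exposure-catalogs/

# 3. Trigger fleet-wide deep scan via MDM/SSH
#    Assumes the fleet tooling has already copied the catalog to /tmp on each host.
#    -N drops pdsh's "host: " line prefix so the output stays valid NDJSON.
pdsh -N -w dev-fleet 'bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog /tmp/cve-2026-xyz.json \
  --findings-only \
  --max-duration 10m' > findings.ndjson

# 4. Parse findings for affected machines
#    In the example records, findings carry scan_id (which ends in the hostname)
#    rather than a hostname field.
jq -r 'select(.record_type=="finding") | "\(.scan_id)\t\(.source_path)"' findings.ndjson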
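
For triage it often helps to group findings by machine. The Python sketch below does that for a merged NDJSON stream. It assumes the `scan_id` shape seen in the sample records (`scan-<UTC timestamp>-<hostname>`); that shape is inferred from the examples, not a documented contract, and `affected_hosts` is a hypothetical helper.

```python
import json
import re
from collections import defaultdict

# Assumed scan_id shape, inferred from the examples: scan-<UTC timestamp>-<hostname>
SCAN_ID = re.compile(r"^scan-\d{8}T\d{6}Z-(?P<host>.+)$")


def affected_hosts(ndjson_lines):
    """Map hostname -> sorted source paths for every finding record."""
    hosts = defaultdict(set)
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("record_type") != "finding":
            continue
        m = SCAN_ID.match(rec.get("scan_id", ""))
        host = m.group("host") if m else "unknown"
        hosts[host].add(rec["source_path"])
    return {h: sorted(paths) for h, paths in hosts.items()}
```

Usage could be as simple as `affected_hosts(open("findings.ndjson"))`, since a file object iterates line by line.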

yaml

name: Supply Chain Inventory

on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC
  workflow_dispatch:

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Bumblebee
        run: |
          go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

      - name: Run project scan
        run: |
          bumblebee scan --profile project \
            --root "$GITHUB_WORKSPACE" \
            --exposure-catalog ./threat_intel/ > inventory.ndjson

      - name: Check for findings
        run: |
          FINDINGS=$(jq -c 'select(.record_type=="finding")' inventory.ndjson | wc -l)
          if [ "$FINDINGS" -gt 0 ]; then
            echo "::error::Found $FINDINGS exposure matches"
            exit 1
          fi

      - name: Upload inventory
        uses: actions/upload-artifact@v4
        with:
          name: supply-chain-inventory
          path: inventory.ndjson
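
Teams that do not want every match to break the build could gate on severity instead. The following Python sketch is one way to do that. Only `critical` and `high` appear in the sample catalogs, so the `low` and `medium` levels, the ranking, and the `gate` helper are assumptions made for this example.

```python
import json

# Assumed ordering; only "high" and "critical" appear in the sample catalogs.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(ndjson_text, fail_at="high"):
    """Return the findings at or above `fail_at`; an empty list means the gate passes."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("record_type") != "finding":
            continue
        # Unknown or missing severities are treated as blocking, to fail closed.
        if SEVERITY_RANK.get(rec.get("severity"), threshold) >= threshold:
            blocking.append(rec)
    return blocking
```

In a workflow step, a wrapper script would read `inventory.ndjson`, call `gate`, and exit non-zero when the returned list is not empty.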

bash

#!/bin/bash
# Forward Bumblebee findings to Splunk
set -o pipefail

bumblebee scan --profile baseline \
  --exposure-catalog /etc/bumblebee/catalogs/ \
| jq -c 'select(.record_type=="finding") | {event: ., sourcetype: "bumblebee:finding"}' \
| while IFS= read -r line; do
    # TLS verification stays on; pass --cacert if the HEC endpoint uses a private CA.
    curl --fail --silent --show-error \
      "https://splunk.company.com:8088/services/collector/event" \
      -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
      -d "$line"
  done
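
Sending one HTTP request per finding gets expensive on a large fleet. Splunk's HTTP Event Collector accepts several event objects stacked in a single request body, so findings can be batched. The Python sketch below only builds the request bodies; `hec_batches` and the default batch size of 100 are assumptions made for this example, and the actual POST is left to whatever HTTP client is in use.

```python
import json


def hec_batches(ndjson_text, batch_size=100):
    """Yield request bodies, each holding up to `batch_size` finding events."""
    events = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("record_type") == "finding":
            events.append(json.dumps({"event": rec, "sourcetype": "bumblebee:finding"}))
    for i in range(0, len(events), batch_size):
        # Each body is a run of JSON event objects, one per line, not a JSON array.
        yield "\n".join(events[i:i + batch_size])
```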

json

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "laravel-lang-typosquat-2026-05",
      "name": "larave1-lang (typosquatting laravel-lang)",
      "ecosystem": "npm",
      "package": "larave1-lang",
      "versions": ["14.3.0"],
      "severity": "critical"
    },
    {
      "id": "laravel-lang-typosquat-2026-05-b",
      "name": "laravel-1ang (typosquatting laravel-lang)",
      "ecosystem": "npm",
      "package": "laravel-1ang",
      "versions": ["14.3.0"],
      "severity": "critical"
    }
  ]
}

bash

bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog <(curl -fsS https://raw.githubusercontent.com/perplexityai/bumblebee/main/threat_intel/laravel-lang-2026-05.json) \
  --findings-only

json

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"],
      "env": {
        "PGPASSWORD": "super-secret-password",
        "DATABASE_URL": "postgresql://user:password@db.example.com/main"
      }
    }
  }
}

json

{
  "record_type": "package",
  "ecosystem": "mcp",
  "name": "@modelcontextprotocol/server-postgres",
  "version": "0.1.2",
  "source_path": "/Users/dev/.config/claude/claude_desktop_config.json",
  "confidence": "medium",
  "mcp_server_name": "postgres",
  "mcp_command": "npx"
}
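
The difference between the config and the emitted record is the whole point: the `env` block never leaves the machine. A minimal Python sketch of that kind of redaction is shown below. It is not Bumblebee's code; `mcp_records` is a hypothetical helper, and treating the first non-flag argument as the package spec is an assumption that fits `npx -y <pkg>` style configs only.

```python
def mcp_records(config, source_path):
    """Build inventory records from an MCP host config, never copying `env`."""
    records = []
    for server_name, server in config.get("mcpServers", {}).items():
        # Treat the first non-flag argument as the package spec, e.g. "-y <pkg>".
        spec = next((a for a in server.get("args", []) if not a.startswith("-")), None)
        if spec is None:
            continue
        # Split an optional "@version" suffix without breaking "@scope/name".
        name, sep, version = spec.rpartition("@")
        if not sep or not name:
            name, version = spec, None
        records.append({
            "record_type": "package",
            "ecosystem": "mcp",
            "name": name,
            "version": version,
            "source_path": source_path,
            "mcp_server_name": server_name,
            "mcp_command": server.get("command"),
        })
    return records
```

The sketch also leaves out the remaining `args`, since connection strings passed as arguments can carry credentials just as `env` values do.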

bash

# Security team creates exposure catalog
cat > mcp-compromise-2026.json <<EOF
{
  "schema_version": "0.1.0",
  "entries": [{
    "id": "mcp-filesystem-compromise-2026",
    "ecosystem": "mcp",
    "package": "@modelcontextprotocol/server-filesystem",
    "versions": ["2.0.1"],
    "severity": "critical"
  }]
}
EOF

# Fleet-wide scan
bumblebee scan --profile baseline \
  --ecosystem mcp \
  --exposure-catalog mcp-compromise-2026.json \
  --findings-only

Scan Profile	Roots Scanned	Packages Found	Duration
baseline	12 (global toolchains, extensions)	~1,200	8 sec
project	2 dirs (`~/code`, `~/work`, 40 repos)	~5,000	45 sec
deep	`$HOME` (250GB, 800K files)	~8,000	4 min 30 sec

Tool	Purpose	Execution Model	Speed	Ecosystems
Bumblebee	Endpoint inventory	Read-only metadata	Fast (seconds)	10+ ecosystems + MCP
Syft	SBOM generation	Read + heuristics	Medium (30s-2m)	20+ ecosystems
Trivy	Vulnerability scanning	Read + CVE DB	Medium (1-3m)	10+ ecosystems
npm audit	Project vulnerability scan	Executes npm	Slow (project-level)	npm only
pip-audit	Python vulnerability scan	Executes pip	Slow (project-level)	PyPI only
osquery	Endpoint visibility	Read + system queries	Fast (queries)	General OS, limited package support
Grype	Vulnerability scanning	Read + CVE DB	Medium (1-2m)	10+ ecosystems

bash

bumblebee roots --profile baseline
# Output: <root_kind>\t<path>
# language_toolchain    /Users/dev/.nvm/versions/node/v22.0.0
# editor_extension      /Users/dev/.vscode/extensions
# browser_extension     /Users/dev/Library/Application Support/Google/Chrome
# ...
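
Since the preview is tab-separated, it is easy to consume from a script, for example to confirm which directories a profile will touch before rolling it out. The Python sketch below parses that output; `parse_roots` is a hypothetical helper and the format is taken from the comment in the example above.

```python
def parse_roots(output):
    """Parse `<root_kind><TAB><path>` lines into (kind, path) tuples."""
    roots = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first tab only, so paths containing spaces stay intact.
        kind, _, path = line.partition("\t")
        roots.append((kind.strip(), path.strip()))
    return roots
```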

bash

go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
# raw.githubusercontent.com does not expand wildcards, so fetch the catalogs with a shallow clone
git clone --depth 1 https://github.com/perplexityai/bumblebee.git
bumblebee scan --profile baseline --exposure-catalog ./bumblebee/threat_intel/

Bumblebee: Perplexity's Open-Source Supply Chain Security Scanner for Developer Endpoints (2026)

The Supply Chain Visibility Gap

What Makes Bumblebee Different?

1. Read-Only by Design

2. Zero Non-Stdlib Dependencies

3. Multi-Ecosystem Coverage

4. Three Scan Profiles

Baseline (Recurring Lightweight Inventory)

Project (Development Workspaces)

Deep (Incident Response)

The Exposure Catalog System

Catalog Format

Maintained Threat Intelligence Catalogs

Using Exposure Catalogs

Output Format and State Model

Package Record Example

Scan Summary Record

Integration Patterns

1. Daily Recurring Baseline Inventory

2. Incident Response Pipeline

3. CI/CD Integration

4. SIEM Integration

Real-World Use Case: The Laravel Lang Typosquatting Campaign (May 2026)

How Bumblebee Helped

MCP (Model Context Protocol) Support

What is MCP?

Why MCP Inventory Matters

How Bumblebee Handles MCP

MCP Supply Chain Risk Scenario

Performance and Scalability

Benchmarks (MacBook Pro M1, 16GB RAM)

Memory Usage

Parallelization

Comparison to Alternative Tools

Advanced Features

1. Ecosystem Filtering

2. Max Duration Enforcement

3. Root Preview

4. Transport Options

5. Selftest (Built-In Validation)

Contributing and Community

Contributing Exposure Catalogs

Extending Bumblebee

Limitations and Future Roadmap

Current Limitations

Roadmap (as of v0.1.1)

Getting Started Checklist

Conclusion