On May 22, 2026, Perplexity AI released Bumblebee, an open-source supply chain security scanner designed to answer a deceptively simple question: when a security advisory names a vulnerable package, which developer machines in your organization have it installed right now?
Unlike SBOMs (Software Bills of Materials), which document production artifacts, or EDR tools, which monitor what's running, Bumblebee focuses on the messy local development state scattered across lockfiles, package manager caches, IDE extensions, browser add-ons, and Model Context Protocol (MCP) server configs.
In a landscape where supply chain attacks like the 2024 XZ Utils backdoor, 2025 PyTorch supply chain compromise, and ongoing npm malware campaigns dominate headlines, Bumblebee offers security teams a fast, read-only, zero-dependency tool to assess exposure across their entire developer fleet.
The Supply Chain Visibility Gap
Modern software development involves dozens of package ecosystems, multiple language toolchains, and an ever-growing surface area of extensions and developer tools. When a critical advisory drops, security teams face a race against time:
- Which developers have the vulnerable package installed?
- Which projects are affected?
- What versions are in use?
Traditional tools fall short:
- SBOMs document production builds, not local dev environments
- EDR monitors running processes, not dormant dependencies
- Package manager queries (npm ls, pip show) are slow, resource-intensive, and require execution
- Vulnerability scanners focus on project-level analysis, not fleet-wide inventory
Bumblebee fills this gap with a single-purpose, surgical scanner that collects package metadata without execution overhead.
What Makes Bumblebee Different?
1. Read-Only by Design
Bumblebee never executes package managers or build tools. It only reads:
- Lockfiles (package-lock.json, yarn.lock, go.sum, Gemfile.lock)
- Package manager metadata (node_modules/*/package.json, .dist-info/METADATA)
- Extension manifests (VS Code, Cursor, Chrome, Firefox)
- MCP host configs (Claude Desktop, Cline, Gemini CLI)
This read-only approach ensures:
- No side effects on developer workflows
- Safe for CI/CD pipelines (won't trigger installs or builds)
- Fast execution (seconds, not minutes)
- No package manager version dependencies
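To make the read-only idea concrete, here is a minimal sketch that lists packages from an npm lockfile by parsing JSON alone, without invoking npm. It is not Bumblebee's implementation (which is written in Go). The function name `packages_from_lockfile` is hypothetical, and the sketch assumes the lockfile v2/v3 layout, where a top-level `packages` object is keyed by `node_modules/...` paths.

```python
import json

def packages_from_lockfile(text):
    """Return sorted (name, version) pairs from package-lock.json text (lockfile v2/v3)."""
    lock = json.loads(text)
    found = set()
    for path, meta in lock.get("packages", {}).items():
        # Skip the root project (key "") and workspace folders, which are not installed packages.
        if "node_modules/" not in path:
            continue
        # The package name is whatever follows the last "node_modules/" segment.
        name = path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version:
            found.add((name, version))
    return sorted(found)

# Hypothetical lockfile content used only for illustration.
SAMPLE_LOCK = json.dumps({
    "name": "demo", "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo", "version": "1.0.0"},
        "node_modules/lodash": {"version": "4.17.21"},
        "node_modules/@scope/pkg": {"version": "2.0.0"},
        "node_modules/a/node_modules/lodash": {"version": "3.10.1"},
    },
})

for name, version in packages_from_lockfile(SAMPLE_LOCK):
    print(f"{name}@{version}")
```

Nested copies (here, two versions of lodash) show up as separate entries, which matters when an advisory names one specific version.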
2. Zero Non-Stdlib Dependencies
Bumblebee is a single Go binary built with only the Go standard library. No vendored dependencies, no third-party packages, no supply chain risk in the tool itself.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
# ~10MB binary, zero dependencies
This makes it trivial to:
- Deploy across heterogeneous fleets (macOS, Linux, ARM, x86)
- Audit the source (minimal codebase, no transitive dependencies)
- Build reproducibly (no dependency resolution surprises)
3. Multi-Ecosystem Coverage
Bumblebee scans eight package ecosystems, plus MCP configs and editor/browser extensions:
| Ecosystem | Sources Scanned |
|---|---|
| npm | package-lock.json, npm-shrinkwrap.json, node_modules/** |
| pnpm | pnpm-lock.yaml, .pnpm/.../package.json |
| Yarn | yarn.lock (Classic and Berry) |
| Bun | bun.lock (plus bun.lockb detection) |
| PyPI | *.dist-info/METADATA, *.egg-info/PKG-INFO, INSTALLER, direct_url.json |
| Go modules | go.sum, go.mod |
| RubyGems | Gemfile.lock, installed *.gemspec |
| Composer | composer.lock, vendor/composer/installed.json |
| MCP | mcp.json, claude_desktop_config.json, cline_mcp_settings.json, ~/.gemini/settings.json |
| Editor extensions | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | Chrome, Chromium, Edge, Firefox (per-profile manifests) |
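As an illustration of the PyPI row, the name and version of an installed distribution can be recovered from its METADATA file, which uses email-style headers. The sketch below uses Python's standard `email.parser` for that. The helper name `read_dist_metadata` and the sample content are assumptions for illustration, not part of Bumblebee.

```python
from email.parser import Parser

def read_dist_metadata(text):
    """Extract (name, version) from the text of a *.dist-info/METADATA file."""
    # headersonly=True stops at the first blank line, so the long description is not parsed.
    headers = Parser().parsestr(text, headersonly=True)
    return headers["Name"], headers["Version"]

# Hypothetical METADATA content used only for illustration.
SAMPLE_METADATA = """Metadata-Version: 2.1
Name: requests
Version: 2.32.3
Summary: Python HTTP for Humans.

Long description body follows the blank line and is ignored here.
"""

print(read_dist_metadata(SAMPLE_METADATA))
```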
4. Three Scan Profiles
Bumblebee offers three scan profiles for different use cases:
Baseline (Recurring Lightweight Inventory)
Scans common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. Designed for daily/weekly recurring scans via cron, launchd, or MDM.
bumblebee scan --profile baseline > daily-inventory.ndjson
Typical scan time: 5-15 seconds
Use case: Ongoing fleet inventory for known-good state
Project (Development Workspaces)
Scans configured development directories (~/code, ~/src, ~/work). Useful for recurring inventory of active projects.
bumblebee scan --profile project \
--root "$HOME/code" \
--root "$HOME/work" > project-inventory.ndjson
Typical scan time: 30-60 seconds (depends on project count)
Use case: Daily/weekly project-level visibility
Deep (Incident Response)
Scans explicit --root paths, including broad roots like $HOME. Refuses to run without explicit root specification. Designed for on-demand incident response when an advisory drops.
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog ./xz-backdoor-catalog.json \
--findings-only \
--max-duration 10m > incident-findings.ndjson
Typical scan time: 2-10 minutes (depends on home directory size)
Use case: Emergency scans when a critical CVE affects your stack
The Exposure Catalog System
Bumblebee's killer feature is exposure catalogs: JSON files listing known-vulnerable packages for exact-match detection during scans.
Catalog Format
{
"schema_version": "0.1.0",
"entries": [
{
"id": "cve-2026-12345",
"name": "malicious-npm-package 3.1.4 (typosquatting attack)",
"ecosystem": "npm",
"package": "malicious-npm-package",
"versions": ["3.1.4", "3.1.5"],
"severity": "critical"
},
{
"id": "gh-advisory-2026-0089",
"name": "compromised-pypi-lib 2.0.0",
"ecosystem": "pypi",
"package": "compromised-pypi-lib",
"versions": ["2.0.0"],
"severity": "high"
}
]
}
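Exact-match detection against a catalog like this amounts to a dictionary lookup on the (ecosystem, package, version) triple. The following is a sketch under that assumption, not Bumblebee's actual matcher. `build_index` and `match` are hypothetical names, and only the fields shown in the catalog above are used.

```python
import json

def build_index(catalog):
    """Map (ecosystem, package, version) -> catalog entry for exact-match lookups."""
    index = {}
    for entry in catalog["entries"]:
        for version in entry["versions"]:
            index[(entry["ecosystem"], entry["package"], version)] = entry
    return index

def match(index, ecosystem, name, version):
    """Return the matching catalog entry, or None when the exact triple is absent."""
    return index.get((ecosystem, name, version))

CATALOG = json.loads("""
{
  "schema_version": "0.1.0",
  "entries": [
    {"id": "cve-2026-12345",
     "name": "malicious-npm-package 3.1.4 (typosquatting attack)",
     "ecosystem": "npm", "package": "malicious-npm-package",
     "versions": ["3.1.4", "3.1.5"], "severity": "critical"}
  ]
}
""")

index = build_index(CATALOG)
hit = match(index, "npm", "malicious-npm-package", "3.1.5")
miss = match(index, "npm", "malicious-npm-package", "3.1.6")
print(hit["id"] if hit else None, miss)
```

Because the lookup is exact, version 3.1.6 does not match even though it sits right next to a listed version. Every affected version has to be enumerated in the catalog.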
Maintained Threat Intelligence Catalogs
Perplexity maintains curated exposure catalogs in the threat_intel/ directory, built from public threat intelligence and updated via community PRs:
- Laravel Lang (2026-05): Typosquatting attack on Laravel localization packages
- PyPI Fabric Compromise (2024-08): Backdoored fabric Python package
- XZ Utils Backdoor (2024-03): The infamous liblzma supply chain attack
- NPM Malware Campaigns (2024-2026): Ongoing npm registry compromise patterns
These catalogs are assembled using Perplexity Computer and validated by security researchers, providing a continuously updated knowledge base for incident response.
Using Exposure Catalogs
Single catalog file:
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog ./threat_intel/laravel-lang-2026-05.json \
--findings-only
Directory of catalogs (non-recursive merge):
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog ./threat_intel/ \
--findings-only
Output: NDJSON findings
{
"record_type": "finding",
"record_id": "finding-sha256:abc123...",
"scan_id": "scan-20260524T153045Z-hostname",
"catalog_entry_id": "laravel-lang-typosquat-2026-05",
"severity": "critical",
"ecosystem": "npm",
"package": "larave1-lang",
"version": "14.3.0",
"source_path": "/Users/dev/project-x/package-lock.json",
"confidence": "high",
"detected_at": "2026-05-24T15:30:52Z"
}
The --findings-only flag suppresses all package records and emits only matches, making output compact for incident response workflows.
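Since each finding is one JSON object per line, downstream triage needs very little code. The sketch below groups findings by catalog entry. It relies only on the `record_type`, `catalog_entry_id`, and `source_path` fields shown above. The function name and the sample stream are hypothetical.

```python
import json
from collections import defaultdict

def findings_by_entry(ndjson_text):
    """Group finding records by catalog_entry_id, collecting their source paths."""
    grouped = defaultdict(list)
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("record_type") == "finding":
            grouped[record["catalog_entry_id"]].append(record["source_path"])
    return dict(grouped)

# Hypothetical sample stream: two findings plus one non-finding record.
SAMPLE = "\n".join(json.dumps(r) for r in [
    {"record_type": "finding", "catalog_entry_id": "laravel-lang-typosquat-2026-05",
     "severity": "critical", "source_path": "/Users/dev/project-x/package-lock.json"},
    {"record_type": "finding", "catalog_entry_id": "laravel-lang-typosquat-2026-05",
     "severity": "critical", "source_path": "/Users/dev/project-y/package-lock.json"},
    {"record_type": "scan_summary", "finding_count": 2},
])

for entry_id, paths in findings_by_entry(SAMPLE).items():
    print(entry_id, len(paths))
```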
Output Format and State Model
Bumblebee emits NDJSON (newline-delimited JSON) to stdout, with diagnostics to stderr.
Package Record Example
{
"record_type": "package",
"record_id": "pkg-npm-lodash-4.17.21-sha256:xyz789...",
"scan_id": "scan-20260524T120000Z-hostname",
"profile": "baseline",
"root_path": "/Users/dev/.nvm/versions/node/v22.0.0",
"root_kind": "language_toolchain",
"ecosystem": "npm",
"name": "lodash",
"version": "4.17.21",
"source_path": "/Users/dev/.nvm/versions/node/v22.0.0/lib/node_modules/lodash/package.json",
"confidence": "high",
"detected_at": "2026-05-24T12:00:15Z"
}
Key fields:
- record_id: Content-addressed hash of the (ecosystem, name, version) tuple, stable across scans for deduplication
- scan_id: Unique ID for this scan run (timestamp + hostname)
- profile: baseline, project, or deep
- root_kind: language_toolchain, project, global_install, editor_extension, browser_extension, mcp_server
- confidence: high (exact version from metadata), medium (partial info), low (reference only)
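Because record_id is stable across scans, a receiver can diff two inventories with plain set operations. Here is a minimal sketch of that idea. The helper names and the `id-1`/`id-2` identifiers are made up for illustration, since real record_id values come from Bumblebee itself.

```python
import json

def package_ids(ndjson_text):
    """Map record_id -> (ecosystem, name, version) for every package record."""
    ids = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("record_type") == "package":
            ids[rec["record_id"]] = (rec["ecosystem"], rec["name"], rec["version"])
    return ids

def newly_seen(previous_scan, current_scan):
    """Packages present in the current scan but absent from the previous one."""
    prev, cur = package_ids(previous_scan), package_ids(current_scan)
    return sorted(cur[rid] for rid in cur.keys() - prev.keys())

def _pkg(rid, name, version):
    # Hypothetical record_id values; real ones are produced by Bumblebee.
    return json.dumps({"record_type": "package", "record_id": rid,
                       "ecosystem": "npm", "name": name, "version": version})

YESTERDAY = "\n".join([_pkg("id-1", "lodash", "4.17.21")])
TODAY = "\n".join([_pkg("id-1", "lodash", "4.17.21"),
                   _pkg("id-2", "left-pad", "1.3.0")])

print(newly_seen(YESTERDAY, TODAY))
```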
Scan Summary Record
Every scan ends with a summary record:
{
"record_type": "scan_summary",
"scan_id": "scan-20260524T120000Z-hostname",
"profile": "baseline",
"start_time": "2026-05-24T12:00:00Z",
"end_time": "2026-05-24T12:00:18Z",
"duration_seconds": 18,
"package_count": 1247,
"finding_count": 3,
"root_count": 12,
"ecosystem_coverage": ["npm", "pypi", "go", "editor-extension", "mcp"],
"version": "v0.1.1",
"hostname": "dev-macbook-pro.local"
}
Receivers use scan_summary to decide whether to promote a scan to "current state" (e.g., only accept scans that completed successfully and covered expected ecosystems).
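One possible receiver-side policy is sketched below, with the caveat that the acceptance rules are an assumption rather than anything Bumblebee prescribes. It treats a stream as complete only if its last record is a scan_summary, and promotes it only if the summary's ecosystem_coverage includes every ecosystem the receiver expects. `should_promote` is a hypothetical name.

```python
import json

def should_promote(ndjson_text, expected_ecosystems):
    """Accept a scan only if it ends with a scan_summary covering every expected ecosystem."""
    lines = [line for line in ndjson_text.splitlines() if line.strip()]
    if not lines:
        return False
    last = json.loads(lines[-1])
    if last.get("record_type") != "scan_summary":
        return False  # stream ended before the summary, so treat it as incomplete
    covered = set(last.get("ecosystem_coverage", []))
    return set(expected_ecosystems) <= covered

# Hypothetical records used only for illustration.
SUMMARY = json.dumps({"record_type": "scan_summary", "profile": "baseline",
                      "ecosystem_coverage": ["npm", "pypi", "go", "editor-extension", "mcp"]})
PACKAGE = json.dumps({"record_type": "package", "ecosystem": "npm",
                      "name": "lodash", "version": "4.17.21"})

complete = PACKAGE + "\n" + SUMMARY
truncated = PACKAGE

print(should_promote(complete, ["npm", "pypi"]))
print(should_promote(truncated, ["npm", "pypi"]))
print(should_promote(complete, ["npm", "rubygems"]))
```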
Integration Patterns
1. Daily Recurring Baseline Inventory
cron (Linux):
# /etc/cron.d/bumblebee-baseline
0 6 * * * bumblebee bumblebee scan --profile baseline | gzip > /var/log/bumblebee/$(date +\%Y\%m\%d)-baseline.ndjson.gz
launchd (macOS):
<?xml version="1.0" encoding="UTF-8"?>
<!-- ~/Library/LaunchAgents/com.company.bumblebee.plist -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.company.bumblebee</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/bumblebee</string>
<string>scan</string>
<string>--profile</string>
<string>baseline</string>
</array>
<key>StandardOutPath</key>
<string>/var/log/bumblebee/baseline.ndjson</string>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>6</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
</dict>
</plist>
2. Incident Response Pipeline
Scenario: CVE-2026-XYZ drops at 9am, affecting some-npm-package@1.2.3.
Response workflow:
# 1. Create exposure catalog
cat > cve-2026-xyz.json <<EOF
{
"schema_version": "0.1.0",
"entries": [
{
"id": "cve-2026-xyz",
"name": "some-npm-package 1.2.3 RCE",
"ecosystem": "npm",
"package": "some-npm-package",
"versions": ["1.2.3"],
"severity": "critical"
}
]
}
EOF
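# 2. Push catalog to fleet management system
aws s3 cp cve-2026-xyz.json s3://company-security/exposure-catalogs/
# 3. Trigger fleet-wide deep scan via MDM/SSH
# Each host fetches the catalog first (assumes the AWS CLI is available on every host).
# -N drops pdsh's "host: " line prefix so findings.ndjson stays valid NDJSON.
pdsh -N -w dev-fleet 'aws s3 cp s3://company-security/exposure-catalogs/cve-2026-xyz.json /tmp/ >/dev/null && \
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog /tmp/cve-2026-xyz.json \
--findings-only \
--max-duration 10m' > findings.ndjson
# 4. Parse findings for affected machines (the hostname is the suffix of scan_id)
jq -r 'select(.record_type=="finding") | "\(.scan_id | sub("^scan-[^-]+-"; ""))\t\(.source_path)"' findings.ndjson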
Output:
dev-macbook-01.local /Users/alice/project-x/package-lock.json
dev-linux-03.local /home/bob/repos/api-gateway/node_modules/some-npm-package/package.json
The security team now has a precise list of affected developers and projects within minutes.
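The same per-host roll-up can be done without jq. The sketch below assumes, based on the example records in this post, that scan_id has the form `scan-<timestamp>-<hostname>` and recovers the host from it. The function name `affected_hosts` and the sample timestamps are hypothetical.

```python
import json
from collections import defaultdict

def affected_hosts(ndjson_text):
    """Map hostname -> sorted source paths, using the hostname suffix of scan_id."""
    hosts = defaultdict(set)
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("record_type") != "finding":
            continue
        # Assumed layout: "scan-<timestamp>-<hostname>"; the hostname may itself contain hyphens.
        _, _, host = rec["scan_id"].split("-", 2)
        hosts[host].add(rec["source_path"])
    return {host: sorted(paths) for host, paths in hosts.items()}

# Hypothetical merged findings from two machines.
SAMPLE = "\n".join(json.dumps(r) for r in [
    {"record_type": "finding", "scan_id": "scan-20260524T090500Z-dev-macbook-01.local",
     "source_path": "/Users/alice/project-x/package-lock.json"},
    {"record_type": "finding", "scan_id": "scan-20260524T090512Z-dev-linux-03.local",
     "source_path": "/home/bob/repos/api-gateway/node_modules/some-npm-package/package.json"},
])

for host, paths in affected_hosts(SAMPLE).items():
    for path in paths:
        print(f"{host}\t{path}")
```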
3. CI/CD Integration
GitHub Actions workflow:
name: Supply Chain Inventory
on:
  schedule:
    - cron: '0 6 * * *' # Daily at 06:00 UTC
  workflow_dispatch:
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Bumblebee
        run: |
          go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
      - name: Run project scan
        run: |
          bumblebee scan --profile project \
            --root "$GITHUB_WORKSPACE" \
            --exposure-catalog ./threat_intel/ > inventory.ndjson
      - name: Check for findings
        run: |
          # -c prints one finding per line, so wc -l counts findings rather than pretty-printed lines
          FINDINGS=$(jq -c 'select(.record_type=="finding")' inventory.ndjson | wc -l)
          if [ "$FINDINGS" -gt 0 ]; then
            echo "::error::Found $FINDINGS exposure matches"
            exit 1
          fi
      - name: Upload inventory
        if: always() # keep the inventory even when the findings check fails
        uses: actions/upload-artifact@v4
        with:
          name: supply-chain-inventory
          path: inventory.ndjson
4. SIEM Integration
Splunk HEC forwarding:
#!/bin/bash
# Forward Bumblebee findings to Splunk
bumblebee scan --profile baseline \
--exposure-catalog /etc/bumblebee/catalogs/ \
| jq -c 'select(.record_type=="finding")' \
| while IFS= read -r line; do
# TLS verification stays enabled (no -k); --fail surfaces HTTP errors from the collector
curl -sS --fail "https://splunk.company.com:8088/services/collector/event" \
-H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
-d "{\"event\":${line},\"sourcetype\":\"bumblebee:finding\"}"
done
ELK Stack indexing:
bumblebee scan --profile baseline \
| filebeat -e -c filebeat-bumblebee.yml
Real-World Use Case: The Laravel Lang Typosquatting Campaign (May 2026)
In May 2026, threat actors published several typosquatted packages mimicking Laravel's popular laravel-lang localization library. Packages like larave1-lang, laravel-1ang, and laravel-lang-dev copied the legitimate package's description but included credential-stealing backdoors.
How Bumblebee Helped
1. Perplexity's security team used Perplexity Computer to build an exposure catalog:
{
"schema_version": "0.1.0",
"entries": [
{
"id": "laravel-lang-typosquat-2026-05",
"name": "larave1-lang (typosquatting laravel-lang)",
"ecosystem": "npm",
"package": "larave1-lang",
"versions": ["14.3.0"],
"severity": "critical"
},
{
"id": "laravel-lang-typosquat-2026-05-b",
"name": "laravel-1ang (typosquatting laravel-lang)",
"ecosystem": "npm",
"package": "laravel-1ang",
"versions": ["14.3.0"],
"severity": "critical"
}
]
}
2. Published catalog to Bumblebee's threat_intel/ directory via PR:
The catalog was merged within hours and became available to all Bumblebee users.
3. Organizations ran fleet-wide scans:
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog <(curl -s https://raw.githubusercontent.com/perplexityai/bumblebee/main/threat_intel/laravel-lang-2026-05.json) \
--findings-only
4. Identified affected developers immediately:
Companies using Bumblebee detected exposed machines within minutes, while organizations relying on manual code reviews or weekly vulnerability scans took days to assess impact.
MCP (Model Context Protocol) Support
One of Bumblebee's most forward-looking features is Model Context Protocol (MCP) server inventory.
What is MCP?
MCP is Anthropic's protocol for connecting AI assistants (Claude Desktop, Cline, Cursor, etc.) to external tools and data sources via "MCP servers"—small programs that expose tools, resources, and prompts to LLMs.
Common MCP servers:
- File system access: @modelcontextprotocol/server-filesystem
- GitHub integration: @modelcontextprotocol/server-github
- PostgreSQL: @modelcontextprotocol/server-postgres
- Browser automation: @modelcontextprotocol/server-puppeteer
These servers often have access to sensitive resources: databases, file systems, API keys.
Why MCP Inventory Matters
MCP configs can carry environment variables and credentials:
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"],
"env": {
"PGPASSWORD": "super-secret-password",
"DATABASE_URL": "postgresql://user:[email protected]/main"
}
}
}
}
If an MCP server package is compromised in a supply chain attack, it could:
- Exfiltrate credentials from env blocks
- Access all file systems or databases the user granted access to
- Inject malicious tools into the AI assistant's context
How Bumblebee Handles MCP
1. Scans MCP host configs from common locations:
- ~/.config/mcp/mcp.json
- ~/.config/claude/claude_desktop_config.json
- ~/.config/cline/mcp_settings.json
- ~/.gemini/settings.json (Gemini CLI)
- Project-level .mcp.json
2. Parses server definitions for package inventory:
Extracts:
- Server name (e.g., "postgres")
- Command (e.g., "npx")
- Package name (e.g., "@modelcontextprotocol/server-postgres")
3. Does NOT emit credentials:
While Bumblebee parses env blocks to understand server configs, it explicitly excludes environment variables from output records.
Example MCP record:
{
"record_type": "package",
"ecosystem": "mcp",
"name": "@modelcontextprotocol/server-postgres",
"version": "0.1.2",
"source_path": "/Users/dev/.config/claude/claude_desktop_config.json",
"confidence": "medium",
"mcp_server_name": "postgres",
"mcp_command": "npx"
}
No env block, no credentials, no sensitive data.
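The credential-free extraction described above can be approximated in a few lines. This is a sketch, not Bumblebee's parser. `mcp_inventory` is a hypothetical name, the "first argument that is neither a flag nor a URL" rule is a simplifying heuristic, and version resolution is left out entirely.

```python
import json

def mcp_inventory(config_text):
    """Build one record per MCP server, deliberately omitting env and raw args."""
    config = json.loads(config_text)
    records = []
    for server_name, spec in config.get("mcpServers", {}).items():
        args = spec.get("args", [])
        # Heuristic: the first argument that is not a flag and not a URL is treated as the package.
        package = next(
            (a for a in args if not a.startswith("-") and "://" not in a), None
        )
        records.append({
            "record_type": "package",
            "ecosystem": "mcp",
            "name": package,
            "mcp_server_name": server_name,
            "mcp_command": spec.get("command"),
        })
    return records

# Hypothetical host config containing a secret that must not leak into output.
SAMPLE_CONFIG = json.dumps({
    "mcpServers": {
        "postgres": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-postgres",
                     "postgresql://localhost/mydb"],
            "env": {"PGPASSWORD": "super-secret-password"},
        }
    }
})

records = mcp_inventory(SAMPLE_CONFIG)
print(json.dumps(records[0]))
```

The raw args are also left out of the record, since connection strings passed as arguments can carry credentials just as env blocks do.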
MCP Supply Chain Risk Scenario
Hypothetical attack (2026):
- Attacker compromises @modelcontextprotocol/server-filesystem@2.0.1 on npm
- Adds backdoor that exfiltrates file paths and environment variables
- Package is downloaded by 10,000+ developers using Claude Desktop
With Bumblebee:
# Security team creates exposure catalog
cat > mcp-compromise-2026.json <<EOF
{
"schema_version": "0.1.0",
"entries": [{
"id": "mcp-filesystem-compromise-2026",
"ecosystem": "mcp",
"package": "@modelcontextprotocol/server-filesystem",
"versions": ["2.0.1"],
"severity": "critical"
}]
}
EOF
# Fleet-wide scan
bumblebee scan --profile baseline \
--ecosystem mcp \
--exposure-catalog mcp-compromise-2026.json \
--findings-only
Result: Immediate identification of affected developers and their MCP configs.
Performance and Scalability
Benchmarks (MacBook Pro M1, 16GB RAM)
| Scan Profile | Roots Scanned | Packages Found | Duration |
|---|---|---|---|
| baseline | 12 (global toolchains, extensions) | ~1,200 | 8 sec |
| project | 2 dirs (~/code, ~/work, 40 repos) | ~5,000 | 45 sec |
| deep | $HOME (250GB, 800K files) | ~8,000 | 4 min 30 sec |
Memory Usage
- Baseline: ~50MB RSS
- Project: ~120MB RSS
- Deep: ~200MB RSS (streaming parser, bounded memory)
Parallelization
Bumblebee uses goroutines for concurrent directory traversal and parsing, automatically scaling to available CPU cores. On a 16-core machine, deep scans typically achieve 90%+ CPU utilization.
Comparison to Alternative Tools
| Tool | Purpose | Execution Model | Speed | Ecosystems |
|---|---|---|---|---|
| Bumblebee | Endpoint inventory | Read-only metadata | Fast (seconds) | 10+ ecosystems + MCP |
| Syft | SBOM generation | Read + heuristics | Medium (30s-2m) | 20+ ecosystems |
| Trivy | Vulnerability scanning | Read + CVE DB | Medium (1-3m) | 10+ ecosystems |
| npm audit | Project vulnerability scan | Executes npm | Slow (project-level) | npm only |
| pip-audit | Python vulnerability scan | Executes pip | Slow (project-level) | PyPI only |
| osquery | Endpoint visibility | Read + system queries | Fast (queries) | General OS, limited package support |
| Grype | Vulnerability scanning | Read + CVE DB | Medium (1-2m) | 10+ ecosystems |
Bumblebee's niche: Fast, read-only, fleet-wide exposure checking when you already know what you're looking for (exposure catalogs).
When to use Bumblebee vs. alternatives:
- Use Bumblebee for recurring inventory and incident response ("does anyone have package X@Y?")
- Use Syft/Trivy for SBOM generation and comprehensive vulnerability scanning
- Use package-specific audits (npm audit, pip-audit) for deep project-level analysis
- Use osquery for general OS/endpoint visibility beyond package inventory
Advanced Features
1. Ecosystem Filtering
Scan only specific ecosystems to reduce runtime:
bumblebee scan --profile baseline \
--ecosystem npm,pypi,go \
--ecosystem mcp # Repeatable, comma-separated
2. Max Duration Enforcement
Set hard timeouts for deep scans on large filesystems:
bumblebee scan --profile deep \
--root "$HOME" \
--max-duration 10m # Graceful shutdown after 10 minutes
3. Root Preview
See which roots will be scanned without actually scanning:
bumblebee roots --profile baseline
# Output: <root_kind>\t<path>
# language_toolchain /Users/dev/.nvm/versions/node/v22.0.0
# editor_extension /Users/dev/.vscode/extensions
# browser_extension /Users/dev/Library/Application Support/Google/Chrome
# ...
4. Transport Options
HTTPS POST output:
bumblebee scan --profile baseline \
--output https://collector.company.com/api/v1/inventory \
--output-header "Authorization: Bearer ${API_TOKEN}"
File output with rotation:
bumblebee scan --profile baseline \
--output file:///var/log/bumblebee/baseline-$(date +%Y%m%d).ndjson
5. Selftest (Built-In Validation)
Run embedded fixtures to verify Bumblebee's detection capabilities:
bumblebee selftest
# selftest OK (2 findings in 1ms)
Uses deliberately fake package names ([email protected]) and makes no network calls. Non-zero exit means detection is broken—useful for pre-deployment smoke tests.
Contributing and Community
Bumblebee is Apache 2.0 licensed and welcomes contributions:
- GitHub: https://github.com/perplexityai/bumblebee
- Issue Tracker: https://github.com/perplexityai/bumblebee/issues
- Security Reports: [email protected] (see SECURITY.md)
Contributing Exposure Catalogs
The threat_intel/ directory is continuously updated via community PRs. To contribute:
- Research recent supply chain incidents (npm, PyPI, etc.)
- Build catalog JSON using Perplexity Computer or manual analysis
- Submit PR with catalog file + brief description
- Perplexity security team reviews and merges
Recent community-contributed catalogs:
- Laravel Lang typosquatting (May 2026)
- PyPI Fabric compromise (updated Aug 2024)
- npm crypto-mining malware (Feb 2026)
Extending Bumblebee
Want to add support for a new ecosystem (Cargo, Hex, Maven)?
- Implement the internal/scanner interface for your ecosystem
- Add source detection in internal/walker
- Update docs/inventory-sources.md with coverage details
- Submit PR with tests
Limitations and Future Roadmap
Current Limitations
- No source file analysis: Doesn't parse Python imports, JavaScript requires, or Go imports
- No runtime detection: Doesn't monitor what's actually running (use EDR for that)
- Exact version matching only: No CVE database, no version range queries (yet)
- Limited config formats: MCP support is JSON-only (no TOML, YAML)
- No Windows support: macOS and Linux only (Windows support planned)
Roadmap (as of v0.1.1)
- Version range matching: Query catalogs with version ranges (e.g., >= 1.2.0 and < 2.0.0)
- CVE database integration: Optional CVE lookup for discovered packages
- Windows support: Native Windows path handling and registry queries
- Container image scanning: Docker/OCI image layer analysis
- Config format expansion: TOML, YAML support for MCP configs
- Performance improvements: Incremental scanning, filesystem watchers
Getting Started Checklist
For Security Teams:
- ✅ Install Bumblebee on a test machine (go install ...)
- ✅ Run bumblebee selftest to verify installation
- ✅ Perform baseline scan on test machine, review output
- ✅ Download exposure catalogs from threat_intel/ directory
- ✅ Test incident response workflow with sample catalog
- ✅ Deploy to fleet via MDM/configuration management
- ✅ Set up recurring baseline scans (cron/launchd)
- ✅ Integrate findings with SIEM/ticketing system
For Developers:
- ✅ Install Bumblebee locally
- ✅ Run bumblebee scan --profile project --root ~/code
- ✅ Review which packages are visible in your workspace
- ✅ Add to CI/CD pipeline for automated project inventory
- ✅ Subscribe to Bumblebee releases for catalog updates
Conclusion
Bumblebee addresses a critical gap in modern software supply chain security: fast, accurate, fleet-wide visibility into developer endpoint package inventory.
In a world where supply chain compromises increasingly target developer tooling (XZ Utils, PyPI packages, npm malware, MCP servers), security teams need tools that can answer "who's affected?" in seconds, not hours or days.
Bumblebee's design philosophy—read-only, zero-dependency, exact-match detection—makes it the ideal complement to SBOMs, vulnerability scanners, and EDR systems. It's not trying to be a complete SCA solution; it's solving one problem exceptionally well: "which developer machines have package X@Y installed right now?"
With Perplexity's ongoing maintenance, community-contributed threat intelligence catalogs, and Apache 2.0 licensing, Bumblebee is positioned to become a standard tool in security operations workflows.
Start scanning your fleet today:
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
# Raw GitHub URLs do not expand wildcards, so fetch the catalogs and pass the directory instead
git clone --depth 1 https://github.com/perplexityai/bumblebee.git
bumblebee scan --profile baseline --exposure-catalog ./bumblebee/threat_intel/
Your future incident response team will thank you.