
CocoIndex: incremental indexing for always-fresh agent and RAG context

CocoIndex (Apache-2.0) pairs a Rust core with a Python API to push incremental embedding deltas into Postgres for agent RAG. Install with pip install cocoindex; source at github.com/cocoindex-io/cocoindex.

12 min read · Yash Thakker
Tags: CocoIndex, RAG, Agents, Data engineering

CocoIndex targets teams whose RAG indexes and agent memory go stale between batch jobs: declare flows in Python, backfill once, then recompute only the deltas instead of re-embedding the entire corpus each night.

License: Apache-2.0 (README). Implementation: Rust core with Python 3.10–3.13 APIs.

TL;DR

| Topic | Takeaway |
| --- | --- |
| Problem | Stale vectors and expensive full rebuilds when sources churn |
| Fix | Incremental graph: track changes, propagate, retire stale rows (vendor language) |
| API | Declarative Python: @coco.fn, connectors (localfs, postgres, …), vector targets |
| Skill | Repo advertises a CocoIndex skill for coding agents |
| Quick start | pip install -U cocoindex + README snippet |

The problem: why batch embedding pipelines fail agents

Traditional data pipelines for RAG applications work like this: scrape or export your corpus, chunk documents, generate embeddings, load into a vector database, then schedule the whole process to run nightly or weekly. This pattern worked well when knowledge bases changed slowly and computational resources were cheap relative to developer time.

But modern AI agent systems operate under different constraints. Documentation repositories receive dozens of commits daily. Customer support knowledge bases update hourly. Engineering wikis evolve with every sprint. Code repositories that inform developer agents change with every merge to main.

When your embedding pipeline runs every 24 hours, any change that lands just after a run stays invisible to your LLM for almost a full day. When a critical bug fix lands in your documentation, agents continue citing the old approach until the next batch completes. When a pricing page changes, sales agents give incorrect quotes. When security policies update, compliance agents miss the memo.

The alternative—running full re-embeddings every hour or every commit—explodes your infrastructure costs. Embedding models process millions of tokens per run. Vector indexes require full rebuilds. Database writes generate lock contention. A codebase with 100,000 files might cost hundreds of dollars per full refresh when you factor in API calls to embedding providers.

This is the tension CocoIndex resolves: how do you keep agent context continuously fresh without continuously re-processing unchanged data?


Mental model: incremental computation for embeddings

The README walks a canonical flow:

  • walk docs on disk → chunk (RecursiveSplitter) → embed → write Postgres rows + vector index → App(...).update_blocking() on change.

But the magic is in what happens on the second run. CocoIndex builds a directed acyclic graph of your data pipeline where each node tracks:

  1. Input fingerprints — content-addressed hashes of source files, configurations, and function code
  2. Output artifacts — embedded vectors, database rows, indexes
  3. Dependency edges — which downstream nodes depend on which upstream changes
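
To make the dependency-edge idea concrete, here is a minimal sketch of dirty propagation over a tiny graph. The node names, the EDGES table, and the downstream_of helper are all hypothetical; they illustrate the traversal only and are not CocoIndex's internal data structures.

```python
from collections import deque

# Hypothetical pipeline graph: each key feeds the nodes listed as its values.
EDGES = {
    "file:a.md": ["chunks:a.md"],
    "file:b.md": ["chunks:b.md"],
    "chunks:a.md": ["vectors:a.md"],
    "chunks:b.md": ["vectors:b.md"],
    "vectors:a.md": [],
    "vectors:b.md": [],
}


def downstream_of(changed, edges):
    """Return every node reachable from the changed nodes (breadth-first)."""
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in dirty:
                dirty.add(child)
                queue.append(child)
    return dirty


# Only a.md changed, so nothing derived from b.md is marked dirty.
dirty_nodes = downstream_of(["file:a.md"], EDGES)
print(sorted(dirty_nodes))
```

Everything outside the returned set can be reused as-is, which is where the savings come from.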

When you call update_blocking() again after modifying three files in a 10,000-file corpus, CocoIndex:

  • Detects the three changed files via filesystem metadata and content hashing
  • Propagates those changes through the DAG to find affected chunks
  • Re-embeds only the modified chunks (maybe 50 total if the files were long)
  • Updates only the corresponding rows in Postgres
  • Marks stale embeddings for cleanup
  • Leaves 99.5% of your index untouched
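
The detection step can be sketched with nothing but the standard library. This is an illustration under simplifying assumptions (an in-memory dict stands in for the filesystem and for the stored fingerprints; fingerprint and diff_snapshots are hypothetical names), not how CocoIndex implements it.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content-addressed fingerprint: identical bytes give identical hashes."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(previous: dict, current_files: dict):
    """Compare stored fingerprints with the current corpus.

    previous:      path -> fingerprint recorded on the last run
    current_files: path -> file bytes seen on this run
    Returns (changed_or_new_paths, removed_paths, new_snapshot).
    """
    current = {path: fingerprint(data) for path, data in current_files.items()}
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current


# First run: nothing recorded yet, so every file counts as changed.
files = {"a.md": b"alpha", "b.md": b"beta", "c.md": b"gamma"}
changed, removed, snapshot = diff_snapshots({}, files)

# Second run: one edit (b.md) and one deletion (c.md).
files = {"a.md": b"alpha", "b.md": b"beta v2"}
changed2, removed2, snapshot = diff_snapshots(snapshot, files)
print(changed2, removed2)  # ['b.md'] ['c.md']
```

Changed paths feed the re-chunk and re-embed steps; removed paths identify the rows that should be retired.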

@coco.fn(memo=True) memoizes on a hash of the inputs plus the function's code, so unchanged paths skip work (confirm the exact semantics in the current docs before production). The decorator goes beyond a simple result cache: it behaves like content-addressable storage for pipeline stages. If your chunking function changes, CocoIndex detects the changed code hash and re-chunks the affected documents even if the source files have not changed.
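
A toy version of "memoize by hashed inputs plus code" looks roughly like this. It is a sketch of the general technique, not of @coco.fn itself: the memo decorator, the in-process _cache dict, and the use of raw bytecode as the code fingerprint are all assumptions made for illustration.

```python
import hashlib
import json

_cache = {}            # key -> cached result (in-memory stand-in for durable storage)
calls = {"count": 0}   # counts how often the wrapped function really runs


def memo(fn):
    """Cache results under a key built from the function's bytecode and arguments."""
    def wrapper(*args):
        # Simplification: bytecode alone ignores constants and closure captures,
        # so a real implementation needs a richer code fingerprint.
        material = fn.__code__.co_code + json.dumps(args, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(material).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args)
        return _cache[key]
    return wrapper


@memo
def chunk(text, size):
    calls["count"] += 1
    return [text[i:i + size] for i in range(0, len(text), size)]


first = chunk("abcdefgh", 3)
second = chunk("abcdefgh", 3)  # same code, same inputs: served from the cache
third = chunk("abcdefgh", 4)   # different input: recomputed
print(calls["count"])  # 2
```

Because the code fingerprint is part of the key, editing the function body produces new keys, and the old results simply stop being hit.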

Conceptually this is incremental data engineering aimed specifically at embedding workloads rather than at generic ETL. You can think of it as "make for machine learning pipelines" or "git for vector databases". The core insight is that most stages in these pipelines are pure functions of their inputs, so aggressive memoization with proper invalidation avoids most redundant work.


Architecture: Rust for speed, Python for ergonomics

The repository architecture reflects a pragmatic split:

Rust core handles the performance-critical work:

  • File watching and change detection
  • Content hashing and fingerprinting
  • DAG construction and traversal
  • Parallel execution scheduling
  • Low-level database operations

Python API provides the developer interface:

  • Declarative pipeline definitions via decorators
  • Integration with popular ML libraries (LangChain, LlamaIndex, sentence-transformers)
  • Connector ecosystem (local filesystem, S3, Postgres, Pinecone, Weaviate)
  • Notebook-friendly iteration loops

This is the same pattern used by projects like Polars (Rust + Python) and Pydantic (Rust + Python via pydantic-core). Python wins for expressiveness and ecosystem integration. Rust wins for throughput and memory efficiency. The Rust implementation means CocoIndex can handle repositories with millions of files without the garbage collection pauses or memory overhead that plague pure-Python data tools.

From a systems perspective, the interesting challenge is correct invalidation. CocoIndex must detect:

  • Source file changes (easy: mtime and content hash)
  • Configuration changes (medium: hash config objects)
  • Code changes (hard: hash function bytecode and closure captures)
  • Dependency changes (harder: track pip environment, model versions)

The README shows @coco.fn(memo=True) doing this work automatically, but production users should understand the cache-invalidation surface area. If you swap embedding models or upgrade LangChain, you may need to manually trigger a full refresh or use version pins in your pipeline definition.
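
One way to reason about that surface area is to fold every invalidating input into the cache key explicitly. The sketch below assumes you track a code version string and a model name yourself; cache_key and its parameters are hypothetical and not part of the CocoIndex API.

```python
import hashlib
import json


def cache_key(content: str, config: dict, code_version: str, model: str) -> str:
    """Combine every input that should invalidate an embedding into one key."""
    payload = json.dumps(
        {"content": content, "config": config, "code": code_version, "model": model},
        sort_keys=True,  # stable ordering so equal inputs always hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = cache_key("hello", {"chunk_size": 512}, "v1", "text-embedding-3-small")
same = cache_key("hello", {"chunk_size": 512}, "v1", "text-embedding-3-small")
new_model = cache_key("hello", {"chunk_size": 512}, "v1", "text-embedding-3-large")
new_config = cache_key("hello", {"chunk_size": 256}, "v1", "text-embedding-3-small")

print(base == same, base == new_model, base == new_config)  # True False False
```

Anything left out of the key, such as an upgraded library that changes tokenization, will not trigger recomputation, which is exactly the case where a manual full refresh is needed.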


Where it sits in a stack

In a modern LLM application architecture, data flows through several layers:

  • LLM — reasoning and generation (GPT-4, Claude, Llama)
  • MCP / tools — live actions that fetch current state (API calls, database queries, web search)
  • CocoIndex — durable, reviewable pipelines into a target warehouse (often Postgres + pgvector-class indexes)
  • Vector search — retrieval that conditions the LLM prompt
  • Application layer — orchestration, auth, UI

CocoIndex sits between raw data sources and the vector search layer. It is not a vector database (it writes to your choice of Postgres, Pinecone, Weaviate, etc.). It is not a runtime retrieval system (pair it with LangChain, LlamaIndex, or custom retrieval logic). It is the ingestion and refresh engine that keeps those downstream systems fed with current embeddings.

This separation of concerns matters for operational maturity. Your vector database handles query-time concerns: similarity search, filtering, scaling concurrent reads. CocoIndex handles write-time concerns: staleness, cost, consistency. You can swap one without touching the other.

Pair CocoIndex with context-mode for chat-side tool spam and with MCP for runtime tools; each covers a different slice of the problem. Context-mode addresses "how do I keep massive tool outputs from flooding my chat transcript?" CocoIndex addresses "how do I keep my knowledge base current without re-processing everything?"


Use cases and when incremental indexing matters

High-frequency documentation sites: Developer portals that publish multiple times daily benefit immediately. Each git push triggers a CocoIndex update that re-embeds only changed pages. Your docs chatbot serves current content within minutes, not hours.

Living codebases for AI dev tools: GitHub Copilot competitors and internal code search tools need indexes that track HEAD, not last night's snapshot. CocoIndex watches your monorepo and updates embeddings for modified files as developers commit. Agents retrieve current implementation patterns, not deprecated code.

Customer support knowledge bases: Help center articles change based on product updates, A/B tests, and seasonal campaigns. Traditional batch jobs mean support agents see outdated troubleshooting steps until the next ETL window. Incremental updates propagate changes in near-real-time.

Multi-tenant SaaS with user-uploaded content: When customers upload documents, presentations, or spreadsheets into your AI application, they expect immediate availability in chat. CocoIndex lets you embed new content without triggering full-tenant re-indexing.

Compliance and audit trails: The incremental graph gives you a queryable history of what changed, when, and why. You can trace which source file caused a specific embedding to update, critical for debugging wrong retrievals or satisfying audit requirements.

Cost optimization at scale: A 100GB corpus might cost 500 dollars to fully embed via OpenAI's API. With 1% daily churn, incremental updates cost 5 dollars per day instead of 500 dollars for nightly full refreshes. Over a year, that is 1,825 dollars versus 182,500 dollars—a 100x difference.
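
The arithmetic behind those figures is simple enough to check directly. The sketch below uses the illustrative numbers from the paragraph above (500 dollars per full embed, 1% daily churn); yearly_cost is a hypothetical helper, and real pricing depends on your provider and token counts.

```python
def yearly_cost(full_embed_cost: float, daily_churn: float, days: int = 365):
    """Return (incremental, full_refresh) cost in dollars over the given days."""
    incremental = full_embed_cost * daily_churn * days  # re-embed only the churned share
    full_refresh = full_embed_cost * days               # re-embed everything each night
    return incremental, full_refresh


incremental, full = yearly_cost(500, 0.01)
ratio = full / incremental
print(round(incremental), round(full), round(ratio))
```

The comparison leaves out the one-time backfill. Adding that 500 dollars to the incremental side gives 2,325 dollars, which is still roughly 78 times cheaper than nightly full refreshes.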


Integration patterns and connector ecosystem

CocoIndex provides first-class connectors for common sources and targets:

Sources:

  • localfs — local directories with inotify-style watching
  • s3 — S3 buckets with change detection via ETags and versioning
  • github — repositories via webhooks or polling
  • notion — Notion workspaces via their API
  • confluence — Atlassian Confluence spaces

Chunking strategies:

  • RecursiveSplitter — recursive character splitting (LangChain-compatible)
  • SemanticSplitter — split on semantic boundaries using smaller models
  • CodeSplitter — language-aware splitting for code (respects function/class boundaries)
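
For readers unfamiliar with recursive character splitting, here is a minimal standalone sketch of the idea: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. The recursive_split function is hypothetical and much simpler than any production splitter; it is not CocoIndex's RecursiveSplitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(head):
        # Try to merge this part into the chunk being built.
        candidate = part if not current else current + head + part
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(part, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("para one\n\npara two is longer\n\nx", 12)
print(pieces)  # ['para one', 'para two is', 'longer', 'x']
```

Stable chunk boundaries matter for incremental indexing: if a small edit shifts every boundary in a file, every chunk in that file has to be re-embedded.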

Embedding providers:

  • OpenAI — text-embedding-3-small, text-embedding-3-large
  • Cohere — embed-english-v3.0, embed-multilingual-v3.0
  • VoyageAI — voyage-2, voyage-code-2
  • HuggingFace — any sentence-transformers model
  • Local — run models on your infrastructure (useful for PII-sensitive data)

Targets:

  • postgres — Postgres + pgvector extension
  • pinecone — Pinecone vector database
  • weaviate — Weaviate
  • qdrant — Qdrant
  • chroma — ChromaDB
  • milvus — Milvus

The connector pattern follows a simple interface: sources yield records, transforms map records to records, targets consume records. This composability means you can mix and match—read from S3, chunk with semantic splitting, embed with Cohere, write to Qdrant—all while preserving incremental semantics.
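
The "sources yield, transforms map, targets consume" shape can be sketched with plain generators. Every name below (Record, list_source, upper_transform, DictTarget, run) is hypothetical and chosen for illustration; CocoIndex's real connector classes will differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Record:
    key: str
    text: str


def list_source(items: dict) -> Iterator[Record]:
    """Source: yields one record per item."""
    for key, text in items.items():
        yield Record(key, text)


def upper_transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transform: maps records to records (a stand-in for chunk + embed)."""
    for record in records:
        yield Record(record.key, record.text.upper())


class DictTarget:
    """Target: consumes records and stores them by key."""

    def __init__(self):
        self.rows = {}

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.rows[record.key] = record.text
            count += 1
        return count


def run(source, transforms, target) -> int:
    """Chain the transforms lazily over the source, then drain into the target."""
    stream = source
    for transform in transforms:
        stream = transform(stream)
    return target.write(stream)


target = DictTarget()
written = run(list_source({"a": "hello", "b": "world"}), [upper_transform], target)
print(written, target.rows)
```

Because each stage only sees an iterable of records, swapping the source or target does not require touching the transforms.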


Operational considerations for production

Concurrency and parallelism: The Rust core schedules independent DAG nodes in parallel. If you have 50 changed files and 8 CPU cores, CocoIndex processes chunks across all cores. This is not Python multiprocessing with pickle overhead; it is native Rust threads with zero-copy message passing.

Error handling and retries: Embedding API calls can fail. Networks partition. Databases reject writes during maintenance windows. CocoIndex tracks partial progress so failures do not corrupt your index or force full re-runs. Failed chunks get retried with exponential backoff; successful chunks stay committed.
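
Retry with exponential backoff is a general pattern worth knowing even if the tool handles it for you. The sketch below is a generic illustration, not CocoIndex's retry logic; retry_with_backoff, its defaults, and the flaky_embed stand-in are all hypothetical.

```python
import time


def retry_with_backoff(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake embedding call that fails twice, then succeeds.
state = {"calls": 0}
delays = []  # records the requested waits instead of actually sleeping


def flaky_embed():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return [0.1, 0.2, 0.3]


result = retry_with_backoff(flaky_embed, sleep=delays.append)
print(result, delays)  # [0.1, 0.2, 0.3] [0.5, 1.0]
```

In practice you would catch only errors known to be transient and add jitter so that many workers do not retry in lockstep.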

Storage overhead: The incremental graph metadata lives in a local database (SQLite by default, Postgres for distributed setups). For a 100k-file repository, expect a few hundred MB of metadata. This is small compared to the vector storage itself but means your pipeline is not stateless—treat the metadata DB as part of your backup strategy.

Version pinning and reproducibility: Production pipelines should pin CocoIndex versions, Python dependencies, and embedding model versions. The examples/ tree in the repository shows recommended patterns for deterministic builds and containerization.

Monitoring and observability: CocoIndex exposes metrics for:

  • Files scanned vs files changed vs chunks re-embedded
  • Embedding API latency and cost
  • Database write throughput
  • Cache hit rates on memoized functions

Integrate these with your existing observability stack (Prometheus, Datadog, etc.) to track pipeline health and cost trends.
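
If you want to derive ratios from those counters yourself, a small container is enough. The RunStats class and its field names are hypothetical; they mirror the metrics listed above rather than any documented CocoIndex export format.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    files_scanned: int = 0
    files_changed: int = 0
    chunks_reembedded: int = 0
    cache_hits: int = 0
    cache_misses: int = 0

    @property
    def cache_hit_rate(self) -> float:
        """Share of memoized calls served from cache (0.0 when there were none)."""
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    @property
    def change_ratio(self) -> float:
        """Share of scanned files that actually changed."""
        return self.files_changed / self.files_scanned if self.files_scanned else 0.0


stats = RunStats(
    files_scanned=10_000,
    files_changed=3,
    chunks_reembedded=50,
    cache_hits=9_997,
    cache_misses=3,
)
print(f"hit rate {stats.cache_hit_rate:.2%}, change ratio {stats.change_ratio:.2%}")
```

A sudden drop in hit rate with no matching rise in changed files usually points to an unintended invalidation, such as a code or config change that altered every cache key.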


CocoIndex skill for AI coding agents

The README advertises a bundled skill for AI coding agents to emit correct v1 declarations. This is a meta capability: you use an AI agent to configure the very pipeline that will serve context to future agents.

The skill provides:

  • Schema-aware Python code generation for pipeline definitions
  • Validation of connector configurations
  • Best-practice templates for common source/target combinations
  • Debugging helpers for failed pipelines

To install: follow the "Use with AI coding agents" section in the repository. The skill integrates with Claude Code, Cursor, and other MCP-compatible environments. You describe your data sources in natural language; the skill generates a working CocoIndex pipeline with appropriate connectors, chunking strategies, and error handling.

This closes the loop: agents consume fresh context from CocoIndex-managed indexes, and agents help developers build those indexes correctly in the first place.


Enterprise vs open-source considerations

The repository is Apache-2.0 licensed, which means you can:

  • Use it in commercial products without fees
  • Modify it for internal needs
  • Run it at any scale on your infrastructure

The marketing copy separates an open core from an Enterprise tier aimed at larger corpora and support. Potential enterprise features might include:

  • Distributed execution across clusters (vs single-machine parallelism)
  • Advanced access controls and multi-tenancy
  • SLAs and dedicated support channels
  • Pre-built connectors for enterprise systems (SAP, Salesforce, etc.)

Evaluate against your scale and compliance needs directly with the vendor docs. For most teams processing under a million documents, the open-source version is fully production-capable. The Rust performance envelope is wide enough that single-machine deployments scale surprisingly far.


Alternatives and competitive landscape

Vector database native ingestion: Pinecone, Weaviate, and others offer their own ingestion APIs. These work well for initial loads but typically lack sophisticated incremental update strategies. You either upsert everything or manually track changes in application code.

LangChain document loaders: LangChain provides loaders for many sources but does not solve the staleness problem. Each run reprocesses everything unless you bolt on custom change detection.

Unstructured.io: Focused on parsing complex document formats (PDFs, scans, tables). Pairs well with CocoIndex—use Unstructured for parsing, CocoIndex for incremental indexing.

Airbyte / Fivetran: General ETL tools that can move data but are not optimized for embedding workloads. High overhead for simple use cases; useful when you need 300+ connectors and enterprise governance.

Custom scripts: Many teams start with cron + Python scripts that embed everything nightly. This works until it does not (costs, staleness, operational burden). CocoIndex is the "graduate from scripts" tier.


Getting started: quickstart walkthrough

Install via pip:

pip install -U cocoindex

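Minimal example (a README-style sketch; module paths and helper names here are illustrative and may not match the current release, so check the README before copying):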

import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.transforms import chunk, embed

# Mount a local directory
source = localfs.mount("./docs")

# Define memoized transform
@coco.fn(memo=True)
async def process(record):
    chunks = chunk.recursive(record.content, chunk_size=512)
    embeddings = await embed.openai(chunks, model="text-embedding-3-small")
    return embeddings

# Mount Postgres target
target = postgres.mount("postgresql://localhost/vectors")

# Build app and run
app = coco.App(source=source, transform=process, target=target)
app.update_blocking()

On first run, this processes all files in ./docs. On subsequent runs, only changed files get re-embedded. The memoization is automatic; the DAG is implicit in the function calls.

For production, add error handling, monitoring, and configuration management. The repository examples/ tree shows patterns for:

  • Multi-source pipelines (combine docs + code + web scrapes)
  • Custom chunking strategies
  • Hybrid search (sparse + dense embeddings)
  • Multi-tenancy with isolated indexes


API shapes and connector lists evolve. Treat this as a snapshot of the README as of May 6, 2026, and re-read upstream before production cutovers.
