Performance & Scaling
indx is built to turn whole document estates — not just a handful of files — into a knowledge space. This guide shows how to keep large runs fast and memory-stable by tuning the right lever at the right stage: parallelism for parsing, batching for embedding, bounded concurrency for enrichment, and content-addressed caching for cheap re-runs.
The headline targets indx is engineered against:
- Time-to-first-space: under 60 seconds on a small directory (~10 docs) on a typical laptop with defaults.
- Scale: directories of 10k+ files processed crash-free in over 99% of runs, streaming rather than loading the whole estate into memory.
- Memory: a 2 GB folder must not require 2 GB of RAM — files are processed as a stream, not materialised all at once.
The mental model: one lever per stage
Each of the six stages (Walk → Parse → Chunk → Relate → Enrich → Embed+Pack) has a different workload profile, so there is no single concurrency knob. indx applies a stage-appropriate strategy:
| Stage | Workload | Strategy | Default |
|---|---|---|---|
| 01 Walk | I/O, CPU | Single-pass; may parallelise per-folder traversal | — |
| 02 Parse | Blocking native / CPU-bound | Worker pool across files (thread pool; process pool for GIL-bound parsers) | --jobs (CPU count) |
| 03 Chunk | CPU-bound | Single-pass | — |
| 04 Relate | CPU-bound | Single-pass | — |
| 05 Enrich | Network/model-bound (LLM/VLM) | Bounded concurrency, per-provider rate-limit aware | concurrency 4 |
| 06 Embed+Pack | Model + store I/O | Batched embed + batched Store.upsert | batch 64 |
Two principles run through this table:
- Parse is embarrassingly parallel across files. A worker pool of --jobs runs Parser.parse concurrently and merges results into the context keyed by doc_id. Parsers that hold native or GIL-bound resources (such as Docling) may run in a process pool instead of a thread pool.
- Batching beats parallelism for embeddings. Local embedders like bge-m3 are far more efficient on batches (CPU/GPU vectorisation), and vector-store upserts are batched to amortise round-trips. This is the single biggest performance lever, and it applies whether the embedder is local or a cloud API.
For network-bound cloud calls (a hosted LLM, a remote embedder, a server-mode store), indx uses asyncio with bounded concurrency and per-provider rate limiting, so latency is hidden without spawning threads. The default profile is deliberately conservative and bounded so a build on a laptop never exhausts memory or hammers a rate-limited API.
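The bounded-concurrency idea can be sketched with a plain asyncio semaphore. This is a minimal illustration, not indx's internal implementation: `enrich_one` and `enrich_all` are hypothetical names, the sleep stands in for a hosted model call, and per-provider rate limiting is left out.

```python
import asyncio


async def enrich_one(chunk: str) -> str:
    # Hypothetical stand-in for a hosted LLM call; not part of the indx API.
    await asyncio.sleep(0.01)  # simulate network latency
    return chunk.upper()


async def enrich_all(chunks: list[str], max_concurrency: int = 4) -> list[str]:
    # The semaphore caps how many calls are in flight at any moment.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(chunk: str) -> str:
        async with sem:
            return await enrich_one(chunk)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(c) for c in chunks))


if __name__ == "__main__":
    print(asyncio.run(enrich_all(["alpha", "beta", "gamma"])))
```

The default of 4 mirrors the Enrich concurrency listed in the table above. All calls are scheduled up front, but only `max_concurrency` of them hold the semaphore at once, so waiting on the network overlaps without any extra threads.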
Batching & concurrency parameters
These are the defaults; each is overridable via adapter sub-tables in indx.toml or component kwargs in the SDK.
| Stage | Parameter | Default |
|---|---|---|
| Embed | batch size | 64 |
| Embed | max concurrency | --jobs |
| Enrich | max concurrency | 4 |
| Parse | workers | --jobs |
--jobs (alias -j) defaults to the CPU count and controls both parse workers and embed concurrency:
```bash
# Use 8 parse workers / embed concurrency
indx ./docs --out ./ai-ready --jobs 8
```

Caching & resume
indx keeps a content-addressed cache under <out>/.indx-cache/. Each entry is keyed on:

```
(stage, sha256(input), component-id, relevant-config)
```

Pass --resume to reuse any cache entry whose key is unchanged, skipping recomputation for unmodified files and unchanged configuration:

```bash
indx ./docs --out ./ai-ready --resume
```

Because the key includes the component identity and the relevant config, invalidation is precise and scoped to what actually changed:
| You change… | Invalidates |
|---|---|
| The parser (--parser) | Parse and everything downstream |
| The embedder (--embedder) | Only Embed |
| A single source file | That file’s entries (and their downstream) |
| Nothing | Nothing — the whole run is cache hits |
This is why re-running over a large estate after a small edit is cheap: only the touched files and the stages affected by your change are recomputed. The cache also makes stages idempotent — re-running a stage on its own output never duplicates work or corrupts state.
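To make the invalidation rules concrete, here is a small sketch of how a key of the shape (stage, sha256(input), component-id, relevant-config) could be derived. The function name `cache_key`, the separator, the JSON serialisation of the config, and the example config values are all assumptions for illustration; this guide does not specify indx's actual key encoding or on-disk layout.

```python
import hashlib
import json


def cache_key(stage: str, data: bytes, component_id: str, config: dict) -> str:
    """Derive a deterministic key from stage, input digest, component and config."""
    input_digest = hashlib.sha256(data).hexdigest()
    # sort_keys makes the serialised config independent of dict ordering.
    config_blob = json.dumps(config, sort_keys=True, separators=(",", ":"))
    # Join with a separator that cannot appear in the individual fields here.
    material = "\x1f".join([stage, input_digest, component_id, config_blob])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


doc = b"quarterly report"
base = cache_key("embed", doc, "bge-m3", {"normalize": True})

# Same inputs give the same key, so a resumed run can treat it as a hit.
assert base == cache_key("embed", doc, "bge-m3", {"normalize": True})
# Swapping the component or editing the content changes the key (a miss).
assert base != cache_key("embed", doc, "other-embedder", {"normalize": True})
assert base != cache_key("embed", b"edited report", "bge-m3", {"normalize": True})
```

Because each of the four components feeds the hash, changing any one of them yields a new key while leaving the entries for untouched files and stages valid.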
In --verbose mode, cache hits and misses are reported per stage, so you can confirm a resume is doing what you expect:
```bash
indx ./docs --out ./ai-ready --resume --verbose
```

Streaming & memory rules
The performance contract is that indx streams the estate rather than materialising it. Walk and the downstream stages process files as an iterator, holding the working set — not the whole directory — in memory. Files are read incrementally where the parser allows, and large intermediate buffers are released promptly.
Concretely, this is what keeps a 2 GB folder from needing 2 GB of RAM:
- Walk yields files lazily; nothing assumes the full file list fits in memory.
- Parse runs in bounded worker pools, so only --jobs documents are in flight at once.
- Embed and upsert flow in batches of 64, so vectors are written and dropped rather than accumulated.
- Vectors in a sealed .indx archive are memory-mapped on demand from vectors.f32 on load, not read whole.
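The first two rules above (a lazy walk feeding a bounded worker pool) can be sketched with the standard library alone. `walk_lazily` and `bounded_map` are hypothetical helpers written for this illustration, not indx APIs, and the example assumes a thread pool is adequate for the work being done.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def walk_lazily(root: str) -> Iterator[str]:
    # os.walk is a generator, so paths are produced one directory at a time.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def bounded_map(fn: Callable[[T], R], items: Iterable[T], jobs: int) -> Iterator[R]:
    """Apply fn with at most `jobs` items in flight; results arrive unordered."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        pending = set()
        for item in items:
            if len(pending) >= jobs:
                # Block until a slot frees up before pulling the next item.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    yield fut.result()
            pending.add(pool.submit(fn, item))
        # Drain what is left once the input is exhausted.
        for fut in wait(pending).done:
            yield fut.result()


if __name__ == "__main__":
    # Stand-in "parser": measures path length; a real one would parse the file.
    lengths = bounded_map(len, walk_lazily("."), jobs=4)
    print(sum(1 for _ in lengths), "files processed")
```

The input iterator is only advanced when a worker slot is free, so the number of documents held in memory stays bounded by `jobs` regardless of how many files the walk eventually yields. Swapping in `ProcessPoolExecutor` for GIL-bound work would follow the same shape, provided `fn` and its arguments are picklable.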
Stream + batch pattern (SDK)
When building a custom stage or your own ingestion loop, follow the stream-then-batch shape rather than loading everything and processing one item at a time:
```python
from itertools import islice


def batched(iterable, size):
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


# Stream the walk, embed and upsert in batches, resume on cache hits.
for batch in batched(ctx.chunks, size=64):
    if cache.has(batch):  # resume: skip completed work
        continue
    vectors = embedder.embed([c.text for c in batch])
    store.upsert(
        ids=[c.id for c in batch],
        vectors=vectors,
        payloads=[c.metadata for c in batch],
    )
```

The anti-pattern to avoid is the opposite, reading every file up front and embedding one chunk at a time:

```python
# Anti-pattern: materialises the whole estate, one round-trip per chunk
texts = [p.read_text() for p in all_files]
vectors = [embedder.embed([t]) for t in texts]
```

Making the local profile fast enough
The shipped zero-config defaults are cloud-backed, so model-heavy work runs on managed APIs out of the box. The opt-in local profile (docling + ollama:qwen2.5 + bge-m3) still supports the air-gapped path with zero network calls, but local models can be the slow part of a large run on commodity hardware. When you are running the local profile, the mitigations below apply, in order of impact:
- Parallelise parsing. Raise --jobs to match your cores; parsing is usually the first bottleneck on document-heavy corpora.
- Lean on batching. Keep embedding batched (default 64); increase the batch size if you have GPU/CPU headroom.
- Skip what you do not need. Use --no-embed to produce a graph-only space (Walk → Relate, no vectors) when you only need structure, and skip enrichment by dropping the Enrich stage when you do not want LLM work:

  ```bash
  # Graph only: no embedding, fastest path to structure
  indx ./docs --out ./ai-ready --no-embed
  ```

  ```python
  from indx import DirectoryPipeline

  # Drop enrichment to skip all LLM calls
  space = DirectoryPipeline().drop("enrich").run("./docs", "./out")
  ```

- Lean back on hosted models for the heaviest stages when policy allows (the same cloud backends used by the zero-config defaults). A hosted LLM for Enrich or a hosted embedder benefits from asyncio bounded concurrency and can dramatically cut wall-clock time on large estates. See Enrichment with LLM/VLM and Choosing an embedder.
- Resume aggressively. Combine --resume with the cache so iterative runs only pay for what changed.
Quick reference
| Goal | Lever |
|---|---|
| Parse faster | --jobs / -j (parse workers) |
| Embed faster | embed batch size (default 64), --jobs (embed concurrency) |
| Hide cloud latency | Enrich/embed bounded concurrency (asyncio) |
| Cheap re-runs | --resume (+ --verbose to see hits/misses) |
| Skip vectors | --no-embed (graph-only space) |
| Skip LLM work | drop the enrich stage |
| Stay memory-stable | stream + batch; let the defaults do their job |
For the full flag list see the CLI reference (--jobs, --resume, --no-embed); for keeping runs auditable and byte-stable see Reproducibility.