Enrichment: LLMs & VLMs
The Enrich stage (05) is where indx asks a model to read your content and add useful metadata — a detected document type, topics, tags, and a summary. It uses two swappable slots: an LLM for text and an optional VLM for images. This guide shows how to choose and configure both, how to disable enrichment entirely, and how to keep things private.
For where this fits in the run, see the Enrich stage reference and the pipeline overview.
What enrichment produces
Enrich reads the chunks and documents already in the SpaceContext and writes structured metadata back onto them:
| Field | Lands on | Example |
|---|---|---|
| `type` | `Document.type`, `Source.type` | `"policy"`, `"guide"` |
| `topics` | `Document.topics`, `Chunk.metadata.topics` | `["retention", "compliance"]` |
| `tags` | `Document.tags` | `["gdpr", "data"]` |
| `summary` | `Document.summary`, `Chunk.metadata.summary` | `"Defines the 90-day retention rule…"` |
The LLM handles all four; the VLM contributes descriptions of images, diagrams, and scanned pages that feed into the same fields. Enrich is a per-item, skip-on-failure stage by default: if one document’s LLM call times out, that document is skipped with a StageError(kind="skip") and the run continues (use --strict to make any skip fatal).
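The skip-on-failure behavior described above can be pictured roughly as the loop below. This is a minimal sketch, not indx's implementation: `enrich_all`, `SkipRecord`, `flaky_enrich`, and the `strict` parameter are hypothetical names standing in for the real stage, `StageError(kind="skip")`, and `--strict`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SkipRecord:
    """Hypothetical stand-in for StageError(kind="skip")."""
    doc_id: str
    reason: str


def enrich_all(
    docs: dict[str, str],
    enrich_one: Callable[[str], dict],
    strict: bool = False,
) -> tuple[dict[str, dict], list[SkipRecord]]:
    """Enrich each document independently; one failure does not stop the rest."""
    results: dict[str, dict] = {}
    skipped: list[SkipRecord] = []
    for doc_id, text in docs.items():
        try:
            results[doc_id] = enrich_one(text)
        except Exception as exc:  # e.g. a provider timeout
            if strict:
                raise  # strict mode: any skip becomes fatal
            skipped.append(SkipRecord(doc_id, str(exc)))
    return results, skipped


def flaky_enrich(text: str) -> dict:
    """Toy enricher that times out on empty input."""
    if not text:
        raise TimeoutError("LLM call timed out")
    return {"summary": text[:20]}


results, skipped = enrich_all({"a": "Retention policy", "b": ""}, flaky_enrich)
print(sorted(results), [s.doc_id for s in skipped])
```

In the non-strict call, document `a` is enriched and document `b` is recorded as skipped; passing `strict=True` re-raises the first failure instead.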
The LLM slot
indx ships thin per-provider adapters behind a single LLM protocol. The core package depends on no LLM SDK; each backend’s client installs only via an extra. The protocol is small:
```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLM(Protocol):
    """Text generation for enrichment (type, topics, tags, summaries). Default: openai:gpt-5-mini."""

    def complete(self, prompt: str, *, system: str | None = None,
                 max_tokens: int = 512, temperature: float = 0.0) -> str: ...
```

Choosing a backend
| Name string | Backend | Extra | Best for |
|---|---|---|---|
| `openai:<model>` (default `openai:gpt-5-mini`) | OpenAI | `indx[openai]` + `OPENAI_API_KEY` | Cloud default for text enrichment. |
| `ollama:<model>` (`ollama:qwen2.5` in the local profile) | Ollama | n/a (local runtime) | Local, no key, air-gapped enrichment. |
| `vllm:<model>` | vLLM | local serving | High-throughput / GPU-server deployments. |
| `anthropic:<model>` | Anthropic | `indx[anthropic]` | Cloud alternative with long context. |
| `azure:<model>` | Azure OpenAI | `indx[openai]` | OpenAI models through Azure governance. |
| `litellm:<model>` | LiteLLM | `indx[litellm]` | Opt-in unified backend routing 100+ providers through one adapter. |
| `none` | n/a | n/a | Disable enrichment entirely. |
The name string carries an optional :model suffix, so ollama:qwen2.5, openai:gpt-4o-mini, and anthropic:claude-3-5-haiku are all valid. The base name selects the adapter; the suffix selects the model.
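To make the `name[:model]` convention concrete, here is a minimal sketch of how such a string could be split. `parse_backend` is a hypothetical helper written for illustration; it is not an indx function, and indx's own parsing may differ (for example in how it treats an empty suffix).

```python
from typing import Optional


def parse_backend(name: str) -> tuple[str, Optional[str]]:
    """Split 'adapter[:model]' into (adapter, model); model is None if absent."""
    # partition() splits on the first colon only, so a model name that
    # itself contains a colon stays intact in the suffix.
    adapter, sep, model = name.partition(":")
    if not adapter:
        raise ValueError(f"missing adapter name in {name!r}")
    return adapter, (model if sep and model else None)


for spec in ("ollama:qwen2.5", "anthropic:claude-3-5-haiku", "none"):
    print(spec, "->", parse_backend(spec))
```

Splitting on the first colon only is a deliberate choice in this sketch: the adapter name is always the leading token, and everything after it is passed through as the model.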
Selecting the LLM
On the CLI, use `--llm`:
```shell
# Default: cloud openai:gpt-5-mini (needs OPENAI_API_KEY)
indx ./docs --out ./ai-ready

# Local path: no key, air-gapped
indx ./docs --out ./ai-ready --llm ollama:qwen2.5

# Switch to a different cloud provider/model (key via env var, never the config file)
export INDX_LLM__API_KEY="sk-…"
indx ./docs --out ./ai-ready --llm openai:gpt-4o-mini

# Turn enrichment OFF
indx ./docs --out ./ai-ready --llm none
```

In `indx.toml`, the slot lives under `[enrich]`:
```toml
[enrich]
llm = "openai:gpt-5-mini"  # name[:model] or "none". Default: "openai:gpt-5-mini".
```

In the SDK, pass `llm=` to the pipeline (or drop the stage outright):
```python
from indx import DirectoryPipeline

# Named backend
space = DirectoryPipeline(llm="anthropic:claude-3-5-haiku").run("./docs", "./out")

# Disable enrichment, two equivalent ways:
DirectoryPipeline(llm="none").run("./docs", "./out")
DirectoryPipeline().drop("enrich").run("./docs", "./out")
```

`--llm none` and `drop("enrich")` both produce a fully valid knowledge space: the graph, chunks, neighbors, relations, and embeddings are all still built; only the LLM-derived type/topics/tags/summary are left empty.
The VLM slot
The VLM slot adds vision-language descriptions of figures, diagrams, scans, and screenshots that the parser captured into `ParsedDoc.images`. It defaults to `none` (off) so image-description latency and cost stay opt-in.
```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class VLM(Protocol):
    """Vision-language enrichment for images/layout. Default: none (disabled)."""

    def describe(self, image: bytes, *, prompt: str | None = None) -> str: ...
```

| Name string | Backend | Enable for |
|---|---|---|
| `none` (default) | disabled | Text-only corpora; fastest, cheapest. |
| `qwen-vl` | local Qwen-VL | Local image understanding, no egress. |
| `gpt-4o` | cloud (OpenAI) | High-quality figure/diagram descriptions. |
| `<local adapter>` | other local VLM | Any installed vision model. |
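Because the VLM contract is a structural protocol, any object with a matching `describe` method satisfies it. The sketch below illustrates that with a local copy of the protocol and a hypothetical stub, `SizeOnlyVLM`, that calls no model at all. How a custom adapter is registered under a name string is not covered here and is not assumed.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class VLM(Protocol):
    """Local copy of the protocol, so the sketch is self-contained."""

    def describe(self, image: bytes, *, prompt: Optional[str] = None) -> str: ...


class SizeOnlyVLM:
    """Hypothetical stub: reports the byte size instead of calling a model."""

    def describe(self, image: bytes, *, prompt: Optional[str] = None) -> str:
        hint = f" ({prompt})" if prompt else ""
        return f"image of {len(image)} bytes{hint}"


stub = SizeOnlyVLM()
# runtime_checkable only verifies that a describe attribute exists,
# not that its signature matches the protocol.
print(isinstance(stub, VLM))
print(stub.describe(b"\x89PNG", prompt="diagram"))
```

A stub like this can be handy in tests, where image descriptions need to be deterministic and free.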
Enable it with --vlm or the [enrich].vlm key:
```shell
indx ./docs --out ./ai-ready --vlm qwen-vl
```

```toml
[enrich]
vlm = "qwen-vl"  # name or "none". Default: "none".
```

```python
from indx import DirectoryPipeline

DirectoryPipeline(llm="ollama:qwen2.5", vlm="qwen-vl").run("./scans", "./out")
```

Controlling which metadata is produced
The `[enrich].metadata` key selects which of the four enrichments to compute. It defaults to all four; trim it to save time and tokens.
```toml
[enrich]
llm = "ollama:qwen2.5"
metadata = ["type", "topics", "summary"]  # skip "tags"
```

Allowed values are any subset of `["type", "topics", "tags", "summary"]`. A shorter list narrows the work; setting `llm = "none"` skips the stage altogether.
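The subset rule can be checked before a run with a few lines of Python. This is a sketch under assumptions: `validate_metadata` and `ALLOWED_METADATA` are hypothetical names, and whether indx itself rejects or ignores unknown values is not stated in this guide.

```python
ALLOWED_METADATA = ("type", "topics", "tags", "summary")


def validate_metadata(selected: list[str]) -> list[str]:
    """Return the selection in canonical order, rejecting unknown names."""
    unknown = sorted(set(selected) - set(ALLOWED_METADATA))
    if unknown:
        raise ValueError(f"unknown metadata fields: {unknown}")
    # Normalizing the order also drops duplicates.
    return [name for name in ALLOWED_METADATA if name in selected]


print(validate_metadata(["summary", "type", "topics"]))
```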
Determinism and concurrency
Enrichment is built to be reproducible and well-behaved against rate limits:
- `temperature=0.0` by default. The `LLM.complete` signature defaults `temperature` to `0.0` so reruns are as stable as the provider allows. See reproducibility for the full determinism story.
- Provenance is recorded. Because some providers aren’t bit-reproducible, the resolved config snapshot, including the model name, is written into `index.json.metadata` and the archive `manifest.json` for auditability.
- Bounded concurrency. Enrich issues LLM/VLM calls per document with a bounded concurrency limit (default `4`) to respect provider rate limits. Tune it per backend; see performance.
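One common way to bound concurrency is a fixed-size thread pool. The sketch below illustrates the idea with a fake provider call. It is not indx's scheduler; `fake_llm_call`, `InFlightTracker`, and `MAX_CONCURRENCY` are hypothetical names, with the limit set to mirror the documented default of 4.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 4  # mirrors the documented default


class InFlightTracker:
    """Counts overlapping calls so the bound can be observed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "InFlightTracker":
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info) -> None:
        with self._lock:
            self.current -= 1


tracker = InFlightTracker()


def fake_llm_call(doc: str) -> str:
    """Stand-in for a provider call."""
    with tracker:
        time.sleep(0.01)  # simulate network latency
        return f"summary of {doc}"


docs = [f"doc-{i}" for i in range(12)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # map() returns results in input order even if calls finish out of order.
    summaries = list(pool.map(fake_llm_call, docs))

print(len(summaries), tracker.peak <= MAX_CONCURRENCY)
```

The pool never runs more than `MAX_CONCURRENCY` calls at once, which is the property that keeps a run under a provider's rate limit.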
Privacy: cloud LLMs egress
The local stack (local parser, local Ollama LLM, local embedder, local/JSONL store) is fully air-gapped: with it, indx sends nothing over the network. Choosing a cloud backend changes that. `openai` (including the default `openai:gpt-5-mini`), `anthropic`, `azure`, and a cloud VLM like `gpt-4o` transmit chunk text and images to that provider during Enrich. To limit what leaves the machine, one option is a custom stage that redacts sensitive text before Enrich runs:
```python
import re

from indx import DirectoryPipeline, SpaceContext


def redact(text: str) -> str:
    """Placeholder redactor (not part of indx): masks email addresses only."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", text)


class PiiRedactStage:
    name = "pii-redact"

    def run(self, ctx: SpaceContext) -> SpaceContext:
        for chunk in ctx.chunks:
            chunk.text = redact(chunk.text)
        return ctx  # MUST return the same context


pipeline = DirectoryPipeline(llm="openai:gpt-4o-mini")
# Index 3 lands after Chunk (index 2) and before Relate (which shifts from
# index 3 to 4), and therefore before Enrich.
pipeline.insert(3, PiiRedactStage())
space = pipeline.run("./docs", "./out")
```

Secrets themselves stay out of the config file: API keys come from environment variables such as `INDX_LLM__API_KEY`, and indx never logs or serializes them. See custom stages for the full redaction recipe.
Next steps
- Enrich stage reference: exactly what stage 05 reads and writes.
- Component protocols: the full `LLM` and `VLM` contracts.
- Extras: which `pip install indx[...]` each backend needs.
- Reproducibility: determinism with cloud and local models.
- Bring your own stack: swap any slot without a rewrite.