Enrichment: LLMs & VLMs

The Enrich stage (05) is where indx asks a model to read your content and add useful metadata — a detected document type, topics, tags, and a summary. It uses two swappable slots: an LLM for text and an optional VLM for images. This guide shows how to choose and configure both, how to disable enrichment entirely, and how to keep things private.

For where this fits in the run, see the Enrich stage reference and the pipeline overview.

What enrichment produces

Enrich reads the chunks and documents already in the SpaceContext and writes structured metadata back onto them:

Field	Lands on	Example
`type`	`Document.type`, `Source.type`	`"policy"`, `"guide"`
`topics`	`Document.topics`, `Chunk.metadata.topics`	`["retention", "compliance"]`
`tags`	`Document.tags`	`["gdpr", "data"]`
`summary`	`Document.summary`, `Chunk.metadata.summary`	`"Defines the 90-day retention rule…"`

The LLM handles all four; the VLM contributes descriptions of images, diagrams, and scanned pages that feed into the same fields. Enrich is a per-item, skip-on-failure stage by default: if one document’s LLM call times out, that document is skipped with a StageError(kind="skip") and the run continues (use --strict to make any skip fatal).
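
The skip-on-failure behaviour can be pictured with a small, self-contained sketch. It does not use indx itself: `StageError` here is a local stand-in that models only the `kind` attribute, and `enrich_all`, `EnrichReport`, and `flaky_enrich` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


class StageError(Exception):
    """Local stand-in for indx's StageError; only `kind` is modelled (assumption)."""

    def __init__(self, message: str, kind: str = "skip") -> None:
        super().__init__(message)
        self.kind = kind


@dataclass
class EnrichReport:
    enriched: list = field(default_factory=list)
    skipped: list = field(default_factory=list)


def enrich_all(doc_ids, enrich_one, strict=False):
    """Enrich each document independently; a skip affects only that document."""
    report = EnrichReport()
    for doc_id in doc_ids:
        try:
            enrich_one(doc_id)
        except StageError as err:
            if strict or err.kind != "skip":
                raise  # strict mode: any skip becomes fatal
            report.skipped.append(doc_id)
        else:
            report.enriched.append(doc_id)
    return report


def flaky_enrich(doc_id: str) -> None:
    # Simulates an LLM call that times out for exactly one document.
    if doc_id == "b.md":
        raise StageError(f"LLM call timed out for {doc_id}", kind="skip")


report = enrich_all(["a.md", "b.md", "c.md"], flaky_enrich)
print(report.enriched, report.skipped)  # ['a.md', 'c.md'] ['b.md']
```

Calling `enrich_all(..., strict=True)` on the same input re-raises the error instead of recording a skip, which mirrors what `--strict` is described as doing.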

The LLM slot

indx ships thin per-provider adapters behind a single LLM protocol. The core package depends on no LLM SDK; each backend’s client installs only via an extra. The protocol is small:

@runtime_checkable
class LLM(Protocol):
    """Text generation for enrichment (type, topics, tags, summaries).
    Default: openai:gpt-5-mini."""
    def complete(self, prompt: str, *, system: str | None = None,
                 max_tokens: int = 512, temperature: float = 0.0) -> str: ...

Choosing a backend

Name string	Backend	Extra	Best for
`openai:<model>` (default `openai:gpt-5-mini`)	OpenAI	`indx[openai]` + `OPENAI_API_KEY`	Cloud default for text enrichment.
`ollama:<model>` (`ollama:qwen2.5` in the local profile)	Ollama	— (local runtime)	Local, no key, air-gapped enrichment.
`vllm:<model>`	vLLM	local serving	High-throughput / GPU-server deployments.
`anthropic:<model>`	Anthropic	`indx[anthropic]`	Cloud alternative with long context.
`azure:<model>`	Azure OpenAI	`indx[openai]`	OpenAI models through Azure governance.
`litellm:<model>`	LiteLLM	`indx[litellm]`	Opt-in unified backend routing 100+ providers through one adapter.
`none`	—	—	Disable enrichment entirely.

The name string carries an optional :model suffix, so ollama:qwen2.5, openai:gpt-4o-mini, and anthropic:claude-3-5-haiku are all valid. The base name selects the adapter; the suffix selects the model.

Selecting the LLM

On the CLI, use --llm:

# Default: cloud openai:gpt-5-mini (needs OPENAI_API_KEY)
indx ./docs --out ./ai-ready

# Local path — no key, air-gapped
indx ./docs --out ./ai-ready --llm ollama:qwen2.5

# Switch to a different cloud model (key via env var, never the config file)
export INDX_LLM__API_KEY="sk-…"
indx ./docs --out ./ai-ready --llm openai:gpt-4o-mini

# Turn enrichment OFF
indx ./docs --out ./ai-ready --llm none

In indx.toml, the slot lives under [enrich]:

[enrich]
llm = "openai:gpt-5-mini" # name[:model] or "none". Default: "openai:gpt-5-mini".

In the SDK, pass llm= to the pipeline (or drop the stage outright):

from indx import DirectoryPipeline

# Named backend
space = DirectoryPipeline(llm="anthropic:claude-3-5-haiku").run("./docs", "./out")

# Disable enrichment, two equivalent ways:
DirectoryPipeline(llm="none").run("./docs", "./out")
DirectoryPipeline().drop("enrich").run("./docs", "./out")

--llm none and drop("enrich") both produce a fully valid knowledge space — the graph, chunks, neighbors, relations, and embeddings are all still built; only the LLM-derived type/topics/tags/summary are left empty.

The VLM slot

The VLM slot adds vision-language descriptions of figures, diagrams, scans, and screenshots that the parser captured into ParsedDoc.images. It defaults to none (off) so image-description latency and cost stay opt-in.

@runtime_checkable
class VLM(Protocol):
    """Vision-language enrichment for images/layout. Default: none (disabled)."""
    def describe(self, image: bytes, *, prompt: str | None = None) -> str: ...

Name string	Backend	Enable for
`none` (default)	disabled	Text-only corpora; fastest, cheapest.
`qwen-vl`	local Qwen-VL	Local image understanding, no egress.
`gpt-4o`	cloud (OpenAI)	High-quality figure/diagram descriptions.
`<local adapter>`	other local VLM	Any installed vision model.

Enable it with --vlm or the [enrich].vlm key:

indx ./docs --out ./ai-ready --vlm qwen-vl

[enrich]
vlm = "qwen-vl"   # name or "none". Default: "none".

DirectoryPipeline(llm="ollama:qwen2.5", vlm="qwen-vl").run("./scans", "./out")

Controlling which metadata is produced

The [enrich].metadata key selects which of the four enrichments to compute. It defaults to all four; trim it to save time and tokens.

[enrich]
llm      = "ollama:qwen2.5"
metadata = ["type", "topics", "summary"]   # skip "tags"

The list may contain any subset of ["type", "topics", "tags", "summary"]; no other values are allowed. A shorter list narrows the work; setting llm = "none" skips the stage altogether.

Determinism and concurrency

Enrichment is built to be reproducible and well-behaved against rate limits:

temperature=0.0 by default. The LLM.complete signature defaults temperature to 0.0 so reruns are as stable as the provider allows. See reproducibility for the full determinism story.
Provenance is recorded. Because some providers aren’t bit-reproducible, the resolved config snapshot — including the model name — is written into index.json.metadata and the archive manifest.json for auditability.
Bounded concurrency. Enrich issues LLM/VLM calls per document with a bounded concurrency limit (default 4) to respect provider rate limits. Tune it per backend; see performance.
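
The bounded-concurrency idea can be sketched with a standard-library thread pool. This is an illustration of the technique, not indx's scheduler: `fake_llm_call` and `InFlightCounter` are hypothetical, and the limit of 4 is taken from the documented default.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LIMIT = 4  # the documented default concurrency limit


class InFlightCounter:
    """Context manager that tracks how many calls are running at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = 0
        self.peak = 0

    def __enter__(self) -> None:
        with self._lock:
            self._current += 1
            self.peak = max(self.peak, self._current)

    def __exit__(self, *exc) -> None:
        with self._lock:
            self._current -= 1


counter = InFlightCounter()


def fake_llm_call(doc_id: str) -> str:
    """Stand-in for a network-bound LLM call."""
    with counter:
        time.sleep(0.05)  # simulate provider latency
        return f"summary of {doc_id}"


docs = [f"doc-{i}.md" for i in range(12)]
# max_workers caps simultaneous calls; map() returns results in input order.
with ThreadPoolExecutor(max_workers=LIMIT) as pool:
    summaries = list(pool.map(fake_llm_call, docs))

print(len(summaries), counter.peak)  # 12 results; peak never exceeds LIMIT
```

Lowering `LIMIT` trades throughput for fewer simultaneous requests, which is the knob to reach for when a provider starts returning rate-limit errors.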

Privacy: cloud LLMs egress

The local stack (local parser, local Ollama LLM, local embedder, local/JSONL store) is fully air-gapped and sends nothing over the network. Any cloud backend changes that: openai (including the default openai:gpt-5-mini), anthropic, azure, and a cloud VLM like gpt-4o transmit chunk text and images to that provider during Enrich.

To keep sensitive text from leaving the machine, insert a redaction stage that runs before Enrich:

import re

from indx import DirectoryPipeline, SpaceContext

# Placeholder redactor (an assumption, not part of indx): masks e-mail addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED]", text)

class PiiRedactStage:
    name = "pii-redact"
    def run(self, ctx: SpaceContext) -> SpaceContext:
        for chunk in ctx.chunks:
            chunk.text = redact(chunk.text)
        return ctx   # MUST return the same context

pipeline = DirectoryPipeline(llm="openai:gpt-4o-mini")
# Index 3 lands after Chunk (index 2) and before Relate (which shifts from
# index 3 to 4), and therefore before Enrich.
pipeline.insert(3, PiiRedactStage())
space = pipeline.run("./docs", "./out")

Secrets themselves stay out of the config file: API keys come from environment variables such as INDX_LLM__API_KEY, and indx never logs or serializes them. See custom stages for the full redaction recipe.
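
In your own adapters or stages, the same discipline can be followed by reading the key from the environment and wrapping it so it cannot leak through `print` or logging. This is a sketch of one common pattern, not a description of indx internals: `Secret` and `load_api_key` are hypothetical names.

```python
import os


class Secret:
    """Hypothetical wrapper that keeps a key out of repr(), str(), and logs."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Call this only at the point where the key is handed to the client.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


def load_api_key(env=os.environ) -> Secret:
    """Read the key from the environment; never from a config file."""
    key = env.get("INDX_LLM__API_KEY")
    if not key:
        raise RuntimeError("INDX_LLM__API_KEY is not set")
    return Secret(key)


key = load_api_key({"INDX_LLM__API_KEY": "sk-example"})
print(key)  # Secret('***')
```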

Next steps

Enrich stage reference — exactly what stage 05 reads and writes.
Component protocols — the full LLM and VLM contracts.
Extras — which pip install indx[...] each backend needs.
Reproducibility — determinism with cloud and local models.
Bring your own stack — swap any slot without a rewrite.