
Enrichment: LLMs & VLMs

The Enrich stage (05) is where indx asks a model to read your content and add useful metadata — a detected document type, topics, tags, and a summary. It uses two swappable slots: an LLM for text and an optional VLM for images. This guide shows how to choose and configure both, how to disable enrichment entirely, and how to keep things private.

For where this fits in the run, see the Enrich stage reference and the pipeline overview.

Enrich reads the chunks and documents already in the SpaceContext and writes structured metadata back onto them:

| Field | Lands on | Example |
| --- | --- | --- |
| type | Document.type, Source.type | "policy", "guide" |
| topics | Document.topics, Chunk.metadata.topics | ["retention", "compliance"] |
| tags | Document.tags | ["gdpr", "data"] |
| summary | Document.summary, Chunk.metadata.summary | "Defines the 90-day retention rule…" |

The LLM handles all four; the VLM contributes descriptions of images, diagrams, and scanned pages that feed into the same fields. Enrich is a per-item, skip-on-failure stage by default: if one document’s LLM call times out, that document is skipped with a StageError(kind="skip") and the run continues (use --strict to make any skip fatal).
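The skip-on-failure behavior described above can be sketched in a few lines. This is a minimal illustration, not indx's implementation: the `StageError` class here is a local stand-in that models only the `kind` attribute, and `enrich_all`, `EnrichReport`, and `flaky` are hypothetical names.

```python
from dataclasses import dataclass, field


class StageError(Exception):
    """Hypothetical stand-in for indx's StageError; only `kind` is modeled."""

    def __init__(self, message: str, kind: str = "skip") -> None:
        super().__init__(message)
        self.kind = kind


@dataclass
class EnrichReport:
    enriched: list = field(default_factory=list)
    skipped: list = field(default_factory=list)


def enrich_all(doc_ids, enrich_one, strict: bool = False) -> EnrichReport:
    """Enrich each document; skip failures unless strict is set."""
    report = EnrichReport()
    for doc_id in doc_ids:
        try:
            enrich_one(doc_id)
        except StageError as err:
            if strict or err.kind != "skip":
                raise  # strict mode: any skip becomes fatal
            report.skipped.append(doc_id)
        else:
            report.enriched.append(doc_id)
    return report


def flaky(doc_id: str) -> None:
    # Simulate an LLM timeout on a single document.
    if doc_id == "b.md":
        raise StageError("LLM call timed out", kind="skip")


report = enrich_all(["a.md", "b.md", "c.md"], flaky)
print(report.enriched, report.skipped)
```

Calling `enrich_all(..., strict=True)` on the same input re-raises the error instead of recording a skip, which mirrors what `--strict` is described as doing.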

indx ships thin per-provider adapters behind a single LLM protocol. The core package depends on no LLM SDK; each backend’s client installs only via an extra. The protocol is small:

from typing import Protocol, runtime_checkable

@runtime_checkable
class LLM(Protocol):
    """Text generation for enrichment (type, topics, tags, summaries).
    Default: openai:gpt-5-mini."""
    def complete(self, prompt: str, *, system: str | None = None,
                 max_tokens: int = 512, temperature: float = 0.0) -> str: ...

| Name string | Backend | Extra | Best for |
| --- | --- | --- | --- |
| openai:<model> (default openai:gpt-5-mini) | OpenAI | indx[openai] + OPENAI_API_KEY | Cloud default for text enrichment. |
| ollama:<model> (ollama:qwen2.5 in the local profile) | Ollama | no extra (local runtime) | Local, no key, air-gapped enrichment. |
| vllm:<model> | vLLM | local serving | High-throughput / GPU-server deployments. |
| anthropic:<model> | Anthropic | indx[anthropic] | Cloud alternative with long context. |
| azure:<model> | Azure OpenAI | indx[openai] | OpenAI models through Azure governance. |
| litellm:<model> | LiteLLM | indx[litellm] | Opt-in unified backend routing 100+ providers through one adapter. |
| none | - | - | Disable enrichment entirely. |
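Because the LLM protocol is structural and runtime-checkable, any object with a matching `complete` method satisfies it. The sketch below redeclares the protocol locally so it runs without indx installed; `CannedLLM` is a hypothetical test double, and whether indx accepts an instance in place of a name string is not something this guide states.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class LLM(Protocol):
    # Local copy of the protocol, so the sketch is self-contained.
    def complete(self, prompt: str, *, system: str | None = None,
                 max_tokens: int = 512, temperature: float = 0.0) -> str: ...


class CannedLLM:
    """Hypothetical test double: returns a fixed reply and records prompts."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts = []

    def complete(self, prompt: str, *, system: str | None = None,
                 max_tokens: int = 512, temperature: float = 0.0) -> str:
        self.prompts.append(prompt)
        return self.reply


stub = CannedLLM('{"type": "policy"}')
print(isinstance(stub, LLM))  # checks method presence only, not the signature
print(stub.complete("Classify this document."))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that a `complete` attribute exists; a static type checker is what catches a mismatched signature.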

The name string carries an optional :model suffix, so ollama:qwen2.5, openai:gpt-4o-mini, and anthropic:claude-3-5-haiku are all valid. The base name selects the adapter; the suffix selects the model.
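The `name[:model]` convention can be illustrated with a small parser. This is a sketch of the idea, not indx's own resolver; `parse_backend` is a hypothetical helper, and splitting on the first colon only is an assumption made here.

```python
from __future__ import annotations


def parse_backend(name: str) -> tuple[str, str | None]:
    """Split 'adapter[:model]' into (adapter, model); model is None if absent."""
    adapter, _, model = name.strip().partition(":")
    if not adapter:
        raise ValueError(f"missing adapter name in {name!r}")
    return adapter, (model or None)


for spec in ("ollama:qwen2.5", "openai:gpt-4o-mini", "none"):
    print(spec, "->", parse_backend(spec))
```

Splitting on the first colon means a model identifier that itself contains a colon stays intact in the model part of this sketch.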

On the CLI, use --llm:

# Default: cloud openai:gpt-5-mini (needs OPENAI_API_KEY)
indx ./docs --out ./ai-ready
# Local path — no key, air-gapped
indx ./docs --out ./ai-ready --llm ollama:qwen2.5
# Switch to a different cloud provider/model (key via env var, never the config file)
export INDX_LLM__API_KEY="sk-…"
indx ./docs --out ./ai-ready --llm openai:gpt-4o-mini
# Turn enrichment OFF
indx ./docs --out ./ai-ready --llm none

In indx.toml, the slot lives under [enrich]:

[enrich]
llm = "openai:gpt-5-mini" # name[:model] or "none". Default: "openai:gpt-5-mini".

In the SDK, pass llm= to the pipeline (or drop the stage outright):

from indx import DirectoryPipeline
# Named backend
space = DirectoryPipeline(llm="anthropic:claude-3-5-haiku").run("./docs", "./out")
# Disable enrichment, two equivalent ways:
DirectoryPipeline(llm="none").run("./docs", "./out")
DirectoryPipeline().drop("enrich").run("./docs", "./out")

--llm none and drop("enrich") both produce a fully valid knowledge space — the graph, chunks, neighbors, relations, and embeddings are all still built; only the LLM-derived type/topics/tags/summary are left empty.

The VLM slot adds vision-language descriptions of figures, diagrams, scans, and screenshots that the parser captured into ParsedDoc.images. It defaults to none (off) so image-description latency and cost stay opt-in.

@runtime_checkable
class VLM(Protocol):
    """Vision-language enrichment for images/layout. Default: none (disabled)."""
    def describe(self, image: bytes, *, prompt: str | None = None) -> str: ...

| Name string | Backend | Enable for |
| --- | --- | --- |
| none (default) | disabled | Text-only corpora; fastest, cheapest. |
| qwen-vl | local Qwen-VL | Local image understanding, no egress. |
| gpt-4o | cloud (OpenAI) | High-quality figure/diagram descriptions. |
| <local adapter> | other local VLM | Any installed vision model. |

Enable it with --vlm or the [enrich].vlm key:

# CLI
indx ./docs --out ./ai-ready --vlm qwen-vl

# indx.toml
[enrich]
vlm = "qwen-vl" # name or "none". Default: "none".

# SDK
DirectoryPipeline(llm="ollama:qwen2.5", vlm="qwen-vl").run("./scans", "./out")

The [enrich].metadata key selects which of the four enrichments to compute. It defaults to all four; trim it to save time and tokens.

[enrich]
llm = "ollama:qwen2.5"
metadata = ["type", "topics", "summary"] # skip "tags"

Allowed values are any subset of ["type", "topics", "tags", "summary"]. A shorter list narrows the work; setting llm = "none" skips the stage altogether.
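The subset rule is easy to check before a run. The helper below is a hypothetical sketch, not part of indx: it rejects unknown field names and returns the requested fields in a fixed order.

```python
ALLOWED_METADATA = ("type", "topics", "tags", "summary")


def validate_metadata(fields):
    """Return the requested fields in canonical order, rejecting unknown names."""
    unknown = sorted(set(fields) - set(ALLOWED_METADATA))
    if unknown:
        raise ValueError(f"unknown metadata fields: {unknown}")
    return [name for name in ALLOWED_METADATA if name in fields]


print(validate_metadata(["summary", "type", "topics"]))
# prints ['type', 'topics', 'summary']
```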

Enrichment is built to be reproducible and well-behaved against rate limits:

  • temperature=0.0 by default. The LLM.complete signature defaults temperature to 0.0 so reruns are as stable as the provider allows. See reproducibility for the full determinism story.
  • Provenance is recorded. Because some providers aren’t bit-reproducible, the resolved config snapshot — including the model name — is written into index.json.metadata and the archive manifest.json for auditability.
  • Bounded concurrency. Enrich issues LLM/VLM calls per document with a bounded concurrency limit (default 4) to respect provider rate limits. Tune it per backend; see performance.
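Bounded concurrency of the kind described in the last bullet can be sketched with a fixed-size thread pool. This is an illustration under stated assumptions, not how indx schedules calls: `fake_llm_call` stands in for a provider request, and the limit of 4 simply mirrors the documented default.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 4  # mirrors the documented default

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def fake_llm_call(doc_id: str) -> str:
    """Stand-in for a provider call; tracks how many calls overlap."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # simulate network latency
    with _lock:
        _in_flight -= 1
    return f"summary of {doc_id}"


docs = [f"doc-{i}.md" for i in range(12)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    summaries = list(pool.map(fake_llm_call, docs))  # results keep input order

print(len(summaries), "documents enriched; peak concurrency:", peak_in_flight)
```

The pool size caps how many requests are in flight at once, so lowering it is the simplest lever when a provider starts returning rate-limit errors.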

With the local stack (local parser, local Ollama LLM, local embedder, local/JSONL store), indx sends nothing over the network and runs fully air-gapped. Choosing a cloud backend changes that: openai (including the default openai:gpt-5-mini), anthropic, azure, and a cloud VLM like gpt-4o transmit chunk text and images to that provider during Enrich. To limit what leaves the machine, you can insert a custom stage that redacts chunk text before Enrich runs:

import re

from indx import DirectoryPipeline, SpaceContext

def redact(text: str) -> str:
    # Placeholder redactor for illustration only: masks email addresses.
    # Swap in your own PII detection.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", text)

class PiiRedactStage:
    name = "pii-redact"

    def run(self, ctx: SpaceContext) -> SpaceContext:
        for chunk in ctx.chunks:
            chunk.text = redact(chunk.text)
        return ctx  # MUST return the same context

pipeline = DirectoryPipeline(llm="openai:gpt-4o-mini")
# Index 3 lands after Chunk (index 2) and before Relate (which shifts from
# index 3 to 4), and therefore before Enrich.
pipeline.insert(3, PiiRedactStage())
space = pipeline.run("./docs", "./out")

Secrets themselves stay out of the config file: API keys come from environment variables such as INDX_LLM__API_KEY, and indx never logs or serializes them. See custom stages for the full redaction recipe.
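One common way to keep a key out of logs is to read it from the environment and wrap it in a type whose string form is masked. The sketch below shows that pattern only; `Secret` and `load_api_key` are hypothetical names, and nothing here describes how indx handles keys internally.

```python
import os


class Secret:
    """Hypothetical wrapper that keeps a key out of logs and reprs."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Call this only at the point where the key is sent to the provider.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


def load_api_key(env=os.environ, var: str = "INDX_LLM__API_KEY") -> Secret:
    """Read the key from an environment mapping; fail loudly if it is missing."""
    value = env.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set")
    return Secret(value)


# A plain dict stands in for the real environment in this example.
key = load_api_key({"INDX_LLM__API_KEY": "sk-example"})
print(key)
```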