Registry & Defaults

Every swappable slot in indx — parser, LLM, VLM, embedder, store, output writer, and even pipeline stages — is bound by a short name. This page is the authoritative reference for how those names resolve to concrete classes, what registries hold them, how third-party plugins join in, and exactly which defaults you get out of the box.

For the protocol contracts each registered class must satisfy, see the protocols reference. To ship your own registrable component, see Authoring a plugin.

Each sub-package (indx.parsers, indx.llm, indx.embed, indx.store, indx.output, …) keeps a registry that maps a short name to a class. A component can be bound two ways:

  • By object — you pass an instance that structurally satisfies the slot’s protocol.
  • By name — you pass a string (in indx.toml, on the CLI, or to use()), and the registry resolves it to a class and constructs it.
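The two binding modes can be sketched as follows. This is a minimal illustration, not indx's own code: the `Embedder` protocol shape, the `ToyEmbedder` adapter, the one-entry `REGISTRY`, and the `bind` helper are all hypothetical names invented for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical stand-in for a slot protocol (not indx's real contract)."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class ToyEmbedder:
    """Stand-in adapter: satisfies Embedder structurally, without inheriting."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


# Hypothetical one-entry registry for this sketch.
REGISTRY: dict[str, type] = {"toy": ToyEmbedder}


def bind(value: object) -> Embedder:
    """Return a usable component from either a short name or an instance."""
    if isinstance(value, str):
        # By name: look the class up and construct it.
        return REGISTRY[value]()
    if isinstance(value, Embedder):
        # By object: accept anything that exposes the protocol's methods.
        return value
    raise TypeError(f"{value!r} does not satisfy the Embedder protocol")


by_name = bind("toy")
by_object = bind(ToyEmbedder())
```

Both calls yield an object with the same `embed` method, which is why the rest of a pipeline does not need to know how the slot was bound.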

Names may carry an optional :model suffix. The part before the colon selects the adapter class; the part after selects a model that adapter loads. For example, openai:gpt-5-mini picks OpenAILLM with gpt-5-mini; ollama:qwen2.5 picks OllamaLLM with the local qwen2.5 model.

Each registry exposes the same small resolver shape (from technical-spec §4.7):

# indx/llm/__init__.py
REGISTRY = {"openai": OpenAILLM, "ollama": OllamaLLM, "none": NullLLM}

def resolve(name: str) -> LLM:
    base, _, model = name.partition(":")
    cls = REGISTRY[base]  # raises ConfigError (exit 3) if the name is unknown
    return cls(model=model or None)

The base name selects the class; the optional model suffix is passed to the constructor. Slots without a model concept (such as store) ignore the suffix.
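A bare dictionary lookup raises `KeyError`, so a runnable version of the resolver has to translate that into the configuration error. The sketch below shows one way to do it; the adapter classes and `ConfigError` are stand-ins defined locally, and the error message wording is an assumption.

```python
class ConfigError(Exception):
    """Stand-in for indx's configuration error (exit code 3 per the spec excerpt)."""

    exit_code = 3


class _Adapter:
    """Stand-in base class so the sketch runs without indx installed."""

    def __init__(self, model=None):
        self.model = model


class OpenAILLM(_Adapter): ...
class OllamaLLM(_Adapter): ...
class NullLLM(_Adapter): ...


REGISTRY = {"openai": OpenAILLM, "ollama": OllamaLLM, "none": NullLLM}


def resolve(name: str) -> _Adapter:
    # "openai:gpt-5-mini" -> base "openai", model "gpt-5-mini"
    base, _, model = name.partition(":")
    try:
        cls = REGISTRY[base]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ConfigError(f"unknown llm {base!r}; known names: {known}") from None
    # An empty suffix becomes None so the adapter can apply its own default.
    return cls(model=model or None)


llm = resolve("openai:gpt-5-mini")
print(type(llm).__name__, llm.model)  # OpenAILLM gpt-5-mini
print(resolve("none").model)          # None
```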

For each slot, the effective value is chosen by precedence — the first source that supplies a value wins:

explicit code argument / use() or CLI flag > indx.toml > documented default

Concretely: an instance or name passed in code (constructor argument or use()) and a CLI flag override the configuration file, which overrides the built-in default. See Configuration for the full precedence rules across all keys.
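As an illustration of that rule, the helper below returns the first supplied value in precedence order. The function name `pick` and the use of `None` to mean "not supplied" are assumptions of this sketch, not part of indx.

```python
def pick(explicit, from_toml, default):
    """Return the first supplied value, checking sources in precedence order.

    explicit  -- a code argument, use() call, or CLI flag (highest tier)
    from_toml -- the value read from indx.toml, if any
    default   -- the documented built-in default
    """
    for value in (explicit, from_toml):
        if value is not None:
            return value
    return default


print(pick(None, None, "qdrant"))         # qdrant
print(pick(None, "jsonl", "qdrant"))      # jsonl
print(pick("chroma", "jsonl", "qdrant"))  # chroma
```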

Each row lists the slot, the name strings the built-in registry accepts, and the default class indx constructs when nothing else is specified.

| Slot | Name strings | Default class |
|---|---|---|
| parser | docling, plaintext | DoclingParser |
| llm | openai:<model>, ollama:<model>, none | OpenAILLM (gpt-5-mini) |
| vlm | none, <adapter> | NullVLM |
| embedder | openai:<model>, bge-m3, e5, cohere | OpenAIEmbedder (text-embedding-3-small) |
| store | qdrant, pgvector, chroma, lancedb, jsonl | QdrantStore |
| output | .indx, jsonl, langchain, llamaindex | IndxWriter |

Notes on individual slots:

  • llm: the model suffix pins the model for cloud and local adapters. The documented default is openai:gpt-5-mini; none selects a null LLM, so the Enrich stage produces no LLM-derived metadata. Local and alternate adapters (ollama, anthropic, azure, vllm) become available with the matching extra.
  • vlm: defaults to none (vision enrichment off). Adapter names like qwen-vl, gpt-4o, or a locally served endpoint enable it.
  • store: jsonl is the zero-dependency fallback that ships in core; the others require their client extra. See choosing a store.
  • output: .indx is the sealed portable archive; jsonl is the zero-dependency writer; langchain and llamaindex emit framework-native objects. See output formats.

Third-party packages extend indx without modifying it by advertising Python packaging entry points. At runtime the registry discovers them via importlib.metadata and merges them into the per-slot registries — so installing a plugin package is enough to make its name usable anywhere a built-in is.

| Entry-point group | Slot | Protocol |
|---|---|---|
| indx.parsers | parser | Parser |
| indx.llms | llm | LLM |
| indx.vlms | vlm | VLM |
| indx.embedders | embedder | Embedder |
| indx.stores | store | Store |
| indx.outputs | output | OutputWriter |
| indx.stages | pipeline | Stage |

A plugin registers under the relevant group in its own pyproject.toml:

# pyproject.toml of a third-party package "indx-weaviate"
[project.entry-points."indx.stores"]
weaviate = "indx_weaviate.store:WeaviateStore"

After pip install indx-weaviate, the registry finds the entry point on first resolution, validates that the class satisfies the Store protocol, and registers it under the name weaviate. The name then works everywhere a built-in does:

# indx.toml
[store]
backend = "weaviate"

# Python
DirectoryPipeline(store="weaviate")

The complete set of defaults applied when nothing is overridden in code, on the CLI, or in indx.toml:

| Slot | Default name | Default class | Notes |
|---|---|---|---|
| parser | docling | DoclingParser | High-fidelity local parser; extra indx[docling]. |
| llm | openai:gpt-5-mini | OpenAILLM (gpt-5-mini) | Cloud enrichment; none to disable, ollama:qwen2.5 for local. |
| vlm | none | NullVLM | Vision enrichment off by default. |
| embedder | openai:text-embedding-3-small | OpenAIEmbedder | Vector dimensionality 1536. |
| store | qdrant | QdrantStore | Embedded or server; jsonl is the zero-dep fallback. |
| output | .indx | IndxWriter | Sealed portable ZIP archive. |

These map directly to the indx.toml keys [parser].engine, [enrich].llm, [enrich].vlm, [embed].model, [store].backend, and [output].format. The full configuration schema and per-backend sub-tables are documented in the configuration reference.
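To make the key-to-slot mapping concrete, the sketch below reads each slot from an already-parsed configuration (a nested dict, as a TOML parser would return) and falls back to the documented default. The table and key names come from the list above; the helper name `slots_from_config` and the exact spelling of the output value in the file are assumptions.

```python
# (slot, TOML table, key, documented default)
SLOT_KEYS = [
    ("parser", "parser", "engine", "docling"),
    ("llm", "enrich", "llm", "openai:gpt-5-mini"),
    ("vlm", "enrich", "vlm", "none"),
    ("embedder", "embed", "model", "openai:text-embedding-3-small"),
    ("store", "store", "backend", "qdrant"),
    ("output", "output", "format", ".indx"),  # assumption: written as ".indx"
]


def slots_from_config(config: dict) -> dict:
    """Map parsed indx.toml content to one name per slot, applying defaults."""
    return {
        slot: config.get(table, {}).get(key, default)
        for slot, table, key, default in SLOT_KEYS
    }


# Parsed form of a file that sets only [store].backend and [enrich].llm.
config = {"store": {"backend": "jsonl"}, "enrich": {"llm": "none"}}
slots = slots_from_config(config)
print(slots["store"], slots["llm"], slots["parser"])  # jsonl none docling
```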

The resolved choices are recorded in the produced archive’s manifest and in index.json metadata, so a knowledge space is self-describing — a consumer can confirm exactly which embedder (and dimensionality) produced its vectors before querying. See the .indx archive and index.json references.
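A consumer-side check might look like the sketch below. The manifest layout is not specified on this page, so the `embedder`, `name`, and `dimensions` keys, as well as the `assert_compatible` helper, are purely hypothetical; consult the archive reference for the real field names.

```python
def assert_compatible(manifest: dict, embedder: str, dimensions: int) -> None:
    """Raise if the recorded embedder differs from the one used for queries."""
    recorded = manifest.get("embedder", {})  # hypothetical key
    name = recorded.get("name")              # hypothetical key
    dim = recorded.get("dimensions")         # hypothetical key
    if (name, dim) != (embedder, dimensions):
        raise ValueError(
            f"archive was built with {name!r} ({dim} dims), "
            f"but queries use {embedder!r} ({dimensions} dims)"
        )


# Hypothetical manifest content, already loaded into a dict.
manifest = {"embedder": {"name": "openai:text-embedding-3-small", "dimensions": 1536}}
assert_compatible(manifest, "openai:text-embedding-3-small", 1536)  # passes silently
```

Querying with vectors from a different embedder would return meaningless similarity scores, which is why a check of this kind belongs before the first query rather than after.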