Configuring indx (indx.toml)
indx.toml is the optional, declarative way to pin which components your pipeline uses and how they behave. It is never required: the documented defaults give a complete run using cloud-backed models, and the local profile remains available when the run must stay offline. Reach for a config file when you want a build to be reproducible, shareable, and explicit about the stack it ran on.
Configuration is optional
Every key in indx.toml has a documented default, so a bare run resolves a complete stack on its own:

    indx ./docs --out ./ai-ready

This uses parser docling, llm openai:gpt-5-mini, vlm none, embedder openai:text-embedding-3-small, store qdrant, and output .indx, all resolvable without a config file once the selected extras are installed. A config file simply lets you write those choices down, override individual slots, and pass backend-specific options that the CLI flags don’t cover.
Precedence
For each component slot, indx resolves the effective value from four layers, highest priority first:

    explicit code argument / use() > CLI flag > indx.toml > documented default

| Layer | Example | Wins over |
|---|---|---|
| Explicit code arg / use() | DirectoryPipeline(store="chroma") or .use(store="chroma") | everything below |
| CLI flag | indx ./docs -o ./out --store chroma | indx.toml and the default |
| indx.toml | [store] backend = "chroma" | the documented default |
| Documented default | store = "qdrant" | (nothing) |
A name that doesn’t resolve to a registered component (in any layer) is a fatal error raised before any stage runs, so misconfiguration fails fast rather than mid-build. See errors and exit codes — a bad config file or unknown component name exits with code 3.
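The four-layer rule can be sketched in a few lines of Python. This is an illustration of the precedence order only, not indx's actual resolver: the `resolve_slot` function and the `DEFAULTS` mapping are hypothetical names, with the default values copied from this page.

```python
from typing import Optional

# Documented defaults per slot, copied from this page (mapping name is hypothetical).
DEFAULTS = {
    "parser": "docling",
    "llm": "openai:gpt-5-mini",
    "vlm": "none",
    "embedder": "openai:text-embedding-3-small",
    "store": "qdrant",
    "output": ".indx",
}


def resolve_slot(
    slot: str,
    code_arg: Optional[str] = None,
    cli_flag: Optional[str] = None,
    toml_value: Optional[str] = None,
) -> str:
    """Return the first value that is set, walking the layers in priority order."""
    for candidate in (code_arg, cli_flag, toml_value):
        if candidate is not None:
            return candidate
    return DEFAULTS[slot]


# indx.toml says chroma, the CLI says lancedb: the CLI flag wins.
print(resolve_slot("store", cli_flag="lancedb", toml_value="chroma"))
# Nothing is set in any layer: the documented default applies.
print(resolve_slot("store"))
```

Treating "unset" as `None` keeps the layers independent: a layer only participates when it actually supplies a value.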
A complete annotated indx.toml
Every section and key below is optional; omitted keys fall back to the documented default. Each section configures one pipeline stage; [enrich] carries both the llm and vlm slots.

    [parser]
    engine = "docling"  # str. Parser name. Default: "docling".

    [enrich]
    llm = "openai:gpt-5-mini"  # str. LLM name[:model] or "none". Default: "openai:gpt-5-mini".
    vlm = "none"  # str. VLM name or "none". Default: "none".
    metadata = ["type", "topics", "tags", "summary"]  # list[str]. Which enrichments to produce.
                                                      # Default: ["type", "topics", "tags", "summary"].

    [embed]
    model = "openai:text-embedding-3-small"  # str. Embedder name. Default cloud embedder.

    [store]
    backend = "qdrant"  # str. One of: qdrant | pgvector | chroma | lancedb | jsonl.
                        # Default: "qdrant".

    [output]
    format = ".indx"  # str. One of: .indx | jsonl | langchain | llamaindex.
                      # Default: ".indx".

Key reference
| Section | Key | Type | Default | Allowed values |
|---|---|---|---|---|
| [parser] | engine | string | docling | any registered parser name |
| [enrich] | llm | string | openai:gpt-5-mini | `<name>[:model]`, none |
| [enrich] | vlm | string | none | `<name>`, none |
| [enrich] | metadata | list[str] | ["type","topics","tags","summary"] | subset of those four |
| [embed] | model | string | openai:text-embedding-3-small | any registered embedder name |
| [store] | backend | string | qdrant | qdrant, pgvector, chroma, lancedb, jsonl |
| [output] | format | string | .indx | .indx, jsonl, langchain, llamaindex |
For the exhaustive table with every key, type, and constraint, see the configuration reference.
Secrets come from the environment, never the file
indx.toml is meant to be committed and shared, so it must not contain credentials. API keys and other secrets are supplied through environment variables and layered in by pydantic-settings, never written into the file.

    export INDX_LLM__API_KEY="sk-..."   # double underscore = nested setting
    indx ./docs --out ./ai-ready --llm openai:gpt-5-mini

The matching indx.toml entry names only the model:

    [enrich]
    llm = "openai:gpt-5-mini"   # the model choice lives here…
    # …the api_key comes from $INDX_LLM__API_KEY, not this file
    # pick a different model with the :model suffix, e.g. "openai:gpt-4o"

The env prefix is keyed on the slot name, not the TOML section heading, so the LLM’s key is INDX_LLM__… even though the LLM is configured under [enrich]. The double underscore separates the slot from the nested setting (INDX_<SLOT>__<SETTING>):
| Slot | TOML location | Env prefix |
|---|---|---|
| parser | [parser] engine | INDX_PARSER__… |
| llm | [enrich] llm | INDX_LLM__… |
| vlm | [enrich] vlm | INDX_VLM__… |
| embedder | [embed] model | INDX_EMBEDDER__… |
| store | [store] backend | INDX_STORE__… |
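The naming scheme in the table is mechanical, so it can be sketched as two small helpers. indx itself delegates this to pydantic-settings; the functions below (`env_var_name`, `slot_settings_from_env`) are hypothetical and only illustrate the INDX_<SLOT>__<SETTING> convention.

```python
import os
from typing import Mapping, Optional


def env_var_name(slot: str, setting: str) -> str:
    """Build INDX_<SLOT>__<SETTING> from a slot name and a nested setting name."""
    return f"INDX_{slot.upper()}__{setting.upper()}"


def slot_settings_from_env(
    slot: str, environ: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Collect every INDX_<SLOT>__* variable into a {setting: value} dict."""
    environ = os.environ if environ is None else environ
    prefix = f"INDX_{slot.upper()}__"
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


print(env_var_name("llm", "api_key"))
fake_env = {"INDX_LLM__API_KEY": "sk-test", "INDX_STORE__URL": "http://localhost:6333"}
print(slot_settings_from_env("llm", fake_env))
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.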
Backend-specific sub-tables
Most adapters accept options beyond the simple slot name. Those live in a sub-table named after the backend (e.g. [store.qdrant]). indx passes the keys in such a sub-table verbatim to the adapter constructor. They are opaque to the core, so each backend documents its own keys.

    [store]
    backend = "qdrant"

    [store.qdrant]
    url = "http://localhost:6333"   # passed straight through to the Qdrant adapter
    # collection = "handbook"       # any further keys are adapter-defined

The same pattern applies to other slots and to third-party plugins: once a plugin is installed, its name works in backend/engine/etc. and its sub-table carries its options. See Bring your own stack and authoring a plugin.
Config discovery
If you don’t pass --config, indx auto-loads ./indx.toml from the current directory when it exists. To use a different file, point the CLI or SDK at it explicitly.

From the CLI:

    indx ./docs --out ./ai-ready --config ./configs/prod.indx.toml

From the SDK:

    from indx import DirectoryPipeline

    space = DirectoryPipeline(config="./configs/prod.indx.toml").run("./docs", "./ai-ready")

In the SDK, config accepts either a path string or an IndxConfig object, and component arguments passed to the constructor or use() still override anything the file says (per the precedence rules above).
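The discovery rule amounts to "explicit path first, then ./indx.toml if present, otherwise defaults". A minimal sketch, assuming a hypothetical `find_config` helper that is not part of the indx SDK:

```python
from pathlib import Path
from typing import Optional


def find_config(explicit: Optional[str] = None, cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the config path to load, or None to run on documented defaults."""
    if explicit is not None:
        return Path(explicit)  # an explicit --config / config= path always wins
    candidate = (cwd or Path.cwd()) / "indx.toml"
    return candidate if candidate.is_file() else None


# An explicit path is returned as given; auto-discovery only applies without one.
print(find_config("./configs/prod.indx.toml"))
```

The sketch does not check that an explicit path exists; how indx reports a missing or invalid file is covered under errors and exit codes.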
The resolved config is recorded for reproducibility
After all four layers are merged, indx writes a snapshot of the resolved configuration into the output, in both index.json.metadata.config and the archive’s manifest.json. That snapshot pins the exact parser, llm/model, embedder (with dim), store, and output format that produced the space, so a .indx archive is self-describing and a build is auditable long after the fact.

    {
      "metadata": {
        "tool_version": "indx 0.4.2",
        "embedder": { "name": "bge-m3", "dim": 1024 },
        "config": { "...": "snapshot of resolved indx.toml" }
      }
    }

Combined with deterministic ids and temperature=0.0 enrichment, this is what makes re-running over unchanged input reproducible. See Reproducibility for the full guarantees.
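Because the snapshot is plain JSON, auditing a build can be as simple as reading the metadata back. The sketch below works on a string shaped like the snippet above; the `describe_build` helper is hypothetical, and the contents of the `config` object are left elided exactly as in the source.

```python
import json

# Sample text mirroring the documented snippet; the config body stays elided.
INDEX_JSON = """
{
  "metadata": {
    "tool_version": "indx 0.4.2",
    "embedder": { "name": "bge-m3", "dim": 1024 },
    "config": { "...": "snapshot of resolved indx.toml" }
  }
}
"""


def describe_build(index_text: str) -> str:
    """Summarise the recorded tool version and embedder from index.json text."""
    meta = json.loads(index_text)["metadata"]
    embedder = meta["embedder"]
    return f'{meta["tool_version"]}, embedder {embedder["name"]} (dim {embedder["dim"]})'


print(describe_build(INDEX_JSON))  # indx 0.4.2, embedder bge-m3 (dim 1024)
```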
Why TOML, and why init never rewrites it
indx parses indx.toml with the stdlib tomllib (available since Python 3.11, indx’s floor), so no extra parsing dependency is needed. tomllib is read-only: it can parse TOML but cannot write it. As a result, config scaffolding is generated from a small template, and indx never round-trips (re-serializes) your file. It only reads the config you author and only writes fresh templates, so your comments and formatting are never clobbered.
See also
- Configuration reference: the complete key table and types.
- Registry & defaults: every component name and its default.
- Reproducibility: how the resolved config snapshot guarantees stable builds.
- The CLI reference: flags that override indx.toml.
- Bring your own stack: swapping components and adding plugins.