Configuration Reference

indx.toml is the optional, version-controllable configuration file that binds each pipeline slot to a component. Every section and key is optional — anything you omit falls back to a documented default, so a knowledge space builds with no config file at all.

This page is the normative reference for the file format. For a task-oriented walkthrough (where to put the file, how to scaffold one, common recipes), see the configuration guide. For the list of built-in component names and their defaults, see registry and defaults.

Where the file lives

indx loads ./indx.toml from the current directory automatically when present. Point at a different file with --config / -c:

indx ./docs --out ./ai-ready --config ./config/indx.toml

The file is parsed with the standard-library tomllib (Python 3.11+), so it follows ordinary TOML syntax.

Annotated `indx.toml`

A complete file with every recognised section and key, annotated with type and default:

[parser]
engine = "docling"            # str. Parser name. Default: "docling".

[enrich]
llm      = "openai:gpt-5-mini" # str. LLM name[:model] or "none". Default: "openai:gpt-5-mini".
vlm      = "none"             # str. VLM name or "none". Default: "none".
metadata = ["type", "topics", "tags", "summary"]
                              # list[str]. Which enrichments to produce.
                              # Default: ["type","topics","tags","summary"].

[embed]
model = "openai:text-embedding-3-small" # str. Embedder name. Default: "openai:text-embedding-3-small".

[store]
backend = "qdrant"            # str. One of: qdrant | pgvector | chroma | lancedb | jsonl.
                              # Default: "qdrant".

[output]
format = ".indx"              # str. One of: .indx | jsonl | langchain | llamaindex.
                              # Default: ".indx".

Each section corresponds to one swappable slot in the pipeline: [parser] binds stage 02 Parse, [enrich] binds stage 05 Enrich, and [embed]/[store]/[output] bind stage 06 Embed+Pack. The Walk, Chunk, and Relate stages are built-in and have no component slot to configure here.

Key table

Every recognised key, with its type, default, and allowed values:

Section	Key	Type	Default	Allowed values
`[parser]`	`engine`	string	`docling`	any registered parser name
`[enrich]`	`llm`	string	`openai:gpt-5-mini`	`<name>[:model]`, `none`
`[enrich]`	`vlm`	string	`none`	`<name>`, `none`
`[enrich]`	`metadata`	`list[str]`	`["type","topics","tags","summary"]`	any subset of those four
`[embed]`	`model`	string	`openai:text-embedding-3-small`	any registered embedder name
`[store]`	`backend`	string	`qdrant`	`qdrant`, `pgvector`, `chroma`, `lancedb`, `jsonl`
`[output]`	`format`	string	`.indx`	`.indx`, `jsonl`, `langchain`, `llamaindex`

A few notes on values:

llm accepts an optional :model suffix that selects the model for the named adapter — e.g. openai:gpt-5-mini or ollama:qwen2.5. Set it to none to disable LLM enrichment entirely (equivalent to dropping the Enrich stage’s LLM work). See enrichment with LLM and VLM.
vlm defaults to none (disabled), keeping image-description latency and cost opt-in. Provide an adapter name to enable vision enrichment.
metadata controls which enrichments the Enrich stage produces. The four recognised values are type, topics, tags, and summary; list only the ones you want.
backend values qdrant, pgvector, chroma, and lancedb require their respective backend extras to be installed; jsonl is a zero-dependency fallback that ships in the core. See choosing a store.
format value .indx is the portable archive; jsonl is the zero-dependency writer; langchain and llamaindex emit framework-native structures. See output formats.

Backend-specific sub-tables

Adapters may read additional options from their own sub-table, named [<section>.<backend>]. These keys are passed verbatim to the adapter’s constructor and are otherwise opaque to the indx core — the core never validates or interprets them, so the available keys are defined by each adapter.

[store]
backend = "qdrant"

[store.qdrant]
url = "http://localhost:6333"   # passed verbatim to the Qdrant adapter
# any other adapter-specific keys go here too

The same pattern applies to any slot whose adapter accepts extra configuration (for example a parser or embedder sub-table). Because these keys flow straight to the adapter, consult the relevant adapter’s documentation for the exact key names it understands. Adapter authors are expected to read backend-specific options from the matching indx.toml sub-table — see authoring a plugin and adding a backend.

Precedence

For each component slot the effective value is resolved in this exact order, highest priority first:

explicit code argument / use()   >   CLI flag   >   indx.toml   >   documented default

Priority	Source	Example
1 (highest)	Explicit object or name in code (constructor arg or `.use()`)	`DirectoryPipeline(store="chroma")` / `pipeline.use(store=MyStore())`
2	CLI flag	`indx ./docs -o ./out --store chroma`
3	`indx.toml`	`[store]` `backend = "chroma"`
4 (lowest)	Documented default	`qdrant`

Each slot is resolved independently. For instance, you can set [embed] model in indx.toml while overriding only the store on the command line with --store jsonl; the embedder still comes from the file, and the store comes from the flag. The SDK constructor arguments and use() sit above everything, so code-level choices always win.
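The per-slot resolution order can be expressed as a first-non-empty lookup. The sketch below assumes that an unset source is represented as None; resolve_slot and resolve_all are illustrative helpers rather than indx APIs, and "example-embedder" is a made-up component name.

```python
def resolve_slot(code_arg=None, cli_flag=None, toml_value=None, default=None):
    """Return the first value that is set, checking sources from highest to lowest priority."""
    for candidate in (code_arg, cli_flag, toml_value):
        if candidate is not None:
            return candidate
    return default


def resolve_all(code: dict, cli: dict, toml: dict, defaults: dict) -> dict:
    """Resolve every slot independently of the others."""
    return {
        slot: resolve_slot(code.get(slot), cli.get(slot), toml.get(slot), default)
        for slot, default in defaults.items()
    }


defaults = {"embedder": "openai:text-embedding-3-small", "store": "qdrant"}
toml = {"embedder": "example-embedder", "store": "chroma"}  # values from indx.toml
cli = {"store": "jsonl"}                                    # --store jsonl
code = {}                                                   # nothing set in code

print(resolve_all(code, cli, toml, defaults))
# {'embedder': 'example-embedder', 'store': 'jsonl'}
```

The embedder comes from the file and the store from the flag, matching the example in the paragraph above.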

See the CLI reference for the full list of override flags (--parser, --llm, --vlm, --embedder, --store, --format).

Secrets via environment variables

Secrets (API keys, tokens, passwords for cloud backends) never belong in indx.toml, which is meant to be committed to version control. Provide them through environment variables instead, using the nested INDX_ convention, in which a double underscore (__) separates the slot name from the key:

export INDX_LLM__API_KEY="sk-..."      # secret for the LLM slot
export INDX_STORE__API_KEY="..."       # secret for the store slot

[enrich]
llm = "openai:gpt-5-mini"
# api_key is supplied via $INDX_LLM__API_KEY — never written in the file

This keeps committed configuration free of credentials while still letting cloud-backend opt-in paths receive what they need.

Note that the environment-variable prefix keys off the slot name, not the indx.toml section heading — so the LLM slot configured under [enrich] takes its secret from INDX_LLM__*, not INDX_ENRICH__*. The mapping is fixed:

Slot	`indx.toml` section	Env-var prefix
parser	`[parser]`	`INDX_PARSER__`
llm	`[enrich]`	`INDX_LLM__`
vlm	`[enrich]`	`INDX_VLM__`
embedder	`[embed]`	`INDX_EMBEDDER__`
store	`[store]`	`INDX_STORE__`
output	`[output]`	`INDX_OUTPUT__`

The double underscore (__) then separates the slot prefix from the key — e.g. INDX_STORE__API_KEY supplies api_key to the store slot.
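The naming rule can be illustrated by grouping variables by slot prefix. This sketch assumes the key is the lowercased remainder of the variable name, as the INDX_STORE__API_KEY to api_key example suggests; collect_slot_env is a hypothetical helper, and indx's own lookup may handle case or nesting differently.

```python
# Prefixes copied from the slot table on this page.
SLOT_PREFIXES = {
    "parser": "INDX_PARSER__",
    "llm": "INDX_LLM__",
    "vlm": "INDX_VLM__",
    "embedder": "INDX_EMBEDDER__",
    "store": "INDX_STORE__",
    "output": "INDX_OUTPUT__",
}


def collect_slot_env(environ: dict) -> dict:
    """Group INDX_<SLOT>__<KEY> variables by slot, lowercasing the key part."""
    found = {slot: {} for slot in SLOT_PREFIXES}
    for name, value in environ.items():
        for slot, prefix in SLOT_PREFIXES.items():
            if name.startswith(prefix) and len(name) > len(prefix):
                found[slot][name[len(prefix):].lower()] = value
    return found


# A fake environment for demonstration; pass os.environ to read the real one.
fake_env = {
    "INDX_LLM__API_KEY": "sk-example",
    "INDX_STORE__API_KEY": "store-example",
    "INDX_ENRICH__API_KEY": "ignored",  # section name, not a slot name
    "PATH": "/usr/bin",
}
found = collect_slot_env(fake_env)
print(found["llm"])    # {'api_key': 'sk-example'}
print(found["store"])  # {'api_key': 'store-example'}
```

The INDX_ENRICH__ variable is not picked up, which is the pitfall the slot table is meant to prevent.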

Config snapshot in the manifest

When a build completes, the resolved configuration (after applying all precedence rules) is recorded as a snapshot in the build output. It appears in two places:

index.json under metadata.config
the .indx archive’s manifest.json

This makes every knowledge space self-describing and auditable: a consumer can see exactly which parser, LLM, embedder, store, and output format produced it. Some LLM providers are not bit-reproducible, so rebuilding with the same configuration may not yield identical enrichments; the snapshot (including the model name) is what lets you audit how a given space was built.
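For auditing, the snapshot can be read back with the standard json module. The sketch below assumes the snapshot under metadata.config mirrors the indx.toml sections; that layout and the helper names (config_snapshot, changed_keys) are assumptions made for illustration, so check a real index.json before relying on them.

```python
import json

# Two made-up index.json documents; the structure under metadata.config is assumed.
OLD_INDEX = '{"metadata": {"config": {"enrich": {"llm": "openai:gpt-5-mini"}, "store": {"backend": "qdrant"}}}}'
NEW_INDEX = '{"metadata": {"config": {"enrich": {"llm": "openai:gpt-5-mini"}, "store": {"backend": "jsonl"}}}}'


def config_snapshot(index: dict) -> dict:
    """Return the recorded configuration, or {} if the index has none."""
    return index.get("metadata", {}).get("config", {})


def changed_keys(old: dict, new: dict) -> dict:
    """Map 'section.key' to (old, new) for every key whose value differs."""
    changes = {}
    for section in sorted(set(old) | set(new)):
        before, after = old.get(section, {}), new.get(section, {})
        for key in sorted(set(before) | set(after)):
            if before.get(key) != after.get(key):
                changes[f"{section}.{key}"] = (before.get(key), after.get(key))
    return changes


old = config_snapshot(json.loads(OLD_INDEX))
new = config_snapshot(json.loads(NEW_INDEX))
print(changed_keys(old, new))  # {'store.backend': ('qdrant', 'jsonl')}
```

Comparing snapshots this way shows which slot changed between two builds without rerunning either of them.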

Configuration guide — how-to and recipes
Registry and defaults — component names and resolution
CLI reference — override flags and exit codes
The .indx archive and index.json — where the config snapshot is stored