Configuration Reference
indx.toml is the optional, version-controllable configuration file that binds each pipeline slot to a component. Every section and key is optional — anything you omit falls back to a documented default, so a knowledge space builds with no config file at all.
This page is the normative reference for the file format. For a task-oriented walkthrough (where to put the file, how to scaffold one, common recipes), see the configuration guide. For the list of built-in component names and their defaults, see registry and defaults.
Where the file lives
indx loads ./indx.toml from the current directory automatically when present. Point at a different file with --config / -c:

    indx ./docs --out ./ai-ready --config ./config/indx.toml

The file is parsed with the standard-library tomllib (Python 3.11+), so it follows ordinary TOML syntax.
Annotated indx.toml
A complete file with every recognised section and key, annotated with type and default:

    [parser]
    engine = "docling"   # str. Parser name. Default: "docling".

    [enrich]
    llm = "openai:gpt-5-mini"   # str. LLM name[:model] or "none". Default: "openai:gpt-5-mini".
    vlm = "none"                # str. VLM name or "none". Default: "none".
    metadata = ["type", "topics", "tags", "summary"]   # list[str]. Which enrichments to produce.
                                                       # Default: ["type","topics","tags","summary"].

    [embed]
    model = "openai:text-embedding-3-small"   # str. Embedder name. Default cloud embedder.

    [store]
    backend = "qdrant"   # str. One of: qdrant | pgvector | chroma | lancedb | jsonl.
                         # Default: "qdrant".

    [output]
    format = ".indx"   # str. One of: .indx | jsonl | langchain | llamaindex.
                       # Default: ".indx".

Each section corresponds to one swappable slot in the pipeline: [parser] binds stage 02 Parse, [enrich] binds stage 05 Enrich, and [embed]/[store]/[output] bind stage 06 Embed+Pack. The Walk, Chunk, and Relate stages are built-in and have no component slot to configure here.
Key table
Every recognised key, with its type, default, and allowed values:

| Section | Key | Type | Default | Allowed values |
|---|---|---|---|---|
| [parser] | engine | string | docling | any registered parser name |
| [enrich] | llm | string | openai:gpt-5-mini | <name>[:model], none |
| [enrich] | vlm | string | none | <name>, none |
| [enrich] | metadata | list[str] | ["type","topics","tags","summary"] | any subset of those four |
| [embed] | model | string | openai:text-embedding-3-small | any registered embedder name |
| [store] | backend | string | qdrant | qdrant, pgvector, chroma, lancedb, jsonl |
| [output] | format | string | .indx | .indx, jsonl, langchain, llamaindex |
A few notes on values:
- llm accepts an optional :model suffix that selects the model for the named adapter, e.g. openai:gpt-5-mini or ollama:qwen2.5. Set it to none to disable LLM enrichment entirely (equivalent to dropping the Enrich stage's LLM work). See enrichment with LLM and VLM.
- vlm defaults to none (disabled), keeping image-description latency and cost opt-in. Provide an adapter name to enable vision enrichment.
- metadata controls which enrichments the Enrich stage produces. The four recognised values are type, topics, tags, and summary; list only the ones you want.
- backend values qdrant, pgvector, chroma, and lancedb require their respective backend extras to be installed; jsonl is a zero-dependency fallback that ships in the core. See choosing a store.
- format value .indx is the portable archive; jsonl is the zero-dependency writer; langchain and llamaindex emit framework-native structures. See output formats.
Backend-specific sub-tables
Adapters may read additional options from their own sub-table, named [<section>.<backend>]. These keys are passed verbatim to the adapter's constructor and are otherwise opaque to the indx core: the core never validates or interprets them, so the available keys are defined by each adapter.

    [store]
    backend = "qdrant"

    [store.qdrant]
    url = "http://localhost:6333"   # passed verbatim to the Qdrant adapter
    # any other adapter-specific keys go here too

The same pattern applies to any slot whose adapter accepts extra configuration (for example a parser or embedder sub-table). Because these keys flow straight to the adapter, consult the relevant adapter's documentation for the exact key names it understands. Adapter authors are expected to read backend-specific options from the matching indx.toml sub-table; see authoring a plugin and adding a backend.
Precedence
For each component slot the effective value is resolved in this exact order, highest priority first:

    explicit code argument / use() > CLI flag > indx.toml > documented default

| Priority | Source | Example |
|---|---|---|
| 1 (highest) | Explicit object or name in code (constructor arg or .use()) | DirectoryPipeline(store="chroma") / pipeline.use(store=MyStore()) |
| 2 | CLI flag | indx ./docs -o ./out --store chroma |
| 3 | indx.toml | [store] backend = "chroma" |
| 4 (lowest) | Documented default | qdrant |
Each slot is resolved independently. For instance, you can set [embed] model in indx.toml while overriding only the store on the command line with --store jsonl; the embedder still comes from the file, and the store comes from the flag. The SDK constructor arguments and use() sit above everything, so code-level choices always win.
See the CLI reference for the full list of override flags (--parser, --llm, --vlm, --embedder, --store, --format).
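The per-slot resolution order amounts to "first source that supplied a value wins". The sketch below illustrates that rule; resolve_slot is a hypothetical helper, and the embedder name local-embedder is made up for the example.

```python
def resolve_slot(code_value=None, cli_value=None, file_value=None, *, default):
    """Hypothetical helper: return the highest-priority value that was supplied."""
    for candidate in (code_value, cli_value, file_value):
        if candidate is not None:
            return candidate
    return default


# Store overridden on the command line; embedder set only in indx.toml.
store = resolve_slot(cli_value="jsonl", file_value="chroma", default="qdrant")
embedder = resolve_slot(
    file_value="local-embedder",  # made-up embedder name
    default="openai:text-embedding-3-small",
)
print(store)     # jsonl
print(embedder)  # local-embedder
```

Because each slot is resolved with its own call, overriding one slot never disturbs the others.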
Secrets via environment variables
Secrets (API keys, tokens, passwords for cloud backends) never belong in indx.toml, which is meant to be committed to version control. Provide them through environment variables instead, using the nested INDX_ convention where double underscores (__) separate the slot name from the key:

    export INDX_LLM__API_KEY="sk-..."   # secret for the LLM slot
    export INDX_STORE__API_KEY="..."    # secret for the store slot

The committed file then names only the component:

    [enrich]
    llm = "openai:gpt-5-mini"
    # api_key is supplied via $INDX_LLM__API_KEY; never written in the file

This keeps committed configuration free of credentials while still letting cloud-backend opt-in paths receive what they need.
Note that the environment-variable prefix keys off the slot name, not the indx.toml section heading — so the LLM slot configured under [enrich] takes its secret from INDX_LLM__*, not INDX_ENRICH__*. The mapping is fixed:
| Slot | indx.toml section | Env-var prefix |
|---|---|---|
| parser | [parser] | INDX_PARSER__ |
| llm | [enrich] | INDX_LLM__ |
| vlm | [enrich] | INDX_VLM__ |
| embedder | [embed] | INDX_EMBEDDER__ |
| store | [store] | INDX_STORE__ |
| output | [output] | INDX_OUTPUT__ |
The double underscore (__) then separates the slot prefix from the key — e.g. INDX_STORE__API_KEY supplies api_key to the store slot.
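The naming convention can be illustrated by collecting every variable that carries a slot's prefix. This is a sketch only: slot_env_options is a hypothetical helper, and lower-casing the key is an assumption inferred from the INDX_STORE__API_KEY to api_key example.

```python
import os

# Prefixes copied from the slot table.
SLOT_PREFIXES = {
    "parser": "INDX_PARSER__",
    "llm": "INDX_LLM__",
    "vlm": "INDX_VLM__",
    "embedder": "INDX_EMBEDDER__",
    "store": "INDX_STORE__",
    "output": "INDX_OUTPUT__",
}


def slot_env_options(slot: str, environ=None) -> dict:
    """Hypothetical helper: collect INDX_<SLOT>__<KEY> variables as {key: value}."""
    environ = os.environ if environ is None else environ
    prefix = SLOT_PREFIXES[slot]
    return {
        name[len(prefix):].lower(): value  # assumption: keys are lower-cased
        for name, value in environ.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }


# A fake environment keeps the example independent of the real one.
fake_env = {
    "INDX_STORE__API_KEY": "example-token",
    "INDX_LLM__API_KEY": "sk-example",
    "HOME": "/root",
}
print(slot_env_options("store", fake_env))  # {'api_key': 'example-token'}
```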
Config snapshot in the manifest
When a build completes, the resolved configuration (after applying all precedence rules) is recorded as a snapshot in the archive. It appears in two places:

- index.json under metadata.config
- the .indx archive's manifest.json
This makes every knowledge space self-describing and auditable: a consumer can see exactly which parser, LLM, embedder, store, and output format produced it — important for reproducibility. Because some LLM providers are not bit-reproducible, the snapshot (including model name) is what lets you audit how a given space was built.
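An audit script might read the snapshot back along these lines. Only the metadata.config location comes from this page; the shape of the snapshot shown here (sections mapping to keys) is an assumption for illustration.

```python
import json

# Illustrative index.json fragment; the snapshot's inner shape is assumed.
index_text = """
{"metadata": {"config": {"store": {"backend": "jsonl"}, "enrich": {"llm": "none"}}}}
"""

snapshot = json.loads(index_text)["metadata"]["config"]
for section, keys in sorted(snapshot.items()):
    for key, value in sorted(keys.items()):
        print(f"[{section}] {key} = {value!r}")
# [enrich] llm = 'none'
# [store] backend = 'jsonl'
```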
Related references
- Configuration guide: how-to and recipes
- Registry and defaults: component names and resolution
- CLI reference: override flags and exit codes
- The .indx archive and index.json: where the config snapshot is stored