Configuration Reference

indx.toml is the optional, version-controllable configuration file that binds each pipeline slot to a component. Every section and key is optional — anything you omit falls back to a documented default, so a knowledge space builds with no config file at all.

This page is the normative reference for the file format. For a task-oriented walkthrough (where to put the file, how to scaffold one, common recipes), see the configuration guide. For the list of built-in component names and their defaults, see registry and defaults.

indx loads ./indx.toml from the current directory automatically when present. Point at a different file with --config / -c:

indx ./docs --out ./ai-ready --config ./config/indx.toml

The file is parsed with the standard-library tomllib (Python 3.11+), so it follows ordinary TOML syntax.

A complete file with every recognised section and key, annotated with type and default:

[parser]
engine = "docling" # str. Parser name. Default: "docling".

[enrich]
llm = "openai:gpt-5-mini" # str. LLM name[:model] or "none". Default: "openai:gpt-5-mini".
vlm = "none" # str. VLM name or "none". Default: "none".
# list[str]. Which enrichments to produce.
# Default: ["type", "topics", "tags", "summary"].
metadata = ["type", "topics", "tags", "summary"]

[embed]
# str. Embedder name. Default: "openai:text-embedding-3-small".
model = "openai:text-embedding-3-small"

[store]
# str. One of: qdrant | pgvector | chroma | lancedb | jsonl. Default: "qdrant".
backend = "qdrant"

[output]
# str. One of: .indx | jsonl | langchain | llamaindex. Default: ".indx".
format = ".indx"

Each section corresponds to one swappable slot in the pipeline: [parser] binds stage 02 Parse, [enrich] binds stage 05 Enrich, and [embed]/[store]/[output] bind stage 06 Embed+Pack. The Walk, Chunk, and Relate stages are built-in and have no component slot to configure here.

Every recognised key, with its type, default, and allowed values:

| Section | Key | Type | Default | Allowed values |
| --- | --- | --- | --- | --- |
| [parser] | engine | string | docling | any registered parser name |
| [enrich] | llm | string | openai:gpt-5-mini | <name>[:model], none |
| [enrich] | vlm | string | none | <name>, none |
| [enrich] | metadata | list[str] | ["type","topics","tags","summary"] | any subset of those four |
| [embed] | model | string | openai:text-embedding-3-small | any registered embedder name |
| [store] | backend | string | qdrant | qdrant, pgvector, chroma, lancedb, jsonl |
| [output] | format | string | .indx | .indx, jsonl, langchain, llamaindex |
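The defaults and allowed values listed above can be expressed as plain data. The following is a minimal sketch, assuming a parsed indx.toml dict as input; `with_defaults` and `check_allowed` are hypothetical helpers, not functions exported by indx, and only the keys with a closed set of values are checked.

```python
import copy

# Documented defaults, one entry per recognised section and key.
DEFAULTS = {
    "parser": {"engine": "docling"},
    "enrich": {
        "llm": "openai:gpt-5-mini",
        "vlm": "none",
        "metadata": ["type", "topics", "tags", "summary"],
    },
    "embed": {"model": "openai:text-embedding-3-small"},
    "store": {"backend": "qdrant"},
    "output": {"format": ".indx"},
}

# Keys whose values come from a closed set.
ALLOWED = {
    ("enrich", "metadata"): {"type", "topics", "tags", "summary"},
    ("store", "backend"): {"qdrant", "pgvector", "chroma", "lancedb", "jsonl"},
    ("output", "format"): {".indx", "jsonl", "langchain", "llamaindex"},
}


def with_defaults(file_config: dict) -> dict:
    """Fill every omitted key with its documented default."""
    resolved = {}
    for section, keys in DEFAULTS.items():
        given = file_config.get(section, {})
        resolved[section] = {
            key: given.get(key, copy.deepcopy(default))
            for key, default in keys.items()
        }
    return resolved


def check_allowed(resolved: dict) -> list[str]:
    """Return one message per value that falls outside its allowed set."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = resolved[section][key]
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item not in allowed:
                problems.append(
                    f"[{section}] {key}: {item!r} is not one of {sorted(allowed)}"
                )
    return problems
```

For example, `with_defaults({"store": {"backend": "jsonl"}})` keeps the jsonl store and fills every other key from `DEFAULTS`. This sketch deliberately ignores any extra tables inside a section.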

A few notes on values:

  • llm accepts an optional :model suffix that selects the model for the named adapter — e.g. openai:gpt-5-mini or ollama:qwen2.5. Set it to none to disable LLM enrichment entirely (equivalent to dropping the Enrich stage’s LLM work). See enrichment with LLM and VLM.
  • vlm defaults to none (disabled), keeping image-description latency and cost opt-in. Provide an adapter name to enable vision enrichment.
  • metadata controls which enrichments the Enrich stage produces. The four recognised values are type, topics, tags, and summary; list only the ones you want.
  • backend values qdrant, pgvector, chroma, and lancedb require their respective backend extras to be installed; jsonl is a zero-dependency fallback that ships in the core. See choosing a store.
  • format value .indx is the portable archive; jsonl is the zero-dependency writer; langchain and llamaindex emit framework-native structures. See output formats.
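To make the `<name>[:model]` form concrete, here is a small sketch of how such a value could be split. `ComponentSpec` and `parse_spec` are hypothetical names, and splitting on the first colon only is an assumption chosen so that model identifiers containing colons survive intact.

```python
from typing import NamedTuple, Optional


class ComponentSpec(NamedTuple):
    name: str             # adapter name, e.g. "openai"
    model: Optional[str]  # model suffix, or None when omitted


def parse_spec(value: str) -> Optional[ComponentSpec]:
    """Split '<name>[:model]'; return None for the literal 'none'."""
    if value == "none":
        return None  # slot disabled
    # partition() splits on the first colon only, so the model part may
    # itself contain colons.
    name, sep, model = value.partition(":")
    if not name or (sep and not model):
        raise ValueError(f"malformed component spec: {value!r}")
    return ComponentSpec(name, model or None)


if __name__ == "__main__":
    print(parse_spec("openai:gpt-5-mini"))
    print(parse_spec("ollama:qwen2.5"))
    print(parse_spec("none"))
```

Returning `None` for `none` lets the caller skip the LLM or VLM work with a single check.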

Adapters may read additional options from their own sub-table, named [<section>.<backend>]. These keys are passed verbatim to the adapter’s constructor and are otherwise opaque to the indx core — the core never validates or interprets them, so the available keys are defined by each adapter.

[store]
backend = "qdrant"
[store.qdrant]
url = "http://localhost:6333" # passed verbatim to the Qdrant adapter
# any other adapter-specific keys go here too

The same pattern applies to any slot whose adapter accepts extra configuration (for example a parser or embedder sub-table). Because these keys flow straight to the adapter, consult the relevant adapter’s documentation for the exact key names it understands. Adapter authors are expected to read backend-specific options from the matching indx.toml sub-table — see authoring a plugin and adding a backend.

For each component slot the effective value is resolved in this exact order, highest priority first:

explicit code argument / use() > CLI flag > indx.toml > documented default

| Priority | Source | Example |
| --- | --- | --- |
| 1 (highest) | Explicit object or name in code (constructor arg or .use()) | DirectoryPipeline(store="chroma") / pipeline.use(store=MyStore()) |
| 2 | CLI flag | indx ./docs -o ./out --store chroma |
| 3 | indx.toml | [store] backend = "chroma" |
| 4 (lowest) | Documented default | qdrant |

Each slot is resolved independently. For instance, you can set [embed] model in indx.toml while overriding only the store on the command line with --store jsonl; the embedder still comes from the file, and the store comes from the flag. The SDK constructor arguments and use() sit above everything, so code-level choices always win.
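The per-slot resolution order amounts to "first source that supplies a value wins". A minimal sketch, assuming `None` marks "not supplied"; `resolve_slot` is a hypothetical helper and `local:example-embedder` is a made-up embedder name used only for illustration.

```python
def resolve_slot(code_value=None, cli_value=None, file_value=None, default=None):
    """Return the highest-priority value that was actually supplied."""
    # Order mirrors the documented precedence: code > CLI > indx.toml > default.
    for candidate in (code_value, cli_value, file_value):
        if candidate is not None:
            return candidate
    return default


# indx.toml sets the embedder and a store; the CLI overrides only the store.
file_config = {
    "embed": {"model": "local:example-embedder"},  # hypothetical name
    "store": {"backend": "chroma"},
}
cli_flags = {"store": "jsonl"}  # as if --store jsonl had been passed

embedder = resolve_slot(
    cli_value=cli_flags.get("embedder"),
    file_value=file_config["embed"].get("model"),
    default="openai:text-embedding-3-small",
)
store = resolve_slot(
    cli_value=cli_flags.get("store"),
    file_value=file_config["store"].get("backend"),
    default="qdrant",
)
```

Here `embedder` comes from the file and `store` from the flag, since each slot is resolved on its own.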

See the CLI reference for the full list of override flags (--parser, --llm, --vlm, --embedder, --store, --format).

Secrets (API keys, tokens, passwords for cloud backends) never belong in indx.toml, which is meant to be committed to version control. Provide them through environment variables instead, using the nested INDX_ convention, where a double underscore (__) separates the slot name from the key:

export INDX_LLM__API_KEY="sk-..." # secret for the LLM slot
export INDX_STORE__API_KEY="..." # secret for the store slot

In indx.toml:

[enrich]
llm = "openai:gpt-5-mini"
# api_key is supplied via $INDX_LLM__API_KEY and is never written in the file

This keeps committed configuration free of credentials while still letting cloud-backend opt-in paths receive what they need.

Note that the environment-variable prefix keys off the slot name, not the indx.toml section heading — so the LLM slot configured under [enrich] takes its secret from INDX_LLM__*, not INDX_ENRICH__*. The mapping is fixed:

| Slot | indx.toml section | Env-var prefix |
| --- | --- | --- |
| parser | [parser] | INDX_PARSER__ |
| llm | [enrich] | INDX_LLM__ |
| vlm | [enrich] | INDX_VLM__ |
| embedder | [embed] | INDX_EMBEDDER__ |
| store | [store] | INDX_STORE__ |
| output | [output] | INDX_OUTPUT__ |

The double underscore (__) then separates the slot prefix from the key — e.g. INDX_STORE__API_KEY supplies api_key to the store slot.
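The slot-to-prefix mapping can be sketched as a lookup over the environment. `slot_secrets` is a hypothetical helper, and lower-casing the remainder of the variable name is an assumption generalised from the `INDX_STORE__API_KEY` to `api_key` example; indx's actual reader may differ.

```python
import os
from typing import Mapping, Optional

SLOTS = ("parser", "llm", "vlm", "embedder", "store", "output")
# The prefix is built from the slot name, not the indx.toml section heading.
SLOT_PREFIXES = {slot: f"INDX_{slot.upper()}__" for slot in SLOTS}


def slot_secrets(slot: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect INDX_<SLOT>__<KEY> variables for one slot as {key: value}."""
    environ = os.environ if environ is None else environ
    prefix = SLOT_PREFIXES[slot]
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    # Print only the key names so secret values never reach the terminal.
    print(sorted(slot_secrets("llm")))
```

Passing a mapping explicitly, instead of always reading `os.environ`, makes the lookup easy to exercise without touching the real environment.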

When a build completes, the resolved configuration (after all precedence rules have been applied) is recorded as a snapshot in the build output. It appears in two places:

  • index.json under metadata.config
  • the .indx archive’s manifest.json

This makes every knowledge space self-describing and auditable: a consumer can see exactly which parser, LLM, embedder, store, and output format produced it — important for reproducibility. Because some LLM providers are not bit-reproducible, the snapshot (including model name) is what lets you audit how a given space was built.
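A consumer could inspect the snapshot with nothing more than a JSON parser. In the sketch below, only the `metadata.config` location in index.json comes from this page; the shape of the snapshot inside it and the helper name `read_config_snapshot` are assumptions for illustration.

```python
import json

# Assumed example content; the real snapshot layout may differ.
SAMPLE_INDEX_JSON = """
{
  "metadata": {
    "config": {
      "parser": {"engine": "docling"},
      "enrich": {"llm": "openai:gpt-5-mini", "vlm": "none"},
      "store": {"backend": "qdrant"}
    }
  }
}
"""


def read_config_snapshot(index_json_text: str) -> dict:
    """Return the snapshot stored under metadata.config, or {} if absent."""
    document = json.loads(index_json_text)
    return document.get("metadata", {}).get("config", {})


snapshot = read_config_snapshot(SAMPLE_INDEX_JSON)
# Audit which LLM produced the enrichments in this knowledge space.
llm_used = snapshot.get("enrich", {}).get("llm")
```

Comparing snapshots from two builds is then an ordinary dictionary comparison.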