
CLI Reference

The indx command line exposes three commands over the same pipeline and data model as the SDK: build (the implicit default), inspect, and query. This page documents every flag, the stdout formats, and all exit codes.

Install with pip install indx (Python 3.11–3.13). For the programmatic equivalent of every command here, see the SDK reference.

indx <dir> --out <dir> [--config indx.toml] [options] # build a knowledge space
indx inspect <archive.indx> [options] # summarize an archive
indx query <archive.indx> "<text>" [options] # semantic search

indx <dir> (build) processes a directory (or a .zip) through the six-stage pipeline and writes an AI-ready knowledge space to --out. The output directory receives the portable handbook.indx archive (base name set by --name) plus the expanded index.json, chunks/, and embeddings/ layout.

Build is the implicit default subcommand — there is no indx build keyword. Passing a directory (or .zip) as the first positional triggers a build; inspect and query are the only named subcommands.

| Flag | Type | Default | Description |
|---|---|---|---|
| <dir> (positional) | path | (required) | Directory or .zip to process. |
| --out, -o | path | (required) | Output directory; receives handbook.indx, index.json, chunks/, embeddings/. |
| --config, -c | path | ./indx.toml if present | Configuration file. See the configuration reference. |
| --parser | str | docling | Override the parser engine. |
| --llm | str | openai:gpt-5-mini | Override the enrichment LLM (none to disable, ollama:qwen2.5 for local). |
| --vlm | str | none | Override the vision model. |
| --embedder | str | openai:text-embedding-3-small | Override the embedder (bge-m3 for local). |
| --store | str | qdrant | Override the vector store backend. |
| --format | str | .indx | Output writer: .indx, jsonl, langchain, or llamaindex. |
| --name | str | handbook | Archive base name (produces handbook.indx). |
| --strict | flag | off | Promote per-item skips to fatal failures. |
| --resume | flag | off | Reuse cached stage outputs for unchanged files and config. |
| --jobs, -j | int | CPU count | Parallel workers for parse/embed. |
| --no-embed | flag | off | Skip stage 06 vectorization (produce a graph-only space). |
| --quiet / --verbose | flag | normal | Decrease / increase log verbosity. |
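When builds are scripted (from CI, for example), it helps to assemble the argument list in one place. The sketch below uses only flags from the table above, but covers just a subset of them; the helper name build_argv is hypothetical and not part of indx. It only constructs the command line; pass the list to subprocess.run to execute it.

```python
import shlex


def build_argv(src, out, *, config=None, llm=None, embedder=None,
               fmt=None, name=None, jobs=None,
               strict=False, resume=False, no_embed=False):
    """Assemble an `indx` build command line (hypothetical helper).

    Build is the implicit default subcommand, so the source path comes
    first and there is no "build" keyword. Only options that were set
    explicitly are emitted; everything else falls back to CLI defaults.
    """
    argv = ["indx", str(src), "--out", str(out)]
    if config is not None:
        argv += ["--config", str(config)]
    if llm is not None:
        argv += ["--llm", llm]
    if embedder is not None:
        argv += ["--embedder", embedder]
    if fmt is not None:
        argv += ["--format", fmt]
    if name is not None:
        argv += ["--name", name]
    if jobs is not None:
        argv += ["--jobs", str(jobs)]
    if strict:
        argv.append("--strict")
    if resume:
        argv.append("--resume")
    if no_embed:
        argv.append("--no-embed")
    return argv


# A build with LLM enrichment disabled and the local bge-m3 embedder.
argv = build_argv("./docs", "./ai-ready", llm="none", embedder="bge-m3", resume=True)
print(shlex.join(argv))
# indx ./docs --out ./ai-ready --llm none --embedder bge-m3 --resume
```

Passing the list (rather than a joined string) to subprocess.run avoids shell quoting problems with paths that contain spaces; shlex.join is used here only for display.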

By default the build prints one progress line per stage, then a summary:

indx ./docs → ./ai-ready
01 walk 128 files, 14 folders
02 parse 128 ok, 0 skipped
03 chunk 1042 chunks
04 relate 380 relations
05 enrich 128 documents (openai:gpt-5-mini)
06 embed 1042 vectors → qdrant, sealed handbook.indx
done: 1042 chunks, 128 docs, embed_dim=1536 (12.4s)

--quiet suppresses the per-stage lines (the summary still prints); --verbose adds detail such as per-stage cache hits/misses when --resume is active.
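To pick up build results in a script without reopening the archive, the final summary line can be parsed from captured stdout. This is a minimal sketch that assumes the summary keeps exactly the shape shown in the sample output above; the regex and the parse_summary name are illustrative, not part of indx. It returns None when no matching line is found, which also covers any summary variant the sketch does not anticipate.

```python
import re

# Matches the final build summary, e.g.
# "done: 1042 chunks, 128 docs, embed_dim=1536 (12.4s)".
# Assumption: this format stays stable across indx versions.
SUMMARY_RE = re.compile(
    r"^done: (?P<chunks>\d+) chunks, (?P<docs>\d+) docs, "
    r"embed_dim=(?P<embed_dim>\d+) \((?P<seconds>[\d.]+)s\)$"
)


def parse_summary(log_text):
    """Return the build summary as a dict, or None if no summary line matches."""
    # Scan from the end: the summary is printed after all stage lines.
    for line in reversed(log_text.splitlines()):
        m = SUMMARY_RE.match(line.strip())
        if m:
            return {
                "chunks": int(m["chunks"]),
                "docs": int(m["docs"]),
                "embed_dim": int(m["embed_dim"]),
                "seconds": float(m["seconds"]),
            }
    return None


print(parse_summary("done: 1042 chunks, 128 docs, embed_dim=1536 (12.4s)"))
# {'chunks': 1042, 'docs': 128, 'embed_dim': 1536, 'seconds': 12.4}
```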

SDK equivalent:

from indx import DirectoryPipeline

# Same as `indx ./docs --out ./ai-ready`, with the default components spelled out.
space = DirectoryPipeline(
    parser="docling",
    llm="openai:gpt-5-mini",
    embedder="openai:text-embedding-3-small",
    store="qdrant",
).run("./docs", "./ai-ready")

The --strict flag corresponds to strict=True in the SDK; --no-embed corresponds to dropping the embed-pack stage (pipeline.drop("embed-pack")). Full details are in the SDK reference.

indx inspect summarizes a sealed .indx archive without re-running the pipeline.

| Flag | Type | Default | Description |
|---|---|---|---|
| <archive.indx> (positional) | path | (required) | The .indx archive to inspect. |
| --json | flag | off | Emit the full space.stats object as JSON instead of the human-readable summary. |
| --documents [type] | str (optional) | | List documents, optionally filtered by detected type. |

The --json output mirrors the SpaceStats model: documents, chunks, relations, embeddings, embed_dim, types (the per-type histogram), and bytes_source. See data models for field meanings.
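For scripting against inspect --json, a thin client-side view of the documented SpaceStats fields surfaces a missing field immediately. This sketch assumes the counts are integers and that types maps each document type to its count; SpaceStatsView, parse_stats, and load_stats are hypothetical helper names, and the authoritative field definitions remain those in the data models reference.

```python
import json
import subprocess
from dataclasses import dataclass, fields


@dataclass
class SpaceStatsView:
    """Client-side mirror of the fields `indx inspect --json` is documented to emit.

    Assumption: counts are ints and `types` maps document type -> count.
    """
    documents: int
    chunks: int
    relations: int
    embeddings: int
    embed_dim: int
    types: dict[str, int]
    bytes_source: int


def parse_stats(text):
    """Parse inspect --json output; raises KeyError if a documented field is missing."""
    data = json.loads(text)
    # Keep only the documented keys so extra fields do not break parsing.
    return SpaceStatsView(**{f.name: data[f.name] for f in fields(SpaceStatsView)})


def load_stats(archive):
    """Run `indx inspect <archive> --json` and parse its stdout."""
    proc = subprocess.run(
        ["indx", "inspect", str(archive), "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(proc.stdout)
```

Usage: load_stats("./ai-ready/handbook.indx").embed_dim gives the vector width the archive was sealed with.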

By default inspect prints the space stats, a document-type histogram, and a sample of relations:

handbook.indx (indx 1.0, produced by indx 0.4.2)
documents 128 chunks 1042 relations 380 embed_dim 1536
types policy 41 guide 33 reference 29 faq 25
relations (sample)
chunk:0a1f → chunk:9c3e follows
chunk:7b22 → chunk:1d80 references
doc:contracts → doc:terms cross-references

With --documents [type] each row lists the document id, detected type, source path, and chunk count; passing a type filters the listing to that detected type.
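The --documents listing is meant for humans, but it can be scraped when no JSON form is available. The sketch below assumes each row is whitespace-separated in the column order described above (document id, detected type, source path, chunk count); that layout and the helper name parse_document_rows are assumptions, not documented guarantees.

```python
def parse_document_rows(text):
    """Parse `indx inspect <archive> --documents` rows into dicts (hypothetical helper).

    Assumption: each row is "<id> <type> <source path> <chunk count>",
    separated by whitespace. The path is taken as everything between the
    type and the final count, so paths containing spaces survive.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or not parts[-1].isdigit():
            continue  # skip blank lines, headers, or anything not shaped like a row
        doc_id, doc_type = parts[0], parts[1]
        # Drop the first two columns, then split the chunk count off the right.
        path = line.split(None, 2)[2].rsplit(None, 1)[0]
        rows.append({"id": doc_id, "type": doc_type, "path": path, "chunks": int(parts[-1])})
    return rows
```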

SDK equivalent: inspect reads the KnowledgeSpace you get from KnowledgeSpace.load("./ai-ready/handbook.indx"). Use space.stats for the summary and space.documents(type=...) for the document listing.

indx query runs a semantic search against a sealed archive and returns the most similar chunks. The query text is embedded with the same embedder pinned in the archive manifest, so query vectors are always compatible with the vectors stored in the archive.

| Flag | Type | Default | Description |
|---|---|---|---|
| <archive.indx> (positional) | path | (required) | The .indx archive to search. |
| "<text>" (positional) | str | (required) | The query string. |
| -k | int | 5 | Number of hits to return. |
| --type | str | | Restrict results to a single document type. |
| --json | flag | off | Emit the results as a SearchHit[] JSON array (including .chunk, .neighbors, and .source). |

Default output is human-readable: for each hit, the rank, similarity score, source path, and chunk text, along with its neighbor chunk ids (the context window). With --json, each element is a serialized SearchHit carrying the matched chunk, its score, its source, and the resolved neighbor chunks.

indx query ./ai-ready/handbook.indx "how long is data retained?" -k 3 --type policy

SDK equivalent (this snippet omits the --type policy filter):

from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")
# Same query as the CLI example; k=3 corresponds to -k 3.
for hit in space.search("how long is data retained?", k=3):
    print(hit.score, hit.source.path)
    print(hit.chunk.text)

-k maps to the k argument of space.search(query, k=...). See the SDK reference.
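For the CLI route from a script, the sketch below builds an indx query command line with --json and flattens the resulting SearchHit[] array. It assumes the JSON keys mirror the SDK attributes used above (score, chunk.text, source.path); query_argv, parse_hits, and query are hypothetical helper names. check=True makes subprocess.run raise CalledProcessError on any non-zero exit code.

```python
import json
import subprocess


def query_argv(archive, text, k=5, doc_type=None):
    """Build an `indx query ... --json` command line from the documented flags."""
    argv = ["indx", "query", str(archive), text, "-k", str(k), "--json"]
    if doc_type is not None:
        argv += ["--type", doc_type]
    return argv


def parse_hits(text):
    """Flatten a SearchHit[] JSON array into (score, source_path, chunk_text) tuples.

    Assumption: each element looks like
    {"score": ..., "chunk": {"text": ...}, "source": {"path": ...}, "neighbors": [...]}.
    """
    return [
        (hit["score"], hit["source"]["path"], hit["chunk"]["text"])
        for hit in json.loads(text)
    ]


def query(archive, text, k=5, doc_type=None):
    """Run the query and return flattened hits; raises on a non-zero exit."""
    proc = subprocess.run(
        query_argv(archive, text, k, doc_type),
        capture_output=True, text=True, check=True,
    )
    return parse_hits(proc.stdout)
```

Keeping the query text as a single list element means it reaches indx as one positional argument, with no shell quoting to get wrong.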

Every command returns one of these process exit codes:

| Code | Meaning |
|---|---|
| 0 | Success. |
| 1 | Fatal pipeline/runtime error (including a --strict skip promoted to fatal). |
| 2 | Usage error (bad flags or arguments). |
| 3 | Configuration error (invalid indx.toml or an unknown component name). |
| 4 | Archive error (missing, corrupt, or incompatible .indx). |
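Because each failure class has its own code, a wrapper can report what went wrong without parsing stderr. A minimal sketch follows; EXIT_CODES mirrors the table above, while IndxError, check_exit, and run_indx are hypothetical names.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_CODES = {
    0: "success",
    1: "fatal pipeline/runtime error",
    2: "usage error",
    3: "configuration error",
    4: "archive error",
}


class IndxError(RuntimeError):
    """Hypothetical exception carrying the indx exit code and its meaning."""

    def __init__(self, code, stderr=""):
        meaning = EXIT_CODES.get(code, "unknown exit code")
        super().__init__(f"indx exited with {code} ({meaning}): {stderr.strip()}")
        self.code = code


def check_exit(code, stderr=""):
    """Raise IndxError for any non-zero exit code; return None on success."""
    if code != 0:
        raise IndxError(code, stderr)


def run_indx(argv):
    """Run an indx command and return its stdout, raising IndxError on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    check_exit(proc.returncode, proc.stderr)
    return proc.stdout
```

Codes 2 and 3 point at the invocation or indx.toml, so retrying unchanged will fail the same way; code 1 after a --strict run may simply be a per-item skip promoted to fatal.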