Quickstart
This is the sixty-second path: install indx, point it at a folder, and read the result. No configuration file is required to get something real — the defaults run a complete pipeline with cloud-backed model defaults and a first-class local profile when you need offline operation.
If you want the longer, narrated walkthrough instead, jump to Your first knowledge space.
The 60-second path
-
Install indx.
The base package ships zero-dependency fallbacks (a plaintext parser, the none VLM, a jsonl store, and the .indx and jsonl writers), so a full run works even with nothing else installed. For the recommended cloud-backed stack, install Docling, OpenAI, and Qdrant. For the fully local stack (Docling parsing, an Ollama LLM, the bge-m3 embedder, and a Qdrant store), install the local extra.
Cloud-backed stack:
    pip install "indx[docling,openai,qdrant]"
Local stack:
    pip install "indx[local]"
Core only:
    pip install indx
No API key or network needed: uses the plaintext parser, jsonl store, and .indx writer.
    # Add only the backends you need
    pip install "indx[docling,qdrant]"
indx requires Python 3.11+ (supported on 3.11–3.13). See Installation for the full extras matrix and the extras reference.
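Before running the pipeline, it can help to confirm the interpreter version and see which optional backends are importable. The following is a minimal sketch, not part of indx: the helper names are hypothetical, and the mapping from extras to import names (`docling`, `openai`, `qdrant_client`) is an assumption based on the usual module names of those upstream packages.

```python
import importlib.util
import sys

# Supported range quoted in the text above: Python 3.11 through 3.13.
MIN_VERSION = (3, 11)
MAX_VERSION = (3, 13)

# Assumed extra -> import-name mapping; these are the usual module names of
# the upstream packages, not something confirmed by the indx docs.
BACKEND_MODULES = {
    "docling": "docling",
    "openai": "openai",
    "qdrant": "qdrant_client",
}


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def installed_backends(modules=BACKEND_MODULES):
    """Map each extra name to whether its top-level module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    print("python ok:", python_supported())
    for extra, present in installed_backends().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```

A missing backend is not an error by itself, since the base package falls back to the plaintext parser and jsonl store.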
-
Build a knowledge space — zero config.
Point indx at any directory (or a .zip) and choose an output folder. Unset components fall back to the documented defaults: parser docling, llm openai:gpt-5-mini, embedder openai:text-embedding-3-small, store qdrant, output .indx. Use indx[local] or explicit flags when the run must stay offline.
The cloud defaults call OpenAI for enrichment and embeddings, so set your key first. The Local stack and Core-only paths need no key.
    export OPENAI_API_KEY="sk-..."
    indx ./docs --out ./ai-ready
indx streams one progress line per stage, then a summary:
    indx ./docs → ./ai-ready
    01 walk    128 files, 14 folders
    02 parse   128 ok, 0 skipped
    03 chunk   1042 chunks
    04 relate  380 relations
    05 enrich  128 documents (openai:gpt-5-mini)
    06 embed   1042 vectors → qdrant, sealed handbook.indx
    done: 1042 chunks, 128 docs, embed_dim=1536 (12.4s)
-
Read the output tree.
indx writes both the sealed, portable archive and an expanded layout beside it, so downstream tools can read either form:
ai-ready/
- handbook.indx    the portable archive (a ZIP container)
- index.json       the knowledge graph
- chunks/          agent-readable chunks + per-chunk context
  - chunk_0000.json
  - chunk_0001.json
  - …
- embeddings/      vectors + manifest
  - manifest.json
  - vectors.f32    contiguous little-endian float32 matrix
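Because vectors.f32 is described as a contiguous little-endian float32 matrix, it can be read without the SDK. The sketch below uses only the standard library; `load_vectors` is a hypothetical helper, and the row width is passed in by the caller because the field names inside manifest.json are not given here.

```python
import struct
from pathlib import Path

FLOAT32_SIZE = 4  # bytes per little-endian float32 value


def load_vectors(path, dim):
    """Read a contiguous little-endian float32 matrix as a list of rows.

    `dim` is supplied by the caller (for example the embed_dim printed in
    the run summary); reading it from manifest.json is left out because the
    manifest layout is not documented in this section.
    """
    data = Path(path).read_bytes()
    row_bytes = dim * FLOAT32_SIZE
    if dim <= 0 or len(data) % row_bytes != 0:
        raise ValueError(f"{len(data)} bytes is not a whole number of {dim}-d rows")
    row_format = f"<{dim}f"  # "<" forces little-endian, "f" is float32
    return [list(row) for row in struct.iter_unpack(row_format, data)]
```

For the run shown earlier, a call such as `load_vectors("./ai-ready/embeddings/vectors.f32", dim=1536)` should yield one row per chunk, assuming the file sits where the tree above places it.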
The archive base name defaults to handbook (override with --name). The index.json file holds documents, chunks, and relations; embeddings live separately in embeddings/ and are never inlined into index.json. Full details are in the .indx archive reference.
-
Inspect the archive.
indx inspect prints space stats, a document-type histogram, and a sample of relations: a quick sanity check before you ship.
    indx inspect ./ai-ready/handbook.indx
Add --json for the full space.stats object, or --documents [type] to list documents filtered by type. More in Inspect and query.
-
Query it semantically.
Ask a question in natural language; indx embeds the query and returns the top-k matching chunks with their source path and neighbor ids.
    indx query ./ai-ready/handbook.indx "refund policy for enterprise"
Useful flags: -k to change the number of hits (default 5), --type to restrict to a document type, and --json to emit a SearchHit[] array (each with .chunk, .neighbors, and .source).
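The --json output of the query step can be post-processed in a script. The following is a sketch under assumptions: the top-level keys chunk, neighbors, and source come from the description above, but the nested `text` and `path` fields are assumed to mirror the SDK attributes, and `summarize_hits` is a hypothetical helper rather than part of indx.

```python
import json


def summarize_hits(raw_json, max_chars=80):
    """Turn the JSON text of a SearchHit[] array into compact summary rows."""
    rows = []
    for hit in json.loads(raw_json):
        text = hit["chunk"]["text"]  # assumed nested field name
        rows.append(
            {
                "path": hit["source"]["path"],  # assumed nested field name
                "neighbors": len(hit["neighbors"]),
                "preview": text[:max_chars],
            }
        )
    return rows
```

One way to use it is to pipe the query output into a script that passes `sys.stdin.read()` to `summarize_hits`; if the real JSON layout differs, only the two marked lookups need to change.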
The same run from Python
The SDK is the CLI with handles: the returned KnowledgeSpace is a first-class object you can explore and query in process.
    from indx import DirectoryPipeline

    space = DirectoryPipeline().run("./docs", "./ai-ready")  # zero-config defaults
    print(space.stats)  # documents, chunks, relations, embed_dim…

    for hit in space.search("refund policy for enterprise", k=5):
        print(hit.score, hit.source.path, hit.chunk.text)
DirectoryPipeline(...).run(src, out) returns the KnowledgeSpace; space.stats gives aggregate counts, space.documents(type=...) filters the document graph, and space.search(query, k=5) returns ranked SearchHit objects with resolved neighbor chunks. To reopen a sealed archive later, use KnowledgeSpace.load("./ai-ready/handbook.indx").
See the SDK reference for the complete public surface and the data models reference for every field.
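Since search results are plain objects, ordinary Python is enough to reshape them. As an illustration, the sketch below keeps only the best hit per source file. It relies solely on the score, source.path, and chunk.text attributes used in the loop above; `best_hit_per_source` and `fake_hit` are hypothetical names, the stand-in objects replace real SearchHit instances so the example runs without indx, and it assumes a larger score means a closer match.

```python
from types import SimpleNamespace


def best_hit_per_source(hits):
    """Keep only the highest-scoring hit for each source path."""
    best = {}
    for hit in hits:
        path = hit.source.path
        if path not in best or hit.score > best[path].score:
            best[path] = hit
    # Highest score first; assumes a larger score means a closer match.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


def fake_hit(score, path, text):
    """Stand-in with the same attributes a real hit exposes in the loop above."""
    return SimpleNamespace(
        score=score,
        source=SimpleNamespace(path=path),
        chunk=SimpleNamespace(text=text),
    )


demo = [
    fake_hit(0.91, "a.md", "first"),
    fake_hit(0.88, "a.md", "second"),
    fake_hit(0.75, "b.md", "third"),
]
for hit in best_hit_per_source(demo):
    print(hit.score, hit.source.path, hit.chunk.text)
```

With a real space, the same function could be applied to the result of `space.search(...)` in place of `demo`.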
What just happened
Under the hood, DirectoryPipeline ran six ordered, replaceable stages that share a single SpaceContext: 01 Walk → 02 Parse → 03 Chunk → 04 Relate → 05 Enrich → 06 Embed+Pack. Every stage is an interface, not a hard-coded implementation, which is what lets you swap any component or insert your own stage later. Read Pipeline and stages for the model, or Bring your own stack to start customizing.
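The idea of ordered stages sharing one context can be sketched in a few lines. This is a toy model only, not the indx implementation: `ToyContext` stands in for SpaceContext, and the `Stage` protocol, its `run` method, and the two example stages are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ToyContext:
    """Hypothetical shared state; the real SpaceContext fields are not shown here."""
    files: list = field(default_factory=list)
    chunks: list = field(default_factory=list)
    log: list = field(default_factory=list)


class Stage(Protocol):
    """Hypothetical stage interface: a name plus one method that mutates the context."""
    name: str

    def run(self, ctx: ToyContext) -> None: ...


class Walk:
    name = "walk"

    def __init__(self, paths):
        self.paths = paths

    def run(self, ctx):
        ctx.files.extend(self.paths)


class Chunk:
    name = "chunk"

    def run(self, ctx):
        # Toy chunker: one chunk per file, just to show data flowing through ctx.
        ctx.chunks.extend(f"{path}#0" for path in ctx.files)


def run_pipeline(stages, ctx=None):
    """Run stages in order against a single shared context."""
    if ctx is None:
        ctx = ToyContext()
    for stage in stages:
        stage.run(ctx)
        ctx.log.append(stage.name)
    return ctx


ctx = run_pipeline([Walk(["a.md", "b.md"]), Chunk()])
print(ctx.log, ctx.chunks)
```

Because the runner depends only on the `run(ctx)` shape, replacing `Chunk` with another class that has the same method changes the behavior without touching the runner, which is the property the paragraph above describes.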