Quickstart

This is the sixty-second path: install indx, point it at a folder, and read the result. No configuration file is required: out of the box, indx runs a complete pipeline with cloud-backed models, and a first-class local profile is available when you need offline operation.

If you want the longer, narrated walkthrough instead, jump to Your first knowledge space.

The 60-second path

Install indx.

The base package ships zero-dependency fallbacks (a plaintext parser, the none VLM, a jsonl store, and the .indx and jsonl writers), so a full run works even with nothing else installed. For the recommended cloud-backed stack, install the docling, openai, and qdrant extras. For the fully local stack (Docling parsing, an Ollama LLM, the bge-m3 embedder, and a Qdrant store), install the local extra.
Recommended cloud stack:
```
pip install "indx[docling,openai,qdrant]"
```
Local stack:
```
pip install "indx[local]"
```
Core-only:
```
pip install indx
```
No API key or network needed: uses the plaintext parser, jsonl store, and .indx writer.
```
# Add only the backends you need
pip install "indx[docling,qdrant]"
```
indx requires Python 3.11 or later; versions 3.11 through 3.13 are supported. See Installation for the full extras matrix and the extras reference.
Build a knowledge space — zero config.

Point indx at any directory (or a .zip) and choose an output folder. Unset components fall back to the documented defaults: parser docling, llm openai:gpt-5-mini, embedder openai:text-embedding-3-small, store qdrant, output .indx. Use indx[local] or explicit flags when the run must stay offline.

The cloud defaults call OpenAI for enrichment and embeddings, so set your key first — the Local stack and Core-only paths need no key:
```
export OPENAI_API_KEY="sk-..."
indx ./docs --out ./ai-ready
```
indx streams one progress line per stage, then a summary:
```
indx ./docs → ./ai-ready
  01 walk    128 files, 14 folders
  02 parse   128 ok, 0 skipped
  03 chunk   1042 chunks
  04 relate  380 relations
  05 enrich  128 documents (openai:gpt-5-mini)
  06 embed   1042 vectors → qdrant, sealed handbook.indx
done: 1042 chunks, 128 docs, embed_dim=1536  (12.4s)
```
Add --llm none to skip enrichment, or --no-embed to skip vectorization and produce a graph-only space. Both still seal a valid archive. See the enrichment guide.
Read the output tree.

indx writes both the sealed, portable archive and an expanded layout beside it, so downstream tools can read either form:
- ai-ready/
  - handbook.indx  the portable archive (a ZIP container)
  - index.json  the knowledge graph
  - chunks/  agent-readable chunks + per-chunk context
    - chunk_0000.json
    - chunk_0001.json
    - …
  - embeddings/  vectors + manifest
    - manifest.json
    - vectors.f32  contiguous little-endian float32 matrix
The archive base name defaults to handbook (override with --name). The index.json file holds documents, chunks, and relations; embeddings live separately in embeddings/ and are never inlined into index.json. Full details are in the .indx archive reference.
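
Because vectors.f32 is described as a contiguous little-endian float32 matrix, it can in principle be read without any indx code. The sketch below is one way to do that with the standard library. It assumes rows are stored back to back (row-major) and that the caller supplies the dimension, for example the embed_dim value from the build summary; the layout of manifest.json is not described here, so it is not parsed. The helper name read_vectors is hypothetical.

```python
import struct
import tempfile
from pathlib import Path


def read_vectors(path, dim):
    """Read a contiguous little-endian float32 matrix as a list of rows.

    Assumption: rows are stored back to back (row-major), `dim` floats each.
    """
    raw = Path(path).read_bytes()
    row_bytes = 4 * dim  # a float32 is 4 bytes
    if dim <= 0 or len(raw) % row_bytes != 0:
        raise ValueError(f"{len(raw)} bytes is not a whole number of {dim}-float rows")
    count = len(raw) // row_bytes
    # "<" forces little-endian decoding regardless of the host byte order.
    flat = struct.unpack(f"<{count * dim}f", raw)
    return [list(flat[i * dim:(i + 1) * dim]) for i in range(count)]


# Self-contained demo: write a tiny 2 x 3 stand-in file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "vectors.f32"
    demo_path.write_bytes(struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
    demo_rows = read_vectors(demo_path, dim=3)

print(len(demo_rows), "rows of", len(demo_rows[0]), "floats")
```

For a real build you would pass the path under embeddings/ and the reported dimension (1536 in the sample summary). If NumPy is available, `numpy.fromfile(path, dtype="<f4").reshape(-1, dim)` performs the same decoding under the same row-major assumption.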

The .indx archive always packs embeddings/vectors.f32 regardless of store — the → qdrant in the summary is just where vectors land at write time. The default qdrant runs embedded, so no separate server is needed for a local build.
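
Since the archive is a ZIP container, a generic ZIP reader should be able to list what it holds. The following is a minimal sketch using Python's zipfile module. The demo builds a stand-in archive instead of using a real indx build, and its member names are assumptions that mirror the expanded layout; list_archive is a hypothetical helper name.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (member name, uncompressed size) pairs for a ZIP-based archive."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Self-contained demo: build a stand-in archive rather than a real indx build.
# The member names below are assumptions that mirror the expanded layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = Path(tmp) / "handbook.indx"
    with zipfile.ZipFile(demo_archive, "w") as zf:
        zf.writestr("index.json", "{}")
        zf.writestr("embeddings/vectors.f32", b"\x00" * 12)
    demo_members = list_archive(demo_archive)

for name, size in demo_members:
    print(f"{size:>8}  {name}")
```

Pointing list_archive at ./ai-ready/handbook.indx is a quick way to check that a sealed archive carries its vectors, independent of which store was used at write time.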
Inspect the archive.

indx inspect prints space stats, a document-type histogram, and a sample of relations — a quick sanity check before you ship.
```
indx inspect ./ai-ready/handbook.indx
```
Add --json for the full space.stats object, or --documents [type] to list documents filtered by type. More in Inspect and query.
Query it semantically.

Ask a question in natural language; indx embeds the query and returns the top-k matching chunks with their source path and neighbor ids.
```
indx query ./ai-ready/handbook.indx "refund policy for enterprise"
```
Useful flags: -k to change the number of hits (default 5), --type to restrict to a document type, and --json to emit a SearchHit[] array (each with .chunk, .neighbors, and .source).
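
The --json output lends itself to scripting. Here is a hedged sketch of post-processing it in Python. Only the top-level keys chunk, neighbors, and source come from the description above; the score, text, and path fields, the neighbor id format, and the sample payload itself are assumptions made for illustration, so check the data models reference for the real schema. The helper name summarize_hits is hypothetical.

```python
import json

# Hypothetical sample of `indx query ... --json` output. Only the top-level
# keys chunk, neighbors, and source come from the text above; score, text,
# path, and the id format are assumptions for illustration.
SAMPLE = """
[
  {"score": 0.82,
   "chunk": {"text": "Enterprise refunds are handled by ..."},
   "neighbors": ["chunk_0041", "chunk_0043"],
   "source": {"path": "policies/refunds.md"}}
]
"""


def summarize_hits(json_text):
    """Turn a SearchHit[] JSON array into (path, neighbor_count, preview) rows."""
    rows = []
    for hit in json.loads(json_text):
        # .get() keeps the sketch tolerant of fields that turn out to be absent.
        path = hit.get("source", {}).get("path", "?")
        text = hit.get("chunk", {}).get("text", "")
        rows.append((path, len(hit.get("neighbors", [])), text[:40]))
    return rows


for row in summarize_hits(SAMPLE):
    print(row)
```

In practice the JSON text would come from the command's standard output (for example, read from sys.stdin in a pipeline) rather than from an inline sample.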

The same run from Python

The SDK is the CLI with handles: the returned KnowledgeSpace is a first-class object you can explore and query in process.

```
from indx import DirectoryPipeline

space = DirectoryPipeline().run("./docs", "./ai-ready")  # zero-config defaults
print(space.stats)  # documents, chunks, relations, embed_dim…

for hit in space.search("refund policy for enterprise", k=5):
    print(hit.score, hit.source.path, hit.chunk.text)
```

DirectoryPipeline(...).run(src, out) returns the KnowledgeSpace; space.stats gives aggregate counts, space.documents(type=...) filters the document graph, and space.search(query, k=5) returns ranked SearchHit objects with resolved neighbor chunks. To reopen a sealed archive later, use KnowledgeSpace.load("./ai-ready/handbook.indx").
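
To make "ranked" concrete, here is a toy sketch of top-k retrieval over embedding vectors. It is an illustration of the general idea, not indx's implementation: this page does not state the similarity metric or the search strategy, so brute-force cosine similarity is an assumption, and a vector store such as Qdrant would normally use its own index instead. The names cosine and top_k are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return up to k (score, chunk_id) pairs, best match first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
toy_chunks = {
    "chunk_0000": [1.0, 0.0],
    "chunk_0001": [0.0, 1.0],
    "chunk_0002": [1.0, 1.0],
}
ranked = top_k([1.0, 0.1], toy_chunks, k=2)
print([chunk_id for _, chunk_id in ranked])
```

The query vector points mostly along the first axis, so the chunk aligned with that axis ranks first and the diagonal one second; the orthogonal chunk falls outside k=2.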

See the SDK reference for the complete public surface and the data models reference for every field.

What just happened

Under the hood, DirectoryPipeline ran six ordered, replaceable stages that share a single SpaceContext: 01 Walk → 02 Parse → 03 Chunk → 04 Relate → 05 Enrich → 06 Embed+Pack. Every stage is an interface, not a hard-coded implementation — which is what lets you swap any component or insert your own stage later. Read Pipeline and stages for the model, or Bring your own stack to start customizing.
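
The "ordered stages sharing one context" idea can be sketched in a few lines. This is a generic illustration of the pattern, not indx's actual interfaces: only the name SpaceContext appears on this page, and its fields, the Stage protocol, the run method, and the two toy stages are all hypothetical. See Pipeline and stages for the real model.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SpaceContext:
    """Shared state handed from stage to stage (fields are hypothetical)."""
    files: list = field(default_factory=list)
    chunks: list = field(default_factory=list)
    log: list = field(default_factory=list)


class Stage(Protocol):
    """Anything with a name and a run(ctx) method counts as a stage."""
    name: str

    def run(self, ctx: SpaceContext) -> None: ...


class Walk:
    name = "walk"

    def __init__(self, paths):
        self.paths = list(paths)

    def run(self, ctx):
        ctx.files.extend(self.paths)


class Chunk:
    name = "chunk"

    def run(self, ctx):
        # Toy rule for the demo: exactly one chunk per file.
        ctx.chunks.extend(f"{path}#0" for path in ctx.files)


def run_pipeline(stages, ctx):
    """Run stages in order against one shared context."""
    for stage in stages:
        stage.run(ctx)
        ctx.log.append(stage.name)
    return ctx


demo_ctx = run_pipeline([Walk(["a.md", "b.md"]), Chunk()], SpaceContext())
print(demo_ctx.log, demo_ctx.chunks)
```

Because the runner depends only on the protocol, swapping a stage means passing a different object in the list, and inserting a custom stage means adding one more element at the right position.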

Next steps

- Your first knowledge space: a deeper, narrated tutorial covering a sample directory, stage-by-stage output, and what to do with the result.
- Configuration: describe your stack once in indx.toml and reuse it across runs and machines.
- CLI reference: every command, flag, and exit code for build, inspect, and query.