Quickstart

This is the sixty-second path: install indx, point it at a folder, and read the result. No configuration file is required: the defaults run the complete pipeline against cloud-backed models, and a first-class local profile is available when the run must stay offline.

If you want the longer, narrated walkthrough instead, jump to Your first knowledge space.

  1. Install indx.

    The base package ships zero-dependency fallbacks (a plaintext parser, the none VLM, a jsonl store, and the .indx and jsonl writers), so a full run works even with nothing else installed. For the recommended cloud-backed stack, install the docling, openai, and qdrant extras. For the fully local stack (Docling parsing, an Ollama LLM, the bge-m3 embedder, and a Qdrant store), install the local extra, indx[local], instead.

    pip install "indx[docling,openai,qdrant]"

    indx requires Python 3.11+ (supported on 3.11–3.13). See Installation for the full extras matrix and the extras reference.

  2. Build a knowledge space — zero config.

    Point indx at any directory (or a .zip) and choose an output folder. Unset components fall back to the documented defaults: parser docling, llm openai:gpt-5-mini, embedder openai:text-embedding-3-small, store qdrant, output .indx. Use indx[local] or explicit flags when the run must stay offline.

    The cloud defaults call OpenAI for enrichment and embeddings, so set your key first — the Local stack and Core-only paths need no key:

    export OPENAI_API_KEY="sk-..."
    indx ./docs --out ./ai-ready

    indx streams one progress line per stage, then a summary:

    indx ./docs → ./ai-ready
    01 walk 128 files, 14 folders
    02 parse 128 ok, 0 skipped
    03 chunk 1042 chunks
    04 relate 380 relations
    05 enrich 128 documents (openai:gpt-5-mini)
    06 embed 1042 vectors → qdrant, sealed handbook.indx
    done: 1042 chunks, 128 docs, embed_dim=1536 (12.4s)

  3. Read the output tree.

    indx writes both the sealed, portable archive and an expanded layout beside it, so downstream tools can read either form:

    • ai-ready/
      • handbook.indx the portable archive (a ZIP container)
      • index.json the knowledge graph
      • chunks/ agent-readable chunks + per-chunk context
        • chunk_0000.json
        • chunk_0001.json
      • embeddings/ vectors + manifest
        • manifest.json
        • vectors.f32 contiguous little-endian float32 matrix

    The archive base name defaults to handbook (override with --name). The index.json file holds documents, chunks, and relations; embeddings live separately in embeddings/ and are never inlined into index.json. Full details are in the .indx archive reference.

  4. Inspect the archive.

    indx inspect prints space stats, a document-type histogram, and a sample of relations — a quick sanity check before you ship.

    indx inspect ./ai-ready/handbook.indx

    Add --json for the full space.stats object, or --documents [type] to list documents filtered by type. More in Inspect and query.

  5. Query it semantically.

    Ask a question in natural language; indx embeds the query and returns the top-k matching chunks with their source path and neighbor ids.

    indx query ./ai-ready/handbook.indx "refund policy for enterprise"

    Useful flags: -k to change the number of hits (default 5), --type to restrict to a document type, and --json to emit a SearchHit[] array (each with .chunk, .neighbors, and .source).
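
Steps 3 and 5 meet in embeddings/vectors.f32: since that file is described as a contiguous little-endian float32 matrix, a semantic query can be pictured as a nearest-neighbor search over its rows. The sketch below is illustrative only and is not indx's implementation. read_vectors, cosine, and top_k are hypothetical helper names, the row width dim is assumed to equal the embed_dim reported in the run summary (1536 in the sample output), cosine similarity is an assumed scoring function, and the query vector is assumed to come from the same embedder that built the space.

```python
import math
import struct
from pathlib import Path


def read_vectors(path, dim):
    """Read a contiguous little-endian float32 matrix with `dim` columns.

    Hypothetical helper: `dim` is assumed to match the space's embed_dim.
    """
    raw = Path(path).read_bytes()
    row_bytes = dim * 4  # one float32 value is 4 bytes wide
    if dim <= 0 or len(raw) % row_bytes != 0:
        raise ValueError(
            f"{path}: {len(raw)} bytes is not a whole number of {dim}-float rows"
        )
    fmt = f"<{dim}f"  # "<" forces little-endian regardless of host byte order
    return [
        list(struct.unpack_from(fmt, raw, offset))
        for offset in range(0, len(raw), row_bytes)
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=5):
    """Return (row_index, score) pairs for the k rows most similar to `query`."""
    scored = [(i, cosine(query, row)) for i, row in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Against a real space this might be called as read_vectors("./ai-ready/embeddings/vectors.f32", dim=1536). How row indices map back to chunk files is not covered here; see the .indx archive reference for that.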

The SDK is the CLI with handles: the returned KnowledgeSpace is a first-class object you can explore and query in process.

from indx import DirectoryPipeline

space = DirectoryPipeline().run("./docs", "./ai-ready")  # zero-config defaults
print(space.stats)  # documents, chunks, relations, embed_dim…

for hit in space.search("refund policy for enterprise", k=5):
    print(hit.score, hit.source.path, hit.chunk.text)

DirectoryPipeline(...).run(src, out) returns the KnowledgeSpace; space.stats gives aggregate counts, space.documents(type=...) filters the document graph, and space.search(query, k=5) returns ranked SearchHit objects with resolved neighbor chunks. To reopen a sealed archive later, use KnowledgeSpace.load("./ai-ready/handbook.indx").

See the SDK reference for the complete public surface and the data models reference for every field.
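
Because the .indx archive is described in step 3 as a ZIP container, its contents can also be listed without indx installed. This is a minimal sketch using Python's standard zipfile module. list_archive is a hypothetical helper name, and nothing is assumed about the member names inside the archive, which the .indx archive reference defines.

```python
import zipfile


def list_archive(path):
    """Return (member_name, uncompressed_size) pairs for a ZIP-based archive."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP container")
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

Calling list_archive("./ai-ready/handbook.indx") after a run should give a quick view of what was sealed. For anything beyond listing, KnowledgeSpace.load remains the supported route.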

Under the hood, DirectoryPipeline ran six ordered, replaceable stages that share a single SpaceContext: 01 Walk → 02 Parse → 03 Chunk → 04 Relate → 05 Enrich → 06 Embed+Pack. Every stage is an interface, not a hard-coded implementation — which is what lets you swap any component or insert your own stage later. Read Pipeline and stages for the model, or Bring your own stack to start customizing.
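
To make the stage model concrete, here is a minimal sketch of ordered stages that share one context object. It is not indx's actual interface: Context, Stage, Walk, Chunk, and run_pipeline are hypothetical names chosen for illustration, Context is only a stand-in for SpaceContext, and just two of the six stages are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Context:
    """Shared state handed to every stage (a stand-in for SpaceContext)."""
    files: list[str] = field(default_factory=list)
    chunks: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


class Stage(Protocol):
    """Anything with a name and a run(ctx) method counts as a stage."""
    name: str

    def run(self, ctx: Context) -> None: ...


class Walk:
    name = "walk"

    def __init__(self, files):
        self._files = list(files)

    def run(self, ctx):
        # Toy walker: records the paths it was given instead of scanning disk.
        ctx.files.extend(self._files)


class Chunk:
    name = "chunk"

    def run(self, ctx):
        # Toy rule: one chunk per file; a real chunker would split parsed text.
        ctx.chunks.extend(f"{path}#0" for path in ctx.files)


def run_pipeline(stages, ctx):
    """Run stages in order; each one reads and mutates the same context."""
    for number, stage in enumerate(stages, start=1):
        stage.run(ctx)
        ctx.log.append(f"{number:02d} {stage.name}")
    return ctx
```

In this shape, swapping a component means passing a different object with the same run method, and inserting a stage means adding one more entry to the list.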