
Roadmap

This page sketches where indx is headed, from the MVP through v1.0 and beyond. Treat dates and phases as indicative direction, not commitments — they help you plan, but priorities shift as the project and its community grow.

Phase | Theme | Headline scope
Phase 1: MVP | End-to-end, cloud default + local profile | The six-stage pipeline, core data model + .indx, cloud-backed model defaults, local / air-gapped profile, three CLI commands, the SDK
Phase 2: v1.0 | Breadth + rigor | Full slot matrix, advanced Relate/Enrich, indx.toml + reproducibility manifest, performance hardening, published benchmark
Phase 3: Beyond | Living knowledge | Incremental/watch-mode re-indexing, richer relation inference, cross-space linking, community adapter ecosystem

The MVP delivers the full thesis end to end: a directory becomes a portable knowledge space in one command, using cloud-backed model defaults or, with the local profile, running entirely offline.

  • End-to-end pipeline running all six stages: Walk → Parse → Chunk → Relate (basic) → Enrich (basic) → Embed+Pack. “Basic” Relate covers sibling, parent, and continues relations; “basic” Enrich covers topics, tags, and summaries.
  • Core data model: KnowledgeSpace, Document, Chunk, and Relation, with .indx serialization. See the data-model reference.
  • The default stack and local profile: defaults use parser=docling, llm=openai:gpt-5-mini, embedder=openai:text-embedding-3-small (dim 1536), store=qdrant, and output=.indx; the local profile uses llm=ollama:qwen2.5 and embedder=bge-m3 (dim 1024), and reuses the same store and output settings, both of which can run locally. Switching profiles changes the embedding dimension, and therefore the dim recorded in the manifest.
  • The three CLI commands plus component flags and per-stage progress/summary: indx <dir> --out <dir>, indx inspect <archive.indx>, and indx query <archive.indx> "<text>". See the CLI reference.
  • The SDK: DirectoryPipeline, first-class access to the model objects and their graph, load/save of .indx, and custom slot registration. See the SDK reference.
  • A documented fully-local / air-gapped path — see the air-gapped guide.
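
The practical consequence of the two profiles having different embedding dimensions can be sketched as follows. This is a minimal illustration, not indx's API: the `Profile` class, the `PROFILES` table, and `can_query` are hypothetical names, and only the model identifiers and dimensions come from the list above. It assumes that an archive built with one embedder should be queried with an embedder of the same dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical container for the model settings named in the roadmap."""
    llm: str
    embedder: str
    dim: int


# Identifiers and dimensions are copied from the roadmap text;
# the table itself is illustrative and not part of indx.
PROFILES = {
    "default": Profile("openai:gpt-5-mini", "openai:text-embedding-3-small", 1536),
    "local": Profile("ollama:qwen2.5", "bge-m3", 1024),
}


def can_query(archive_dim: int, profile_name: str) -> bool:
    """Return True when the profile's embedder matches the dim recorded for the archive."""
    return PROFILES[profile_name].dim == archive_dim


if __name__ == "__main__":
    print(can_query(1536, "default"))  # True
    print(can_query(1536, "local"))    # False
```

Under these assumptions, an archive produced on defaults records dim 1536, so query vectors from the local profile (dim 1024) would not be comparable with the stored ones without re-embedding.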

v1.0 broadens the swappable slots and adds the rigor that production and regulated users need.

  • Full slot matrix:
    • Parser adapters: Unstructured, LlamaParse, MarkItDown.
    • Hosted LLMs (any OpenAI-compatible / hosted model via adapter).
    • VLM support for image/figure understanding.
    • Embedding adapters.
    • Stores: pgvector, chroma, lancedb.
    • Outputs: langchain, llamaindex.
  • Advanced Relate/Enrich: references and duplicate-of edge inference in Relate, plus type-aware Enrich that adapts its behaviour to the detected document type.
  • indx.toml configuration with precedence (CLI flag > indx.toml > built-in default) and a reproducibility manifest recorded into the knowledge space. See the configuration guide and reproducibility guide.
  • Large-directory performance hardening: parallelism, batching, and observability polish for estates of 10k+ files. See the performance guide.
  • A published retrieval-quality benchmark measuring the lift from structure and relations versus a flat chunk-soup baseline.
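
The configuration precedence described above (CLI flag > indx.toml > built-in default) could be resolved along these lines. This is a sketch under assumptions: `resolve_settings` and the key names are hypothetical, the inputs are plain dictionaries rather than a parsed indx.toml, and the default values are the ones listed for the MVP stack.

```python
from collections import ChainMap

# Built-in defaults as listed for the MVP stack; key names are assumed.
BUILTIN_DEFAULTS = {
    "parser": "docling",
    "llm": "openai:gpt-5-mini",
    "embedder": "openai:text-embedding-3-small",
    "store": "qdrant",
    "output": ".indx",
}


def resolve_settings(cli_flags: dict, toml_values: dict) -> dict:
    """Merge settings so that CLI flags beat indx.toml, which beats the defaults."""
    # Drop unset values (None) so they do not shadow lower-priority layers.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    toml = {k: v for k, v in toml_values.items() if v is not None}
    # ChainMap looks keys up in its first mapping first, then falls through.
    return dict(ChainMap(cli, toml, BUILTIN_DEFAULTS))


if __name__ == "__main__":
    settings = resolve_settings(
        cli_flags={"llm": "ollama:qwen2.5", "store": None},
        toml_values={"llm": "openai:gpt-5-mini", "embedder": "bge-m3"},
    )
    print(settings["llm"])       # ollama:qwen2.5 (CLI flag wins)
    print(settings["embedder"])  # bge-m3 (from the toml layer)
    print(settings["store"])     # qdrant (built-in default)
```

Recording the resolved dictionary, rather than the individual layers, is one way a reproducibility manifest could capture exactly what a run used.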

Later work makes knowledge spaces living artifacts and grows the ecosystem — while holding the line that indx is not a runtime or vector DB.

  • Incremental / watch-mode re-indexing: delta updates without a full re-process.
  • Richer relation inference plus cross-space linking and merging.
  • A community adapter ecosystem and registry.
  • Optional managed/serving integrations — strictly downstream; serving stays out of core.
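
Delta updates for incremental re-indexing depend on knowing which files changed since the last run. One common way to do that, shown here as a sketch rather than as indx's planned design, is to fingerprint every file by content hash and diff the result against the previous run. The function names `fingerprint` and `diff_fingerprints` are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to the SHA-256 of its bytes."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_fingerprints(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, changed, or removed between two fingerprint maps."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(p for p in shared if previous[p] != current[p]),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

Only the added and changed paths would need to pass through the pipeline again; removed paths would have their documents, chunks, and relations dropped from the knowledge space.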

indx uses P0 / P1 / P2 to tie requirements to phases: P0 = MVP must-have, P1 = v1.0 should-have, P2 = beyond. The release plan above is the rollup of those priorities.

The roadmap is judged against a few concrete targets:

Category | Metric | Target
Activation | Time-to-first-knowledge-space (install → first .indx) | < 60 seconds on defaults, small dir
Activation | Zero-config success rate | > 90% on common document directories
Retrieval quality | Accuracy vs. flat chunk-soup baseline | Measurable lift from structure + relations (recall@k / groundedness)
Breadth | Adapters across all six slots | Defaults at MVP; full matrix by v1.0; growing community adapters after
Reliability | Crash-free runs on large directories (10k+ files) | > 99%
Adoption | PyPI installs, GitHub stars, external contributors | Month-over-month growth; ≥10 external contributors by v1.0
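
The retrieval-quality target refers to recall@k, which can be computed as below. This is the standard definition (the fraction of relevant items that appear in the top k results), not a description of the benchmark harness. The chunk IDs in the usage example are made up purely to show the arithmetic and are not benchmark results.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear among the first k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant must contain at least one ID")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


if __name__ == "__main__":
    relevant = {"c2", "c4"}                    # made-up ground truth
    flat = ["c7", "c2", "c9", "c4"]            # made-up baseline ranking
    structured = ["c2", "c4", "c7", "c1"]      # made-up structured ranking
    print(recall_at_k(flat, relevant, k=2))        # 0.5
    print(recall_at_k(structured, relevant, k=2))  # 1.0
```

The "lift" in the target is the difference between the two scores, averaged over a query set.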

The following open questions are deliberately unresolved and may shape what ships in each phase:

ID | Question
OQ-1 | .indx format: container choice, and whether embeddings ship inside the archive by default or by reference (size vs. portability).
OQ-2 | How document-type detection is implemented at MVP (heuristics, LLM classification, or both) and the canonical type taxonomy.
OQ-3 | Which retrieval engine backs indx query for the no-DB (jsonl) case: in-memory brute force vs. a lightweight local index.
OQ-4 | How references are resolved (filename mentions, link extraction, embedding similarity) and the acceptable precision bar.
OQ-5 | The scope of determinism guarantees given inherently non-deterministic LLM enrichment.
OQ-6 | The default chunking strategy and whether it is parser/type-aware out of the box.
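
To make the in-memory brute-force option in OQ-3 concrete, here is a minimal sketch that scores every stored vector against the query by cosine similarity. The JSONL record shape (`id` and `embedding` keys) and the function names are assumptions made for illustration; the question of what actually backs indx query remains open.

```python
import json
import math


def load_vectors(lines) -> dict[str, list[float]]:
    """Parse JSONL lines of the assumed form {"id": ..., "embedding": [...]}."""
    vectors = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            vectors[record["id"]] = record["embedding"]
    return vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query: list[float], vectors: dict[str, list[float]], top_k: int = 3):
    """Score every vector against the query and return the top_k (id, score) pairs."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Brute force is linear in the number of chunks per query, which is the trade-off against a lightweight local index: it needs no build step, but its cost grows with the size of the knowledge space.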

The main risks, with their planned mitigations:

ID | Risk | Mitigation
R-1 | Parser dependency surface: heavyweight parsers add install weight, model downloads, and failure modes, working against the <60s / air-gapped goals. | Lean default install, optional extras per adapter, cached/offline model paths.
R-2 | Local-profile performance: ollama:qwen2.5 + bge-m3 on commodity hardware may be slow on large estates. | Parallelism, batching, cloud defaults for general use, clear perf guidance.
R-3 | Relationship quality: weak references / duplicate-of inference erodes the central value prop. | Start conservative (high precision), benchmark, iterate.
R-4 | Ecosystem churn: LangChain / LlamaIndex / vector-DB APIs move fast. | Thin adapter layer, version pinning, contract tests.
R-5 | Positioning confusion: users may think indx competes with parsers. | Consistent “composes, not replaces” messaging across docs and CLI.
R-6 | Scope creep toward a runtime/DB: pressure to serve retrieval or store vectors. | Hold the line: serving and storage stay downstream of core.

Related pages:

  • Bring Your Own Stack: the swappable-slot architecture the v1.0 matrix builds on.
  • FAQ: common questions about scope, status, and what indx will and won’t do.