
indx

Parsers turn one PDF into clean text. indx turns an entire folder into a knowledge space — with structure, relationships, and semantic metadata that AI agents and RAG systems can actually reason over.
pip install indx
indx ./docs --out ./ai-ready

A great parser answers “what does this file say?” An agent searching your knowledge base is asking something harder: “where does this belong, what does it relate to, and what context do I need to trust it?” A folder is not a bag of files — it has a shape. Most tooling throws that shape away. indx keeps the map, and hands it to the agent.

Directory-level

The unit of work is the directory, not the file. Nested trees, ZIPs, and mixed formats flow through one pipeline into one coherent knowledge space.

Relationship-aware

Folder hierarchy, sibling files, and cross-document references become a typed graph — so an agent knows that /contracts/2024/ means something.
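
To make the idea of a typed graph concrete, here is a minimal sketch that derives "contains" and "sibling_of" edges from relative paths alone. It is not indx's API or output format: the `Edge` type, the edge kind names, and `build_edges` are all hypothetical, chosen only for illustration.

```python
from itertools import combinations
from pathlib import PurePosixPath
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    kind: str    # hypothetical edge types: "contains" or "sibling_of"
    target: str


def build_edges(paths: list[str]) -> list[Edge]:
    """Derive typed edges from relative file paths (illustrative only)."""
    edges: set[Edge] = set()
    by_parent: dict[str, list[str]] = {}
    for raw in paths:
        path = PurePosixPath(raw)
        # Walk upward: each folder "contains" its immediate child.
        child = path
        for parent in path.parents:
            if str(parent) == ".":
                break  # stop at the root of the relative tree
            edges.add(Edge(str(parent), "contains", str(child)))
            child = parent
        by_parent.setdefault(str(path.parent), []).append(str(path))
    # Files that share a folder are siblings of one another.
    for files in by_parent.values():
        for a, b in combinations(sorted(files), 2):
            edges.add(Edge(a, "sibling_of", b))
    return sorted(edges)


if __name__ == "__main__":
    for edge in build_edges(
        ["contracts/2024/msa.pdf", "contracts/2024/nda.pdf", "readme.md"]
    ):
        print(edge)
```

With edges like these, an agent that retrieves one file under `contracts/2024` can also see which folder it sits in and which files sit beside it. Cross-document references would need content analysis and are left out of this sketch.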

Semantic metadata

Document type, topics, tags, and summaries are attached as metadata, so retrieval can filter and reason instead of guessing.
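
As a rough illustration of filtering on metadata rather than guessing, the sketch below narrows a candidate set by document type and tags before any ranking would happen. The `Doc` record and its field names are assumptions made for this example, not indx's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical metadata record; field names are illustrative.
    path: str
    doc_type: str
    tags: set[str] = field(default_factory=set)
    summary: str = ""


def filter_docs(docs, doc_type=None, required_tags=()):
    """Keep documents matching the type (if given) and carrying every required tag."""
    required = set(required_tags)
    return [
        d for d in docs
        if (doc_type is None or d.doc_type == doc_type) and required <= d.tags
    ]


if __name__ == "__main__":
    docs = [
        Doc("contracts/2024/msa.pdf", "contract", {"legal", "2024"}),
        Doc("notes/kickoff.md", "meeting_notes", {"2024"}),
    ]
    print([d.path for d in filter_docs(docs, doc_type="contract", required_tags=["2024"])])
```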

Portable output

A self-contained, versioned .indx archive — equally legible to a person and to an LLM context window. Build it once, ship it anywhere.

Bring your own stack

Parser, LLM, VLM, embedder, vector store, output — every slot is a typed interface with a sensible default. No lock-in, ever.
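
One common way to express "typed slot with a sensible default" in Python is `typing.Protocol` plus dataclass defaults. The sketch below shows that pattern under stated assumptions: `Parser`, `Embedder`, `Pipeline`, and the toy implementations are hypothetical names, not indx's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Parser(Protocol):
    def parse(self, data: bytes) -> str: ...


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class Utf8Parser:
    """Toy default parser: decode bytes as UTF-8 text."""

    def parse(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


class LengthEmbedder:
    """Toy default embedder: a one-dimensional vector holding the text length."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


@dataclass
class Pipeline:
    # Each slot has a default, and any object with the right method can replace it.
    parser: Parser = field(default_factory=Utf8Parser)
    embedder: Embedder = field(default_factory=LengthEmbedder)

    def run(self, data: bytes) -> list[float]:
        return self.embedder.embed(self.parser.parse(data))


class WhitespaceStrippingParser:
    """A drop-in replacement: no inheritance needed, only a matching method."""

    def parse(self, data: bytes) -> str:
        return data.decode("utf-8").strip()


if __name__ == "__main__":
    print(Pipeline().run(b"  hello  "))
    print(Pipeline(parser=WhitespaceStrippingParser()).run(b"  hello  "))
```

Structural typing keeps the swap cheap: a replacement component only has to match the method signature, so it carries no dependency on the framework's base classes.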

Local-first

The local profile runs fully offline — local parser, local LLM, local embedder, no-DB output. Air-gapped by default, not as an afterthought.