Use Cases & Personas
indx exists for one reason: most real knowledge does not live in a single file — it lives in the arrangement of files. This page maps the people who feel that pain and the concrete jobs indx does for them. If you’re new here, start with What is indx? or jump straight to the quickstart.
The four personas
indx is designed around four representative users. They differ in what they’re building and what constrains them, but they share one frustration: today’s tooling answers “what does this file say?” when they need to know “how does this body of knowledge fit together?”
| Persona | Who they are | Primary need | Pain today |
|---|---|---|---|
| Maya — RAG / Agent Engineer | Builds retrieval and agent apps on LangChain / LlamaIndex. | Grounded, well-structured context with relationships — not a flat chunk soup. | Hand-wires parser + splitter + embedder + store; loses folder context; answers are shallow because cross-document links are gone. |
| Devin — Enterprise / Air-gapped ML Lead | Owns the document-AI platform at a bank, hospital, or government agency. Data cannot leave the network. | Fully local, auditable, reproducible ingestion across large on-prem document estates. | SaaS parsers and cloud LLMs are non-starters; existing OSS tools assume internet and discard structure needed for compliance and lineage. |
| Priya — OSS Developer / Integrator | Maintains internal handbooks, codebases, and tooling; contributes to OSS. | A composable, no-lock-in library she can extend with her own parser, store, or output. | Monolithic ingestion tools are hard to extend; swapping an embedder or store means rewriting the pipeline. |
| Dr. Chen — Researcher | Works with large archives of papers, datasets, and notes. | Turn a messy research archive into a navigable, citable knowledge graph. | Papers reference each other and share datasets, but tools index them in isolation; no portable artifact to share with collaborators. |
The use-case catalogue
Each use case below ties back to a persona and to a tested user story from the product requirements.
Grounded RAG / agent retrieval
For Maya. Feed an agent a knowledge space instead of a vector blob. In indx, every Chunk carries its source Document, its position, and links to its neighbors, so context travels with the text. Answers come back with the source folder, document type, and continues / references edges attached, which helps reduce hallucinations and keeps citations traceable. Folder lineage is preserved on every document, so an agent can filter and reason by location (“only contracts under /2024/acme”) instead of matching against an undifferentiated pile of text.
US-1 — As a RAG engineer, I want chunks that carry their source document, position, and neighbor links, so that my agent can expand context and follow continues / references edges instead of retrieving orphaned fragments.
See Core objects for the Chunk and Relation models, and Inspect & query to try retrieval before wiring it into production.
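To make the retrieval pattern concrete, here is a minimal sketch in plain Python. It does not use the indx SDK: the Chunk class, its source_path, position, and edges fields, and the filter_by_folder and expand_context helpers are hypothetical stand-ins, assuming each chunk stores its outgoing edges as a map from relation type to target chunk ids. It shows the two moves described above: narrowing candidates by folder lineage, then walking continues / references edges out from a hit.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk; not the indx Chunk model."""
    id: str
    text: str
    source_path: str  # folder lineage of the source document
    position: int     # order of the chunk within its document
    edges: dict[str, list[str]] = field(default_factory=dict)  # relation -> target ids


def filter_by_folder(chunks, prefix):
    """Keep only chunks whose source document lives under `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    return [c for c in chunks if c.source_path.startswith(prefix)]


def expand_context(hit, index, relation_types=("continues", "references"), depth=1):
    """Breadth-first walk from a retrieved chunk along the given edge types."""
    seen = {hit.id}
    frontier, result = [hit], [hit]
    for _ in range(depth):
        next_frontier = []
        for chunk in frontier:
            for rel in relation_types:
                for target_id in chunk.edges.get(rel, []):
                    if target_id in index and target_id not in seen:
                        seen.add(target_id)
                        next_frontier.append(index[target_id])
        result.extend(next_frontier)
        frontier = next_frontier
    return result


chunks = [
    Chunk("a1", "Term: 3 years.", "/2024/acme/msa.pdf", 0, {"continues": ["a2"]}),
    Chunk("a2", "Renewal is automatic.", "/2024/acme/msa.pdf", 1, {"references": ["b1"]}),
    Chunk("b1", "Schedule B pricing.", "/2024/acme/schedule-b.pdf", 0),
    Chunk("c1", "Unrelated memo.", "/2023/other/memo.txt", 0),
]
index = {c.id: c for c in chunks}

in_scope = filter_by_folder(chunks, "/2024/acme")  # a1, a2, b1
context = expand_context(index["a1"], index, depth=2)
print([c.id for c in context])  # ['a1', 'a2', 'b1']
```

The depth parameter bounds how far the agent wanders from the original hit, which keeps the expanded context small enough to fit in a prompt.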
Enterprise on-prem & air-gapped estates
For Devin. Point indx at a decade of SharePoint exports or a file server and turn legacy folders into a queryable knowledge base without a single byte leaving the network. The local profile keeps every stage on your own hardware: docling for parsing, ollama:qwen2.5 for enrichment, bge-m3 for embeddings, and a no-DB jsonl output. Zero-dependency fallbacks (plaintext parser, jsonl store, none VLM, .indx + jsonl writers) ship in the core, so a complete run works offline out of the box. For compliance, runs are reproducible: the chosen configuration (versions, models, and config) is recorded in the KnowledgeSpace manifest, so any space can be audited and re-created.
US-3 — As a regulated-enterprise ML lead, I want to run the entire pipeline with local parser, local LLM, and a no-DB output, so that no document or embedding ever leaves the network.
The full offline path is documented in the local & air-gapped guide, with reproducibility details in the reproducibility guide.
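One way to picture the audit trail is a manifest that records the run configuration alongside a stable fingerprint of it. The sketch below assumes a hypothetical manifest shape (config, versions, fingerprint) and hypothetical helper names; the real KnowledgeSpace manifest may store different fields. The slot values mirror the local profile described above, and the version string is a placeholder.

```python
import hashlib
import json


# Hypothetical manifest shape; the real KnowledgeSpace manifest may differ.
def build_manifest(config: dict, versions: dict) -> dict:
    """Record the run configuration together with a stable fingerprint of it."""
    payload = {"config": config, "versions": versions}
    # Sorted keys and fixed separators give one canonical encoding per payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**payload, "fingerprint": fingerprint}


def same_run(a: dict, b: dict) -> bool:
    """Two manifests describe the same run if their fingerprints match."""
    return a["fingerprint"] == b["fingerprint"]


# Slot choices taken from the local profile; the version string is a placeholder.
local_profile = {
    "parser": "docling",
    "llm": "ollama:qwen2.5",
    "embedder": "bge-m3",
    "output": "jsonl",
}
versions = {"indx": "<installed version>"}

m1 = build_manifest(local_profile, versions)
# Same settings in a different key order -> same fingerprint.
m2 = build_manifest(dict(reversed(list(local_profile.items()))), versions)
# A swapped embedder (hypothetical model name) -> different fingerprint.
m3 = build_manifest({**local_profile, "embedder": "another-embedder"}, versions)
print(same_run(m1, m2), same_run(m1, m3))  # True False
```

Hashing a canonical, key-sorted JSON encoding means two runs with the same settings get the same fingerprint regardless of dictionary order, while any changed model or version shows up as a mismatch.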
Codebases & handbooks
For Priya. Repos, design docs, and runbooks have deep structure: a test file belongs to a module, which belongs to a service; an onboarding doc sits beside its siblings. indx detects document types and derives sibling, parent, and continues relations across the tree, so “how do I onboard?” retrieves the onboarding doc and its neighbors rather than random matches. Because structure becomes signal, an assistant built on the space understands the shape of your engineering knowledge instead of guessing at it.
US-5 — As an OSS developer, I want indx to detect document types and relationships across a handbook / codebase, so that retrieval surfaces the right doc and its siblings, not random matches.
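Structural relations of this kind can be derived from file paths alone. The following is a minimal sketch, assuming a parent edge links each folder to the files and subfolders directly inside it and a sibling edge links files that share a folder. derive_tree_relations and neighbors are hypothetical helpers, not indx's Relate stage, and the sketch leaves out document-type detection and continues edges.

```python
from collections import defaultdict
from itertools import combinations
from pathlib import PurePosixPath


def derive_tree_relations(paths):
    """Return (source, relation, target) triples for parent and sibling edges."""
    relations = set()
    files_by_dir = defaultdict(list)
    for path in sorted(map(PurePosixPath, paths)):
        files_by_dir[path.parent].append(path)
        # parent edges: walk up from the file, linking each folder to its child.
        child = path
        for ancestor in path.parents:
            if ancestor == PurePosixPath("."):
                break  # relative root: no named folder above this point
            relations.add((str(ancestor), "parent", str(child)))
            child = ancestor
    # sibling edges: every pair of files that share a folder.
    for files in files_by_dir.values():
        for a, b in combinations(files, 2):
            relations.add((str(a), "sibling", str(b)))
    return relations


def neighbors(path, relations, kind="sibling"):
    """All nodes linked to `path` by an edge of the given kind, in either direction."""
    found = set()
    for src, rel, dst in relations:
        if rel == kind and path in (src, dst):
            found.add(dst if src == path else src)
    return found


paths = [
    "handbook/onboarding.md",
    "handbook/security.md",
    "handbook/teams/platform.md",
]
rels = derive_tree_relations(paths)
print(sorted(neighbors("handbook/onboarding.md", rels)))  # ['handbook/security.md']
```

A retriever that finds handbook/onboarding.md can then pull its siblings into the context, which is the "doc and its neighbors" behavior US-5 asks for.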
Research paper & data archives
For Dr. Chen. Mixed PDFs, notebooks, and datasets become one navigable space with citation relations intact. The Relate stage derives references edges so you can follow citations across the archive, and duplicate-of edges so you can dedupe versions of the same paper. The result is a knowledge graph ready for literature agents and synthesis — and, because it serializes to a portable archive, one you can hand directly to a collaborator.
The full RelationType set — sibling, parent, references, continues, duplicate-of — is documented in the data models reference and the Relate stage.
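Detecting duplicate-of candidates is a near-duplicate search problem. The sketch below uses one common heuristic, word-trigram shingles compared by Jaccard similarity, purely as an illustration; it is not necessarily how indx's Relate stage works, and the 0.8 threshold and helper names are assumptions.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Set of lowercased word n-grams (punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n words yield a single, shorter shingle.
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_edges(docs, threshold=0.8):
    """Emit (a, "duplicate-of", b, score) for every pair at or above threshold."""
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            edges.append((a, "duplicate-of", b, round(score, 2)))
    return edges


docs = {
    "paper_v1.pdf": "Graph retrieval improves grounding for long documents in enterprise archives.",
    "paper_v2.pdf": "Graph retrieval improves grounding for long documents in enterprise archives today.",
    "notes.md": "Meeting notes about the dataset release schedule.",
}
print(duplicate_edges(docs))  # [('paper_v1.pdf', 'duplicate-of', 'paper_v2.pdf', 0.89)]
```

Comparing every pair is quadratic in the number of documents; for large archives, MinHash with locality-sensitive hashing is the usual way to approximate the same similarity without an all-pairs scan.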
Portability — ship knowledge as a file
For everyone. A KnowledgeSpace serializes to a single, self-contained, versioned .indx archive. Build it in CI, hand it to a teammate, mount it in a serverless function — the whole knowledge estate travels and re-loads without re-processing. The same archive can be inspected and queried directly with the CLI, so you can sanity-check structure and retrieval before it ever reaches production.
US-7 — As any user, I want to serialize a KnowledgeSpace to a single .indx archive, so that I can hand the whole knowledge estate to a teammate or another machine and re-load it without re-processing.
In code this is symmetric and explicit:
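```python
from indx import KnowledgeSpace

# Load a previously built knowledge space from its .indx archive.
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
print(space.stats)

# Retrieve the top 5 hits for a query.
hits = space.search("data retention", k=5)
```

Learn more in the .indx archive reference and the SDK reference.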
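The .indx layout itself is covered in the archive reference. To illustrate the general idea of a self-contained, versioned archive, here is a toy format built on Python's zipfile module: a manifest.json carrying a format version plus a chunks.jsonl payload. The file names, the format_version field, and the save_archive / load_archive helpers are hypothetical and do not describe the real .indx format.

```python
import io
import json
import zipfile

FORMAT_VERSION = 1  # hypothetical format version, not the real .indx versioning


def save_archive(target, manifest, chunks):
    """Write a manifest and JSONL chunks into one zip file (path or file object)."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps({"format_version": FORMAT_VERSION, **manifest}))
        zf.writestr("chunks.jsonl", "\n".join(json.dumps(c) for c in chunks))


def load_archive(source):
    """Read the archive back, refusing versions this reader does not know."""
    with zipfile.ZipFile(source) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        version = manifest.get("format_version")
        if version != FORMAT_VERSION:
            raise ValueError(f"unsupported archive version: {version}")
        lines = zf.read("chunks.jsonl").decode("utf-8").splitlines()
    return manifest, [json.loads(line) for line in lines if line]


# Round trip through an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
save_archive(buf, {"name": "handbook"}, [{"id": "c1", "text": "Retention is 7 years."}])
manifest, chunks = load_archive(buf)
print(manifest["name"], len(chunks))  # handbook 1
```

Checking the version on load is what lets an archive travel between machines safely: a reader can refuse a format it does not understand instead of silently misreading it.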
Vendor-free migration foundation
For Devin and Priya. Because every major component is a typed, swappable slot — parser, LLM, VLM, embedder, store, output — the pipeline you write today survives the model you’ll use next year. indx acts as a neutral intermediate layer: build the knowledge space once, then emit it to JSONL, LangChain, LlamaIndex, or any supported vector store without re-deriving anything. Re-embed, re-store, or re-export with the same code. That makes indx a migration foundation rather than another thing to migrate off of.
US-9 — As a platform lead locked into a vendor’s ingestion, I want indx as a neutral intermediate layer that outputs to LangChain / LlamaIndex / JSONL / any vector DB, so that I can migrate stacks without re-deriving my knowledge.
See the output formats guide for the available writers and the bring-your-own-stack overview for how slots fit together.
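The slot idea can be sketched with a typing.Protocol: any object with the right method can fill the output slot, so changing targets means passing a different writer rather than rebuilding the space. The Writer protocol, the two writer classes, and export are hypothetical illustrations, not indx's actual slot interfaces. DocumentDictWriter only mimics the page_content / metadata shape of LangChain's Document using plain dicts, so the example runs without LangChain installed.

```python
import json
from typing import Any, Protocol


class Writer(Protocol):
    """Hypothetical output slot: any object with a matching write() method fits."""

    def write(self, chunks: list[dict]) -> Any: ...


class JsonlWriter:
    """One JSON object per line, keys sorted for stable output."""

    def write(self, chunks: list[dict]) -> str:
        return "\n".join(json.dumps(c, sort_keys=True) for c in chunks)


class DocumentDictWriter:
    """Plain dicts shaped like LangChain Documents: page_content plus metadata."""

    def write(self, chunks: list[dict]) -> list[dict]:
        return [
            {"page_content": c["text"], "metadata": {k: v for k, v in c.items() if k != "text"}}
            for c in chunks
        ]


def export(chunks: list[dict], writer: Writer) -> Any:
    # The chunks are derived once; only the writer passed in changes.
    return writer.write(chunks)


chunks = [{"id": "c1", "text": "Retention is 7 years.", "source": "/policies/retention.md"}]
as_jsonl = export(chunks, JsonlWriter())
as_docs = export(chunks, DocumentDictWriter())
print(as_docs[0]["metadata"])  # {'id': 'c1', 'source': '/policies/retention.md'}
```

Because Protocol uses structural typing, a new writer does not need to inherit from anything; it only has to provide a compatible write method.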
Where to go next
- New to indx? Read What is indx?, then run the quickstart and build your first knowledge space.
- Need it fully offline? Follow the local & air-gapped guide.
- Picking your stack? See guides on choosing a parser, an embedder, and a store.
- Curious about the bigger picture? Read use-case-driven design principles, the roadmap, and the FAQ.