Use Cases & Personas

indx exists for one reason: most real knowledge does not live in a single file — it lives in the arrangement of files. This page maps the people who feel that pain and the concrete jobs indx does for them. If you’re new here, start with What is indx? or jump straight to the quickstart.

indx is designed around four representative users. They differ in what they’re building and what constrains them, but they share one frustration: today’s tooling answers “what does this file say?” when they need to know “how does this body of knowledge fit together?”

| Persona | Who they are | Primary need | Pain today |
| --- | --- | --- | --- |
| Maya, RAG / Agent Engineer | Builds retrieval and agent apps on LangChain / LlamaIndex. | Grounded, well-structured context with relationships, not a flat chunk soup. | Hand-wires parser + splitter + embedder + store; loses folder context; answers are shallow because cross-document links are gone. |
| Devin, Enterprise / Air-gapped ML Lead | Owns the document-AI platform at a bank, hospital, or government agency. Data cannot leave the network. | Fully local, auditable, reproducible ingestion across large on-prem document estates. | SaaS parsers and cloud LLMs are non-starters; existing OSS tools assume internet and discard structure needed for compliance and lineage. |
| Priya, OSS Developer / Integrator | Maintains internal handbooks, codebases, and tooling; contributes to OSS. | A composable, no-lock-in library she can extend with her own parser, store, or output. | Monolithic ingestion tools are hard to extend; swapping an embedder or store means rewriting the pipeline. |
| Dr. Chen, Researcher | Works with large archives of papers, datasets, and notes. | Turn a messy research archive into a navigable, citable knowledge graph. | Papers reference each other and share datasets, but tools index them in isolation; no portable artifact to share with collaborators. |

Each use case below ties back to a persona and to a tested user story from the product requirements.

For Maya. Feed an agent a knowledge space instead of a vector blob. In indx, every Chunk carries its source Document, its position, and links to its neighbors, so context travels with the text. Answers come back with source folder, document type, and continues / references edges attached, which makes citations traceable and helps reduce hallucinations. Folder lineage is preserved on every document, so an agent can filter and reason by location (“only contracts under /2024/acme”) instead of matching against an undifferentiated pile of text.

US-1: As a RAG engineer, I want chunks that carry their source document, position, and neighbor links, so that my agent can expand context and follow continues / references edges instead of retrieving orphaned fragments.

See Core objects for the Chunk and Relation models, and Inspect & query to try retrieval before wiring it into production.
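To make the idea of context travelling with the text concrete, here is a minimal sketch of expanding a retrieved chunk along its continues edges and filtering by folder. The `Chunk` dataclass, its field names, and the `expand` and `under` helpers are hypothetical stand-ins written for this page, not the indx models or API; see Core objects for the real definitions.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for illustration; not the indx Chunk model.
    id: str
    document: str                    # path of the source document
    position: int                    # order of the chunk within its document
    text: str
    continues: Optional[str] = None  # id of the chunk that follows this one
    references: list[str] = field(default_factory=list)


def expand(chunks: dict[str, Chunk], start_id: str, max_hops: int = 2) -> list[Chunk]:
    """Return the starting chunk plus up to `max_hops` chunks reached via `continues`."""
    out: list[Chunk] = []
    current = chunks.get(start_id)
    while current is not None and len(out) <= max_hops:
        out.append(current)
        current = chunks.get(current.continues) if current.continues else None
    return out


def under(chunks: Iterable[Chunk], folder: str) -> list[Chunk]:
    """Keep only chunks whose source document lives below `folder`."""
    prefix = folder.rstrip("/") + "/"
    return [c for c in chunks if c.document.startswith(prefix)]


# Made-up sample data, only to exercise the helpers.
SAMPLE = {
    "a1": Chunk("a1", "/2024/acme/contract.pdf", 0, "Term: 12 months.", continues="a2"),
    "a2": Chunk("a2", "/2024/acme/contract.pdf", 1, "Renewal is automatic.", continues="a3"),
    "a3": Chunk("a3", "/2024/acme/contract.pdf", 2, "Notice period: 30 days."),
    "b1": Chunk("b1", "/2023/other/memo.txt", 0, "Unrelated memo."),
}

context = expand(SAMPLE, "a1", max_hops=1)            # the hit plus one follower
acme_only = under(SAMPLE.values(), "/2024/acme")      # location-based filter
```

The hop limit bounds how much neighboring text is pulled in, and it also guards against a malformed chain that loops back on itself.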

For Devin. Point indx at a decade of SharePoint exports or a file server and turn legacy folders into a queryable knowledge base without a single byte leaving the network. The local profile uses only local components: docling for parsing, ollama:qwen2.5 for enrichment, bge-m3 for embeddings, and a no-DB jsonl output. Zero-dependency fallbacks (plaintext parser, jsonl store, none VLM, .indx + jsonl writers) ship in the core, so a complete run works offline out of the box. For compliance, runs are reproducible, and the chosen configuration (versions, models, and config) is recorded in the KnowledgeSpace manifest, so any space can be audited and re-created.

US-3: As a regulated-enterprise ML lead, I want to run the entire pipeline with local parser, local LLM, and a no-DB output, so that no document or embedding ever leaves the network.

The full offline path is documented in the local & air-gapped guide, with reproducibility details in the reproducibility guide.
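As a rough illustration of why recording the configuration makes a run auditable, the sketch below hashes a canonical form of a config so that two runs can be compared by digest. The component names come from the local profile described above; the dict layout, key names, and the `fingerprint` and `build_manifest` helpers are assumptions made for this example and do not describe the actual indx manifest format.

```python
import hashlib
import json

# Component names are from the local profile; the key names and layout are
# assumptions for illustration, not indx's configuration schema.
LOCAL_PROFILE = {
    "parser": "docling",
    "llm": "ollama:qwen2.5",
    "embedder": "bge-m3",
    "output": "jsonl",
}


def fingerprint(config: dict) -> str:
    """Hash a config in canonical JSON form so equal configs give equal digests."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(config: dict, versions: dict) -> dict:
    """Bundle the config, component versions, and a digest an auditor can re-check."""
    return {
        "config": config,
        "versions": versions,
        "config_sha256": fingerprint(config),
    }
```

Sorting keys before hashing means the digest depends on what was configured, not on the order in which the settings happened to be written.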

For Priya. Repos, design docs, and runbooks have deep structure: a test file belongs to a module belongs to a service; an onboarding doc sits beside its siblings. indx detects document types and derives sibling, parent, and continues relations across the tree, so “how do I onboard?” retrieves the onboarding doc and its neighbors rather than random matches. Because structure becomes signal, an assistant built on the space understands the shape of your engineering knowledge instead of guessing at it.

US-5: As an OSS developer, I want indx to detect document types and relationships across a handbook / codebase, so that retrieval surfaces the right doc and its siblings, not random matches.
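The notion that structure becomes signal can be sketched with nothing more than file paths. The example below derives parent and sibling triples from folder layout alone. It is a simplified illustration under stated assumptions: the `structural_relations` function and the triple shape are hypothetical, and the real indx pipeline may derive these relations differently.

```python
from itertools import combinations
from pathlib import PurePosixPath


def structural_relations(paths: list[str]) -> list[tuple[str, str, str]]:
    """Return (source, relation, target) triples derived from folder layout only."""
    relations: list[tuple[str, str, str]] = []
    by_folder: dict[str, list[str]] = {}
    for path in sorted(paths):
        folder = str(PurePosixPath(path).parent)
        by_folder.setdefault(folder, []).append(path)
        # Each document points at the folder that contains it.
        relations.append((path, "parent", folder))
    for docs in by_folder.values():
        # Documents sharing a folder are siblings; emit each pair once.
        for a, b in combinations(docs, 2):
            relations.append((a, "sibling", b))
    return relations


# Made-up handbook layout, only to exercise the function.
HANDBOOK = [
    "handbook/onboarding/setup.md",
    "handbook/onboarding/first-week.md",
    "handbook/security/policy.md",
]
triples = structural_relations(HANDBOOK)
```

With triples like these, a query that matches the onboarding setup doc can also pull in its first-week sibling, while the security policy stays out of the result.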

For Dr. Chen. Mixed PDFs, notebooks, and datasets become one navigable space with citation relations intact. The Relate stage derives references edges so you can follow citations across the archive, and duplicate-of edges so you can dedupe versions of the same paper. The result is a knowledge graph ready for literature agents and synthesis — and, because it serializes to a portable archive, one you can hand directly to a collaborator.

The full RelationType set — sibling, parent, references, continues, duplicate-of — is documented in the data models reference and the Relate stage.
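For intuition about duplicate-of edges, here is a minimal sketch that links documents whose normalized text is identical. Exact-match hashing is a deliberately simple technique chosen for this example; it is not a description of how the Relate stage detects duplicates. The enum mirrors the relation names listed above, but the class itself and the `duplicate_edges` helper are hypothetical.

```python
import hashlib
import re
from enum import Enum


class RelationType(str, Enum):
    # Values mirror the names listed above; this enum is a stand-in, not indx's.
    SIBLING = "sibling"
    PARENT = "parent"
    REFERENCES = "references"
    CONTINUES = "continues"
    DUPLICATE_OF = "duplicate-of"


def duplicate_edges(docs: dict[str, str]) -> list[tuple[str, RelationType, str]]:
    """Link each later document to the first one with identical normalized text."""
    first_seen: dict[str, str] = {}
    edges: list[tuple[str, RelationType, str]] = []
    for doc_id in sorted(docs):
        # Collapse whitespace and case so trivial formatting changes do not matter.
        normalized = re.sub(r"\s+", " ", docs[doc_id]).strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in first_seen:
            edges.append((doc_id, RelationType.DUPLICATE_OF, first_seen[digest]))
        else:
            first_seen[digest] = doc_id
    return edges
```

A near-duplicate detector (for example, one based on shingling or embeddings) would catch revised versions of a paper that this exact-match sketch misses.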

For everyone. A KnowledgeSpace serializes to a single, self-contained, versioned .indx archive. Build it in CI, hand it to a teammate, mount it in a serverless function — the whole knowledge estate travels and re-loads without re-processing. The same archive can be inspected and queried directly with the CLI, so you can sanity-check structure and retrieval before it ever reaches production.

US-7: As any user, I want to serialize a KnowledgeSpace to a single .indx archive, so that I can hand the whole knowledge estate to a teammate or another machine and re-load it without re-processing.

In code this is symmetric and explicit:

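from indx import KnowledgeSpace

# Re-load a previously built archive; nothing is re-processed.
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
print(space.stats)
# Query the loaded space directly, asking for five results.
hits = space.search("data retention", k=5)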

Learn more in the .indx archive reference and the SDK reference.

For Devin and Priya. Because every major component is a typed, swappable slot — parser, LLM, VLM, embedder, store, output — the pipeline you write today survives the model you’ll use next year. indx acts as a neutral intermediate layer: build the knowledge space once, then emit it to JSONL, LangChain, LlamaIndex, or any supported vector store without re-deriving anything. Re-embed, re-store, or re-export with the same code. That makes indx a migration foundation rather than another thing to migrate off of.

US-9: As a platform lead locked into a vendor’s ingestion, I want indx as a neutral intermediate layer that outputs to LangChain / LlamaIndex / JSONL / any vector DB, so that I can migrate stacks without re-deriving my knowledge.

See the output formats guide for the available writers and the bring-your-own-stack overview for how slots fit together.
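The typed, swappable slot idea can be illustrated with a structural interface. In the sketch below, any object with a matching `write` method can fill the writer slot, so changing the output format does not touch the calling code. The `Writer` protocol, both writer classes, and the `export` function are hypothetical and written for this page; the actual slot interfaces are described in the bring-your-own-stack overview.

```python
import io
import json
from typing import Iterable, Protocol, TextIO


class Writer(Protocol):
    # Hypothetical slot interface; indx's real writer protocol may differ.
    def write(self, records: Iterable[dict], out: TextIO) -> int: ...


class JsonlWriter:
    """Emit one JSON object per line."""

    def write(self, records: Iterable[dict], out: TextIO) -> int:
        count = 0
        for record in records:
            out.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
        return count


class TextOnlyWriter:
    """A second writer that emits only the text field, to show the swap."""

    def write(self, records: Iterable[dict], out: TextIO) -> int:
        count = 0
        for record in records:
            out.write(record["text"] + "\n")
            count += 1
        return count


def export(records: list[dict], writer: Writer) -> str:
    """Run whichever writer fills the slot and return what it produced."""
    buffer = io.StringIO()
    writer.write(records, buffer)
    return buffer.getvalue()
```

Because `export` depends only on the protocol, adding a new output target means writing one new class rather than changing the pipeline that calls it.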