Output Formats & Integrations
The final stage of every build — 06 Embed+Pack — hands the assembled KnowledgeSpace to an OutputWriter. The writer decides what artifact lands on disk: the default portable .indx archive, plain JSONL, or objects ready to drop straight into a LangChain or LlamaIndex application. This guide covers all four writers, how to select one, and why .indx makes indx a vendor-free migration layer between frameworks.
The OutputWriter slot
OutputWriter is one of the six swappable component slots. Like every slot it is a typed Protocol: any object that satisfies it can serialize a space, and the built-in writers are resolved by name from the registry.

```python
from typing import Protocol, runtime_checkable

from indx import KnowledgeSpace


@runtime_checkable
class OutputWriter(Protocol):
    """Serializes a KnowledgeSpace to disk. Default: .indx. Also: jsonl, langchain, llamaindex."""

    format: str

    def write(self, space: KnowledgeSpace, out: str) -> None: ...
```

Two members make up the contract: a format string identifier and a single write(space, out) method that materializes the space into the out directory. That is the entire surface; see the full slot list in the protocols reference.

| --format name | Writer class | Emits | Availability |
|---|---|---|---|
| .indx (default) | IndxWriter | Portable ZIP archive (handbook.indx) + expanded layout | Core |
| jsonl | JsonlWriter | Newline-delimited documents/chunks | Core (zero-dep) |
| langchain | LangChainWriter | LangChain Document objects | Extra: indx[langchain] |
| llamaindex | LlamaIndexWriter | LlamaIndex Node objects | Extra: indx[llamaindex] |
Selecting a writer
The output writer is resolved with the same precedence as every slot: explicit code argument / use() → CLI flag → indx.toml → documented default. See the configuration guide for the full precedence rules.
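
The precedence chain above can be pictured as a first-match lookup. The following is a minimal sketch, not indx's actual resolver: the function name resolve_output and its parameters are hypothetical, and only the ordering and the .indx default are taken from this guide.

```python
from typing import Optional

DEFAULT_FORMAT = ".indx"  # the documented default named in this guide


def resolve_output(
    code_arg: Optional[str] = None,    # explicit code argument / use()
    cli_flag: Optional[str] = None,    # --format on the command line
    toml_value: Optional[str] = None,  # [output].format in indx.toml
) -> str:
    """Return the first source that is set, checked in precedence order.

    Hypothetical helper for illustration only.
    """
    for candidate in (code_arg, cli_flag, toml_value):
        if candidate is not None:
            return candidate
    return DEFAULT_FORMAT


# The CLI flag outranks indx.toml, and an explicit code argument outranks both.
from_cli = resolve_output(cli_flag="jsonl", toml_value="langchain")
from_code = resolve_output(code_arg="llamaindex", cli_flag="jsonl")
```

Under this model, a value set higher in the chain always shadows the ones below it, which is why a one-off --format on the command line does not require editing indx.toml.
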
On the CLI

```bash
# default: seal a portable .indx archive
indx ./docs --out ./ai-ready

# export newline-delimited JSONL instead
indx ./docs --out ./ai-ready --format jsonl

# emit LangChain Documents (requires the extra)
indx ./docs --out ./ai-ready --format langchain
```

In indx.toml

```toml
[output]
format = ".indx"  # one of: .indx | jsonl | langchain | llamaindex
```

In the SDK
Pass output= as a name string or as a custom instance; see custom components.

```python
from indx import DirectoryPipeline

# by name
pipeline = DirectoryPipeline(output="jsonl")

# or swap it later; use() accepts names or instances
pipeline = DirectoryPipeline().use(output="llamaindex")

space = pipeline.run("./docs", "./ai-ready")
```

.indx — the recommended artifact
The default IndxWriter seals the space into a single .indx file: a ZIP container (deflate) with a defined internal layout, a manifest.json carrying checksums, the index.json knowledge graph, per-chunk files under chunks/, and the vector matrix under embeddings/. Running a build also writes the expanded form alongside the archive, so downstream tools can read either shape:

```
ai-ready/
├── handbook.indx   # the portable archive
├── index.json      # the knowledge graph
├── chunks/         # agent-readable chunks + per-chunk context
└── embeddings/     # vectors + manifest
```

The defining property of .indx is that it is self-contained and re-loadable without re-processing. The manifest pins the embedder name and dimensionality (e.g. bge-m3, dim 1024), so a consumer knows exactly which model produced the vectors and can detect a mismatch before querying. You can hand the file to anyone and reopen it instantly:

```python
from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("gdpr compliance", k=5)
```

Use --name to control the archive base name (handbook → handbook.indx). For the full byte-level layout, manifest schema, and versioning rules, see the .indx archive reference and the index.json reference.
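
To illustrate what pinning the embedder buys a consumer, here is a hedged sketch that inspects a manifest using only the standard library. It assumes manifest.json sits at the archive root and records the embedder under an "embedder" key with "name" and "dim" fields; the real schema is defined in the .indx archive reference and may differ. The helper names read_manifest and embedder_matches are hypothetical.

```python
import io
import json
import zipfile


def read_manifest(archive) -> dict:
    """Load manifest.json from a ZIP-based archive (a path or a file-like object).

    ASSUMPTION: the manifest lives at the archive root.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open("manifest.json") as fh:
            return json.load(fh)


def embedder_matches(manifest: dict, name: str, dim: int) -> bool:
    """Compare the pinned embedder against the one the consumer intends to query with.

    ASSUMPTION: the key names "embedder", "name", and "dim" are illustrative.
    """
    embedder = manifest.get("embedder", {})
    return embedder.get("name") == name and embedder.get("dim") == dim


# Build a tiny stand-in archive in memory so the sketch runs without indx installed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.json", json.dumps({"embedder": {"name": "bge-m3", "dim": 1024}}))
buffer.seek(0)

manifest = read_manifest(buffer)
same_model = embedder_matches(manifest, "bge-m3", 1024)
wrong_dim = embedder_matches(manifest, "bge-m3", 768)
```

A check along these lines lets a consumer refuse to query with a mismatched model instead of silently returning poor matches.
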
jsonl — zero-dependency export
The JsonlWriter emits newline-delimited records for documents and chunks. It ships in core, pulls no dependencies, and produces a format that any tool can stream line by line, which makes it ideal for piping into custom loaders, data warehouses, or quick scripts where you do not need the sealed archive or memory-mapped vectors.
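
Streaming such an export needs nothing beyond the standard library. The sketch below is one possible reader, not part of indx: the helper name iter_jsonl is hypothetical, and the field names in the sample records ("id", "text") are assumptions, since this guide does not specify the record schema.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line.

    Accepts any iterable of strings, including an open text file handle,
    so large exports are never loaded into memory at once.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample lines standing in for an exported file; the field names are hypothetical.
sample = [
    '{"id": "c1", "text": "first chunk"}\n',
    "\n",
    '{"id": "c2", "text": "second chunk"}\n',
]
records = list(iter_jsonl(sample))
```

With a real export, pass an open file handle instead of the sample list, for example inside a with open(path, encoding="utf-8") block, where path points at whichever file the writer produced.
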
Framework writers — LangChain & LlamaIndex
The framework writers skip the archive entirely and hand you objects your existing application already understands:
- langchain (LangChainWriter) emits LangChain Document objects (page content plus metadata), ready to push into a LangChain retriever or vector store.
- llamaindex (LlamaIndexWriter) emits LlamaIndex Node objects, preserving chunk text, source provenance, and relationships for a LlamaIndex index.
Both are optional extras — install the matching extra before selecting them:

```bash
pip install "indx[langchain]"
# or
pip install "indx[llamaindex]"
```

If the extra is missing when the slot is selected, indx raises a single actionable MissingDependencyError naming the exact pip install command. See errors & exit codes and the extras reference for the full matrix.
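
The behavior described above follows a common pattern for optional extras: attempt the import lazily and translate a failure into an error that names the fix. The sketch below shows that general pattern only. The MissingDependencyError class here is a local stand-in, and require_extra is a hypothetical helper; neither is claimed to match indx's internals.

```python
import importlib


class MissingDependencyError(ImportError):
    """Local stand-in for illustration; indx's real error class may differ."""


def require_extra(module: str, extra: str):
    """Import `module`, or raise an error naming the pip command that provides it.

    Hypothetical helper: `extra` is the name used in pip install "indx[<extra>]".
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise MissingDependencyError(
            f'{module!r} is not installed. Run: pip install "indx[{extra}]"'
        ) from exc


# A module that is present imports normally; a missing one yields the actionable message.
json_module = require_extra("json", "langchain")
try:
    require_extra("module_that_does_not_exist_for_this_demo", "langchain")
    message = ""
except MissingDependencyError as err:
    message = str(err)
```

Deferring the import until the slot is selected keeps the core install free of framework dependencies while still failing early with a clear instruction.
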
.indx vs framework writers
| | .indx (default) | langchain / llamaindex |
|---|---|---|
| Artifact | Single portable, checksummed file | Native LC Documents / LI Nodes |
| Re-loadable without re-processing | Yes, via KnowledgeSpace.load(...) | No, re-derive if you change stacks |
| Carries vectors + manifest | Yes (self-describing) | Hands off to the framework’s own index |
| Best for | Archiving, sharing, future-proofing | Plugging straight into an existing LC/LI app |
Reach for a framework writer when you have a LangChain or LlamaIndex application today and want indx’s structured chunks fed in with no glue code. Reach for .indx when you want a durable, neutral artifact you can reopen, audit, and re-target later.
A vendor-free migration foundation
indx composes the AI stack rather than locking you into one. Because every heavy capability (parser, LLM, VLM, embedder, store, output) sits behind a typed protocol with a named default, the .indx archive becomes a neutral intermediate layer: it captures the expensive, hard-won work (walking, parsing, chunking, relating, enriching, embedding) once, in a framework-agnostic form.
That decoupling is what makes downstream migration cheap. The directory has already been turned into a portable knowledge space; switching the framework or stack on top of it is a configuration choice, not a re-derivation:
- Build once into .indx, then export to LangChain today and LlamaIndex tomorrow by changing only [output].format: no re-walk, no re-parse, no re-embed.
- Keep the canonical .indx as the source of truth and treat framework writers as disposable projections of it.
- Re-embedding is only required if you change the embedder itself (the manifest pins the embedder’s name and dim); see reproducibility.
Custom writers
Need a format indx does not ship? Implement the OutputWriter protocol (set a format string and a write(space, out) method) and pass an instance to DirectoryPipeline(output=...) or use(output=...). To distribute it for others to select by name, register it under the indx.outputs entry-point group. See custom components and authoring a plugin.
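
As a rough illustration of how small a custom writer can be, the sketch below defines one that satisfies the protocol structurally, with no inheritance. To stay runnable without indx installed it declares stand-ins for KnowledgeSpace and OutputWriter; the chunks attribute, the TextDumpWriter class, and the "textdump" format name are all hypothetical.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Protocol, runtime_checkable


@dataclass
class KnowledgeSpace:
    """Stand-in so the sketch runs on its own; the real class comes from indx."""
    chunks: List[str] = field(default_factory=list)  # ASSUMPTION: hypothetical attribute


@runtime_checkable
class OutputWriter(Protocol):
    """Local copy of the two-member contract described in this guide."""
    format: str

    def write(self, space: KnowledgeSpace, out: str) -> None: ...


class TextDumpWriter:
    """Hypothetical custom writer: one chunk per line in a plain-text file."""
    format = "textdump"

    def write(self, space: KnowledgeSpace, out: str) -> None:
        os.makedirs(out, exist_ok=True)
        with open(os.path.join(out, "chunks.txt"), "w", encoding="utf-8") as fh:
            for chunk in space.chunks:
                fh.write(f"{chunk}\n")


writer = TextDumpWriter()
# Structural check: the class never subclasses OutputWriter, yet it satisfies it.
satisfies_protocol = isinstance(writer, OutputWriter)

with tempfile.TemporaryDirectory() as tmp:
    writer.write(KnowledgeSpace(chunks=["first chunk", "second chunk"]), tmp)
    with open(os.path.join(tmp, "chunks.txt"), encoding="utf-8") as fh:
        written = fh.read()
```

In a real project the instance would be passed as DirectoryPipeline(output=writer), as described above, with the stand-in classes replaced by the imports from indx.
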
Next steps
- Inspect & query: read a sealed .indx archive back.
- .indx archive reference: the full container spec.
- Extras reference: every optional dependency, including the framework writers.
- Protocols reference: the OutputWriter contract in context.