
Output Formats & Integrations

The final stage of every build — 06 Embed+Pack — hands the assembled KnowledgeSpace to an OutputWriter. The writer decides what artifact lands on disk: the default portable .indx archive, plain JSONL, or objects ready to drop straight into a LangChain or LlamaIndex application. This guide covers all four writers, how to select one, and why .indx makes indx a vendor-free migration layer between frameworks.

OutputWriter is one of the six swappable component slots. Like every slot it is a typed Protocol — any object that satisfies it can serialize a space, and the built-in writers are resolved by name from the registry.

from typing import Protocol, runtime_checkable
from indx import KnowledgeSpace
@runtime_checkable
class OutputWriter(Protocol):
    """Serializes a KnowledgeSpace to disk. Default: .indx.
    Also: jsonl, langchain, llamaindex."""
    format: str  # the writer's name, e.g. "jsonl"
    def write(self, space: KnowledgeSpace, out: str) -> None: ...

Two members make up the contract: a format string identifier and a single write(space, out) method that materializes the space into the out directory. That is the entire surface — see the full slot list in the protocols reference.
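Because the slot is a runtime_checkable Protocol, conformance is structural: a writer needs no base class to pass an isinstance check. The sketch below mirrors the protocol locally so it runs without indx installed (KnowledgeSpace is replaced by Any), and UppercaseWriter and NotAWriter are hypothetical examples. Note that isinstance against a runtime-checkable protocol only checks that the members exist, not their signatures.

```python
import os
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OutputWriter(Protocol):
    """Local mirror of the OutputWriter slot; KnowledgeSpace is stubbed as Any."""

    format: str

    def write(self, space: Any, out: str) -> None: ...


class UppercaseWriter:
    """Hypothetical writer that never inherits from OutputWriter."""

    format = "upper"

    def write(self, space: Any, out: str) -> None:
        # Materialize the space into the `out` directory, as the contract describes.
        os.makedirs(out, exist_ok=True)
        with open(os.path.join(out, "space.txt"), "w", encoding="utf-8") as fh:
            fh.write(str(space).upper())


class NotAWriter:
    """Has a format string but no write() method."""

    format = "broken"


print(isinstance(UppercaseWriter(), OutputWriter))  # True: both members present
print(isinstance(NotAWriter(), OutputWriter))  # False: write() is missing
```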

| --format name | Writer class | Emits | Availability |
|---|---|---|---|
| .indx (default) | IndxWriter | Portable ZIP archive (handbook.indx) + expanded layout | Core |
| jsonl | JsonlWriter | Newline-delimited documents/chunks | Core (zero-dep) |
| langchain | LangChainWriter | LangChain Document objects | Extra: indx[langchain] |
| llamaindex | LlamaIndexWriter | LlamaIndex Node objects | Extra: indx[llamaindex] |

The output writer is resolved with the same precedence as every slot: explicit code argument / use() → CLI flag → indx.toml → documented default. See the configuration guide for the full precedence rules.

# default: seal a portable .indx archive
indx ./docs --out ./ai-ready
# export newline-delimited JSONL instead
indx ./docs --out ./ai-ready --format jsonl
# emit LangChain Documents (requires the extra)
indx ./docs --out ./ai-ready --format langchain

The same choice in indx.toml:

[output]
format = ".indx"  # one of: .indx | jsonl | langchain | llamaindex
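The precedence rule amounts to "the first value that is set wins". A minimal sketch, assuming the inputs arrive already parsed (the code argument, including anything passed through use(), the --format value, and the indx.toml contents as a dict); resolve_output and its parameters are hypothetical, not part of indx's API:

```python
from collections.abc import Mapping
from typing import Optional

DEFAULT_OUTPUT = ".indx"


def resolve_output(
    code_arg: Optional[object] = None,
    cli_flag: Optional[str] = None,
    toml_config: Optional[Mapping[str, Mapping[str, str]]] = None,
) -> object:
    """Pick the output writer: code argument / use() -> CLI flag -> indx.toml -> default."""
    if code_arg is not None:
        return code_arg  # a name string or a custom writer instance
    if cli_flag is not None:
        return cli_flag
    toml_value = (toml_config or {}).get("output", {}).get("format")
    if toml_value is not None:
        return toml_value
    return DEFAULT_OUTPUT


cfg = {"output": {"format": "jsonl"}}
print(resolve_output(toml_config=cfg))  # jsonl
print(resolve_output(cli_flag="langchain", toml_config=cfg))  # langchain
print(resolve_output())  # .indx
```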

Pass output= as a name string or as a custom instance — see custom components.

from indx import DirectoryPipeline
# by name
pipeline = DirectoryPipeline(output="jsonl")
# or swap it later; use() accepts names or instances
pipeline = DirectoryPipeline().use(output="llamaindex")
space = pipeline.run("./docs", "./ai-ready")
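Accepting either a name or an instance for output= can be pictured as a small registry lookup. This is an illustrative sketch only: WRITER_REGISTRY, resolve_writer, and the stub writer classes are hypothetical stand-ins, not indx internals.

```python
from typing import Any, Dict, Type, Union


class IndxWriter:
    format = ".indx"

    def write(self, space: Any, out: str) -> None: ...


class JsonlWriter:
    format = "jsonl"

    def write(self, space: Any, out: str) -> None: ...


# Hypothetical name -> class registry for the built-in writers.
WRITER_REGISTRY: Dict[str, Type[Any]] = {cls.format: cls for cls in (IndxWriter, JsonlWriter)}


def resolve_writer(output: Union[str, Any]) -> Any:
    """Instantiate a registered writer by name, or pass a custom instance through."""
    if isinstance(output, str):
        try:
            return WRITER_REGISTRY[output]()
        except KeyError:
            known = ", ".join(sorted(WRITER_REGISTRY))
            raise ValueError(f"unknown output format {output!r}; known: {known}") from None
    return output  # custom instances are used as-is


print(type(resolve_writer("jsonl")).__name__)  # JsonlWriter
```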

The default IndxWriter seals the space into a single .indx file: a ZIP container (deflate) with a defined internal layout, a manifest.json carrying checksums, the index.json knowledge graph, per-chunk files under chunks/, and the vector matrix under embeddings/. Running a build also writes the expanded form alongside the archive, so downstream tools can read either shape:

ai-ready/
├── handbook.indx # the portable archive
├── index.json # the knowledge graph
├── chunks/ # agent-readable chunks + per-chunk context
└── embeddings/ # vectors + manifest
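To make "ZIP container plus checksummed manifest" concrete, here is a standard-library sketch that packs members with deflate, records a SHA-256 digest per member in manifest.json, and verifies them on read. The manifest fields (files, one digest per path) and the choice of SHA-256 are assumptions for illustration; the real schema lives in the .indx archive reference.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def pack(members: Dict[str, bytes]) -> bytes:
    """Write members plus a manifest.json of SHA-256 checksums into a deflated ZIP."""
    manifest = {"files": {name: hashlib.sha256(data).hexdigest() for name, data in members.items()}}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def verify(archive: bytes) -> bool:
    """Recompute each listed member's checksum and compare it with the manifest."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(name)).hexdigest() == digest
            for name, digest in manifest["files"].items()
        )


blob = pack({
    "index.json": b'{"nodes": [], "edges": []}',
    "chunks/0001.md": b"# Intro\nHello.",
})
print(verify(blob))  # True
```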

The defining property of .indx is that it is self-contained and re-loadable without re-processing. The manifest pins the embedder name and dimensionality (e.g. bge-m3, dim 1024), so a consumer knows exactly which model produced the vectors and can detect a mismatch before querying. You can hand the file to anyone and reopen it instantly:

from indx import KnowledgeSpace
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("gdpr compliance", k=5)
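Detecting a mismatch before querying reduces to comparing the pinned embedder with the query-side one. A minimal sketch, assuming the manifest exposes embedder and dim keys (hypothetical field names; check the manifest schema for the real ones) and using a locally defined EmbedderMismatchError:

```python
from typing import Any, Mapping


class EmbedderMismatchError(ValueError):
    """Hypothetical error: the query embedder differs from the one pinned in the manifest."""


def check_embedder(manifest: Mapping[str, Any], name: str, dim: int) -> None:
    """Raise before any search if the query-side embedder is incompatible."""
    pinned_name, pinned_dim = manifest["embedder"], manifest["dim"]
    if pinned_name != name or pinned_dim != dim:
        raise EmbedderMismatchError(
            f"archive was embedded with {pinned_name} (dim {pinned_dim}), "
            f"but the query embedder is {name} (dim {dim}); re-embed or switch models"
        )


manifest = {"embedder": "bge-m3", "dim": 1024}
check_embedder(manifest, "bge-m3", 1024)  # same model: passes silently
```

Checking the name as well as the dimension matters: two different models can share a dimensionality while producing incompatible vector spaces.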

Use --name to control the archive base name (for example, --name handbook produces handbook.indx). For the full byte-level layout, manifest schema, and versioning rules, see the .indx archive reference and the index.json reference.

The JsonlWriter emits newline-delimited records for documents and chunks. It ships in core, pulls no dependencies, and produces a format that any tool can stream line by line — ideal for piping into custom loaders, data warehouses, or quick scripts where you do not need the sealed archive or memory-mapped vectors.
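Because each line is an independent JSON object, a consumer can stream the file without loading it whole. The record shape below (type, id, doc, text) is a hypothetical example, not JsonlWriter's documented schema:

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def write_jsonl(records: Iterable[Dict], fh: TextIO) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps non-ASCII text readable."""
    for rec in records:
        fh.write(json.dumps(rec, ensure_ascii=False) + "\n")


def iter_chunks(fh: TextIO) -> Iterator[Dict]:
    """Stream records lazily, yielding only chunk records and skipping blank lines."""
    for line in fh:
        if line.strip():
            rec = json.loads(line)
            if rec.get("type") == "chunk":
                yield rec


buf = io.StringIO()
write_jsonl(
    [
        {"type": "document", "id": "handbook/gdpr.md"},
        {"type": "chunk", "id": "c1", "doc": "handbook/gdpr.md", "text": "Personal data under GDPR"},
    ],
    buf,
)
buf.seek(0)
print([c["id"] for c in iter_chunks(buf)])  # ['c1']
```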

Framework writers — LangChain & LlamaIndex


The framework writers skip the archive entirely and hand you objects your existing application already understands:

  • langchain (LangChainWriter) emits LangChain Document objects — page content plus metadata — ready to push into a LangChain retriever or vector store.
  • llamaindex (LlamaIndexWriter) emits LlamaIndex Node objects, preserving chunk text, source provenance, and relationships for a LlamaIndex index.

Both are optional extras — install the matching extra before selecting them:

pip install "indx[langchain]"
# or
pip install "indx[llamaindex]"

If the extra is missing when the slot is selected, indx raises a single actionable MissingDependencyError naming the exact pip install command. See errors & exit codes and the extras reference for the full matrix.
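An actionable missing-extra error can be produced by probing for the framework package before the writer is built. This is a sketch of the pattern, not indx's implementation: MissingDependencyError is defined locally, and the module-to-extra mapping (langchain_core, llama_index) is an assumption.

```python
import importlib.util


class MissingDependencyError(ImportError):
    """Local stand-in for indx's MissingDependencyError."""


# Hypothetical mapping: writer name -> (top-level module to probe, extra to install).
EXTRAS = {
    "langchain": ("langchain_core", "langchain"),
    "llamaindex": ("llama_index", "llamaindex"),
}


def require_extra(writer_name: str) -> None:
    """Raise a single error naming the exact pip command if the extra is absent."""
    module, extra = EXTRAS[writer_name]
    if importlib.util.find_spec(module) is None:
        raise MissingDependencyError(
            f"--format {writer_name} needs an optional extra. "
            f'Install it with: pip install "indx[{extra}]"'
        )
```

Using importlib.util.find_spec checks whether the package is installed without importing it, so the probe stays cheap even for heavy frameworks.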

| Property | .indx (default) | langchain / llamaindex |
|---|---|---|
| Artifact | Single portable, checksummed file | Native LangChain Documents / LlamaIndex Nodes |
| Re-loadable without re-processing | Yes, via KnowledgeSpace.load(...) | No; re-derive if you change stacks |
| Carries vectors + manifest | Yes (self-describing) | Hands off to the framework’s own index |
| Best for | Archiving, sharing, future-proofing | Plugging straight into an existing LangChain or LlamaIndex app |

Reach for a framework writer when you have a LangChain or LlamaIndex application today and want indx’s structured chunks fed in with no glue code. Reach for .indx when you want a durable, neutral artifact you can reopen, audit, and re-target later.

indx composes the AI stack rather than locking you into one. Because every heavy capability — parser, LLM, VLM, embedder, store, output — sits behind a typed protocol with a named default, the .indx archive becomes a neutral intermediate layer: it captures the expensive, hard-won work (walking, parsing, chunking, relating, enriching, embedding) once, in a framework-agnostic form.

That decoupling is what makes downstream migration cheap. The directory has already been turned into a portable knowledge space; switching the framework or stack on top of it is a configuration choice, not a re-derivation:

  • Build once into .indx, then export to LangChain today and LlamaIndex tomorrow by changing only [output].format — no re-walk, no re-parse, no re-embed.
  • Keep the canonical .indx as the source of truth and treat framework writers as disposable projections of it.
  • Re-embedding is only required if you change the embedder itself (the manifest pins the embedder’s name and dim); see reproducibility.

Need a format indx does not ship? Implement the OutputWriter protocol — set a format string and a write(space, out) method — and pass an instance to DirectoryPipeline(output=...) or use(output=...). To distribute it for others to select by name, register it under the indx.outputs entry-point group. See custom components and authoring a plugin.