Output Formats & Integrations

The final stage of every build — 06 Embed+Pack — hands the assembled KnowledgeSpace to an OutputWriter. The writer decides what artifact lands on disk: the default portable .indx archive, plain JSONL, or objects ready to drop straight into a LangChain or LlamaIndex application. This guide covers all four writers, how to select one, and why .indx makes indx a vendor-free migration layer between frameworks.

The OutputWriter slot

OutputWriter is one of the six swappable component slots. Like every slot it is a typed Protocol — any object that satisfies it can serialize a space, and the built-in writers are resolved by name from the registry.

from typing import Protocol, runtime_checkable

from indx import KnowledgeSpace  # the type passed to write()

@runtime_checkable
class OutputWriter(Protocol):
    """Serializes a KnowledgeSpace to disk. Default: .indx.
    Also: jsonl, langchain, llamaindex."""
    format: str
    def write(self, space: KnowledgeSpace, out: str) -> None: ...

Two members make up the contract: a format string identifier and a single write(space, out) method that materializes the space into the out directory. That is the entire surface — see the full slot list in the protocols reference.
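Because the protocol is runtime-checkable, conformance can be tested with isinstance. The sketch below redeclares the protocol locally with a placeholder KnowledgeSpace so that it runs without indx installed; NullWriter and NotAWriter are hypothetical names used only for illustration.

```python
from typing import Protocol, runtime_checkable


class KnowledgeSpace:
    """Placeholder for indx's KnowledgeSpace so the sketch runs standalone."""


@runtime_checkable
class OutputWriter(Protocol):
    format: str

    def write(self, space: KnowledgeSpace, out: str) -> None: ...


class NullWriter:
    """Hypothetical writer: has both members, so it satisfies the protocol."""

    format = "null"

    def write(self, space: KnowledgeSpace, out: str) -> None:
        pass  # a real writer would serialize `space` under `out`


class NotAWriter:
    """Has a format string but no write() method, so it does not conform."""

    format = "broken"


print(isinstance(NullWriter(), OutputWriter))  # True
print(isinstance(NotAWriter(), OutputWriter))  # False
```

Note that a runtime isinstance check only confirms that the members exist; it does not validate the signature of write(), so a static type checker is still the better guard for that.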

`--format` name	Writer class	Emits	Availability
`.indx` (default)	`IndxWriter`	Portable ZIP archive (`handbook.indx`) + expanded layout	Core
`jsonl`	`JsonlWriter`	Newline-delimited documents/chunks	Core (zero-dep)
`langchain`	`LangChainWriter`	LangChain `Document` objects	Extra: `indx[langchain]`
`llamaindex`	`LlamaIndexWriter`	LlamaIndex `Node` objects	Extra: `indx[llamaindex]`

Selecting a writer

The output writer is resolved with the same precedence as every slot: explicit code argument / use() → CLI flag → indx.toml → documented default. See the configuration guide for the full precedence rules.
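The precedence order above amounts to "first source that is set wins". A minimal sketch, assuming each source is represented as an optional string; resolve_format and its parameter names are hypothetical and not part of indx's API:

```python
from typing import Optional

DEFAULT_FORMAT = ".indx"  # the documented default from this guide


def resolve_format(
    code_arg: Optional[str] = None,
    cli_flag: Optional[str] = None,
    toml_value: Optional[str] = None,
) -> str:
    """Return the first source that is set, highest precedence first."""
    for candidate in (code_arg, cli_flag, toml_value):
        if candidate is not None:
            return candidate
    return DEFAULT_FORMAT


print(resolve_format())                                          # .indx
print(resolve_format(toml_value="jsonl"))                        # jsonl
print(resolve_format(cli_flag="langchain", toml_value="jsonl"))  # langchain
```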

On the CLI

# default: seal a portable .indx archive
indx ./docs --out ./ai-ready

# export newline-delimited JSONL instead
indx ./docs --out ./ai-ready --format jsonl

# emit LangChain Documents (requires the extra)
indx ./docs --out ./ai-ready --format langchain

In `indx.toml`

[output]
format = ".indx"   # one of: .indx | jsonl | langchain | llamaindex

In the SDK

Pass output= as a name string or as a custom instance — see custom components.

from indx import DirectoryPipeline

# by name
pipeline = DirectoryPipeline(output="jsonl")

# or swap it later; use() accepts names or instances
pipeline = DirectoryPipeline().use(output="llamaindex")

space = pipeline.run("./docs", "./ai-ready")

`.indx` — the recommended artifact

The default IndxWriter seals the space into a single .indx file: a deflate-compressed ZIP container with a defined internal layout. The archive holds a manifest.json carrying checksums, the index.json knowledge graph, per-chunk files under chunks/, and the vector matrix under embeddings/. Running a build also writes the expanded form alongside the archive, so downstream tools can read either shape:

ai-ready/
├── handbook.indx        # the portable archive
├── index.json           # the knowledge graph
├── chunks/              # agent-readable chunks + per-chunk context
└── embeddings/          # vectors + manifest

The defining property of .indx is that it is self-contained and re-loadable without re-processing. The manifest pins the embedder name and dimensionality (e.g. bge-m3, dim 1024), so a consumer knows exactly which model produced the vectors and can detect a mismatch before querying. You can hand the file to anyone, and they can reopen it with a single call:

from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("gdpr compliance", k=5)
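Since the archive is a ZIP container, the embedder mismatch check described above can also be sketched with the standard library alone. This is an illustration under stated assumptions: the location of manifest.json at the archive root and the "embedder", "name", and "dim" keys are guesses made for the example, and the real schema is defined in the .indx archive reference. A toy archive is built in memory so the code runs without a real build.

```python
import io
import json
import zipfile


def read_manifest(archive) -> dict:
    """Read manifest.json from a ZIP container (path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return json.loads(zf.read("manifest.json"))


def embedder_matches(manifest: dict, name: str, dim: int) -> bool:
    """Compare the pinned embedder with the one the consumer plans to use.

    The "embedder" / "name" / "dim" keys are assumed for this sketch.
    """
    pinned = manifest.get("embedder", {})
    return pinned.get("name") == name and pinned.get("dim") == dim


# Build a toy archive in memory; a real .indx file would come from a build.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "manifest.json",
        json.dumps({"embedder": {"name": "bge-m3", "dim": 1024}}),
    )
buf.seek(0)

manifest = read_manifest(buf)
print(embedder_matches(manifest, "bge-m3", 1024))  # True
print(embedder_matches(manifest, "bge-m3", 768))   # False
```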

Use --name to control the archive base name (handbook → handbook.indx). For the full byte-level layout, manifest schema, and versioning rules, see the .indx archive reference and the index.json reference.

`jsonl` — zero-dependency export

The JsonlWriter emits newline-delimited records for documents and chunks. It ships in core, pulls no dependencies, and produces a format that any tool can stream line by line — ideal for piping into custom loaders, data warehouses, or quick scripts where you do not need the sealed archive or memory-mapped vectors.
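Reading such an export back needs nothing beyond the standard library. The sketch below streams records one line at a time; the "id" and "text" fields are hypothetical placeholders, since this guide does not specify the record schema, and an in-memory string stands in for the exported file.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an exported file; the field names are assumptions.
sample = io.StringIO(
    '{"id": "c1", "text": "Retention policy"}\n'
    '{"id": "c2", "text": "GDPR compliance"}\n'
)

records = list(iter_records(sample))
print(len(records))        # 2
print(records[1]["text"])  # GDPR compliance
```

With a real export, the same generator works on an open file handle, so memory use stays flat regardless of file size.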

Framework writers — LangChain & LlamaIndex

The framework writers skip the archive entirely and hand you objects your existing application already understands:

langchain (LangChainWriter) emits LangChain Document objects — page content plus metadata — ready to push into a LangChain retriever or vector store.
llamaindex (LlamaIndexWriter) emits LlamaIndex Node objects, preserving chunk text, source provenance, and relationships for a LlamaIndex index.

Both are optional extras — install the matching extra before selecting them:

pip install "indx[langchain]"
# or
pip install "indx[llamaindex]"

If the extra is missing when the slot is selected, indx raises a single actionable MissingDependencyError naming the exact pip install command. See errors & exit codes and the extras reference for the full matrix.
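The general pattern behind an actionable missing-dependency error is a guarded import. The following is a minimal sketch of that pattern, not indx's implementation: the MissingDependencyError class here is a local stand-in (its ImportError base class is an assumption), and require is a hypothetical helper.

```python
import importlib
from types import ModuleType


class MissingDependencyError(ImportError):
    """Local stand-in for the error described above."""


def require(module: str, extra: str) -> ModuleType:
    """Import `module`, or raise an error naming the extra to install."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            f"'{module}' is not installed. Run: pip install \"indx[{extra}]\""
        ) from exc


try:
    require("package_that_does_not_exist_xyz", "langchain")
except MissingDependencyError as err:
    print(err)
```

Deferring the import until the slot is selected keeps the core install free of framework dependencies while still failing with a message the user can act on.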

`.indx` vs framework writers

	`.indx` (default)	`langchain` / `llamaindex`
Artifact	Single portable, checksummed file	Native LC `Document`s / LI `Node`s
Re-loadable without re-processing	Yes — `KnowledgeSpace.load(...)`	No — re-derive if you change stacks
Carries vectors + manifest	Yes (self-describing)	Hands off to the framework’s own index
Best for	Archiving, sharing, future-proofing	Plugging straight into an existing LC/LI app

Reach for a framework writer when you have a LangChain or LlamaIndex application today and want indx’s structured chunks fed in with no glue code. Reach for .indx when you want a durable, neutral artifact you can reopen, audit, and re-target later.

A vendor-free migration foundation

indx composes the AI stack rather than locking you into one. Because every heavy capability — parser, LLM, VLM, embedder, store, output — sits behind a typed protocol with a named default, the .indx archive becomes a neutral intermediate layer: it captures the expensive, hard-won work (walking, parsing, chunking, relating, enriching, embedding) once, in a framework-agnostic form.

That decoupling is what makes downstream migration cheap. The directory has already been turned into a portable knowledge space; switching the framework or stack on top of it is a configuration choice, not a re-derivation:

Build once into .indx, then export to LangChain today and LlamaIndex tomorrow by changing only [output].format — no re-walk, no re-parse, no re-embed.
Keep the canonical .indx as the source of truth and treat framework writers as disposable projections of it.
Re-embedding is only required if you change the embedder itself (the manifest pins the embedder’s name and dim); see reproducibility.
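The "canonical record, disposable projections" idea in the points above can be sketched with plain data structures. Everything here is illustrative: Chunk is a hypothetical record type rather than indx's own, and the projections return dictionaries that mirror field names commonly used by LangChain documents (page_content, metadata) and LlamaIndex text nodes (id_, text, metadata) instead of constructing framework objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical canonical record; not indx's real chunk type."""

    chunk_id: str
    text: str
    source: str


def to_langchain_shape(chunk: Chunk) -> dict:
    """Project a chunk into a LangChain-style document dictionary."""
    return {
        "page_content": chunk.text,
        "metadata": {"id": chunk.chunk_id, "source": chunk.source},
    }


def to_llamaindex_shape(chunk: Chunk) -> dict:
    """Project the same chunk into a LlamaIndex-style node dictionary."""
    return {
        "id_": chunk.chunk_id,
        "text": chunk.text,
        "metadata": {"source": chunk.source},
    }


chunk = Chunk("c1", "GDPR compliance", "policies/privacy.md")
print(to_langchain_shape(chunk))
print(to_llamaindex_shape(chunk))
```

Both projections read from the same immutable record, so discarding one and regenerating the other never touches the canonical data.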

Custom writers

Need a format indx does not ship? Implement the OutputWriter protocol — set a format string and a write(space, out) method — and pass an instance to DirectoryPipeline(output=...) or use(output=...). To distribute it for others to select by name, register it under the indx.outputs entry-point group. See custom components and authoring a plugin.
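As a rough illustration of the contract, here is a writer that emits a CSV file. It is a sketch under assumptions: FakeSpace stands in for KnowledgeSpace, and its chunks attribute (a list of id/text pairs) is hypothetical, since the real attributes of a space are documented elsewhere. CsvWriter and the chunks.csv file name are likewise invented for the example.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeSpace:
    """Stand-in for KnowledgeSpace; the `chunks` attribute is hypothetical."""

    chunks: list = field(default_factory=list)


class CsvWriter:
    """Hypothetical writer that satisfies OutputWriter structurally."""

    format = "csv"

    def write(self, space, out: str) -> None:
        out_dir = Path(out)
        out_dir.mkdir(parents=True, exist_ok=True)
        # newline="" lets the csv module control line endings itself.
        with open(out_dir / "chunks.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["id", "text"])
            for chunk_id, text in space.chunks:
                writer.writerow([chunk_id, text])


space = FakeSpace(chunks=[("c1", "Retention policy"), ("c2", "GDPR compliance")])
with tempfile.TemporaryDirectory() as tmp:
    CsvWriter().write(space, tmp)
    print((Path(tmp) / "chunks.csv").read_text(encoding="utf-8"))
```

In a real project, an instance of such a class would be passed as output= in place of a format name, as described above.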

Next steps

Inspect & query — read a sealed .indx archive back.
.indx archive reference — the full container spec.
Extras reference — every optional dependency, including the framework writers.
Protocols reference — the OutputWriter contract in context.