Bringing Your Own Component
indx is built so that every heavy-lifting component is a swappable adapter behind a typed protocol. When the built-in adapters do not fit, you can supply your own object and hand it to the pipeline — no base class to inherit, no fork of indx, no edits to core/.
This guide covers bring-your-own (BYO) components written in code: a one-off class you instantiate and pass into a DirectoryPipeline. If you want to publish your adapter as an installable package that resolves by name (so others can write store = "weaviate" in indx.toml), see authoring a plugin instead. To add a stage rather than a component, see writing a custom stage.
Structural typing: why no base class
indx interfaces are typing.Protocol definitions, not abstract base classes. That means indx uses structural typing (“duck typing with type-checking”): any object whose methods match the protocol’s signatures satisfies it. You do not import and subclass Parser; you simply implement parse(...) with the right shape, and your object drops straight in.
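To make the idea concrete, here is a small standard-library sketch. `Doc`, `ParserLike`, and `UpperParser` are hypothetical stand-ins rather than indx's real types; the only point is that the class satisfies the protocol without ever inheriting from it.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Doc:
    """Hypothetical stand-in for a parsed document (not indx's ParsedDoc)."""

    text: str


@runtime_checkable
class ParserLike(Protocol):
    """Hypothetical stand-in for a parser protocol: a single parse() method."""

    def parse(self, file: str) -> Doc: ...


class UpperParser:
    """Plain class: no base class, no registration, just a matching method."""

    def parse(self, file: str) -> Doc:
        return Doc(text=file.upper())


def run(parser: ParserLike, file: str) -> Doc:
    # A static type checker accepts UpperParser here because its shape matches.
    return parser.parse(file)


result = run(UpperParser(), "hello")
# The runtime check only confirms that a parse attribute exists.
is_parser = isinstance(UpperParser(), ParserLike)
```

Note that `ParserLike` never appears in `UpperParser`'s base classes; the match is purely by shape.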
The six component slots and their protocols (from the technical spec) are:
| Slot | Protocol | Key method(s) | Default |
|---|---|---|---|
| parser | Parser | parse(file) -> ParsedDoc | docling |
| llm | LLM | complete(prompt, *, system=, max_tokens=, temperature=) -> str | openai:gpt-5-mini |
| vlm | VLM | describe(image, *, prompt=) -> str | none |
| embedder | Embedder | dim: int; embed(texts) -> list[list[float]] | openai:text-embedding-3-small (dim 1536) |
| store | Store | upsert(...), query(...), persist(dest) | qdrant |
| output | OutputWriter | format: str; write(space, out) -> None | .indx |
The Default column is the shipped zero-config cloud stack (it needs OPENAI_API_KEY). The all-local stack — ollama:qwen2.5 + bge-m3 (dim 1024) — is the local profile (opt-in), selected via pip install "indx[local]" or explicit flags/config; see registry and defaults.
See the protocols reference for the exact, complete signatures of every slot.
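As an illustration of how small a slot implementation can be, the sketch below follows the Embedder shape listed in the table (a `dim` attribute plus `embed(texts)`). `HashEmbedder` is a hypothetical toy for tests: its vectors are deterministic but carry no semantic meaning, and the exact signature should still be confirmed against the protocols reference.

```python
import hashlib


class HashEmbedder:
    """Toy embedder for tests: deterministic, dependency-free, not semantic."""

    dim = 8

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dim` bytes of the digest to floats in [0, 1].
            vectors.append([byte / 255 for byte in digest[: self.dim]])
        return vectors


embedder = HashEmbedder()
vectors = embedder.embed(["alpha", "beta"])
```

Because every vector has exactly `dim` entries, an object like this could stand in for a real embedder in unit tests that only care about shapes.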
Example 1 — a custom Parser
A Parser turns one file into a ParsedDoc. Here is a minimal Markdown parser that splits on blank lines into paragraph blocks. Note that it is a plain class: it implements parse and returns a core ParsedDoc, nothing more.
from indx import DirectoryPipeline, ParsedDoc, Source


class MyMarkdownParser:
    """Custom Parser: satisfies the Parser protocol structurally."""

    def parse(self, file) -> ParsedDoc:
        text = file.read_text()
        return ParsedDoc(
            source=Source(path=file.path, folder=file.folder, type="markdown"),
            text=text,
            blocks=[{"kind": "paragraph", "text": p} for p in text.split("\n\n")],
        )


pipeline = DirectoryPipeline(embedder="bge-m3", store="chroma")
pipeline.use(parser=MyMarkdownParser())  # BYO parser instance
space = pipeline.run("./notes", "./out")

The file argument is a FileRef produced by stage 01 (Walk): it carries the resolved path, a bytes/text accessor, the detected MIME type, and the folder lineage. Your job is to read it and return normalized text plus any structural blocks, tables, or images the Chunk stage can use to split with structure intact.
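The example's `text.split("\n\n")` is deliberately minimal: it can yield empty or newline-prefixed blocks when paragraphs are separated by more than one blank line, and it misses blank lines that contain spaces. A slightly more tolerant splitter might look like the sketch below. `split_paragraphs` is a hypothetical helper, and the block dictionaries simply copy the `{"kind", "text"}` shape used in the example.

```python
import re


def split_paragraphs(text: str) -> list[dict[str, str]]:
    """Split text into paragraph blocks on blank lines, skipping empty ones."""
    # Normalize Windows line endings so "\r\n\r\n" also counts as a blank line.
    normalized = text.replace("\r\n", "\n")
    # A blank line may contain spaces or tabs: newline, optional whitespace, newline.
    parts = re.split(r"\n\s*\n", normalized)
    return [{"kind": "paragraph", "text": p.strip()} for p in parts if p.strip()]


blocks = split_paragraphs("# Title\n\nFirst paragraph.\n \n\n\nSecond.\n")
```

Inside a parser, the result could be passed as the `blocks=` argument in place of the list comprehension.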
Example 2 — chaining .use(...) and .drop(...)
use() returns the pipeline, so component swaps chain fluently. You can also remove a stage entirely: drop("enrich") skips all LLM/VLM work, which is handy when you have no model available or simply do not want enrichment.
from indx import DirectoryPipeline, ParsedDoc, Source


class MyMarkdownParser:
    def parse(self, file) -> ParsedDoc:
        text = file.read_text()
        return ParsedDoc(
            source=Source(path=file.path, folder=file.folder, type="markdown"),
            text=text,
            blocks=[{"kind": "paragraph", "text": p} for p in text.split("\n\n")],
        )


pipeline = (
    DirectoryPipeline(embedder="bge-m3", store="chroma")
    .use(parser=MyMarkdownParser())  # BYO parser instance
    .drop("enrich")                  # skip LLM enrichment entirely
)
space = pipeline.run("./notes", "./out")

Dropping enrich removes the only stage that would call an LLM or VLM, so documents keep their detected type but get no LLM-derived topics, tags, or summaries. Dropping embed-pack instead produces a graph-only space with no vectors. See the pipeline overview for what each stage contributes.
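For readers curious why the chain works, the toy class below shows the general fluent-builder pattern: each mutator returns `self`. `MiniPipeline` is hypothetical and is not how DirectoryPipeline is implemented; its stage list only reuses stage names mentioned in this guide and does not claim to be the real stage order.

```python
class MiniPipeline:
    """Hypothetical toy pipeline illustrating chainable use()/drop()."""

    def __init__(self) -> None:
        self.components: dict[str, object] = {}
        # Illustrative stage names only; not the real indx stage list.
        self.stages: list[str] = ["walk", "enrich", "chunk", "embed-pack"]

    def use(self, **components: object) -> "MiniPipeline":
        self.components.update(components)
        return self  # returning self is what makes chaining possible

    def drop(self, stage: str) -> "MiniPipeline":
        if stage not in self.stages:
            raise ValueError(f"unknown stage: {stage!r}")
        self.stages.remove(stage)
        return self


pipeline = MiniPipeline().use(parser="my-parser").drop("enrich")
```

Raising on an unknown stage name is a design choice in this sketch: a typo fails immediately instead of silently leaving the stage in place.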
Passing components: construction vs use()
There are two equivalent ways to bind a component, and they accept either an instance (your BYO object) or a name string (a registered adapter).
# (a) At construction
pipeline = DirectoryPipeline(
    parser=MyMarkdownParser(),  # instance
    embedder="bge-m3",          # name string
    store="chroma",
)

# (b) Via use() after construction (chainable, returns self)
pipeline = DirectoryPipeline(embedder="bge-m3", store="chroma")
pipeline.use(parser=MyMarkdownParser())

Both forms are interchangeable; pick whichever reads better. The keyword names are the same in both: parser=, llm=, vlm=, embedder=, store=, output=.
For any slot you leave unset, indx resolves the effective component using this precedence:
explicit code argument / use() > CLI flag > indx.toml > documented default

So a BYO object passed in code always wins over config or defaults. See the configuration guide and configuration reference for the full resolution rules.
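The precedence chain can be pictured as "first supplied value wins". The function below is a minimal sketch of that idea, not indx's actual resolver: `resolve_slot` and its parameter names are hypothetical, and it assumes `None` means "not supplied at this layer".

```python
from typing import Optional


def resolve_slot(
    code: Optional[object] = None,
    cli: Optional[str] = None,
    toml: Optional[str] = None,
    default: Optional[str] = None,
) -> Optional[object]:
    """Return the first value that was actually supplied, highest precedence first."""
    for candidate in (code, cli, toml):
        if candidate is not None:
            return candidate
    return default


class MyStore:
    """Placeholder BYO object for the demonstration."""


byo = MyStore()
from_code = resolve_slot(code=byo, cli="chroma", toml="weaviate", default="qdrant")
from_cli = resolve_slot(cli="chroma", toml="weaviate", default="qdrant")
from_toml = resolve_slot(toml="weaviate", default="qdrant")
from_default = resolve_slot(default="qdrant")
```

Here `from_code` is the BYO instance itself, which matches the statement that an object passed in code outranks every configuration layer.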
The adapter authoring contract
Whether your component is a throwaway class or a future plugin, it should honour the same five-point contract from the spec. Following it keeps your adapter portable, import-safe, and friendly when a dependency is missing.
1. Implement the protocol — exactly
Match the method names and signatures in the protocols reference precisely. Structural typing means a near-miss (wrong argument name, wrong return type) will not raise at bind time but will fail mid-run. Run mypy/pyright against your adapter to catch mismatches early.
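The following standard-library sketch shows why a near-miss slips through at runtime. An `isinstance` check against a `runtime_checkable` protocol only verifies that the method exists, not that its signature matches. `EmbedderLike`, `NearMiss`, and `param_names` are hypothetical names used for illustration; whether indx performs any runtime check at bind time is not something this sketch assumes.

```python
import inspect
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbedderLike(Protocol):
    """Hypothetical stand-in protocol with a single embed(texts) method."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class NearMiss:
    """Right method name, wrong parameter name."""

    def embed(self, documents: list[str]) -> list[list[float]]:
        return [[0.0] for _ in documents]


def param_names(func) -> list[str]:
    """Parameter names of a function, excluding self."""
    return [name for name in inspect.signature(func).parameters if name != "self"]


# The runtime check passes: it only looks for an attribute named embed.
passes_isinstance = isinstance(NearMiss(), EmbedderLike)

# Comparing parameter names exposes the mismatch before any real work starts.
names_match = param_names(EmbedderLike.embed) == param_names(NearMiss.embed)

# A caller that passes the argument by keyword fails only at call time.
try:
    NearMiss().embed(texts=["a"])
    failed_at_call = False
except TypeError:
    failed_at_call = True
```

A static checker such as mypy or pyright reports this mismatch without running anything, which is why the contract recommends it.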
2. Convert at the edge — never leak vendor types into core models
A ParsedDoc, Chunk, or Document must never hold a vendor object (a raw provider response, a qdrant_client.PointStruct, a LangChain Document, etc.). Do all conversion to and from core types inside your adapter, at its boundary. Core models stay vendor-free so the resulting .indx archive is portable regardless of which backend produced it.
class MyStore:
    def __init__(self, client) -> None:
        self._client = client  # vendor client, supplied by the caller

    def upsert(self, ids, vectors, payloads) -> None:
        from my_vendor_sdk import Point  # vendor type stays local

        # Convert here, at the adapter boundary, not in core/.
        points = [Point(id=i, vec=v, payload=p) for i, v, p in zip(ids, vectors, payloads)]
        self._client.upsert(points)

3. Lazy-import heavy deps, with a MissingDependencyError hint
Do not import a heavy or optional dependency at module top level; that would break indx’s light, air-gapped core. Import it inside the method that needs it and raise MissingDependencyError with an actionable pip install hint when it is absent.
from indx.core.errors import MissingDependencyError


class MyEmbedder:
    dim = 768

    def __init__(self) -> None:
        self._model = None  # loaded on the first embed() call, not at import time

    def embed(self, texts: list[str]) -> list[list[float]]:
        if self._model is None:
            try:
                from sentence_transformers import SentenceTransformer  # lazy
            except ModuleNotFoundError as exc:
                raise MissingDependencyError(
                    "MyEmbedder requires sentence-transformers. "
                    "Install it with: pip install sentence-transformers"
                ) from exc
            self._model = SentenceTransformer("my-model")
        # encode() returns a NumPy array by default; tolist() yields plain Python floats.
        return self._model.encode(texts).tolist()

4. Be import-safe
Importing your adapter module must never fail because a backend is absent or a service is unreachable. Defer all heavy work (SDK imports, network connections, model loading) to construction or method-call time, not module import. This is what lets the registry discover adapters cheaply without paying for backends you are not using.
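One way to keep a module import-safe is sketched below using only the standard library. `backend_available` checks for an installed package without importing it, and `LazyStore` (a hypothetical class, with a dictionary standing in for a real SDK client) opens its connection on first use. The URL is a placeholder and nothing here touches the network.

```python
import importlib.util
from functools import cached_property


def backend_available(module_name: str) -> bool:
    """Check whether a top-level module is installed without importing it."""
    return importlib.util.find_spec(module_name) is not None


class LazyStore:
    """Hypothetical store that connects on first use, not at import or construction."""

    connections_opened = 0  # counter used only to demonstrate laziness

    def __init__(self, url: str) -> None:
        self.url = url  # cheap: no I/O happens here

    @cached_property
    def client(self) -> dict[str, str]:
        # Stand-in for an expensive SDK import plus a network connection.
        type(self).connections_opened += 1
        return {"url": self.url}

    def ping(self) -> str:
        return self.client["url"]


store = LazyStore("http://localhost:6333")  # placeholder URL
opened_before = LazyStore.connections_opened
store.ping()
store.ping()
opened_after = LazyStore.connections_opened
```

Constructing the store costs nothing, and `cached_property` ensures the expensive step runs once however many times the client is used.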
5. For Store, materialize the archive
If you write a custom Store, implement persist(dest) to flush your vectors into the embeddings/ layout. That keeps the sealed .indx archive self-contained and loadable even when your backing database is offline. See the embed + pack stage and the .indx archive reference for the on-disk format.
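The general shape of a persist(dest) method might look like the sketch below. It is heavily hedged: the file name `vectors.jsonl` and the JSON Lines format are placeholders invented for illustration, not the real embeddings/ layout, which is defined in the .indx archive reference. `InMemoryStore` is hypothetical, and its `upsert` signature follows the MyStore sketch in point 2.

```python
import json
import tempfile
from pathlib import Path


class InMemoryStore:
    """Hypothetical in-memory store used to illustrate persist(dest)."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def upsert(self, ids, vectors, payloads) -> None:
        for i, v, p in zip(ids, vectors, payloads):
            self._rows[i] = {"id": i, "vector": list(v), "payload": p}

    def persist(self, dest) -> None:
        # ASSUMPTION: file name and JSONL format are placeholders, not the real layout.
        out_dir = Path(dest) / "embeddings"
        out_dir.mkdir(parents=True, exist_ok=True)
        with open(out_dir / "vectors.jsonl", "w", encoding="utf-8") as fh:
            for row in self._rows.values():
                fh.write(json.dumps(row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    store = InMemoryStore()
    store.upsert(["a", "b"], [[0.1, 0.2], [0.3, 0.4]], [{"n": 1}, {"n": 2}])
    store.persist(tmp)
    written = Path(tmp) / "embeddings" / "vectors.jsonl"
    lines = written.read_text(encoding="utf-8").splitlines()
```

The property that matters is that everything needed to reload the vectors lands under dest, so nothing depends on the live database afterwards.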
Where to go next
- Protocols reference: the complete, normative signature for every slot.
- Writing a custom stage: add or replace a whole pipeline phase, not just a component.
- Authoring a plugin: package your adapter so it resolves by name via entry points.
- Adding a backend: contribute an adapter upstream into indx itself.