Skip to content

Adding a Backend

Adding a new backend — a Parser, LLM, VLM, Embedder, Store, or OutputWriter — is the most common contribution to indx, and the recipe is deliberately fixed. Follow these six steps and your adapter drops straight into the registry, resolves by name, and never weighs down the dependency-light core.

The whole design rests on two ideas: backends satisfy a typed Protocol by structure (no subclassing), and heavy dependencies live behind optional extras that are imported lazily. This page is the contributor-facing checklist; for shipping an adapter as a separate PyPI package, see Authoring a Plugin.

Every backend follows the same path. The steps below are normative — the pull-request checklist enforces them.

#StepWhy it matters
1Implement the Protocol exactlyStructural typing means any object that fits “drops in”
2Convert at the edgeVendor types never leak into core models
3Lazy-import the heavy dependencypip install indx stays light and air-gapped
4Declare the extra in pyproject.tomlUsers get one clear install command
5Register via entry pointNo edits to core/; resolves by name
6Write the adapter contract testProves the Protocol fit and the round-trip

Match the method signatures in core protocols exactly. indx uses typing.Protocol (structural typing), not abstract base classes, so you do not subclass anything — you just write a class whose methods satisfy the interface.

For a Store, that means implementing upsert, query, and persist:

src/indx/store/qdrant.py
from __future__ import annotations
class Qdrant: # satisfies the Store protocol structurally
"""Vector store backed by Qdrant. Registry key: 'qdrant'."""
def __init__(self, url: str = "http://localhost:6333") -> None:
self.url = url
def upsert(
self,
ids: list[str],
vectors: list[list[float]],
payloads: list[dict],
) -> None: ...
def query(
self,
vector: list[float],
k: int = 5,
filter: dict | None = None,
) -> list[tuple[str, float]]: ...
def persist(self, dest: str) -> None: ...

2. Convert at the edge — never leak vendor types

Section titled “2. Convert at the edge — never leak vendor types”

Your adapter accepts and returns only core domain types (ParsedDoc, Chunk, vectors as list[float], and so on). The vendor SDK exists only inside your adapter module. A Document must never store a qdrant_client.PointStruct, and a Protocol method must never return a raw provider response.

This is the rule that keeps the dependency graph a DAG pointing inward at core/: convert core types into vendor types on the way in, and vendor results back into core types on the way out, right there at the adapter boundary.

# ✅ vendor type built here, at the edge, and discarded here
def upsert(self, ids, vectors, payloads):
from qdrant_client.models import PointStruct # vendor type, local to this method
points = [
PointStruct(id=i, vector=v, payload=p)
for i, v, p in zip(ids, vectors, payloads)
]
self._client.upsert(collection_name="indx", points=points)

The bare install must work with no network and no GPU. So the vendor SDK is imported inside the method that needs it — never at module top level — and a missing dependency raises MissingDependencyError carrying the exact pip install indx[<extra>] hint. The extra is named after the registry key.

from indx.core.errors import MissingDependencyError
class Qdrant:
def __init__(self, url: str = "http://localhost:6333") -> None:
self.url = url
def connect(self) -> None:
try:
from qdrant_client import QdrantClient # lazy import
except ModuleNotFoundError as exc:
raise MissingDependencyError(
"The Qdrant store requires the 'qdrant' extra. "
"Install it with: pip install indx[qdrant]"
) from exc
self._client = QdrantClient(url=self.url)

Because the import is deferred to runtime, plugin discovery never fails just because a backend’s package is absent — the error only surfaces when that slot is actually selected.

Add an entry under [project.optional-dependencies], keyed by the registry key, listing the packages your adapter needs:

[project.optional-dependencies]
qdrant = ["qdrant-client>=1.7"]

Now pip install indx[qdrant] pulls exactly what the Qdrant store needs, and nothing more. See Extras for the full install matrix and the defaults / all bundles.

5. Register via entry point — don’t hard-wire it

Section titled “5. Register via entry point — don’t hard-wire it”

indx resolves backends by name through per-slot registries. This recipe targets third-party plugins shipped as their own PyPI package — wire your class in with an entry point so it resolves by name, and never hard-code it into core/. (A first-party adapter living in this repository instead registers in registry/builtins.py; entry points are reserved for out-of-tree plugins. See Authoring a Plugin for the standalone-package path.)

[project.entry-points."indx.stores"]
qdrant = "indx.store.qdrant:Qdrant" # registry key → "module:Class"

Each slot has its own entry-point group:

Entry-point groupSlotProtocol
indx.parsersparserParser
indx.llmsllmLLM
indx.vlmsvlmVLM
indx.embeddersembedderEmbedder
indx.storesstoreStore
indx.outputsoutputOutputWriter
indx.stagespipelineStage

Once registered, the backend is usable by name anywhere a built-in is — in indx.toml, on the CLI, or in code:

[store]
backend = "qdrant"
from indx import DirectoryPipeline
DirectoryPipeline(store="qdrant")

See Registry and Defaults for resolution order and how first-party built-ins relate to discovered plugins.

Every Protocol implementation ships with a contract test that proves two things: the adapter satisfies the Protocol, and it round-trips core types. The test layout mirrors src/ — so src/indx/store/qdrant.py is tested by tests/store/test_qdrant.py (unit tests under tests/unit/).

Network and model calls are mocked: real provider calls never run in the default offline suite. Use a fake client or a recorded HTTP cassette, and seed any randomness so the test is deterministic.

from indx.store import Store
from indx.store.qdrant import Qdrant
def test_qdrant_satisfies_store_protocol():
store = Qdrant(url="http://localhost:6333")
assert isinstance(store, Store) # @runtime_checkable protocol
def test_qdrant_round_trips_core_types(fake_qdrant_client):
store = Qdrant()
store._client = fake_qdrant_client # network mocked
store.upsert(["chunk_0001"], [[0.1, 0.2]], [{"path": "a.md"}])
hits = store.query([0.1, 0.2], k=1)
assert hits == [("chunk_0001", 1.0)] # returns core types, not vendor objects

For Store adapters specifically, also exercise persist() so the archive’s embeddings/ layout is materialized — this keeps sealed .indx archives portable regardless of which backend produced them. See Testing for the full contract-test conventions, fakes, and golden-file rules.

Putting the pieces together, here is the minimal shape of a backend — pyproject.toml declarations plus the adapter skeleton with its lazy import and edge conversion:

pyproject.toml
[project.optional-dependencies]
qdrant = ["qdrant-client>=1.7"]
[project.entry-points."indx.stores"]
qdrant = "indx.store.qdrant:Qdrant" # registry key → class
src/indx/store/qdrant.py
from __future__ import annotations
from indx.core.errors import MissingDependencyError
class Qdrant: # satisfies the Store protocol structurally
"""Vector store backed by Qdrant. Registry key: 'qdrant'."""
def __init__(self, url: str = "http://localhost:6333") -> None:
self.url = url
self._client = None
def _connect(self) -> None:
try:
from qdrant_client import QdrantClient # lazy import
except ModuleNotFoundError as exc:
raise MissingDependencyError(
"The Qdrant store requires the 'qdrant' extra. "
"Install it with: pip install indx[qdrant]"
) from exc
self._client = QdrantClient(url=self.url)
def upsert(
self,
ids: list[str],
vectors: list[list[float]],
payloads: list[dict],
) -> None:
if self._client is None:
self._connect()
from qdrant_client.models import PointStruct # vendor type, local
# convert core ids/vectors/payloads → vendor PointStruct *here*, never in core/
points = [
PointStruct(id=i, vector=v, payload=p)
for i, v, p in zip(ids, vectors, payloads)
]
self._client.upsert(collection_name="indx", points=points)
def query(
self,
vector: list[float],
k: int = 5,
filter: dict | None = None,
) -> list[tuple[str, float]]:
if self._client is None:
self._connect()
results = self._client.search(
collection_name="indx", query_vector=vector, limit=k
)
# convert vendor results → core tuples at the edge
return [(str(r.id), float(r.score)) for r in results]
def persist(self, dest: str) -> None:
"""Flush/export vectors into the output `embeddings/` layout."""
...

That is the entire contract. The adapter satisfies Store structurally, never leaks qdrant_client types out of the module, imports the SDK lazily with an actionable hint, declares its extra, registers by entry point, and is covered by a contract test.

Before a new backend can merge, the following must be true:

  • The class satisfies its Protocol exactly (signatures match the protocols reference).
  • Only core types cross the boundary; vendor types stay inside the adapter module.
  • The heavy dependency is imported lazily and raises MissingDependencyError with a pip install indx[<extra>] hint.
  • The extra is declared under [project.optional-dependencies], named after the registry key.
  • The class is registered via the correct [project.entry-points."indx.<group>"].
  • An adapter contract test proves the Protocol fit and core-type round-trip, with network/model mocked.
  • ruff, mypy --strict, pyright (strict), and the offline pytest suite all pass.