Dependency Rules

indx’s internal dependency graph is a directed acyclic graph (DAG) that points inward, toward a tiny domain core. That single rule — plus a handful of import constraints enforced in CI — is what lets you add a new backend without ever touching core/, and what keeps pip install indx fast and offline-capable. This page documents the direction, the per-package import budget, and the mechanics that hold it together.

For the bigger picture of how packages fit together, see the architecture overview. To act on these rules as a contributor, see adding a backend.

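The graph points toward one sink: core/. It depends on nothing else in the package, only on Pydantic v2 and the standard library. Backends, the registry, the pipeline, and the CLI all depend, directly or transitively, on core/ and on the slot protocols (the base.py modules), never the other way around. The only other module with no internal imports is errors.py, which sits beside core/ at the bottom of the graph.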

Concretely:

  • Protocols are central. Each swappable slot declares its contract as a typing.Protocol in that sub-package’s base.py. Implementations import that protocol, core, and the shared utils helpers, and nothing else internal.
  • Implementations are leaves. A DoclingParser knows about the Parser protocol and core types. It does not know that QdrantStore or BGEM3Embedder exist.
  • Wiring lives at the edges. Only the registry/ resolves names to concrete classes, and only pipeline/ and cli/ orchestrate them — and they do so exclusively through protocols and the registry.

Because every edge points toward the protocols and core, there are no cycles, and a change to one backend cannot ripple into another.
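To make the "protocols are central, implementations are leaves" idea concrete, here is a minimal sketch. The Chunk model, the embed() method name, and its signature are assumptions made for illustration; the real core/ models are Pydantic v2 classes and the real protocol in embed/base.py may look different.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a core/ domain model (the real one is Pydantic v2)."""
    text: str


@runtime_checkable
class Embedder(Protocol):
    """What a slot's base.py might declare. Method name and signature are assumed."""

    def embed(self, chunks: list[Chunk]) -> list[list[float]]: ...


class LengthEmbedder:
    """A toy leaf: it inherits nothing and imports no sibling backend."""

    def embed(self, chunks: list[Chunk]) -> list[list[float]]:
        # One-dimensional "embedding": the length of each chunk's text.
        return [[float(len(chunk.text))] for chunk in chunks]


def embed_all(embedder: Embedder, chunks: list[Chunk]) -> list[list[float]]:
    """Caller code is written against the protocol, never a concrete class."""
    return embedder.embed(chunks)
```

LengthEmbedder satisfies Embedder purely by shape, which is why an implementation needs only its own base.py and core to type-check.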

Package responsibilities and import budget

The table below is the authoritative “may import” list. If a package imports something internal that is not in its row, that import is a violation.

| Package | Responsibility | May import (internal) |
| --- | --- | --- |
| core/ | Domain model (Pydantic v2) | (nothing internal) |
| utils/ | Cross-cutting helpers, require_extra() | errors |
| errors.py | Exception hierarchy | (nothing) |
| */base.py | Slot protocols (Parser, LLM, VLM, Embedder, Store, OutputWriter) | core |
| parsers/* | Parser implementations | parsers.base, core, utils |
| llm/* | LLM implementations | llm.base, core, utils |
| vlm/* | VLM implementations | vlm.base, core, utils |
| embed/* | Embedder implementations | embed.base, core, utils |
| store/* | Vector store implementations | store.base, core, utils |
| output/* | Output writers | output.base, core, archive (indx writer only), utils |
| archive/ | .indx read/write | core, utils |
| config/ | indx.toml schema + loader | core, errors |
| registry/ | Name-to-class resolution + plugin discovery | all */base.py; lazy-imports impls; config, errors |
| pipeline/ | Stage orchestration | core, all */base.py, registry, config, utils |
| cli/ | Typer/Rich UI | pipeline, config, registry, archive, core, utils |

The clear pattern: each row may reach toward core and protocols, never sideways into a peer. The output writer’s permission to touch archive/ is the one extra edge — and only the default .indx writer uses it.
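As an illustration of what "not in its row" means mechanically, the sketch below transcribes a few rows of the budget into a dictionary and checks a module's source against them with the standard-library ast module. It is a toy, not the project's enforcement tool; the BUDGET subset, the function names, and the example module paths (such as indx.embed.bge) are assumptions.

```python
import ast

# A few rows of the table, as allowed internal import prefixes (relative to "indx").
BUDGET: dict[str, set[str]] = {
    "core": set(),
    "archive": {"core", "utils"},
    "config": {"core", "errors"},
    "store": {"store.base", "core", "utils"},
}


def internal_imports(source: str, root: str = "indx") -> set[str]:
    """Return internal modules imported by `source`, with the root prefix removed.

    Only absolute imports are inspected; relative imports are ignored in this sketch.
    """
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            if name.startswith(root + "."):
                found.add(name[len(root) + 1:])
    return found


def violations(package: str, source: str) -> set[str]:
    """Internal imports in `source` that fall outside the package's budget row."""
    allowed = BUDGET[package]
    return {
        module
        for module in internal_imports(source)
        if not any(module == a or module.startswith(a + ".") for a in allowed)
    }
```

Because the source is only parsed, never executed, the check runs without any backend installed. A store module that imported an embedder would show up as a sideways edge in the returned set.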

The four rules below are checked, not merely encouraged. An import-linter contract declared in pyproject.toml runs in CI and fails the build on any violation.

  1. core/ imports nothing from elsewhere in indx. It is the sink everything else depends on. No backend, no registry, no config may leak into the domain model, and core models never store a vendor type (e.g. a qdrant_client.models.PointStruct); adapters convert at their own edge.

  2. Implementations import only their own base.py plus core (and utils). A parser must not import a store; an LLM adapter must not import an embedder. No sibling implementation, no other slot. Structural typing makes this painless: an implementation satisfies its protocol without inheriting anything.

  3. The registry is the only place that imports concrete implementation classes, and it does so lazily. registry/builtins.py holds the name-to-class registration table, but the actual import of, say, QdrantStore happens only when that slot is selected. A missing extra therefore never breaks an unrelated code path.

  4. pipeline/ and cli/ depend on protocols, obtaining concretes only through the registry. They are written entirely against Parser, Store, Embedder, and friends. They never import indx.store.qdrant. This is the contract verified by the import-linter check.
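Rule 3 is easier to picture with a small sketch of a lazy name-to-class table. This is not the contents of registry/builtins.py; the table format ("module:ClassName" strings), the resolve() function, and UnknownBackendError are assumptions, and standard-library classes stand in for real backends so the example runs anywhere.

```python
from importlib import import_module

# name -> "module:ClassName". The values are plain strings, so building the
# table imports nothing. Stdlib targets stand in for real backend classes.
_BUILTINS: dict[str, str] = {
    "fraction": "fractions:Fraction",
    "decimal": "decimal:Decimal",
}


class UnknownBackendError(LookupError):
    """Hypothetical error for a name that is not registered."""


def resolve(name: str, table: dict[str, str] = _BUILTINS) -> type:
    """Import and return the class registered under `name`, on demand."""
    try:
        target = table[name]
    except KeyError:
        known = ", ".join(sorted(table))
        raise UnknownBackendError(f"unknown backend {name!r}; known: {known}") from None
    module_name, _, class_name = target.partition(":")
    module = import_module(module_name)  # the import happens only on selection
    return getattr(module, class_name)
```

Callers such as pipeline/ would ask for resolve("fraction") and receive a class without ever naming its module, which is the property rule 4 relies on.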

Keeping the core light is not just about which packages exist — it is about when their dependencies load. Optional backends (a vendor SDK, torch, a database client) must be imported inside the method that needs them, never at module top level. A bare pip install indx pulls only Typer, Rich, Click, Pydantic v2, and pydantic-settings (TOML parsing is stdlib tomllib); importing any module must succeed even when no extra is installed.

When the heavy import does run and the extra is absent, raise a MissingDependencyError carrying the exact pip install hint.

```python
# ❌ Top-level import of an optional backend: breaks the light core
import qdrant_client  # ImportError on a clean `pip install indx`
```

```python
# ✅ Lazy import inside the method, with an actionable error
# (method of a store adapter; MissingDependencyError is defined elsewhere in indx)
def connect(self) -> None:
    try:
        from qdrant_client import QdrantClient
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            "The Qdrant store requires the 'qdrant' extra. "
            "Install it with: pip install indx[qdrant]"
        ) from exc
    self._client = QdrantClient(url=self.url)
```

The require_extra() helper in utils/lazy.py centralizes this pattern so every adapter raises the same friendly, actionable message. The registry surfaces it only when that slot is actually selected — so an unrelated run never trips over a backend you did not ask for.
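The source does not show require_extra()'s signature, so the following is only a sketch of how such a helper could centralize the try/except. The parameter names, the return value, and the ImportError base class for MissingDependencyError are all assumptions.

```python
from importlib import import_module
from types import ModuleType


class MissingDependencyError(ImportError):
    """Stand-in for the exception in errors.py; its real base class is assumed."""


def require_extra(module: str, extra: str, package: str = "indx") -> ModuleType:
    """Import `module` lazily, or raise an error that names the extra to install."""
    try:
        return import_module(module)
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            f"'{module}' is not installed; it is provided by the '{extra}' extra. "
            f"Install it with: pip install {package}[{extra}]"
        ) from exc


class QdrantStore:
    """Hypothetical adapter showing the call site; no import runs at module load."""

    def __init__(self, url: str) -> None:
        self.url = url

    def connect(self) -> None:
        qdrant_client = require_extra("qdrant_client", "qdrant")
        self._client = qdrant_client.QdrantClient(url=self.url)
```

With a helper of this shape every adapter's connect() shrinks to one line, and the wording of the install hint lives in a single place.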

Why this DAG means “add a backend without touching core”

The payoff of the inward DAG is composability. Because:

  • backends depend only on a protocol, and
  • the registry resolves names lazily, and
  • nothing in core/, pipeline/, or cli/ names a concrete class,

a new parser, embedder, or store is a leaf you bolt on, not a change you thread through the system. First-party backends register in registry/builtins.py; third-party backends register via Python entry points (groups like indx.parsers, indx.stores, indx.embedders) and are discovered at runtime — so installing a plugin package is enough to make store = "weaviate" work in indx.toml, with no edit to indx itself.

You can test a new backend's surrounding code without a single heavy dependency, too: unit tests substitute a protocol-typed fake for any slot, so they never import a real backend. A fake that satisfies the protocol is a drop-in.
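A minimal sketch of such a test follows. The two protocols, their method names, and the index_texts() stage are assumptions made for illustration; the point is only that the code under test sees protocols, so fakes slot in without any vendor SDK.

```python
from typing import Protocol


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class Store(Protocol):
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None: ...


class FakeEmbedder:
    """Deterministic fake: each text becomes a one-element vector of its length."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


class FakeStore:
    """In-memory fake that records what was written."""

    def __init__(self) -> None:
        self.rows: dict[str, list[float]] = {}

    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None:
        self.rows.update(zip(ids, vectors))


def index_texts(texts: list[str], embedder: Embedder, store: Store) -> int:
    """Hypothetical pipeline stage, written only against the two protocols."""
    vectors = embedder.embed(texts)
    ids = [f"doc-{i}" for i in range(len(texts))]
    store.upsert(ids, vectors)
    return len(ids)


def test_index_texts_with_fakes() -> None:
    store = FakeStore()
    assert index_texts(["ab", "cde"], FakeEmbedder(), store) == 2
    assert store.rows == {"doc-0": [2.0], "doc-1": [3.0]}
```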

The same property protects you in reverse: because the graph has no cycles and no sideways edges, adding or upgrading one backend cannot silently perturb another. The blast radius of any change is exactly one leaf.