Dependency Rules
indx’s internal dependency graph is a directed acyclic graph (DAG) that points inward, toward a tiny domain core. That single rule — plus a handful of import constraints enforced in CI — is what lets you add a new backend without ever touching core/, and what keeps pip install indx fast and offline-capable. This page documents the direction, the per-package import budget, and the mechanics that hold it together.
For the bigger picture of how packages fit together, see the architecture overview. To act on these rules as a contributor, see adding a backend.
Dependency direction: inward, always
The graph has one domain sink: core/. It depends on nothing else in the package, only on Pydantic v2 and the standard library. Apart from the shared helpers errors.py and utils/, which import no domain code, everything else depends, directly or transitively, on core/ and on the slot protocols (the base.py modules), never the other way around.
Concretely:
- Protocols are central. Each swappable slot declares its contract as a typing.Protocol in that sub-package’s base.py. Implementations import that protocol and core (plus the shared utils helpers), and nothing else internal.
- Implementations are leaves. A DoclingParser knows about the Parser protocol and core types. It does not know that QdrantStore or BGEM3Embedder exist.
- Wiring lives at the edges. Only registry/ resolves names to concrete classes, and only pipeline/ and cli/ orchestrate them, exclusively through protocols and the registry.
Because every edge points toward the protocols and core, there are no cycles, and a change to one backend cannot ripple into another.
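A minimal sketch of that layering, with the three layers collapsed into one file for brevity. The Document model, the parse() signature, and PlainTextParser are illustrative assumptions, not indx's actual definitions, and a plain dataclass stands in for the Pydantic v2 model so the snippet runs on the standard library alone.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# --- core/ (hypothetical stand-in for a Pydantic v2 domain model) ---
@dataclass(frozen=True)
class Document:
    text: str


# --- parsers/base.py (hypothetical protocol shape) ---
@runtime_checkable
class Parser(Protocol):
    def parse(self, raw: bytes) -> Document: ...


# --- parsers/plaintext.py (hypothetical leaf implementation) ---
# It needs only the core type: it does not subclass Parser and
# it imports no other slot.
class PlainTextParser:
    def parse(self, raw: bytes) -> Document:
        return Document(text=raw.decode("utf-8"))


# Callers hold the protocol type, never the concrete class name.
parser: Parser = PlainTextParser()
doc = parser.parse(b"hello")
```

Because the match is structural, PlainTextParser satisfies Parser without importing or inheriting from it, which is what keeps the edge pointing inward.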
Package responsibilities and import budget
The table below is the authoritative “may import” list. If a package imports something not in its row, the build is wrong.
| Package | Responsibility | May import (internal) |
|---|---|---|
| core/ | Domain model (Pydantic v2) | (nothing internal) |
| utils/ | Cross-cutting helpers, require_extra() | errors |
| errors.py | Exception hierarchy | (nothing) |
| */base.py | Slot protocols (Parser, LLM, VLM, Embedder, Store, OutputWriter) | core |
| parsers/* | Parser implementations | parsers.base, core, utils |
| llm/* | LLM implementations | llm.base, core, utils |
| vlm/* | VLM implementations | vlm.base, core, utils |
| embed/* | Embedder implementations | embed.base, core, utils |
| store/* | Vector store implementations | store.base, core, utils |
| output/* | Output writers | output.base, core, archive (indx writer only), utils |
| archive/ | .indx read/write | core, utils |
| config/ | indx.toml schema + loader | core, errors |
| registry/ | name to class resolution + plugin discovery | all */base.py; lazy-imports impls; config, errors |
| pipeline/ | Stage orchestration | core, all */base.py, registry, config, utils |
| cli/ | Typer/Rich UI | pipeline, config, registry, archive, core, utils |
The clear pattern: each row may reach toward core and protocols, never sideways into a peer. The output writer’s permission to touch archive/ is the one extra edge — and only the default .indx writer uses it.
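To make the idea of an import budget concrete, here is a rough sketch of how a budget could be checked with the standard-library ast module. It is not how indx enforces the table (that is the import-linter contract); BUDGET copies only three rows, and the function names are hypothetical. It works at sub-package granularity and only sees absolute imports of the form indx.<package>, so it cannot tell parsers.base from a sibling parser.

```python
import ast

# Hypothetical excerpt of the "may import" table, keyed by sub-package.
BUDGET = {
    "core": set(),
    "parsers": {"parsers", "core", "utils"},
    "store": {"store", "core", "utils"},
}


def internal_imports(source: str, root: str = "indx") -> set[str]:
    """Return the sub-packages of `root` that a module's source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] == root and len(parts) > 1:
                found.add(parts[1])
    return found


def violations(package: str, source: str) -> set[str]:
    """Sub-packages imported by `source` that fall outside `package`'s budget."""
    return internal_imports(source) - BUDGET[package]
```

For example, a parser module that contains `import indx.store.qdrant` would be reported with `store` as the offending edge, while imports of third-party or standard-library modules are ignored.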
The enforced rules
These four rules are checked, not merely encouraged. An import-linter contract declared in pyproject.toml runs in CI and fails the build on any violation.
- core/ imports nothing from elsewhere in indx. It is the sink everyone depends on. No backend, no registry, no config may leak into the domain model, and core models never store a vendor type (e.g. a qdrant_client.PointStruct); adapters convert at their own edge.
- Implementations import only their own base.py plus core (and utils). A parser must not import a store; an LLM adapter must not import an embedder. No sibling implementation, no other slot. Structural typing makes this painless: an implementation satisfies its protocol without inheriting anything.
- The registry is the only place that imports concrete implementation classes, and it does so lazily. registry/builtins.py holds the name-to-class registration table, but the actual import of, say, QdrantStore happens only when that slot is selected. A missing extra therefore never breaks an unrelated code path.
- pipeline/ and cli/ depend on protocols, obtaining concretes only through the registry. They are written entirely against Parser, Store, Embedder, and friends. They never import indx.store.qdrant. This is the contract verified by the import-linter check.
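The lazy registration described in the third rule can be sketched as a table of strings that is resolved on demand. This is an illustration under assumptions, not the contents of registry/builtins.py: the BUILTINS table, the "module:ClassName" format, and resolve() are hypothetical, and standard-library targets stand in for real backends so the snippet runs anywhere.

```python
import importlib

# Hypothetical registration table: backend name -> "module:ClassName".
# Storing strings means nothing is imported until resolve() is called.
BUILTINS = {
    "ordered": "collections:OrderedDict",        # stand-in for an installed backend
    "missing": "not_an_installed_backend:Store",  # stand-in for an absent extra
}


def resolve(name: str) -> type:
    """Import and return the class registered under `name`."""
    if name not in BUILTINS:
        raise LookupError(f"unknown backend {name!r}")
    module_name, _, class_name = BUILTINS[name].partition(":")
    module = importlib.import_module(module_name)  # the import happens here, on demand
    return getattr(module, class_name)
```

The "missing" entry sits in the table harmlessly; its import error appears only if that name is actually selected, which mirrors the claim that a missing extra never breaks an unrelated code path.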
No heavy import at module top level
Keeping the core light is not just about which packages exist — it is about when their dependencies load. Optional backends (a vendor SDK, torch, a database client) must be imported inside the method that needs them, never at module top level. A bare pip install indx pulls only Typer, Rich, Click, Pydantic v2, and pydantic-settings (TOML parsing is stdlib tomllib); importing any module must succeed even when no extra is installed.
When the heavy import does run and the extra is absent, raise a MissingDependencyError carrying the exact pip install hint.
```python
# ❌ top-level import of an optional backend: breaks the light core
import qdrant_client  # ImportError on a clean `pip install indx`

# ✅ lazy import inside the method, with an actionable error
def connect(self) -> None:
    try:
        from qdrant_client import QdrantClient
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            "The Qdrant store requires the 'qdrant' extra. "
            "Install it with: pip install indx[qdrant]"
        ) from exc
    self._client = QdrantClient(url=self.url)
```

The require_extra() helper in utils/lazy.py centralizes this pattern so every adapter raises the same friendly, actionable message. The registry surfaces it only when that slot is actually selected, so an unrelated run never trips over a backend you did not ask for.
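A helper in the spirit of require_extra() might look roughly like the sketch below. The signature, the message wording, and the local MissingDependencyError stand-in are assumptions made for illustration; the real helper in utils/lazy.py may differ.

```python
import importlib
from types import ModuleType


class MissingDependencyError(ImportError):
    """Local stand-in for the indx exception (its real base class is an assumption)."""


def require_extra(module: str, extra: str) -> ModuleType:
    """Import `module`, or raise with the pip hint for `extra` if it is absent."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            f"'{module}' is not installed. "
            f"Install it with: pip install indx[{extra}]"
        ) from exc
```

An adapter method would then call something like `require_extra("qdrant_client", extra="qdrant")` at the point of use and work with the returned module, keeping the module's top level free of heavy imports.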
Why this DAG means “add a backend without touching core”
The payoff of the inward DAG is composability. Because:
- backends depend only on a protocol, and
- the registry resolves names lazily, and
- nothing in core/, pipeline/, or cli/ names a concrete class,
a new parser, embedder, or store is a leaf you bolt on, not a change you thread through the system. First-party backends register in registry/builtins.py; third-party backends register via Python entry points (groups like indx.parsers, indx.stores, indx.embedders) and are discovered at runtime — so installing a plugin package is enough to make store = "weaviate" work in indx.toml, with no edit to indx itself.
Testing needs no heavy dependency either: unit tests substitute a protocol-typed fake for any slot, so they never import a real backend. A fake that satisfies the protocol is a drop-in.
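As an illustration of such a fake, the sketch below tests a small pipeline-style function against a stand-in embedder. The Embedder method signature, embed_chunks(), and FakeEmbedder are hypothetical shapes chosen for the example, not indx's actual protocol or pipeline code.

```python
from typing import Protocol


class Embedder(Protocol):  # hypothetical shape of the embed slot protocol
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class FakeEmbedder:
    """Deterministic test double: no model download, no torch."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        # One-dimensional "vector" derived from the text length.
        return [[float(len(text))] for text in texts]


def embed_chunks(chunks: list[str], embedder: Embedder) -> dict[str, list[float]]:
    """Hypothetical pipeline step, written against the protocol only."""
    return dict(zip(chunks, embedder.embed(chunks)))


def test_embed_chunks() -> None:
    result = embed_chunks(["ab", "cde"], FakeEmbedder())
    assert result == {"ab": [2.0], "cde": [3.0]}
```

The function under test never learns which embedder it received, so swapping the fake for a real backend changes nothing in the code being tested.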
The same property protects you in reverse: because the graph has no cycles and no sideways edges, adding or upgrading one backend cannot silently perturb another. The blast radius of any change is exactly one leaf.
Where to go next
- Adding a backend — the step-by-step recipe that satisfies every rule on this page.
- Component protocols — the exact interfaces your implementation must satisfy.
- Architecture overview — how the packages and stages fit together end to end.
- Design principles — the values these rules serve.