
Contributing Overview

indx is open-source (Apache-2.0) and built to be extended. Whether you are fixing a bug, adding a vector store, or writing docs, this page is your starting point: the dev stack, the workflow, and exactly what a pull request has to satisfy before it merges.

indx is a strictly typed, dependency-light Python project. Knowing the toolchain up front saves round-trips with CI.

| Area | Choice | Notes |
| --- | --- | --- |
| Language | Python 3.11+ (3.11–3.13) | tomllib, Self, and ExceptionGroup/except* all require 3.11 as the floor. |
| Typing | Strict, full hints | from __future__ import annotations in every module; mypy --strict is the gate, pyright (strict) for editor feedback. No bare Any. |
| Data models | Pydantic v2 | All boundary types (config, index.json, domain records); plain dataclasses only for hot-loop internals. |
| CLI | Typer + Rich | Typer builds the command tree from typed signatures; Rich renders progress, tables, and panels. |
| Lint + format | Ruff | One tool replaces flake8 + black + isort. |
| Type check | mypy (gate) + pyright (editor) | mypy is canonical when the two disagree. |
| Tests | pytest | Offline by default; VCR/mocking for network calls; golden-file tests for artifacts. |
| Plugins | Entry points + registry | Built-ins register internally; third parties advertise entry points under indx.parsers, indx.stores, etc. |
| Extras | Optional dependencies | Heavy/cloud backends ship behind pip install indx[<extra>]; the core stays light. |
| Build | Hatchling + PEP 621 | python -m build must produce a clean light-core install. |
| Project docs | Astro Starlight (this site) + MkDocs API reference | Two surfaces: the user-facing docs you are reading are built with Astro Starlight (npm run build, or npm run dev to preview); the SDK/API reference is generated from typed docstrings with MkDocs + Material + mkdocstrings (mkdocs build). Starlight content edits need a Starlight build; docstring/API changes need an MkDocs build. |

For the full normative rules behind these choices, see Coding Standards, and for the rationale and trade-offs see the architecture overview.

A few load-bearing rules shape almost every contribution:

  • Protocol-first. Behaviour is defined as a typed Protocol (Parser, LLM, VLM, Embedder, Store, OutputWriter, plus the Stage protocol) before any implementation exists. Code depends on interfaces, never on concrete backends.
  • Dependency direction points inward. core/ imports nothing internal; implementations import only their own slot’s protocol and core — never a sibling or another slot. An import-linter contract in CI enforces this.
  • Convert at the edge. Vendor types stay inside the adapter module. A Document never holds a raw provider response.
  • Stages share one context. Every stage obeys run(ctx: SpaceContext) -> SpaceContext and returns the same mutated context. See Pipeline and Stages.
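The rules above can be illustrated with a small sketch. Only the names `Embedder`, `Stage`, `SpaceContext`, and the `run(ctx: SpaceContext) -> SpaceContext` signature come from this page; the context fields, the `embed` method, and the `LengthEmbedder` and `EmbedStage` classes are hypothetical stand-ins, so treat this as an illustration of the pattern rather than the project's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class SpaceContext:
    """Stand-in for the real context; both fields are hypothetical."""

    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)


@runtime_checkable
class Embedder(Protocol):
    """The interface exists before any backend does (method name assumed)."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class Stage(Protocol):
    def run(self, ctx: SpaceContext) -> SpaceContext: ...


class LengthEmbedder:
    """Toy backend: satisfies Embedder structurally, with no inheritance."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


class EmbedStage:
    """Depends on the Embedder protocol, never on a concrete backend."""

    def __init__(self, embedder: Embedder) -> None:
        self._embedder = embedder

    def run(self, ctx: SpaceContext) -> SpaceContext:
        ctx.vectors = self._embedder.embed(ctx.texts)
        return ctx  # the same object, mutated in place


ctx = SpaceContext(texts=["a", "bcd"])
out = EmbedStage(LengthEmbedder()).run(ctx)
```

Because `EmbedStage` is typed against the protocol, swapping in another embedder needs no change to the stage, and a type checker flags any backend whose `embed` signature drifts.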

Branches follow type/short-slug:

feat/lancedb-store
fix/parse-empty-pdf
docs/coding-standards

Commit messages use Conventional Commits. The recognised types are:

| Type | Use for |
| --- | --- |
| feat | A new feature or backend |
| fix | A bug fix |
| docs | Documentation only |
| refactor | A code change that neither fixes a bug nor adds a feature |
| test | Adding or fixing tests |
| chore | Tooling, deps, housekeeping |
| perf | A performance improvement |

Breaking changes append ! to the type (for example feat!:) and add a BREAKING CHANGE: footer. Keep the subject in imperative mood and 72 characters or fewer.

feat!: rename the `store` registry key for the qdrant backend

BREAKING CHANGE: `store = "qdrant-local"` is now `store = "qdrant"`.
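If you want to check a message before pushing, the conventions above can be approximated in a few lines. This is a hypothetical local helper, not a tool that ships with indx: `check_commit` and its wording are made up for illustration. The optional `(scope)` and the blank line before the footer come from the Conventional Commits specification rather than from this page.

```python
import re

# Types recognised by this project, per the table above.
TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf")
MAX_SUBJECT = 72

SUBJECT_RE = re.compile(
    rf"^(?P<type>{'|'.join(TYPES)})"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. fix(cli): ...
    r"(?P<breaking>!)?: (?P<desc>\S.*)$"
)


def check_commit(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems: list[str] = []

    match = SUBJECT_RE.match(subject)
    if match is None:
        problems.append("subject must read 'type: description' with a recognised type")
    if len(subject) > MAX_SUBJECT:
        problems.append(
            f"subject is {len(subject)} characters; keep it to {MAX_SUBJECT} or fewer"
        )
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the subject and the body or footer")
    if match is not None and match["breaking"]:
        if not any(line.startswith("BREAKING CHANGE: ") for line in lines[2:]):
            problems.append("a '!' commit needs a BREAKING CHANGE: footer")
    return problems


if __name__ == "__main__":
    print(check_commit("feature: add lancedb store"))
```

The check covers format only. Whether the subject is in imperative mood still needs a human reviewer.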

Every pull request must satisfy this checklist before merge. CI re-runs all of it, so check it locally first.

  • ruff check and ruff format --check pass
  • mypy --strict and pyright (strict) pass with zero errors
  • pytest (offline suite) passes, and new code is covered
  • Golden files updated deliberately if the index.json / .indx shape changed — with a schema_version bump and a documented migration
  • New backend? Its extra is declared in pyproject.toml, its entry point is registered, and an adapter contract test is added
  • Public API is documented (Google-style docstrings), and CLI ⇔ SDK parity is preserved
  • Starlight content changed? npm run build succeeds (preview locally with npm run dev)
  • mkdocs build succeeds if the SDK/API reference (docstrings) changed
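For the golden-file item in the checklist, a comparison helper might look like the sketch below. The project's real workflow is described on the Testing page; here `canonical_json`, `assert_matches_golden`, the `update` flag, and the `items` field are all hypothetical, and only `index.json` and `schema_version` come from this page.

```python
import json
import tempfile
from pathlib import Path


def canonical_json(payload: dict[str, object]) -> str:
    """Render with sorted keys so key order never causes a spurious diff."""
    return json.dumps(payload, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def assert_matches_golden(
    payload: dict[str, object], golden_path: Path, *, update: bool = False
) -> None:
    """Compare payload with the stored golden file.

    With update=True the golden file is rewritten instead, which should
    only happen as a deliberate, reviewed change.
    """
    rendered = canonical_json(payload)
    if update:
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(rendered, encoding="utf-8")
        return
    expected = golden_path.read_text(encoding="utf-8")
    assert rendered == expected, (
        f"{golden_path} no longer matches the generated artifact; "
        "if the shape change is intended, regenerate it deliberately"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "golden" / "index.json"
        assert_matches_golden({"schema_version": 1, "items": []}, golden, update=True)
        assert_matches_golden({"items": [], "schema_version": 1}, golden)
        print("golden file matches")
```

Making regeneration an explicit flag keeps a shape change visible in review: the golden diff shows up in the pull request next to the schema_version bump.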

CI is the same checklist, automated, and a red check blocks merge. The GitHub Actions matrix runs across Python 3.11 / 3.12 / 3.13 (Linux primary, macOS/Windows smoke) and enforces:

  • Ruff — lint and format-check.
  • mypy + pyright — strict type checking, with mypy as the canonical gate.
  • pytest with coverage — the offline suite only; real provider calls never run in the default suite. Coverage targets are ≥ 90% for core/ and stages/, ≥ 80% for adapters, and new code must not lower coverage.
  • python -m build — the package builds and the light core installs cleanly with no extras.

A separate, manually triggered job runs the opt-in --live integration tests against real backends (Qdrant, Ollama, and downloaded models). These are gated by @pytest.mark.integration and skip automatically when the required extra or service is absent.

  • Adding a Backend — the fixed recipe for a new parser, LLM, VLM, embedder, store, or writer.
  • Coding Standards — the enforceable rules on typing, models, errors, and layout.
  • Testing — unit vs integration, fakes and cassettes, and the golden-file workflow.
  • Architecture Overview — the dependency graph and design principles you build within.

Found a bug or have an idea? Open an issue or PR on the GitHub repository. If a rule here gets in your way, propose a change to the rule in a PR rather than routing around it.