Running Fully Local & Air-Gapped

indx ships cloud defaults out of the box, but it is built so the entire pipeline can run on your own hardware with nothing leaving the network; you just have to opt into the local profile. This guide walks the regulated, air-gapped path end to end: install, pre-stage model weights for boxes with no internet, choose a no-dependency store, produce a self-contained .indx, and record an audit-grade manifest.

Why the local profile is offline

A bare zero-config run (indx <dir> --out <dir>) uses the cloud defaults — docling + openai:gpt-5-mini + openai:text-embedding-3-small (dim 1536) + qdrant — and needs an OPENAI_API_KEY. The local profile below is opt-in: you select it via pip install "indx[local]" or by naming the local backends explicitly (flags / indx.toml). Once selected, every value is local-capable, so the run never touches the internet:

Slot	Local profile (opt-in)	Where it runs
Parser	`docling`	Local — no API key, no network
LLM	`ollama:qwen2.5`	Local — via your Ollama daemon
VLM	`none`	No vision calls at all
Embedder	`bge-m3` (dim 1024)	Local — via FlagEmbedding / sentence-transformers
Store	`qdrant` (embedded) or `jsonl` (no DB)	Local on-disk
Output	`.indx`	Local self-contained archive

On top of the local profile, the core install ships zero-dependency fallbacks — a plaintext parser, a jsonl store, the none VLM, and the .indx + jsonl writers — so even a bare pip install indx can complete a full run offline.

Step 1 — Install the local stack

On a machine that still has internet (your build/staging box), install the curated offline bundle:

pip install "indx[local]"

indx[local] (aliased as indx[defaults] — both resolve to the same local bundle) is the one-line install of the recommended air-gapped stack: the Docling parser, the local Ollama LLM client, the local embedding runtime (FlagEmbedding / sentence-transformers + Torch), and the Qdrant client. If you prefer to assemble it yourself — for example to use the no-DB JSONL store and skip the Qdrant client entirely — install the pieces explicitly:

# Minimal offline floor: docling + local bge-m3 + jsonl no-DB store
pip install "indx[docling]" "indx[bge]"

The bare core (pip install indx) already includes the jsonl store and .indx/jsonl writers, so the command above gives you a complete offline pipeline with no database installed.
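Before copying the staging environment across, a quick importability check can catch a missing piece early. This is a minimal sketch. It assumes the upstream packages' usual module names (docling, FlagEmbedding, sentence_transformers, torch, qdrant_client). It also assumes the Ollama client is the ollama Python package. check_modules and LOCAL_STACK_MODULES are hypothetical helpers, not part of indx. On the no-DB floor, a missing qdrant_client is expected.

```python
import importlib.util

# Hypothetical map of import names for the pieces indx[local] is described as
# bundling. Module names are the upstream packages' own; whether indx imports
# exactly these internally is an assumption.
LOCAL_STACK_MODULES = {
    "docling": "Docling parser",
    "FlagEmbedding": "bge-m3 runtime (FlagEmbedding)",
    "sentence_transformers": "bge-m3 runtime (sentence-transformers)",
    "torch": "Torch",
    "qdrant_client": "Qdrant client (not needed for --store jsonl)",
    "ollama": "Ollama Python client (assumed)",
}


def check_modules(names):
    """Return {module_name: importable?} using find_spec, without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(LOCAL_STACK_MODULES).items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8} {name:24} {LOCAL_STACK_MODULES[name]}")
```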

Step 2 — Pre-stage model weights for offline boxes

An air-gapped target has no internet, so the LLM, embedder, and parser model weights must be present before you disconnect. Stage them once on a connected machine and copy the caches across.

Pre-pull the Ollama model

The local profile LLM is ollama:qwen2.5, served by a local Ollama daemon. Pull the model while online:

ollama pull qwen2.5

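On the air-gapped box, run an Ollama daemon and either pull from an internal mirror or copy the Ollama model directory from the staging machine. That directory is ~/.ollama/models for a daemon running as your user, which is the macOS default. The Linux systemd service installed by Ollama's script uses /usr/share/ollama/.ollama/models. The OLLAMA_MODELS environment variable overrides the location on any platform. indx talks to the local Ollama endpoint only; it never reaches out to a model registry itself.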
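You can confirm that the copied models are visible to the daemon without any external call. This sketch queries Ollama's local REST endpoint GET /api/tags on its default port, 11434. list_local_models and has_model are hypothetical helpers. The sketch also assumes that indx resolves ollama:qwen2.5 to the qwen2.5:latest tag, which is what a bare ollama pull qwen2.5 fetches.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint


def list_local_models(base_url=OLLAMA_URL, timeout=5.0):
    """Ask the local Ollama daemon which models it already has (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(names, wanted):
    """True if `wanted` is present; a bare name is treated as its ':latest' tag."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names


# On the offline box: has_model(list_local_models(), "qwen2.5")
```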

Pre-download the embedder and parser weights

The local profile embedder bge-m3 and the Docling parser download model weights from Hugging Face on first use. Warm those caches while online, then transport them:

# Trigger downloads once on a connected machine, then copy the caches.
export HF_HOME=/srv/indx-models/hf      # parser + embedder weights
# Name the local backends so the warm-up exercises local weights, not the cloud defaults.
indx ./sample-docs --out ./warmup --embedder bge-m3 --llm ollama:qwen2.5

On the offline box, point HF_HOME at the copied cache and enable offline mode so the libraries never attempt a network fetch:

export HF_HOME=/srv/indx-models/hf
export HF_HUB_OFFLINE=1                  # fail fast instead of reaching out
export TRANSFORMERS_OFFLINE=1
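A small preflight run before the first build can catch a forgotten variable, which would otherwise surface as a network attempt. This is a sketch. offline_preflight is a hypothetical helper. It accepts only the value "1" used above, even though the Hugging Face libraries also recognize other truthy spellings.

```python
import os
from pathlib import Path


def offline_preflight(env=None):
    """Return a list of problems that could let Hugging Face libraries try the network."""
    env = os.environ if env is None else env
    problems = []
    hf_home = env.get("HF_HOME")
    if not hf_home:
        problems.append("HF_HOME is not set")
    elif not Path(hf_home).is_dir():
        problems.append(f"HF_HOME does not exist: {hf_home}")
    # This sketch insists on the exact value used in the exports above.
    for flag in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE"):
        if env.get(flag) != "1":
            problems.append(f"{flag} is not set to 1")
    return problems


if __name__ == "__main__":
    for issue in offline_preflight():
        print("preflight:", issue)
```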

Step 3 — Choose your store: zero-dependency floor vs. embedded Qdrant

Both local store options keep vectors on disk and never contact a server. Pick based on corpus size.

JSONL — the absolute zero-dependency floor

The jsonl store ships in the core, needs no database and no extra at all, and writes vectors as newline-delimited records. Search is brute force: every query is scored against every stored vector, so query cost grows linearly with corpus size. That is acceptable for small and medium corpora, and the plain-record format maximizes portability inside the .indx archive.
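The following sketch makes the linear-scan cost concrete with cosine similarity over JSONL lines. It assumes a hypothetical record shape, {"id": ..., "vector": [...]}, because the real jsonl store's field names are not documented here. It illustrates the technique and does not reproduce indx's implementation.

```python
import heapq
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def linear_scan(lines, query, k=5):
    """Score every JSONL record against the query; cost is O(N * dim) per query."""
    records = (json.loads(line) for line in lines if line.strip())
    scored = ((cosine(query, rec["vector"]), rec["id"]) for rec in records)
    return heapq.nlargest(k, scored)  # keep only the k best (score, id) pairs


# Usage with a hypothetical file: with open("vectors.jsonl") as fh: linear_scan(fh, query_vec)
```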

indx ./docs --out ./ai-ready --store jsonl

This is the most defensible choice for an air-gapped deployment: there is nothing to install, nothing to operate, and the resulting archive is fully self-contained.

Qdrant in embedded / local mode

Qdrant's embedded (local) mode runs in-process against an on-disk path with no server, so you keep local-first guarantees without operating a database. In qdrant-client, local mode is documented for development, testing, and smaller collections, and it does not use the server's HNSW index. Benchmark it on your corpus. The same client code later points at a self-hosted Qdrant server, still inside your network, if you outgrow it.

indx ./docs --out ./ai-ready --store qdrant

Store	Search	Best for	Network
`jsonl` (no DB)	Brute-force linear scan	Small/medium corpora; maximum portability	None
`qdrant` (embedded)	In-process, on-disk collection	Growing corpora; same code moves to a self-hosted server with an ANN index	None (embedded)

Step 4 — Produce a self-contained `.indx` with vectors inline

The default output writer is .indx — a Zip container holding manifest.json, the index.json knowledge graph, JSONL chunk shards, optional vector blobs, and enrichments. When built with the JSONL store, vectors are stored inline, so the archive is completely self-contained: it re-opens and answers queries with no external service.
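Because .indx is a Zip container, the standard library can list its members and read manifest.json without indx installed. This is a sketch. describe_archive is a hypothetical helper, and it assumes manifest.json sits at the archive root.

```python
import json
import zipfile


def describe_archive(path):
    """List a .indx archive's members and return its parsed manifest.json."""
    with zipfile.ZipFile(path) as zf:
        members = sorted(zf.namelist())
        manifest = json.loads(zf.read("manifest.json"))  # assumed to be at the archive root
    return members, manifest


# Usage: members, manifest = describe_archive("./ai-ready/handbook.indx")
```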

# Fully offline build → portable, self-contained archive
indx ./docs --out ./ai-ready --store jsonl

# Sanity-check structure and retrieval, still offline
indx inspect ./ai-ready/handbook.indx
indx query  ./ai-ready/handbook.indx "onboarding checklist"

The same flow from the SDK:

from indx import DirectoryPipeline, KnowledgeSpace

space = DirectoryPipeline(store="jsonl").run("./docs", "./ai-ready")
print(space.stats)

# Re-load later on any machine, no re-processing, no network
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("onboarding checklist", k=5)

Because the archive carries the embedder’s name and dim (1024 for bge-m3) in its manifest, a consumer can detect a model/dimension mismatch before querying. See The .indx archive and index.json for the on-disk layout.
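A consumer-side guard might look like the following sketch. It reads the embedder block shaped like the manifest excerpt in Step 5 and refuses a query vector of the wrong length, such as a 1536-dim cloud embedding against a 1024-dim bge-m3 archive. check_embedder is a hypothetical helper, not an indx API.

```python
def check_embedder(manifest, query_vector, expected_name=None):
    """Raise before querying if the query embedding does not match the archive's embedder."""
    emb = manifest.get("embedder", {})
    dim = emb.get("dim")
    if dim is not None and len(query_vector) != dim:
        raise ValueError(
            f"query vector has dim {len(query_vector)}, archive expects {dim} ({emb.get('name')})"
        )
    if expected_name is not None and emb.get("name") != expected_name:
        raise ValueError(f"archive was embedded with {emb.get('name')!r}, not {expected_name!r}")


# Usage: check_embedder({"embedder": {"name": "bge-m3", "dim": 1024}}, query_vec, "bge-m3")
```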

Step 5 — Capture the reproducibility manifest for audit

Every run records its provenance into the .indx manifest: the producing build (tool_version), the schema version (indx_version), the chosen slot backends, the embedder name and dim, and per-member checksums. This is what makes a knowledge space auditable and re-creatable — given the same inputs, config, and model versions, a run is reproducible and the manifest tells you exactly how each space was produced.

// manifest.json (illustrative excerpt)
{
  "indx_version": "1.0",
  "tool_version": "indx 0.4.2",
  "slots": {
    "parser": "docling",
    "llm": "ollama:qwen2.5",
    "vlm": "none",
    "embedder": "bge-m3",
    "store": "jsonl",
    "output": "indx"
  },
  "embedder": { "name": "bge-m3", "dim": 1024 }
}
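An auditor can recompute per-member digests with nothing but the Python standard library and compare them against the manifest. This sketch assumes SHA-256 over each member's uncompressed bytes. The excerpt above does not show the manifest's checksum field or algorithm, so treat member_digests as a hypothetical helper and adapt the comparison to what your manifest actually records.

```python
import hashlib
import zipfile


def member_digests(path):
    """SHA-256 of each archive member's uncompressed bytes, keyed by member name."""
    digests = {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            h = hashlib.sha256()
            with zf.open(info) as fh:
                # Stream in 64 KiB blocks so large vector blobs need not fit in memory.
                for block in iter(lambda: fh.read(1 << 16), b""):
                    h.update(block)
            digests[info.filename] = h.hexdigest()
    return digests
```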

For a fully byte-stable index.json, seed any randomness and pin model identifiers. See Reproducibility for the full recipe (seeding, deterministic serialization, and golden-file verification).
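Golden-file verification needs serialization that yields identical bytes for identical data. One way to get that, sketched under assumptions, is to serialize with sorted keys and fixed separators, then compare a SHA-256 against a stored golden digest. canonical_json and matches_golden are hypothetical helpers. indx's own serializer may differ, so follow the Reproducibility guide for the actual recipe.

```python
import hashlib
import json


def canonical_json(obj):
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def matches_golden(obj, golden_sha256):
    """True if the canonical bytes of `obj` hash to the recorded golden digest."""
    return hashlib.sha256(canonical_json(obj)).hexdigest() == golden_sha256
```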

The safe pattern if a cloud component is ever introduced

If policy later allows a single cloud backend (say, a hosted LLM for higher-quality enrichment), keep egress controlled by inserting a redaction stage before Enrich. Redaction is a first-class extension point precisely so sensitive content can be stripped before any egress-capable component sees it.

Because stages obey a uniform run(ctx: SpaceContext) -> SpaceContext contract and communicate only through the shared SpaceContext, a redaction stage drops cleanly into the ordered pipeline:

01 Walk → 02 Parse → 03 Chunk → 04 Relate → [Redact] → 05 Enrich → 06 Embed+Pack

This keeps Walk/Parse/Chunk/Relate fully local and ensures the only stage that could egress receives already-sanitized text. See Custom stages for how to author and insert one, and Enrichment with LLM/VLM for the enrichment slot itself.
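Here is a sketch of what such a stage could look like under the run(ctx) -> ctx contract. SpaceContext below is a stand-in dataclass, and its chunks field is an assumption because the real class's fields are not shown in this guide. RedactStage is hypothetical. The two regexes for emails and US SSN-style numbers are illustrative and do not form a complete PII detector.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SpaceContext:
    """Stand-in for indx's SpaceContext; the `chunks` field is hypothetical."""
    chunks: list = field(default_factory=list)  # chunk texts produced by Chunk/Relate


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactStage:
    """Hypothetical stage obeying run(ctx) -> ctx; masks matches before Enrich sees text."""

    def __init__(self, patterns=(EMAIL, SSN), mask="[REDACTED]"):
        self.patterns = patterns
        self.mask = mask

    def run(self, ctx: SpaceContext) -> SpaceContext:
        cleaned = []
        for text in ctx.chunks:
            for pattern in self.patterns:
                text = pattern.sub(self.mask, text)
            cleaned.append(text)
        ctx.chunks = cleaned  # downstream stages only ever see the sanitized text
        return ctx
```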

Checklist for an air-gapped deployment

Install indx[local] (or indx[docling] + indx[bge] for the no-DB floor) on a connected staging box.
ollama pull qwen2.5; copy ~/.ollama/models to the target.
Warm and copy the Hugging Face cache (HF_HOME); set HF_HUB_OFFLINE=1 / TRANSFORMERS_OFFLINE=1 on the target.
Choose --store jsonl (zero-dependency floor) or embedded --store qdrant.
Build → inspect → query entirely offline; confirm no outbound traffic.
Confirm indx.toml names only local backends — no openai/anthropic/hosted-Qdrant.
Archive the .indx plus its manifest for audit and reproducibility.
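The "only local backends" item can be checked mechanically against the slots object that each manifest records. This sketch uses an allowlist built from the backends named in this guide and treats any ollama: model as local. non_local_slots and LOCAL_ALLOWLIST are hypothetical, so extend them to match your policy. Note that the slot value "qdrant" alone does not reveal whether Qdrant ran embedded or against a hosted server, so confirm that separately.

```python
# Hypothetical allowlist assembled from the local backends this guide names.
LOCAL_ALLOWLIST = {
    "parser": {"docling"},
    "llm": set(),  # any "ollama:<model>" is accepted below
    "vlm": {"none"},
    "embedder": {"bge-m3"},
    "store": {"jsonl", "qdrant"},  # "qdrant" must also be verified as embedded
    "output": {"indx"},
}


def non_local_slots(slots, allowed=LOCAL_ALLOWLIST):
    """Return {slot: backend} for every slot whose backend is not on the local allowlist."""
    bad = {}
    for slot, backend in slots.items():
        ok = backend in allowed.get(slot, set()) or (slot == "llm" and backend.startswith("ollama:"))
        if not ok:
            bad[slot] = backend
    return bad
```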

See also

Choosing a store — JSONL vs. Qdrant vs. the rest.
Reproducibility — seeding, deterministic output, golden files.
Extras reference — every pip install "indx[...]" target.
Configuration guide and Configuration reference — pinning slots in indx.toml.
The .indx archive — the self-contained, versioned format.