Running Fully Local & Air-Gapped
indx ships cloud defaults out of the box, but it is built so the entire pipeline can run on your own hardware with nothing leaving the network — you just have to opt into the local profile. This guide walks the regulated / air-gapped path end to end — install, pre-stage model weights for boxes with no internet, choose a no-dependency store, produce a self-contained .indx, and record an audit-grade manifest.
Why the local profile is offline
A bare zero-config run (indx <dir> --out <dir>) uses the cloud defaults: docling + openai:gpt-5-mini + openai:text-embedding-3-small (dim 1536) + qdrant. It needs an OPENAI_API_KEY. The local profile below is opt-in: you select it via pip install "indx[local]" or by naming the local backends explicitly (flags / indx.toml). Once selected, every slot value is local-capable, so the run never touches the internet:
| Slot | Local profile (opt-in) | Where it runs |
|---|---|---|
| Parser | docling | Local — no API key, no network |
| LLM | ollama:qwen2.5 | Local — via your Ollama daemon |
| VLM | none | No vision calls at all |
| Embedder | bge-m3 (dim 1024) | Local — via FlagEmbedding / sentence-transformers |
| Store | qdrant (embedded) or jsonl (no DB) | Local on-disk |
| Output | .indx | Local self-contained archive |
On top of the local profile, the core install ships zero-dependency fallbacks — a plaintext parser, a jsonl store, the none VLM, and the .indx + jsonl writers — so even a bare pip install indx can complete a full run offline.
Step 1 — Install the local stack
On a machine that still has internet (your build/staging box), install the curated offline bundle:
```bash
pip install "indx[local]"
```

indx[local] (aliased as indx[defaults]; both resolve to the same local bundle) is the one-line install of the recommended air-gapped stack: the Docling parser, the local Ollama LLM client, the local embedding runtime (FlagEmbedding / sentence-transformers + Torch), and the Qdrant client. If you prefer to assemble it yourself, for example to use the no-DB JSONL store and skip the Qdrant client entirely, install the pieces explicitly:
```bash
# Minimal offline floor: docling + local bge-m3 + jsonl no-DB store
pip install "indx[docling]" "indx[bge]"
```

The bare core (pip install indx) already includes the jsonl store and .indx/jsonl writers, so the command above gives you a complete offline pipeline with no database installed.
Step 2 — Pre-stage model weights for offline boxes
An air-gapped target has no internet, so the LLM, embedder, and parser model weights must be present before you disconnect. Stage them once on a connected machine and copy the caches across.
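One way to confirm that a copied cache arrived intact is to compare per-file checksums taken on the staging machine and on the target. The sketch below is a generic standard-library approach, not an indx feature; the function names inventory and diff_inventories are made up for illustration.

```python
import hashlib
from pathlib import Path


def inventory(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    digests = {}
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB blocks so large weight files are not loaded whole.
            for block in iter(lambda: fh.read(1 << 20), b""):
                h.update(block)
        digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def diff_inventories(staged: dict[str, str], copied: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or differ on the target."""
    return sorted(p for p, digest in staged.items() if copied.get(p) != digest)
```

Run inventory on the staging cache directory, carry the resulting mapping across (for example as a JSON file), run it again on the target, and treat any path returned by diff_inventories as a failed transfer.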
Pre-pull the Ollama model
The local profile LLM is ollama:qwen2.5, served by a local Ollama daemon. Pull the model while online:
```bash
ollama pull qwen2.5
```

On the air-gapped box, run an Ollama daemon and either pull from an internal mirror or copy the Ollama model directory (~/.ollama/models on Linux/macOS) from the staging machine. indx talks to the local Ollama endpoint only; it never reaches out to a model registry itself.
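Before disconnecting, it can help to confirm that the daemon on the target actually lists the model. The sketch below queries Ollama's local /api/tags endpoint on its default port (11434) and checks the response; the helper names are illustrative, and the base URL is an assumption if your daemon listens elsewhere.

```python
import json
import urllib.request


def model_is_staged(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` appears in an Ollama /api/tags response.

    A bare name such as "qwen2.5" matches any tag of that model
    (for example "qwen2.5:latest"); a tagged name must match exactly.
    """
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    if ":" in wanted:
        return wanted in names
    return any(name.split(":", 1)[0] == wanted for name in names)


def fetch_local_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama daemon which models it has on disk."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling model_is_staged(fetch_local_tags(), "qwen2.5") on the air-gapped box only touches localhost, so it is safe to run after the network is cut.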
Pre-download the embedder and parser weights
The local profile embedder bge-m3 and the Docling parser download model weights from Hugging Face on first use. Warm those caches while online, then transport them:
```bash
# Trigger downloads once on a connected machine, then copy the caches.
export HF_HOME=/srv/indx-models/hf  # parser + embedder weights
# Name the local backends so the warm-up exercises local weights, not the cloud defaults.
indx ./sample-docs --out ./warmup --embedder bge-m3 --llm ollama:qwen2.5
```

On the offline box, point the same environment variables at the copied caches and enable offline mode so the libraries never attempt a network fetch:
```bash
export HF_HOME=/srv/indx-models/hf
export HF_HUB_OFFLINE=1  # fail fast instead of reaching out
export TRANSFORMERS_OFFLINE=1
```

Step 3 — Choose your store: zero-dependency floor vs. embedded Qdrant
Both local store options keep vectors on disk and never contact a server. Pick based on corpus size.
JSONL — the absolute zero-dependency floor
The jsonl store ships in the core, needs no database and no extra at all, and writes vectors as newline-delimited records. It performs brute-force (linear-scan) similarity search, which is perfectly fine for small and medium corpora and maximizes portability inside the .indx archive.
```bash
indx ./docs --out ./ai-ready --store jsonl
```

This is the most defensible choice for an air-gapped deployment: there is nothing to install, nothing to operate, and the resulting archive is fully self-contained.
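To make the cost model of brute-force search concrete, here is a minimal sketch of a linear scan over newline-delimited vector records. It is not indx's implementation: the record fields id and vector are assumptions for illustration, and cosine similarity is one plausible scoring choice.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def linear_scan(lines, query, k=5):
    """Score every record against the query and return the top-k (id, score) pairs."""
    scored = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        rec = json.loads(line)  # hypothetical schema: {"id": ..., "vector": [...]}
        scored.append((rec["id"], cosine(rec["vector"], query)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Every query touches every record, so latency grows linearly with corpus size. That is the trade-off that makes an ANN index worthwhile for larger corpora.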
Qdrant in embedded / local mode
Qdrant’s embedded mode runs in-process against a local on-disk path with no server, so you keep local-first guarantees while getting a real ANN index that scales to larger corpora. The same client code later points at a self-hosted server if you grow into one.
```bash
indx ./docs --out ./ai-ready --store qdrant
```

| Store | Search | Best for | Network |
|---|---|---|---|
| jsonl (no DB) | Brute-force linear scan | Small/medium corpora; maximum portability | None |
| qdrant (embedded) | ANN index, on-disk | Larger corpora; same code scales to a server | None (embedded) |
Step 4 — Produce a self-contained .indx with vectors inline
The default output writer is .indx, a Zip container holding manifest.json, the index.json knowledge graph, JSONL chunk shards, optional vector blobs, and enrichments. When built with the JSONL store, vectors are stored inline, so the archive is completely self-contained: it re-opens and answers queries with no external service.
```bash
# Fully offline build → portable, self-contained archive
indx ./docs --out ./ai-ready --store jsonl

# Sanity-check structure and retrieval, still offline
indx inspect ./ai-ready/handbook.indx
indx query ./ai-ready/handbook.indx "onboarding checklist"
```

The same flow from the SDK:

```python
from indx import DirectoryPipeline, KnowledgeSpace

space = DirectoryPipeline(store="jsonl").run("./docs", "./ai-ready")
print(space.stats)

# Re-load later on any machine, no re-processing, no network
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("onboarding checklist", k=5)
```

Because the archive carries the embedder’s name and dim (1024 for bge-m3) in its manifest, a consumer can detect a model/dimension mismatch before querying. See The .indx archive and index.json for the on-disk layout.
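A consumer-side mismatch check might look like the sketch below. It reads manifest.json straight out of the Zip container with the standard library and does not use the indx SDK. It assumes the manifest sits at the archive root and exposes an embedder object with name and dim keys; check_embedder is a hypothetical helper name.

```python
import json
import zipfile


def check_embedder(archive, expected_name: str, expected_dim: int) -> None:
    """Raise ValueError if the archive was built with a different embedder.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Assumption: manifest.json lives at the root of the .indx container.
        manifest = json.loads(zf.read("manifest.json"))
    emb = manifest.get("embedder", {})
    if emb.get("name") != expected_name or emb.get("dim") != expected_dim:
        raise ValueError(
            f"archive uses {emb.get('name')!r} (dim {emb.get('dim')}), "
            f"expected {expected_name!r} (dim {expected_dim})"
        )
```

Calling check_embedder("./ai-ready/handbook.indx", "bge-m3", 1024) before embedding a query fails early instead of returning meaningless similarity scores.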
Step 5 — Capture the reproducibility manifest for audit
Every run records its provenance into the .indx manifest: the producing build (tool_version), the schema version (indx_version), the chosen slot backends, the embedder name and dim, and per-member checksums. This is what makes a knowledge space auditable and re-creatable: given the same inputs, config, and model versions, a run is reproducible, and the manifest tells you exactly how each space was produced.
```jsonc
// manifest.json (illustrative excerpt)
{
  "indx_version": "1.0",
  "tool_version": "indx 0.4.2",
  "slots": {
    "parser": "docling",
    "llm": "ollama:qwen2.5",
    "vlm": "none",
    "embedder": "bge-m3",
    "store": "jsonl",
    "output": "indx"
  },
  "embedder": { "name": "bge-m3", "dim": 1024 }
}
```

For a fully byte-stable index.json, seed any randomness and pin model identifiers. See Reproducibility for the full recipe (seeding, deterministic serialization, and golden-file verification).
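As a rough illustration of what deterministic serialization and golden-file verification can mean in practice, the sketch below canonicalizes a JSON document and hashes it. This is a generic technique under the assumption that the data is plain JSON; it is not necessarily how indx serializes index.json, and both function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def golden_digest(obj) -> str:
    """SHA-256 of the canonical form, suitable for comparing against a stored golden value."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()
```

Key order no longer affects the digest, while list order still does, so any nondeterministic ordering of nodes or chunks upstream shows up as a digest change.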
The safe pattern if a cloud component is ever introduced
If policy later allows a single cloud backend (say, a hosted LLM for higher-quality enrichment), keep egress controlled by inserting a redaction stage before Enrich. Redaction is a first-class extension point precisely so sensitive content can be stripped before any egress-capable component sees it.
Because stages obey a uniform run(ctx: SpaceContext) -> SpaceContext contract and communicate only through the shared SpaceContext, a redaction stage drops cleanly into the ordered pipeline:
```text
01 Walk → 02 Parse → 03 Chunk → 04 Relate → [Redact] → 05 Enrich → 06 Embed+Pack
```

This keeps Walk/Parse/Chunk/Relate fully local and ensures the only stage that could egress receives already-sanitized text. See Custom stages for how to author and insert one, and Enrichment with LLM/VLM for the enrichment slot itself.
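A redaction stage following the run(ctx) -> ctx contract could be sketched as below. The SpaceContext here is a stand-in dataclass with a single hypothetical chunks field; the real class and its attributes may differ, so treat this as the shape of the idea, not as the indx API. The two regex patterns are examples only and are far from a complete PII policy.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SpaceContext:
    """Stand-in for indx's SpaceContext; the real class and its fields may differ."""
    chunks: list[str] = field(default_factory=list)


class RedactStage:
    """Masks email addresses and US SSN-shaped numbers in every chunk."""

    PATTERNS = [
        (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
        (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    ]

    def run(self, ctx: SpaceContext) -> SpaceContext:
        redacted = []
        for text in ctx.chunks:
            # Apply each pattern in turn so later stages only see masked text.
            for pattern, token in self.PATTERNS:
                text = pattern.sub(token, text)
            redacted.append(text)
        ctx.chunks = redacted
        return ctx
```

Because the stage only reads and writes the shared context, it can sit between Relate and Enrich without either neighbor knowing it exists.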
Checklist for an air-gapped deployment
- Install indx[local] (or indx[docling] + indx[bge] for the no-DB floor) on a connected staging box.
- ollama pull qwen2.5; copy ~/.ollama/models to the target.
- Warm and copy the Hugging Face cache (HF_HOME); set HF_HUB_OFFLINE=1 / TRANSFORMERS_OFFLINE=1 on the target.
- Choose --store jsonl (zero-dependency floor) or embedded --store qdrant.
- Build → inspect → query entirely offline; confirm no outbound traffic.
- Confirm indx.toml names only local backends: no openai / anthropic / hosted-Qdrant.
- Archive the .indx plus its manifest for audit and reproducibility.
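Parts of this checklist can be automated with a small preflight script. The sketch below checks the Hugging Face offline variables and scans a slot mapping (shaped like the slots object in the manifest) for cloud provider names. The helper names and the CLOUD_MARKERS list are assumptions to adapt to your policy, and a hosted Qdrant cannot be detected from the slot name alone.

```python
import os

CLOUD_MARKERS = ("openai", "anthropic")  # assumption: extend to match your policy


def offline_env_problems(env) -> list[str]:
    """Return human-readable problems with the Hugging Face offline settings."""
    problems = []
    hf_home = env.get("HF_HOME")
    if not hf_home:
        problems.append("HF_HOME is not set")
    elif not os.path.isdir(hf_home):
        problems.append(f"HF_HOME does not exist: {hf_home}")
    for flag in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE"):
        if env.get(flag) != "1":
            problems.append(f"{flag} is not set to 1")
    return problems


def cloud_slots(slots: dict) -> list[str]:
    """Return the slot names whose backend mentions a cloud provider."""
    return sorted(
        name
        for name, backend in slots.items()
        if any(marker in str(backend).lower() for marker in CLOUD_MARKERS)
    )
```

Pass os.environ to offline_env_problems on the target, and feed cloud_slots the slot mapping from your configuration or from a built archive's manifest; both should come back empty before you sign off on the deployment.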
Related
- Choosing a store: JSONL vs. Qdrant vs. the rest.
- Reproducibility: seeding, deterministic output, golden files.
- Extras reference: every pip install "indx[...]" target.
- Configuration guide and Configuration reference: pinning slots in indx.toml.
- The .indx archive: the self-contained, versioned format.