Running Fully Local & Air-Gapped

indx ships with cloud defaults, but the entire pipeline can run on your own hardware with nothing leaving the network once you opt into the local profile. This guide walks the regulated / air-gapped path end to end: install, pre-stage model weights for boxes with no internet, choose a no-dependency store, produce a self-contained .indx, and record an audit-grade manifest.

A bare zero-config run (indx <dir> --out <dir>) uses the cloud defaults, docling + openai:gpt-5-mini + openai:text-embedding-3-small (dim 1536) + qdrant, and needs an OPENAI_API_KEY. The local profile below is opt-in: you select it via pip install "indx[local]" or by naming the local backends explicitly (flags / indx.toml). Once selected, every slot uses a local-capable backend, so the run never touches the internet:

| Slot | Local profile (opt-in) | Where it runs |
| --- | --- | --- |
| Parser | docling | Local (no API key, no network) |
| LLM | ollama:qwen2.5 | Local (via your Ollama daemon) |
| VLM | none | No vision calls at all |
| Embedder | bge-m3 (dim 1024) | Local (via FlagEmbedding / sentence-transformers) |
| Store | qdrant (embedded) or jsonl (no DB) | Local on-disk |
| Output | .indx | Local self-contained archive |

On top of the local profile, the core install ships zero-dependency fallbacks — a plaintext parser, a jsonl store, the none VLM, and the .indx + jsonl writers — so even a bare pip install indx can complete a full run offline.

On a machine that still has internet (your build/staging box), install the curated offline bundle:

Terminal window
pip install "indx[local]"

indx[local] (aliased as indx[defaults] — both resolve to the same local bundle) is the one-line install of the recommended air-gapped stack: the Docling parser, the local Ollama LLM client, the local embedding runtime (FlagEmbedding / sentence-transformers + Torch), and the Qdrant client. If you prefer to assemble it yourself — for example to use the no-DB JSONL store and skip the Qdrant client entirely — install the pieces explicitly:

Terminal window
# Minimal offline floor: docling + local bge-m3 + jsonl no-DB store
pip install "indx[docling]" "indx[bge]"

The bare core (pip install indx) already includes the jsonl store and .indx/jsonl writers, so the command above gives you a complete offline pipeline with no database installed.

Step 2 — Pre-stage model weights for offline boxes

An air-gapped target has no internet, so the LLM, embedder, and parser model weights must be present before you disconnect. Stage them once on a connected machine and copy the caches across.

The local profile LLM is ollama:qwen2.5, served by a local Ollama daemon. Pull the model while online:

Terminal window
ollama pull qwen2.5

On the air-gapped box, run an Ollama daemon and either pull from an internal mirror or copy the Ollama model directory (~/.ollama/models on Linux/macOS) from the staging machine. indx talks to the local Ollama endpoint only — it never reaches out to a model registry itself.
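After copying the model directory, it can be worth confirming that the daemon on the target actually sees the model. The sketch below assumes Ollama's default local endpoint (http://127.0.0.1:11434) and its /api/tags listing; the helper names `has_model` and `fetch_tags` are illustrative and not part of indx.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed: Ollama's default local endpoint


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` appears in an /api/tags response.

    A bare name such as "qwen2.5" matches any tag of that model
    (for example "qwen2.5:latest"); a tagged name must match exactly.
    """
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask the local daemon which models it has on disk (loopback only)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

On the target, `has_model(fetch_tags(), "qwen2.5")` should return True once the copied model directory is in place; a False result suggests the copy landed in a directory the daemon is not reading.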

Pre-download the embedder and parser weights

The local profile embedder bge-m3 and the Docling parser download model weights from Hugging Face on first use. Warm those caches while online, then transport them:

Terminal window
# Trigger downloads once on a connected machine, then copy the caches.
export HF_HOME=/srv/indx-models/hf # parser + embedder weights
# Name the local backends so the warm-up exercises local weights, not the cloud defaults.
indx ./sample-docs --out ./warmup --embedder bge-m3 --llm ollama:qwen2.5

On the offline box, point the same environment variables at the copied caches and enable offline mode so the libraries never attempt a network fetch:

Terminal window
export HF_HOME=/srv/indx-models/hf
export HF_HUB_OFFLINE=1 # fail fast instead of reaching out
export TRANSFORMERS_OFFLINE=1
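Before disconnecting, a quick look at the copied cache can catch an incomplete transfer. The following is a minimal sketch that relies on the standard Hugging Face hub cache layout (`<HF_HOME>/hub/models--<org>--<name>/snapshots/<revision>/`). The repository id `BAAI/bge-m3` used in the usage note is an assumption about which repository backs the bge-m3 embedder; check the ids your warm-up run actually downloaded. The helper name is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def cached_snapshots(repo_id: str, hf_home: str | None = None) -> list[Path]:
    """List non-empty snapshot directories for `repo_id` in the hub cache.

    Falls back to $HF_HOME, then to the library default of
    ~/.cache/huggingface, when `hf_home` is not given.
    """
    root = Path(hf_home or os.environ.get("HF_HOME", "~/.cache/huggingface")).expanduser()
    snapshots = root / "hub" / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []
    # An empty revision directory means the download never completed.
    return sorted(p for p in snapshots.iterdir() if p.is_dir() and any(p.iterdir()))
```

For example, `cached_snapshots("BAAI/bge-m3", "/srv/indx-models/hf")` returning an empty list would indicate that the embedder weights did not make it across. The check only looks at directory structure; it does not validate file contents.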

Step 3 — Choose your store: zero-dependency floor vs. embedded Qdrant

Both local store options keep vectors on disk and never contact a server. Pick based on corpus size.

JSONL — the absolute zero-dependency floor

The jsonl store ships in the core, needs no database and no extra at all, and writes vectors as newline-delimited records. It performs brute-force (linear-scan) similarity search, which is perfectly fine for small and medium corpora and maximizes portability inside the .indx archive.

Terminal window
indx ./docs --out ./ai-ready --store jsonl

This is the most defensible choice for an air-gapped deployment: there is nothing to install, nothing to operate, and the resulting archive is fully self-contained.
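To make "brute-force (linear-scan) similarity search" concrete, here is a minimal sketch of the idea over newline-delimited records. It is not indx's implementation: the record fields `id` and `vector` are hypothetical, and cosine similarity is assumed as the metric.

```python
from __future__ import annotations

import json
import math
from typing import Iterable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_scan(lines: Iterable[str], query: list[float], k: int = 5) -> list[tuple[str, float]]:
    """Score every JSONL record against `query` and return the top k."""
    scored = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        scored.append((record["id"], cosine(record["vector"], query)))
    # Every record is visited once, so cost grows linearly with corpus size.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because `lines` can be any iterable of strings, an open file handle works directly. The linear cost per query is the reason this approach suits small and medium corpora rather than large ones.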

Qdrant’s embedded mode runs in-process against a local on-disk path with no server, so you keep local-first guarantees while getting a real ANN index that scales to larger corpora. The same client code later points at a self-hosted server if you grow into one.

Terminal window
indx ./docs --out ./ai-ready --store qdrant

| Store | Search | Best for | Network |
| --- | --- | --- | --- |
| jsonl (no DB) | Brute-force linear scan | Small/medium corpora; maximum portability | None |
| qdrant (embedded) | ANN index, on-disk | Larger corpora; same code scales to a server | None (embedded) |

Step 4 — Produce a self-contained .indx with vectors inline

The default output writer is .indx — a Zip container holding manifest.json, the index.json knowledge graph, JSONL chunk shards, optional vector blobs, and enrichments. When built with the JSONL store, vectors are stored inline, so the archive is completely self-contained: it re-opens and answers queries with no external service.

Terminal window
# Fully offline build → portable, self-contained archive
indx ./docs --out ./ai-ready --store jsonl
# Sanity-check structure and retrieval, still offline
indx inspect ./ai-ready/handbook.indx
indx query ./ai-ready/handbook.indx "onboarding checklist"

The same flow from the SDK:

from indx import DirectoryPipeline, KnowledgeSpace
space = DirectoryPipeline(store="jsonl").run("./docs", "./ai-ready")
print(space.stats)
# Re-load later on any machine, no re-processing, no network
space = KnowledgeSpace.load("./ai-ready/handbook.indx")
hits = space.search("onboarding checklist", k=5)

Because the archive carries the embedder’s name and dim (1024 for bge-m3) in its manifest, a consumer can detect a model/dimension mismatch before querying. See The .indx archive and index.json for the on-disk layout.
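A consumer-side check along these lines might look like the sketch below. It assumes manifest.json sits at the root of the Zip container and carries an `embedder` object with `name` and `dim` keys; the function names are hypothetical and not part of the indx SDK.

```python
from __future__ import annotations

import json
import zipfile


def read_manifest(archive) -> dict:
    """Load manifest.json from a .indx archive (a path or a file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open("manifest.json") as fh:
            return json.load(fh)


def embedder_mismatch(manifest: dict, name: str, dim: int) -> str | None:
    """Describe how the query-side embedder differs from the archive's, or None."""
    recorded = manifest.get("embedder", {})
    problems = []
    if recorded.get("name") != name:
        problems.append(f"name: archive={recorded.get('name')!r}, query={name!r}")
    if recorded.get("dim") != dim:
        problems.append(f"dim: archive={recorded.get('dim')!r}, query={dim!r}")
    return "; ".join(problems) or None
```

Running this before a query lets a consumer refuse to search a 1024-dimensional bge-m3 space with, say, a 1536-dimensional query vector, instead of returning meaningless scores.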

Step 5 — Capture the reproducibility manifest for audit

Every run records its provenance into the .indx manifest: the producing build (tool_version), the schema version (indx_version), the chosen slot backends, the embedder name and dim, and per-member checksums. This is what makes a knowledge space auditable and re-creatable — given the same inputs, config, and model versions, a run is reproducible and the manifest tells you exactly how each space was produced.

manifest.json (illustrative excerpt):

{
  "indx_version": "1.0",
  "tool_version": "indx 0.4.2",
  "slots": {
    "parser": "docling",
    "llm": "ollama:qwen2.5",
    "vlm": "none",
    "embedder": "bge-m3",
    "store": "jsonl",
    "output": "indx"
  },
  "embedder": { "name": "bge-m3", "dim": 1024 }
}
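The excerpt does not show how the per-member checksums are stored, so the sketch below makes no assumption about that field. It independently computes a SHA-256 digest for each member of the Zip container, which an auditor could record in an external log alongside the archive. The choice of SHA-256 and the function name are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
import zipfile


def member_digests(archive) -> dict[str, str]:
    """Map each file inside the archive to the SHA-256 of its uncompressed bytes."""
    digests = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            hasher = hashlib.sha256()
            with zf.open(info) as fh:
                # Stream in 1 MiB blocks so large vector blobs are never fully in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    hasher.update(block)
            digests[info.filename] = hasher.hexdigest()
    return digests
```

Digests are taken over the uncompressed member contents, so they stay stable even if the archive is re-zipped with a different compression level.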

For a fully byte-stable index.json, seed any randomness and pin model identifiers. See Reproducibility for the full recipe (seeding, deterministic serialization, and golden-file verification).
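As a generic illustration of what deterministic serialization and golden-file verification involve, the sketch below canonicalizes a JSON-compatible object and fingerprints it. This is not indx's serializer; the Reproducibility page remains the reference for the actual recipe, and both function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fingerprint(obj) -> str:
    """SHA-256 over the canonical form, suitable for comparing against a golden value."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()
```

Two runs that produce the same graph yield the same fingerprint regardless of key insertion order, while any change in content (including list order) changes it. Comparing the fingerprint of a fresh build against a stored golden value is one simple way to detect drift.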

The safe pattern if a cloud component is ever introduced

If policy later allows a single cloud backend (say, a hosted LLM for higher-quality enrichment), keep egress controlled by inserting a redaction stage before Enrich. Redaction is a first-class extension point precisely so sensitive content can be stripped before any egress-capable component sees it.

Because stages obey a uniform run(ctx: SpaceContext) -> SpaceContext contract and communicate only through the shared SpaceContext, a redaction stage drops cleanly into the ordered pipeline:

01 Walk → 02 Parse → 03 Chunk → 04 Relate → [Redact] → 05 Enrich → 06 Embed+Pack

This keeps Walk/Parse/Chunk/Relate fully local and ensures the only stage that could egress receives already-sanitized text. See Custom stages for how to author and insert one, and Enrichment with LLM/VLM for the enrichment slot itself.
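The shape of such a stage can be sketched as follows. Only the `run(ctx) -> ctx` contract comes from the text above; the `SpaceContext` class here is a hypothetical stand-in with a single `chunks` field (the real one is documented under Custom stages), and the two regex patterns are illustrative rather than a complete redaction policy.

```python
import re
from dataclasses import dataclass, field, replace


@dataclass
class SpaceContext:
    """Hypothetical stand-in for indx's SpaceContext; the real fields differ."""
    chunks: list = field(default_factory=list)


class RedactStage:
    """Replaces sensitive substrings before any egress-capable stage runs."""

    PATTERNS = [
        (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
        (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    ]

    def run(self, ctx: SpaceContext) -> SpaceContext:
        cleaned = []
        for text in ctx.chunks:
            for pattern, token in self.PATTERNS:
                text = pattern.sub(token, text)
            cleaned.append(text)
        # Return a new context rather than mutating the one passed in.
        return replace(ctx, chunks=cleaned)
```

Returning a fresh context keeps the unredacted text available to the local stages that ran earlier, while everything downstream of Redact only ever sees the sanitized copy.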

  • Install indx[local] (or indx[docling] + indx[bge] for the no-DB floor) on a connected staging box.
  • ollama pull qwen2.5; copy ~/.ollama/models to the target.
  • Warm and copy the Hugging Face cache (HF_HOME); set HF_HUB_OFFLINE=1 / TRANSFORMERS_OFFLINE=1 on the target.
  • Choose --store jsonl (zero-dependency floor) or embedded --store qdrant.
  • Build → inspect → query entirely offline; confirm no outbound traffic.
  • Confirm indx.toml names only local backends — no openai/anthropic/hosted-Qdrant.
  • Archive the .indx plus its manifest for audit and reproducibility.
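The config check in the list above can be partly automated. The sketch below walks a parsed config and flags any string value that mentions a cloud provider. Since the layout of indx.toml is not shown here, the walker is deliberately key-agnostic. The marker list covers only the providers named in the checklist, so hosted-Qdrant URLs would need a separate check. All names are hypothetical.

```python
CLOUD_MARKERS = ("openai", "anthropic")  # assumed list; extend to match your policy


def find_cloud_backends(config, path=""):
    """Return (dotted_path, value) pairs for string values naming a cloud provider."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            hits += find_cloud_backends(value, f"{path}.{key}" if path else str(key))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            hits += find_cloud_backends(value, f"{path}[{index}]")
    elif isinstance(config, str):
        if any(marker in config.lower() for marker in CLOUD_MARKERS):
            hits.append((path, config))
    return hits


def load_toml(path):
    """Parse a TOML file into a dict (tomllib requires Python 3.11 or newer)."""
    import tomllib

    with open(path, "rb") as fh:
        return tomllib.load(fh)
```

An empty result from `find_cloud_backends(load_toml("indx.toml"))` is a necessary but not sufficient signal: it complements, and does not replace, watching for outbound traffic during the offline build.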