index.json Schema
index.json is the serialized knowledge graph for a knowledge space — the human- and agent-readable heart of every build. It sits next to the .indx archive (and inside it) and is consistent with the canonical chunk shape that flows through the whole pipeline.
It contains the document graph, the chunks, the resolved relations, build metadata, and aggregate stats — everything except the vectors. Embeddings live separately under embeddings/ so the graph stays small, diffable, and easy to read.
Where it lives
Running indx ./docs --out ./ai-ready writes index.json both as a top-level file in the output directory and as an entry inside the sealed archive:
```
ai-ready/
├── handbook.indx   # portable archive (contains a copy of index.json)
├── index.json      # the knowledge graph (this page)
├── chunks/         # per-chunk files (same chunk shape + resolved context)
└── embeddings/     # vectors + manifest (NOT in index.json)
```

The on-disk index.json and the one inside the archive are identical. See the .indx archive reference for the full container layout and manifest.json.
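Because the two copies are meant to be identical, a build check can compare them directly. The sketch below assumes the .indx file is a standard ZIP with the graph stored at the archive root under the entry name index.json; that entry path and the helper name archive_copy_matches are assumptions, not part of indx. It compares parsed JSON rather than raw bytes, so it tolerates whitespace differences and is a slightly weaker check than byte identity.

```python
import json
import zipfile
from pathlib import Path


def archive_copy_matches(out_dir: Path, archive_name: str = "handbook.indx") -> bool:
    """Return True if out_dir/index.json equals the index.json inside the archive.

    Assumption: the archive is a ZIP whose copy of the graph is stored at the
    archive root as "index.json" (check the .indx reference for the real layout).
    """
    on_disk = json.loads((Path(out_dir) / "index.json").read_text(encoding="utf-8"))
    with zipfile.ZipFile(Path(out_dir) / archive_name) as zf:
        in_archive = json.loads(zf.read("index.json").decode("utf-8"))
    # Parsed comparison: key order and whitespace differences are ignored.
    return on_disk == in_archive
```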
Canonical chunk shape
A chunk is the retrievable unit of content. It always remembers where it came from (source), what sits next to it (neighbors), and any typed edges it owns (relations). This exact shape appears in the chunks[] array and in the per-chunk files under chunks/.
```json
{
  "id": "chunk_0481",
  "text": "Enterprise data is retained for 90 days…",
  "source": {
    "path": "policies/data/retention.pdf",
    "folder": "policies/data",
    "type": "policy"
  },
  "metadata": {
    "topics": ["retention", "compliance"],
    "summary": "90-day retention rule…"
  },
  "neighbors": ["chunk_0480", "chunk_0482"],
  "relations": [
    { "type": "references", "to": "legal/gdpr.md" }
  ]
}
```

| Field | Type | Notes |
|---|---|---|
| id | string | Stable, zero-padded id matching ^chunk_\d+$, e.g. chunk_0481. |
| text | string | The retrievable text payload. |
| source | object | Provenance: path, folder, type (see source). |
| metadata | object | Enriched fields: topics (string[]), summary (string), tags (string[]). |
| neighbors | string[] | Adjacent chunk ids, typically the previous and next chunk. |
| relations | array | Outgoing typed edges from this chunk (see relation). |
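A consumer that wants to reject malformed chunks before indexing them can check the shape by hand. This is a minimal stdlib sketch (the function name chunk_problems is hypothetical) that mirrors the table: it enforces the required id, text, and source keys, the chunk id pattern, the three required source fields, and type/to on each relation. It also checks that neighbor ids look like chunk ids, which is stricter than the schema. It is not a substitute for validating against the full JSON Schema.

```python
import re

# Python equivalent of the schema's ^chunk_\d+$ pattern (used with fullmatch).
CHUNK_ID = re.compile(r"chunk_\d+")
REQUIRED_CHUNK_KEYS = ("id", "text", "source")
REQUIRED_SOURCE_KEYS = ("path", "folder", "type")


def chunk_problems(chunk: dict) -> list[str]:
    """Return human-readable problems with a chunk; an empty list means it looks valid."""
    problems = [f"missing {key}" for key in REQUIRED_CHUNK_KEYS if key not in chunk]
    if "id" in chunk and not CHUNK_ID.fullmatch(str(chunk["id"])):
        problems.append(f"bad id {chunk['id']!r}")
    source = chunk.get("source")
    if isinstance(source, dict):
        problems += [f"missing source.{key}" for key in REQUIRED_SOURCE_KEYS if key not in source]
    for neighbor in chunk.get("neighbors", []):
        if not CHUNK_ID.fullmatch(neighbor):
            problems.append(f"bad neighbor {neighbor!r}")
    for rel in chunk.get("relations", []):
        if "type" not in rel or "to" not in rel:
            problems.append(f"incomplete relation {rel!r}")
    return problems
```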
Top-level structure
The root of index.json is the serialized KnowledgeSpace. The required keys are version, root, documents, and chunks; metadata, stats, and relations are present in normal builds.
```jsonc
{
  "version": "1.0",
  "root": "/abs/path/docs",
  "metadata": {
    "tool_version": "indx 0.4.2",
    "created_at": "2026-06-06T12:00:00Z",
    "embedder": { "name": "bge-m3", "dim": 1024 },
    "config": { "...": "snapshot of resolved indx.toml" }
  },
  "stats": {
    "documents": 128,
    "chunks": 1042,
    "relations": 380,
    "embeddings": 1042,
    "embed_dim": 1024,
    "types": { "policy": 40, "guide": 30, "table": 12 }
  },
  "documents": [
    {
      "id": "doc_0007",
      "path": "policies/data/retention.pdf",
      "folder": "policies/data",
      "lineage": ["policies", "policies/data"],
      "type": "policy",
      "topics": ["retention", "compliance"],
      "tags": ["gdpr", "data"],
      "summary": "Defines the 90-day retention rule…",
      "chunk_ids": ["chunk_0480", "chunk_0481", "chunk_0482"],
      "references": [
        { "type": "references", "to": "legal/gdpr.md" }
      ],
      "referenced_by": [
        { "type": "references", "to": "policies/data/retention.pdf", "from_id": "guides/onboarding.md" }
      ]
    }
  ],
  "chunks": [ /* objects in the canonical chunk shape above */ ],
  "relations": [ /* graph-level edges; an optional mirror of per-object edges */ ]
}
```

Top-level keys
| Key | Type | Required | Description |
|---|---|---|---|
| version | string | yes | Knowledge-space schema version, e.g. "1.0". |
| root | string | yes | Absolute path of the walked directory or ZIP. |
| metadata | object | no | Build provenance: tool_version, created_at, embedder ({name, dim}), and a config snapshot of the resolved indx.toml. May also carry an errors array of non-fatal per-item failures. |
| stats | object | no | Aggregate counts (see stats). |
| documents | array | yes | The document graph (see document). |
| chunks | array | yes | All chunks in the canonical shape. |
| relations | array | no | Graph-level edges: an optional mirror of edges stored on individual chunks and documents. |
document
A document is one source file, enriched: its folder lineage, detected type, LLM-derived topics/tags/summary, the ids of the chunks it produced, and the references in both directions.
| Field | Type | Notes |
|---|---|---|
| id | string | Stable id matching ^doc_\d+$, e.g. doc_0007. |
| path | string | Original path relative to root. |
| folder | string | Containing folder (a lineage segment). |
| lineage | string[] | Folder ancestry, root→leaf. |
| type | string | Detected/enriched document type, e.g. policy. |
| topics | string[] | Enriched topics. |
| tags | string[] | Enriched tags. |
| summary | string | LLM-generated summary (may be absent). |
| chunk_ids | string[] | Chunks produced from this document, in order. |
| references | array | Outgoing references resolved in the Relate stage. |
| referenced_by | array | Incoming references (reverse edges). |
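Because chunk_ids is ordered, a document can be reassembled from the chunks array. This sketch assumes, as in the example above, that every chunk's source.path equals its document's path, and that chunks do not overlap, so joining their text approximates the original. Both helper names are hypothetical.

```python
def document_chunks(space: dict, doc_id: str) -> list[dict]:
    """Return a document's chunks in chunk_ids order.

    Assumption: each chunk's source.path equals the owning document's path;
    a mismatch is reported as an error.
    """
    chunks_by_id = {chunk["id"]: chunk for chunk in space["chunks"]}
    doc = next((d for d in space["documents"] if d["id"] == doc_id), None)
    if doc is None:
        raise KeyError(doc_id)
    ordered = [chunks_by_id[cid] for cid in doc.get("chunk_ids", [])]
    for chunk in ordered:
        if chunk["source"]["path"] != doc["path"]:
            raise ValueError(f"{chunk['id']} does not belong to {doc['path']}")
    return ordered


def document_text(space: dict, doc_id: str, sep: str = "\n\n") -> str:
    """Join a document's chunk texts in order; assumes chunks do not overlap."""
    return sep.join(chunk["text"] for chunk in document_chunks(space, doc_id))
```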
JSON Schema
The following is the abridged schema (JSON Schema draft 2020-12). It defines $defs for source, relation, chunk, document, and stats, and is authoritative for the shapes above.
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "indx index.json",
  "type": "object",
  "required": ["version", "root", "documents", "chunks"],
  "properties": {
    "version":   { "type": "string" },
    "root":      { "type": "string" },
    "metadata":  { "type": "object" },
    "stats":     { "$ref": "#/$defs/stats" },
    "documents": { "type": "array", "items": { "$ref": "#/$defs/document" } },
    "chunks":    { "type": "array", "items": { "$ref": "#/$defs/chunk" } },
    "relations": { "type": "array", "items": { "$ref": "#/$defs/relation" } }
  },
  "$defs": {
    "source": {
      "type": "object",
      "required": ["path", "folder", "type"],
      "properties": {
        "path":   { "type": "string" },
        "folder": { "type": "string" },
        "type":   { "type": "string" }
      }
    },
    "relation": {
      "type": "object",
      "required": ["type", "to"],
      "properties": {
        "type":    { "enum": ["sibling", "parent", "references", "continues", "duplicate-of"] },
        "to":      { "type": "string" },
        "from_id": { "type": "string" },
        "weight":  { "type": "number", "minimum": 0, "maximum": 1 }
      }
    },
    "chunk": {
      "type": "object",
      "required": ["id", "text", "source"],
      "properties": {
        "id":     { "type": "string", "pattern": "^chunk_\\d+$" },
        "text":   { "type": "string" },
        "source": { "$ref": "#/$defs/source" },
        "metadata": {
          "type": "object",
          "properties": {
            "topics":  { "type": "array", "items": { "type": "string" } },
            "summary": { "type": "string" },
            "tags":    { "type": "array", "items": { "type": "string" } }
          }
        },
        "neighbors": { "type": "array", "items": { "type": "string" } },
        "relations": { "type": "array", "items": { "$ref": "#/$defs/relation" } }
      }
    },
    "document": {
      "type": "object",
      "required": ["id", "path", "folder", "type"],
      "properties": {
        "id":            { "type": "string", "pattern": "^doc_\\d+$" },
        "path":          { "type": "string" },
        "folder":        { "type": "string" },
        "lineage":       { "type": "array", "items": { "type": "string" } },
        "type":          { "type": "string" },
        "topics":        { "type": "array", "items": { "type": "string" } },
        "tags":          { "type": "array", "items": { "type": "string" } },
        "summary":       { "type": "string" },
        "chunk_ids":     { "type": "array", "items": { "type": "string" } },
        "references":    { "type": "array", "items": { "$ref": "#/$defs/relation" } },
        "referenced_by": { "type": "array", "items": { "$ref": "#/$defs/relation" } }
      }
    },
    "stats": {
      "type": "object",
      "properties": {
        "documents":  { "type": "integer" },
        "chunks":     { "type": "integer" },
        "relations":  { "type": "integer" },
        "embeddings": { "type": "integer" },
        "embed_dim":  { "type": "integer" },
        "types":      { "type": "object", "additionalProperties": { "type": "integer" } }
      }
    }
  }
}
```

source
Provenance for a chunk or parsed unit. All three fields are required: path (file path relative to the walked root), folder (containing folder, relative to root), and type (the detected/enriched document type).
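The path-derived fields can be computed from a root-relative path; type cannot, since it comes from detection and enrichment. This sketch also derives the related document-level lineage (root→leaf prefixes, matching ["policies", "policies/data"] in the example). Treating a file directly under the root as folder "" with an empty lineage is an assumption; the schema does not say. The helper name source_and_lineage is hypothetical.

```python
from pathlib import PurePosixPath


def source_and_lineage(rel_path: str, doc_type: str) -> tuple[dict, list[str]]:
    """Derive a source object and a root-to-leaf lineage from a root-relative path.

    doc_type comes from detection/enrichment and is passed in, not derived.
    Assumption: a file directly under root gets folder "" and an empty lineage.
    """
    parts = PurePosixPath(rel_path).parent.parts  # () for files directly under root
    folder = "/".join(parts)
    # Cumulative prefixes: ("policies", "data") -> ["policies", "policies/data"]
    lineage = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    source = {"path": rel_path, "folder": folder, "type": doc_type}
    return source, lineage
```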
relation
A typed, directed edge. type and to are required. from_id is omitted when the edge is stored on the object it starts from (the owner is the implicit source); reverse edges, such as a document's referenced_by entries, carry it explicitly. weight is an optional confidence/similarity score in [0, 1].
| type value | Meaning |
|---|---|
| sibling | Same folder / same logical group. |
| parent | Folder lineage / containment. |
| references | Outgoing citation, link, or mention. |
| continues | Next unit in a split sequence. |
| duplicate-of | Near or exact duplicate content. |
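The from_id rule makes it straightforward to flatten per-object edges into a graph-level list and to derive reverse edges. In this sketch the owner of a chunk edge is identified by its chunk id and the owner of a document edge by its path, which matches the referenced_by example but is an assumption about how indx fills from_id. The helper names mirror_edges and reverse_index are hypothetical.

```python
from collections import defaultdict

RELATION_TYPES = {"sibling", "parent", "references", "continues", "duplicate-of"}


def mirror_edges(space: dict) -> list[dict]:
    """Flatten edges owned by chunks and documents into graph-level edges.

    Owned edges omit from_id, so the owner is filled in as from_id here.
    Assumption: chunk owners are identified by chunk id, document owners by path.
    """
    owners = [(c["id"], c.get("relations", [])) for c in space.get("chunks", [])]
    owners += [(d["path"], d.get("references", [])) for d in space.get("documents", [])]
    edges = []
    for owner, rels in owners:
        for rel in rels:
            if rel["type"] not in RELATION_TYPES:
                raise ValueError(f"unknown relation type {rel['type']!r}")
            weight = rel.get("weight")
            if weight is not None and not 0 <= weight <= 1:
                raise ValueError(f"weight {weight} is outside [0, 1]")
            # An explicit from_id, if present, wins over the implicit owner.
            edges.append({**rel, "from_id": rel.get("from_id", owner)})
    return edges


def reverse_index(edges: list[dict]) -> dict[str, list[dict]]:
    """Group edges by target: the incoming, referenced_by-style view."""
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge["to"]].append(edge)
    return dict(incoming)
```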
stats
The serialized SpaceStats: the same object returned by space.stats and emitted by indx inspect --json. embed_dim is the vector dimensionality (e.g. 1024 for bge-m3) and types is a document-count histogram keyed by detected type.
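Some of these counts can be recomputed from index.json alone, which makes a cheap consistency check. This sketch (stats_mismatches is a hypothetical name) checks only documents, chunks, and types: embeddings and embed_dim describe vectors stored outside index.json, and exactly which edges the relations count includes is not specified here.

```python
from collections import Counter


def stats_mismatches(space: dict) -> dict:
    """Compare stored stats with counts recomputed from index.json itself.

    Returns {key: (stored, recomputed)} for each stored value that disagrees.
    Keys absent from the stored stats are not reported.
    """
    stored = space.get("stats", {})
    recomputed = {
        "documents": len(space["documents"]),
        "chunks": len(space["chunks"]),
        # Histogram of detected document types, as in stats.types.
        "types": dict(Counter(doc["type"] for doc in space["documents"])),
    }
    return {
        key: (stored[key], value)
        for key, value in recomputed.items()
        if key in stored and stored[key] != value
    }
```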
Per-chunk files (chunks/)
The per-chunk files under chunks/ (chunk_0000.json, chunk_0001.json, …) use the same canonical chunk shape shown above. They may additionally carry a resolved context window (the neighbor chunks expanded inline), so an agent can read a single file and get the chunk plus its surrounding context without dereferencing ids against index.json.
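A resolved context window can be produced from neighbors alone. This page does not name the field that carries it, so this sketch uses a hypothetical context key holding each neighbor's id and text; file names follow the chunk_0000.json pattern. Skipping dangling neighbor ids instead of failing is a design choice of the sketch, not documented behavior, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def with_context(chunk: dict, chunks_by_id: dict) -> dict:
    """Copy a chunk and inline its neighbors under a hypothetical "context" key."""
    expanded = dict(chunk)  # shallow copy; canonical fields are left unchanged
    expanded["context"] = [
        {"id": nid, "text": chunks_by_id[nid]["text"]}
        for nid in chunk.get("neighbors", [])
        if nid in chunks_by_id  # skip dangling ids rather than failing
    ]
    return expanded


def write_chunk_files(space: dict, out_dir: Path) -> list[Path]:
    """Write chunks/<id>.json for every chunk in the space; return the paths."""
    chunks_by_id = {chunk["id"]: chunk for chunk in space["chunks"]}
    target = Path(out_dir) / "chunks"
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for chunk in space["chunks"]:
        path = target / f"{chunk['id']}.json"
        payload = with_context(chunk, chunks_by_id)
        path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written
```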
Related references
- .indx Archive Format: the ZIP container, manifest.json, and embeddings/ layout.
- Data Models: the Pydantic v2 types behind every object here.
- Protocols: the OutputWriter that serializes a KnowledgeSpace.
- Configuration Reference: what the metadata.config snapshot captures.