
index.json Schema

index.json is the serialized knowledge graph for a knowledge space — the human- and agent-readable heart of every build. It sits next to the .indx archive (and inside it) and is consistent with the canonical chunk shape that flows through the whole pipeline.

It contains the document graph, the chunks, the resolved relations, build metadata, and aggregate stats — everything except the vectors. Embeddings live separately under embeddings/ so the graph stays small, diffable, and easy to read.

Running indx ./docs --out ./ai-ready writes index.json both as a top-level file in the output directory and as an entry inside the sealed archive:

ai-ready/
├── handbook.indx # portable archive (contains a copy of index.json)
├── index.json # the knowledge graph (this page)
├── chunks/ # per-chunk files (same chunk shape + resolved context)
└── embeddings/ # vectors + manifest (NOT in index.json)

The on-disk index.json and the one inside the archive are identical. See the .indx archive reference for the full container layout and manifest.json.

A chunk is the retrievable unit of content. It always remembers where it came from (source), what sits next to it (neighbors), and any typed edges it owns (relations). This exact shape appears in the chunks[] array and in the per-chunk files under chunks/.

{
"id": "chunk_0481",
"text": "Enterprise data is retained for 90 days…",
"source": { "path": "policies/data/retention.pdf", "folder": "policies/data", "type": "policy" },
"metadata": { "topics": ["retention", "compliance"], "summary": "90-day retention rule…" },
"neighbors": ["chunk_0480", "chunk_0482"],
"relations": [ { "type": "references", "to": "legal/gdpr.md" } ]
}
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| id | string | yes | Stable, zero-padded id matching ^chunk_\d+$, e.g. chunk_0481. |
| text | string | yes | The retrievable text payload. |
| source | object | yes | Provenance: path, folder, type (see source). |
| metadata | object | no | Enriched fields: topics (string[]), summary (string), tags (string[]). |
| neighbors | string[] | no | Adjacent chunk ids, typically the previous and next chunk. |
| relations | array | no | Outgoing typed edges from this chunk (see relation). |
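The Python sketch below shows one rough way to consume this shape by parsing a single chunks[] entry into a dataclass. The Chunk class and its from_json method are hypothetical helpers, not part of indx. The id pattern and the required id/text/source keys come from the table above and the schema on this page.

```python
import re
from dataclasses import dataclass, field
from typing import Any

# Id pattern from the table above, e.g. chunk_0481.
CHUNK_ID = re.compile(r"^chunk_\d+$")
SOURCE_KEYS = {"path", "folder", "type"}


@dataclass
class Chunk:
    id: str
    text: str
    source: dict[str, str]
    metadata: dict[str, Any] = field(default_factory=dict)
    neighbors: list[str] = field(default_factory=list)
    relations: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_json(cls, obj: dict[str, Any]) -> "Chunk":
        """Build a Chunk from one chunks[] entry.

        A missing id, text, or source raises KeyError; optional fields
        default to empty containers.
        """
        if not CHUNK_ID.fullmatch(obj["id"]):
            raise ValueError(f"bad chunk id: {obj['id']!r}")
        missing = SOURCE_KEYS - set(obj["source"])
        if missing:
            raise ValueError(f"source is missing {sorted(missing)}")
        return cls(
            id=obj["id"],
            text=obj["text"],
            source=obj["source"],
            metadata=obj.get("metadata", {}),
            neighbors=obj.get("neighbors", []),
            relations=obj.get("relations", []),
        )
```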

The root of index.json is the serialized KnowledgeSpace. The required keys are version, root, documents, and chunks. The metadata, stats, and relations keys are optional but present in normal builds.

{
"version": "1.0",
"root": "/abs/path/docs",
"metadata": {
"tool_version": "indx 0.4.2",
"created_at": "2026-06-06T12:00:00Z",
"embedder": { "name": "bge-m3", "dim": 1024 },
"config": { "...": "snapshot of resolved indx.toml" }
},
"stats": {
"documents": 128, "chunks": 1042, "relations": 380,
"embeddings": 1042, "embed_dim": 1024,
"types": { "policy": 40, "guide": 30, "table": 12 }
},
"documents": [
{
"id": "doc_0007",
"path": "policies/data/retention.pdf",
"folder": "policies/data",
"lineage": ["policies", "policies/data"],
"type": "policy",
"topics": ["retention", "compliance"],
"tags": ["gdpr", "data"],
"summary": "Defines the 90-day retention rule…",
"chunk_ids": ["chunk_0480", "chunk_0481", "chunk_0482"],
"references": [ { "type": "references", "to": "legal/gdpr.md" } ],
"referenced_by": [ { "type": "references", "to": "policies/data/retention.pdf", "from_id": "guides/onboarding.md" } ]
}
],
"chunks": [ /* objects in the canonical chunk shape above */ ],
"relations": [ /* graph-level edges; an optional mirror of per-object edges */ ]
}
| Key | Type | Required | Description |
| --- | --- | --- | --- |
| version | string | yes | Knowledge-space schema version, e.g. "1.0". |
| root | string | yes | Absolute path of the walked directory or ZIP. |
| metadata | object | no | Build provenance: tool_version, created_at, embedder ({name, dim}), and a config snapshot of the resolved indx.toml. May also carry an errors array of non-fatal per-item failures. |
| stats | object | no | Aggregate counts (see stats). |
| documents | array | yes | The document graph (see document). |
| chunks | array | yes | All chunks in the canonical shape. |
| relations | array | no | Graph-level edges; an optional mirror of edges stored on individual chunks and documents. |
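The following minimal sketch loads index.json from Python using only the standard library. It assumes the file is plain UTF-8 JSON, as described on this page. load_index is a hypothetical helper, not an indx API. It rejects files that are missing a required key and fills in empty defaults for the optional ones.

```python
import json
from pathlib import Path
from typing import Union

# Required top-level keys, per the table above.
REQUIRED_KEYS = ("version", "root", "documents", "chunks")


def load_index(path: Union[str, Path]) -> dict:
    """Read index.json and fail fast if a required top-level key is missing."""
    space = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in space]
    if missing:
        raise KeyError(f"index.json is missing required keys: {missing}")
    # Optional keys: fill in empty defaults so callers need no special cases.
    space.setdefault("metadata", {})
    space.setdefault("stats", {})
    space.setdefault("relations", [])
    return space
```

After the build command shown earlier, this could be called as load_index("ai-ready/index.json").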

A document is one source file, enriched: its folder lineage, detected type, LLM-derived topics/tags/summary, the ids of the chunks it produced, and the references in both directions.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| id | string | yes | Stable id matching ^doc_\d+$, e.g. doc_0007. |
| path | string | yes | Original path relative to root. |
| folder | string | yes | Containing folder, i.e. the leaf segment of lineage. |
| lineage | string[] | no | Folder ancestry, root→leaf. |
| type | string | yes | Detected/enriched document type, e.g. policy. |
| topics | string[] | no | Enriched topics. |
| tags | string[] | no | Enriched tags. |
| summary | string | no | LLM-generated summary (may be absent). |
| chunk_ids | string[] | no | Chunks produced from this document, in order. |
| references | array | no | Outgoing references resolved in the Relate stage. |
| referenced_by | array | no | Incoming references (reverse edges). |
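Because chunk_ids lists a document's chunks in order, you can approximately reassemble a document's text from chunks[]. document_text is a hypothetical helper that takes an already-parsed index.json dict. If the chunker produces overlapping chunks, the joined text will repeat the overlap.

```python
def document_text(space: dict, doc_id: str, sep: str = "\n\n") -> str:
    """Join a document's chunk texts in chunk_ids order."""
    chunks_by_id = {chunk["id"]: chunk for chunk in space["chunks"]}
    for doc in space["documents"]:
        if doc["id"] == doc_id:
            return sep.join(chunks_by_id[cid]["text"] for cid in doc.get("chunk_ids", []))
    raise KeyError(f"no document with id {doc_id!r}")
```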

The following is the abridged schema (JSON Schema draft 2020-12). It defines $defs for source, relation, chunk, document, and stats, and is authoritative for the shapes above.

{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "indx index.json",
"type": "object",
"required": ["version", "root", "documents", "chunks"],
"properties": {
"version": { "type": "string" },
"root": { "type": "string" },
"metadata": { "type": "object" },
"stats": { "$ref": "#/$defs/stats" },
"documents": { "type": "array", "items": { "$ref": "#/$defs/document" } },
"chunks": { "type": "array", "items": { "$ref": "#/$defs/chunk" } },
"relations": { "type": "array", "items": { "$ref": "#/$defs/relation" } }
},
"$defs": {
"source": {
"type": "object",
"required": ["path", "folder", "type"],
"properties": {
"path": { "type": "string" },
"folder": { "type": "string" },
"type": { "type": "string" }
}
},
"relation": {
"type": "object",
"required": ["type", "to"],
"properties": {
"type": { "enum": ["sibling", "parent", "references", "continues", "duplicate-of"] },
"to": { "type": "string" },
"from_id": { "type": "string" },
"weight": { "type": "number", "minimum": 0, "maximum": 1 }
}
},
"chunk": {
"type": "object",
"required": ["id", "text", "source"],
"properties": {
"id": { "type": "string", "pattern": "^chunk_\\d+$" },
"text": { "type": "string" },
"source": { "$ref": "#/$defs/source" },
"metadata": {
"type": "object",
"properties": {
"topics": { "type": "array", "items": { "type": "string" } },
"summary": { "type": "string" },
"tags": { "type": "array", "items": { "type": "string" } }
}
},
"neighbors": { "type": "array", "items": { "type": "string" } },
"relations": { "type": "array", "items": { "$ref": "#/$defs/relation" } }
}
},
"document": {
"type": "object",
"required": ["id", "path", "folder", "type"],
"properties": {
"id": { "type": "string", "pattern": "^doc_\\d+$" },
"path": { "type": "string" },
"folder": { "type": "string" },
"lineage": { "type": "array", "items": { "type": "string" } },
"type": { "type": "string" },
"topics": { "type": "array", "items": { "type": "string" } },
"tags": { "type": "array", "items": { "type": "string" } },
"summary": { "type": "string" },
"chunk_ids": { "type": "array", "items": { "type": "string" } },
"references": { "type": "array", "items": { "$ref": "#/$defs/relation" } },
"referenced_by":{ "type": "array", "items": { "$ref": "#/$defs/relation" } }
}
},
"stats": {
"type": "object",
"properties": {
"documents": { "type": "integer" },
"chunks": { "type": "integer" },
"relations": { "type": "integer" },
"embeddings": { "type": "integer" },
"embed_dim": { "type": "integer" },
"types": { "type": "object", "additionalProperties": { "type": "integer" } }
}
}
}
}
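The schema above constrains the shape of each object. It does not check that the ids in chunk_ids and neighbors resolve to entries in chunks[]. A small cross-reference check can cover that gap. dangling_chunk_refs is a hypothetical helper that takes an already-parsed index.json dict.

```python
def dangling_chunk_refs(space: dict) -> list[tuple[str, str]]:
    """Return (owner_id, missing_id) pairs for chunk ids that resolve to nothing."""
    known = {chunk["id"] for chunk in space["chunks"]}
    problems = []
    # Documents list the chunks they produced in chunk_ids.
    for doc in space["documents"]:
        problems += [(doc["id"], cid) for cid in doc.get("chunk_ids", []) if cid not in known]
    # Chunks list adjacent chunks in neighbors.
    for chunk in space["chunks"]:
        problems += [(chunk["id"], cid) for cid in chunk.get("neighbors", []) if cid not in known]
    return problems
```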

The source object records provenance for a chunk or parsed unit. All three fields are required:

- path: the file path relative to the walked root.
- folder: the containing folder, relative to root.
- type: the detected or enriched document type.
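You can check the relationship between path and folder mechanically. This sketch assumes that folder is exactly the parent directory of path and that both use forward slashes, as in the examples on this page. This page does not specify how indx represents a file at the root (as an empty string or otherwise), so that case is an assumption too. folder_matches_path is a hypothetical helper.

```python
import posixpath


def folder_matches_path(source: dict) -> bool:
    """Check that source.folder is the parent directory of source.path.

    Assumes both are root-relative and use forward slashes.
    """
    return posixpath.dirname(source["path"]) == source["folder"]
```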

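A relation is a typed, directed edge. The type and to fields are required. from_id is omitted when the edge is stored on its owning object, because the owner is the implicit source. Reverse edges, such as the referenced_by entry in the document example above, carry from_id to name the source. weight is an optional confidence or similarity score in [0, 1].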

| type value | Meaning |
| --- | --- |
| sibling | Same folder / same logical group. |
| parent | Folder lineage / containment. |
| references | Outgoing citation, link, or mention. |
| continues | Next unit in a split sequence. |
| duplicate-of | Near or exact duplicate content. |
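Graph-level relations are described as an optional mirror of per-object edges, so a consumer may want to rebuild such a list itself. graph_edges is a hypothetical sketch with the following behavior:

- It fills in from_id with the owning chunk's id. For document-owned edges it uses the owning document's path, an assumption based on the referenced_by example.
- It skips referenced_by, which only mirrors references in reverse.
- It rejects any type not listed in the table above.

```python
RELATION_TYPES = {"sibling", "parent", "references", "continues", "duplicate-of"}


def graph_edges(space: dict) -> list[dict]:
    """Flatten per-object edges into records that always carry from_id."""
    edges = []

    def add(rel: dict, owner: str) -> None:
        if rel["type"] not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {rel['type']!r}")
        # The owning object is the implicit source when from_id is omitted.
        edges.append({**rel, "from_id": rel.get("from_id", owner)})

    for chunk in space["chunks"]:
        for rel in chunk.get("relations", []):
            add(rel, chunk["id"])
    for doc in space["documents"]:
        # Assumption: document-owned edges name their owner by path, as the
        # referenced_by example does. referenced_by itself is skipped because
        # it only mirrors references in the reverse direction.
        for rel in doc.get("references", []):
            add(rel, doc["path"])
    return edges
```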

The stats object is the serialized SpaceStats, the same object returned by space.stats and emitted by indx inspect --json. embed_dim is the vector dimensionality (e.g. 1024 for bge-m3), and types is a document-count histogram keyed by detected type.
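A quick consistency check can recompute the counts that index.json can support on its own. stats_mismatches is a hypothetical helper. It assumes that stats.documents and stats.chunks equal the corresponding array lengths and that types counts every document. It skips embeddings and embed_dim because the vectors are stored under embeddings/, not in index.json.

```python
from collections import Counter


def stats_mismatches(space: dict) -> dict:
    """Compare stored stats with counts recomputed from index.json itself.

    Returns {key: (stored, recomputed)} for every key that disagrees.
    """
    expected = {
        "documents": len(space["documents"]),
        "chunks": len(space["chunks"]),
        "types": dict(Counter(doc["type"] for doc in space["documents"])),
    }
    stored = space.get("stats", {})
    return {key: (stored.get(key), value) for key, value in expected.items()
            if stored.get(key) != value}
```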

The per-chunk files under chunks/ (chunk_0000.json, chunk_0001.json, …) use the same canonical chunk shape shown above. They may additionally carry a resolved context window — the neighbor chunks expanded inline — so an agent can read a single file and get the chunk plus its surrounding context without dereferencing ids against index.json.
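You can build an agent-side equivalent of the resolved context window from index.json alone. with_context is a hypothetical sketch, and the "context" key it adds is an assumed name. This page does not specify the field the per-chunk files actually use.

```python
def with_context(space: dict, chunk_id: str) -> dict:
    """Return a copy of one chunk with its neighbors' text expanded inline."""
    by_id = {chunk["id"]: chunk for chunk in space["chunks"]}
    chunk = dict(by_id[chunk_id])  # shallow copy; the parsed index is untouched
    # "context" is a hypothetical key; check real chunk files for the actual name.
    chunk["context"] = [
        {"id": nid, "text": by_id[nid]["text"]}
        for nid in chunk.get("neighbors", [])
        if nid in by_id  # skip ids that do not resolve
    ]
    return chunk
```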