Inspecting & Querying a Space

Once you have built a .indx archive, the next step is not to wire it into an agent — it is to look at it. indx inspect tells you whether the structure came out the way you expected, and indx query lets you run retrieval by hand before any production code depends on it. Both commands have exact SDK equivalents, so the same checks run in a notebook or a test.

The two-step sanity workflow

Inspect the archive to confirm that the counts, the document-type histogram, and a sample of relations look right. If the type histogram is empty or wrong, your Enrich stage probably did not run.
Query the archive with a handful of real questions and read the SearchHits (score, source path, chunk text, and neighbor ids) to confirm retrieval is grounded before you ship.

Inspecting a space

indx inspect <archive.indx> summarizes an archive without loading it into your own code. By default it prints the aggregate stats, a document-type histogram, and a sample of relations.

CLI
Python

indx inspect ./ai-ready/handbook.indx

handbook.indx  (indx 0.4.2)
  documents   128
  chunks      1042
  relations   380
  embeddings  1042   (bge-m3, dim 1024)

types
  policy   40
  guide    30
  table    12
  …

relations (sample)
  references     guides/onboarding.md → policies/data/retention.pdf
  sibling        policies/data/retention.pdf ↔ policies/data/access.pdf
  continues      chunk_0481 → chunk_0482

from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")
stats = space.stats

print(stats.documents, stats.chunks, stats.relations)
print(stats.embeddings, stats.embed_dim)   # 1042 1024
print(stats.types)                          # {"policy": 40, "guide": 30, ...}

`--json`: the full `space.stats`

Pass --json to emit the complete SpaceStats object as JSON — handy for assertions in CI or piping into jq.

indx inspect ./ai-ready/handbook.indx --json

{
  "documents": 128,
  "chunks": 1042,
  "relations": 380,
  "embeddings": 1042,
  "embed_dim": 1024,
  "types": { "policy": 40, "guide": 30, "table": 12 },
  "bytes_source": 8421340
}

This is exactly the shape of space.stats in the SDK. See the SpaceStats model for every field.
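
For CI, the JSON above can be checked with a few assertions. The following is a minimal sketch in Python: `check_stats` is a hypothetical helper, not part of indx, and the rules it enforces (one embedding per chunk, a non-empty type histogram, type counts that do not exceed the document count) are assumptions inferred from the sample output rather than documented guarantees.

```python
import json

def check_stats(stats: dict) -> list[str]:
    """Return a list of human-readable problems found in a SpaceStats dict."""
    problems = []
    # Assumption: every chunk gets exactly one embedding, as in the sample.
    if stats["embeddings"] != stats["chunks"]:
        problems.append("embedding count does not match chunk count")
    # An empty histogram suggests the Enrich stage did not run.
    if not stats.get("types"):
        problems.append("type histogram is empty")
    # Assumption: each document has at most one detected type.
    elif sum(stats["types"].values()) > stats["documents"]:
        problems.append("type counts exceed document count")
    return problems

# Sample payload copied from the --json output above.
sample = json.loads("""
{"documents": 128, "chunks": 1042, "relations": 380,
 "embeddings": 1042, "embed_dim": 1024,
 "types": {"policy": 40, "guide": 30, "table": 12},
 "bytes_source": 8421340}
""")
print(check_stats(sample))  # []
```

In a pipeline you would feed it the output of indx inspect with --json and fail the job when the returned list is not empty.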

`--documents [type]`: list documents

--documents lists the documents in the space. Supply an optional type to filter the list to a single detected type — the same set you would get from space.documents(type=...).

CLI
Python

# every document
indx inspect ./ai-ready/handbook.indx --documents

# only documents enriched as type "policy"
indx inspect ./ai-ready/handbook.indx --documents policy

from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")

for doc in space.documents(type="policy"):
    print(doc.id, doc.path, doc.topics, doc.summary)

space.documents() with no argument returns every Document; passing type= filters by the detected/enriched type string. Each Document carries its path, folder, lineage, topics, tags, summary, chunk_ids, and its resolved references / referenced_by edges — see the data models reference.
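
Because each Document carries its topics and summary, the document list can also be used to spot files that enrichment skipped. The sketch below is illustrative only: `find_unenriched` is a hypothetical helper, and it assumes that an unenriched document has an empty (or missing) topics list and summary, which this page does not state. The stand-in objects only mimic the fields named above.

```python
from types import SimpleNamespace

def find_unenriched(documents):
    """Return paths of documents that have neither topics nor a summary."""
    return [d.path for d in documents if not d.topics and not d.summary]

# Stand-in objects; with a real archive you would pass space.documents().
docs = [
    SimpleNamespace(path="policies/data/retention.pdf",
                    topics=["retention"], summary="90-day retention rule…"),
    SimpleNamespace(path="guides/onboarding.md", topics=[], summary=""),
]
print(find_unenriched(docs))  # ['guides/onboarding.md']
```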

Querying a space

indx query <archive.indx> "<text>" embeds your query with the same embedder that built the archive and returns the top matching chunks. The default output is human-readable; --json gives you the structured SearchHit[].

CLI
Python

indx query ./ai-ready/handbook.indx "how long is enterprise data retained?"

See Default human output below for the exact shape of each hit (rank, score, source path, chunk text, neighbor ids).

from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")

for hit in space.search("how long is enterprise data retained?", k=5):
    print(f"{hit.score:.3f}  {hit.source.path}")
    print(hit.chunk.text)
    print("context:", [c.id for c in hit.neighbors])
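
The loop above is meant for reading hits by eye. The same idea can be turned into a repeatable check with a small set of "golden" questions, each paired with the source path you expect to see in the top k. This is a sketch under stated assumptions: `check_golden` is a hypothetical helper, it takes any callable with the shape of space.search(query, k=...), and the second question and the fake search function are made up for illustration.

```python
from types import SimpleNamespace

def check_golden(search, golden, k=5):
    """Return the questions whose expected source path is missing from the top k."""
    misses = []
    for question, expected_path in golden.items():
        paths = [hit.source.path for hit in search(question, k=k)]
        if expected_path not in paths:
            misses.append(question)
    return misses

# Fake search standing in for space.search; it always returns the same hit.
def fake_search(query, k=5):
    hit = SimpleNamespace(source=SimpleNamespace(path="policies/data/retention.pdf"))
    return [hit][:k]

golden = {
    "how long is enterprise data retained?": "policies/data/retention.pdf",
    "who approves access requests?": "policies/data/access.pdf",  # illustrative
}
print(check_golden(fake_search, golden))  # ['who approves access requests?']
```

With a loaded archive, passing space.search in place of fake_search should work as long as hits expose source.path as described on this page.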

Query flags

| Flag | Type | Default | Description |
|--------|------|---------|-------------|
| -k | int | 5 | Number of hits to return. |
| --type | str | (none) | Restrict results to a single document type. |
| --json | flag | off | Emit SearchHit[] as JSON (including .chunk, .neighbors, .source). |

# top 3 hits, only from "policy" documents
indx query ./ai-ready/handbook.indx "data retention" -k 3 --type policy

# machine-readable output
indx query ./ai-ready/handbook.indx "data retention" --json

In the SDK, -k maps to the k argument of space.search(query, k=5). Type filtering at the SDK level is done by inspecting hit.source.type (or by pre-filtering with space.documents(type=...)).
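
Filtering on hit.source.type after the search can leave you with fewer than k hits, so one option is to request more hits than you need and trim afterwards. The sketch below shows that pattern; `search_by_type` and its `overfetch` factor are hypothetical, and the fake search function only stands in for space.search so the example runs on its own.

```python
from types import SimpleNamespace

def search_by_type(search, query, doc_type, k=5, overfetch=4):
    """Post-filter hits by source type, over-fetching so up to k hits survive."""
    hits = search(query, k=k * overfetch)
    return [h for h in hits if h.source.type == doc_type][:k]

# Fake search standing in for space.search; hits are already ranked by score.
def fake_search(query, k=5):
    types = ["guide", "policy", "policy", "table", "policy"]
    hits = [SimpleNamespace(score=1.0 - i / 10, source=SimpleNamespace(type=t))
            for i, t in enumerate(types)]
    return hits[:k]

top = search_by_type(fake_search, "data retention", "policy", k=2)
print([round(h.score, 1) for h in top])  # [0.9, 0.8]
```

The right over-fetch factor depends on how rare the target type is in your archive; 4 is an arbitrary starting point.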

Default human output

Without --json, each hit prints as a rank, a similarity score (higher is better), the source path, the chunk text, and the neighbor ids that bracket it:

1  score 0.842  policies/data/retention.pdf
   Enterprise data is retained for 90 days…
   neighbors: chunk_0480, chunk_0482

2  score 0.791  legal/gdpr.md
   Personal data must not be kept longer than necessary…
   neighbors: chunk_0903, chunk_0905

Reading a `SearchHit`

A SearchHit is the unit of retrieval. Every hit bundles the matched chunk, its similarity score, the resolved neighbor chunks, and a convenience accessor for provenance.

| Field | Type | Meaning |
|-------|------|---------|
| hit.chunk | Chunk | The matched chunk: id, text, source, doc_id, index, metadata, neighbors, relations. |
| hit.score | float | Similarity score; higher is better. |
| hit.neighbors | list[Chunk] | The adjacent chunks (prev/next), fully resolved, not just ids. |
| hit.source | Source | Provenance of the matched chunk (.path, .folder, .type). A property that returns hit.chunk.source. |

hit = space.search("data retention", k=1)[0]

hit.chunk.text          # the retrievable text payload
hit.chunk.metadata      # enriched topics / summary / tags
hit.score               # e.g. 0.842
hit.source.path         # "policies/data/retention.pdf"
hit.source.folder       # "policies/data"
hit.source.type         # "policy"
[c.text for c in hit.neighbors]   # surrounding context window

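The --json output mirrors this structure. In the sample below, provenance appears under chunk.source, chunk.neighbors holds neighbor ids, and the top-level neighbors array holds the resolved chunks: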

[
  {
    "chunk": {
      "id": "chunk_0481",
      "text": "Enterprise data is retained for 90 days…",
      "source": { "path": "policies/data/retention.pdf", "folder": "policies/data", "type": "policy" },
      "metadata": { "topics": ["retention", "compliance"], "summary": "90-day retention rule…" },
      "neighbors": ["chunk_0480", "chunk_0482"],
      "relations": [{ "type": "references", "to": "legal/gdpr.md" }]
    },
    "score": 0.842,
    "neighbors": [ { "id": "chunk_0480", "text": "…" }, { "id": "chunk_0482", "text": "…" } ]
  }
]
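
If you consume this JSON from a script, a few lines of standard-library Python are enough to reduce it to one line per hit. The sketch below assumes only the keys visible in the sample above (chunk.source.path, score, and the top-level neighbors with their ids); `summarize_hits` is a hypothetical helper, not part of indx.

```python
import json

def summarize_hits(raw: str) -> list[str]:
    """Render one line per hit: rank, score, source path, neighbor ids."""
    lines = []
    for rank, hit in enumerate(json.loads(raw), start=1):
        path = hit["chunk"]["source"]["path"]
        neighbor_ids = ", ".join(n["id"] for n in hit["neighbors"])
        lines.append(f"{rank}  {hit['score']:.3f}  {path}  [{neighbor_ids}]")
    return lines

# Trimmed copy of the sample payload above.
raw = """
[{"chunk": {"id": "chunk_0481",
            "source": {"path": "policies/data/retention.pdf"}},
  "score": 0.842,
  "neighbors": [{"id": "chunk_0480"}, {"id": "chunk_0482"}]}]
"""
print(summarize_hits(raw))
# ['1  0.842  policies/data/retention.pdf  [chunk_0480, chunk_0482]']
```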

Why neighbors and source make retrieval grounded

A bare vector match gives you a floating snippet. A SearchHit gives you a snippet in context:

source tells an agent (and you) where the text actually lives — the file path, the folder, and the document type. That is the difference between a citable answer and an unverifiable one, and it lets you filter by type when the same phrase means different things in different document classes.
neighbors are the resolved chunks immediately before and after the match. Feeding them alongside the hit gives the model a real context window instead of a sentence ripped out of its paragraph — the single biggest lever against “technically retrieved, contextually wrong” answers.

Because both travel with every hit, you can hand a SearchHit straight to a prompt and get a grounded, traceable response. This is the payoff of the structure that the Chunk and Relate stages preserve.
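
As a rough illustration of handing a hit to a prompt, the sketch below joins the matched chunk with its neighbors and prefixes a citation line. It is one possible layout, not a format indx prescribes: `build_context` is a hypothetical helper, the stand-in objects only mimic the fields described on this page, and it assumes that Chunk.index orders chunks within a document.

```python
from types import SimpleNamespace

def build_context(hit) -> str:
    """Join the matched chunk with its neighbors in document order, plus a citation."""
    # Assumption: chunk.index reflects position within the source document.
    window = sorted([hit.chunk, *hit.neighbors], key=lambda c: c.index)
    body = "\n".join(c.text for c in window)
    return f"[source: {hit.source.path} ({hit.source.type})]\n{body}"

def make_chunk(index, text):
    """Build a stand-in for a Chunk with only the fields used here."""
    return SimpleNamespace(index=index, text=text)

# Stand-in for a SearchHit; neighbor texts are placeholders.
hit = SimpleNamespace(
    chunk=make_chunk(481, "Enterprise data is retained for 90 days…"),
    neighbors=[make_chunk(482, "(next chunk text)"),
               make_chunk(480, "(previous chunk text)")],
    source=SimpleNamespace(path="policies/data/retention.pdf", type="policy"),
)
print(build_context(hit))
```

Keeping the citation on its own line makes it easy for the model to quote the path back, and for you to verify the answer against the file.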

Where to go next

Full flag tables and exit codes: the CLI reference.
Every method signature, including KnowledgeSpace.load, space.stats, space.documents, and space.search: the SDK reference.
Field-by-field definitions of Chunk, Document, Source, SpaceStats, and SearchHit: the data models reference.
Want to run all of this offline? See Local & air-gapped.