Inspecting & Querying a Space

Once you have built a .indx archive, the next step is not to wire it into an agent — it is to look at it. indx inspect tells you whether the structure came out the way you expected, and indx query lets you run retrieval by hand before any production code depends on it. Both commands have exact SDK equivalents, so the same checks run in a notebook or a test.

  1. Inspect the archive to confirm counts, the document-type histogram, and a sample of relations look right. If the type histogram is empty or wrong, your Enrich stage probably did not run.

  2. Query the archive with a handful of real questions and read the SearchHits — score, source path, chunk text, and neighbor ids — to confirm retrieval is grounded before you ship.
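The second step can be turned into a small smoke test. The sketch below assumes only what this page states about the SDK: that a search callable takes a query plus `k` and returns hits exposing `hit.source.path`. The helper `check_retrieval`, the stand-in `fake_search`, and the second test question are hypothetical and exist only so the example runs without an archive.

```python
from types import SimpleNamespace


def check_retrieval(search, cases, k=5):
    """Return the questions whose expected source path is missing from the top-k hits."""
    failures = []
    for question, expected_path in cases:
        paths = [hit.source.path for hit in search(question, k=k)]
        if expected_path not in paths:
            failures.append(question)
    return failures


def fake_search(query, k=5):
    # Stand-in for a real search call: always returns the same single hit.
    hit = SimpleNamespace(
        score=0.842,
        source=SimpleNamespace(path="policies/data/retention.pdf"),
    )
    return [hit][:k]


cases = [
    ("how long is enterprise data retained?", "policies/data/retention.pdf"),
    ("who approves access requests?", "policies/data/access.pdf"),  # made-up question
]
failures = check_retrieval(fake_search, cases)
print(failures)
```

With a loaded space, passing `space.search` in place of `fake_search` should work the same way, provided the hits expose `source.path` as described in this page.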

indx inspect <archive.indx> summarizes an archive without loading it into your own code. By default it prints the aggregate stats, a document-type histogram, and a sample of relations.

indx inspect ./ai-ready/handbook.indx
handbook.indx (indx 0.4.2)
documents 128
chunks 1042
relations 380
embeddings 1042 (bge-m3, dim 1024)
types
policy 40
guide 30
table 12
relations (sample)
references guides/onboarding.md → policies/data/retention.pdf
sibling policies/data/retention.pdf ↔ policies/data/access.pdf
continues chunk_0481 → chunk_0482

Pass --json to emit the complete SpaceStats object as JSON — handy for assertions in CI or piping into jq.

indx inspect ./ai-ready/handbook.indx --json
{
  "documents": 128,
  "chunks": 1042,
  "relations": 380,
  "embeddings": 1042,
  "embed_dim": 1024,
  "types": { "policy": 40, "guide": 30, "table": 12 },
  "bytes_source": 8421340
}

This is exactly the shape of space.stats in the SDK. See the SpaceStats model for every field.
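As a rough illustration of the CI use case, the sketch below runs a few sanity checks over the JSON shown above. The specific rules (embeddings equal to chunks, a non-empty type histogram, at least as many chunks as documents) are assumptions chosen for this example, not guarantees made by indx, and `check_stats` is a hypothetical helper.

```python
import json

# Copied from the `indx inspect --json` sample output above.
SAMPLE_STATS = """
{
  "documents": 128,
  "chunks": 1042,
  "relations": 380,
  "embeddings": 1042,
  "embed_dim": 1024,
  "types": { "policy": 40, "guide": 30, "table": 12 },
  "bytes_source": 8421340
}
"""


def check_stats(stats: dict) -> list[str]:
    """Return a list of problems found; an empty list means the stats look sane."""
    problems = []
    if stats.get("documents", 0) <= 0:
        problems.append("archive contains no documents")
    # Assumed rule: every document yields at least one chunk.
    if stats.get("chunks", 0) < stats.get("documents", 0):
        problems.append("fewer chunks than documents")
    # Assumed rule: one embedding per chunk, as in the sample.
    if stats.get("embeddings") != stats.get("chunks"):
        problems.append("embedding count does not match chunk count")
    if not stats.get("types"):
        problems.append("type histogram is empty (did Enrich run?)")
    return problems


problems = check_stats(json.loads(SAMPLE_STATS))
print(problems or "stats look sane")
```

In a pipeline, the string passed to `json.loads` would be the captured stdout of `indx inspect <archive.indx> --json` rather than an embedded sample.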

--documents lists the documents in the space. Supply an optional type to filter the list to a single detected type — the same set you would get from space.documents(type=...).

# every document
indx inspect ./ai-ready/handbook.indx --documents
# only documents enriched as type "policy"
indx inspect ./ai-ready/handbook.indx --documents policy

space.documents() with no argument returns every Document; passing type= filters by the detected/enriched type string. Each Document carries its path, folder, lineage, topics, tags, summary, chunk_ids, and its resolved references / referenced_by edges — see the data models reference.

indx query <archive.indx> "<text>" embeds your query with the same embedder that built the archive and returns the top matching chunks. The default output is human-readable; --json gives you the structured SearchHit[].

indx query ./ai-ready/handbook.indx "how long is enterprise data retained?"

Each result lists its rank, score, source path, chunk text, and neighbor ids; the exact layout appears under the default human output further down.

| Flag | Type | Default | Description |
|--------|------|---------|-------------|
| -k | int | 5 | Number of hits to return. |
| --type | str | (none) | Restrict results to a single document type. |
| --json | flag | off | Emit SearchHit[] as JSON (including .chunk, .neighbors, .source). |

# top 3 hits, only from "policy" documents
indx query ./ai-ready/handbook.indx "data retention" -k 3 --type policy
# machine-readable output
indx query ./ai-ready/handbook.indx "data retention" --json

In the SDK, -k maps to the k argument of space.search(query, k=5). Type filtering at the SDK level is done by inspecting hit.source.type (or by pre-filtering with space.documents(type=...)).
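A minimal sketch of the first approach, filtering on `hit.source.type` after the search. The helper `filter_by_type` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real SearchHit values so the example runs without an archive; the type assigned to the second stand-in is invented for illustration.

```python
from types import SimpleNamespace


def filter_by_type(hits, doc_type):
    """Keep only the hits whose source document has the given type, preserving rank order."""
    return [hit for hit in hits if hit.source.type == doc_type]


# Stand-in hits for demonstration; real ones would come from space.search(...).
fake_hits = [
    SimpleNamespace(
        score=0.842,
        source=SimpleNamespace(path="policies/data/retention.pdf", type="policy"),
    ),
    SimpleNamespace(
        score=0.791,
        source=SimpleNamespace(path="legal/gdpr.md", type="guide"),  # assumed type
    ),
]

policy_hits = filter_by_type(fake_hits, "policy")
print([hit.source.path for hit in policy_hits])
```

Because the filter runs after retrieval, it can return fewer than `k` hits. One way around that is to search with a larger `k` and trim afterwards, for example `filter_by_type(space.search(query, k=20), "policy")[:3]`.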

Without --json, each hit prints as a rank, a similarity score (higher is better), the source path, the chunk text, and the neighbor ids that bracket it:

1 score 0.842 policies/data/retention.pdf
Enterprise data is retained for 90 days…
neighbors: chunk_0480, chunk_0482
2 score 0.791 legal/gdpr.md
Personal data must not be kept longer than necessary…
neighbors: chunk_0903, chunk_0905

A SearchHit is the unit of retrieval. Every hit bundles the matched chunk, its similarity score, the resolved neighbor chunks, and a convenience accessor for provenance.

| Field | Type | Meaning |
|-------|------|---------|
| hit.chunk | Chunk | The matched chunk: id, text, source, doc_id, index, metadata, neighbors, relations. |
| hit.score | float | Similarity score; higher is better. |
| hit.neighbors | list[Chunk] | The adjacent chunks (prev/next), fully resolved, not just ids. |
| hit.source | Source | Provenance of the matched chunk (.path, .folder, .type). A property that returns hit.chunk.source. |

# Assumes `space` is an already-loaded KnowledgeSpace (see KnowledgeSpace.load).
hit = space.search("data retention", k=1)[0]  # take the single best match
hit.chunk.text                    # the retrievable text payload
hit.chunk.metadata                # enriched topics / summary / tags
hit.score                         # e.g. 0.842
hit.source.path                   # "policies/data/retention.pdf"
hit.source.folder                 # "policies/data"
hit.source.type                   # "policy"
[c.text for c in hit.neighbors]   # surrounding context window

The --json output mirrors this structure exactly:

[
  {
    "chunk": {
      "id": "chunk_0481",
      "text": "Enterprise data is retained for 90 days…",
      "source": { "path": "policies/data/retention.pdf", "folder": "policies/data", "type": "policy" },
      "metadata": { "topics": ["retention", "compliance"], "summary": "90-day retention rule…" },
      "neighbors": ["chunk_0480", "chunk_0482"],
      "relations": [{ "type": "references", "to": "legal/gdpr.md" }]
    },
    "score": 0.842,
    "neighbors": [ { "id": "chunk_0480", "text": "" }, { "id": "chunk_0482", "text": "" } ]
  }
]

Why neighbors and source make retrieval grounded

A bare vector match gives you a floating snippet. A SearchHit gives you a snippet in context:

  • source tells an agent (and you) where the text actually lives — the file path, the folder, and the document type. That is the difference between a citable answer and an unverifiable one, and it lets you filter by type when the same phrase means different things in different document classes.
  • neighbors are the resolved chunks immediately before and after the match. Feeding them alongside the hit gives the model a real context window instead of a sentence ripped out of its paragraph — the single biggest lever against “technically retrieved, contextually wrong” answers.

Because both travel with every hit, you can hand a SearchHit straight to a prompt and get a grounded, traceable response. This is the payoff of the structure that the Chunk and Relate stages preserve.
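One possible way to turn a hit into prompt context is sketched below. It works on a single entry of the `--json` output shown earlier, using only the keys that appear in that sample. The helper `build_context`, the bracketed citation format, and the neighbor text in the demo are assumptions made for this example.

```python
def build_context(hit: dict) -> str:
    """Render one hit from the --json output as a cited context block for a prompt."""
    chunk = hit["chunk"]
    source = chunk["source"]
    lines = [
        f"[source: {source['path']} | type: {source['type']} | score: {hit['score']:.3f}]",
        chunk["text"],
    ]
    for neighbor in hit.get("neighbors", []):
        # Skip neighbors that carry no text, so the prompt holds no empty entries.
        if neighbor.get("text"):
            lines.append(f"(neighbor {neighbor['id']}) {neighbor['text']}")
    return "\n".join(lines)


# Demo hit: keys follow the sample above; the neighbor text is a placeholder.
demo_hit = {
    "chunk": {
        "id": "chunk_0481",
        "text": "Enterprise data is retained for 90 days…",
        "source": {
            "path": "policies/data/retention.pdf",
            "folder": "policies/data",
            "type": "policy",
        },
    },
    "score": 0.842,
    "neighbors": [
        {"id": "chunk_0480", "text": "(placeholder text of the preceding chunk)"},
        {"id": "chunk_0482", "text": ""},
    ],
}

context = build_context(demo_hit)
print(context)
```

Keeping the citation on its own first line makes it easy for the model to quote the path back and for a reviewer to check the answer against the file.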

  • Full flag tables and exit codes: the CLI reference.
  • Every method signature, including KnowledgeSpace.load, space.stats, space.documents, and space.search: the SDK reference.
  • Field-by-field definitions of Chunk, Document, Source, SpaceStats, and SearchHit: the data models reference.
  • Want to run all of this offline? See Local & air-gapped.