Inspecting & Querying a Space
Once you have built a .indx archive, the next step is not to wire it into an
agent — it is to look at it. indx inspect tells you whether the structure
came out the way you expected, and indx query lets you run retrieval by hand
before any production code depends on it. Both commands have exact SDK
equivalents, so the same checks run in a notebook or a test.
The two-step sanity workflow
- Inspect the archive to confirm that the counts, the document-type histogram, and a sample of relations look right. If the type histogram is empty or wrong, your Enrich stage probably did not run.
- Query the archive with a handful of real questions and read the SearchHits (score, source path, chunk text, and neighbor ids) to confirm that retrieval is grounded before you ship.
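Step 2 can be made repeatable by pairing each real question with the file you expect to answer it. The sketch below is illustrative. rank_of_source is a hypothetical helper, not part of indx, and it assumes only that each hit exposes hit.source.path as described under Reading a SearchHit. The stand-in hits let it run without an archive.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def rank_of_source(hits: Iterable, expected_path: str) -> Optional[int]:
    """Return the 1-based rank of the first hit from expected_path, or None.

    Assumes each hit exposes hit.source.path, like the SDK's SearchHit.
    """
    for rank, hit in enumerate(hits, start=1):
        if hit.source.path == expected_path:
            return rank
    return None


# Stand-in hits; with a real archive these would come from, e.g.:
#   space = KnowledgeSpace.load("./ai-ready/handbook.indx")
#   hits = space.search("how long is enterprise data retained?", k=5)
def _hit(path: str, score: float):
    return SimpleNamespace(score=score, source=SimpleNamespace(path=path))


hits = [_hit("policies/data/retention.pdf", 0.842), _hit("legal/gdpr.md", 0.791)]
print(rank_of_source(hits, "legal/gdpr.md"))  # 2
```

A None result for a question you know the archive covers is a signal to re-read the raw hits before shipping.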
Inspecting a space
indx inspect <archive.indx> summarizes an archive without loading it into your
own code. By default it prints the aggregate stats, a document-type histogram,
and a sample of relations.
```shell
indx inspect ./ai-ready/handbook.indx
```

```text
handbook.indx  (indx 0.4.2)
documents    128
chunks       1042
relations    380
embeddings   1042  (bge-m3, dim 1024)

types
policy   40
guide    30
table    12
…

relations (sample)
references  guides/onboarding.md → policies/data/retention.pdf
sibling     policies/data/retention.pdf ↔ policies/data/access.pdf
continues   chunk_0481 → chunk_0482
```

The SDK equivalent:

```python
from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")
stats = space.stats

print(stats.documents, stats.chunks, stats.relations)
print(stats.embeddings, stats.embed_dim)  # 1042, 1024
print(stats.types)                        # {"policy": 40, "guide": 30, ...}
```

--json: the full space.stats
Pass --json to emit the complete SpaceStats object as JSON, which is handy for assertions in CI or for piping into jq.

```shell
indx inspect ./ai-ready/handbook.indx --json
```

```json
{
  "documents": 128,
  "chunks": 1042,
  "relations": 380,
  "embeddings": 1042,
  "embed_dim": 1024,
  "types": { "policy": 40, "guide": 30, "table": 12 },
  "bytes_source": 8421340
}
```

This is exactly the shape of space.stats in the SDK. See the
SpaceStats model for every field.
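Because the --json output is plain JSON, the step 1 checks from the sanity workflow can run as a CI gate. Here is a minimal sketch, assuming the field names shown above. stats_problems and its thresholds are illustrative choices, not indx behavior.

```python
import json


def stats_problems(stats_json: str, min_documents: int = 1) -> list[str]:
    """Return human-readable problems found in `indx inspect --json` output.

    Field names follow the SpaceStats JSON above; the thresholds are
    illustrative, not indx defaults.
    """
    stats = json.loads(stats_json)
    problems = []
    if stats.get("documents", 0) < min_documents:
        problems.append(f"only {stats.get('documents', 0)} documents")
    if stats.get("chunks", 0) == 0:
        problems.append("no chunks")
    if not stats.get("types"):
        # An empty histogram usually means the Enrich stage did not run.
        problems.append("empty type histogram: did the Enrich stage run?")
    return problems


sample = (
    '{"documents": 128, "chunks": 1042, "relations": 380, "embeddings": 1042,'
    ' "embed_dim": 1024, "types": {"policy": 40, "guide": 30, "table": 12},'
    ' "bytes_source": 8421340}'
)
print(stats_problems(sample))  # []
```

In CI you might capture the command's stdout with subprocess.run(["indx", "inspect", path, "--json"], capture_output=True, text=True).stdout and fail the job when the returned list is non-empty.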
--documents [type]: list documents
--documents lists the documents in the space. Supply an optional type to
filter the list to a single detected type, which gives the same set you would get from
space.documents(type=...).

```shell
# every document
indx inspect ./ai-ready/handbook.indx --documents

# only documents enriched as type "policy"
indx inspect ./ai-ready/handbook.indx --documents policy
```

```python
from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")

for doc in space.documents(type="policy"):
    print(doc.id, doc.path, doc.topics, doc.summary)
```

space.documents() with no argument returns every Document; passing type=
filters by the detected/enriched type string. Each Document carries its
path, folder, lineage, topics, tags, summary, chunk_ids, and its
resolved references / referenced_by edges. See the
data models reference.
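Each Document carries its topics and summary, so the same listing can flag documents that Enrich appears to have skipped. Here is a minimal sketch, assuming documents from space.documents() expose .id, .path, .topics, and .summary as listed above. DocStub and the document ids are hypothetical stand-ins that let the example run without an archive.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class DocStub:
    """Hypothetical stand-in for indx's Document, with only the fields used here."""
    id: str
    path: str
    topics: list = field(default_factory=list)
    summary: Optional[str] = None


def missing_enrichment(docs: Iterable) -> list:
    """Return (id, path) pairs for documents with no topics or no summary."""
    return [(d.id, d.path) for d in docs if not d.topics or not d.summary]


docs = [
    DocStub("doc_001", "policies/data/retention.pdf",
            ["retention", "compliance"], "90-day retention rule…"),
    DocStub("doc_002", "guides/onboarding.md"),
]
print(missing_enrichment(docs))  # [('doc_002', 'guides/onboarding.md')]
```

With a real archive you would pass space.documents() (or space.documents(type="policy")) instead of the stub list.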
Querying a space
indx query <archive.indx> "<text>" embeds your query with the same embedder
that built the archive and returns the top matching chunks. The default output
is human-readable; --json gives you the structured SearchHit[].

```shell
indx query ./ai-ready/handbook.indx "how long is enterprise data retained?"
```

See Default human output below for the exact shape of each printed hit (rank, score, source path, chunk text, neighbor ids).

```python
from indx import KnowledgeSpace

space = KnowledgeSpace.load("./ai-ready/handbook.indx")

for hit in space.search("how long is enterprise data retained?", k=5):
    print(f"{hit.score:.3f}  {hit.source.path}")
    print(hit.chunk.text)
    print("context:", [c.id for c in hit.neighbors])
```

Query flags
| Flag | Type | Default | Description |
|--------|------|---------|-------------|
| -k | int | 5 | Number of hits to return. |
| --type | str | — | Restrict results to a single document type. |
| --json | flag | off | Emit SearchHit[] as JSON (including .chunk, .neighbors, .source). |
```shell
# top 3 hits, only from "policy" documents
indx query ./ai-ready/handbook.indx "data retention" -k 3 --type policy

# machine-readable output
indx query ./ai-ready/handbook.indx "data retention" --json
```

In the SDK, -k maps to the k argument of
space.search(query, k=5). Type filtering at the SDK level is done by
inspecting hit.source.type (or by pre-filtering with
space.documents(type=...)).
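SDK-level type filtering happens after retrieval. A filter that should return k matching hits therefore needs a larger candidate set to work from. Here is a minimal sketch, assuming hits expose hit.source.type. filter_hits_by_type, the 4x over-fetch mentioned in its docstring, and the "legal" type on the stand-in GDPR hit are all hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable


def filter_hits_by_type(hits: Iterable, doc_type: str, k: int) -> list:
    """Keep hits whose source document type equals doc_type, up to k of them.

    Filtering runs after retrieval, so request more than k candidates
    (for example space.search(query, k=4 * k)) or the result may come up short.
    """
    kept = [hit for hit in hits if hit.source.type == doc_type]
    return kept[:k]


def _hit(path: str, doc_type: str, score: float):
    # Stand-in for a SearchHit with only the fields the filter reads.
    return SimpleNamespace(score=score, source=SimpleNamespace(path=path, type=doc_type))


hits = [
    _hit("policies/data/retention.pdf", "policy", 0.842),
    _hit("legal/gdpr.md", "legal", 0.791),
    _hit("policies/data/access.pdf", "policy", 0.703),
]
print([h.source.path for h in filter_hits_by_type(hits, "policy", k=3)])
# ['policies/data/retention.pdf', 'policies/data/access.pdf']
```

Nothing here guarantees that k hits come back. If few of the candidates match, the list is shorter.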
Default human output
Without --json, each hit prints as a rank, a similarity score (higher is
better), the source path, the chunk text, and the neighbor ids that bracket it:

```text
1  score 0.842  policies/data/retention.pdf
   Enterprise data is retained for 90 days…
   neighbors: chunk_0480, chunk_0482

2  score 0.791  legal/gdpr.md
   Personal data must not be kept longer than necessary…
   neighbors: chunk_0903, chunk_0905
```

Reading a SearchHit
A SearchHit is the unit of retrieval. Every hit bundles the matched chunk, its
similarity score, the resolved neighbor chunks, and a convenience accessor for
provenance.
| Field | Type | Meaning |
|-------|------|---------|
| hit.chunk | Chunk | The matched chunk: id, text, source, doc_id, index, metadata, neighbors, relations. |
| hit.score | float | Similarity score; higher is better. |
| hit.neighbors | list[Chunk] | The adjacent chunks (prev/next), fully resolved — not just ids. |
| hit.source | Source | Provenance of the matched chunk (.path, .folder, .type). A property that returns hit.chunk.source. |
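The following sketch shows how the fields above map onto the CLI's default output by approximately re-creating one printed hit from a SearchHit. The real CLI's spacing may differ. render_hit is a hypothetical helper, and the SimpleNamespace object is a stand-in for a real hit. In the SDK, hit.source is a property over hit.chunk.source rather than a separate field.

```python
from types import SimpleNamespace


def render_hit(rank: int, hit) -> str:
    """Approximate the CLI's default per-hit output from SearchHit fields."""
    neighbor_ids = ", ".join(c.id for c in hit.neighbors)  # resolved Chunks -> ids
    return "\n".join([
        f"{rank}  score {hit.score:.3f}  {hit.source.path}",
        f"   {hit.chunk.text}",
        f"   neighbors: {neighbor_ids}",
    ])


hit = SimpleNamespace(
    score=0.842,
    source=SimpleNamespace(path="policies/data/retention.pdf"),
    chunk=SimpleNamespace(id="chunk_0481", text="Enterprise data is retained for 90 days…"),
    neighbors=[SimpleNamespace(id="chunk_0480"), SimpleNamespace(id="chunk_0482")],
)
print(render_hit(1, hit))
```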
```python
hit = space.search("data retention", k=1)[0]

hit.chunk.text                   # the retrievable text payload
hit.chunk.metadata               # enriched topics / summary / tags
hit.score                        # e.g. 0.842
hit.source.path                  # "policies/data/retention.pdf"
hit.source.folder                # "policies/data"
hit.source.type                  # "policy"
[c.text for c in hit.neighbors]  # surrounding context window
```

The --json output mirrors this structure exactly:
```json
[
  {
    "chunk": {
      "id": "chunk_0481",
      "text": "Enterprise data is retained for 90 days…",
      "source": {
        "path": "policies/data/retention.pdf",
        "folder": "policies/data",
        "type": "policy"
      },
      "metadata": {
        "topics": ["retention", "compliance"],
        "summary": "90-day retention rule…"
      },
      "neighbors": ["chunk_0480", "chunk_0482"],
      "relations": [{ "type": "references", "to": "legal/gdpr.md" }]
    },
    "score": 0.842,
    "neighbors": [
      { "id": "chunk_0480", "text": "…" },
      { "id": "chunk_0482", "text": "…" }
    ]
  }
]
```

Why neighbors and source make retrieval grounded
A bare vector match gives you a floating snippet. A SearchHit gives you a
snippet in context:
- source tells an agent (and you) where the text actually lives: the file path, the folder, and the document type. That is the difference between a citable answer and an unverifiable one, and it lets you filter by type when the same phrase means different things in different document classes.
- neighbors are the resolved chunks immediately before and after the match. Feeding them alongside the hit gives the model a real context window instead of a sentence ripped out of its paragraph. This is the single biggest lever against “technically retrieved, contextually wrong” answers.
Because both travel with every hit, you can hand a SearchHit straight to a
prompt and get a grounded, traceable response. This is the payoff of the
structure that the Chunk and Relate
stages preserve.
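Handing a hit to a prompt can be sketched as a function that writes the provenance first, then joins the match with its neighbors in reading order. This sketch assumes Chunk.index is the chunk's position within its document. Sorting by it then puts the previous neighbor before the match and the next one after. grounded_context and its bracketed header format are hypothetical, the index values are made up, and the neighbor texts are placeholders.

```python
from types import SimpleNamespace


def grounded_context(hit) -> str:
    """Build a citable context block from one SearchHit.

    Assumes Chunk.index orders chunks within a document, so sorting the
    match together with its resolved neighbors restores reading order.
    """
    window = sorted([*hit.neighbors, hit.chunk], key=lambda c: c.index)
    body = "\n".join(c.text for c in window)
    src = hit.source
    return f"[source: {src.path} | folder: {src.folder} | type: {src.type}]\n{body}"


def _chunk(cid: str, index: int, text: str):
    # Stand-in for a resolved Chunk with only the fields used above.
    return SimpleNamespace(id=cid, index=index, text=text)


hit = SimpleNamespace(
    score=0.842,
    chunk=_chunk("chunk_0481", 12, "Enterprise data is retained for 90 days…"),
    # Deliberately out of order to show that sorting by index fixes it.
    neighbors=[_chunk("chunk_0482", 13, "(next chunk text)"),
               _chunk("chunk_0480", 11, "(previous chunk text)")],
    source=SimpleNamespace(path="policies/data/retention.pdf",
                           folder="policies/data", type="policy"),
)
print(grounded_context(hit))
```

Keeping the path in the header gives the model something concrete to cite, and the sorted window is what turns a snippet into a passage.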
Where to go next
- Full flag tables and exit codes: the CLI reference.
- Every method signature, including KnowledgeSpace.load, space.stats, space.documents, and space.search: the SDK reference.
- Field-by-field definitions of Chunk, Document, Source, SpaceStats, and SearchHit: the data models reference.
- Want to run all of this offline? See Local & air-gapped.