# Matsya - spec_0.1a (rigorous RAG for theory + translation)

matsya #rag #correctness #translation #theory #verification
This document specifies the first correctness-oriented architecture for Matsya as a Bellman calculus research and translation assistant.
It is distinct from the earlier deployment note `development/dolo-plus-impl/spec_0.1k-rag-assistant.md`:

- `0.1k` is mainly about standing up the assistant in the existing hosting stack.
- `0.1a` is about how Matsya should be built if we want high-confidence theory answers, high-confidence YAML <-> formal MDP translation, and a clean path toward future training on verified examples.
The motivating use cases are already visible in the current docs:
- theory and research Q&A over Bellman calculus, DDSL, category-theoretic foundations, and dynamic programming
- strict dolo-plus YAML <-> formal MDP translation
- refinement / round-tripping for higher confidence
- runtime de-sugaring / interpretation support in development workflows
See:

- `AI/matsya/matsya.md`
- `AI/context/matsya-context.md`
- `development/research-notes/matsya-desugar-whisperer.md`
## 0. Implementation location
This spec lives in bellman-ddsl because that repository contains the canonical theory, spec, examples, and curated context that Matsya must read.
However, Matsya itself is implemented as part of the HARK_ask-your-project stack, not as a runtime subsystem inside bellman-ddsl.
Current local implementation target:
/Users/akshayshanker/Desktop/econark/HARK_ask-your-project/
Accordingly, the boundary is:
`bellman-ddsl`:

- canonical docs/specs/examples
- curated RAG context
- development notes describing desired Matsya behavior

`HARK_ask-your-project`:

- retrieval/index build pipeline
- vector store / metadata store
- system prompts
- UI / CLI / deployment
- runtime answer assembly and tool orchestration
Implementation-facing files currently visible in HARK_ask-your-project include:

- `config/repositories.yml`
- `config/system_prompts/Bellman-DDSL.md`
- `scripts/build/build_final_index.py`
- `scripts/build/repository_discovery.py`
- `scripts/rag-cli/rag`
- `scripts/rag-cli/matsya-local`
- `app.py`
So this document should be read as:

- a requirements/specification document stored in `bellman-ddsl`, but
- a document whose runtime/build obligations are primarily implemented in `HARK_ask-your-project`.
## 1. Goal
Matsya should support two primary tasks:
- Theory mode: help users understand the mathematical and semantic structure of BE-DDSL, including theory that is still under development.
- Translation mode: translate between dolo-plus YAML and formal mathematical MDP writeups with strong guarantees of structural correctness.
The core requirement is correctness before fluency. If Matsya lacks evidence, or if the retrieved sources disagree, it should expose the uncertainty rather than synthesize a smooth but unsupported answer.
## 2. Modes and correctness targets
| Mode | Primary output | Correctness target |
|---|---|---|
| Theory / research Q&A | Explanation with equations and citations | Every substantive claim is grounded in retrieved sources; unresolved tensions are surfaced explicitly |
| Translation: YAML -> formal MDP | Structured mathematical writeup | Output matches retrieved syntax/semantic rules and the input YAML; no invented fields or operators |
| Translation: formal MDP -> YAML | Valid dolo-plus stage YAML | Output passes parser/schema/consistency checks |
| Refine / repair | Improved YAML or writeup plus diff | Each iteration reduces inconsistency; fixed-point or explicit failure reason |
| Development-theory mode | Explanation of provisional or dev-spec semantics | Clearly separate canonical sources from provisional dev notes |
The key asymmetry is:
- In translation mode, correctness can often be checked mechanically.
- In theory mode, correctness is mostly evidence-grounded rather than fully formal, unless and until we attach formal proof artifacts.
## 3. Design principles

### 3.1 Canonical-first retrieval
Matsya should prefer sources in this order:
- canonical docs/spec/reference pages
- canonical example YAMLs and grammar/code sources
- development specs and working notes
- curated external theory and literature
This avoids a common failure mode where draft notes or prompt mirrors out-rank the actual project canon.
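As a minimal sketch of this ordering, assuming retrieved candidates carry hypothetical `tier` and `score` fields (neither is an existing Matsya data structure), canon-first ranking is a compound sort key:

```python
def canonical_first(candidates: list[dict]) -> list[dict]:
    """Order retrieved candidates canon-first: lower trust tier wins,
    ties broken by similarity score. 'tier' and 'score' are assumed keys."""
    return sorted(candidates, key=lambda c: (c["tier"], -c["score"]))
```

A high-scoring draft note then never outranks a canonical spec page, which is exactly the failure mode this principle guards against.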
### 3.2 Structure fidelity
Do not flatten math-heavy documents into generic text windows if a stronger document structure is available.
For Matsya, the native structure is often the meaning-bearing structure:
- a section / subsection in a theory page
- a theorem / definition / proposition block in TeX
- a YAML block such as `symbols`, `equations`, `cntn_to_dcsn_mover`
- a grammar rule or parser normalization note
Recent academic-RAG work argues for preserving this hierarchy rather than destroying it during chunking. This is especially important for mathematical and specification-heavy corpora.
### 3.3 Evidence before synthesis
Answer generation should happen only after Matsya has assembled an evidence packet:
- retrieved chunks
- reranked support
- source identities
- trust tier
- local contradictions or missing pieces
This is more important than raw model size.
### 3.4 Machine checking whenever possible
If a claim can be checked by a parser, schema validator, round-trip procedure, or compiler-like consistency pass, Matsya should use that route rather than relying on free-form generation.
### 3.5 Fail safely
For this project, "I cannot verify that from the current sources" is a better answer than a polished hallucination.
## 4. Corpus policy
The corpus should be organized into trust tiers. AI/context/matsya-context.md is the curation policy for what to ingest.
### Tier 1: project canon

- `docs/`
- `packages/dolo/examples/models/doloplus/`
- `packages/dolang/dolang/`
- selected reference pages such as symbol conventions, schema, and syntax-semantic rules
### Tier 2: active development sources

- `AI/working/docs/`
- `development/dolo-plus-impl/`
- selected research notes that explain active design decisions
### Tier 3: curated external theory

- `AI/context/literature/`
- curated modular-MDP theory sources
- selected Dyn-X / StageClass theory notes
### Tier 4: optional large or noisy sources
- PDFs
- exploratory notebooks
- large external codebases
Tier 4 should be used sparingly because it often hurts retrieval precision.
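The tier policy above can be sketched as a path-prefix lookup. This is an illustrative assumption, not an existing Matsya function; the prefixes mirror the tier listings in this section.

```python
# Hypothetical mapping from source path to trust tier (section 4 tiers).
TIER_PREFIXES = [
    (1, ("docs/", "packages/dolo/examples/models/doloplus/",
         "packages/dolang/dolang/")),
    (2, ("AI/working/docs/", "development/dolo-plus-impl/")),
    (3, ("AI/context/literature/",)),
]

def assign_tier(path: str) -> int:
    """Return the trust tier for a source path; anything unmatched falls
    into tier 4 (optional large or noisy sources)."""
    for tier, prefixes in TIER_PREFIXES:
        if path.startswith(prefixes):
            return tier
    return 4
```

Defaulting unknown paths to tier 4 keeps the sparingly-used bucket as the catch-all, so nothing silently inherits canonical status.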
## 5. Document preparation and chunking

### 5.1 Chunk by semantic unit, not only by token count
Recommended chunking policy:
- Markdown: split by heading; keep equations and nearby explanatory prose together
- YAML: split by semantic block (`symbols`, `equations`, sub-blocks such as `V[>]`, `dcsn_to_cntn_transition`, `Bellman`)
- TeX: split by `section`, `subsection`, `definition`, `theorem`, `proposition`, and equation neighborhoods
- Code / grammar: split by parser rule, transformation function, or validation routine
Fixed-size windows may still be used inside very large sections, but only after the semantic boundary has been respected.
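For the markdown case, the policy can be sketched as a heading-first splitter with a fixed-size fallback applied only inside oversized sections. This is a generic illustration, not Matsya's actual chunker.

```python
import re

def chunk_markdown(text: str, max_chars: int = 4000) -> list[dict]:
    """Split markdown at headings so equations stay with their nearby
    prose; apply fixed windows only within an oversized section."""
    sections, current, heading = [], [], "(preamble)"
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):           # a heading starts a new section
            if current:
                sections.append({"heading": heading, "body": "\n".join(current)})
            heading, current = line.lstrip("# ").strip(), []
        else:
            current.append(line)
    if current:
        sections.append({"heading": heading, "body": "\n".join(current)})
    chunks = []
    for sec in sections:                            # secondary size-based split
        body = sec["body"]
        for i in range(0, max(len(body), 1), max_chars):
            chunks.append({"heading": sec["heading"],
                           "body": body[i:i + max_chars]})
    return chunks
```

The key property is ordering: the semantic boundary is respected first, and token/character budgets only subdivide within it.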
### 5.2 Preserve metadata
Each chunk should carry at least:
- source path
- heading / section path
- document type (`spec`, `theory`, `yaml-example`, `grammar`, `working-note`, `literature`)
- trust tier
- key symbols / operators mentioned
- model family / example family where relevant
### 5.3 Prefer source text over PDFs
For mathematics, native .tex and .md files are strongly preferred over PDF extraction. If both exist, Matsya should rank the source files above the PDF text.
### 5.4 De-duplicate mirrors
Prompt-context mirrors, repeated exports, and copied external folders should not all be indexed equally. Duplicate content lowers retrieval quality and inflates spurious confidence.
## 6. Retrieval architecture

### 6.1 One corpus, multiple retrieval profiles
Matsya can share one underlying index, but retrieval should be profile-specific.
#### Theory profile
Prioritize:
- theory docs
- syntax-semantic rules
- symbol conventions
- curated literature and foundational notes
#### Translation profile
Prioritize:
- schema/spec pages
- canonical example YAMLs
- Dolang grammar / parser behavior
- working notes that record active syntax decisions
#### Development-theory profile
Prioritize:
- dev specs
- research notes
- active working notes
But the final answer must label claims as:
- canonical
- development / provisional
- unresolved / ambiguous
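One lightweight way to realize profile-specific retrieval over a shared index is per-profile doc-type boosts. The weights and labels below are assumptions for illustration (doc types follow section 5.2); the real values belong in configuration.

```python
# Hypothetical doc-type boost tables, one per retrieval profile.
PROFILES = {
    "theory":      {"theory": 2.0, "spec": 1.5, "literature": 1.2, "working-note": 0.8},
    "translation": {"spec": 2.0, "yaml-example": 2.0, "grammar": 1.5, "working-note": 1.0},
    "dev-theory":  {"working-note": 2.0, "spec": 1.2, "theory": 1.2},
}

def profile_score(base_score: float, doc_type: str, profile: str) -> float:
    """Rescale a raw retrieval score by the active profile's boost;
    unlisted doc types keep their base score."""
    return base_score * PROFILES[profile].get(doc_type, 1.0)
```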
### 6.2 Hybrid retrieval
Matsya should not rely on vector search alone.
Recommended stack:
- dense retrieval for semantic recall
- lexical / BM25 retrieval for exact notation, symbol names, and file-local terminology
- fused candidate set
- reranking step before answer generation
This is particularly important for syntax questions such as `V[>][label]`, `argmax_{c}`, or `dcsn_to_cntn_transition`.
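One standard way to build the fused candidate set is reciprocal rank fusion (RRF), which needs only the two rank orderings, not comparable scores. A minimal sketch:

```python
def rrf_fuse(dense: list[str], lexical: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion of two ranked candidate lists:
    score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in (dense, lexical):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF ignores raw scores, it is robust to the incomparable scales of cosine similarity and BM25, which is why it is a common default fusion step.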
### 6.3 Reranking
Use a reranker to improve precision after recall:
- cross-encoder reranker if available
- otherwise an LLM scoring pass over candidate chunks
The reranker should favor:
- definition-bearing chunks
- local equation neighborhoods
- canonical sources over drafts when both are relevant
### 6.4 Query expansion and adaptive retrieval
For hard questions, retrieval should be allowed to adapt:
- reformulate the query
- retrieve by symbols and by meaning
- widen from exact operator names to surrounding semantic concepts
- retrieve additional support if the first pass is thin or contradictory
This is closer to scientific-RAG and proof-search systems than to a static one-shot retriever.
### 6.5 Retrieve examples, not just definitions
For translation and proof-like tasks, Matsya should retrieve:
- relevant definitions
- similar YAML examples
- similar formal writeups
- nearby or analogous solved cases
The closest analog from theorem-proving systems is retrieving both premises and similar proofs from the same project.
## 7. Answer synthesis

### 7.1 Theory mode
Theory answers should be:
- grounded in retrieved sources
- citation-rich
- explicit about what is canonical vs provisional
- conservative about unstated claims
Long answers should use an outline-guided synthesis pass so that the final answer is coherent rather than a bag of retrieved snippets.
### 7.2 Translation mode
Translation should be treated as a constrained transduction task:
- retrieval seeds conventions and examples
- the generator proposes a candidate output
- validators check the output
- refinement repairs the output if needed
This is different from ordinary prose QA. The output must satisfy structural rules, not merely sound plausible.
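The propose-validate-refine loop above can be sketched as a small control skeleton. The two callables stand in for Matsya's generator and validator stack (both hypothetical here); the loop stops at a fixed point (no errors) or after a bounded number of repairs.

```python
from typing import Callable

def refine_loop(propose: Callable[[str], str],
                validate: Callable[[str], list[str]],
                candidate: str,
                max_iters: int = 3) -> tuple[str, list[str]]:
    """Constrained transduction: validators gate the candidate output and
    their error messages are fed back to the proposer for repair."""
    errors = validate(candidate)
    for _ in range(max_iters):
        if not errors:                      # fixed point: output is valid
            break
        candidate = propose("; ".join(errors))
        errors = validate(candidate)
    return candidate, errors                # non-empty errors = explicit failure
```

Returning the residual error list (rather than raising) matches the spec's requirement of a "fixed-point or explicit failure reason".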
### 7.3 Development-theory mode
When the question touches active design work, Matsya should surface disagreement directly. For example:
- what the canonical spec says
- what the current dev-spec says
- whether the point is still marked as ambiguous
This is essential for a project where the semantics are still evolving.
## 8. Verification stack

### 8.1 Translation verification
For generated YAML, Matsya should aim for the following checks:
- YAML parse success
- Dolang parse success for embedded equations
- schema / symbol-group validation
- internal consistency checks:
- declared symbols vs used symbols
- perch-tag consistency
- branch-label consistency
- operator/binder consistency
- optional round-trip refinement:
- YAML -> formal MDP -> YAML
- formal MDP -> YAML -> formal MDP
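The declared-vs-used symbol check is the simplest of these passes to illustrate. The sketch below extracts identifiers from equation strings with a bare regex, which is a deliberate simplification of real Dolang parsing; the function whitelist is also an assumption.

```python
import re

def check_symbols(declared: set[str], equations: list[str]) -> dict[str, set[str]]:
    """Declared-vs-used consistency pass over equation strings.
    Regex tokenization is a stand-in for proper Dolang parsing."""
    used: set[str] = set()
    for eq in equations:
        used |= set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", eq))
    builtins = {"max", "min", "exp", "log"}      # assumed function whitelist
    return {
        "undeclared": used - declared - builtins,  # used but never declared
        "unused": declared - used,                 # declared but never used
    }
```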
### 8.2 Theory verification
Theory answers cannot usually be machine-proved at v0.1a, but they can still be checked for evidential rigor:
- every nontrivial claim must be supportable by retrieved chunks
- equations should be copied or paraphrased only when the source clearly supports them
- if sources conflict, Matsya must say so
- if evidence is weak, Matsya must lower its confidence
### 8.3 Correctness ladder
Matsya outputs should be thought of in levels:
- Retrieved: relevant sources were found
- Grounded: answer explicitly relies on those sources
- Parser-valid: output syntax is accepted
- Schema-valid: output satisfies structural constraints
- Round-trip stable: output survives refinement / round-trip
- Human-reviewed: output accepted as canonical example
Translation mode should push as high on this ladder as possible.
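Because the ladder is ordered, it can be encoded directly as an ordered enum, with the achieved level being the highest contiguous rung whose check passed. The check names below are illustrative.

```python
from enum import IntEnum

class Level(IntEnum):
    """The correctness ladder of section 8.3 as an ordered scale."""
    RETRIEVED = 1
    GROUNDED = 2
    PARSER_VALID = 3
    SCHEMA_VALID = 4
    ROUND_TRIP_STABLE = 5
    HUMAN_REVIEWED = 6

def achieved_level(checks: dict[str, bool]) -> Level:
    """Highest contiguous rung whose check passed; a failed rung blocks
    all rungs above it, even if they would pass in isolation."""
    level = Level.RETRIEVED
    for rung in list(Level):
        if checks.get(rung.name.lower(), False):
            level = rung
        else:
            break
    return level
```

The contiguity rule reflects the ladder's intent: a schema-valid output that was never grounded should not be reported as schema-valid.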
## 9. Example bank and future training
Over time, Matsya should accumulate a verified example bank.
Each example should carry metadata such as:
- source model family
- translation direction
- parser-valid
- schema-valid
- round-trip converged
- human-reviewed
- retrieved supporting sources
These examples serve two purposes:
- better retrieval at inference time
- future fine-tuning data
Training should happen only on examples that have passed validation. Retrieval improvement should generally come before model fine-tuning.
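The validation gate can be expressed as a simple filter over the example metadata listed above; the flag names are assumptions mirroring that list.

```python
def trainable(example: dict) -> bool:
    """Gate for the future fine-tuning set: only examples that passed
    every mechanical check and human review qualify."""
    required = ("parser_valid", "schema_valid",
                "round_trip_converged", "human_reviewed")
    return all(example.get(flag, False) for flag in required)
```

Missing flags count as failures, so partially-annotated examples are excluded by default rather than accidentally trained on.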
## 10. Evaluation
Matsya should be evaluated as a technical assistant with correctness constraints, not as a generic chatbot.
### 10.1 Theory benchmark
Build a benchmark of questions about:
- Bellman calculus foundations
- perch semantics
- syntax and notation
- branching and composition
- development-time ambiguities
Metrics:
- citation precision
- canonical-source hit rate
- unsupported-claim rate
- contradiction detection rate
- human judgment on correctness and clarity
### 10.2 Translation benchmark
Build a benchmark of:
- YAML -> formal MDP tasks
- formal model -> YAML tasks
- repair / refine tasks
Metrics:
- parse pass rate
- schema pass rate
- round-trip convergence
- human acceptance
- failure mode classification
### 10.3 Retrieval benchmark
Measure retrieval directly:
- does the top-k contain the canonical source?
- does the retrieved set contain the needed example?
- are noisy prompt-draft chunks excluded?
Poor retrieval should not be confused with poor generation.
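The first retrieval question above reduces to a hit-rate metric, sketched here under the assumption of one gold canonical source per query:

```python
def canonical_hit_rate(results: list[list[str]],
                       gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose top-k retrieved list contains the
    expected canonical source (one gold path per query)."""
    hits = sum(1 for ranked, g in zip(results, gold) if g in ranked[:k])
    return hits / len(gold)
```

Measuring this directly, separately from answer quality, is what keeps retrieval failures from being misdiagnosed as generation failures.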
## 11. Immediate implementation implications

All implementation tasks in this section are primarily tasks for the HARK_ask-your-project codebase and deployment stack. The corresponding role of bellman-ddsl is to provide the corpus, canon, examples, and evaluation cases that Matsya indexes and reasons over.
### 11.1 Now

- curate the corpus aggressively using `AI/context/matsya-context.md`
- add trust-tier metadata
- introduce profile-specific retrieval
- add reranking
- keep structure-preserving chunks for specs, TeX, and YAML
Concrete implementation targets in HARK_ask-your-project will likely include:
- `config/repositories.yml` for repository groups, weights, and indexing options
- `scripts/build/build_final_index.py` and related build utilities for chunking / metadata / retrieval profiles
- `config/system_prompts/Bellman-DDSL.md` for mode behavior and answer discipline
- `app.py` and CLI entry points for profile selection, citations, and verification-facing UX
### 11.2 Next
- add machine validators to translation mode
- formalize round-trip refinement as a correctness pass
- build a benchmark set of theory and translation tasks
### 11.3 Later
- collect verified examples and repair traces
- train retrieval components on project-local examples
- consider fine-tuning generation only after validators and evals are stable
- explore stronger proof / theorem-verification hooks for theory mode
## 12. Relation to existing Matsya notes

- `development/dolo-plus-impl/spec_0.1k-rag-assistant.md` - deployment and hosting note
- `development/research-notes/matsya-desugar-whisperer.md` - Matsya as a de-sugaring / interpretation layer in the solver pipeline
- `AI/context/matsya-context.md` - corpus inclusion / exclusion policy
- `AI/matsya/matsya.md` - user-facing modes and operational instructions
## 13. External references informing this spec
- PT-RAG: structure-fidelity retrieval for academic papers
- https://arxiv.org/abs/2602.13647v1
- SciRAG: adaptive, citation-aware, outline-guided scientific synthesis
- https://arxiv.org/abs/2511.14362
- PaperQA2: high-accuracy scientific-document RAG with evidence gathering and reranking
- https://github.com/Future-House/paper-qa
- Rango: retrieval-augmented proving with similar proofs + premises
- https://arxiv.org/abs/2412.14063
- RAG-Verus: repository-level retrieval for verification tasks
- https://arxiv.org/abs/2502.05344
- AWS guidance on context engineering / writing for RAG
- https://docs.aws.amazon.com/prescriptive-guidance/latest/writing-best-practices-rag/introduction.html
## 14. Bottom line
Matsya should be built as a retrieval + verification system, not just a prompt over a vector store.
For this project:
- theory mode needs evidence discipline
- translation mode needs compiler-like validation
- future training should be based on verified examples, not raw generated text
That is the minimal path to a research assistant that is actually useful for BE-DDSL development.