Matsya - spec_0.1a (rigorous RAG for theory + translation)

matsya #rag #correctness #translation #theory #verification

This document specifies the first correctness-oriented architecture for Matsya as a Bellman calculus research and translation assistant.

It is distinct from the earlier deployment note development/dolo-plus-impl/spec_0.1k-rag-assistant.md:

  • 0.1k is mainly about standing up the assistant in the existing hosting stack.
  • 0.1a is about how Matsya should be built if we want high-confidence theory answers, high-confidence YAML <-> formal MDP translation, and a clean path toward future training on verified examples.

The motivating use cases are already visible in the current docs:

  • theory and research Q&A over Bellman calculus, DDSL, category-theoretic foundations, and dynamic programming
  • strict dolo-plus YAML <-> formal MDP translation
  • refinement / round-tripping for higher confidence
  • runtime de-sugaring / interpretation support in development workflows

See:

  • AI/matsya/matsya.md
  • AI/context/matsya-context.md
  • development/research-notes/matsya-desugar-whisperer.md

0. Implementation location

This spec lives in bellman-ddsl because that repository contains the canonical theory, spec, examples, and curated context that Matsya must read.

However, Matsya itself is implemented as part of the HARK_ask-your-project stack, not as a runtime subsystem inside bellman-ddsl.

Current local implementation target:

  • /Users/akshayshanker/Desktop/econark/HARK_ask-your-project/

Accordingly, the boundary is:

  • bellman-ddsl:
      • canonical docs/specs/examples
      • curated RAG context
      • development notes describing desired Matsya behavior
  • HARK_ask-your-project:
      • retrieval/index build pipeline
      • vector store / metadata store
      • system prompts
      • UI / CLI / deployment
      • runtime answer assembly and tool orchestration

Implementation-facing files currently visible in HARK_ask-your-project include:

  • config/repositories.yml
  • config/system_prompts/Bellman-DDSL.md
  • scripts/build/build_final_index.py
  • scripts/build/repository_discovery.py
  • scripts/rag-cli/rag
  • scripts/rag-cli/matsya-local
  • app.py

So this document should be read as:

  • a requirements/specification document stored in bellman-ddsl, but
  • a document whose runtime/build obligations are primarily implemented in HARK_ask-your-project.

1. Goal

Matsya should support two primary tasks:

  1. Theory mode: help users understand the mathematical and semantic structure of BE-DDSL, including theory that is still under development.
  2. Translation mode: translate between dolo-plus YAML and formal mathematical MDP writeups with strong guarantees of structural correctness.

The core requirement is correctness before fluency. If Matsya lacks evidence, or if the retrieved sources disagree, it should expose the uncertainty rather than synthesize a smooth but unsupported answer.

2. Modes and correctness targets

| Mode | Primary output | Correctness target |
| --- | --- | --- |
| Theory / research Q&A | Explanation with equations and citations | Every substantive claim is grounded in retrieved sources; unresolved tensions are surfaced explicitly |
| Translation: YAML -> formal MDP | Structured mathematical writeup | Output matches retrieved syntax/semantic rules and the input YAML; no invented fields or operators |
| Translation: formal MDP -> YAML | Valid dolo-plus stage YAML | Output passes parser/schema/consistency checks |
| Refine / repair | Improved YAML or writeup plus diff | Each iteration reduces inconsistency; fixed point or explicit failure reason |
| Development-theory mode | Explanation of provisional or dev-spec semantics | Clearly separates canonical sources from provisional dev notes |

The key asymmetry is:

  • In translation mode, correctness can often be checked mechanically.
  • In theory mode, correctness is mostly evidence-grounded rather than fully formal, unless and until we attach formal proof artifacts.

3. Design principles

3.1 Canonical-first retrieval

Matsya should prefer sources in this order:

  1. canonical docs/spec/reference pages
  2. canonical example YAMLs and grammar/code sources
  3. development specs and working notes
  4. curated external theory and literature

This avoids a common failure mode in which draft notes or prompt mirrors outrank the actual project canon.

3.2 Structure fidelity

Do not flatten math-heavy documents into generic text windows if a stronger document structure is available.

For Matsya, the native structure is often the meaning-bearing structure:

  • a section / subsection in a theory page
  • a theorem / definition / proposition block in TeX
  • a YAML block such as symbols, equations, cntn_to_dcsn_mover
  • a grammar rule or parser normalization note

Recent academic-RAG work argues for preserving this hierarchy rather than destroying it during chunking. This is especially important for mathematical and specification-heavy corpora.

3.3 Evidence before synthesis

Answer generation should happen only after Matsya has assembled an evidence packet:

  • retrieved chunks
  • reranked support
  • source identities
  • trust tier
  • local contradictions or missing pieces

This is more important than raw model size.

3.4 Machine checking whenever possible

If a claim can be checked by a parser, schema validator, round-trip procedure, or compiler-like consistency pass, Matsya should use that route rather than relying on free-form generation.

3.5 Fail safely

For this project, "I cannot verify that from the current sources" is a better answer than a polished hallucination.

4. Corpus policy

The corpus should be organized into trust tiers. AI/context/matsya-context.md is the curation policy for what to ingest.

Tier 1: project canon

  • docs/
  • packages/dolo/examples/models/doloplus/
  • packages/dolang/dolang/
  • selected reference pages such as symbol conventions, schema, and syntax-semantic rules

Tier 2: active development sources

  • AI/working/
  • docs/development/dolo-plus-impl/
  • selected research notes that explain active design decisions

Tier 3: curated external theory

  • AI/context/literature/
  • curated modular-MDP theory sources
  • selected Dyn-X / StageClass theory notes

Tier 4: optional large or noisy sources

  • PDFs
  • exploratory notebooks
  • large external codebases

Tier 4 should be used sparingly because it often hurts retrieval precision.

5. Document preparation and chunking

5.1 Chunk by semantic unit, not only by token count

Recommended chunking policy:

  • Markdown: split by heading; keep equations and nearby explanatory prose together
  • YAML: split by semantic block (symbols, equations, sub-blocks such as V[>], dcsn_to_cntn_transition, Bellman)
  • TeX: split by section, subsection, definition, theorem, proposition, and equation neighborhoods
  • Code / grammar: split by parser rule, transformation function, or validation routine

Fixed-size windows may still be used inside very large sections, but only after the semantic boundary has been respected.
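The heading-based split for Markdown can be sketched as follows. This is illustrative only, assuming plain `#`-style headings; the function name and output shape are not the actual build-pipeline API.

```python
import re

def chunk_markdown(text: str) -> list[dict]:
    """Split on headings; each chunk keeps its full heading path so
    equations stay with the prose of the section that explains them."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            body = "\n".join(buf).strip()
            if body:
                chunks.append({"heading_path": " / ".join(path), "text": body})
            buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # truncate the path to the parent level, then append this heading
            path[:] = path[: level - 1] + [m.group(2).strip()]
        else:
            buf.append(line)
    flush()
    return chunks
```

A secondary fixed-size window pass would then run inside each oversized chunk, never across a heading boundary.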

5.2 Preserve metadata

Each chunk should carry at least:

  • source path
  • heading / section path
  • document type (spec, theory, yaml-example, grammar, working-note, literature)
  • trust tier
  • key symbols / operators mentioned
  • model family / example family where relevant

5.3 Prefer source text over PDFs

For mathematics, native .tex and .md files are strongly preferred over PDF extraction. If both exist, Matsya should rank the source files above the PDF text.

5.4 De-duplicate mirrors

Prompt-context mirrors, repeated exports, and copied external folders should not all be indexed equally. Duplicate content lowers retrieval quality and inflates spurious confidence.

6. Retrieval architecture

6.1 One corpus, multiple retrieval profiles

Matsya can share one underlying index, but retrieval should be profile-specific.

Theory profile

Prioritize:

  • theory docs
  • syntax-semantic rules
  • symbol conventions
  • curated literature and foundational notes

Translation profile

Prioritize:

  • schema/spec pages
  • canonical example YAMLs
  • Dolang grammar / parser behavior
  • working notes that record active syntax decisions

Development-theory profile

Prioritize:

  • dev specs
  • research notes
  • active working notes

But the final answer must label claims as:

  • canonical
  • development / provisional
  • unresolved / ambiguous

6.2 Hybrid retrieval

Matsya should not rely on vector search alone.

Recommended stack:

  • dense retrieval for semantic recall
  • lexical / BM25 retrieval for exact notation, symbol names, and file-local terminology
  • fused candidate set
  • reranking step before answer generation

This is particularly important for syntax questions such as V[>][label], argmax_{c}, or dcsn_to_cntn_transition.
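A minimal way to fuse the dense and lexical candidate sets is reciprocal rank fusion, sketched below; the constant `k = 60` is the conventional default, not a tuned value.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids via reciprocal rank fusion:
    each list contributes 1/(k + rank) for every id it ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list then goes to the reranker; RRF needs no score calibration between the dense and BM25 retrievers, which is why it is a common default for hybrid stacks.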

6.3 Reranking

Use a reranker to improve precision after recall:

  • cross-encoder reranker if available
  • otherwise an LLM scoring pass over candidate chunks

The reranker should favor:

  • definition-bearing chunks
  • local equation neighborhoods
  • canonical sources over drafts when both are relevant

6.4 Query expansion and adaptive retrieval

For hard questions, retrieval should be allowed to adapt:

  • reformulate the query
  • retrieve by symbols and by meaning
  • widen from exact operator names to surrounding semantic concepts
  • retrieve additional support if the first pass is thin or contradictory

This is closer to scientific-RAG and proof-search systems than to a static one-shot retriever.
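The widen-on-thin-evidence behavior can be sketched as a small loop; `search` and `reformulate` are stand-ins for the real retriever and query-rewriting calls, and the thresholds are illustrative.

```python
def adaptive_retrieve(query, search, reformulate, min_support: int = 3, max_passes: int = 3):
    """Retry with reformulated queries until enough support is gathered
    or the reformulator stops producing new queries."""
    seen, support = {query}, list(search(query))
    for _ in range(max_passes - 1):
        if len(support) >= min_support:
            break
        query = reformulate(query)      # e.g. widen from operator name to concept
        if query in seen:
            break                       # no new angle to try
        seen.add(query)
        support.extend(c for c in search(query) if c not in support)
    return support
```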

6.5 Retrieve examples, not just definitions

For translation and proof-like tasks, Matsya should retrieve:

  • relevant definitions
  • similar YAML examples
  • similar formal writeups
  • nearby or analogous solved cases

The closest analog from theorem-proving systems is retrieving both premises and similar proofs from the same project.

7. Answer synthesis

7.1 Theory mode

Theory answers should be:

  • grounded in retrieved sources
  • citation-rich
  • explicit about what is canonical vs provisional
  • conservative about unstated claims

Long answers should use an outline-guided synthesis pass so that the final answer is coherent rather than a bag of retrieved snippets.

7.2 Translation mode

Translation should be treated as a constrained transduction task:

  • retrieval seeds conventions and examples
  • the generator proposes a candidate output
  • validators check the output
  • refinement repairs the output if needed

This is different from ordinary prose QA. The output must satisfy structural rules, not merely sound plausible.
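The propose/validate/repair cycle can be sketched as follows; `generate` and `validate` are stand-ins for the real model call and the section 8 verification stack, not existing functions.

```python
def translate(task, generate, validate, max_rounds: int = 3):
    """Propose a candidate, then repair it with validator feedback
    until the checks pass or the round budget is exhausted."""
    candidate = generate(task, feedback=None)
    for _ in range(max_rounds):
        errors = validate(candidate)
        if not errors:
            return candidate, []               # accepted
        candidate = generate(task, feedback=errors)
    return candidate, validate(candidate)      # surface remaining failures
```

On failure the remaining validator errors are returned with the candidate, so the caller can report an explicit failure reason rather than a polished-but-invalid output.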

7.3 Development-theory mode

When the question touches active design work, Matsya should surface disagreement directly. For example:

  • what the canonical spec says
  • what the current dev-spec says
  • whether the point is still marked as ambiguous

This is essential for a project where the semantics are still evolving.

8. Verification stack

8.1 Translation verification

For generated YAML, Matsya should aim for the following checks:

  1. YAML parse success
  2. Dolang parse success for embedded equations
  3. schema / symbol-group validation
  4. internal consistency checks:
      • declared symbols vs used symbols
      • perch-tag consistency
      • branch-label consistency
      • operator/binder consistency
  5. optional round-trip refinement:
      • YAML -> formal MDP -> YAML
      • formal MDP -> YAML -> formal MDP

8.2 Theory verification

Theory answers cannot usually be machine-proved at v0.1a, but they can still be checked for evidential rigor:

  • every nontrivial claim must be supportable by retrieved chunks
  • equations should be copied or paraphrased only when the source clearly supports them
  • if sources conflict, Matsya must say so
  • if evidence is weak, Matsya must lower its confidence

8.3 Correctness ladder

Matsya outputs should be thought of in levels:

  1. Retrieved: relevant sources were found
  2. Grounded: answer explicitly relies on those sources
  3. Parser-valid: output syntax is accepted
  4. Schema-valid: output satisfies structural constraints
  5. Round-trip stable: output survives refinement / round-trip
  6. Human-reviewed: output accepted as canonical example

Translation mode should push as high on this ladder as possible.

9. Example bank and future training

Over time, Matsya should accumulate a verified example bank.

Each example should carry metadata such as:

  • source model family
  • translation direction
  • parser-valid
  • schema-valid
  • round-trip converged
  • human-reviewed
  • retrieved supporting sources

These examples serve two purposes:

  • better retrieval at inference time
  • future fine-tuning data

Training should happen only on examples that have passed validation. Retrieval improvement should generally come before model fine-tuning.

10. Evaluation

Matsya should be evaluated as a technical assistant with correctness constraints, not as a generic chatbot.

10.1 Theory benchmark

Build a benchmark of questions about:

  • Bellman calculus foundations
  • perch semantics
  • syntax and notation
  • branching and composition
  • development-time ambiguities

Metrics:

  • citation precision
  • canonical-source hit rate
  • unsupported-claim rate
  • contradiction detection rate
  • human judgment on correctness and clarity

10.2 Translation benchmark

Build a benchmark of:

  • YAML -> formal MDP tasks
  • formal model -> YAML tasks
  • repair / refine tasks

Metrics:

  • parse pass rate
  • schema pass rate
  • round-trip convergence
  • human acceptance
  • failure mode classification

10.3 Retrieval benchmark

Measure retrieval directly:

  • does the top-k contain the canonical source?
  • does the retrieved set contain the needed example?
  • are noisy prompt-draft chunks excluded?

Poor retrieval should not be confused with poor generation.
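The canonical-hit question above reduces to a hit-rate-at-k computation over gold source labels, sketched here with assumed list-of-lists inputs.

```python
def hit_rate_at_k(results: list[list[str]], gold: list[str], k: int) -> float:
    """Fraction of queries whose top-k retrieved set contains the gold source."""
    hits = sum(1 for retrieved, g in zip(results, gold) if g in retrieved[:k])
    return hits / len(gold)
```

Because this scores retrieval directly against labeled sources, a drop here localizes the failure to the index or retriever rather than the generator.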

11. Immediate implementation implications

All implementation tasks in this section are primarily tasks for the HARK_ask-your-project codebase and deployment stack. The corresponding role of bellman-ddsl is to provide the corpus, canon, examples, and evaluation cases that Matsya indexes and reasons over.

11.1 Now

  • curate the corpus aggressively using AI/context/matsya-context.md
  • add trust-tier metadata
  • introduce profile-specific retrieval
  • add reranking
  • keep structure-preserving chunks for specs, TeX, and YAML

Concrete implementation targets in HARK_ask-your-project will likely include:

  • config/repositories.yml for repository groups, weights, and indexing options
  • scripts/build/build_final_index.py and related build utilities for chunking / metadata / retrieval profiles
  • config/system_prompts/Bellman-DDSL.md for mode behavior and answer discipline
  • app.py and CLI entry points for profile selection, citations, and verification-facing UX

11.2 Next

  • add machine validators to translation mode
  • formalize round-trip refinement as a correctness pass
  • build a benchmark set of theory and translation tasks

11.3 Later

  • collect verified examples and repair traces
  • train retrieval components on project-local examples
  • consider fine-tuning generation only after validators and evals are stable
  • explore stronger proof / theorem-verification hooks for theory mode

12. Relation to existing Matsya notes

  • development/dolo-plus-impl/spec_0.1k-rag-assistant.md
      • deployment and hosting note
  • development/research-notes/matsya-desugar-whisperer.md
      • Matsya as a de-sugaring / interpretation layer in the solver pipeline
  • AI/context/matsya-context.md
      • corpus inclusion / exclusion policy
  • AI/matsya/matsya.md
      • user-facing modes and operational instructions

13. External references informing this spec

14. Bottom line

Matsya should be built as a retrieval + verification system, not just a prompt over a vector store.

For this project:

  • theory mode needs evidence discipline
  • translation mode needs compiler-like validation
  • future training should be based on verified examples, not raw generated text

That is the minimal path to a research assistant that is actually useful for BE-DDSL development.