
dolo-plus: spec_0.1k (RAG assistant for BE-DDSL)

1. Goal

Deploy a web-hosted assistant for the Bellman-DDSL project that supports three interaction modes:

- YAML → Formal MDP: Translate a stage YAML into a formal MDP writeup (Rosetta Stone, Model, Bellman Equation, FOCs, Forward Operator, Calibration).
- Formal Model → YAML: Given a mathematical model description, generate the corresponding dolo-plus/DDSL stage YAML.
- Syntax Q&A: Answer questions about DDSL syntax, including schema rules, Υ mapping, stage structure, symbols, spaces, shock declarations, and equations blocks.

2. Architecture

The assistant is deployed as a new group in the existing HARK_ask-your-project platform, which provides:

- FAISS + sentence-transformers vector search
- Streamlit UI with multi-model support (OpenAI, Anthropic, Google)
- Group/variant index management
- MathJax rendering for LaTeX equations
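
As a rough illustration of what the vector search step does, the sketch below ranks chunks by cosine similarity to a query vector. It is a pure-Python toy, not the platform's code: the real system uses FAISS and sentence-transformers embeddings, and the helper names `cosine` and `top_k` and the two-dimensional vectors are assumptions made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real sentence-transformers vectors.
chunk_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([0.9, 0.1], chunk_vecs))  # prints [0, 1]
```

The retrieved chunk indices would then be mapped back to chunk text and passed to the LLM as context.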

HARK_ask-your-project/
├── config/repositories.yml          # ← Bellman-DDSL group added here
├── indices/Bellman-DDSL/minimal/
│   ├── system_prompt.txt            # 3-mode system prompt
│   ├── chunks_final.jsonl           # (built by index builder)
│   ├── vector_index                 # (built by index builder)
│   └── ...
└── app.py                           # Streamlit entry point

3. Repository configuration

File: config/repositories.yml

Bellman-DDSL:
  description: "BE-DDSL: Syntax for Modular Bellman Problems"
  indexing:
    chunk_size: 800
    chunk_overlap: 200
    priority_extensions: [".md", ".yaml", ".yml", ".tex", ".py"]
    enable_pdf: true
  repositories:
    - name: "bellman-ddsl"
      url: "https://github.com/bright-forest/bellman-ddsl"
      branch: "main"
      local_path: "/Users/akshayshanker/Research/Repos/bellman-ddsl"
      type: "library"
      priority: 1
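
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of sliding-window chunking. It assumes both values are measured in characters, which the config does not state, and `chunk_text` is a hypothetical helper rather than the platform's actual index builder.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "x" * 2000
print([len(c) for c in chunk_text(sample)])  # prints [800, 800, 800]
```

With these defaults each window starts 600 characters after the previous one, so a larger overlap produces more chunks for the same document.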

Key directories indexed:

| Directory | Content |
| --- | --- |
| `docs/examples/formal-mdps/` | Canonical MDP writeups (few-shot examples for Mode 1) |
| `docs/examples/` | Schema docs, example walkthroughs |
| `docs/specs/` | Syntax-semantic rules, implementation specs |
| `packages/dolo/examples/models/doloplus/` | Stage YAML source files |

4. System prompt design

The system prompt supports all three modes. The assistant detects the user's intent from the query and responds accordingly.

Mode detection heuristic (handled by the LLM):

| Trigger | Mode |
| --- | --- |
| User pastes YAML or says "translate/write up this YAML" | YAML → Formal MDP |
| User describes a model mathematically and asks for YAML | Formal Model → YAML |
| User asks "how do I…" or "what is…" about DDSL syntax | Syntax Q&A |
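
In the deployed assistant the LLM performs this routing itself. For offline checks of which mode a query is expected to land in, a crude rule-based approximation of the triggers might look like the sketch below. The function `detect_mode`, its keyword lists, and the returned labels are all assumptions for illustration and are not part of the system prompt.

```python
import re


def detect_mode(query: str) -> str:
    """Guess the interaction mode from simple surface cues (illustrative only)."""
    q = query.lower()
    # A multi-line query containing a "key:" line is treated as pasted YAML.
    has_yaml_line = re.search(r"^\s*[\w-]+:(\s|$)", query, flags=re.MULTILINE) is not None
    pasted_yaml = "\n" in query and has_yaml_line
    if pasted_yaml or re.search(r"\b(translate|write up)\b.*\byaml\b", q):
        return "yaml_to_mdp"
    # A request to produce YAML from a described model.
    if "yaml" in q and re.search(r"\b(write|generate|create)\b", q):
        return "model_to_yaml"
    # Everything else falls through to syntax questions.
    return "syntax_qa"


print(detect_mode("How do I declare a pre-decision Markov shock in DDSL?"))  # prints syntax_qa
```

A heuristic like this will misroute edge cases (for example, a syntax question that mentions writing YAML), which is one reason the spec leaves the decision to the LLM.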

Notation conventions (enforced in system prompt):

- Value functions: \(V_{\prec}\), \(V\) (decision, unmarked), \(V_{\succ}\)
- Marginal values: \(\partial_w V\)
- Distributions: \(\mu_{\prec}\), \(\mu\) (decision, unmarked), \(\mu_{\succ}\)
- States: \(x_{\prec}\), \(x\) (decision, unmarked), \(x_{\succ}\)

5. What was done (completed steps)

All config and code changes are in place. Nothing further needs to be written; only the build, deploy, and verify steps remain.

| File | Repo | Change |
| --- | --- | --- |
| `config/repositories.yml` | HARK_ask-your-project | Updated Bellman-DDSL group: fixed `local_path`, set `chunk_size: 800`, added `.yaml/.yml` to priority extensions, updated `folder_weights` |
| `indices/Bellman-DDSL/minimal/system_prompt.txt` | HARK_ask-your-project | Created 3-mode system prompt (YAML→MDP, Model→YAML, Syntax Q&A) |
| `docs/specs/.../spec_0.1k-rag-assistant.md` | bellman-ddsl | This file |
| `docs/demos/yaml-to-mdp.md` | bellman-ddsl | Demo page linking to deployed assistant |
| `mkdocs.yml` | bellman-ddsl | Added `0.1k RAG Assistant` nav entry + `Demos` section |
| `docs/specs/.../README.md` | bellman-ddsl | Added `0.1k` row to implementation phases table |

6. Next steps (what to do when you come back)

Step A: Build the FAISS index

cd /Users/akshayshanker/Desktop/econark/HARK_ask-your-project
./rebuild_project.sh --group Bellman-DDSL --minimal

This reads `config/repositories.yml`, discovers files in the bellman-ddsl repo, chunks and embeds them with sentence-transformers, and writes the FAISS artifacts into `indices/Bellman-DDSL/minimal/` (alongside the `system_prompt.txt` already there).

If the build fails, check:

- The Python environment has `sentence-transformers`, `faiss-cpu`, and `PyPDF2` installed.
- The `local_path` in `repositories.yml` is accessible (`/Users/akshayshanker/Research/Repos/bellman-ddsl`).
- Running `python scripts/build/build_final_index.py --group Bellman-DDSL --minimal` directly gives better error messages.
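
The first two checks can be scripted. The sketch below is an assumption-laden helper, not part of the HARK_ask-your-project repo: `preflight` is a hypothetical name, and the module list uses the import names of the packages above (`faiss-cpu` is imported as `faiss`).

```python
import importlib.util
from pathlib import Path

# Import names for sentence-transformers, faiss-cpu, and PyPDF2.
REQUIRED_MODULES = ("sentence_transformers", "faiss", "PyPDF2")


def preflight(local_path: str, modules=REQUIRED_MODULES) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    if not Path(local_path).is_dir():
        problems.append(f"local_path not found: {local_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight("/Users/akshayshanker/Research/Repos/bellman-ddsl"):
        print(problem)
```

Running it before `./rebuild_project.sh` prints one line per problem and nothing when the environment looks ready.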

Step B: Test locally

cd /Users/akshayshanker/Desktop/econark/HARK_ask-your-project
streamlit run app.py

Select Bellman-DDSL from the group dropdown in the sidebar.

Step C: Verify all three modes

| # | Test | What to check |
| --- | --- | --- |
| 1 | Paste `cons_stage.yaml` and say "translate this to a formal MDP" | Output has all 6 sections (Rosetta Stone, Model, Bellman Equation, FOCs, Forward Operator, Calibration). Notation uses \(V\) (decision value, unmarked), \(\partial_w V\), etc. |
| 2 | "Write a dolo-plus stage YAML for a consumption-savings problem with CRRA utility, a single asset with return R, and IID income shocks" | Valid YAML with `name`, `symbols`, `equations` (arvl_to_dcsn, dcsn_to_cntn, reward), `calibration` |
| 3 | "How do I declare a pre-decision Markov shock in DDSL?" | Answer cites schema rules from `specs/` docs and shows correct `exog_shocks` syntax |
| 4 | Open the context expander on any response | Verify retrieved chunks are from bellman-ddsl docs/examples |
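
Test 1 can be partly automated by scanning a saved response for the six section headings. This is only a sketch: `missing_sections` is a hypothetical helper, and it assumes each section appears on its own line as a heading whose text is exactly the section name, optionally wrapped in Markdown markers or prefixed with a number.

```python
import re

REQUIRED_SECTIONS = [
    "Rosetta Stone",
    "Model",
    "Bellman Equation",
    "FOCs",
    "Forward Operator",
    "Calibration",
]


def missing_sections(response: str) -> list[str]:
    """Return the required section names that never appear as a standalone heading line."""
    headings = set()
    for line in response.splitlines():
        # Drop leading '#', '*', digits, dots, and spaces, then trailing '*' or ':'.
        cleaned = re.sub(r"^[#\s\d.*]+", "", line).strip().rstrip("*:").strip()
        headings.add(cleaned.lower())
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


draft = "## Rosetta Stone\n## 2. Model\n**Bellman Equation**\n## FOCs\n## Forward Operator\n"
print(missing_sections(draft))  # prints ['Calibration']
```

The exact-match rule is deliberately strict; a heading such as "FOCs (first-order conditions)" would be reported as missing, so loosen the comparison if the assistant's headings vary.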

Step D: Deploy to production

The DigitalOcean droplet at http://45.55.225.169:8501/ runs the same app.py. To deploy the new group:

1. Push the updated `config/repositories.yml` and `indices/Bellman-DDSL/` to the HARK_ask-your-project repo (or rsync them to the droplet).
2. Restart the Streamlit process on the droplet.
3. Open http://45.55.225.169:8501/ and select the Bellman-DDSL group.

Step E (optional): Iterate on the system prompt

The system prompt lives at:

/Users/akshayshanker/Desktop/econark/HARK_ask-your-project/indices/Bellman-DDSL/minimal/system_prompt.txt

Edit it directly; no index rebuild is needed, because the app reads the file fresh on each session. If retrieval quality is poor (the wrong chunks are surfacing), rebuild the index with an adjusted `chunk_size`/`chunk_overlap` in `repositories.yml`.

7. Deployed URL

http://45.55.225.169:8501/?group=Bellman-DDSL