Matsya de-sugaring as the Υ interpretation layer¶

be-ddsl #matsya #whisperer #upsilon #rho #ffp #backus #recipe #egm #vfi #ambiguity¶

This note records the lessons from building the port-cons FFP notebook, where Matsya performs the Υ map at runtime inside the solver pipeline.

The core discovery¶

The whisperer does not need to hard-wire a parser for dolo-plus syntax. Instead:

1. The whisperer reads the stage YAML as raw strings: no dolang compilation, no perch resolver, just the declared equations with their bracket tags.
2. The whisperer sends the full stage YAML to Matsya with a binding context (what V[>] and dV[>] mean in code, what the grid variables are called).
3. Matsya returns a structured JSON recipe: Python expressions keyed by DDSL equation block names.
4. The whisperer compiles those expressions into numpy callables via eval().

The pipeline is:

syntax  →  whisperer + Matsya (Υ)  →  code (Ρ)

Matsya substitutes for solver-level interpretation of the syntax. The solver receives callables; it never touches equation strings.
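The four steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's code: `ask_matsya` is a hypothetical stand-in that returns a canned JSON string (the real Matsya call is not shown in this note), the stage text is abbreviated, and scalars are used instead of numpy arrays so the sketch needs only the standard library.

```python
import json

# Abbreviated stage text, kept as a raw string (no YAML parsing, no dolang).
STAGE_YAML = """
equations:
  dcsn_to_cntn_transition: a = m_d - c
cntn_to_dcsn_mover:
  InvEuler: c[>] = (β*dV[>])^(-1/ρ)
  cntn_to_dcsn_transition: m_d[>] = a + c[>]
"""

# Binding context: how perch-tagged symbols and the grid are named in code.
BINDINGS = {"V[>]": "V_cont", "dV[>]": "dV_cont", "grid": "a"}


def ask_matsya(stage_yaml: str, bindings: dict) -> str:
    """Hypothetical stand-in for the Matsya call; returns a canned JSON recipe."""
    return json.dumps({
        "inv_euler": "(beta*dV_cont(a))**(-1/rho)",
        "cntn_to_dcsn_transition": "a + c",
    })


def compile_recipe(recipe: dict) -> dict:
    """Compile each expression once; the solver only ever receives callables."""
    compiled = {}
    for key, expr in recipe.items():
        code = compile(expr, f"<recipe:{key}>", "eval")
        # Bind `code` as a default argument so each lambda keeps its own expression.
        compiled[key] = lambda ns, code=code: eval(code, {"__builtins__": {}}, ns)
    return compiled


recipe = json.loads(ask_matsya(STAGE_YAML, BINDINGS))
callables = compile_recipe(recipe)

# Illustrative calibration and continuation marginal value (assumptions).
ns = {"beta": 0.96, "rho": 2.0, "a": 1.0, "dV_cont": lambda a: (1.0 + a) ** -2.0}
ns["c"] = callables["inv_euler"](ns)
m_d = callables["cntn_to_dcsn_transition"](ns)
```

The point of the design is visible in the last three lines: the solver side touches only `callables` and a namespace, never the equation strings.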

Why the full stage YAML matters¶

An early attempt sent only the extracted equations to Matsya. This failed because perch tags only have meaning in context. V[>] means "the value function evaluated at the continuation perch" — but which continuation perch? What transition connects decision to continuation? What are the state variables?

The de-sugaring requires:

- Symbol declarations: to know that a is a poststate in Xa = R+, c is a control in R+
- Transitions: to know that g: a = m_d - c connects decision to continuation
- Perch tags on values: to see that V[>] is typed as a value at the continuation perch, and dV[>] is its marginal
- Mover sub-equation structure: to see that InvEuler, MarginalBellman, cntn_to_dcsn_transition are sub-blocks of the same backward mover

Without this context, V[>] is just a string. With it, Matsya can resolve V[>] → V_cont(m_d - c) by finding the transition g and composing.
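The resolution step V[>] → V_cont(m_d - c) can be mimicked with plain string substitution. The sketch below is only an illustration of why the transition is needed: `TRANSITIONS` and `PERCH_BINDINGS` are hypothetical hand-copied slices of the stage declarations, and the function covers only the case where the expression is evaluated at the decision perch (on the poststate grid, as in the EGM recipe, dV[>] resolves to dV_cont(a) instead).

```python
# Hypothetical slice of the stage declarations (copied by hand for illustration).
TRANSITIONS = {"dcsn_to_cntn": ("a", "m_d - c")}      # g: a = m_d - c
PERCH_BINDINGS = {"V[>]": "V_cont", "dV[>]": "dV_cont"}


def resolve_continuation(symbol: str) -> str:
    """Resolve a [>]-tagged value by composing its code name with the transition g."""
    _, rhs = TRANSITIONS["dcsn_to_cntn"]
    return f"{PERCH_BINDINGS[symbol]}({rhs})"


def desugar(expr: str) -> str:
    """Replace every known perch-tagged symbol in expr with its resolved form."""
    # Longest symbol first, so 'dV[>]' is handled before 'V[>]'.
    for symbol in sorted(PERCH_BINDINGS, key=len, reverse=True):
        expr = expr.replace(symbol, resolve_continuation(symbol))
    return expr


resolved = desugar("u + β*V[>]")
```

Without the entry in `TRANSITIONS`, `resolve_continuation` has nothing to compose with, which is the situation Matsya was in when it received only the extracted equations.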

This is exactly the Backus point: the definitional system needs the full declaration context, not just the expression in isolation.

The recipe format¶

The recipe keys are DDSL equation block names — not solver-internal names. This was a deliberate design choice after an early version used invented terms (egm_reverse, egm_inv_euler). The problem: those names were solver-specific and didn't correspond to anything in the stage YAML.

The current mapping:

Recipe key	Stage YAML block	YAML expression	De-sugared expression
`dcsn_to_cntn_transition`	`equations.dcsn_to_cntn_transition`	`a = m_d - c`	`m_d - c`
`cntn_to_dcsn_transition`	`cntn_to_dcsn_mover.cntn_to_dcsn_transition`	`m_d[>] = a + c[>]`	`a + c`
`inv_euler`	`cntn_to_dcsn_mover.InvEuler`	`c[>] = (β*dV[>])^(-1/ρ)`	`(beta*dV_cont(a))**(-1/rho)`
`marginal_bellman`	`cntn_to_dcsn_mover.MarginalBellman`	`dV = (c)^(-ρ)`	`c**(-rho)`
`utility`	`cntn_to_dcsn_mover.Bellman` (sub-eq)	`u = (c^(1-ρ))/(1-ρ)`	`(c**(1-rho))/(1-rho)`
`vfi_maximand`	`cntn_to_dcsn_mover.Bellman` (composed)	`V = max_{c}(u + β*V[>])`	`(c**(1-rho))/(1-rho) + beta*V_cont(m_d - c)`

The vfi_maximand is the only expression that Matsya composes — it substitutes u and V[>] inline. All other expressions are direct translations of individual YAML equations.

Recipe validation and fallback¶

Matsya is an LLM — it can return syntactically invalid Python, hallucinate variable names, or time out. The notebook validates every recipe expression:

def _validate_recipe(recipe):
    for key, expr in recipe.items():
        compile(expr, f'<recipe:{key}>', 'eval')  # syntax check

If any expression fails syntax checking, the entire recipe is discarded and a known-good fallback is used. The fallback is hand-written and tested — it encodes the same mathematics as the YAML but in Python syntax.

This is important architecturally: the whisperer works with or without Matsya. The RAG assistant is an accelerator, not a dependency. In production, the recipe would be generated once and cached; Matsya is called at development time to bootstrap the recipe from the YAML declarations.
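A slightly fuller version of the validate-or-fall-back logic might look like the sketch below. It goes one step beyond the syntax check shown earlier by also flagging names that the eval namespace does not provide; that extra check, the `ALLOWED_NAMES` list, and the function names are assumptions for illustration, not what the notebook does.

```python
import ast

# Hand-written, known-good recipe (abbreviated to two keys here).
FALLBACK_RECIPE = {
    "inv_euler": "(beta*dV_cont(a))**(-1/rho)",
    "cntn_to_dcsn_transition": "a + c",
}
# Names the eval namespace is assumed to provide (illustrative list).
ALLOWED_NAMES = {"a", "c", "m_d", "beta", "rho", "V_cont", "dV_cont"}


def recipe_problems(recipe: dict) -> list:
    """Return human-readable problems; an empty list means the recipe passed."""
    problems = []
    for key, expr in recipe.items():
        try:
            tree = ast.parse(expr, filename=f"<recipe:{key}>", mode="eval")
        except SyntaxError as err:
            problems.append(f"{key}: syntax error ({err.msg})")
            continue
        used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        unknown = used - ALLOWED_NAMES
        if unknown:
            problems.append(f"{key}: unknown names {sorted(unknown)}")
    return problems


def choose_recipe(candidate):
    """Use the candidate only if it is present and clean; otherwise fall back."""
    if candidate is None:                      # e.g. timeout or failed call
        return FALLBACK_RECIPE, ["no recipe returned"]
    problems = recipe_problems(candidate)
    return (FALLBACK_RECIPE, problems) if problems else (candidate, [])
```

Returning the list of problems alongside the chosen recipe keeps the fallback from hiding why the Matsya recipe was rejected.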

Sequential eval as imperative composition¶

The EGM operator evaluates recipe expressions sequentially, binding results into a shared namespace:

# 1. Compute c from inverse Euler
c_endo = eval(recipe['inv_euler'], ns)       # uses dV_cont, a

# 2. Bind c into namespace
ns['c'] = c_endo

# 3. Compute m_d from reverse transition (uses bound c)
m_endo = eval(recipe['cntn_to_dcsn_transition'], ns)  # uses a, c

This mirrors the mathematical structure of EGM: first solve the Euler equation for c on the exogenous grid, then apply the reverse transition to find the endogenous decision grid. The sequential binding is how the algorithm naturally works — each step depends on the previous.
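For a runnable picture of the sequential binding, the sketch below applies the three fragment keys from the recipe table to a small poststate grid. The continuation marginal value `dV_cont` and the parameter values are illustrative assumptions, and the loop works on scalars to stay within the standard library (the notebook evaluates on numpy arrays).

```python
recipe = {
    "inv_euler": "(beta*dV_cont(a))**(-1/rho)",
    "cntn_to_dcsn_transition": "a + c",
    "marginal_bellman": "c**(-rho)",
}


def egm_step(recipe, a_grid, dV_cont, beta, rho):
    """One backward EGM step, evaluated point by point on the poststate grid."""
    m_endo, c_endo, dV_dcsn = [], [], []
    for a in a_grid:
        ns = {"a": a, "beta": beta, "rho": rho, "dV_cont": dV_cont}
        # Invert the Euler equation and bind the result as c.
        ns["c"] = eval(recipe["inv_euler"], {}, ns)
        # Reverse transition: sees the c bound on the previous line.
        m = eval(recipe["cntn_to_dcsn_transition"], {}, ns)
        m_endo.append(m)
        c_endo.append(ns["c"])
        # Marginal value at the decision perch, again using the bound c.
        dV_dcsn.append(eval(recipe["marginal_bellman"], {}, ns))
    return m_endo, c_endo, dV_dcsn


# Illustrative inputs (assumptions, not taken from the notebook).
dV_cont = lambda a: (1.0 + a) ** -2.0
a_grid = [0.0, 1.0, 2.0]
m_endo, c_endo, dV_dcsn = egm_step(recipe, a_grid, dV_cont, beta=0.96, rho=2.0)
```

Swapping the two `eval` lines for `inv_euler` and `cntn_to_dcsn_transition` raises `NameError` for `c`, which makes the order dependence concrete.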

The VFI operator, by contrast, evaluates vfi_maximand as a single closed-form expression — everything is pre-substituted. This is because VFI does grid search: it evaluates the objective for many candidate c values simultaneously, so the expression must be self-contained.

This difference is the source of the recipe scope ambiguity.
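The single-expression VFI evaluation can be sketched in the same style. Here the composed `vfi_maximand` string from the recipe table is evaluated for many candidate c values at one decision state, with nothing bound between evaluations. The continuation value `V_cont`, the candidate grid, and the parameter values are illustrative assumptions.

```python
import math

VFI_MAXIMAND = "(c**(1-rho))/(1-rho) + beta*V_cont(m_d - c)"


def vfi_point(m_d, V_cont, beta, rho, n_candidates=200):
    """Grid-search the maximand over candidate c in (0, m_d) for one state."""
    code = compile(VFI_MAXIMAND, "<recipe:vfi_maximand>", "eval")
    best_c, best_v = None, -math.inf
    for i in range(1, n_candidates):
        c = m_d * i / n_candidates
        # The expression is self-contained: only primitives are supplied.
        ns = {"c": c, "m_d": m_d, "beta": beta, "rho": rho, "V_cont": V_cont}
        v = eval(code, {}, ns)
        if v > best_v:
            best_c, best_v = c, v
    return best_c, best_v


# Illustrative continuation value (an assumption, chosen to be concave).
V_cont = lambda a: -1.0 / (1.0 + a)
best_c, best_v = vfi_point(2.0, V_cont, beta=0.96, rho=2.0)
```

Unlike the EGM sketch, no intermediate result is written back into the namespace, so the evaluation order of candidates does not matter.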

Three failures and what they teach¶

Failure 1: `V[>]` as a composition operator¶

An early draft of the overview doc described [>] as a "Backus composition operator" — claiming it composes V with the transition g. This was wrong. V[>] is a perch-qualified identifier, not a composition operator. The composition happens when the definitional system resolves the perch tag by looking up the transition and substituting.

Lesson: The Backus connection is about objects and the definitional system, not about functional operators. V[>] is a string (object) that acquires meaning through the definitional system D; the [>] tag is a directive to D, not an operator in the algebra.

Failure 2: Unicode/ASCII namespace mismatch¶

The calibration functor produces parameter values with Unicode keys (β, ρ) — because that's how they're declared in the YAML. Matsya returns expressions with ASCII names (beta, rho) — because that's natural for an LLM generating Python. When the whisperer's eval namespace only had Unicode keys, the recipe expressions failed with NameError: name 'beta' is not defined.

Lesson: The Υ/Ρ boundary crosses notation conventions. The YAML lives in a Unicode world (mathematical symbols); the executable code lives in an ASCII world (Python identifiers). The whisperer must bridge this gap. The fix is an alias map that makes both forms available in the eval namespace.

This is a micro-instance of the general Υ → Ρ problem: the same mathematical object (β) has different representations on each side of the map. The representation map must handle the notation translation, not just the semantic translation.
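An alias map of the kind described above could be as small as the following sketch. The `ALIASES` table and the function name are hypothetical; the idea is only that every calibrated parameter becomes reachable under both its Unicode and its ASCII spelling before any recipe expression is evaluated.

```python
# Illustrative alias table; a real model would list all of its parameters.
ALIASES = {"β": "beta", "ρ": "rho"}


def with_ascii_aliases(calibration: dict) -> dict:
    """Return a namespace exposing each parameter under both spellings."""
    ns = dict(calibration)
    for unicode_name, ascii_name in ALIASES.items():
        if unicode_name in ns:
            ns.setdefault(ascii_name, ns[unicode_name])
        elif ascii_name in ns:
            ns.setdefault(unicode_name, ns[ascii_name])
    return ns


ns = with_ascii_aliases({"β": 0.96, "ρ": 2.0})
ascii_value = eval("beta**rho", {}, ns)      # spelling an LLM tends to produce
unicode_value = eval("β**ρ", {}, ns)         # spelling used in the YAML
```

Both expressions evaluate to the same number, so a recipe can use either convention without triggering the `NameError` described above.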

Failure 3: multi-line queries over SSH¶

The Matsya CLI embeds the query in a single-quoted Python string literal for SSH transport. Newlines in the query broke the string literal on the remote host. The fix was trivial (escape \n to \\n), but the bug was invisible — the CLI exited with code 1 and the notebook silently fell back to the hardcoded recipe.

Lesson: String objects must survive transport layers. This is another Backus point: objects are syntactic, and their representation matters when they cross system boundaries. The full stage YAML is a large, structured string with newlines, Unicode, backticks, and YAML special characters — all of which must be preserved through the CLI → SSH → Python → LLM pipeline.
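The failure mode can be reproduced locally without SSH. The sketch below embeds a two-line query in a single-quoted Python literal, once naively and once escaped, and checks which version still parses. It is an illustration only: escaping backslashes and quotes goes beyond the newline fix described above, and the additional shell-quoting layer that SSH introduces is not modelled.

```python
query = "stage: cons\nInvEuler: c[>] = (β*dV[>])^(-1/ρ)"

# Naive embedding: the raw newline ends up inside the single-quoted literal.
naive_src = "q = '" + query + "'"

# Escaped embedding: backslashes first, then newlines, then single quotes.
escaped = query.replace("\\", "\\\\").replace("\n", "\\n").replace("'", "\\'")
escaped_src = "q = '" + escaped + "'"


def parses(src: str) -> bool:
    """Report whether src is syntactically valid Python."""
    try:
        compile(src, "<remote>", "exec")
        return True
    except SyntaxError:
        return False


def round_trip(src: str) -> str:
    """Execute src as the remote side would and return the received query."""
    scope = {}
    exec(src, scope)
    return scope["q"]


naive_ok = parses(naive_src)
received = round_trip(escaped_src)
```

The round trip also confirms that the Unicode symbols and bracket tags arrive unchanged, which is the property the lesson asks for.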

The deeper lesson is about silent failure: the notebook appeared to work fine (the fallback recipe is correct), but the Matsya integration was broken. Explicit error reporting — checking return codes, printing stderr — is essential when the fallback masks the failure.
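Explicit error reporting around a CLI call might look like the sketch below. The real Matsya command line is not shown in this note, so a small Python subprocess that exits with code 1 stands in for it; `run_cli` is a hypothetical helper name.

```python
import subprocess
import sys


def run_cli(cmd: list) -> tuple:
    """Run a command; return (stdout, None) on success or (None, reason) on failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        return None, "timed out"
    if proc.returncode != 0:
        return None, f"exit code {proc.returncode}: {proc.stderr.strip()}"
    return proc.stdout, None


# Stand-in for a failing CLI call (an assumption for illustration).
failing = [sys.executable, "-c", "import sys; sys.stderr.write('bad query'); sys.exit(1)"]
out, reason = run_cli(failing)
if reason is not None:
    # Make the fallback visible instead of silent.
    print(f"WARNING: Matsya call failed ({reason}); using fallback recipe", file=sys.stderr)
```

With a check like this, the notebook would still run on the fallback recipe, but the broken integration would show up in the output on the first run.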

The scope ambiguity #ambiguity¶

When Matsya returns cntn_to_dcsn_transition: a + c, what is c?

In the stage YAML, the reverse transition says m_d[>] = a + c[>]. The c[>] here is explicitly scoped: it's c at the continuation perch, the same object defined by the inverse Euler equation c[>] = (β*dV[>])^(-1/ρ). The [>] tag on c tells us this is not the general control variable c — it's the specific value computed by the EGM backward step.

But in the recipe, the [>] tag is stripped. c is just c. The EGM operator relies on execution order to give c the right value: it evaluates inv_euler first, binds the result as c, and then evaluates cntn_to_dcsn_transition, which sees the bound c.

This works, but it's fragile. The meaning of c in the recipe depends on which expressions have been evaluated before it — a form of imperative state that the declarative YAML doesn't have.

Two possible resolutions:

1. Composable fragments (current): Recipes contain bare variables. The whisperer's eval sequence provides the bindings. The recipe format is simple but order-dependent. This is natural for EGM.
2. Fully-substituted forms: Matsya substitutes c[>] by its defining expression wherever it appears. The recipe for the reverse transition becomes a + (beta*dV_cont(a))**(-1/rho). Each expression is self-contained. This is natural for VFI (which already has this in vfi_maximand).
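The second form can be derived mechanically from the first. The sketch below substitutes the defining expression of c into the reverse transition using a whole-word regex; `substitute` is a hypothetical helper, and a string-level rewrite like this is only safe for simple expressions such as the ones in the cons stage.

```python
import re


def substitute(expr: str, name: str, definition: str) -> str:
    """Replace whole-word occurrences of name in expr with (definition)."""
    pattern = rf"\b{re.escape(name)}\b"
    # A function replacement avoids backslash processing in the definition.
    return re.sub(pattern, lambda _match: f"({definition})", expr)


fragments = {
    "inv_euler": "(beta*dV_cont(a))**(-1/rho)",
    "cntn_to_dcsn_transition": "a + c",
}
self_contained = substitute(
    fragments["cntn_to_dcsn_transition"], "c", fragments["inv_euler"]
)

# Both forms agree when evaluated (illustrative inputs, not from the notebook).
ns = {"a": 1.0, "beta": 0.96, "rho": 2.0, "dV_cont": lambda a: (1.0 + a) ** -2.0}
direct = eval(self_contained, {}, dict(ns))
ns["c"] = eval(fragments["inv_euler"], {}, ns)
sequential = eval(fragments["cntn_to_dcsn_transition"], {}, ns)
```

The word boundary matters: a plain `str.replace` would also rewrite the `c` inside `dV_cont`.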

The resolution may be that both forms coexist, selected by method:

- EGM consumes the fragment keys (inv_euler, cntn_to_dcsn_transition, marginal_bellman) sequentially
- VFI consumes the composed key (vfi_maximand) as a single expression

This is already what happens in the notebook. The open question is whether this should be formalized — whether the recipe format should explicitly distinguish fragment keys from composed keys, and whether the Υ map or the Ρ map is responsible for the composition.

If the Υ map produces fragments and the Ρ map (method-dependent) composes them, then the recipe is a Υ-level artifact and composition is a Ρ-level operation. If Matsya returns both forms, the recipe spans both layers.
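One possible way to make the distinction explicit is sketched below: the recipe carries fragment keys and composed keys in separate sections, and a method-dependent step builds the composed key from fragments. This is a hypothetical formalization for discussion, not a decision recorded in this note; the `Recipe` class, `METHOD_KEYS`, and both functions are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    fragments: dict                                   # direct translations (Υ level)
    composed: dict = field(default_factory=dict)      # built from fragments (Ρ level)


# Which section and keys each method consumes.
METHOD_KEYS = {
    "EGM": ("fragments", ["inv_euler", "cntn_to_dcsn_transition", "marginal_bellman"]),
    "VFI": ("composed", ["vfi_maximand"]),
}


def compose_vfi(recipe: Recipe) -> Recipe:
    """Build vfi_maximand from the utility fragment and the forward transition."""
    u = recipe.fragments["utility"]
    g = recipe.fragments["dcsn_to_cntn_transition"]
    recipe.composed["vfi_maximand"] = f"{u} + beta*V_cont({g})"
    return recipe


def select(recipe: Recipe, method: str) -> dict:
    """Return only the expressions that the given method consumes."""
    section, keys = METHOD_KEYS[method]
    source = getattr(recipe, section)
    return {key: source[key] for key in keys}


recipe = compose_vfi(Recipe(fragments={
    "dcsn_to_cntn_transition": "m_d - c",
    "cntn_to_dcsn_transition": "a + c",
    "inv_euler": "(beta*dV_cont(a))**(-1/rho)",
    "marginal_bellman": "c**(-rho)",
    "utility": "(c**(1-rho))/(1-rho)",
}))
egm_view = select(recipe, "EGM")
vfi_view = select(recipe, "VFI")
```

In this layout Matsya would only need to return `fragments`; if it returned `vfi_maximand` as well, the same structure could hold it in `composed` and the composition step would be skipped.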

See the roadmap todo for the current status of this decision.

Implications for the compiler pipeline¶

If Matsya can perform the Υ map from raw YAML strings, the compiler pipeline gains a new option:

Current path: dolang parser → AST → perch resolver → translator → horse → dolo solver
Matsya path: raw YAML strings → Matsya (Υ) → recipe (Python expressions) → eval → solver

The Matsya path bypasses the dolang parser, the perch resolver, and the translator entirely. It goes straight from syntax to executable expressions. This is viable for stages where the equations are simple enough for an LLM to de-sugar correctly (as in the cons stage), but may not scale to complex multi-branch stages.

The two paths are not exclusive. The Matsya path can serve as a rapid prototyping tool — get a notebook working before the full compiler pipeline is built — while the compiler path provides the production guarantee (deterministic, validated, no LLM in the loop).

Meaning map to code (overview) — concise version of this note
Core FFP concepts — Backus objects, functions, functional forms, definitional system
Mathematical definition — D, Υ, Ρ, and the three-layer architecture
Object-first model specification — the typing/wiring layer that precedes Υ
Compiler pipeline architecture — the deterministic alternative to the Matsya path
Port-with-shocks example — the stage YAMLs used in the notebook
Methodization — how method tags select sub-equations