
Bellman Foundations

DDSL Phase 1.1 — Deck 1/2

Convening Meeting — January 2026


Motivation: DP does not have one canonical operator ordering

A dynamic program can be written as a composition of operators:

  • expectation / integration
  • maximization / decision
  • push-forward / simulation
  • approximation, etc.

But perches are not “endpoints of operators.”


The framing we want

Previous framing (wrong):

Perches are endpoints of operators (expectation, maximization, …)

Correct framing:

Perches are information sets — σ-algebras (filtrations) describing what is known at each point.

Perches exist to answer one question:

What can the actor condition their policy on?


The general setting: partially observable MDPs (POMDPs)

In a POMDP, the actor:

  • has a true state \(s \in S\) (possibly hidden)
  • receives observations \(o = O(s,\eta)\)
  • must maintain a belief \(b(s)\) over the hidden state
  • chooses actions based on observation history

Problem: optimal policies become maps \(\mathcal{P}(S) \to A\). (Scope pointer: see Information Structure Restriction.)
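To make the weight of the full POMDP concrete, here is a minimal discrete Bayes-filter step (toy matrices, not part of DDSL): the policy's input is the belief vector `b`, i.e. a point in \(\mathcal{P}(S)\), which is exactly what the restriction below avoids.

```python
import numpy as np

def belief_update(b, obs, T, O):
    """One Bayes-filter step for a discrete POMDP (illustrative sketch).

    b   : current belief over hidden states, shape (n,)
    obs : index of the observation just received
    T   : transition matrix, T[s, s'] = P(s' | s)
    O   : observation matrix, O[s', o] = P(o | s')
    """
    predicted = b @ T                 # push belief through the dynamics
    unnorm = predicted * O[:, obs]    # weight by observation likelihood
    return unnorm / unnorm.sum()      # renormalize to a distribution

# Toy example: two hidden states, two observations.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])
b = np.array([0.5, 0.5])
b = belief_update(b, obs=0, T=T, O=O)   # belief tilts toward state 0
```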


Our restriction: observable sufficient statistics

We restrict attention to problems where:

The information available at decision time is a sufficient statistic for continuation.

So we avoid full belief-state POMDPs while covering most economic models. (Scope pointer: see Information Structure Restriction.)


The three perches (three filtrations)

┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│   ARRIVAL    │       │   DECISION   │       │ CONTINUATION │
│    x_arvl    │ ────▶ │    x_dcsn    │ ────▶ │    x_cntn    │
│    𝓕_arvl    │       │    𝓕_dcsn    │       │    𝓕_cntn    │
└──────────────┘       └──────────────┘       └──────────────┘

Three filtrations; everything else is bookkeeping.


Perch tags mean “adapted to the filtration”

In DDSL-SYM, a perch index is a measurability claim:

  • $z[<]$ means $z$ is \(\mathcal{F}_{\text{arvl}}\)-measurable
  • $z$ (unmarked) means $z$ is \(\mathcal{F}_{\text{dcsn}}\)-measurable
  • $z[>]$ means $z$ is \(\mathcal{F}_{\text{cntn}}\)-measurable

Every transition and equation must therefore respect the claimed measurability: the RHS may only use information available at that perch.
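The measurability claim can be checked mechanically. A minimal sketch (hypothetical names, not DDSL-SYM itself): model each filtration as the set of primitives it contains, each variable by its dependencies, and "adapted" as set inclusion.

```python
# Illustrative sketch, not the DDSL API: a filtration is the set of
# random primitives it contains; the joins below mirror
# F_dcsn = F_arvl ∨ σ(ζ) and F_cntn = F_dcsn ∨ σ(η, π).
F_arvl = {"x_prev"}
F_dcsn = F_arvl | {"zeta"}
F_cntn = F_dcsn | {"eta", "pi"}

def adapted(deps, filtration):
    """A variable is measurable w.r.t. a filtration iff everything it
    depends on is already contained in that filtration."""
    return deps <= filtration

assert adapted({"x_prev"}, F_arvl)            # z[<] : arrival-measurable
assert adapted({"x_prev", "zeta"}, F_dcsn)    # z    : decision-measurable
assert not adapted({"eta"}, F_dcsn)           # eta is not known at decision time
```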


Arrival perch = prior filtration

The state before any within-stage observations or decisions.

\[ \mathcal{F}_{\text{arvl}} = \mathcal{F}_{-1} \]

“What do I know coming in?”


Decision perch = observable filtration

All information used by the actor to choose.

\[ \mathcal{F}_{\text{dcsn}} = \mathcal{F}_{\text{arvl}} \vee \sigma(\zeta) \]

where \(\zeta\) are shocks/observations revealed before action.

“What can my policy condition on?”


Continuation perch = full filtration

The realized outcome after action and all within-stage uncertainty.

\[ \mathcal{F}_{\text{cntn}} = \mathcal{F}_{\text{dcsn}} \vee \sigma(\eta,\pi) \]

where \(\eta\) are shocks revealed after action \(\pi\).

“What is the realized state passed onward?”


The Markov restriction (the tractability constraint)

The continuation state depends only on:

\[ x_{\text{cntn}} = g\big(x_{\text{dcsn}},\, \pi,\, \eta\big) \]

Equivalently:

\[ x_{\text{cntn}} \perp x_{\text{arvl}} \;\mid\; x_{\text{dcsn}}, \pi, \eta \]

This prevents a drift back to full POMDP belief tracking. (Scope pointer: see Information Structure Restriction.)
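As a concrete instance, consider a hypothetical consumption-savings transition: the Markov restriction is visible in the signature of \(g\), which takes only \((x_{\text{dcsn}}, \pi, \eta)\) and never \(x_{\text{arvl}}\).

```python
import numpy as np

# Hypothetical transition, for illustration only. The Markov restriction
# is enforced by the signature: x_arvl does not appear.
def g(x_dcsn, pi, eta):
    """Continuation state: next assets from decision-time resources x_dcsn,
    savings choice pi, and income shock eta."""
    R = 1.04                          # gross return (assumed constant)
    return R * (x_dcsn - pi) + eta

rng = np.random.default_rng(0)
x_cntn = g(x_dcsn=10.0, pi=3.0, eta=rng.normal())
```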


Two timing patterns (both fit the same three perches)

Pattern A (observed shock before action):

ARRIVAL ────(observe ζ)──▶ DECISION ────(choose π)───▶ CONTINUATION

Pattern B (shock after action):

ARRIVAL ───────▶ DECISION ──(choose π, then η)──────▶ CONTINUATION

Expectation placement follows timing

If shocks are observed before action, expectations naturally live “on the left”:

\[ \mathcal{A}(x_a) = \mathbb{E}_{\zeta}\big[\mathcal{V}(g_{av}(x_a,\zeta))\big] \]

If shocks are unobserved at action time, expectations naturally live “inside the choice”:

\[ \mathcal{V}(x_v) = \max_{\pi}\Big\{r(x_v,\pi) + \beta\,\mathbb{E}_{\eta}\big[\mathcal{E}(g_{ve}(x_v,\pi,\eta))\big]\Big\} \]
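Both placements can be sketched with crude Monte Carlo. All ingredients below (`V`, `g_av`, `g_ve`, `r`, the action grid) are hypothetical toys, chosen only to show where the expectation sits relative to the max.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.96

# Toy ingredients (hypothetical, for illustration only).
V = lambda x: np.sqrt(np.maximum(x, 0.0))        # continuation value function
g_av = lambda x, zeta: x + zeta                  # arrival -> decision state
g_ve = lambda x, pi, eta: 1.04 * (x - pi) + eta  # decision -> continuation
r = lambda x, pi: np.log(pi)                     # flow reward

# Pattern A: shock observed before action -> expectation "on the left".
def A(x_a, n=10_000):
    zeta = rng.normal(size=n)
    return V(g_av(x_a, zeta)).mean()

# Pattern B: shock after action -> expectation inside the max.
def V_dcsn(x_v, n=10_000):
    eta = rng.normal(size=n)
    best = -np.inf
    for pi in np.linspace(0.1, x_v - 0.1, 50):   # crude grid over actions
        val = r(x_v, pi) + beta * V(g_ve(x_v, pi, eta)).mean()
        best = max(best, val)
    return best
```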

Perches don’t proliferate

Between decision and continuation you might compute:

  • a maximization (choice)
  • an expectation (unobserved uncertainty)
  • other operators (inversion, projection, …)

That does not create new perches.

Perches track information, not “how many operators we applied.” (Scope pointer: see Factorizations in Scope.)


What DDSL will ask you to specify (high-level)

  • The state objects at each perch \(x_{\text{arvl}}, x_{\text{dcsn}}, x_{\text{cntn}}\)
  • Shocks and when they are revealed (observed vs unobserved)
  • Transitions between perches
  • Rewards and discounting

Numerics (grids, interpolation, quadrature) live elsewhere.
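The four ingredients above can be pictured as a single record. This is a hypothetical sketch, not the DDSL schema; every field name here is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch of what a stage specification could carry -- NOT the
# DDSL schema, just the four bullets above collected into one record.
@dataclass
class StageSpec:
    arvl_state: list         # objects in x_arvl
    dcsn_state: list         # objects in x_dcsn
    cntn_state: list         # objects in x_cntn
    observed_shocks: list    # revealed before action (zeta)
    unobserved_shocks: list  # revealed after action (eta)
    transitions: dict        # perch-to-perch maps, as expressions
    reward: str              # flow reward expression
    discount: float = 0.96

spec = StageSpec(
    arvl_state=["a"], dcsn_state=["m"], cntn_state=["a_next"],
    observed_shocks=["zeta"], unobserved_shocks=["eta"],
    transitions={"av": "m = a + zeta", "ve": "a_next = R*(m - pi) + eta"},
    reward="log(pi)",
)
```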


A stage is already a small graph (perches = nodes, movers = edges)

  |------------ Push-forward measure -----------|
  |                                             v
arvl  ──(observe)────▶  dcsn  ──(choose)────▶  cntn
  ^                                             |
  |-------------- Bellman backward -------------|

The stage’s content is an operator factorization: it makes the Bellman structure explicit as composable pieces.
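A minimal sketch of that factorization on a discrete toy model (all numbers hypothetical): the backward pass is literally a composition of three small operators, one per mover.

```python
import numpy as np

# Toy discrete model, for illustration only.
n_s, n_a = 4, 3
rng = np.random.default_rng(1)
r = rng.random((n_s, n_a))                    # flow reward r(x_dcsn, pi)
Q = rng.random((n_a, n_s, n_s))
Q /= Q.sum(axis=2, keepdims=True)             # post-action kernels, one per action
P = np.full((n_s, n_s), 1.0 / n_s)            # pre-decision (observed-shock) kernel
beta = 0.96

def B_exp_post(V):                            # E_eta: expected continuation values
    return np.einsum("ask,k->sa", Q, V)       # shape (n_s, n_a)

def B_max(EV):                                # Bellman max over actions
    return (r + beta * EV).max(axis=1)

def B_exp_pre(V):                             # E_zeta: arrival expectation
    return P @ V

V_cntn = rng.random(n_s)
A_arvl = B_exp_pre(B_max(B_exp_post(V_cntn))) # the composed backward pass
```

The composition order mirrors the diagram: expectation over post-action shocks, then the max, then expectation over observed shocks.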

Conjugation (duality): Backward and forward operators are conjugates: \(\langle \mathcal{F} \mu, V \rangle = \langle \mu, \mathcal{B} V \rangle\)

  • \(\mathcal{B}_{\text{arvl}}\) (expectation) ↔ \(\mathcal{F}_{\text{arvl}}\) (push-forward)
  • \(\mathcal{B}_{\text{dcsn}}\) (Bellman max) ↔ \(\mathcal{F}_{\text{dcsn}}\) (policy push-forward)

The syntax specifies the backward (problem) operators; the forward (simulation) operators are derived via conjugation. (Scope pointer: see Problem Chunks.)
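The conjugation identity can be verified numerically. For the expectation / push-forward pair on a discrete state space, the backward operator is \(V \mapsto PV\) and its conjugate forward operator is \(\mu \mapsto P^{\top}\mu\) (toy kernel below, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)    # Markov kernel P[s, s']

B = lambda V: P @ V                  # backward: conditional expectation of V
F = lambda mu: P.T @ mu              # forward: push-forward of the measure mu

mu = rng.random(n); mu /= mu.sum()   # a distribution over states
V = rng.random(n)                    # a value function

lhs = F(mu) @ V                      # <F mu, V>
rhs = mu @ B(V)                      # <mu, B V>
assert np.isclose(lhs, rhs)          # adjointness: (P^T mu)·V = mu·(P V)
```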


A model is a graph (not a sequence) of stages

In the simplest case you have a chain, but branching and reuse are natural:

Stage A  ──▶  Stage B  ──▶  Stage C
   └────▶  Stage D

Edges are connectors: formal maps that wire one stage’s continuation objects into another stage’s arrival objects (renaming, projections, “twisters”, etc.). (Scope pointer: see Problem Chunks.)
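A connector can be as simple as a renaming plus a projection. A minimal sketch with hypothetical variable names (not the DDSL connector API):

```python
# Sketch: a connector as a formal renaming/projection map wiring one stage's
# continuation objects into another stage's arrival objects.
def connector(cntn_out, rename, keep=None):
    """Rename (and optionally project) a continuation record into an
    arrival record for the next stage."""
    out = {rename.get(k, k): v for k, v in cntn_out.items()}
    if keep is not None:
        out = {k: out[k] for k in keep}
    return out

cntn_B = {"a_next": 7.0, "h_next": 1.0, "aux": 0.0}
arvl_C = connector(cntn_B, rename={"a_next": "a", "h_next": "h"},
                   keep=["a", "h"])
# arvl_C == {"a": 7.0, "h": 1.0}  -- "aux" is projected away
```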


Graph view ↔ category view (same idea, more structure)

  • graph: stages as nodes, connectors as arrows
  • category: stages as objects; connectors as morphisms; wiring is composition
    (identities correspond to “no-op” connectors)

This is the right abstraction when models are not linear time-indexed scripts.


The expressive job of DDSL (and what it is not)

  • DDSL is for: representing Bellman operators (stages) and connecting them formally (connectors / composition)
  • DDSL is not: “start with a sequence of equations over time” as the primary organizing principle

Sequence is a special case of a stage graph.


Next: DDSL foundations (deck 2/2)

Open: AI/working/12012026/ddsl_foundations.md

Topics:

  • SYM vs CORE
  • Υ / ρ meaning maps
  • methodization + calibration + settings
  • worked examples (top-down)