
Bellman Foundations

DDSL Phase 1.1 — Deck 1/2

Convening Meeting — January 2026


Motivation: DP does not have one canonical operator ordering

A dynamic program can be written as a composition of operators:

  • expectation / integration
  • maximization / decision
  • push-forward / simulation
  • approximation, etc.

But perches are not “endpoints of operators.”


The framing we want

Previous framing (wrong):

Perches are endpoints of operators (expectation, maximization, …)

Correct framing:

Perches are information sets — σ-algebras (filtrations) describing what is known at each point.

Perches exist to answer one question:

What can the actor condition their policy on?


The general setting: partially observable MDPs (POMDPs)

In a POMDP, the actor:

  • has a true state \(s \in S\) (possibly hidden)
  • receives observations \(o = O(s,\eta)\)
  • must maintain a belief \(b(s)\) over the hidden state
  • chooses actions based on observation history

Problem: optimal policies become maps \(\mathcal{P}(S) \to A\). (Scope pointer: see Information Structure Restriction.)
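To make the weight of the full POMDP concrete, here is a minimal discrete Bayes-filter step (toy matrices, not part of DDSL): the policy's input is the belief vector `b`, i.e. a point in \(\mathcal{P}(S)\), which is exactly what the restriction below avoids.

```python
import numpy as np

def belief_update(b, obs, T, O):
    """One Bayes-filter step for a discrete POMDP (illustrative sketch).

    b   : current belief over hidden states, shape (n,)
    obs : index of the observation just received
    T   : transition matrix, T[s, s'] = P(s' | s)
    O   : observation matrix, O[s', o] = P(o | s')
    """
    predicted = b @ T                 # push belief through the dynamics
    unnorm = predicted * O[:, obs]    # weight by observation likelihood
    return unnorm / unnorm.sum()      # renormalize to a distribution

# Toy example: two hidden states, two observations.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])
b = np.array([0.5, 0.5])
b = belief_update(b, obs=0, T=T, O=O)   # belief tilts toward state 0
```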


Our restriction: observable sufficient statistics

We restrict attention to problems where:

The information available at decision time is a sufficient statistic for continuation.

So we avoid full belief-state POMDPs while covering most economic models. (Scope pointer: see Information Structure Restriction.)


The three perches (three filtrations)

┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│   ARRIVAL    │       │   DECISION   │       │ CONTINUATION │
│    x_arvl    │ ────▶ │    x_dcsn    │ ────▶ │    x_cntn    │
│    𝓕_arvl    │       │    𝓕_dcsn    │       │    𝓕_cntn    │
└──────────────┘       └──────────────┘       └──────────────┘

Three filtrations; everything else is bookkeeping.


Perch tags mean “adapted to the filtration”

In DDSL-SYM, a perch index is a measurability claim:

  • $z[<]$ means $z$ is \(\mathcal{F}_{\text{arvl}}\)-measurable
  • $z$ (unmarked) means $z$ is \(\mathcal{F}_{\text{dcsn}}\)-measurable
  • $z[>]$ means $z$ is \(\mathcal{F}_{\text{cntn}}\)-measurable

Every transition and equation must therefore respect the claimed measurability: the RHS may only use information available at that perch.
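The measurability claim can be checked mechanically. A minimal sketch (hypothetical names, not DDSL-SYM itself): model each filtration as the set of primitives it contains, each variable by its dependencies, and "adapted" as set inclusion.

```python
# Illustrative sketch, not the DDSL API: a filtration is the set of
# random primitives it contains; the joins below mirror
# F_dcsn = F_arvl ∨ σ(ζ) and F_cntn = F_dcsn ∨ σ(η, π).
F_arvl = {"x_prev"}
F_dcsn = F_arvl | {"zeta"}
F_cntn = F_dcsn | {"eta", "pi"}

def adapted(deps, filtration):
    """A variable is measurable w.r.t. a filtration iff everything it
    depends on is already contained in that filtration."""
    return deps <= filtration

assert adapted({"x_prev"}, F_arvl)            # z[<] : arrival-measurable
assert adapted({"x_prev", "zeta"}, F_dcsn)    # z    : decision-measurable
assert not adapted({"eta"}, F_dcsn)           # eta is not known at decision time
```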


Arrival perch = prior filtration

The state before any within-stage observations or decisions.

\[ \mathcal{F}_{\text{arvl}} = \mathcal{F}_{-1} \]

“What do I know coming in?”


Decision perch = observable filtration

All information used by the actor to choose.

\[ \mathcal{F}_{\text{dcsn}} = \mathcal{F}_{\text{arvl}} \vee \sigma(\zeta) \]

where \(\zeta\) are shocks/observations revealed before action.

“What can my policy condition on?”


Continuation perch = full filtration

The realized outcome after action and all within-stage uncertainty.

\[ \mathcal{F}_{\text{cntn}} = \mathcal{F}_{\text{dcsn}} \vee \sigma(\eta,\pi) \]

where \(\eta\) are shocks revealed after action \(\pi\).

“What is the realized state passed onward?”


The Markov restriction (the tractability constraint)

The continuation state depends only on:

\[ x_{\text{cntn}} = g\big(x_{\text{dcsn}},\, \pi,\, \eta\big) \]

Equivalently:

\[ x_{\text{cntn}} \perp x_{\text{arvl}} \;\mid\; x_{\text{dcsn}}, \pi, \eta \]

This prevents a drift back to full POMDP belief tracking. (Scope pointer: see Information Structure Restriction.)
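As a concrete instance, consider a hypothetical consumption-savings transition: the Markov restriction is visible in the signature of \(g\), which takes only \((x_{\text{dcsn}}, \pi, \eta)\) and never \(x_{\text{arvl}}\).

```python
import numpy as np

# Hypothetical transition, for illustration only. The Markov restriction
# is enforced by the signature: x_arvl does not appear.
def g(x_dcsn, pi, eta):
    """Continuation state: next assets from decision-time resources x_dcsn,
    savings choice pi, and income shock eta."""
    R = 1.04                          # gross return (assumed constant)
    return R * (x_dcsn - pi) + eta

rng = np.random.default_rng(0)
x_cntn = g(x_dcsn=10.0, pi=3.0, eta=rng.normal())
```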


Two timing patterns (both fit the same three perches)

Pattern A (observed shock before action):

ARRIVAL ────(observe ζ)──▶ DECISION ────(choose π)───▶ CONTINUATION

Pattern B (shock after action):

ARRIVAL ───────▶ DECISION ──(choose π, then η)──────▶ CONTINUATION

Expectation placement follows timing

If shocks are observed before action, expectations naturally live “on the left”:

\[ \mathcal{A}(x_a) = \mathbb{E}_{\zeta}\big[\mathcal{V}(g_{av}(x_a,\zeta))\big] \]

If shocks are unobserved at action time, expectations naturally live “inside the choice”:

\[ \mathcal{V}(x_v) = \max_{\pi}\Big\{r(x_v,\pi) + \beta\,\mathbb{E}_{\eta}\big[\mathcal{E}(g_{ve}(x_v,\pi,\eta))\big]\Big\} \]
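Both placements can be sketched with crude Monte Carlo. All ingredients below (`V`, `g_av`, `g_ve`, `r`, the action grid) are hypothetical toys, chosen only to show where the expectation sits relative to the max.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.96

# Toy ingredients (hypothetical, for illustration only).
V = lambda x: np.sqrt(np.maximum(x, 0.0))        # continuation value function
g_av = lambda x, zeta: x + zeta                  # arrival -> decision state
g_ve = lambda x, pi, eta: 1.04 * (x - pi) + eta  # decision -> continuation
r = lambda x, pi: np.log(pi)                     # flow reward

# Pattern A: shock observed before action -> expectation "on the left".
def A(x_a, n=10_000):
    zeta = rng.normal(size=n)
    return V(g_av(x_a, zeta)).mean()

# Pattern B: shock after action -> expectation inside the max.
def V_dcsn(x_v, n=10_000):
    eta = rng.normal(size=n)
    best = -np.inf
    for pi in np.linspace(0.1, x_v - 0.1, 50):   # crude grid over actions
        val = r(x_v, pi) + beta * V(g_ve(x_v, pi, eta)).mean()
        best = max(best, val)
    return best
```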

Perches don’t proliferate

Between decision and continuation you might compute:

  • a maximization (choice)
  • an expectation (unobserved uncertainty)
  • other operators (inversion, projection, …)

That does not create new perches.

Perches track information, not “how many operators we applied.” (Scope pointer: see Factorizations in Scope.)


What DDSL will ask you to specify (high-level)

  • The state objects at each perch \(x_{\text{arvl}}, x_{\text{dcsn}}, x_{\text{cntn}}\)
  • Shocks and when they are revealed (observed vs unobserved)
  • Transitions between perches
  • Rewards and discounting

Numerics (grids, interpolation, quadrature) live elsewhere.
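The four ingredients above can be pictured as a single record. This is a hypothetical sketch, not the DDSL schema; every field name here is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch of what a stage specification could carry -- NOT the
# DDSL schema, just the four bullets above collected into one record.
@dataclass
class StageSpec:
    arvl_state: list         # objects in x_arvl
    dcsn_state: list         # objects in x_dcsn
    cntn_state: list         # objects in x_cntn
    observed_shocks: list    # revealed before action (zeta)
    unobserved_shocks: list  # revealed after action (eta)
    transitions: dict        # perch-to-perch maps, as expressions
    reward: str              # flow reward expression
    discount: float = 0.96

spec = StageSpec(
    arvl_state=["a"], dcsn_state=["m"], cntn_state=["a_next"],
    observed_shocks=["zeta"], unobserved_shocks=["eta"],
    transitions={"av": "m = a + zeta", "ve": "a_next = R*(m - pi) + eta"},
    reward="log(pi)",
)
```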


A stage is already a small graph (perches = nodes, movers = edges)

  |------------ Push-forward measure -----------|
  |                                             v
arvl  ──(observe)────▶  dcsn  ──(choose)────▶  cntn
  ^                                             |
  |-------------- Bellman backward -------------|

The stage’s content is an operator factorization: it makes the Bellman structure explicit as composable pieces.
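A minimal sketch of that factorization on a discrete toy model (all numbers hypothetical): the backward pass is literally a composition of three small operators, one per mover.

```python
import numpy as np

# Toy discrete model, for illustration only.
n_s, n_a = 4, 3
rng = np.random.default_rng(1)
r = rng.random((n_s, n_a))                    # flow reward r(x_dcsn, pi)
Q = rng.random((n_a, n_s, n_s))
Q /= Q.sum(axis=2, keepdims=True)             # post-action kernels, one per action
P = np.full((n_s, n_s), 1.0 / n_s)            # pre-decision (observed-shock) kernel
beta = 0.96

def B_exp_post(V):                            # E_eta: expected continuation values
    return np.einsum("ask,k->sa", Q, V)       # shape (n_s, n_a)

def B_max(EV):                                # Bellman max over actions
    return (r + beta * EV).max(axis=1)

def B_exp_pre(V):                             # E_zeta: arrival expectation
    return P @ V

V_cntn = rng.random(n_s)
A_arvl = B_exp_pre(B_max(B_exp_post(V_cntn))) # the composed backward pass
```

The composition order mirrors the diagram: expectation over post-action shocks, then the max, then expectation over observed shocks.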

Conjugation (duality): Backward and forward operators are conjugates: \(\langle \mathcal{F} \mu, V \rangle = \langle \mu, \mathcal{B} V \rangle\)

  • \(\mathcal{B}_{\text{arvl}}\) (expectation) ↔ \(\mathcal{F}_{\text{arvl}}\) (push-forward)
  • \(\mathcal{B}_{\text{dcsn}}\) (Bellman max) ↔ \(\mathcal{F}_{\text{dcsn}}\) (policy push-forward)

The syntax specifies the backward (problem) operators; the forward (simulation) operators are derived via conjugation. (Scope pointer: see Problem Chunks.)
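The conjugation identity can be verified numerically. For the expectation / push-forward pair on a discrete state space, the backward operator is \(V \mapsto PV\) and its conjugate forward operator is \(\mu \mapsto P^{\top}\mu\) (toy kernel below, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)    # Markov kernel P[s, s']

B = lambda V: P @ V                  # backward: conditional expectation of V
F = lambda mu: P.T @ mu              # forward: push-forward of the measure mu

mu = rng.random(n); mu /= mu.sum()   # a distribution over states
V = rng.random(n)                    # a value function

lhs = F(mu) @ V                      # <F mu, V>
rhs = mu @ B(V)                      # <mu, B V>
assert np.isclose(lhs, rhs)          # adjointness: (P^T mu)·V = mu·(P V)
```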


A model is a graph (not a sequence) of stages

In the simplest case you have a chain, but branching and reuse are natural:

Stage A  ──▶  Stage B  ──▶  Stage C
   └────▶  Stage D

Edges are connectors: formal maps that wire one stage’s continuation objects into another stage’s arrival objects (renaming, projections, “twisters”, etc.). (Scope pointer: see Problem Chunks.)
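A connector can be as simple as a renaming plus a projection. A minimal sketch with hypothetical variable names (not the DDSL connector API):

```python
# Sketch: a connector as a formal renaming/projection map wiring one stage's
# continuation objects into another stage's arrival objects.
def connector(cntn_out, rename, keep=None):
    """Rename (and optionally project) a continuation record into an
    arrival record for the next stage."""
    out = {rename.get(k, k): v for k, v in cntn_out.items()}
    if keep is not None:
        out = {k: out[k] for k in keep}
    return out

cntn_B = {"a_next": 7.0, "h_next": 1.0, "aux": 0.0}
arvl_C = connector(cntn_B, rename={"a_next": "a", "h_next": "h"},
                   keep=["a", "h"])
# arvl_C == {"a": 7.0, "h": 1.0}  -- "aux" is projected away
```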


Graph view ↔ category view (same idea, more structure)

  • graph: stages as nodes, connectors as arrows
  • category: stages as objects; connectors as morphisms; wiring is composition
    (identities correspond to “no-op” connectors)

This is the right abstraction when models are not linear time-indexed scripts.


The expressive job of DDSL (and what it is not)

  • DDSL is for: representing Bellman operators (stages) and connecting them formally (connectors / composition)
  • DDSL is not: “start with a sequence of equations over time” as the primary organizing principle

Sequence is a special case of a stage graph.


Next: DDSL foundations (deck 2/2)

Open: AI/working/12012026/ddsl_foundations.md

Topics:

  • SYM vs CORE
  • Υ / ρ meaning maps
  • methodization + calibration + settings
  • worked examples (top-down)