5. Bellman Theory Integration¶
This section establishes how DDSL models connect to the mathematical theory of dynamic programming through movers as the central functional operators, the ADC (Arrival-Decision-Continuation) framework, and the composition of registry operators.
5.1 Movers as the Central Concept¶
The Primacy of Movers¶
In the DDSL, movers are the primary functional operators that implement the Bellman decomposition:
Definition (Mover): A mover is a syntactic object that represents a functional operator transforming functions or distributions between perches. Each mover:

- Has a mathematical meaning under Υ (an operator on function/distribution spaces)
- Composes built-in registry operators (E, argmax, ~, APPROX)
- Has an approximate numerical counterpart once methodized and calibrated
Movers are what the methodization functor acts upon to produce numerical algorithms.
Mover Categories¶
Backward movers implement value function updates (Bellman-type operations):

- Transform continuation values into current-period values
- Implement expectation over shocks, optimization over actions
- Flow: continuation → decision → arrival

Forward movers implement distribution evolution (simulation/push-forward):

- Transform distributions through state transitions and policies
- Implement sampling, policy application, state evolution
- Flow: arrival → decision → continuation
The Factored Bellman Operator¶
The Bellman operator is expressed as a composition of movers:

$$ \mathbb{T} = \mathbb{I} \circ \mathbb{B} $$

where:

- \(\mathbb{B}\): Decision mover (optimization + continuation linkage)
- \(\mathbb{I}\): Arrival mover (expectation over shocks)

Similarly, the conjugate (forward) operator composes the forward movers:

$$ \mathbb{T}^{*} = \mathrm{F}_{\succ} \circ \mathrm{F}_{\prec} $$
5.2 The ADC Factorization¶
Perches as information sets (the correct framing)¶
It is tempting to think of perches as “endpoints of operators” (expectation, maximization, etc.). That framing is misleading.
Correct framing: perches represent information sets—\(\sigma\)-algebras (a filtration) describing what is known at each point within a stage. Perches exist to answer one question:
What can the actor condition their policy on?
Formally, within a stage we have a filtration

$$ \mathcal{F}_{\prec} \subseteq \mathcal{F}_{\sim} \subseteq \mathcal{F}_{\succ}, \qquad \mathcal{F}_m = \sigma(x_m), $$

where \(x_m\) is the (sufficient) information state at perch \(m\).
This framing is compatible with (and motivated by) POMDPs: in the fully general case, policies depend on observation history (or beliefs). We restrict attention to the common economic case where the decision-time information state is a sufficient statistic for continuation (a Markov restriction), so we can work with finite-dimensional \(x_m\) rather than belief distributions.
Perch tags are measurability claims¶
In DDSL-SYM/dolo-plus stage notation, a perch tag is a measurability/adaptedness claim:
- \(z[<]\) means \(z\) is \(\mathcal{F}_{\prec}\)-measurable
- \(z\) (unmarked) means \(z\) is \(\mathcal{F}_{\sim}\)-measurable
- \(z[>]\) means \(z\) is \(\mathcal{F}_{\succ}\)-measurable
Every equation must therefore be written so that the claimed mapping actually exists: its right-hand side may use only information available at that perch.
This also clarifies where expectation/maximization “live”: they are operations inside movers, and different operator orderings (e.g. moving an expectation inside a maximization) do not change the meaning of perches themselves.
Three-Perch Structure¶
The ADC framework decomposes each period into three perches:
- Arrival (\(\prec\)): State at the beginning of the period, before shock realization
- Decision (\(\sim\)): State after shock realization, where optimization occurs
- Continuation (\(\succ\)): State after decision, linking to the next period
Each perch has its own state space:

- \(X_{\prec}\): arrival state space
- \(X_{\sim}\): decision state space
- \(X_{\succ}\): continuation state space
Perch-to-Perch Movers¶
The ADC structure defines canonical mover directions:
Backward (value iteration):
```
Ve (continuation) ──B_dcsn──> Vv (decision) ──B_arvl──> Va (arrival)
```
Forward (simulation):
```
D_arvl (arrival) ──F_arvl──> D_dcsn (decision) ──F_dcsn──> D_cntn (continuation)
```
In YAML:
```yaml
movers:
  # Backward movers
  "cntn_to_dcsn": B_dcsn: [Xe -> R] -> [Xv -> R, Xv -> A]
  "dcsn_to_arvl": B_arvl: [Xv -> R] -> [Xa -> R]
  # Forward movers
  "arvl_to_dcsn": F_arvl: Dist(Xa) -> Dist(Xv)
  "dcsn_to_cntn": F_dcsn: Dist(Xv) -> Dist(Xe)
```
5.3 Built-in Operators Inside Movers¶
Each mover composes registry operators to implement its transformation. The operator registry (§3.5) provides the building blocks.
Note on notation: the @methods: {...} lines shown inside mover bodies below are a compact way to display the result of methodization (a “methodized-stage view”). In the canonical three-file architecture, the authored stage.yml contains no @methods; method choices live in methodization.yml.
The Decision Mover (\(\mathbb{B}\))¶
The decision mover implements the Bellman maximization:
```yaml
B_dcsn: Ve -> [Vv, astar] @via |
  astar(xv) = argmax_{a in Gamma(xv)} Q_kernel(xv, a)
  Vv(xv) = Q_kernel(xv, astar(xv))
  @methods: {argmax: optim_a, Vv: interp_Vv}
```
Registry operators used:
- argmax_{a ∈ Γ(x)}: Optimization operator (finds optimal action)
- APPROX: Approximation operator (attached by methodization to produce Vv as interpolant)
Mathematical meaning under Υ: $$ (\mathbb{B}\, V_{\succ})(x_{\sim}) = \max_{a \in \Gamma(x_{\sim})} Q(x_{\sim}, a) $$ $$ a^*(x_{\sim}) = \arg\max_{a \in \Gamma(x_{\sim})} Q(x_{\sim}, a) $$
where \(Q(x_{\sim}, a) = u(a) + \beta V_{\succ}(\mathrm{g}_{\sim\succ}(x_{\sim}, a))\).
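As a concrete but non-normative illustration, the decision mover can be sketched on a grid in a few lines of Python. The names `u`, `beta`, `g_ve`, and the grids are assumptions of the example, not DDSL API; the APPROX step (storing `Vv` as an interpolant) is here just the array of grid values.

```python
import numpy as np

def decision_mover(V_cntn, x_grid, a_grid, u, beta, g_ve):
    """Sketch of B_dcsn on grids: for each decision state, apply the argmax
    registry operator over the feasible set Gamma(xv) = {a <= xv} and record
    the maximized value and the policy.  Illustrative, not the DDSL runtime."""
    Vv = np.empty(len(x_grid))
    astar = np.empty(len(x_grid))
    for i, xv in enumerate(x_grid):
        feasible = a_grid[a_grid <= xv]                        # Gamma(xv), assumed nonempty
        Q = u(feasible) + beta * V_cntn(g_ve(xv, feasible))    # Q(xv, a)
        j = np.argmax(Q)                                       # argmax operator
        Vv[i], astar[i] = Q[j], feasible[j]
    return Vv, astar
```

Methodization would replace the brute-force `np.argmax` with the scheme named in `@methods` (e.g. a continuous optimizer).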
The Arrival Mover (\(\mathbb{I}\))¶
The arrival mover implements the conditional expectation:
```yaml
B_arvl: Vv -> Va @via |
  xv = g_av(xa, w)
  Va(xa) = E_w[Vv(xv)]
  @methods: {E_w: expect_w, Va: interp_Va}
```
Registry operators used:
- E_w: Expectation operator (integrates over shock distribution)
- APPROX: Approximation operator (produces Va as interpolant)
Mathematical meaning under Υ: $$ (\mathbb{I}\, V_{\sim})(x_{\prec}) = \mathbb{E}_w\left[V_{\sim}(\mathrm{g}_{\prec\sim}(x_{\prec}, w))\right] $$
This is a conditional expectation: we condition on the arrival state \(x_{\prec}\) and integrate over the shock \(w\).
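A matching sketch of the arrival mover, with the shock distribution discretized into quadrature nodes and weights (all names illustrative):

```python
import numpy as np

def arrival_mover(Vv, x_grid, w_nodes, w_weights, g_av):
    """Sketch of B_arvl: Va(xa) = E_w[Vv(g_av(xa, w))], with the E_w registry
    operator realized as a finite quadrature sum (weights sum to one)."""
    return np.array([np.sum(w_weights * Vv(g_av(xa, w_nodes)))
                     for xa in x_grid])
```

With `Vv` the identity and `g_av(xa, w) = xa * r + w`, this returns `r * xa` for a mean-zero shock, which makes the conditional-expectation reading easy to check.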
Forward Mover: Arrival to Decision (\(\mathrm{F}_{\prec}\))¶
```yaml
F_arvl: D_arvl -> D_dcsn @via |
  xa_tilde ~ D_arvl
  w ~ D_w
  xv_tilde = g_av(xa_tilde, w)
  @methods: {w: simulate_w}
```
Registry operators used:
- ~ (draw): Sampling operator (draws from distributions)
- PushForward: Push-forward induced by the transition kernel (often implicit in SYM; realized via sampling/quadrature under ρ)
Kernel vs. method: the transition kernel g_av is a pure function object. The methodization-relevant pieces for forward evolution are the draw / push-forward operators (e.g., simulation scheme, RNG settings), not the kernel definition itself.
Mathematical meaning under Υ: $$ (\mathrm{F}_{\prec}\, \mu_{\prec}) = (\mathrm{g}_{\prec\sim})_{*} (\mu_{\prec} \otimes \mu_w) $$
The pushforward of the product measure through the transition kernel.
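Under a Monte Carlo methodization, this push-forward can be sketched as: draw shocks, then map samples through the kernel. The normal shock and its scale are assumptions of the example:

```python
import numpy as np

def forward_arrival(xa_samples, g_av, rng, sigma_w=0.1):
    """Sketch of F_arvl: realize the push-forward (g_av)_*(mu_a x mu_w)
    by sampling -- the ~ (draw) operator -- rather than exact integration."""
    w = rng.normal(0.0, sigma_w, size=len(xa_samples))   # w ~ D_w
    return g_av(xa_samples, w)                           # push through the kernel
```

A quadrature methodization would instead enumerate `(xa, w)` node pairs with weights, realizing the same push-forward deterministically.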
Forward Mover: Decision to Continuation (\(\mathrm{F}_{\succ}\))¶
```yaml
F_dcsn: D_dcsn -> D_cntn @via |
  xv_tilde ~ D_dcsn
  a_tilde = astar(xv_tilde)
  xe_tilde = g_ve(xv_tilde, a_tilde)
```
Registry operators used:
- ~ (draw): Sampling from decision distribution
- Policy evaluation: Apply converged policy function
- Pushforward through transition
Mathematical meaning under Υ: $$ (\mathrm{F}_{\succ}\, \mu_{\sim}) = \left(\mathrm{g}_{\sim\succ}(\cdot, a^{*}(\cdot))\right)_{*}\, \mu_{\sim} $$
5.4 Transition Kernels¶
The ADC framework uses specific transition kernels between perches:
```yaml
equations:
  # Arrival to decision (with shock)
  g_av: [xa, w] @mapsto xv @via |
    xv = xa * r + w
  # Decision to continuation (with action)
  g_ve: [xv, a] @mapsto xe @via |
    xe = xv - a
```
These kernels define the state evolution:

- \(x_{\sim} = \mathrm{g}_{\prec\sim}(x_{\prec}, w)\): Shock realization moves the state from arrival to decision
- \(x_{\succ} = \mathrm{g}_{\sim\succ}(x_{\sim}, a)\): Action moves the state from decision to continuation
- \(x_{\prec,+1} = g_{ea}(x_{\succ})\): Time evolution to the next period (often the identity)
Method note: transition kernels are Υ-level primitives. They only become method-relevant when you evaluate movers under ρ and the mover execution invokes methodized operators (expectation, optimization/rootfinding, approximation/interpolation, draws/push-forwards, etc.).
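As pure function objects, the two kernels above translate directly; the default interest factor `r` is an illustrative parameter, not a value fixed by the DSL:

```python
def g_av(xa, w, r=1.03):
    """Arrival -> decision kernel: shock realization, xv = xa * r + w."""
    return xa * r + w

def g_ve(xv, a):
    """Decision -> continuation kernel: action, xe = xv - a."""
    return xv - a
```

Both are deterministic maps; randomness and approximation enter only through the movers that evaluate them.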
5.5 Value Functions and Policy Functions¶
Value Function Decomposition¶
The ADC framework decomposes the value function into perch-specific components:
- \(V_{\prec}: X_{\prec} \to \mathbb{R}\) — value at arrival
- \(V_{\sim}: X_{\sim} \to \mathbb{R}\) — value at decision (after observing shock)
- \(V_{\succ}: X_{\succ} \to \mathbb{R}\) — continuation value
These satisfy the system: $$ \begin{aligned} V_{\succ}(x_{\succ}) &= V_{\prec,+1}(g_{ea}(x_{\succ})) \\ V_{\sim}(x_{\sim}) &= \max_{a \in \Gamma(x_{\sim})} \left\{ u(a) + \beta V_{\succ}(\mathrm{g}_{\sim\succ}(x_{\sim}, a)) \right\} \\ V_{\prec}(x_{\prec}) &= \mathbb{E}_w\left[V_{\sim}(\mathrm{g}_{\prec\sim}(x_{\prec}, w))\right] \end{aligned} $$
Policy Functions¶
The optimization at the decision perch yields a policy function:

$$ a^{*}(x_{\sim}) = \arg\max_{a \in \Gamma(x_{\sim})} Q(x_{\sim}, a) $$

This policy function is:

- Computed by the decision mover (\(\mathbb{B}\))
- Stored/approximated according to its method schema
- Used by forward movers for simulation
5.6 Forward-Backward Duality¶
Conjugate Operators¶
Each backward mover has a conjugate forward mover:
| Backward Mover | Conjugate Forward Mover | Duality |
|---|---|---|
| \(\mathbb{I}\): \([X_{\sim} \to \mathbb{R}] \to [X_{\prec} \to \mathbb{R}]\) | \(\mathrm{F}_{\prec}\): \(\mathcal{D}(X_{\prec}) \to \mathcal{D}(X_{\sim})\) | Expectation ↔ Pushforward |
| \(\mathbb{B}\): \([X_{\succ} \to \mathbb{R}] \to [X_{\sim} \to \mathbb{R}]\) | \(\mathrm{F}_{\succ}\): \(\mathcal{D}(X_{\sim}) \to \mathcal{D}(X_{\succ})\) | Bellman ↔ Policy pushforward |
Duality Relation¶
The forward and backward operators satisfy:

$$ \langle \mathrm{F}_{\prec}\, \mu_{\prec},\, V_{\sim} \rangle = \langle \mu_{\prec},\, \mathbb{I}\, V_{\sim} \rangle $$

where \(\langle \cdot, \cdot \rangle\) denotes integration: \(\langle \mu, f \rangle = \int f \, d\mu\).
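The duality can be verified numerically on a finite state/shock space; the states, shock values, distribution, and test function below are arbitrary choices for the check:

```python
import numpy as np

# Finite check of <F_arvl mu, f> = <mu, I f> for the arrival pair:
# backward expectation and forward push-forward are adjoint under integration.
x_states = np.array([0.0, 1.0, 2.0])
w_vals, w_prob = np.array([-0.5, 0.5]), np.array([0.5, 0.5])
g = lambda x, w: x + w                     # transition kernel (illustrative)
mu = np.array([0.2, 0.5, 0.3])             # distribution over arrival states
f = lambda x: x ** 2                       # test function on decision states

# Backward route: (I f)(x) = E_w[f(g(x, w))], then integrate against mu
If = np.array([np.sum(w_prob * f(g(x, w_vals))) for x in x_states])
lhs = np.sum(mu * If)

# Forward route: push mu (x) D_w through g, then integrate f
rhs = sum(mu[i] * w_prob[j] * f(g(x_states[i], w_vals[j]))
          for i in range(len(x_states)) for j in range(len(w_vals)))

assert abs(lhs - rhs) < 1e-12              # <F mu, f> == <mu, I f>
```

Both routes enumerate the same product measure; the identity is exact here because the state and shock spaces are finite.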
5.7 Computational Flow¶
Backward Pass (Value Function Iteration)¶
```
Ve (given/guess)
  ↓ B_dcsn (argmax, APPROX)
[Vv, astar]
  ↓ B_arvl (E_w, APPROX)
Va
  ↓ link to next period
Ve (updated)
```
Algorithm:

1. Start with an initial guess \(V_{\succ}^0\)
2. Apply the decision mover: \((V_{\sim}^n, a^{*,n}) = \mathbb{B}\, V_{\succ}^n\)
3. Apply the arrival mover: \(V_{\prec}^n = \mathbb{I}\, V_{\sim}^n\)
4. Update the continuation value: \(V_{\succ}^{n+1} = V_{\prec}^n\) (or the appropriate linkage)
5. Check convergence: \(\|V_{\succ}^{n+1} - V_{\succ}^n\| < \epsilon\)
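A compact sketch of the full backward pass for a buffer-stock-style setup, using linear interpolation as the APPROX stand-in and a discrete quadrature for E_w. Every scheme and parameter here (grids, `np.interp`, `np.sqrt` utility, identity linkage) is an assumption of the example:

```python
import numpy as np

def solve_vfi(x_grid, a_grid, w_nodes, w_weights,
              beta=0.95, r=1.03, u=np.sqrt, tol=1e-6, max_iter=1000):
    """Sketch of the backward pass Ve -> [Vv, astar] -> Va -> Ve (identity
    linkage), iterated to a fixed point.  Illustrative, not the DDSL runtime."""
    Ve = np.zeros(len(x_grid))
    for _ in range(max_iter):
        Vv = np.empty(len(x_grid))
        astar = np.empty(len(x_grid))
        for i, xv in enumerate(x_grid):            # B_dcsn: argmax + APPROX
            a = a_grid[a_grid <= xv]               # Gamma(xv)
            Q = u(a) + beta * np.interp(xv - a, x_grid, Ve)   # g_ve inside Q
            j = np.argmax(Q)
            Vv[i], astar[i] = Q[j], a[j]
        Va = np.array([np.sum(w_weights *          # B_arvl: E_w quadrature
                              np.interp(xa * r + w_nodes, x_grid, Vv))
                       for xa in x_grid])
        if np.max(np.abs(Va - Ve)) < tol:          # convergence check
            return Va, astar
        Ve = Va                                    # linkage: Ve_t = Va_{t+1}
    return Va, astar
```

With discounting at `beta` the sweep is (approximately) a contraction, so the loop terminates well before `max_iter` for moderate tolerances.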
Forward Pass (Simulation)¶
```
D_arvl (given/initial)
  ↓ F_arvl (~, pushforward)
D_dcsn
  ↓ F_dcsn (policy, pushforward)
D_cntn
  ↓ link to next period
D_arvl (next period)
```
Algorithm:

1. Start with an initial distribution \(\mu_{\prec}^0\)
2. Apply the arrival forward mover: \(\mu_{\sim}^t = \mathrm{F}_{\prec}\, \mu_{\prec}^t\)
3. Apply the decision forward mover: \(\mu_{\succ}^t = \mathrm{F}_{\succ}\, \mu_{\sim}^t\)
4. Evolve to the next period: \(\mu_{\prec}^{t+1} = \mathrm{F}_{\text{link}}\, \mu_{\succ}^t\)
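Given a converged policy, the forward pass can be sketched as a sample-cloud simulation. The normal shock, the clipping to the grid support, and the point-mass initial distribution are assumptions of the example:

```python
import numpy as np

def simulate_forward(astar, x_grid, T=50, n=5000, r=1.03, sigma_w=0.1, seed=0):
    """Sketch of the forward pass: evolve a sample cloud through F_arvl
    (shock draw + g_av) and F_dcsn (policy + g_ve) for T periods, with
    identity linkage to the next period.  Illustrative only."""
    rng = np.random.default_rng(seed)
    xa = np.full(n, 1.0)                                   # initial arrival cloud
    for _ in range(T):
        w = rng.normal(0.0, sigma_w, size=n)               # F_arvl: w ~ D_w
        xv = np.clip(xa * r + w, x_grid[0], x_grid[-1])    # g_av, kept on support
        a = np.interp(xv, x_grid, astar)                   # F_dcsn: apply policy
        xa = np.maximum(xv - a, 0.0)                       # g_ve, identity linkage
    return xa
```

The returned cloud approximates \(\mu_{\prec}^T\); histogramming it gives the usual simulated stationary distribution.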
5.8 Convergence in the Limit¶
Mathematical Contraction¶
Under standard conditions (boundedness, discounting, continuity), the factorized Bellman operator is a contraction:

$$ \|\mathbb{T} V - \mathbb{T} W\|_{\infty} \le \beta\, \|V - W\|_{\infty} $$

This guarantees:

1. Existence: A fixed point \(V^*\) exists
2. Uniqueness: The value function is uniquely determined
3. Convergence: Value iteration converges geometrically
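The contraction inequality is easy to check numerically on a small finite MDP; the random rewards and transitions below are arbitrary:

```python
import numpy as np

# Check ||T V - T W||_inf <= beta ||V - W||_inf on a random finite MDP.
rng = np.random.default_rng(1)
n_states, n_actions, beta = 6, 3, 0.9
reward = rng.random((n_states, n_actions))
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)            # row-stochastic transitions

def T(V):
    """Bellman operator: max over actions of reward + discounted expectation."""
    return np.max(reward + beta * (P @ V), axis=1)

V, W = rng.normal(size=n_states), rng.normal(size=n_states)
lhs = np.max(np.abs(T(V) - T(W)))
rhs = beta * np.max(np.abs(V - W))
assert lhs <= rhs + 1e-12                    # the contraction inequality
```

Iterating `T` from any starting point then converges geometrically to the unique fixed point, as the Banach argument guarantees.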
Approximate Operators and Limit Agreement¶
The methodized and calibrated mover under ρ is an approximate operator:

$$ \hat{\mathbb{T}} = \rho(\mathbb{T}) $$

This approximate operator:

- Acts on discrete primitive objects (grid values, coefficient arrays)
- Is not equal to the mathematical operator \(\Upsilon(\mathbb{T})\)
- Converges to the true operator as the discretization improves
Limit Agreement Theorem (informal): Under appropriate assumptions on the numerical schemes,

$$ \rho(\mathbb{T}) \longrightarrow \Upsilon(\mathbb{T}) $$

as the discretization parameters tend to their ideal values, where the convergence is in an appropriate operator norm.
Error Decomposition¶
The total error in the numerical solution decomposes as:

$$ \epsilon_{\text{total}} \lesssim \epsilon_{\text{interp}} + \epsilon_{\text{quad}} + \epsilon_{\text{optim}} + \epsilon_{\text{iter}} $$

Each term corresponds to a registry operator and its method:

- \(\epsilon_{\text{interp}}\): From APPROX with the interpolation scheme
- \(\epsilon_{\text{quad}}\): From E_w with the quadrature scheme
- \(\epsilon_{\text{optim}}\): From argmax with the optimization scheme
- \(\epsilon_{\text{iter}}\): From stopping after finitely many iterations
5.9 Buffer-Stock Example: Complete Mover Structure¶
Stage Definition¶
```yaml
stage:
  name: consind
  model_type: adc_stage
  inputs: [Ve, D_arvl]
  outputs: [Va, astar, D_dcsn, D_cntn]
```
Backward Movers¶
Decision mover (cntn_to_dcsn):
```yaml
B_dcsn: Ve -> [Vv, astar] @via |
  astar(xv) = argmax_{a in Gamma(xv)} Q_kernel(xv, a)
  Vv(xv) = Q_kernel(xv, astar(xv))
  @methods: {argmax: optim_a, Vv: interp_Vv}
```
Mathematical meaning:
- Uses argmax from registry to find \(a^*(x_{\sim})\)
- Uses APPROX to store \(V_{\sim}\) as interpolant
- \(Q(x_{\sim}, a) = u(a) + \beta V_{\succ}(\mathrm{g}_{\sim\succ}(x_{\sim}, a))\)
Arrival mover (dcsn_to_arvl):
```yaml
B_arvl: Vv -> Va @via |
  xv = g_av(xa, w)
  Va(xa) = E_w[Vv(xv)]
  @methods: {E_w: expect_w, Va: interp_Va}
```
Mathematical meaning:
- Uses E_w from registry for conditional expectation
- Integrates over shock \(w\) conditioned on arrival state \(x_{\prec}\)
- Uses APPROX to store \(V_{\prec}\) as interpolant
Forward Movers¶
Arrival forward (arvl_to_dcsn):
```yaml
F_arvl: D_arvl -> D_dcsn @via |
  xa_tilde ~ D_arvl
  w ~ D_w
  xv_tilde = g_av(xa_tilde, w)
  @methods: {w: simulate_w}
```
Decision forward (dcsn_to_cntn):
```yaml
F_dcsn: D_dcsn -> D_cntn @via |
  xv_tilde ~ D_dcsn
  a_tilde = astar(xv_tilde)
  xe_tilde = g_ve(xv_tilde, a_tilde)
```
Factored Bellman Operator¶
The complete Bellman operator is the composition of the two backward movers:

$$ \mathbb{T} = \mathbb{I} \circ \mathbb{B} $$

which under Υ corresponds to:

$$ (\mathbb{T}\, V_{\succ})(x_{\prec}) = \mathbb{E}_w\!\left[ \max_{a \in \Gamma(x_{\sim})} \left\{ u(a) + \beta\, V_{\succ}(\mathrm{g}_{\sim\succ}(x_{\sim}, a)) \right\} \right], \qquad x_{\sim} = \mathrm{g}_{\prec\sim}(x_{\prec}, w). $$
5.10 Summary¶
The Bellman theory integration in DDSL provides:
- Movers as central objects: Functional operators that implement the Bellman decomposition
- Registry operator composition: Each mover uses E, argmax, ~, APPROX from the operator registry
- ADC factorization: Systematic decomposition into arrival, decision, continuation
- Forward-backward duality: Conjugate operators for value iteration and simulation
- Convergence guarantees: Mathematical operators as limits of approximate numerical operators
- Explicit method attachment: Registry operators get schemes through methodization
Key insight:
A syntactic mover maps to a mathematical operator under Υ. A methodized mover maps to an approximate operator under ρ. The two agree only in the limit as discretization parameters go to their ideal values.
This ensures:

- Theoretical grounding: Based on established dynamic programming theory
- Computational clarity: Explicit decomposition into implementable pieces
- Numerical control: Error bounds through method selection
- Modularity: Movers can be composed and reused