
Related

Syntax & implementation spec: Spec 0.1l — Branching stages (syntax & implementation)

Baseline (sequential-only): Spec 0.1h — Periods and Nests

Semantic rules: 05 Periods & Models, 03 Equations

Spec 0.1l — Branching stages (theory)

This document collects the theory/reference material for branching stages (coproduct continuation spaces, multi-source movers, endogenous choice probabilities, and the index-shape/pushout intuition for wiring). The corresponding surface syntax and implementation decisions live in spec_0.1l-branching.md.

1. Mathematical foundations

1.1 The branching Bellman equation

The standard (sequential) stage Bellman equation at the decision perch is:

\[ \mathrm{v}(x) = \max_\pi \bigl\{ \mathrm{r}(x, \pi) + \beta(x) \, \mathrm{v}_{\succ}(\mathrm{g}_{\sim\succ}(x, \pi)) \bigr\} \]

where \(\mathrm{v}_{\succ}\) is a single continuation value function. The branching generalization replaces the single continuation with a weighted combination over branches:

\[ \mathrm{v}(x) = \max_\pi \left\{ \mathrm{r}(x, \pi) + \beta(x) \sum_{j \in N_+} p(j \mid x, \pi) \, \mathrm{v}_{\prec,j}\!\bigl(\mathrm{g}_{\sim\succ,j}(x, \pi)\bigr) \right\} \]

where:

  • \(N_+\) is the set of valid successor stage kinds (the branch labels)
  • \(p(j \mid x, \pi)\) is the branching probability for branch \(j\), satisfying \(\sum_{j \in N_+} p(j \mid x, \pi) = 1\)
  • \(\mathrm{v}_{\prec,j}\) is the arrival value function for branch \(j\) — a separate named function on branch \(j\)'s state space
  • \(\mathrm{g}_{\sim\succ,j}: \mathsf{X} \times \Pi \to \mathsf{X}_{\succ,j}\) is the branch-specific transition from decision to successor arrival

Two aggregation modes:

| Aggregation | Probabilities | Value operator | Interpretation |
| --- | --- | --- | --- |
| Discrete choice (`max`) | \(p(j \mid x_v, \pi) \in \{0, 1\}\), with \(p(j^* \mid x_v, \pi) = 1\) for the chosen branch \(j^* = \pi\) | \(\max_j V_j\) | Agent chooses the best branch |
| Expectation (`expectation`) | \(p(j \mid x_v, \pi) \in [0, 1]\) | \(\sum_j p_j V_j\) | Weighted average over branches |
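At a fixed decision state, the two modes reduce to pointwise operations on the branch values. A minimal sketch (function and variable names are illustrative, not spec syntax):

```python
# Illustrative sketch: aggregate branch continuation values w_j(x) at one fixed x.
def aggregate_max(branch_values):
    """Discrete choice: pick the best branch (join in the pointwise lattice)."""
    return max(branch_values.values())

def aggregate_expectation(branch_values, probs):
    """Expectation: probability-weighted sum; probs must sum to 1."""
    assert abs(sum(probs.values()) - 1.0) < 1e-12
    return sum(probs[j] * branch_values[j] for j in branch_values)

w = {"own": 1.4, "rent": 1.1}          # what-if values at a fixed x
p = {"own": 0.7, "rent": 0.3}          # branching probabilities p(j | x)
v_max = aggregate_max(w)               # 1.4
v_exp = aggregate_expectation(w, p)    # 0.7*1.4 + 0.3*1.1 = 1.31
```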

The expectation mode supports both exogenous and endogenous probabilities:

| Probability source | How \(p\) is determined | Example |
| --- | --- | --- |
| Exogenous | Declared in `parameters:` or `exogenous:` | Survival probability \(s_h\) |
| Endogenous | Computed from the branch value functions, then fed into the expectation | Softmax: \(p(j) = \exp(V_j/\sigma) / \sum_k \exp(V_k/\sigma)\) |

For endogenous probabilities, the probability generation is a separate computational step that precedes the expectation. The aggregation mode remains expectation — no third mode is introduced. See §1.6 for the mathematical foundations and the syntax/implementation spec for the YAML pattern.

For discrete choice, the equation simplifies to:

\[ \mathrm{v}(x) = \max_{j \in N_+} \mathrm{v}_{\prec,j}\!\bigl(\mathrm{g}_{\sim\succ,j}(x)\bigr) \]

where \(\mathrm{g}_{\sim\succ,j}\) is the branch-specific transition (each branch may have a different state space mapping).

1.2 Within a single stage: multiple continuation perches

A branching stage has multiple continuation perches, one per branch:

| Standard stage | Branching stage |
| --- | --- |
| `arvl` → `dcsn` → `cntn` | `arvl` → `dcsn` → `own`, `rent`, ... |
| One continuation state space \(\mathsf{X}_{\succ}\) | Branch-specific: \(\mathsf{X}_{\succ,\text{own}}\), \(\mathsf{X}_{\succ,\text{rent}}\), ... |
| One continuation value \(\mathrm{v}_{\succ}(x_{\succ})\) | Branch-specific: \(\mathrm{v}_{\succ,\text{own}}(x_{\succ,\text{own}})\), \(\mathrm{v}_{\succ,\text{rent}}(x_{\succ,\text{rent}})\), ... |

The key structural rule: value functions along each branch are named and separable. Each branch's continuation value function is an independent named object, not a component of a vector-valued function.

Formal claim: the continuation space is a coproduct

Claim. A poststates declaration with named sub-blocks defines the continuation state space as a coproduct (disjoint union) indexed by a discrete label set:

\[ \mathsf{X}_{\succ} \;=\; \coprod_{j \in N_+} \mathsf{X}_{\succ,j} \]

An element of \(\mathsf{X}_{\succ}\) is a tagged pair \((j,\, x_j)\) where \(j \in N_+\) is the branch label and \(x_j \in \mathsf{X}_{\succ,j}\) is the branch-specific (typically continuous) state. The discrete index \(j\) and the continuous state \(x_j\) are jointly the continuation state.

Special case (common continuous dimensions). When all branches share the same continuous state space \(\mathsf{X}_c\), the coproduct collapses to a product:

\[ \mathsf{X}_{\succ} = N_+ \times \mathsf{X}_c \]

i.e., the continuation state is a pair \((j, x_c)\) with \(j\) a discrete index and \(x_c\) a continuous state. Value functions separate as \(\mathrm{v}_{\succ}(j, x_c) = \mathrm{v}_{\succ,j}(x_c)\).

General case (heterogeneous dimensions). When branches have different continuous state spaces (e.g., \(\mathsf{X}_{\succ,\text{own}} = \mathbb{R}_+ \times \mathcal{H} \times \mathcal{Y}\) vs. \(\mathsf{X}_{\succ,\text{rent}} = \mathbb{R}_+ \times \mathcal{Y}\)), the coproduct is the appropriate construction. The YAML sub-block declaration is the syntactic representation of this coproduct: each named sub-block declares one summand with its own field names and domains.

Consequence for value functions. A value function on a coproduct is equivalent to a family of value functions, one per summand:

\[ \mathrm{v}_{\succ} \in \mathcal{V}(\mathsf{X}_{\succ}) \;\;\iff\;\; (\mathrm{v}_{\succ,j})_{j \in N_+} \in \prod_{j \in N_+} \mathcal{V}(\mathsf{X}_{\succ,j}) \]

This is precisely the separability condition: the named sub-blocks in poststates declare that the continuation value decomposes into independent, named components indexed by the discrete label \(j\).
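The equivalence can be made concrete: a value function on the coproduct is a dispatch over a family of per-branch functions. A minimal sketch, with illustrative closed forms for the two branch value functions:

```python
import math

# Per-branch value functions on different state spaces (illustrative closed forms).
v_own  = lambda a, H, y: math.log(1 + a) + 0.5 * H + y   # on X_own = R+ x H x Y
v_rent = lambda w, y: math.log(1 + w) + y                # on X_rent = R+ x Y

family = {"own": v_own, "rent": v_rent}                  # (v_j)_{j in N+}

def v_coproduct(tagged_state):
    """Value function on the coproduct: dispatch on the branch label j."""
    j, x_j = tagged_state            # tagged pair (j, x_j)
    return family[j](*x_j)

# Evaluating the coproduct function on a tagged state equals evaluating the
# corresponding member of the family on the untagged state.
assert v_coproduct(("rent", (2.0, 0.1))) == v_rent(2.0, 0.1)
```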

Categorical foundations: coproduct, contravariance, and the backward mover

The coproduct construction has a precise categorical interpretation that clarifies the backward mover's signature and the role of the aggregation mode.

The value-function functor is contravariant. Define \(\mathcal{V}: \mathbf{Meas}^{\mathrm{op}} \to \mathbf{Vect}\) by \(\mathcal{V}(\mathsf{X}) \coloneqq \{v : \mathsf{X} \to \overline{\mathbb{R}} \mid v \text{ measurable}\}\). A morphism \(f: \mathsf{X} \to \mathsf{Y}\) induces a pullback \(f^*: \mathcal{V}(\mathsf{Y}) \to \mathcal{V}(\mathsf{X})\) by \(f^*(v) = v \circ f\). This reverses arrows.

Contravariance sends coproducts to products. The coproduct inclusions \(\iota_j: \mathsf{X}_{\succ,j} \hookrightarrow \coprod_k \mathsf{X}_{\succ,k}\) induce projections \(\pi_j = \iota_j^*: \mathcal{V}(\coprod_k \mathsf{X}_{\succ,k}) \to \mathcal{V}(\mathsf{X}_{\succ,j})\) (restriction to summand \(j\)). This gives:

\[ \mathcal{V}\!\Bigl(\coprod_{j \in N_+} \mathsf{X}_{\succ,j}\Bigr) \;\cong\; \prod_{j \in N_+} \mathcal{V}(\mathsf{X}_{\succ,j}) \]

A value function on the coproduct is a family of value functions, one per summand. This is the categorical statement underlying the separability condition above.

The backward mover is a morphism out of the product. The backward mover has signature:

\[ \mathbb{B}: \prod_{j \in N_+} \mathcal{V}(\mathsf{X}_{\succ,j}) \longrightarrow \mathcal{V}(\mathsf{X}) \]

This is a morphism out of the product of value-function spaces. The product universal property characterises morphisms into the product (given components, construct the tuple). The backward mover goes the other direction — it consumes the product. This means \(\mathbb{B}\) is not the unique mediating morphism from the product universal property; it is additional structure.

Factorisation of \(\mathbb{B}\). The backward mover decomposes as:

\[ \mathbb{B} = \mathcal{A} \circ (\mathbb{B}_j)_{j \in N_+} \]

where:

  1. Branch-specific movers \(\mathbb{B}_j: \mathcal{V}(\mathsf{X}_{\succ,j}) \to \mathcal{V}(\mathsf{X})\) each produce a "what-if" value: the value at \(x\) if branch \(j\) were taken.
  2. Aggregator \(\mathcal{A}: \prod_{j \in N_+} \mathcal{V}(\mathsf{X}) \to \mathcal{V}(\mathsf{X})\) combines the branch values pointwise.

The aggregator is determined by the aggregation mode:

| Mode | \(\mathcal{A}\bigl((w_j)_j\bigr)(x)\) | Category |
| --- | --- | --- |
| `max` (agent) | \(\max_{j} w_j(x)\) | Join in the pointwise lattice \((\mathcal{V}(\mathsf{X}), \leq)\) |
| `expectation` (nature) | \(\sum_j p_j(x) \, w_j(x)\) | Linear combination in \(\mathbf{Vect}\) |

The aggregation mode is part of the model specification (branch_control: agent | nature), not a consequence of the categorical structure. The max aggregator is universal in the category of lattices (as a join/supremum); the expectation aggregator is a specific linear map. Neither arises from the product's universal property in \(\mathbf{Vect}\).

Forward-backward duality. Measures are covariant — the pushforward \(f_*\mu\) goes in the same direction as \(f\). So the coproduct of state spaces induces a coproduct (direct sum) of measure spaces:

\[ \mathcal{M}\!\Bigl(\coprod_j \mathsf{X}_{\succ,j}\Bigr) \;\cong\; \bigoplus_j \mathcal{M}(\mathsf{X}_{\succ,j}) \]

The backward mover fans in (product \(\to\) single); the forward mover fans out (single \(\to\) coproduct):

\[ \text{Backward:}\quad \prod_j \mathcal{V}(\mathsf{X}_{\succ,j}) \xrightarrow{\;\mathbb{B}\;} \mathcal{V}(\mathsf{X}) \qquad\qquad \text{Forward:}\quad \mathcal{M}(\mathsf{X}) \xrightarrow{\;\mathbb{F}\;} \bigoplus_j \mathcal{M}(\mathsf{X}_{\succ,j}) \]

The adjunction \(\langle \mathbb{F}\mu,\, \mathrm{v}_\succ \rangle = \langle \mu,\, \mathbb{B}\,\mathrm{v}_\succ \rangle\) holds when \(\mathcal{A} = \mathbb{E}_p\) (the linear case). For \(\mathcal{A} = \max\), the adjunction holds only in a subdifferential/envelope sense.
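For finite state spaces, the adjunction in the linear case can be verified directly. A minimal sketch with hypothetical two-point spaces, hand-picked kernels, and deterministic branch maps (all names illustrative):

```python
X = [0, 1]                                  # decision-perch states
mu = {0: 0.6, 1: 0.4}                       # population measure on X
p = {0: {"A": 0.3, "B": 0.7},               # branch probabilities p(j | x)
     1: {"A": 0.9, "B": 0.1}}
g = {"A": {0: "a0", 1: "a1"},               # branch maps g_j(x)
     "B": {0: "b0", 1: "b0"}}
v = {"A": {"a0": 1.0, "a1": 2.0},           # branch value functions v_j
     "B": {"b0": 5.0}}

def backward(x):
    """B v at x: expectation aggregation (the linear case)."""
    return sum(p[x][j] * v[j][g[j][x]] for j in p[x])

def forward():
    """F mu: branch measures via the kernel pushforward."""
    out = {j: {} for j in ("A", "B")}
    for x, m in mu.items():
        for j in ("A", "B"):
            y = g[j][x]
            out[j][y] = out[j].get(y, 0.0) + p[x][j] * m
    return out

lhs = sum(sum(mj[y] * v[j][y] for y in mj) for j, mj in forward().items())
rhs = sum(mu[x] * backward(x) for x in X)
assert abs(lhs - rhs) < 1e-12    # <F mu, v_succ> == <mu, B v_succ>
```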

The decision perch as fan-out apex

In a branching stage, all computational structure radiates from the decision perch. The decision perch is the apex of a cone (fan-out) over the discrete diagram of branch successor spaces.

Cone structure. The branch transitions \((g_j: \mathsf{X} \to \mathsf{X}_{\succ,j})_{j \in N_+}\) form a cone with apex \(\mathsf{X}\) over the diagram \(\{\mathsf{X}_{\succ,j}\}_j\). By the product universal property, this family is equivalent to a single map \(\mathbf{G}: \mathsf{X} \to \prod_j \mathsf{X}_{\succ,j}\).

Terminology note. A cone has morphisms from the apex to the base objects; a cocone has morphisms from the base objects to the apex. The branching fan-out is a cone. The product is the universal such cone. We use "precomposition" (not "pullback") for the backward operation \(v_{\succ,j} \circ g_j\), since "pullback" has a distinct technical meaning as a limit over a cospan (Milewski, Ch. 23).

Forward (measure). The population measure \(\mu\) on \(\mathsf{X}\) is routed through the cone legs. Under deterministic branch selection (max), the measure partitions — each agent goes to exactly one branch. Under stochastic selection (expectation), the population splits by weight via Markov kernels. In both cases, data flows out of the decision perch.

Backward (value). Continuation values \(\mathrm{v}_{\succ,j}\) on each \(\mathsf{X}_{\succ,j}\) are precomposed along the cone legs: \(x \mapsto \mathrm{v}_{\succ,j}(g_j(x, \pi))\). This precomposition is contravariant functoriality — it is defined by the outward transitions. The aggregator \(\mathcal{A}\) (\(\max\) or \(\mathbb{E}\)) then collapses the precomposed values to a single decision value \(\mathrm{v}(x)\). Backward computation is thus also determined by the outward cone structure, plus the aggregation mode.

Aggregation is additional structure. The aggregator is not determined by the cone alone — it enriches the cone with the datum of how to collapse the branch values. This enrichment is part of the model specification (branch_control: agent | nature).

The arrival perch contributes a single incoming morphism \(\mathrm{g}_{\prec\sim}: \mathsf{X}_\prec \to \mathsf{X}\) (the arrival-to-decision transition). It does not participate in the branching fan-out, which is exclusively a decision-perch phenomenon. The arrival mover \(\mathbb{I}\) is precomposition along \(\mathrm{g}_{\prec\sim}\) (possibly with integration over pre-decision shocks).

Synthesis. A branching stage is characterised by: 1. The arrival morphism \(\mathrm{g}_{\prec\sim}\) (information delivery to the decision). 2. The branch cone \((g_j: \mathsf{X} \to \mathsf{X}_{\succ,j})_j\) (fan-out from the decision). 3. The aggregation mode \(\mathcal{A} \in \{\max, \mathbb{E}_p\}\) (enrichment).

The decomposition \(\mathbf{U} \circ \mathbf{G}_\sigma \circ \mathbf{F}\) from the abstract programs framework centres on \(\mathbf{G}_\sigma\) precisely because the decision perch is the cone apex — the generator of all outward computational structure in the stage.

1.3 Backward mover at a branching stage

The backward mover \(\mathbb{B}\) at a branching stage receives multiple continuation value functions (one per branch) and produces a single decision-perch value function:

\[ \mathbb{B}: \prod_{j \in N_+} \mathcal{V}(\mathsf{X}_{\succ,j}) \longrightarrow \mathcal{V}(\mathsf{X}) \]

The aggregation mode determines how the branch values are combined:

  • max: \(V(x_v) = \max_{j \in N_+} V_j\bigl(g_{ve,j}(x_v)\bigr)\)
  • expectation: \(V(x_v) = \sum_{j \in N_+} p(j \mid x_v) \, V_j\bigl(g_{ve,j}(x_v)\bigr)\)

For expectation, the probabilities \(p(j \mid x_v)\) can be either exogenous (declared parameters) or endogenous (computed from the branch values, e.g., via softmax). In the endogenous case, the probabilities are generated first, then used in a standard weighted sum (see §1.6).

1.4 Forward operator and population dynamics

The forward operator at a branching stage splits the population measure across branches:

Branch-specific measure evolution:

\[ \mu_{a_{+j}}(A) = \mathbb{E}_{\mu_v}\!\bigl[ p(j \mid x_v, \pi^*(x_v)) \, \mathbf{1}\{g_{va_{+j}}(x_v, \pi^*(x_v)) \in A\} \bigr] \]

Conservation:

\[ \sum_{j \in N_+} \mu_{a_{+j}}(\mathcal{X}_a) = \mu_v(\mathcal{X}_v) \]

For discrete choice (max), \(p(j \mid x_v, \pi^*) = \mathbf{1}\{j = j^*(x_v)\}\) where \(j^*\) is the optimal branch, so the population partitions cleanly into subpopulations.
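The clean partition under max can be sketched on a finite decision-state grid (illustrative values; `jstar` plays the role of \(j^*\)):

```python
# Deterministic partition under max aggregation: each agent at x moves to the
# argmax branch j*(x); the population measure partitions with no mass loss.
mu_v = {0: 0.25, 1: 0.35, 2: 0.40}                # measure on decision states
V = {"own":  {0: 1.0, 1: 3.0, 2: 2.0},            # branch values V_j(g_j(x))
     "rent": {0: 2.0, 1: 1.0, 2: 5.0}}

def jstar(x):
    """Optimal branch at state x."""
    return max(V, key=lambda j: V[j][x])

mu_branch = {"own": 0.0, "rent": 0.0}
for x, m in mu_v.items():
    mu_branch[jstar(x)] += m                      # indicator weights 1{j = j*(x)}

# Conservation: the branch masses sum to the original mass.
assert abs(sum(mu_branch.values()) - sum(mu_v.values())) < 1e-12
```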

1.4.1 Fan-in (merging) and path indicators

Branching is fan-out, but period/nest composition graphs may also include fan-in: multiple upstream paths that converge to the same successor stage. When the successor stage does not need to distinguish which path the agent came from, fan-in requires no new stage-local syntax: the incoming populations simply merge.

Concretely, suppose a successor stage \(S\) has an arrival space \(\mathsf{X}_{\prec,S}\) and it receives \(K\) incoming arrival measures \(\mu_{\prec,S}^{(k)}\) on that same space (after any necessary renames/connectors/twisters). The merged arrival measure is:

\[ \mu_{\prec,S} \;=\; \sum_{k=1}^{K} \mu_{\prec,S}^{(k)}. \]

Mass conservation is then automatic:

\[ \mu_{\prec,S}(\mathsf{X}_{\prec,S}) \;=\; \sum_{k=1}^{K} \mu_{\prec,S}^{(k)}(\mathsf{X}_{\prec,S}). \]

If the modeller does need to distinguish incoming paths downstream (e.g., identical numerical state tuples \((a,y)\) should behave differently depending on provenance), then the state is not sufficient. The remedy is to carry an indicator (a discrete regime/path label) as part of the state (product case), or equivalently to keep paths as distinct summands of a coproduct and delay the merge. In short: merge unless you need to distinguish; if you need to distinguish, encode it as state.

This “merge vs carry a label” choice is a property of the composition / simulation layer (how measures are routed and summed), not of an individual stage’s transition equations.

1.4.2 Fan-out as kernels and pushforwards (measure splitting)

It is useful to view each branch transition as a (possibly state-dependent) Markov kernel from the decision-perch state space to the branch-specific successor arrival space:

\[ K_{+j} : \mathsf X_v \longrightarrow \mathrm{Dist}(\mathsf X_{a,+j}), \qquad j \in N_+ . \]

Given a decision-perch population measure \(\mu_v \in \mathrm{Dist}(\mathsf X_v)\), the induced branch-specific successor-arrival measure is the usual pushforward-by-kernel:

\[ \mu_{a,+j}(A) \;=\; \int_{\mathsf X_v} K_{+j}(x_v, A)\,\mu_v(dx_v). \]

In the common deterministic-policy case (fixed optimal policy \(\pi^*(x_v)\)) with branch probabilities \(p(j\mid x_v)\) and deterministic branch maps \(g_{v a,+j} : \mathsf X_v \to \mathsf X_{a,+j}\), a canonical kernel is:

\[ K_{+j}(x_v, A) \;:=\; p(j\mid x_v)\,\mathbf{1}\{g_{v a,+j}(x_v)\in A\}. \]

Then the splitting formula in §1.4 is just the kernel pushforward written out.

Mass accounting. In this deterministic-map form:

\[ \mu_{a,+j}(\mathsf X_{a,+j}) \;=\; \int_{\mathsf X_v} p(j\mid x_v)\,\mu_v(dx_v), \qquad \sum_{j\in N_+}\mu_{a,+j}(\mathsf X_{a,+j}) \;=\; \mu_v(\mathsf X_v), \]

so “mass conservation” is simply \(\sum_j p(j\mid x_v)=1\) integrated against \(\mu_v\).

1.4.3 Coproduct continuation spaces: measures decompose by branch

When branches have different successor arrival spaces \(\mathsf X_{a,+j}\), it is natural to treat the “next-arrival space” as a coproduct:

\[ \mathsf X_{a,+} \;=\; \coprod_{j\in N_+}\mathsf X_{a,+j}. \]

A measure \(\mu_{a,+}\in \mathrm{Dist}(\mathsf X_{a,+})\) is equivalently a family \((\mu_{a,+j})_{j\in N_+}\) of measures, one per summand. This is the measure-theoretic analog of the value-function separability discussed in §1.2:

  • Branchwise representation: store \(\mu_{a,+j}\) on each \(\mathsf X_{a,+j}\), or
  • Single object on coproduct: store \(\mu_{a,+}\) on \(\coprod_j \mathsf X_{a,+j}\).

Either way, fan-out corresponds to building the family \((\mu_{a,+j})_j\) from \(\mu_v\) via the kernels \(K_{+j}\).

1.4.4 Joining (fan-in) into a common successor interface

Fan-in becomes relevant when multiple branches ultimately feed a single successor stage with a single arrival interface \(\mathsf X_{a,+}\) (a common state space), rather than distinct successor stages with distinct state spaces.

To model this, supply for each branch a measurable map (often an embedding / “fill missing fields” map):

\[ i_j : \mathsf X_{a,+j} \longrightarrow \mathsf X_{a,+}, \qquad j\in N_+ . \]

Example (mortgage-style “renters have \(H=0\)”): a renter branch arrival might be \((w)\in\mathbb R_+\) while the unified successor expects \((w,H)\in\mathbb R_+\times\mathbb R_+\), and you set \(i_{\text{rent}}(w)=(w,0)\) while \(i_{\text{own}}(w,H)=(w,H)\).

Given branch measures \(\mu_{a,+j}\) on \(\mathsf X_{a,+j}\), the merged successor-arrival measure on the unified interface is the sum of pushforwards:

\[ \mu_{a,+} \;=\; \sum_{j\in N_+} (i_j)_{\#}\,\mu_{a,+j}. \]

This is the formal version of “incoming paths merge; mass sums”. It makes clear that:

  • merging is a composition/simulation operation (it uses the wiring maps \(i_j\)), and
  • it is safe only to the extent that \(\mathsf X_{a,+}\) remains a sufficient (Markov) state for downstream stages (see §1.4.6).
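The sum-of-pushforwards merge can be sketched with point-mass branch measures and the renter/owner embeddings from the example above (names illustrative):

```python
# Hypothetical fan-in: embed each branch arrival into a unified interface
# X_{a,+} = R+ x R+ (wealth w, housing H), then sum the pushforwards.
i = {"own":  lambda s: s,                  # i_own: (w, H) -> (w, H)
     "rent": lambda s: (s[0], 0.0)}        # i_rent: (w,) -> (w, 0): H = 0

mu_own  = {(1.0, 2.0): 0.3, (4.0, 1.0): 0.2}    # branch measures (point masses)
mu_rent = {(1.0,): 0.4, (3.0,): 0.1}

merged = {}
for j, mu_j in (("own", mu_own), ("rent", mu_rent)):
    for s, m in mu_j.items():
        y = i[j](s)                        # push the point mass forward
        merged[y] = merged.get(y, 0.0) + m # (i_j)_# mu_j, summed over j

assert abs(sum(merged.values()) - 1.0) < 1e-12   # mass sums under the merge
```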

1.4.5 “Same successor value function” viewpoint (backward wiring under a join)

If branches join into a single successor stage, then backward induction supplies one successor arrival value function:

\[ \mathcal A_+ : \mathsf X_{a,+} \to \mathbb R. \]

Each branch contributes a branch-specific continuation term by pullback along the branch map into the unified interface:

\[ \mathcal A_+\bigl(i_j(g_{v a,+j}(x_v,\pi))\bigr), \qquad j\in N_+ . \]

So in the joined-successor case, the “branch-specific continuation values” are not independent functions; they are pullbacks of the same successor value function along different maps.

This is exactly the intuition you want in the mortgage/tenure-style pattern: going backward, the branching stage “fans out” to evaluate the same downstream value operator on different images, and then aggregates (max or expectation).

1.4.6 When is merging sound? (carry an indicator vs encode it in-state)

Merging is sound when the unified successor state \(x_{a,+}\in\mathsf X_{a,+}\) is a sufficient statistic for the future problem:

  • If downstream stages only depend on \(x_{a,+}\), then summing pushforwards \(\sum_j (i_j)_{\#}\mu_{a,+j}\) is lossless for simulation and \(\mathcal A_+(x_{a,+})\) is well-defined for backward recursion.
  • If downstream behavior depends on branch provenance beyond what is encoded in \(x_{a,+}\), then the unified state is not Markov. Remedy: carry an explicit discrete indicator (product case) or keep branches separate as a coproduct until the point where sufficiency holds.

The “renters carry \(H=0\)” trick is precisely a way to encode provenance in-state so that a join into a common successor interface remains Markov.

1.5 Graph-theoretic view

In the DDSL composition graph (random variables as nodes, stages as edges):

  • Sequential composition is a path: \(a \xrightarrow{S_1} b \xrightarrow{S_2} c\)
  • Branching composition is a fan-out: \(a \xrightarrow{S_1} \{b_1, b_2, \ldots\}\) where each \(b_j\) is a distinct random variable (branch-specific continuation state)

The fan-out generalizes the period subgraph from a path to a tree (or more generally a DAG if branches later merge).

1.5.1 Index-shape view: perches as \([2]\), wiring as gluing (pushouts)

It is often helpful to separate the index shape of a stage from its analytic interpretation.

  • A standard stage has perch index category \(P \cong [2]\): three objects \(<, 0, >\) and arrows \(< \to 0 \to >\).
  • For each perch \(p \in P\), the stage declares a bundle of state fields available at that information interface. Write the perch state as a (record-valued) random variable \(x_p\) taking values in a state space \(\mathsf X_p\) (a product/record built from the declared field domains).
  • A value field at perch \(p\) is then a function of that perch state: schematically an element of \(C^{\mathsf X_p}\) (or \(C(\mathsf X_p)\)).

Sequential composition connects two stage instances by gluing an output interface to the next input interface (continuation \(\to\) arrival), optionally after a rename (connector/twister). At the level of index categories, this is literally “identify the boundary object” — i.e. a pushout/gluing construction (union + identification) on the perch-shaped diagrams.

Equivalently (and closer to the “open system” intuition), a stage can be viewed as a module with an explicit left boundary (arrival interface, carrying \(x_{<}\)) and right boundary (continuation interface, carrying \(x_{>}\)). A connector/twister is the witness that the right boundary of one module matches the left boundary of the next (often a rename/isomorphism of field names). Composition is then “glue along the shared boundary” — a pushout-style construction in a category of typed interfaces/specifications.

Branching replaces the single continuation object \(>\) with a family of named continuation perches \(>_j\) (one per branch), corresponding to the coproduct claim in §1.2. Each branch output \(>_j\) is then glued to the appropriate successor arrival interface (by namespace match or explicit rename) exactly as in the sequential case.

1.6 Endogenous branch probabilities: the softmax/logit case

#taste-shock #endogenous-probability #softmax

1.6.1 Motivation

In many economic models, the choice probability across branches is endogenous: it depends on the value function itself, not on an exogenous probability. The canonical example is the discrete choice with taste shocks (McFadden, 1973): the agent receives i.i.d. Type-I extreme value (Gumbel) taste shocks \(\epsilon_j\) for each branch, and deterministically maximizes the shocked value:

\[ j^* = \arg\max_{j \in N_+} \bigl\{ V_j\bigl(g_{ve,j}(x_v)\bigr) + \sigma \epsilon_j \bigr\} \]

Taking expectations over the taste shocks yields a closed-form expression for the choice probability — the softmax:

\[ p(j \mid x_v) = \frac{\exp\!\bigl(V_j(g_{ve,j}(x_v)) / \sigma\bigr)}{\sum_{k \in N_+} \exp\!\bigl(V_k(g_{ve,k}(x_v)) / \sigma\bigr)} \]

where \(\sigma > 0\) is the scale parameter (inverse temperature) controlling choice sensitivity.

The approach in this spec is to generate these probabilities first and then use standard expectation aggregation with the generated weights:

  1. Compute \(p(j \mid x_v) = \mathrm{softmax}(V_1/\sigma, \ldots, V_K/\sigma)\)
  2. Aggregate: \(V(x_v) = \sum_{j \in N_+} p(j \mid x_v) \, V_j\bigl(g_{ve,j}(x_v)\bigr)\)

This keeps the aggregation mode as expectation — no third mode is needed.
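The two-step pattern can be sketched directly; `softmax_probs` is an illustrative helper, not spec syntax (the max-shift is a standard numerical-stability trick):

```python
import math

def softmax_probs(values, sigma):
    """Step 1: p(j) = exp(V_j/sigma) / sum_k exp(V_k/sigma)."""
    m = max(values.values())                        # shift for numerical stability
    e = {j: math.exp((v - m) / sigma) for j, v in values.items()}
    z = sum(e.values())
    return {j: e[j] / z for j in e}

def softmax_expectation(values, sigma):
    """Step 2: standard expectation aggregation with the generated weights."""
    p = softmax_probs(values, sigma)
    return sum(p[j] * values[j] for j in values)

V = {"own": 1.0, "rent": 0.5}
p = softmax_probs(V, sigma=0.25)
v_agg = softmax_expectation(V, sigma=0.25)
assert abs(sum(p.values()) - 1.0) < 1e-12
assert p["own"] > p["rent"]                         # higher value, higher weight
assert min(V.values()) <= v_agg <= max(V.values())
```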

1.6.2 Relationship to the log-sum-exp value

The expected value of the taste-shock maximum also has a closed form — the log-sum-exp (LSE):

\[ \mathbb{E}_\epsilon\!\left[\max_{j} \bigl\{ V_j + \sigma \epsilon_j \bigr\}\right] = \sigma \log \sum_{j \in N_+} \exp\!\left(\frac{V_j}{\sigma}\right) \]

The LSE and the softmax expectation differ by the entropy bonus:

\[ \mathrm{LSE} = \sum_j p(j) V_j + \sigma H(p) \]

where \(H(p) = -\sum_j p_j \log p_j\) is the Shannon entropy. The entropy-regularization interpretation:

\[ \mathrm{LSE} = \max_{p \in \Delta_{|N_+|}} \left\{ \sum_j p_j V_j + \sigma H(p) \right\} \]

connects the aggregation modes: max = optimize over degenerate measures; softmax = optimize over measures with entropy penalty; exogenous expectation = evaluate a fixed measure.
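The entropy-bonus identity \(\mathrm{LSE} = \sum_j p(j) V_j + \sigma H(p)\) can be checked numerically. A minimal sketch (plain Python, illustrative values):

```python
import math

def lse(values, sigma):
    """Stable log-sum-exp: sigma * log sum_j exp(V_j / sigma)."""
    m = max(values)
    return m + sigma * math.log(sum(math.exp((v - m) / sigma) for v in values))

def softmax(values, sigma):
    m = max(values)
    e = [math.exp((v - m) / sigma) for v in values]
    z = sum(e)
    return [x / z for x in e]

V, sigma = [1.0, 0.5, 0.2], 0.3
p = softmax(V, sigma)
expectation = sum(pj * vj for pj, vj in zip(p, V))
entropy = -sum(pj * math.log(pj) for pj in p)       # Shannon entropy H(p)
# LSE equals the softmax expectation plus the entropy bonus sigma * H(p).
assert abs(lse(V, sigma) - (expectation + sigma * entropy)) < 1e-10
```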

LSE vs softmax-weighted expectation #todo/math

For value function iteration (VFI), the mathematically correct aggregated value is the LSE (includes the entropy bonus). For Euler-equation based methods (EGM), the marginal values use the probability-weighted form \(\sum_j p(j) \partial V_j / \partial x\) regardless — the entropy term drops out in the envelope condition. The Eggsandbaskets code uses the softmax-weighted expectation (not LSE) for the value function, which is standard in the applied literature. The distinction matters for VFI convergence guarantees but not for EGM or forward simulation. This is an Υ/Ρ boundary question: the mathematical model (Υ) gives LSE; the computational implementation (Ρ) may use the softmax expectation.

1.6.3 Properties of the softmax probabilities

Limiting cases:

  • As \(\sigma \to 0\): \(p(j \mid x_v) \to \mathbf{1}\{j = j^*\}\) and \(\sum_j p(j) V_j \to \max_j V_j\) — recovers max aggregation
  • As \(\sigma \to \infty\): \(p(j \mid x_v) \to 1/|N_+|\) — uniform random choice

These probabilities are a byproduct of the backward solve — they are computed from the solved branch value functions, not declared as exogenous inputs.
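The limiting cases can likewise be checked numerically (a sketch; the \(\sigma\) values are arbitrary stand-ins for "small" and "large"):

```python
import math

def softmax(values, sigma):
    m = max(values)
    e = [math.exp((v - m) / sigma) for v in values]
    z = sum(e)
    return [x / z for x in e]

V = [1.0, 0.5, 0.2]
p_cold = softmax(V, sigma=1e-3)       # sigma -> 0: degenerate at the argmax
p_hot  = softmax(V, sigma=1e3)        # sigma -> infinity: near-uniform
assert p_cold[0] > 0.999              # recovers max aggregation
assert all(abs(pj - 1 / 3) < 1e-3 for pj in p_hot)   # 1/|N+| each
```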

1.6.4 Envelope condition and marginal values

The envelope condition gives:

\[ \frac{\partial V}{\partial x_v} = \sum_{j \in N_+} p(j \mid x_v) \, \frac{\partial V_j}{\partial x_v}\bigl(g_{ve,j}(x_v)\bigr) \cdot \nabla g_{ve,j}(x_v) \]

The marginal value is a probability-weighted average of branch marginal values — the same functional form as exogenous expectation, but with the endogenous softmax probabilities. This is why the "generate probabilities first, then use expectation" approach works for EGM solvers: the marginal value function has exactly the expectation form once the probabilities are fixed.
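Since \(\partial_x \,\mathrm{LSE} = \sum_j p(j)\, \partial_x V_j\), the entropy term indeed drops out of the marginal value. A finite-difference sketch with illustrative branch value functions of a scalar state:

```python
import math

# Branch values as functions of a scalar state x (illustrative closed forms).
V = [lambda x: math.log(1 + x), lambda x: 0.5 * x, lambda x: 0.2 * x + 0.3]
sigma = 0.4

def lse_value(x):
    """Aggregated value V(x) = sigma * log sum_j exp(V_j(x) / sigma)."""
    vals = [vj(x) for vj in V]
    m = max(vals)
    return m + sigma * math.log(sum(math.exp((v - m) / sigma) for v in vals))

def weighted_marginal(x, h=1e-6):
    """Softmax-weighted average of branch marginal values at x."""
    vals = [vj(x) for vj in V]
    m = max(vals)
    e = [math.exp((v - m) / sigma) for v in vals]
    z = sum(e)
    p = [ej / z for ej in e]
    dV = [(vj(x + h) - vj(x - h)) / (2 * h) for vj in V]   # branch marginals
    return sum(pj * dj for pj, dj in zip(p, dV))

x0, h = 1.0, 1e-6
numeric = (lse_value(x0 + h) - lse_value(x0 - h)) / (2 * h)
assert abs(numeric - weighted_marginal(x0)) < 1e-6   # entropy term drops out
```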

1.6.5 Forward operator with endogenous probabilities

The forward operator uses the endogenous probabilities computed during the backward solve:

\[ \mu_{a_{+j}}(A) = \mathbb{E}_{\mu_v}\!\bigl[ p(j \mid x_v) \, \mathbf{1}\{g_{va_{+j}}(x_v, \pi^*(x_v)) \in A\} \bigr] \]

Conservation still holds: \(\sum_j \mu_{a_{+j}}(\mathsf{X}_{a_{+j}}) = \mu_v(\mathsf{X}_v)\).

Unlike max (deterministic partition) or exogenous expectation (fixed weights), the population split with endogenous probabilities depends on the solved value function. The forward operator requires the backward solution to be available.

1.6.6 Example: portfolio choice (Eggsandbaskets)

In the Eggsandbaskets lifecycle model (AI/context/econ-applications/), the portfolio choice over risky asset shares \(\pi \in \{\pi_1, \ldots, \pi_K\}\) uses endogenous softmax probabilities:

  • Each portfolio share \(\pi_k\) leads to a different return distribution and hence a different continuation value \(V_k(x)\)
  • Step 1 (generate probabilities): \(p(k \mid x) = \exp(V_k(x) / \sigma_\pi) / \sum_l \exp(V_l(x) / \sigma_\pi)\)
  • Step 2 (expectation): \(V(x) = \sum_k p(k \mid x) \, V_k(x)\)
  • The scale parameter \(\sigma_\pi\) (e.g., sigma_DC_pi = 0.1351) controls how sharply agents optimize their portfolio
  • The endogenous probabilities weight the marginal utility for the Euler equation: \(\Lambda_A = \sum_k p(k|x) \Lambda_{A,k}\)

This is precisely a branching stage where the continuation perches are indexed by portfolio shares and the backward mover uses expectation aggregation with endogenous softmax weights generated from the branch values.

2. Branching is a stage-level concept

2.1 The central insight

Branching does not require new connector or twister syntax. It is entirely a property of the stage:

  1. The stage declares multiple named continuation perches (own, rent, ...) in poststates.
  2. The stage's backward mover declares multiple sources and an aggregation mode (max or expectation).
  3. Each continuation perch has its own transition function.

Composition with successor stages then follows the existing namespace rule from spec 0.1h:

  • If a continuation perch's output fields match a successor stage's arrival fields by name → implicit composition (no connector needed).
  • If names differ → a standard rename connector provides the mapping.

Connectors remain exactly what they are in spec 0.1h: {from, to, rename}. They just rename. They do not know or care whether the source stage is branching.

2.2 How the namespace determines wiring

Consider a branching stage tenure_choice with:

  • own outputting fields {a, H, y}
  • rent outputting fields {w, y}

And successor stages:

  • owner_housing with arrival fields {a, H, y}
  • renter_housing with arrival fields {w, y}

The period namespace contains all these fields. Composition is implicit:

  • own.{a, H, y} matches owner_housing.arvl.{a, H, y} → no connector needed
  • rent.{w, y} matches renter_housing.arvl.{w, y} → no connector needed

If names differ, a standard rename connector resolves the mismatch — exactly as in sequential composition.
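The namespace rule in this example can be sketched as set matching over field names (a hypothetical helper, not the implementation):

```python
# Each continuation perch wires to the unique successor whose arrival field
# set equals its output field set; no match or multiple matches is an error.
continuation = {"own": {"a", "H", "y"}, "rent": {"w", "y"}}
arrivals = {"owner_housing": {"a", "H", "y"}, "renter_housing": {"w", "y"}}

def wire(continuation, arrivals):
    wiring = {}
    for perch, fields in continuation.items():
        matches = [s for s, arr in arrivals.items() if arr == fields]
        if len(matches) != 1:
            raise ValueError(f"ambiguous or missing match for {perch}: {matches}")
        wiring[perch] = matches[0]          # implicit composition, no connector
    return wiring

assert wire(continuation, arrivals) == {"own": "owner_housing",
                                        "rent": "renter_housing"}
```

The `ValueError` path is exactly the ambiguous case discussed below: identical field sets on parallel branches defeat namespace matching.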

Graph rule: arrival perches on parallel branches

Namespace ambiguity with shared arrival field names #ambiguity #branching #graph-rule

Two successor stages on parallel branches (e.g., owner_housing and renter_housing) can never connect to each other — they are on separate paths in the DAG. In the graph-theoretic view, their arrival perches represent distinct random variables even if the variables share field names.

However, in the flat period namespace, shared field names between parallel arrival perches create an ambiguity for namespace-based composition:

Unambiguous case. If the continuation perch field sets are disjoint or distinguishable — e.g., own outputs {a, H, y} and rent outputs {w, y} — then the namespace uniquely determines which continuation perch maps to which successor stage (only owner_housing expects {a, H, y}; only renter_housing expects {w, y}).

Ambiguous case. If both branches output the same field set — e.g., own outputs {a, y} and rent also outputs {a, y} — and both successors arrive with {a, y}, then the namespace alone cannot determine the wiring.

Two possible resolutions:

  1. Require distinct arrival field names across parallel branches. Force the modeller to use unique names (e.g., a_own vs. a_rent) for arrival fields on parallel branches. This keeps namespace composition unambiguous but may feel artificial when the economic meaning is the same.

  2. Allow shared names; require explicit connectors when ambiguous. When the namespace is ambiguous, require connectors that specify which continuation perch maps to which successor. The connector's from/to pair disambiguates.

Current status: This is an open design question. The examples in the syntax/implementation spec (housing tenure choice) happen to be unambiguous because the own and rent paths have different state spaces. For models where parallel branches share the same state space, disambiguation is needed. #todo/syntax

2.3 Intra-period branching

Within a period, a branching stage's multiple continuation perches connect to multiple successor stages via the namespace:

tenure_choice ──┬── own  ──(namespace)──→ owner_housing ──→ owner_cons
                └── rent ──(namespace)──→ renter_housing ──→ renter_cons

The period's stage list is no longer a strict linear chain — it is a DAG. The solve order is a topological sort.

2.4 Inter-period branching

For inter-period branching (e.g., survival/death), the branching stage is typically the last stage in the period. Its multiple continuation perches map to different successor periods via standard twisters with rename:

  • survive → twister with rename → next working period
  • die → twister with rename → terminal period

The aggregation (expectation weighted by survival probability) lives in the stage's backward mover, not in the twister. The twister is just a rename.

The only structural change at the nest level: DAG nests require twisters with explicit targets (since positional alignment presumes a linear chain). This is a DAG-topology concern, not a branching-specific one.

2.5 Branching probabilities and weights (exogenous vs endogenous)

For max and expectation aggregation, branching probabilities are standard exogenous variables or parameters declared via the existing exogenous: or parameters: mechanism in the stage YAML:

  • If branching probability is constant: declare it in parameters: (e.g., surv_prob)
  • If branching depends on a shock: declare it in exogenous: as a standard shock variable
  • The backward mover references these declared variables in its aggregation expression

For logit/taste-shock models, the branch probabilities are endogenous: they are computed from the branch value functions during the backward solve (see §1.6). The only additional model primitive is the scale parameter \(\sigma\) in parameters:. In surface syntax/implementation, it can still be useful to name the resulting probability vector (e.g. p) as a derived policy field for typing and solution storage.