RELATIONSHIP TO STANDARD MDP

Before introducing the recursive structure with periods, stages, and perches, it may help to restate the problem in standard Markov Decision Process (MDP) terms. This will provide a more conventional conceptual foundation, onto which the detailed hierarchical structure can later be mapped.

IMPORTANT: This section presents the classical MDP framework using standard notation, before introducing any of our extensions. The notation here deliberately excludes stages, multiple state types, and other refinements that will be introduced later. We use V for the value function here (rather than 𝒱 which we will use later) to emphasize that this is a different mathematical object from the one in our extended model.

Note on notation: We use italicized '𝜋' for choice variables in maximization problems and non-italicized 'π' for policy functions.

Basic MDP Setup

State space: 𝓧
Choice space: Π, with feasible choice correspondence Π(x) ⊆ Π for each x ∈ 𝓧
Transition function: P(x₊ | x, 𝜋) for 𝜋 ∈ Π(x) (Note: This will later be replaced by more specific transition functions like gₐᵥ, gᵥₑ, etc.)
Reward function: r(x, 𝜋) for 𝜋 ∈ Π(x)
Discount factor: β (assumed constant for now)

The goal is to find a policy π: 𝓧 → Π with π(x) ∈ Π(x) that maximizes the expected discounted sum of rewards:

V^(π)(x) = E[Σₜ₌₀^∞ βᵗ r(xₜ, π(xₜ)) | x₀ = x], where each xₜ₊₁ is drawn from P(· | xₜ, π(xₜ))
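To make the policy value concrete, the following is a minimal sketch that approximates the expected discounted sum for a fixed policy by truncating it at a finite horizon. The two-state example, the choice labels "stay" and "move", and the names `reward`, `transition`, and `policy_value` are hypothetical illustrations, not part of the framework described here. It also assumes β < 1 so that the truncated tail is negligible.

```python
# Hypothetical two-state MDP; all names and numbers are illustrative assumptions.
BETA = 0.9  # discount factor, assumed < 1 so the discounted sum converges


def reward(x, choice):
    """Reward r(x, choice): one unit for staying in state 1, zero otherwise."""
    return 1.0 if (x == 1 and choice == "stay") else 0.0


def transition(x, choice):
    """Transition P(x_next | x, choice) as a dict {next_state: probability}."""
    if choice == "stay":
        return {x: 1.0}
    return {0: 0.5, 1: 0.5}  # "move" lands in either state with equal probability


def policy_value(policy, x0, horizon=500):
    """Approximate V^(pi)(x0) by truncating the discounted sum at `horizon`."""
    dist = {x0: 1.0}  # distribution of x_t, starting from x_0 = x0
    total = 0.0
    for t in range(horizon):
        # Add beta^t * E[r(x_t, pi(x_t))] under the current distribution.
        total += BETA**t * sum(p * reward(x, policy[x]) for x, p in dist.items())
        # Push the distribution one step forward under the policy.
        nxt = {}
        for x, p in dist.items():
            for x_next, q in transition(x, policy[x]).items():
                nxt[x_next] = nxt.get(x_next, 0.0) + p * q
        dist = nxt
    return total


always_stay = {0: "stay", 1: "stay"}
move_then_stay = {0: "move", 1: "stay"}
```

Calling `policy_value(always_stay, 1)` and `policy_value(move_then_stay, 0)` compares two policies from different starting states. Propagating the state distribution rather than simulating paths keeps the sketch deterministic, at the cost of only being practical for small finite state spaces.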

The optimal value function V, defined by V(x) = sup over policies π of V^(π)(x), satisfies the Bellman equation:

V(x) = max_{𝜋 ∈ Π(x)} [r(x, 𝜋) + β E[V(x₊) | x, 𝜋]]
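One common way to solve a Bellman equation of this form on a small finite problem is value iteration: start from an arbitrary guess and apply the right-hand side repeatedly until the values stop changing. The sketch below assumes a hypothetical two-state example with β < 1. The names `feasible`, `reward`, `transition`, `bellman_rhs`, and `value_iteration` are illustrative and do not come from the text.

```python
# Hypothetical two-state MDP; all names and numbers are illustrative assumptions.
STATES = [0, 1]
BETA = 0.9  # discount factor, assumed < 1 so the iteration is a contraction


def feasible(x):
    """Feasible choice correspondence Pi(x); here the same set in every state."""
    return ["stay", "move"]


def reward(x, choice):
    """Reward r(x, choice): one unit for staying in state 1, zero otherwise."""
    return 1.0 if (x == 1 and choice == "stay") else 0.0


def transition(x, choice):
    """Transition P(x_next | x, choice) as a dict {next_state: probability}."""
    if choice == "stay":
        return {x: 1.0}
    return {0: 0.5, 1: 0.5}  # "move" lands in either state with equal probability


def bellman_rhs(V, x, choice):
    """r(x, choice) + beta * E[V(x_next) | x, choice]."""
    cont = sum(p * V[x_next] for x_next, p in transition(x, choice).items())
    return reward(x, choice) + BETA * cont


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator until successive values differ by < tol."""
    V = {x: 0.0 for x in STATES}
    for _ in range(max_iter):
        V_new = {x: max(bellman_rhs(V, x, c) for c in feasible(x)) for x in STATES}
        gap = max(abs(V_new[x] - V[x]) for x in STATES)
        V = V_new
        if gap < tol:
            break
    # Read off a greedy policy from the (approximately) converged values.
    policy = {x: max(feasible(x), key=lambda c: bellman_rhs(V, x, c)) for x in STATES}
    return V, policy


V, policy = value_iteration()
```

In this example the returned `policy` maps each state to the choice attaining the maximum, and `V` approximately satisfies the Bellman equation at every state. Value iteration is used here only because it mirrors the equation directly; it is not necessarily the solution method used for the extended model.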

By starting from this high-level MDP viewpoint, we see that the subsequent definitions of 𝓧ₐ (arrival states), 𝓧ᵥ (decision states), 𝓧ₑ (continuation states), shocks, and connector functions (gₐᵥ, gᵥₑ, …) are just careful refinements of the basic MDP structure. Similarly, branching and sequential stages are special cases where the transitions and feasible choice sets differ, but they still fit into the MDP framework.

Note that the policy function π(x) will be extended in the full structure to π(t, i, κ, xᵥ) to account for time period t, stage position i, and stage kind κ, but the fundamental concept remains the same.