RELATIONSHIP TO STANDARD MDP
Before introducing the recursive structure with periods, stages, and perches, it may help to restate the problem in standard Markov Decision Process (MDP) terms. This will provide a more conventional conceptual foundation, onto which the detailed hierarchical structure can later be mapped.
IMPORTANT: This section presents the classical MDP framework using standard notation, before introducing any of our extensions. The notation here deliberately excludes stages, multiple state types, and other refinements that will be introduced later. We use V for the value function here (rather than 𝒱, which we will use later) to emphasize that this is a different mathematical object from the one in our extended model.
Note on notation: We use italicized '𝜋' for choice variables in maximization problems and non-italicized 'π' for policy functions.
Basic MDP Setup
- State space: 𝓧
- Feasible choice correspondence: Π(x) ⊆ Π for each x ∈ 𝓧
- Transition function: P(x′ | x, 𝜋) for 𝜋 ∈ Π(x), where x′ is the next state (Note: This will later be replaced by more specific transition functions like gₐᵥ, gᵥₑ, etc.)
- Reward function: r(x, 𝜋) for 𝜋 ∈ Π(x)
- Discount factor: β (assumed constant for now)
The goal is to find a policy π: 𝓧 → Π with π(x) ∈ Π(x) that maximizes the expected discounted sum of rewards:
V^(π)(x) = E[Σₜ βᵗ r(xₜ, π(xₜ)) | x₀ = x]
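As a rough illustration of the objective above, the following sketch evaluates a fixed policy on a tiny finite MDP by iterating on the policy's own recursion, V ← r + βPV. The two-state example, the dictionary layout, and the names `feasible`, `P`, `r`, and `evaluate_policy` are all hypothetical and chosen only for this illustration; they are not part of the framework described here.

```python
# Hypothetical two-state MDP, used only for illustration.
states = [0, 1]
beta = 0.9  # discount factor

# feasible[x] lists the choices available in state x (the correspondence Pi(x)).
feasible = {0: ["stay", "move"], 1: ["stay"]}

# P[(x, a)] maps each possible next state to its probability.
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
}

# r[(x, a)] is the one-period reward.
r = {(0, "stay"): 0.0, (0, "move"): -1.0, (1, "stay"): 2.0}


def evaluate_policy(policy, tol=1e-10, max_iter=10_000):
    """Approximate V^(pi) by repeating V <- r_pi + beta * E[V(x') | x, pi(x)]."""
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        V_new = {}
        for x in states:
            a = policy[x]
            if a not in feasible[x]:
                raise ValueError(f"choice {a!r} is not feasible in state {x}")
            expected = sum(p * V[x_next] for x_next, p in P[(x, a)].items())
            V_new[x] = r[(x, a)] + beta * expected
        # Stop once successive iterates agree in the sup norm.
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new
    return V


policy = {0: "move", 1: "stay"}
V_pi = evaluate_policy(policy)
```

Because β < 1, each pass shrinks the error by at least a factor of β, so the loop converges for any starting guess. In this toy example state 1 pays 2 forever, so its value is 2 / (1 − β) = 20.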
The optimal value function V satisfies the Bellman equation:
V(x) = max_{𝜋 ∈ Π(x)} [r(x, 𝜋) + β E[V(x′) | x, 𝜋]]
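One common way to solve a Bellman equation of this form on a finite state space is value iteration: apply the right-hand side repeatedly until the value function stops changing, then read off a maximizing choice in each state. The sketch below assumes a small hypothetical two-state MDP; the names `feasible`, `P`, `r`, `q_value`, and `value_iteration` are illustrative assumptions, not identifiers from the framework.

```python
beta = 0.9  # discount factor

# Hypothetical example: feasible[x] plays the role of Pi(x).
feasible = {0: ["stay", "move"], 1: ["stay"]}

# P[(x, a)] maps next states to probabilities; r[(x, a)] is the reward.
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
}
r = {(0, "stay"): 0.0, (0, "move"): -1.0, (1, "stay"): 2.0}


def q_value(x, a, V):
    """Bracketed term of the Bellman equation: r(x, a) + beta * E[V(x') | x, a]."""
    return r[(x, a)] + beta * sum(p * V[y] for y, p in P[(x, a)].items())


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator, then extract a greedy policy."""
    V = {x: 0.0 for x in feasible}
    for _ in range(max_iter):
        V_new = {x: max(q_value(x, a, V) for a in feasible[x]) for x in feasible}
        converged = max(abs(V_new[x] - V[x]) for x in feasible) < tol
        V = V_new
        if converged:
            break
    # A policy that attains the maximum in every state.
    policy = {x: max(feasible[x], key=lambda a: q_value(x, a, V)) for x in feasible}
    return V, policy


V_star, pi_star = value_iteration()
```

In this toy example staying in state 0 earns nothing forever, while moving pays a one-time cost to reach the rewarding state 1, so the greedy policy chooses "move" in state 0.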
By starting from this high-level MDP viewpoint, we see that the subsequent definitions of 𝓧ₐ (arrival states), 𝓧ᵥ (decision states), 𝓧ₑ (continuation states), shocks, and connector functions (gₐᵥ, gᵥₑ, …) are just careful refinements of the basic MDP structure. Similarly, branching and sequential stages are special cases where the transitions and feasible choice sets differ, but they still fit into the MDP framework.
Note that the policy function π(x) will be extended in the full structure to π(t, i, κ, xᵥ) to account for time period t, stage position i, and stage kind κ, but the fundamental concept remains the same.