
RELATIONSHIP TO STANDARD MDP

Before introducing the recursive structure with periods, stages, and perches, it may help to restate the problem in standard Markov Decision Process (MDP) terms. This will provide a more conventional conceptual foundation, onto which the detailed hierarchical structure can later be mapped.

IMPORTANT: This section presents the classical MDP framework using standard notation, before introducing any of our extensions. The notation here deliberately excludes stages, multiple state types, and other refinements that will be introduced later. We use V for the value function here (rather than 𝒱 which we will use later) to emphasize that this is a different mathematical object from the one in our extended model.

Note on notation: We use italicized '𝜋' for choice variables in maximization problems and non-italicized 'π' for policy functions.

Basic MDP Setup

  • State space: 𝓧
  • Feasible choice correspondence: Π(x) ⊆ Π for each x ∈ 𝓧
  • Transition function: P(x₊ | x, 𝜋) for 𝜋 ∈ Π(x) (Note: This will later be replaced by more specific transition functions like gₐᵥ, gᵥₑ, etc.)
  • Reward function: r(x, 𝜋) for 𝜋 ∈ Π(x)
  • Discount factor: β (assumed constant for now)
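
As a rough illustration, the five ingredients listed above can be bundled into a small container for a finite problem. The class name FiniteMDP, the validate helper, and the two-state toy problem are all hypothetical, introduced only for this sketch; they are not part of the notation used in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FiniteMDP:
    """Hypothetical container for a finite MDP (all names are illustrative)."""
    states: List[str]                                   # state space
    feasible: Callable[[str], List[str]]                # x -> feasible choices at x
    transition: Callable[[str, str], Dict[str, float]]  # (x, choice) -> {x_next: prob}
    reward: Callable[[str, str], float]                 # (x, choice) -> reward
    beta: float                                         # constant discount factor


def validate(mdp: FiniteMDP, tol: float = 1e-12) -> bool:
    """Check that the pieces fit together; raise ValueError otherwise."""
    if not 0.0 <= mdp.beta < 1.0:
        raise ValueError("this sketch assumes 0 <= beta < 1")
    for x in mdp.states:
        choices = mdp.feasible(x)
        if not choices:
            raise ValueError(f"no feasible choice at state {x!r}")
        for c in choices:
            dist = mdp.transition(x, c)
            if any(x_next not in mdp.states for x_next in dist):
                raise ValueError(f"transition from {x!r} leaves the state space")
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"probabilities at ({x!r}, {c!r}) do not sum to 1")
    return True


# Toy problem, assumed purely for illustration: "invest" is only feasible
# in the "low" state and moves the agent to "high" at a one-off cost.
toy = FiniteMDP(
    states=["low", "high"],
    feasible=lambda x: ["wait", "invest"] if x == "low" else ["wait"],
    transition=lambda x, c: {"high": 1.0} if c == "invest" else {x: 1.0},
    reward=lambda x, c: -1.0 if c == "invest" else (1.0 if x == "high" else 0.0),
    beta=0.9,
)
```

Representing the feasible choice correspondence as a function of the state keeps the state-dependent choice set explicit, which is the feature the later refinements build on.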

The goal is to find a policy π: 𝓧 → Π with π(x) ∈ Π(x) that maximizes the expected discounted sum of rewards:

V^(π)(x) = E[Σₜ βᵗ r(xₜ, π(xₜ)) | x₀ = x]
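
For a finite state space, the value of a fixed policy can be computed numerically. The sketch below relies on the standard recursive form of the discounted sum, in which the value at x equals the current reward plus β times the expected value at the next state, and iterates that recursion to a fixed point. The function name evaluate_policy, the dictionary encoding of transitions and rewards, and the two-state example are assumptions made for illustration.

```python
def evaluate_policy(states, policy, transition, reward, beta,
                    tol=1e-12, max_iter=10_000):
    """Iterate v(x) <- r(x, pi(x)) + beta * E[v(x_next)] until it settles.

    policy:     dict state -> choice
    transition: dict (state, choice) -> {next_state: probability}
    reward:     dict (state, choice) -> float
    """
    v = {x: 0.0 for x in states}
    for _ in range(max_iter):
        v_new = {}
        for x in states:
            c = policy[x]
            expected = sum(p * v[x_next]
                           for x_next, p in transition[(x, c)].items())
            v_new[x] = reward[(x, c)] + beta * expected
        gap = max(abs(v_new[x] - v[x]) for x in states)
        v = v_new
        if gap < tol:
            break
    return v


# Hypothetical example: state "a" moves to "b" for free, and "b" pays 1 forever.
states = ["a", "b"]
policy = {"a": "go", "b": "stay"}
transition = {("a", "go"): {"b": 1.0}, ("b", "stay"): {"b": 1.0}}
reward = {("a", "go"): 0.0, ("b", "stay"): 1.0}

v_pi = evaluate_policy(states, policy, transition, reward, beta=0.5)
```

In this example the value at "b" is the geometric sum 1 / (1 - 0.5) = 2, and the value at "a" is 0 + 0.5 · 2 = 1, which the iteration approaches to within the tolerance.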

The optimal value function V satisfies the Bellman equation:

V(x) = max_{𝜋 ∈ Π(x)} [r(x, 𝜋) + β E[V(x₊) | x, 𝜋]]
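
One common way to solve the Bellman equation on a finite state space is value iteration: apply the right-hand side repeatedly as an update until the values stop changing, then read off a maximizing choice at each state. The sketch below is a minimal version under that assumption; it is not necessarily the solution method used for the extended model, and the function name value_iteration and the toy problem are hypothetical.

```python
def value_iteration(states, feasible, transition, reward, beta,
                    tol=1e-10, max_iter=100_000):
    """Approximate the optimal value function and a greedy policy.

    feasible:   dict state -> list of feasible choices
    transition: dict (state, choice) -> {next_state: probability}
    reward:     dict (state, choice) -> float
    """
    def q(x, c, v):
        # Bracketed term of the Bellman equation for a given choice c.
        expected = sum(p * v[x_next]
                       for x_next, p in transition[(x, c)].items())
        return reward[(x, c)] + beta * expected

    v = {x: 0.0 for x in states}
    for _ in range(max_iter):
        v_new = {x: max(q(x, c, v) for c in feasible[x]) for x in states}
        gap = max(abs(v_new[x] - v[x]) for x in states)
        v = v_new
        if gap < tol:
            break

    # Greedy policy: a maximizing choice at each state given the final values.
    policy = {x: max(feasible[x], key=lambda c: q(x, c, v)) for x in states}
    return v, policy


# Hypothetical toy problem: investing costs 1 once and moves "low" to "high",
# where waiting pays 1 per period forever.
states = ["low", "high"]
feasible = {"low": ["wait", "invest"], "high": ["wait"]}
transition = {
    ("low", "wait"): {"low": 1.0},
    ("low", "invest"): {"high": 1.0},
    ("high", "wait"): {"high": 1.0},
}
reward = {("low", "wait"): 0.0, ("low", "invest"): -1.0, ("high", "wait"): 1.0}

v_star, pi_star = value_iteration(states, feasible, transition, reward, beta=0.9)
```

Here the value at "high" is 1 / (1 - 0.9) = 10, so investing from "low" is worth -1 + 0.9 · 10 = 8, which beats waiting forever at 0; the greedy policy therefore chooses "invest" in the "low" state.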

By starting from this high-level MDP viewpoint, we see that the subsequent definitions of 𝓧ₐ (arrival states), 𝓧ᵥ (decision states), 𝓧ₑ (continuation states), shocks, and connector functions (gₐᵥ, gᵥₑ, …) are just careful refinements of the basic MDP structure. Similarly, branching and sequential stages are special cases where the transitions and feasible choice sets differ, but they still fit into the MDP framework.

Note that the policy function π(x) will be extended in the full structure to π(t, i, κ, xᵥ) to account for time period t, stage position i, and stage kind κ, but the fundamental concept remains the same.