Nash Q-Learning for General-Sum Stochastic Games
... discount factor. v(s, π) represents the value for state s under strategy π. A strategy is a plan for
playing a game. Here π = (π_0, ..., π_t, ...) is defined over the entire course of the game, where π_t
is called the decision rule at time t. A decision rule is a function π_t : H_t → ∆(A), where H ...