Reinforcement learning comprises methods for solving optimal control problems in which only partial information about the system is initially available to the learner. To solve such optimal control problems, Dynamic Programming (DP) methods, such as Value Iteration, are used to determine the optimal value function and hence the optimal control policy.
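To fix ideas, Value Iteration can be sketched as follows on a discrete state space; the toy two-state, two-action MDP below (transition matrices P, rewards R, discount gamma) is a hypothetical example, not taken from the talk.

```python
import numpy as np

# Toy MDP (illustrative only): P[a][s, s'] is the probability of
# moving from state s to s' under action a; R[a][s] is the expected
# immediate reward of taking action a in state s.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.0, 1.0]])]
R = [np.array([1.0, 0.0]),
     np.array([0.0, 2.0])]
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy derived from the value function
```

Once the iteration has converged, the value function induces the optimal policy by acting greedily with respect to it; the continuous-state setting discussed in the talk replaces the finite array `V` with a function approximation.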
In the case of a continuous state space, appropriate discretization methods are needed. Particularly with a view toward higher-dimensional problems, we examine the use of sparse grids in this context.
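The motivation for sparse grids can be illustrated by a point count: a regular sparse grid grows far more slowly with the dimension than a full tensor grid of the same level. The sketch below (an illustration under standard assumptions, not the talk's implementation) counts the interior points of a regular sparse grid without boundary points, where a 1D grid of level l contributes 2^(l-1) points and only level combinations with |l|_1 <= n + d - 1 are kept.

```python
from itertools import product

def sparse_grid_size(n, d):
    """Number of points of a regular sparse grid of level n in d
    dimensions (hat-function hierarchy, no boundary points)."""
    count = 0
    for levels in product(range(1, n + 1), repeat=d):
        if sum(levels) <= n + d - 1:
            # the anisotropic subgrid with levels (l_1, ..., l_d)
            # contributes prod_i 2^(l_i - 1) hierarchical points
            count += 2 ** (sum(levels) - d)
    return count

def full_grid_size(n, d):
    """Full tensor grid of level n: (2^n - 1)^d interior points."""
    return (2 ** n - 1) ** d

print(sparse_grid_size(3, 2), full_grid_size(3, 2))
```

For example, at level 3 in two dimensions the sparse grid needs 17 points versus 49 for the full grid, and the gap widens rapidly with the dimension; adaptive refinement, as used in the talk, further concentrates points where the value function is hard to approximate.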
In this talk, we present an approach that uses adaptive sparse grids to approximate the value function via Dynamic Programming. By means of low-dimensional examples from deterministic and stochastic optimal control, we point out some difficulties concerning the convergence of the approximation scheme.