RL
Tabular State Space Setting
Finite Known MDP
Transition dynamics are known
Dynamic Programming
Policy Evaluation
Policy Iteration
Generalised Policy Iteration
Value Iteration
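As a concrete reference point, a minimal value-iteration sketch for a known finite MDP, assuming the dynamics are supplied as arrays `P[s, a, s']` (transition probabilities) and `R[s, a]` (expected rewards); these array names are illustrative, not from the map.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s, a, s']: transition probabilities, R[s, a]: expected rewards."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
    return V, policy
```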
On-Policy
Learn from the current policy's behaviour
SARSA
n-step SARSA
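A minimal sketch of one-step SARSA (on-policy TD control) with an epsilon-greedy behaviour policy. The `env.reset()` / `env.step(a)` interface returning `(next_state, reward, done)` is an assumption of this sketch, not part of the map.

```python
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    # Behaviour policy: random action with probability eps, else greedy
    return rng.integers(Q.shape[1]) if rng.random() < eps else int(np.argmax(Q[s]))

def sarsa(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """env is assumed to expose reset() -> s and step(a) -> (s2, r, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            # On-policy target uses the action actually selected next (a2)
            Q[s, a] += alpha * (r + gamma * (0 if done else Q[s2, a2]) - Q[s, a])
            s, a = s2, a2
    return Q
```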
Off-Policy
Based on Importance Sampling
Learn from another policy's behaviour
Q-Learning
n-step Q
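For contrast, a one-step Q-learning sketch under the same assumed environment interface: the target bootstraps from the greedy action rather than the behaviour action, so the method is off-policy; importance sampling only becomes necessary for the n-step off-policy variants.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Same assumed env interface as the SARSA sketch above."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # Off-policy target: bootstrap with the greedy action at s2
            Q[s, a] += alpha * (r + gamma * (0 if done else np.max(Q[s2])) - Q[s, a])
            s = s2
    return Q
```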
Policy Improvement
Asynchronous DP
Bandits
Introductory Paper
Adversarial Bandits
EWF: Exponentially Weighted Forecaster
EXP3: Exponential-weight algorithm for Exploration and Exploitation
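A minimal EXP3 sketch for the adversarial setting, assuming rewards lie in [0, 1]; the `reward_fn` callback stands in for the adversary and is an assumption of this sketch.

```python
import numpy as np

def exp3(n_arms, T, reward_fn, gamma=0.1, seed=0):
    """reward_fn(t, arm) -> reward in [0, 1] is an assumed stand-in for the adversary."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    for t in range(T):
        # Mix the exponential-weights distribution with uniform exploration
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        r = reward_fn(t, arm)
        # Importance-weighted reward estimate for the pulled arm only
        est = r / probs[arm]
        weights[arm] *= np.exp(gamma * est / n_arms)
    return weights / weights.sum()
```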
Stochastic Bandits
adaptive exploration
forward look: with prior information
uniform exploration
Lower Bounds
UCB: Upper Confidence Bound
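A minimal UCB1 sketch for stochastic bandits with rewards in [0, 1]; the `pull(arm)` callback is an assumed stand-in for the environment.

```python
import numpy as np

def ucb1(n_arms, T, pull):
    """pull(arm) -> stochastic reward in [0, 1] (assumed interface)."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(T):
        if t < n_arms:
            arm = t  # pull each arm once to initialise
        else:
            # Optimism in the face of uncertainty: empirical mean + confidence radius
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts
```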
Bayesian Bandits and Thompson Sampling
Bayesian bandits
Thompson Sampling
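A minimal Thompson-sampling sketch for Bernoulli arms with Beta(1, 1) priors; the `pull(arm)` callback returning a 0/1 reward is again an assumption of this sketch.

```python
import numpy as np

def thompson_bernoulli(n_arms, T, pull, seed=0):
    """Beta(1, 1) priors over Bernoulli arms; pull(arm) -> 0/1 reward (assumed interface)."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_arms)  # 1 + observed successes
    beta = np.ones(n_arms)   # 1 + observed failures
    for _ in range(T):
        # Sample a mean for each arm from its posterior, then act greedily on the sample
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        r = pull(arm)
        alpha[arm] += r
        beta[arm] += 1 - r
    return alpha / (alpha + beta)  # posterior mean estimates
```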
Lipschitz Bandit
Continuum-armed Bandits
Adaptive discretization: the Zooming Algorithm
Lipschitz MAB
Semi-bandits
Contextual Bandits
Planning & Learning
Exploration/Exploitation
How to explore the learned dynamics; the main differences between methods are full vs. sample backups and deep vs. shallow backups
Knows What It Knows (KWIK): a supervised-learning model that can report when it does not yet know a prediction
Prioritized Sweeping
: which states or state-action pairs should be backed up during planning? Work backwards from states whose values have just changed
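A simplified prioritized-sweeping sketch: one real transition is recorded in a deterministic tabular model, then planning backups are popped from a priority queue, and predecessors whose targets changed enough are queued in turn. The `model` / `predecessors` structures and the per-call (rather than persistent) queue are simplifications assumed here.

```python
import heapq
import numpy as np

def prioritized_sweeping_update(Q, model, predecessors, s, a, r, s2,
                                alpha=0.1, gamma=0.95, theta=1e-4, n_planning=10):
    """model[(s, a)] = (r, s2) is a deterministic learned model;
    predecessors[s] is a set of (s_pred, a_pred) pairs known to lead into s."""
    model[(s, a)] = (r, s2)
    predecessors.setdefault(s2, set()).add((s, a))
    pqueue = []
    p = abs(r + gamma * np.max(Q[s2]) - Q[s, a])
    if p > theta:
        heapq.heappush(pqueue, (-p, (s, a)))
    for _ in range(n_planning):
        if not pqueue:
            break
        _, (ps, pa) = heapq.heappop(pqueue)
        pr, ps2 = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps2]) - Q[ps, pa])
        # Work backwards: queue any predecessor whose backup target changed enough
        for (qs, qa) in predecessors.get(ps, set()):
            qr, _ = model[(qs, qa)]
            p = abs(qr + gamma * np.max(Q[ps]) - Q[qs, qa])
            if p > theta:
                heapq.heappush(pqueue, (-p, (qs, qa)))
```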
Rollout Algorithms
Monte Carlo Tree Search
Heuristic Search
E3 (Explicit Explore or Exploit): maintains visit counts for states and actions to quantify confidence in the predicted transitions (see the count-based sketch after RMAX below)
Trajectory Sampling: perform on-policy backups along simulated trajectories to update the value function
RMAX
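E3 and RMAX share the count-based idea noted under E3 above: a state-action pair counts as "known" once it has been visited m times. A hedged R-MAX-flavoured sketch that builds an optimistic empirical MDP from such counts (the array layout is an assumption); the result can then be planned on with, e.g., the value-iteration sketch earlier.

```python
import numpy as np

def rmax_model(counts, reward_sums, next_counts, m=10, r_max=1.0):
    """Build an optimistic empirical MDP from visit statistics.
    counts[s, a]: visits; reward_sums[s, a]: summed rewards;
    next_counts[s, a, s']: transition counts.
    Under-visited ('unknown') pairs are routed to a fictitious max-reward absorbing state."""
    n_states, n_actions = counts.shape
    S = n_states + 1                      # extra absorbing 'optimism' state
    P = np.zeros((S, n_actions, S))
    R = np.full((S, n_actions), r_max)    # unknown pairs get the optimistic reward
    P[:, :, -1] = 1.0                     # default: everything jumps to the absorbing state
    known = counts >= m
    for s in range(n_states):
        for a in range(n_actions):
            if known[s, a]:
                P[s, a, :-1] = next_counts[s, a] / counts[s, a]
                P[s, a, -1] = 0.0
                R[s, a] = reward_sums[s, a] / counts[s, a]
    return P, R  # plan on (P, R), e.g. with value iteration
```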
Integration
Dyna-Q
Dyna-Q+: adds an exploration bonus for state-action pairs that have not been visited for a long time (see the sketch after this branch)
Deep Dyna-Q
: the learned world model is used to simulate the user in a dialogue system
Dyna-AC
PILCO
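A sketch of a single Dyna-Q+ step, as referenced above: a direct Q-learning update from the real transition, followed by planning updates from the learned model with an exploration bonus that grows with the time since a pair was last visited. The `model` and `last_visit` dictionaries are data structures assumed by this sketch.

```python
import numpy as np

def dyna_q_plus_step(Q, model, last_visit, s, a, r, s2, t, rng,
                     alpha=0.1, gamma=0.95, kappa=1e-3, n_planning=20):
    """model[(s, a)] = (r, s2); last_visit[(s, a)] = time of last real visit (assumed structures)."""
    # Direct RL update from the real transition
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
    model[(s, a)] = (r, s2)
    last_visit[(s, a)] = t
    # Planning: replay simulated transitions, adding the Dyna-Q+ exploration bonus
    keys = list(model.keys())
    for _ in range(n_planning):
        ps, pa = keys[rng.integers(len(keys))]
        pr, ps2 = model[(ps, pa)]
        bonus = kappa * np.sqrt(t - last_visit[(ps, pa)])  # rewards long-unvisited pairs
        Q[ps, pa] += alpha * (pr + bonus + gamma * np.max(Q[ps2]) - Q[ps, pa])
```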
Planning
Given the known dynamics, seek an optimal decision-making strategy
Control Theory
MPC: model predictive control
LQR: linear quadratic regulator
iLQR: iterative LQR
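For orientation, a finite-horizon discrete-time LQR sketch via the backward Riccati recursion; MPC re-solves such a problem at every step and applies only the first control, and iLQR repeats it around successive linearisations of nonlinear dynamics.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon discrete LQR: minimise sum_t x'Qx + u'Ru s.t. x_{t+1} = A x_t + B u_t.
    Returns feedback gains K_t such that u_t = -K_t x_t."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        # Backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains ordered t = 0 .. horizon-1
```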
Classical Planning
state-space planning
: full backup required
plan-space planning
: specify the region to explore
Finite Unknown MDP
Transition dynamics are unknown
TD: Temporal Difference
Monte Carlo Methods
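A minimal TD(0) policy-evaluation sketch for the unknown-dynamics case; the `episodes` format of `(s, r, s2, done)` tuples collected under the evaluated policy is an assumption. Monte Carlo methods would instead wait for the full episode return before updating each visited state.

```python
import numpy as np

def td0_prediction(episodes, n_states, alpha=0.1, gamma=0.99):
    """episodes: list of [(s, r, s2, done), ...] trajectories under the evaluated policy."""
    V = np.zeros(n_states)
    for episode in episodes:
        for (s, r, s2, done) in episode:
            # TD(0) bootstraps from the current estimate V[s2]
            target = r if done else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])
    return V
```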
Continuous State Space Setting
Approximation Methods
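As one instance of the approximation-methods branch, a minimal semi-gradient TD(0) sketch with linear function approximation for continuous states; the `featurize` function and episode format are assumptions of this sketch.

```python
import numpy as np

def semi_gradient_td0(episodes, featurize, n_features, alpha=0.01, gamma=0.99):
    """Approximate V(s) ~ w . featurize(s).
    episodes: list of [(s, r, s2, done), ...]; featurize(s) -> np.ndarray of length n_features."""
    w = np.zeros(n_features)
    for episode in episodes:
        for (s, r, s2, done) in episode:
            x, x2 = featurize(s), featurize(s2)
            target = r if done else r + gamma * w @ x2
            # Semi-gradient: differentiate only through the current estimate w @ x
            w += alpha * (target - w @ x) * x
    return w
```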