RL
Tabular State Space Setting
Finite Known MDP
Transition dynamics are known
Dynamic Programming
Policy Evaluation
Policy Iteration
Generalised Policy Iteration
Value Iteration
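As a concrete reference point, a minimal value-iteration sketch for a known finite MDP, assuming the dynamics are supplied as arrays `P[s, a, s']` (transition probabilities) and `R[s, a]` (expected rewards); these array names are illustrative, not from the map.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s, a, s']: transition probabilities, R[s, a]: expected rewards."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
    return V, policy
```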
On-Policy
Learn from the current policy's behaviour
SARSA
n-step SARSA
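A minimal sketch of one-step SARSA (on-policy TD control) with an epsilon-greedy behaviour policy. The `env.reset()` / `env.step(a)` interface returning `(next_state, reward, done)` is an assumption of this sketch, not part of the map.

```python
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    # Behaviour policy: random action with probability eps, else greedy
    return rng.integers(Q.shape[1]) if rng.random() < eps else int(np.argmax(Q[s]))

def sarsa(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """env is assumed to expose reset() -> s and step(a) -> (s2, r, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            # On-policy target uses the action actually selected next (a2)
            Q[s, a] += alpha * (r + gamma * (0 if done else Q[s2, a2]) - Q[s, a])
            s, a = s2, a2
    return Q
```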
Off-Policy
Based on Importance Sampling
Learn from another policy's behaviour
Q-Learning
n-step Q
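For contrast, a one-step Q-learning sketch under the same assumed environment interface: the target bootstraps from the greedy action rather than the behaviour action, so the method is off-policy; importance sampling only becomes necessary for the n-step off-policy variants.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Same assumed env interface as the SARSA sketch above."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # Off-policy target: bootstrap with the greedy action at s2
            Q[s, a] += alpha * (r + gamma * (0 if done else np.max(Q[s2])) - Q[s, a])
            s = s2
    return Q
```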
Policy Improvement
Asynchronous DP
Bandits
Introductory Paper
Adversarial Bandits
EWF: Exponentially Weighted Forecaster
EXP3: Exponential-weight algorithm for Exploration and Exploitation
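A minimal EXP3 sketch for the adversarial setting, assuming rewards lie in [0, 1]; the `reward_fn` callback stands in for the adversary and is an assumption of this sketch.

```python
import numpy as np

def exp3(n_arms, T, reward_fn, gamma=0.1, seed=0):
    """reward_fn(t, arm) -> reward in [0, 1] is an assumed stand-in for the adversary."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    for t in range(T):
        # Mix the exponential-weights distribution with uniform exploration
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        r = reward_fn(t, arm)
        # Importance-weighted reward estimate for the pulled arm only
        est = r / probs[arm]
        weights[arm] *= np.exp(gamma * est / n_arms)
    return weights / weights.sum()
```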
Stochastic Bandits
adaptive exploration
forward look: with prior information
uniform exploration
Lower Bounds
UCB: Upper Confidence Bound
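A minimal UCB1 sketch for stochastic bandits with rewards in [0, 1]; the `pull(arm)` callback is an assumed stand-in for the environment.

```python
import numpy as np

def ucb1(n_arms, T, pull):
    """pull(arm) -> stochastic reward in [0, 1] (assumed interface)."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(T):
        if t < n_arms:
            arm = t  # pull each arm once to initialise
        else:
            # Optimism in the face of uncertainty: empirical mean + confidence radius
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts
```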
Bayesian Bandits and Thompson Sampling
Bayesian bandits
Thompson Sampling
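A minimal Thompson-sampling sketch for Bernoulli arms with Beta(1, 1) priors; the `pull(arm)` callback returning a 0/1 reward is again an assumption of this sketch.

```python
import numpy as np

def thompson_bernoulli(n_arms, T, pull, seed=0):
    """Beta(1, 1) priors over Bernoulli arms; pull(arm) -> 0/1 reward (assumed interface)."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_arms)  # 1 + observed successes
    beta = np.ones(n_arms)   # 1 + observed failures
    for _ in range(T):
        # Sample a mean for each arm from its posterior, then act greedily on the sample
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        r = pull(arm)
        alpha[arm] += r
        beta[arm] += 1 - r
    return alpha / (alpha + beta)  # posterior mean estimates
```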
Lipschitz Bandit
Continuum-armed Bandits
Adaptive discretization: the Zooming Algorithm
Lipschitz MAB
Semi-bandits
Contextual Bandits
Planning & Learning
Exploration/Exploitation
How to explore the learned dynamics; the main differences between methods are full vs. sample backups and deep vs. shallow backups
Knows What It Knows (KWIK): a supervised-learning model that can report when it does not yet know a prediction
Prioritized Sweeping
: which states or state-action pairs should be backed up during planning? Work backwards from states whose values have just changed
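A simplified prioritized-sweeping sketch: one real transition is recorded in a deterministic tabular model, then planning backups are popped from a priority queue, and predecessors whose targets changed enough are queued in turn. The `model` / `predecessors` structures and the per-call (rather than persistent) queue are simplifications assumed here.

```python
import heapq
import numpy as np

def prioritized_sweeping_update(Q, model, predecessors, s, a, r, s2,
                                alpha=0.1, gamma=0.95, theta=1e-4, n_planning=10):
    """model[(s, a)] = (r, s2) is a deterministic learned model;
    predecessors[s] is a set of (s_pred, a_pred) pairs known to lead into s."""
    model[(s, a)] = (r, s2)
    predecessors.setdefault(s2, set()).add((s, a))
    pqueue = []
    p = abs(r + gamma * np.max(Q[s2]) - Q[s, a])
    if p > theta:
        heapq.heappush(pqueue, (-p, (s, a)))
    for _ in range(n_planning):
        if not pqueue:
            break
        _, (ps, pa) = heapq.heappop(pqueue)
        pr, ps2 = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps2]) - Q[ps, pa])
        # Work backwards: queue any predecessor whose backup target changed enough
        for (qs, qa) in predecessors.get(ps, set()):
            qr, _ = model[(qs, qa)]
            p = abs(qr + gamma * np.max(Q[ps]) - Q[qs, qa])
            if p > theta:
                heapq.heappush(pqueue, (-p, (qs, qa)))
```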
Rollout Algorithms
Monte Carlo Tree Search
Heuristic Search
E3 (Explicit Explore or Exploit): maintains visit counts for states and actions to quantify confidence in the predicted transitions (see the count-based sketch after RMAX below)
Trajectory Sampling: perform on-policy backups along simulated trajectories to update the value function
RMAX
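E3 and RMAX share the count-based idea noted under E3 above: a state-action pair counts as "known" once it has been visited m times. A hedged R-MAX-flavoured sketch that builds an optimistic empirical MDP from such counts (the array layout is an assumption); the result can then be planned on with, e.g., the value-iteration sketch earlier.

```python
import numpy as np

def rmax_model(counts, reward_sums, next_counts, m=10, r_max=1.0):
    """Build an optimistic empirical MDP from visit statistics.
    counts[s, a]: visits; reward_sums[s, a]: summed rewards;
    next_counts[s, a, s']: transition counts.
    Under-visited ('unknown') pairs are routed to a fictitious max-reward absorbing state."""
    n_states, n_actions = counts.shape
    S = n_states + 1                      # extra absorbing 'optimism' state
    P = np.zeros((S, n_actions, S))
    R = np.full((S, n_actions), r_max)    # unknown pairs get the optimistic reward
    P[:, :, -1] = 1.0                     # default: everything jumps to the absorbing state
    known = counts >= m
    for s in range(n_states):
        for a in range(n_actions):
            if known[s, a]:
                P[s, a, :-1] = next_counts[s, a] / counts[s, a]
                P[s, a, -1] = 0.0
                R[s, a] = reward_sums[s, a] / counts[s, a]
    return P, R  # plan on (P, R), e.g. with value iteration
```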
Integration
Dyna-Q
Dyna-Q+: adds an exploration bonus for state-action pairs that have not been visited for a long time (see the sketch after this branch)
Deep Dyna-Q
: the learned world model is used to simulate the user in a dialogue system
Dyna-AC
PILCO
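A sketch of a single Dyna-Q+ step, as referenced above: a direct Q-learning update from the real transition, followed by planning updates from the learned model with an exploration bonus that grows with the time since a pair was last visited. The `model` and `last_visit` dictionaries are data structures assumed by this sketch.

```python
import numpy as np

def dyna_q_plus_step(Q, model, last_visit, s, a, r, s2, t, rng,
                     alpha=0.1, gamma=0.95, kappa=1e-3, n_planning=20):
    """model[(s, a)] = (r, s2); last_visit[(s, a)] = time of last real visit (assumed structures)."""
    # Direct RL update from the real transition
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
    model[(s, a)] = (r, s2)
    last_visit[(s, a)] = t
    # Planning: replay simulated transitions, adding the Dyna-Q+ exploration bonus
    keys = list(model.keys())
    for _ in range(n_planning):
        ps, pa = keys[rng.integers(len(keys))]
        pr, ps2 = model[(ps, pa)]
        bonus = kappa * np.sqrt(t - last_visit[(ps, pa)])  # rewards long-unvisited pairs
        Q[ps, pa] += alpha * (pr + bonus + gamma * np.max(Q[ps2]) - Q[ps, pa])
```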
Planning
Given the known dynamics, seek an optimal decision-making strategy
Control Theory
MPC: model predictive control
LQR: linear quadratic regulator
iLQR: iterative LQR
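For orientation, a finite-horizon discrete-time LQR sketch via the backward Riccati recursion; MPC re-solves such a problem at every step and applies only the first control, and iLQR repeats it around successive linearisations of nonlinear dynamics.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon discrete LQR: minimise sum_t x'Qx + u'Ru s.t. x_{t+1} = A x_t + B u_t.
    Returns feedback gains K_t such that u_t = -K_t x_t."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        # Backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains ordered t = 0 .. horizon-1
```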
Classical Planning
state-space planning
: full backup required
plan-space planning
: specify the region to explore
Finite Unknown MDP
Transition dynamics are unknown
TD: Temporal Difference
Monte Carlo Methods
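A minimal TD(0) policy-evaluation sketch for the unknown-dynamics case; the `episodes` format of `(s, r, s2, done)` tuples collected under the evaluated policy is an assumption. Monte Carlo methods would instead wait for the full episode return before updating each visited state.

```python
import numpy as np

def td0_prediction(episodes, n_states, alpha=0.1, gamma=0.99):
    """episodes: list of [(s, r, s2, done), ...] trajectories under the evaluated policy."""
    V = np.zeros(n_states)
    for episode in episodes:
        for (s, r, s2, done) in episode:
            # TD(0) bootstraps from the current estimate V[s2]
            target = r if done else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])
    return V
```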
Continuous State Space Setting
Approximation Methods
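As one instance of the approximation-methods branch, a minimal semi-gradient TD(0) sketch with linear function approximation for continuous states; the `featurize` function and episode format are assumptions of this sketch.

```python
import numpy as np

def semi_gradient_td0(episodes, featurize, n_features, alpha=0.01, gamma=0.99):
    """Approximate V(s) ~ w . featurize(s).
    episodes: list of [(s, r, s2, done), ...]; featurize(s) -> np.ndarray of length n_features."""
    w = np.zeros(n_features)
    for episode in episodes:
        for (s, r, s2, done) in episode:
            x, x2 = featurize(s), featurize(s2)
            target = r if done else r + gamma * w @ x2
            # Semi-gradient: differentiate only through the current estimate w @ x
            w += alpha * (target - w @ x) * x
    return w
```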