Reinforcement Learning
In this assignment, I implemented a reinforcement learning agent using Q-learning with function approximation to solve the Mountain Car and Grid World environments. The project reinforced my understanding of temporal-difference learning, function approximation, and experience replay in dynamic environments.
Model Definition:
- Q-Learning with Linear Function Approximation: Updated weights via semi-gradient descent on the temporal-difference (TD) error (see the sketch after this list).
- Epsilon-Greedy Policy: Balanced exploration and exploitation by selecting a random action with probability ϵ and a greedy action otherwise.
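As a concrete illustration of the two items above, here is a minimal sketch of the linear Q-learning update and ϵ-greedy action selection. The dimensions and hyperparameter values (n_features, n_actions, alpha, gamma, epsilon) are placeholder assumptions, not the assignment's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 3          # assumed sizes for illustration
alpha, gamma, epsilon = 0.1, 0.99, 0.1

# One weight vector per action: Q(s, a) = w[a] @ phi(s)
w = np.zeros((n_actions, n_features))

def q_values(phi):
    return w @ phi                    # shape (n_actions,)

def epsilon_greedy(phi):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(q_values(phi)))      # exploit

def td_update(phi, a, r, phi_next, done):
    # Semi-gradient Q-learning: delta = r + gamma * max_a' Q(s', a') - Q(s, a)
    target = r if done else r + gamma * np.max(q_values(phi_next))
    delta = target - w[a] @ phi
    w[a] += alpha * delta * phi       # gradient of Q(s, a) w.r.t. w[a] is phi
```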
Tasks Accomplished:
- Mountain Car: Applied Q-learning on both raw and tile-coded state representations to evaluate learning efficiency.
- Replay Buffer: Implemented experience replay to decorrelate updates and improve convergence stability (a buffer sketch follows this list).
- Grid World: Solved a navigation task using Q-learning with tile-based state encoding.
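The replay buffer can be sketched as a fixed-capacity deque with uniform sampling; the capacity and batch size below are assumed placeholders:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling decorrelates consecutive updates
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from a wide window of past transitions breaks the temporal correlation between consecutive updates, which is what stabilizes convergence.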
Implementation Details:
- Feature Engineering (a tile-coding sketch follows this list):
  - Raw: Used direct position and velocity as features.
  - Tile: Used binary-coded grid features across multiple tilings.
- State Representations: Raw mode used 2D continuous features; tile mode used high-dimensional sparse binary vectors.
- Batch Updates: When replay was enabled, transitions were sampled in batches for gradient updates.
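A hedged sketch of the tile-coding featurizer, assuming Mountain Car's standard state bounds; the number of tilings and tiles per dimension are illustrative choices, not the assignment's actual configuration:

```python
import numpy as np

LOW  = np.array([-1.2, -0.07])   # position, velocity lower bounds
HIGH = np.array([0.6,  0.07])
N_TILINGS, N_TILES = 8, 8        # 8 tilings, each an 8x8 grid (assumed)

def tile_features(state):
    """Return a sparse binary vector with one active tile per tiling."""
    phi = np.zeros(N_TILINGS * N_TILES * N_TILES)
    scaled = (np.asarray(state) - LOW) / (HIGH - LOW)   # normalize to [0, 1]
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * N_TILES)              # shift each tiling
        idx = np.clip(((scaled + offset) * N_TILES).astype(int),
                      0, N_TILES - 1)
        phi[t * N_TILES * N_TILES + idx[0] * N_TILES + idx[1]] = 1.0
    return phi
```

Because each state activates exactly one tile per tiling, the feature vector is sparse and binary, so a gradient step touches only N_TILINGS weights per action.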
Empirical Evaluations:
- Compared reward curves and convergence across raw vs. tile features (see the smoothing sketch after this list).
- Visualized value functions and analyzed learned policies in Mountain Car and Grid World.
- Assessed replay buffer’s impact on stability and performance.
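A moving average like the following (window size assumed) is one way to produce the smoothed reward curves referred to above:

```python
import numpy as np

def rolling_average(returns, window=50):
    """Smooth a per-episode return curve with a simple moving average."""
    kernel = np.ones(window) / window
    return np.convolve(returns, kernel, mode="valid")
```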
Programming Techniques:
- Vectorization: Used NumPy for fast gradient updates (a vectorized batch-update sketch follows this list).
- Hyperparameter Tuning: Tuned α (learning rate), γ (discount factor), ϵ (exploration rate), and replay settings.
- Debugging: Employed logging and unit tests to verify transitions, updates, and convergence behaviors.
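As a sketch of the vectorized updates mentioned above, here is one way to apply a batch TD update with no Python loop over transitions; the array shapes and hyperparameters are assumptions consistent with the linear-weights sketch earlier:

```python
import numpy as np

def batch_td_update(w, phis, actions, rewards, next_phis, dones,
                    alpha=0.1, gamma=0.99):
    """w: (n_actions, n_features); phis/next_phis: (batch, n_features);
    actions: (batch,) int array; dones: (batch,) array of 0.0/1.0 flags."""
    q_next = next_phis @ w.T                           # (batch, n_actions)
    targets = rewards + gamma * q_next.max(axis=1) * (1.0 - dones)
    q_sa = np.einsum("bf,bf->b", phis, w[actions])     # Q(s, a) per row
    deltas = targets - q_sa                            # TD errors, (batch,)
    # Accumulate per-action gradients; np.add.at handles repeated actions
    np.add.at(w, actions, (alpha * deltas)[:, None] * phis)
    return w
```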
Outputs and Results:
- Plotted reward curves, rolling averages, and final Q-value heatmaps (a heatmap sketch follows this list).
- Showed improved generalization and stability with tile features and replay buffer.
- Visualized learned policies in both environments.
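One possible way to produce such a value-function heatmap, reusing the w and tile_features names from the sketches above (assumed to be trained over the tile features); the grid resolution and state bounds are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

positions = np.linspace(-1.2, 0.6, 50)
velocities = np.linspace(-0.07, 0.07, 50)
# Greedy value V(s) = max_a Q(s, a) over a discretized state grid
values = np.array([[np.max(w @ tile_features((p, v))) for p in positions]
                   for v in velocities])

plt.imshow(values, origin="lower", aspect="auto",
           extent=[-1.2, 0.6, -0.07, 0.07])
plt.xlabel("position")
plt.ylabel("velocity")
plt.colorbar(label="max_a Q(s, a)")
plt.title("Learned value function (Mountain Car)")
plt.show()
```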
This project enhanced my understanding of reinforcement learning, particularly Q-learning with function approximation, experience replay, and policy analysis. It also improved my ability to design and debug scalable RL systems with complex feature representations.
- Reinforcement Learning
- Q-Learning
- Function Approximation
- Mountain Car
- Grid World
- Experience Replay
- Exploration Strategies
- NumPy
- Feature Engineering