
Course:

Introduction to Machine Learning (10-601)

Time Spent:

10 hours

Source Code:

GitHub

Reinforcement Learning

In this assignment, I implemented a reinforcement learning agent using Q-learning with function approximation to solve the Mountain Car and Grid World environments. The project reinforced my understanding of temporal-difference learning, function approximation, and experience replay in dynamic environments.


Model Definition:

  • Q-Learning with Linear Function Approximation: Updated the weight vector with gradient-descent steps on the temporal-difference error (a minimal sketch follows this list).
  • Epsilon-Greedy Policy: Balanced exploration and exploitation by selecting a random action with probability ϵ and the greedy action otherwise.
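
A minimal sketch of this update, assuming a linear model Q(s, a) = w_a · φ(s) with one weight vector per action; the function names and the NumPy random generator are illustrative choices, not the assignment's actual API:

    import numpy as np

    def q_value(w, features, action):
        # Linear approximation: Q(s, a) = w_a . phi(s)
        return w[action] @ features

    def epsilon_greedy(w, features, n_actions, epsilon, rng):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q_value(w, features, a) for a in range(n_actions)]))

    def td_update(w, features, action, reward, next_features, done, alpha, gamma):
        # TD target bootstraps from the greedy value of the next state (0 if terminal).
        next_q = 0.0 if done else max(q_value(w, next_features, a) for a in range(w.shape[0]))
        td_error = reward + gamma * next_q - q_value(w, features, action)
        # dQ/dw_a = phi(s), so the gradient step only touches the chosen action's weights.
        w[action] += alpha * td_error * features
        return td_error

Here w would start as np.zeros((n_actions, n_features)) and rng as np.random.default_rng().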

Tasks Accomplished:

  • Mountain Car: Applied Q-learning on both raw and tile-coded state representations to evaluate learning efficiency.
  • Replay Buffer: Implemented experience replay to decorrelate consecutive updates and improve convergence stability (see the sketch after this list).
  • Grid World: Solved a navigation task using Q-learning with tile-based state encoding.
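
The replay buffer itself is small enough to sketch in full; this assumes a fixed-capacity deque with uniform sampling, which is one common way to implement it (the class and method names are mine):

    import random
    from collections import deque

    class ReplayBuffer:
        """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions fall off automatically

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform sampling breaks the temporal correlation between consecutive
            # transitions, which is what stabilizes the gradient updates.
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

        def __len__(self):
            return len(self.buffer)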

Implementation Details:

  • Feature Engineering:
    • Raw: Used direct position and velocity as features.
    • Tile: Used binary-coded grid features across multiple offset tilings (a rough construction is sketched after this list).
  • State Representations: Raw mode used 2D continuous features; tile mode used high-dimensional sparse binary vectors.
  • Batch Updates: When replay was enabled, transitions were sampled in batches for gradient updates.
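
A rough sketch of how the tile features can be built for a 2D state such as Mountain Car's (position, velocity), assuming evenly offset tilings; the bounds, tile counts, and offsets below are placeholders rather than the assignment's actual parameters:

    import numpy as np

    def tile_features(state, low, high, n_tilings=8, tiles_per_dim=8):
        """Sparse binary feature vector: one active tile per tiling, offset per tiling."""
        low, high = np.asarray(low, float), np.asarray(high, float)
        scaled = (np.asarray(state, float) - low) / (high - low)   # normalize to [0, 1]
        dim = len(low)
        features = np.zeros(n_tilings * tiles_per_dim ** dim)
        for t in range(n_tilings):
            offset = t / (n_tilings * tiles_per_dim)               # shift each tiling slightly
            idx = np.clip(((scaled + offset) * tiles_per_dim).astype(int), 0, tiles_per_dim - 1)
            flat = np.ravel_multi_index(idx, (tiles_per_dim,) * dim)
            features[t * tiles_per_dim ** dim + flat] = 1.0        # one hot tile per tiling
        return features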

Empirical Evaluations:

  • Compared reward curves and convergence speed for raw vs. tile-coded features (a simple smoothing helper for these curves is sketched after this list).
  • Visualized value functions and analyzed learned policies in Mountain Car and Grid World.
  • Assessed replay buffer’s impact on stability and performance.
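
Per-episode returns are noisy, so a rolling average makes the reward-curve comparison readable; a simple NumPy helper along these lines (the window size is arbitrary, not a value from the assignment):

    import numpy as np

    def rolling_average(returns, window=25):
        # Same-length smoothed curve; early episodes average over a shorter prefix.
        returns = np.asarray(returns, dtype=float)
        cumsum = np.cumsum(np.insert(returns, 0, 0.0))
        out = np.empty_like(returns)
        for i in range(len(returns)):
            lo = max(0, i - window + 1)
            out[i] = (cumsum[i + 1] - cumsum[lo]) / (i + 1 - lo)
        return out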

Programming Techniques:

  • Vectorization: Used NumPy for fast, batched gradient updates (see the sketch after this list).
  • Hyperparameter Tuning: Tuned α (learning rate), γ (discount), ϵ (exploration), and replay settings.
  • Debugging: Employed logging and unit tests to verify transitions, updates, and convergence behaviors.
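
Vectorization matters most in the batched update over replay samples; a sketch assuming the same linear Q(s, a) = w_a · φ(s) parameterization as above, with w of shape (n_actions, n_features) and precomputed feature vectors in the batch (the helper and argument names are mine):

    import numpy as np

    def batch_td_update(w, batch, alpha, gamma):
        """One vectorized gradient step over a minibatch of
        (features, action, reward, next_features, done) tuples."""
        phi = np.stack([b[0] for b in batch])                    # (B, n_features)
        actions = np.array([b[1] for b in batch])                # (B,)
        rewards = np.array([b[2] for b in batch], dtype=float)   # (B,)
        next_phi = np.stack([b[3] for b in batch])               # (B, n_features)
        done = np.array([b[4] for b in batch], dtype=float)      # (B,)

        q_sa = np.einsum('bf,bf->b', phi, w[actions])            # Q(s, a) per sample
        next_q = (next_phi @ w.T).max(axis=1)                    # max over a' of Q(s', a')
        td_error = rewards + gamma * (1.0 - done) * next_q - q_sa

        # Average the gradient per action, then take one step for each action seen.
        for a in np.unique(actions):
            mask = actions == a
            w[a] += alpha * (td_error[mask, None] * phi[mask]).mean(axis=0)
        return td_error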

Outputs and Results:

  • Plotted reward curves, rolling averages, and final Q-value heatmaps.
  • Showed improved generalization and stability with tile features and replay buffer.
  • Visualized learned policies in both environments.

This project enhanced my understanding of reinforcement learning, particularly Q-learning with function approximation, experience replay, and policy analysis. It also improved my ability to design and debug scalable RL systems with complex feature representations.


  • Reinforcement Learning
  • Q-Learning
  • Function Approximation
  • Mountain Car
  • Grid World
  • Experience Replay
  • Exploration Strategies
  • NumPy
  • Feature Engineering