
Course:

Introduction to Machine Learning (10-601)

Time Spent:

10 hours

Source Code:

GitHub

Reinforcement Learning

In this assignment, I implemented a reinforcement learning agent using Q-learning with function approximation to solve the Mountain Car and Grid World environments. The project reinforced my understanding of temporal-difference learning, function approximation, and experience replay in dynamic environments.


Model Definition:

  • Q-Learning with Linear Function Approximation: Updated the weight vector with gradient-descent steps on the temporal-difference error (a minimal sketch follows this list).
  • Epsilon-Greedy Policy: Balanced exploration and exploitation by selecting a random action with probability ϵ and the greedy action otherwise.
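
A minimal sketch of this update, assuming a linear model Q(s, a) = w_a · φ(s) with one weight vector per action; the function names and the NumPy random generator are illustrative choices, not the assignment's actual API:

    import numpy as np

    def q_value(w, features, action):
        # Linear approximation: Q(s, a) = w_a . phi(s)
        return w[action] @ features

    def epsilon_greedy(w, features, n_actions, epsilon, rng):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q_value(w, features, a) for a in range(n_actions)]))

    def td_update(w, features, action, reward, next_features, done, alpha, gamma):
        # TD target bootstraps from the greedy value of the next state (0 if terminal).
        next_q = 0.0 if done else max(q_value(w, next_features, a) for a in range(w.shape[0]))
        td_error = reward + gamma * next_q - q_value(w, features, action)
        # dQ/dw_a = phi(s), so the gradient step only touches the chosen action's weights.
        w[action] += alpha * td_error * features
        return td_error

Here w would start as np.zeros((n_actions, n_features)) and rng as np.random.default_rng().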

Tasks Accomplished:

  • Mountain Car: Applied Q-learning on both raw and tile-coded state representations to evaluate learning efficiency.
  • Replay Buffer: Implemented experience replay to decorrelate consecutive updates and improve convergence stability (see the sketch after this list).
  • Grid World: Solved a navigation task using Q-learning with tile-based state encoding.
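
The replay buffer itself is small enough to sketch in full; this assumes a fixed-capacity deque with uniform sampling, which is one common way to implement it (the class and method names are mine):

    import random
    from collections import deque

    class ReplayBuffer:
        """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions fall off automatically

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform sampling breaks the temporal correlation between consecutive
            # transitions, which is what stabilizes the gradient updates.
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

        def __len__(self):
            return len(self.buffer)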

Implementation Details:

  • Feature Engineering:
    • Raw: Used direct position and velocity as features.
    • Tile: Used binary-coded grid features across multiple offset tilings (a rough construction is sketched after this list).
  • State Representations: Raw mode used 2D continuous features; tile mode used high-dimensional sparse binary vectors.
  • Batch Updates: When replay was enabled, transitions were sampled in batches for gradient updates.
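
A rough sketch of how the tile features can be built for a 2D state such as Mountain Car's (position, velocity), assuming evenly offset tilings; the bounds, tile counts, and offsets below are placeholders rather than the assignment's actual parameters:

    import numpy as np

    def tile_features(state, low, high, n_tilings=8, tiles_per_dim=8):
        """Sparse binary feature vector: one active tile per tiling, offset per tiling."""
        low, high = np.asarray(low, float), np.asarray(high, float)
        scaled = (np.asarray(state, float) - low) / (high - low)   # normalize to [0, 1]
        dim = len(low)
        features = np.zeros(n_tilings * tiles_per_dim ** dim)
        for t in range(n_tilings):
            offset = t / (n_tilings * tiles_per_dim)               # shift each tiling slightly
            idx = np.clip(((scaled + offset) * tiles_per_dim).astype(int), 0, tiles_per_dim - 1)
            flat = np.ravel_multi_index(idx, (tiles_per_dim,) * dim)
            features[t * tiles_per_dim ** dim + flat] = 1.0        # one hot tile per tiling
        return features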

Empirical Evaluations:

  • Compared reward curves and convergence speed for raw vs. tile-coded features (a simple smoothing helper for these curves is sketched after this list).
  • Visualized value functions and analyzed learned policies in Mountain Car and Grid World.
  • Assessed replay buffer’s impact on stability and performance.
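
Per-episode returns are noisy, so a rolling average makes the reward-curve comparison readable; a simple NumPy helper along these lines (the window size is arbitrary, not a value from the assignment):

    import numpy as np

    def rolling_average(returns, window=25):
        # Same-length smoothed curve; early episodes average over a shorter prefix.
        returns = np.asarray(returns, dtype=float)
        cumsum = np.cumsum(np.insert(returns, 0, 0.0))
        out = np.empty_like(returns)
        for i in range(len(returns)):
            lo = max(0, i - window + 1)
            out[i] = (cumsum[i + 1] - cumsum[lo]) / (i + 1 - lo)
        return out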

Programming Techniques:

  • Vectorization: Used NumPy for fast, batched gradient updates (see the sketch after this list).
  • Hyperparameter Tuning: Tuned α (learning rate), γ (discount), ϵ (exploration), and replay settings.
  • Debugging: Employed logging and unit tests to verify transitions, updates, and convergence behaviors.
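
Vectorization matters most in the batched update over replay samples; a sketch assuming the same linear Q(s, a) = w_a · φ(s) parameterization as above, with w of shape (n_actions, n_features) and precomputed feature vectors in the batch (the helper and argument names are mine):

    import numpy as np

    def batch_td_update(w, batch, alpha, gamma):
        """One vectorized gradient step over a minibatch of
        (features, action, reward, next_features, done) tuples."""
        phi = np.stack([b[0] for b in batch])                    # (B, n_features)
        actions = np.array([b[1] for b in batch])                # (B,)
        rewards = np.array([b[2] for b in batch], dtype=float)   # (B,)
        next_phi = np.stack([b[3] for b in batch])               # (B, n_features)
        done = np.array([b[4] for b in batch], dtype=float)      # (B,)

        q_sa = np.einsum('bf,bf->b', phi, w[actions])            # Q(s, a) per sample
        next_q = (next_phi @ w.T).max(axis=1)                    # max over a' of Q(s', a')
        td_error = rewards + gamma * (1.0 - done) * next_q - q_sa

        # Average the gradient per action, then take one step for each action seen.
        for a in np.unique(actions):
            mask = actions == a
            w[a] += alpha * (td_error[mask, None] * phi[mask]).mean(axis=0)
        return td_error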

Outputs and Results:

  • Plotted reward curves, rolling averages, and final Q-value heatmaps.
  • Showed improved generalization and stability with tile features and replay buffer.
  • Visualized learned policies in both environments.

This project enhanced my understanding of reinforcement learning, particularly Q-learning with function approximation, experience replay, and policy analysis. It also improved my ability to design and debug scalable RL systems with complex feature representations.


  • Reinforcement Learning
  • Q-Learning
  • Function Approximation
  • Mountain Car
  • Grid World
  • Experience Replay
  • Exploration Strategies
  • NumPy
  • Feature Engineering