Q Learning From Scratch in Python

Figure: A baby learning to walk, an example of learning by doing (image generated with DALL·E).

One of the early breakthroughs in reinforcement learning was the development of an algorithm known as Q-learning. It is defined by:

In this tutorial we will unlock the mystery behind this formula and bring it to life with Python!

So, fasten your seatbelts and get ready to:

You will get the most out of this tutorial if you have a solid understanding of RL basics, TD learning, and model-free control. Here is a list of some materials that will get you up to speed with these concepts:

Q-learning is a reinforcement learning method that is:

Q-learning is a specific type of Temporal Difference (TD) learning, and both are bootstrapping methods. Bootstrapping, sometimes described as learning a guess from a guess, means updating the value estimate of a state or state-action pair (Q-value) from the current estimate of the value of the next state or state-action pair, rather than from the true value. The agent can therefore update its estimates after every step, without waiting for the final outcome of an episode, which usually speeds up learning.
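To make bootstrapping concrete, here is a minimal sketch of a single update. The function names `td_target` and `q_update` and the numbers used are illustrative assumptions, not code from the accompanying notebook.

```python
def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Bootstrapped target: observed reward plus the discounted
    current *estimate* of the best value available in the next state."""
    if terminal:
        # Nothing follows a terminal state, so there is nothing to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_sa, target, alpha=0.5):
    """Move the old estimate a fraction alpha of the way towards the target."""
    return q_sa + alpha * (target - q_sa)


# Worked example with made-up numbers: the old estimate is 0.0, the agent
# receives a reward of 1.0, and the next state's estimates are [0.0, 2.0].
target = td_target(reward=1.0, next_q_values=[0.0, 2.0], gamma=0.9)
new_q = q_update(q_sa=0.0, target=target, alpha=0.5)
print(target, new_q)  # roughly 2.8 and 1.4
```

Note that the target is built from `next_q_values`, which are themselves estimates. That is the "guess from a guess" part.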

In Q-learning the agent interacts with the environment and updates estimates of state-action values according to the following bootstrapping equation:
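In the standard textbook notation (assumed here, and possibly different from the symbols used in the original figure), with α the learning rate, γ the discount factor, R_{t+1} the reward received after taking action A_t in state S_t, and S_{t+1} the resulting next state, the Q-learning update can be written as:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```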

Where:

At this point you might be excited that a Q-learning agent learns one guess from the next without waiting for an actual outcome, but is there any guarantee that it finds the optimal policy? The answer is yes! Under certain conditions, Q-learning can be shown to converge to the optimal Q-values with probability 1. These conditions are:
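As an illustration, the classic convergence results for tabular Q-learning are usually stated in terms of two requirements: every state-action pair keeps being visited, and the step sizes decay so that their sum diverges while the sum of their squares stays finite. The sketch below shows one schedule that meets the step-size requirement, α = 1/n(s, a), where n(s, a) counts visits to a pair. The class name `VisitCountStepSize` is hypothetical and is not part of the accompanying notebook.

```python
from collections import defaultdict


class VisitCountStepSize:
    """Step size alpha = 1 / n(s, a), where n(s, a) is the number of times
    the pair (s, a) has been updated so far."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, state, action):
        self.counts[(state, action)] += 1
        return 1.0 / self.counts[(state, action)]


alpha = VisitCountStepSize()
first = alpha(0, 1)   # first visit to (0, 1)
second = alpha(0, 1)  # second visit to the same pair
other = alpha(2, 0)   # first visit to a different pair
print(first, second, other)
```

In practice a small constant learning rate is often used instead, because 1/n decays quickly and can make learning slow. This trades the formal guarantee for faster adaptation.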

Q-learning is a versatile algorithm that can be applied to a wide range of problems such as:

A quick overview of the Q-Learning algorithm pseudocode and Python implementation can be found below:
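The following is a minimal sketch of tabular Q-learning with an epsilon-greedy behaviour policy, not the notebook's implementation. To keep it self-contained, it uses a hypothetical toy environment, `CorridorEnv` (a short corridor with the goal at the right end), instead of a Gymnasium environment. All names and hyperparameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical toy environment: states 0..n-1, start at 0, goal at n-1."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.n_actions = 2  # 0 = move left, 1 = move right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Walls at both ends: the agent cannot leave the corridor.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q[state]
    best = max(values)
    # Break ties randomly so an untrained agent does not always pick action 0.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
               epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * env.n_actions)  # Q-table, initialised to 0
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, env.n_actions, epsilon, rng)
            next_state, reward, done = env.step(action)
            # Bootstrapped target: greedy value of the next state (0 if terminal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q, n_states):
    """Best action per state according to the learned Q-table."""
    return [max(range(len(q[s])), key=lambda a: q[s][a]) for s in range(n_states)]


env = CorridorEnv()
q_table = q_learning(env)
policy = greedy_policy(q_table, env.n_states)
print(policy[:-1])  # actions for the non-terminal states
```

On this toy problem the learned greedy policy should move right in every non-terminal state. The behaviour policy (epsilon-greedy) differs from the policy used in the update target (greedy via `max`), which is what makes Q-learning off-policy.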

To evaluate the effectiveness of the Q-learning implementation, I trained and tested a Q-learning agent on three distinct environments from Gymnasium, the maintained successor to OpenAI Gym. Let’s check out how well the agent did!

Frozen Lake

Cliff Walking

Taxi Problem

Additionally, the provided Jupyter notebook includes classes for logging and visualising evaluation metrics. As a demonstration, the evaluation metrics from applying Q-learning to the Taxi Problem are shown in the two figures below. These plots indicate that the total episode reward increases over the course of training while the episode length decreases, suggesting that the agent is approaching the optimal policy.
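Per-episode rewards are usually noisy, so trends like these are easier to see after smoothing. Below is a minimal sketch of a moving average over logged episode rewards. The helper `moving_average` and the sample values are hypothetical and do not come from the notebook's logging classes.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over fewer points."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        averages.append(running / min(i + 1, window))
    return averages


# Made-up episode rewards that improve over training.
episode_rewards = [-10, -8, -6, -4, -2, 0]
smoothed = moving_average(episode_rewards, window=3)
print(smoothed)  # [-10.0, -9.0, -8.0, -6.0, -4.0, -2.0]
```

The same helper can be applied to episode lengths, where a downward trend is the sign of improvement.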

Q-learning is a powerful algorithm but it also has some limitations:

These and other limitations can make it difficult to apply Q-learning to certain problems, but there are also techniques and variations of the algorithm that can address some of them.

In upcoming articles I plan to analyse how different hyperparameter values affect Q-learning performance. After that, I will publish a comprehensive tutorial on implementing the Deep Q-Network (DQN) algorithm. DQN is a variant of Q-learning that uses deep neural networks to approximate the Q-function. It has been successfully applied to a wide range of Atari games and is considered one of the most influential variants of Q-learning.

Please leave your comments on what else you would like to see in the coming series and share your feedback! Peace 🙌
