Walk the Talk

Photo by Joshua Earle on Unsplash

Generally, when embedded-systems engineers or robotics hobbyists experiment with a pick-and-place robot without reinforcement learning, they feed a full map of the environment into the robot's memory. What we are doing here is completely different. The robot will be trained using Q-Learning. It knows nothing about the environment initially; it doesn't even know that there is a destination it has to reach. It will figure out everything, over a lot of episodes, using Q-Learning.
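To make the idea concrete, here is a minimal sketch of the tabular Q-Learning update such a robot would run. The grid size, the action set, the reward scheme, and the helper names are my own illustrative assumptions, not the exact setup of the robot described above.

import numpy as np

# Illustrative assumptions: a 5x5 grid world flattened into 25 states and
# 4 actions (up, down, left, right). The rewards live in the environment;
# the agent starts with no map and no knowledge of the goal.
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))     # the agent's knowledge, initially empty
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

def choose_action(state):
    """Epsilon-greedy: mostly exploit current Q values, sometimes explore."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """The core Q-Learning rule: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

Running q_update after every move, across many episodes, is all it takes for the Q table to converge toward the optimal action values.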

It is very fascinating to see Q-Learning in action. To demonstrate that, I am considering a…


The result of an action depends upon the very quality of that action…

Photo by Chinmay Bhattar on Unsplash

In one of my previous articles, Reinforcement Learning 5: Finite Markov Decision Processes (part-3), I explained the action-value function, which is denoted by q(s,a) or simply q. The optimal action-value function is denoted by q*. The optimal q values also yield an optimal policy.

Remember, the q function is a function of both state and action. In Sutton and Barto's notation, it is given by
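q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right] = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a \right]

where G_t is the return from time step t and \gamma \in [0, 1] is the discount factor.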


How much greed is good enough?

Photo by Gabriel Meinert on Unsplash

Before getting into Q-Learning, let's understand the important concept of exploration vs. exploitation.

Exploration vs Exploitation

Exploration is trying new possibilities in order to find better rewards. Exploitation is sticking with the same actions that have given significant rewards in the past. To illustrate this, consider an example with restaurants. Suppose I have recently moved to a new city. On a lazy weekend, I am fed up with cooking for myself and want to try a restaurant. I start visiting a different restaurant every weekend. Within two or three weekends, I discover a…
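The restaurant dilemma maps directly onto the ε-greedy strategy commonly used with Q-Learning. Below is a minimal sketch; the restaurant names, the ratings, and the value of ε are made up purely for illustration.

import random

# Hypothetical data: my running estimate of how much I enjoy each restaurant.
estimated_rating = {"cafe_a": 4.2, "diner_b": 3.1, "bistro_c": 3.8}
epsilon = 0.1  # fraction of weekends spent exploring

def pick_restaurant():
    """Epsilon-greedy: usually exploit the best-rated place, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(estimated_rating))        # explore
    return max(estimated_rating, key=estimated_rating.get)  # exploit

print(pick_restaurant())

Over many weekends, roughly 90% of visits go to the current favourite and 10% go to a random restaurant that might turn out to be even better.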


That’s part of the policy. To keep switching gears. — Ridley Scott

Photo by Hassan Pasha on Unsplash

Policies and Value Functions

Estimating value functions is a vital aspect of reinforcement learning. A value function tells us how good it is to be in a given state, or to perform a given action in a given state, in terms of the expected return. A policy is a function that maps a given state to the probabilities of selecting each possible action from that state. The policy of an agent changes with experience.
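In Sutton and Barto's notation, a policy \pi(a \mid s) gives the probability of selecting action a in state s, and the value of a state under that policy is the expected return

v_\pi(s) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right]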

There can be multiple value functions corresponding to multiple policies. The policy that gives the greatest expected return is called the optimal…


There is always some maths…

Photo by Roman Mager on Unsplash

Episodic Tasks and Continuing Tasks

The agent-environment interaction can happen in two types of situations. In the first, the interaction can be broken into sub-sequences called episodes, for example, game play. In games like chess or Atari games, every game is an episode that ends in a state called the terminal state. The terminal state can be a win, a loss, or a draw. After the terminal state, the game is reset and the next episode starts all over again. Not just games, but any tasks that can be broken into such episodes are called Episodic…
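The difference shows up directly in how the return G_t is written. For an episodic task, the return is a finite sum that ends at the terminal time step T; a continuing task has no terminal state, so a discount factor \gamma < 1 is needed to keep the infinite sum bounded:

G_t = R_{t+1} + R_{t+2} + \dots + R_T \quad \text{(episodic)} \qquad G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \quad \text{(continuing)}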


Learning along the way…

Photo by Alexander Schimmeck on Unsplash

Markov Chains and Markov Processes

According to Wikipedia,

“A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event”.

Andrey Andreyevich Markov was a phenomenal Russian mathematician who is best known for his work on stochastic processes. In probability theory, any process with which some randomness is associated is called a stochastic process. A Markov chain describes a sequence of events in which the probability of each event depends only on the previous event. For example, if there is a sequence of…
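To make this concrete, here is a small sketch of a two-state Markov chain. The states (sunny and rainy weather) and the transition probabilities are made up for illustration; the point is that the next state is sampled using only the current state.

import numpy as np

states = ["sunny", "rainy"]
# transition[i][j] = probability of moving from state i to state j;
# each row sums to 1. These numbers are illustrative only.
transition = np.array([[0.8, 0.2],
                       [0.4, 0.6]])

def simulate(start, n_steps, rng=np.random.default_rng()):
    """Sample a trajectory: the next state depends only on the current one."""
    i = states.index(start)
    path = [start]
    for _ in range(n_steps):
        i = rng.choice(len(states), p=transition[i])
        path.append(states[i])
    return path

print(simulate("sunny", 7))  # e.g. ['sunny', 'sunny', 'rainy', ...]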


… and so the adventure begins!

Photo by Susan Q Yin on Unsplash

Richard S. Sutton and Andrew G. Barto, in their book titled “Reinforcement Learning”, which is the main inspiration for this series of articles, describe it as…

Reinforcement learning is learning what to do — how to map situations to actions — so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that…


Autoencoders with variational inference

Photo by 浮萍 闪电 on Unsplash

There are generative models and there are discriminative models. Discriminative models discriminate between different kinds of data instances, while generative models generate new instances of data. There are many types of generative models, and the Variational Autoencoder is certainly one of the most popular. It was introduced by Diederik P. Kingma and Max Welling in their research paper titled “Auto-Encoding Variational Bayes”, which can be found here.

How is it different from an Autoencoder?

So what is the difference between an Autoencoder and a Variational Autoencoder? An Autoencoder is mostly used for dimensionality reduction. As shown in figure 1, the input (x) is…
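For reference, a plain Autoencoder can be sketched in a few lines. The framework choice (PyTorch) and the layer sizes here are my own assumptions, not the exact architecture from figure 1.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Compress the input x to a low-dimensional code z, then reconstruct it.
    A VAE would instead have the encoder output the parameters (mean and
    log-variance) of a distribution over z, and sample z from it."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)      # deterministic code: this is the key
        return self.decoder(z)   # difference from a VAE's sampled z

model = Autoencoder()
x = torch.rand(1, 784)           # e.g. a flattened 28x28 image
x_hat = model(x)                 # reconstruction, trained with e.g. MSE loss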


Faster than fast

Photo by Valerie Blanchett on Unsplash

1. Introduction

Fast R-CNN was introduced in April 2015. It was faster than R-CNN, but not fast enough, and hence unsuitable for real-time applications. Faster R-CNN was introduced by Shaoqing Ren et al. in June of the same year, and it is much faster than Fast R-CNN. The research paper can be found here.

2. First Look at Faster R-CNN


Improved R-CNN

Photo by Gnider Tam on Unsplash

1. Introduction

R-CNN is slow. Detection with R-CNN using a VGG16 backbone takes 47 seconds per image at test time. That makes it unsuitable for low-latency applications. So in April 2015, Ross Girshick, who was also one of the authors of R-CNN, single-handedly proposed a better algorithm called Fast R-CNN. The research paper can be found here. In this write-up, we will get down to the nitty-gritty of Fast R-CNN. …

Ashutosh Makone

I am a hands-on guy. I appreciate the beauty of theory but understand its futility without application. ML, DL and computer vision are my interests.
