Walk the Talk

Generally, when embedded-systems engineers or robotics hobbyists experiment with a pick-and-place robot without using reinforcement learning, they load a full map of the environment into the robot's memory. What we are doing here is completely different. The robot will be trained using Q-Learning…


The result of an action depends upon the very quality of the action…

In one of my previous articles, Reinforcement Learning 5: Finite Markov Decision Processes (part-3), I explained the action-value function, which is denoted by q(s,a) or simply q. The optimal action-value function is denoted by q*. …


How greedy is good enough?

Before getting into Q-Learning, let's understand the important concept of exploration vs. exploitation.

Exploration vs Exploitation

Exploration is trying new possibilities in order to find better rewards. Exploitation is continuing to choose the same actions that have given significant rewards in the past. To illustrate this…


That’s part of the policy. To keep switching gears. — Ridley Scott

Policies and Value Functions

Estimating value functions is a vital aspect of reinforcement learning. A value function tells how good it is to perform a given action in a given state, where "how good" is measured in terms of expected return. A policy is…


There is always some maths…

Episodic Tasks and Continuing Tasks

The agent-environment interaction can happen in two types of situations. In the first, the interaction can be broken into subsequences called episodes, for example plays of a game. In chess or Atari games, every game is an episode which ends in a…


Learning along the way…

Markov Chains and Markov Process

According to Wikipedia,

“A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event”.
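
The property in this definition can be written compactly. The notation below is an assumption for illustration and not taken from the article: X_n stands for the state at step n, and x, x_n, …, x_0 are particular state values.

```latex
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0)
  = P(X_{n+1} = x \mid X_n = x_n)
```

In words, once the current state is known, the earlier history adds no further information about the next state.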

Andrey Andreyevich Markov was a phenomenal Russian mathematician who is best known…


… and so the adventure begins!

Richard S. Sutton and Andrew G. Barto, in their book “Reinforcement Learning: An Introduction”, which is the main inspiration for this series of articles, describe it as…

Reinforcement learning is learning what to do — how to map situations to actions — so as to…


Autoencoders with variational inference

There are generative models and there are discriminative models. Discriminative models discriminate between different kinds of data instances, while generative models generate new instances of data. There are many types of generative models. The variational autoencoder is certainly one of the most popular generative models. …


Faster than fast

1. Introduction

Fast R-CNN was introduced in April 2015. It was faster than R-CNN, but still not fast enough for real-time applications. Faster R-CNN was introduced by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun in June of the same year, and it is much faster than Fast R-CNN. …


Improved R-CNN

1. Introduction

R-CNN is slow. Detection with R-CNN using a VGG16 backbone takes 47 seconds per image at test time, which makes it unsuitable for low-latency applications. So in April 2015 Ross Girshick, who was also one of the authors of R-CNN, single-handedly proposed a better algorithm, called Fast Algorithm…

Ashutosh Makone

I am a hands-on guy. I appreciate the beauty of theory but understand its futility without application. ML, DL and computer vision are my interests.
