Our quest to make Reinforcement Learning 200 times more efficient

Image from: Hello I’m Nik on Unsplash

In this series of articles, we explained why sample inefficiency is a critical limitation of Deep Reinforcement Learning and why a model-based approach can help overcome it. Then we presented a state-of-the-art algorithm called PlaNet and used it to test this hypothesis. Today we present the results we obtained.


To maintain consistency with the original PlaNet paper, we ran all experiments on the DeepMind Control Suite. This suite provides continuous control tasks built for benchmarking reinforcement learning agents. We chose four of them: Cartpole, Cheetah, Walker, and Reacher.


The Deep Planning Network (PlaNet)

Image from: Jason Leung on Unsplash

In the previous article, Deep RL: a Model-Based approach (part 2), we saw how Deep Reinforcement Learning (DRL) works and how a model-based approach can improve sample efficiency. In this article, we present a specific model-based algorithm called PlaNet.


Model-based Deep Reinforcement Learning explained

Image from: Jason Leung on Unsplash

In the previous article, Deep RL: a Model-Based approach (part 1), we saw how Deep Reinforcement Learning (DRL) can be very effective yet very sample-inefficient. Now we examine how it works and why a model-based approach can drastically improve sample efficiency.

Reinforcement learning

In reinforcement learning, an agent acts in an unknown environment to reach an unknown goal.

Time is discretized into time steps; at each one, the agent receives information about the environment and takes an action. It then receives a feedback signal called the reward. This reward is positive when the action brings the…
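The loop described above can be sketched in a few lines. The environment, actions, and reward scheme below are all hypothetical, chosen only to make the structure concrete:

```python
# A minimal sketch of the agent-environment interaction loop.
# ToyEnv and its reward scheme are made up for illustration.

class ToyEnv:
    """Toy 1-D environment: the agent starts at 0 and the goal is position 5."""
    def __init__(self):
        self.state = 0

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.state += action
        done = self.state == 5
        reward = 1.0 if done else -0.1  # positive feedback only at the goal
        return self.state, reward, done

env = ToyEnv()
total_reward, done = 0.0, False
while not done:
    action = +1  # a fixed policy here; a learning agent would discover it from rewards
    state, reward, done = env.step(action)
    total_reward += reward
```

The point is the shape of the loop: observe, act, receive a reward, repeat at every time step.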


Deep Reinforcement Learning doesn’t really work… Yet

Image from: Jason Leung on Unsplash

Using Deep Reinforcement Learning (DRL), we can train an agent to solve a task without explicitly programming it. This approach is so general that, in principle, we can apply it to any sequential decision-making problem. For example, in 2015, a research team developed a DRL algorithm called DQN to play Atari games. They used the same method across 57 different games, each with its own goals to achieve, peculiar enemies, and different agent moves. Their agent learned to solve many of the games; in some cases, it even surpassed human-level performance.


Checking the reliability of SHAP methods

Image from: Lucas Santos on Unsplash

Our previous article, A Game of Prediction (Part 1), presented four methods based on the SHAP framework for explaining neural network predictions. In this article, we compare and evaluate them using a sanity check.


Coalition Games for explaining DNN

Image from: Jonathan Petersson on Unsplash

Deep neural networks have received great attention because it can be proven that, under certain assumptions, they are universal approximators. However, how a network reaches a certain prediction is not well understood. For this reason, deep neural networks are often referred to as black-box algorithms.

To understand how deep neural networks produce a certain prediction, and which input features were most responsible for it, algorithms known as explanation methods have been introduced. The majority of these methods are based on heuristics and backpropagation.

A novel class of such explanation methods is based on the Shapley values for coalitional…
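To give a flavor of the idea, here is a minimal exact computation of Shapley values for a tiny coalitional game. The two-"feature" payoff function is invented purely for illustration; SHAP methods approximate this kind of attribution for real models:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a coalitional game.
    `value` maps a frozenset of players to the coalition's payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                S = frozenset(coalition)
                # Weight of this coalition size in the Shapley formula
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p when joining coalition S
                total += weight * (value(S | {p}) - value(S))
        phi[p] = total
    return phi

# Hypothetical additive "model": feature a contributes 3, feature b contributes 5
v = lambda S: 3 * ('a' in S) + 5 * ('b' in S)
print(shapley_values(['a', 'b'], v))  # {'a': 3.0, 'b': 5.0}
```

For an additive game like this one, each player's Shapley value is exactly its individual contribution; the interesting cases are games with interactions, where the formula fairly splits the joint payoff.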

Sanity Checks on XAI with Captum, INNvestigate, and TorchRay

Assessing the scope and quality of results provided by eXplainable AI

Photo by Caroline on Unsplash

Sanity Checks

In the previous article, we saw a brief introduction to the whys and hows of Explainable Artificial Intelligence. Here we’re going to see the results of some sanity-check experiments.

The experiments conducted here aim to answer the following question: what assures us that the explanation a method provides reliably reflects what the network has actually learned in order to make its decision?

Specifically, we want to assess the sensitivity of explanation methods to model parameters: if one method really highlights the most important regions of…

aka Training Self-Driving with Virtual Worlds

The previous two articles in this series presented some challenges in training self-driving systems and a first method to overcome them: Pixel-level Adaptation. Today we’ll see a second approach: Feature-level Adaptation.

Feature-level Adaptation

This method adds a loss term based on the segmentation model’s ability to fool a discriminator trained on the labeled data. If the model can do that even with never-before-seen images, it has learned to produce very realistic results.

Moreover, this kind of training can also help to reduce the domain shift. Instead of working in pixel space, we operate with the respective…
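As a rough illustration of this adversarial objective (a sketch, not the actual implementation), here is the standard two-sided loss in NumPy, with made-up discriminator scores for the two domains:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy between discriminator scores and domain labels."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# Hypothetical discriminator scores (probability of "source domain")
d_source = np.array([0.9, 0.8, 0.95])  # features from labeled (source) images
d_target = np.array([0.2, 0.3, 0.1])   # features from unlabeled (target) images

# Discriminator objective: tell the two domains apart
loss_disc = bce(d_source, np.ones(3)) + bce(d_target, np.zeros(3))

# Adaptation loss for the segmentation model: fool the discriminator,
# i.e. push target-domain features to be scored like source-domain ones
loss_adapt = bce(d_target, np.ones(3))
```

Minimizing `loss_adapt` with respect to the feature extractor (while the discriminator minimizes `loss_disc`) is what aligns the two feature distributions and reduces the domain shift.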

aka Training Self-Driving with Virtual Worlds

In the previous article, I presented some challenges in training self-driving systems and two methods to overcome them: Pixel-level Adaptation and Feature-level Adaptation, both based on Generative Adversarial Networks. This article explains how they work and shows our experimental results.

Pixel-level Adaptation

The first method is based on a particular GAN architecture called CycleGAN. This model can capture the principal style features from one set of images and then apply them to a collection of images from another domain. For example, you can provide the model with two groups of images. The first one contains…

aka Training Self-Driving with Virtual Worlds

Self-driving cars should already be here but have not yet arrived, even if Tesla, with the latest version of the FSD Beta, now seems very close to success.
In this series of articles, we try to understand why it is so difficult and time-intensive to build an autonomous driving system and how we can speed up this process using “video games”.

Deep Learning systems learn from examples, i.e., from input/output pairs. In the specific case of vision systems, these pairs consist of images as input and structured data as output. The input…

Enrico Busto

Founding Partner and CTO @ Addfor S.p.A. We develop Artificial Intelligence Solutions.
