DON’T GUESS YOUR NEXT MOVE. PLAN IT!

Our quest to make Reinforcement Learning 200 times more efficient

Image from: Hello I'm Nik on Unsplash

In this series of articles, we explained why sample inefficiency is a critical limitation of Deep Reinforcement Learning and why a model-based approach can help solve it. Then we presented a state-of-the-art algorithm called PlaNet and used it to test this hypothesis. Today we present the results we obtained.

Experiments

To stay consistent with the original PlaNet paper, we ran all the experiments on the DeepMind Control Suite. This suite provides continuous control tasks built for benchmarking reinforcement learning agents. We chose four of them: Cartpole, Cheetah, Walker, and Reacher.
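For reference, these tasks can be loaded in a few lines with the dm_control package. This is only a minimal sketch; the specific task variants used below (swingup, run, walk, easy) are our assumption, since the Suite offers several per domain.

from dm_control import suite

# Load the four domains used in the experiments. The task variants are
# assumptions; the Suite offers several tasks per domain.
tasks = [("cartpole", "swingup"),
         ("cheetah", "run"),
         ("walker", "walk"),
         ("reacher", "easy")]

for domain, task in tasks:
    env = suite.load(domain_name=domain, task_name=task)
    time_step = env.reset()
    print(domain, task, "action shape:", env.action_spec().shape)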


DON’T GUESS YOUR NEXT MOVE. PLAN IT!

The Deep Planning Network (PlaNet)

Image from: Jason Leung on Unsplash

In the previous article, Deep RL: a Model-Based approach (part 2), we saw how Deep Reinforcement Learning (DRL) works and how the model-based approach can improve sample efficiency. In this article, we present a specific model-based algorithm called PlaNet.


DON’T GUESS YOUR NEXT MOVE. PLAN IT!

Model-based Deep Reinforcement Learning explained

Image from: Jason Leung on Unsplash

In the previous article, Deep RL: a Model-Based approach (part 1), we saw how Deep Reinforcement Learning (DRL) can be very effective yet very sample-inefficient. Now we examine how it works and why a model-based approach can drastically improve sample efficiency.

Reinforcement learning

In reinforcement learning, an agent acts in an unknown environment to reach an unknown goal.

Time is discretized into time steps, and at each of them the agent receives information about the environment and takes an action. Then it receives a feedback signal called the reward. This reward is positive when the action brings the…
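As a minimal illustration of this loop (the toy environment and the random policy below are our own placeholders, not part of the original article):

import random

class ToyEnv:
    """A toy environment: the state is an integer the agent pushes up or down."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action                      # transition to the next state
        reward = 1.0 if action > 0 else -1.0      # feedback signal for the agent
        done = abs(self.state) >= 5               # the episode ends at the boundary
        return self.state, reward, done

env = ToyEnv()
obs, done, episode_return = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])               # the agent takes an action
    obs, reward, done = env.step(action)          # the environment responds with a reward
    episode_return += reward
print("episode return:", episode_return)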


DON’T GUESS YOUR NEXT MOVE. PLAN IT!

Deep Reinforcement Learning doesn’t really work… Yet

Image from: Jason Leung on Unsplash

Using Deep Reinforcement Learning (DRL), we can train an agent to solve a task without explicitly programming it. This approach is so general that, in principle, we can apply it to any sequential decision-making problem. For example, in 2015, a research team developed a DRL algorithm called DQN to play Atari games. They used the same method across 57 different games, each with its own goals to achieve, its own enemies, and different moves available to the agent. Their agent learned to solve many of the games, in some cases even surpassing human-level performance.


COMPARISON OF SHAPLEY VALUE-BASED EXPLANATION METHODS

Checking the reliability of SHAP methods

Image from: Lucas Santos on Unsplash

Our previous article, A Game of Prediction (Part 1), presented four methods, based on the SHAP framework, to explain a neural network's predictions. In this article, we compare and evaluate them using a sanity check.


COMPARISON OF SHAPLEY VALUE-BASED EXPLANATION METHODS

Coalition Games for explaining DNN

Image from: Jonathan Petersson on Unsplash

Deep neural networks have received great attention because it can be proven that, under certain assumptions, they are universal approximators. However, how a network reaches a given prediction is not well understood. Therefore, deep neural networks are often referred to as black-box algorithms.

To understand how deep neural networks produce a given prediction and which input features were most responsible for it, algorithms called explanation methods have been introduced. The majority of these methods are based on heuristics and backpropagation.

A novel class of such explanation methods is based on the Shapley values for coalitional…
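To make the idea concrete, here is a minimal sketch (our own toy example, not the article's code) of the Shapley value of a small coalitional game: each player's value is its average marginal contribution over all orderings of the players.

from itertools import permutations

players = ["A", "B", "C"]

def v(coalition):
    # Toy characteristic function: the total payoff of a coalition.
    s = set(coalition)
    if {"A", "B"} <= s:
        return 10
    return 6 if "A" in s else 0

shapley = {p: 0.0 for p in players}
orderings = list(permutations(players))
for order in orderings:
    coalition = []
    for p in order:
        before = v(coalition)
        coalition.append(p)
        shapley[p] += (v(coalition) - before) / len(orderings)

print(shapley)  # the values sum to v(players): the full payoff is fairly split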


Sanity Checks on XAI with Captum, INNvestigate, and TorchRay

Assessing the scope and quality of results provided by eXplainable AI

Photo by Caroline on Unsplash

Sanity Checks

In the previous article, we saw a brief introduction to the whys and hows of Explainable Artificial Intelligence. Here we're going to see the results of some sanity check experiments.

The experiments conducted here aim to answer the following question: what assures us that the explanation provided by a method reliably reflects what the network has learned in order to reach its decision?

Specifically, we want to assess the sensitivity of explanation methods to model parameters: if one method really highlights the most important regions of…
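As an illustration of this kind of check (a minimal sketch with a toy model, not the code used in our experiments), one can randomize part of the network's parameters and measure how much a gradient-based explanation changes:

import torch

def saliency(model, x):
    # Simple gradient-based explanation: |d output / d input|.
    x = x.detach().clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.abs().detach()

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU(),
                            torch.nn.Linear(10, 1))
x = torch.randn(1, 10)

before = saliency(model, x)
with torch.no_grad():
    model[-1].weight.normal_()          # randomize the last layer's parameters
after = saliency(model, x)

similarity = torch.nn.functional.cosine_similarity(
    before.flatten(), after.flatten(), dim=0)
print("explanation similarity after randomization:", similarity.item())
# A method that is truly sensitive to the model should show a clear drop here.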


Unboxing the Black Box — eXplainable Artificial Intelligence (XAI)

How can you trust a model if you cannot understand how it reaches its conclusions?

Photo by Irene Giunta on Unsplash

Unboxing the Black Box

In the last decade, the performance of Artificial Intelligence (AI) systems has been reaching, and sometimes surpassing, the human level on many tasks.

AI-based technologies are increasingly being used in our daily lives; think about Netflix's movie recommendations, Facebook's friend suggestions, Google's neural machine translation, or Amazon Alexa's speech recognition. Not only that: AI-enabled applications are also increasingly used in critical fields of our lives, such as health and finance.

Deep Neural Networks (DNNs) demonstrate great success in learning complex patterns that enable…


aka Training Self-Driving with Virtual Worlds


The previous two articles in this series presented some of the challenges in training self-driving systems and the first method to overcome them: Pixel-level Adaptation. Today we'll see a second approach: Feature-level Adaptation.

Feature-level Adaptation

This method adds a loss term based on the segmentation model's ability to fool a discriminator trained on the labeled data. If the model can do that even with images it has never seen before, it has learned to produce very realistic results.

Moreover, this kind of training can also help to reduce the domain shift. Instead of working in pixel space, we operate with the respective…
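A minimal sketch of such an adversarial training step (our own illustration, not the original training code; backbone, disc, the optimizers, and the image batches are placeholder names):

import torch
import torch.nn.functional as F

def adaptation_step(backbone, disc, opt_b, opt_d, src_images, tgt_images):
    src_feat = backbone(src_images)     # features from the labeled (source) images
    tgt_feat = backbone(tgt_images)     # features from the unlabeled (target) images

    # 1) Train the discriminator to tell source features from target features.
    src_logits = disc(src_feat.detach())
    tgt_logits = disc(tgt_feat.detach())
    d_loss = F.binary_cross_entropy_with_logits(src_logits, torch.ones_like(src_logits)) \
           + F.binary_cross_entropy_with_logits(tgt_logits, torch.zeros_like(tgt_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the backbone so that target features fool the discriminator.
    fool_logits = disc(tgt_feat)
    g_loss = F.binary_cross_entropy_with_logits(fool_logits, torch.ones_like(fool_logits))
    opt_b.zero_grad(); g_loss.backward(); opt_b.step()
    return d_loss.item(), g_loss.item()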


aka Training Self-Driving with Virtual Worlds


In the previous article, I presented some of the challenges in training self-driving systems and two methods to overcome them: Pixel-level Adaptation and Feature-level Adaptation, both based on Generative Adversarial Networks. This article explains how they work and shows our experimental results.

Pixel-level Adaptation

The first method is based on a particular GAN architecture called CycleGAN. This model can capture the principal style features from one set of images and then apply them to an image collection from another domain. For example, you can provide the model with two groups of images. The first one contains…
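The core of this model is the cycle-consistency constraint: translating an image to the other domain and back should recover the original. A minimal sketch of that loss (our illustration; in the full model it is combined with adversarial losses on both domains, and G, F_inv and the image batches are placeholders):

import torch
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, real_a, real_b, weight=10.0):
    # G maps domain A -> B (e.g. virtual -> real style), F_inv maps B -> A.
    fake_b = G(real_a)                  # translate A -> B
    fake_a = F_inv(real_b)              # translate B -> A
    rec_a = F_inv(fake_b)               # translate back to A
    rec_b = G(fake_a)                   # translate back to B
    return weight * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))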

Enrico Busto

Founding Partner and CTO @ Addfor S.p.A. We develop Artificial Intelligence Solutions.
