Developers
July 29, 2020

DeepMind Develops AI Systems to Decode How the Brain Works

Dopamine and temporal difference learning: Neuroscience and AI. Dopamine is a major component of the human reward system. DeepMind's scientists train AI models with reward and punishment algorithms to advance AI.
Source: Unsplash

Today we will talk about the relationship between neuroscience and artificial intelligence. Human learning and motivation are driven by two forces: internal and external rewards.

Most of our day-to-day behaviors involve predicting or anticipating a result, in other words, a rewarding outcome. If an action carries no reward, we are not interested in it in the same way. We do things for a reason: we expect a certain result.

A long line of research studies how organisms learn from experience and, in particular, how they learn to anticipate rewards. The most famous experiment was done by Ivan Pavlov: dogs were trained to expect food after a buzzer sounded. Once trained, the dogs began salivating as soon as the buzzer sounded, showing that they had learned to predict the reward. You can see the same thing when your dog gets excited before you take it for a walk.

Inspired by this research, computer scientists have developed analogous algorithms for artificial systems. These algorithms enable AI systems to learn strategies without explicit instruction, guided automatically by reward predictions.

A new computer science study, published in Nature, explains previously unexplained reward learning processes in the brain. It opens up new paths of research into the brain's dopamine system, with implications for motivation and learning disorders.

Have you ever heard of reinforcement learning? It's one of the oldest and most powerful concepts linking neuroscience to AI. It originated in the 1980s, when computer science researchers tried to develop algorithms that could learn complex behaviors on their own, using rewards and punishments as a teaching signal.

To solve a problem in these scenarios, it is necessary to understand how current actions lead to future rewards. We humans know that certain actions produce certain results: if you study, you will score better on tests.
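
In reinforcement learning terms, this "future reward" is usually written as a discounted return: the sum of upcoming rewards, with later rewards counting for less. A quick illustration with made-up numbers:

```python
# Discounted return: sum of future rewards, later ones weighted down by gamma.
# The reward sequence and gamma value are made-up illustrative numbers.
rewards = [0.0, 0.0, 1.0, 0.5]   # rewards received at successive time steps
gamma = 0.9                      # discount factor: how much the future counts

discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
print(discounted_return)         # 0.0 + 0.0 + 0.81*1.0 + 0.729*0.5 = 1.1745
```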

The temporal difference (TD) learning algorithm is one of the most important algorithms for reward prediction. TD uses a mathematical trick that replaces complex reasoning about future events with a simple learning procedure that leads to the same results.

How does this happen? Instead of calculating the total future reward, TD predicts the combination of the immediate reward and its own reward prediction at the next moment in time.

At the next moment, the new prediction is compared to what was expected. If they differ, the algorithm calculates the difference (the temporal difference) and adjusts the old prediction toward the new one. The algorithm keeps bringing these numbers closer together, matching expectations to reality.
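
To make this concrete, here is a minimal sketch of a tabular TD update in Python. The states, rewards, and hyperparameters are illustrative assumptions, not taken from the DeepMind study:

```python
# Minimal tabular TD(0) sketch of the update described above.
# States, rewards, and hyperparameters are illustrative assumptions.
values = {"buzzer": 0.0, "food": 0.0, "end": 0.0}  # predicted future reward per state
alpha = 0.1   # learning rate: how far to move the old prediction
gamma = 0.9   # discount factor: how much the next prediction counts

def td_update(state, reward, next_state):
    """Nudge the prediction for `state` toward reward + discounted next prediction."""
    target = reward + gamma * values[next_state]  # immediate reward + next prediction
    td_error = target - values[state]             # the "temporal difference"
    values[state] += alpha * td_error
    return td_error

# Pavlov-style episodes: the buzzer sounds, then food (reward 1.0) arrives,
# then the episode ends. Over time the buzzer state alone predicts the reward.
for _ in range(200):
    td_update("buzzer", reward=0.0, next_state="food")
    td_update("food", reward=1.0, next_state="end")

print(values)  # roughly {'buzzer': 0.9, 'food': 1.0, 'end': 0.0}
```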

During the 80s and 90s, neuroscientists struggled to understand the behavior of dopamine neurons. Where do dopamine neurons reside? In the midbrain. Although they are located there, they send projections to many brain areas, and these projections carry important messages.

The firing of dopamine neurons has a strong relationship to reward. The response also depends on sensory input and changes as animals become experienced at a task.

In the mid-90s, scientists found that the responses of dopamine neurons represented reward prediction errors: the neurons fired when the animal received more reward than it had been trained to expect.

The researchers later proposed that the brain uses a TD learning algorithm of its own: a reward prediction error is calculated, broadcast through the brain as a dopamine signal, and used to drive learning.

This is not mere theory: the reward prediction error theory of dopamine has been tested successfully in multiple experiments, and it is currently one of the most successful quantitative theories in neuroscience.

Computer scientists have continued to improve algorithms for learning from rewards and punishments. Since 2013, the focus has shifted to deep reinforcement learning: using deep neural networks to learn powerful representations for reinforcement learning.
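
As a rough illustration of how the two pieces fit together (a sketch only; the network shape, observation size, and hyperparameters below are made-up assumptions, not DeepMind's architecture), the same TD update can drive a neural network that maps raw observations to value estimates:

```python
import torch
import torch.nn as nn

# A small value network: maps an observation vector to a predicted future reward.
# Architecture and sizes here are illustrative assumptions.
value_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(value_net.parameters(), lr=1e-2)
gamma = 0.9

def deep_td_step(obs, reward, next_obs):
    """One deep TD update: regress V(obs) toward reward + gamma * V(next_obs)."""
    with torch.no_grad():                      # hold the target fixed for this step
        target = reward + gamma * value_net(next_obs)
    loss = (value_net(obs) - target).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with random vectors standing in for real sensory observations.
obs, next_obs = torch.randn(1, 4), torch.randn(1, 4)
deep_td_step(obs, reward=1.0, next_obs=next_obs)
```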

Distributional reinforcement learning is one of the main algorithms that has made reinforcement learning work better with neural networks. The amount of reward that will result from a particular action is not a known quantity; there is randomness. Instead of predicting a single average reward, a distributional agent learns to predict the full range of possible outcomes, with some predictions more optimistic and others more pessimistic.
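
Here is a simplified sketch of the distributional idea (the reward distribution, learning rates, and "optimism levels" below are made-up assumptions): keep a whole population of value predictors, and let each one weight positive and negative prediction errors differently, so pessimistic predictors settle near the bottom of the reward distribution and optimistic ones near the top.

```python
import random

# A population of value predictors, each with a different "optimism" level.
# Rewards and learning rates here are made-up illustrative numbers.
optimism_levels = [0.1, 0.25, 0.5, 0.75, 0.9]  # weight given to good surprises
predictions = [0.0] * len(optimism_levels)
base_lr = 0.05

def distributional_update(reward):
    """Scale positive and negative prediction errors asymmetrically per predictor."""
    for i, tau in enumerate(optimism_levels):
        error = reward - predictions[i]
        # Optimistic predictors (high tau) amplify good surprises;
        # pessimistic predictors (low tau) amplify bad surprises.
        lr = base_lr * (tau if error > 0 else (1 - tau))
        predictions[i] += lr * error

# Rewards are random: sometimes nothing, sometimes a full reward.
for _ in range(5000):
    distributional_update(reward=random.choice([0.0, 1.0]))

print(predictions)  # predictors spread out across the reward distribution
```

This is, in spirit, what the Nature study reported in the brain: individual dopamine neurons appear to carry differently tuned prediction errors, which together encode a distribution over possible rewards rather than a single average.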

In conclusion, dopamine neurons have a strong connection to the reward system: dopamine is a building block of how the brain processes reward, and its release tracks how much better an outcome was than expected. Pavlov's experiment shows how organisms learn from experience: dogs trained to expect food after a buzzer begin salivating as soon as the buzzer sounds, having learned to predict the reward. People, too, do things expecting a reward; scientifically speaking, we seek dopamine. Reinforcement learning remains one of the oldest and most powerful concepts linking neuroscience to AI: computer scientists developed algorithms that learn to perform actions on their own, using rewards and punishments as a teaching signal that shows the model how to seek reward and avoid punishment.

Tags: DeepMind, AI, Neuroscience
Lucas Bonder
Technical Writer
Lucas is an entrepreneur, web developer, and technology writer.
