Developers
August 11, 2020

How to Overcome Specification Problems for Artificial Agents

Specifying the task correctly minimizes the chances of specification gaming.

Today we will talk about a very interesting topic: specification gaming. What is specification gaming, you might ask? It is behavior that satisfies the literal specification of an objective without achieving the intended outcome.

The problem appears when a learning agent finds and exploits a shortcut that earns the reward without completing the task the reward was meant to encourage. It is like a student cheating on a test: they get the points without doing the studying.

In what follows, we will look at the possible causes of specification gaming, share examples of when it happens, and see how to overcome the problem.

First example

A designer asks an agent to stack Lego blocks so that a red block ends up on top of a blue block. The agent is rewarded based on the height of the red block's bottom face while the red block is not touching the blue one. But instead of picking up the red block and placing it on top of the blue one, the agent simply flips the red block over, raising its bottom face, and collects the reward. The behavior achieves the stated objective, but not by following the intended route: the designer specified the task for a reason, yet the agent satisfied the objective by its own rules.
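To make the failure concrete, here is a minimal sketch of the two reward functions. The Block class is hypothetical, invented for illustration; it does not come from any real library. The misspecified reward only looks at the height of the red block's bottom face, while the tighter version checks what the designer actually wanted.

```python
from dataclasses import dataclass

@dataclass
class Block:
    x: float              # horizontal position of the block's center
    bottom_height: float  # height of the block's bottom face
    top_height: float     # height of the block's top face
    upright: bool         # False once the block has been flipped over

def misspecified_reward(red: Block) -> float:
    # Literal specification: reward the height of the red block's
    # bottom face. Nothing mentions the blue block, so flipping the
    # red block upside down raises its bottom face and earns reward.
    return red.bottom_height

def intended_reward(red: Block, blue: Block) -> float:
    # Tighter specification: reward only when the red block is upright
    # and actually resting on top of the blue block.
    aligned = abs(red.x - blue.x) < 0.01
    resting = abs(red.bottom_height - blue.top_height) < 0.01
    return 1.0 if (red.upright and aligned and resting) else 0.0
```

Under the first reward, flipping the block scores as well as stacking it; under the second, only the intended behavior pays.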

When developing a reinforcement learning algorithm, one has to build agents that learn to achieve an objective. A common example is training RL agents on Atari games, where the goal is to evaluate whether the algorithms can solve the tasks. In that setting, specification gaming can actually be a positive sign, as it means the agent has found a new way to achieve the proposed objective.
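The interface the agent sees is genuinely this narrow: a scalar reward per step and nothing else. A minimal interaction loop, sketched here with the Gymnasium API and a random policy standing in for a trained agent, makes that explicit; whatever the reward function literally pays for is all the agent can optimize.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    # The agent's entire view of success is this scalar reward;
    # the designer's intent never reaches it directly.
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
```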

Seen from a human perspective, we can consider algorithms naïve, as they do exactly what we tell them to do. Naïve, but certainly useful: without algorithms, there would be no computer science as we know it today.

Designing task specifications that reflect the exact intent of the human designer is not easy. A good RL algorithm can find a solution that exactly satisfies the specification yet differs from the intended solution.

What does this mean? That specifying the intent is the most important part of achieving the desired outcome. It is essential that researchers specify tasks correctly, so that agents find genuinely new solutions rather than loopholes.

Task Specification

Task specification includes not only the design of the reward but also the choice of training environment. How well the task is specified determines the agent's capacity to produce the intended outcome.

If the specification is done right, the agent produces a desirable solution. If it is done wrong, the agent can produce undesirable behavior. There is a very fine line between the two, so one always has to be attentive to the details when specifying the task.

What causes specification gaming? There are multiple causes, but the most common one is poor reward shaping. Let us start by defining reward shaping: it makes an objective easier to learn by giving the agent rewards along the way to solving the task, rewarding progress instead of only the final outcome.
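One well-studied way to shape rewards without opening the door to gaming is potential-based shaping (Ng et al., 1999), which provably leaves the optimal policy unchanged. Here is a minimal sketch, assuming the distance to the goal is available as a progress measure:

```python
def shaped_reward(base_reward: float, prev_dist: float,
                  curr_dist: float, gamma: float = 0.99) -> float:
    # Potential-based shaping: add the discounted change in a potential
    # function (here, negative distance to the goal) to the task reward.
    # The agent is paid for progress along the way, not just at the end,
    # and this form of shaping cannot change which policy is optimal.
    potential_prev = -prev_dist
    potential_curr = -curr_dist
    return base_reward + gamma * potential_curr - potential_prev

# Example: the agent moved from 5.0 to 4.2 units away from the goal.
# It has earned no task reward yet, but shaping still pays for progress.
r = shaped_reward(base_reward=0.0, prev_dist=5.0, curr_dist=4.2)
```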

The main idea is to specify a reward that accurately matches and produces the desired outcome. Going back to the Lego stacking task: defining the reward as the bottom face of the red block being off the floor creates a big opportunity for specification gaming, because it leaves the door open to multiple unintended behaviors.

When specifying a task, one should always keep the “door closed”: narrow down every possibility for specification gaming, making it less likely to happen.

To make specifications more accurate, one can learn the reward function from human feedback. It is easier to evaluate an outcome that has already been achieved than one that has to be imagined in advance. This approach can bring its own problems, though, as the learned reward can still be gamed in other ways. And the errors here are almost always human: specification gaming usually happens not because the task is impossible to specify, but because of a mistake in how the person specified it.
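As a rough illustration of learning a reward from human feedback, here is a sketch of the common pairwise-preference approach. It assumes each trajectory is summarized by a feature vector and the reward model is linear; both are simplifications for clarity, not a complete method.

```python
import numpy as np

def preference_loss(w: np.ndarray, feats_a: np.ndarray,
                    feats_b: np.ndarray, human_prefers_a: bool) -> float:
    # Linear reward model: r(trajectory) = w . features(trajectory).
    r_a = feats_a @ w
    r_b = feats_b @ w
    # Bradley-Terry model: probability the model assigns to
    # "the human prefers trajectory A over trajectory B".
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))
    label = 1.0 if human_prefers_a else 0.0
    eps = 1e-9  # numerical safety for the logarithms
    # Binary cross-entropy between model preference and human label;
    # minimizing it pushes preferred trajectories toward higher reward.
    return -(label * np.log(p_a + eps)
             + (1.0 - label) * np.log(1.0 - p_a + eps))
```

Minimizing this loss over many labeled pairs fits a reward that reflects what evaluators actually preferred, though, as noted above, the learned reward can itself still be gamed.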

In conclusion, specification gaming happens when behavior satisfies the literal specification of an objective without achieving the intended outcome. We have seen that the most important defense is specifying the task correctly. That is not easy, and the goal is to minimize the chance of specification gaming happening: keep the "door closed", because the fewer openings the specification leaves, the better it is. The phenomenon appears throughout reinforcement learning. We, as humans, have the chance to make the most of our intelligence; algorithms, however useful, remain naïve, since they can only do what they are told. Machines without free will.

Tags: Specification Gaming, Task Specification, Reinforcement Learning
Lucas Bonder
Technical Writer
Lucas is an entrepreneur, web developer, and writer covering technology.
