Operant Conditioning: Reinforcement and Schedules in Easy Words

Operant conditioning

In this lesson, you will learn about another form of associative learning called operant conditioning. The experiments of B.F. Skinner will be discussed and the concepts of reinforcement and punishment will be explored. Different procedures for using reinforcement as well as the timing of reinforcement will also be explained.

What is Operant Conditioning?

Operant conditioning is a form of associative learning, like classical conditioning, except that in operant conditioning the association is between a behaviour and its consequences.

Operant conditioning always involves behaviour that is either reinforced or punished. One of the first people to study this kind of learning was Edward Thorndike.

The Law of Effect in operant conditioning

Thorndike conducted a series of experiments using a cat in a puzzle box. The hungry cat was locked in a cage next to its food. The cat had to get out of the cage in order to get to the food. He found that, in time, the cat learned the new behaviour by connecting the stimulus with the response.

Thorndike puzzle box

Thorndike put forth the Law of Effect, which states that:

  • If the consequences of the behaviour are pleasant, the stimulus-response connection will be strengthened and the likelihood of the behaviour will increase.
  • If the consequences of the behaviour are unpleasant, the stimulus-response connection will weaken and the likelihood of the behaviour will decrease.
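
The two statements of the Law of Effect can be illustrated with a small sketch. This is only a toy model: the numeric "connection strength" between 0 and 1, the function name `apply_law_of_effect`, and the `step` size are assumptions made for illustration and are not taken from Thorndike's work.

```python
def apply_law_of_effect(strength, pleasant, step=0.1):
    """Return the updated stimulus-response connection strength (0.0 to 1.0).

    The update rule is an illustrative assumption, not Thorndike's own formula.
    """
    if pleasant:
        # Pleasant consequence: the connection is strengthened.
        return strength + step * (1.0 - strength)
    # Unpleasant consequence: the connection is weakened.
    return strength - step * strength


# Hypothetical puzzle-box trials: every escape is followed by food (pleasant).
strength = 0.2
history = [round(strength, 3)]
for _ in range(5):
    strength = apply_law_of_effect(strength, pleasant=True)
    history.append(round(strength, 3))

print(history)  # the values rise from trial to trial
```

In this sketch the strength stands in for how likely the cat is to repeat the escape behaviour: it climbs after each pleasant outcome and would fall after unpleasant ones.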

Thorndike’s work set the stage for the work of B.F. Skinner. In his studies, Skinner designed a chamber, now called a Skinner box. The box had a bar or button that an animal could press to release food or water.

Skinner box

Two important concepts used by Skinner in operant conditioning are reinforcement and punishment.

  • Reinforcement is any consequence that increases the likelihood of a specific behaviour.
  • Punishment is any consequence that decreases the likelihood of a specific behaviour.

Reinforcement in Operant Conditioning

There are many types of reinforcement and also several schedules of reinforcement. A schedule describes whether reinforcement is given continuously or only part of the time. Let’s begin with the types of reinforcement. Reinforcement can be:

  • positive or negative
  • immediate or delayed
  • primary or secondary

Remember that reinforcement strengthens a behaviour, making it more likely to occur again.

Positive reinforcement occurs when a specific behaviour is followed by a desirable event or state (something that is desired). Two examples of positive reinforcement are:

  • praising a dog that comes when it is called
  • paying a person for building a fence in your yard

Negative reinforcement occurs when a specific behaviour ends an undesirable event or state (something that is not wanted is removed). Two examples of negative reinforcement are:

  • ending a headache by taking an aspirin
  • putting up an umbrella to get out of the rain

Immediate reinforcement occurs right after the behaviour. Two examples are:

  • praising the dog right when it comes
  • paying the fence builder as soon as the job is completed

Delayed reinforcement occurs sometime after the behaviour. Two examples are:

  • praising the dog five minutes after it comes over
  • paying the fence builder one week after the job is completed

Primary reinforcement involves something that is biologically reinforcing, such as food, warmth, or water. Two examples are:

  • giving the dog a biscuit when it comes
  • giving the fence builder a cold glass of water when he’s thirsty

Secondary reinforcement, also called conditioned reinforcement, involves something that you have learned to value, such as money. Two examples are:

  • patting the dog when he comes
  • paying the fence builder

As you can see, there are many types of reinforcement. Which is more effective?

  • Positive and negative reinforcement are equally effective.
  • We are more likely to respond to immediate reinforcement than to delayed reinforcement; however, we can’t always be rewarded or reinforced right away. As humans, we must learn to delay being rewarded. Examples include receiving a passing grade at the end of a course and getting a paycheque at the end of a two-week period.
  • Research has shown that the ability to delay gratification has an advantage. Children who prefer a big reward tomorrow over a smaller reward today tend to become higher-achieving adolescents.

If reinforcement increases the likelihood of a behaviour occurring in the future, then what decreases the likelihood of a behaviour occurring in the future? The answer is punishment.

Punishment

Punishment weakens a behaviour, making it less likely to occur again in the future. There are two different types of punishment.

  • The behaviour is followed by an undesirable event.

For example, if a young child touches a hot stove, the burn on the hand is the punishment. This punishment makes the behaviour less likely to occur in the future.

  • The behaviour is followed by the ending of a desirable state or event. This type of punishment is also called omission training. In school, we call this time out.

For example, if I get repeated speeding tickets, I lose the privilege of driving. Likewise, a young child who is hitting his classmates is removed from the classroom to a time-out room.
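
The types of reinforcement and punishment described so far can be summarized as a small lookup. This is a sketch for revision purposes only: the function name `classify_consequence` and the labels `"desirable"`, `"undesirable"`, `"added"`, and `"removed"` are assumptions introduced here. Negative reinforcement is taken to mean that the behaviour ends an undesirable state, as in the aspirin example.

```python
def classify_consequence(event, change):
    """Classify a consequence of behaviour.

    event:  "desirable" or "undesirable"
    change: "added" (the event follows the behaviour) or
            "removed" (the behaviour ends the event or state)
    Returns (name of the consequence, effect on the behaviour).
    """
    table = {
        ("desirable", "added"): ("positive reinforcement", "more likely"),
        ("undesirable", "removed"): ("negative reinforcement", "more likely"),
        ("undesirable", "added"): ("punishment by an undesirable event", "less likely"),
        ("desirable", "removed"): ("punishment by omission training", "less likely"),
    }
    return table[(event, change)]


# Examples taken from the lesson text.
print(classify_consequence("desirable", "added"))      # praising a dog that comes
print(classify_consequence("undesirable", "removed"))  # aspirin ends a headache
print(classify_consequence("undesirable", "added"))    # burn from a hot stove
print(classify_consequence("desirable", "removed"))    # losing the privilege of driving
```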

Research has shown that reinforcement is effective, but how effective is punishment?

Problems with Punishment

Many learning experts oppose the use of punishment in controlling behaviour. Some of the reasons are as follows:

  • Sometimes, punishment doesn’t end the undesired behaviour; it only suppresses it. The punished person avoids the behaviour only when they think they will get caught.
  • Punishment can teach aggressive behaviour because we tend to repeat behaviours that we observe.
  • Sometimes, punishment can lead to fear, anxiety, and lower self-esteem because frequently punished children or animals learn to fear and avoid the person or situation that delivers the punishment.

Reinforcement Procedures in Operant Conditioning

In operant conditioning, as in classical conditioning, there are procedures that can be applied to establish, refine, or eliminate a specific behaviour. Three of these procedures are shaping (or acquisition), discrimination, and extinction.

Shaping

Shaping is a technique or procedure that is used to establish new behaviour. It involves gradually reinforcing closer and closer approximations of the desired behaviour. In everyday life, we are always shaping the behaviour of others or having our behaviour shaped.

An example is learning to drive a car. You were reinforced for turning the car on, then driving forward a few metres, then putting on the turn signal, then taking the first right-hand turn, then taking the first left-hand turn, and so on. Your driving behaviour was being shaped.

Shaping is useful in training specific behaviour. But how do we learn to behave differently when presented with similar stimuli, and how do we get rid of behaviours that we have learned? The answer lies in the procedures of discrimination and extinction.

Discrimination

Discrimination is the ability to tell the difference between two similar stimuli and to respond differently to each. Responding in the same way to similar stimuli is called generalization.

For example, you know the difference between class bells and fire alarm bells, human cookies and dog biscuits, and your friends Jessica and Sarah.

Extinction

Extinction is the loss of a specific behaviour when no consequence follows it. This is sometimes a good thing because it prevents us from repeating the same unsuccessful behaviours.
A few examples are making flirtatious comments to someone who is not interested in us and making an egg salad sandwich for someone who doesn’t eat eggs.
Let’s look at two other examples of extinction that show how it can be used to change a child’s bad behaviour.

  • A child is lying in bed while his parents are in the living room talking to guests. The child begins to make loud noises to get their attention. Because the parents ignore the child and continue to talk to their guests, the noise-making is never rewarded and eventually stops: the behaviour has been extinguished.
  • A child is having dinner with her parents and at the end of the meal she yells loudly for dessert. The parents continue talking and ignore their daughter’s demands. After a short period of time, once the yelling has stopped, the child is served dessert.

Schedules of Reinforcement

For a learned behaviour to continue, it needs to be reinforced, either continuously or partially. These patterns are called schedules of reinforcement.
Skinner and others have identified different types of reinforcement schedules.

  • In continuous reinforcement, you reward every correct response.
  • In partial reinforcement, you reward only some correct responses.

Partial reinforcement schedules are further broken down into ratio schedules and interval schedules.

  • Ratio schedules focus on the number of responses before reinforcement occurs.
  • Interval schedules focus on the time between reinforcements.

Ratio and interval schedules can be either fixed or variable. If we put together all the information on schedules, we end up with four different partial reinforcement schedules. Let’s look at each one separately.

Fixed interval schedule (FI): A fixed interval schedule is an operant conditioning schedule in which a reward is given for the first response made after a fixed amount of time has passed. Once the reward is delivered, the interval starts again.

For example, if there is a 15-minute interval, then the first response after 15 minutes would be reinforced.

Note: The fixed interval schedule is the least productive of the four and the easiest to extinguish.

Fixed ratio schedule (FR): The fixed ratio schedule is one of the most well-known schedules of reinforcement. It is a partial reinforcement schedule in which the reinforcer is given after a set number of responses has been emitted. This means that every time the animal completes that number of responses, it receives the reinforcer. The faster the animal responds, the sooner it receives the next reinforcer.

For example, if an animal completes five responses, it receives a piece of cheese. Fixed ratio schedules differ only in the number of responses that have to be completed in order to receive the reinforcer.

FR schedules are often used for training specific tasks or for rewarding a set amount of work.

Variable interval schedule (VI): In a variable interval schedule, the time between reinforcements is varied. This type of schedule is different from the fixed interval schedule (FI) in that there is no specific time at which a reinforcement will be given.

For example, the first reinforcement might become available after 5 minutes, the next after 12 minutes, and the next after 8 minutes. This type of schedule is different from the ratio schedules in that reinforcement depends on the time that has passed, not on the number of responses.

Variable ratio schedule (VR): The variable ratio schedule is a reinforcement schedule in which the reinforcer is given after a varying number of responses. The number changes from one reinforcement to the next but averages out to a set value.

For example, a ratio of 1:4 means that, on average, there is one reinforcement for every four responses: the reward might come after two responses, then after six, then after four. Because the next reward is unpredictable, this schedule produces a high and steady rate of responding.

Note: Variable ratio schedules are the most productive and the most resistant to extinction.
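
The four partial schedules can be sketched as a short simulation, using the definitions given above: ratio schedules count responses and interval schedules measure the time between reinforcements. The class names, the `respond(t)` method, and the way the variable values are drawn (uniformly around the average) are assumptions made for illustration; real experiments may vary the values differently.

```python
import random


class FixedRatio:
    """Reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a varying number of responses that averages n."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.count = 0
        self.target = self._draw()

    def _draw(self):
        # Assumed distribution: whole numbers from 1 to 2n - 1 (mean n).
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response after a fixed time has passed."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Reinforce the first response after a varying time that averages `interval`."""

    def __init__(self, interval, rng):
        self.interval = interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumed distribution: uniform between 0 and 2 * interval (mean interval).
        return self.rng.uniform(0, 2 * self.interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


# Hypothetical session: one response per minute for 60 minutes.
rng = random.Random(0)
schedules = {
    "FR 5": FixedRatio(5),
    "VR 5": VariableRatio(5, rng),
    "FI 15": FixedInterval(15),
    "VI 15": VariableInterval(15, rng),
}
for name, schedule in schedules.items():
    rewards = sum(schedule.respond(minute) for minute in range(1, 61))
    print(name, "rewards in 60 minutes:", rewards)
```

In this sketch, responding faster earns more rewards only on the ratio schedules. On the interval schedules, extra responses before the interval has passed earn nothing.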

Practice questions: identify the schedule of reinforcement

Question: A person receives a paycheque every other week.

Answer: Fixed interval

Question: Pop quizzes are administered to students.

Answer: Variable interval

Question: People play slot machines at gambling casinos.

Answer: Variable ratio

Question: You call the mechanic to find out if your car is fixed yet.

Answer: Variable interval

Question: A factory worker is paid every time he finishes three pairs of pants.

Answer: Fixed ratio

Question: When fly fishing, you must cast and reel back several times before catching a fish.

Answer: Variable ratio

Question: During a class, you look at your watch until the end of the class.

Answer: Fixed interval

Question: A salesperson is paid based on commission.

Answer: Fixed ratio

Question: You call a friend and get a busy signal.

Answer: Variable interval

Limitations

The application of operant conditioning in the classroom has been shown to be effective, but there are some limitations that need to be addressed.

There are many applications of operant conditioning that are not used in the classroom. Operant conditioning can be applied to so many different settings and areas that it is nearly impossible to list them all.

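Classroom programmes that depend on spoken instructions can be harder to apply with students who have a short attention span or who do not speak the language of instruction. Operant conditioning itself does not require language or a high IQ, as the animal experiments described above show.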

Rewards or punishments are most effective when they are delivered soon after the behaviour. This can limit the teacher’s ability to reward or punish a student outside of class time.

Operant conditioning can only work with behaviours that are under voluntary control. If a student is not able to control their behaviour, then this technique will not work.