# 9.4 Prisoner's Dilemma

## TRUST NO ONE

• The Prisoner's Dilemma is a classic example of a non-zero-sum game.
• The equilibrium in Prisoner's Dilemma is not the optimum solution.

The RAND Corporation, located in Santa Monica, California, is the original "think-tank." It was founded after World War II to be a center of national security and global policy ideas and analysis. Whereas today it advises many nations on a variety of issues, its initial focus was national defense. Game theory was one of the early pursuits of RAND thinkers, and in 1950 two RAND scientists, Merrill Flood and Melvin Dresher, framed what would become one of the most fascinating games of all time, the Prisoner's Dilemma.

The basic game is set up like this: imagine that you and your friend are caught robbing a bank. Upon being apprehended, you are immediately separated so that you have no chance to communicate, and each of you is taken to a separate cell for interrogation. If you and your buddy cooperate (C) with each other (that is, say nothing to the cops), each of you will get only a year in jail, known as the "reward" payoff, R.

If you both rat on each other, or "defect" (D), to use the game theorist's terminology, you will both get three years in prison, known as the "punishment" payoff, P.

If one of you cooperates and the other defects, the cooperator will get five years, known as the "sucker's" payoff, S, and the defector will get off with no jail time, known as the "temptation to defect" payoff, T.

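The matrix below concisely expresses the game as we have described it. Each cell lists your sentence first and your buddy's second, where T = 0, R = 1, P = 3, and S = 5.

| | Buddy cooperates (C) | Buddy defects (D) |
|---|---|---|
| You cooperate (C) | R, R (1, 1) | S, T (5, 0) |
| You defect (D) | T, S (0, 5) | P, P (3, 3) |

Note that T > R > P > S. (Here it helps to read the "is greater than" sign as "is better than," because the values really represent negatives: years spent in jail. Numerically, of course, 0 < 1 < 3 < 5.)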
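To make the payoffs concrete, here is a minimal Python sketch of the matrix. The names `T`, `R`, `P`, `S`, `PAYOFF`, and `better` are our own illustrative choices, and the ordering check assumes, as the text does, that fewer years in jail is better.

```python
# Payoffs in years of jail time, using the values from the text (fewer is better).
T, R, P, S = 0, 1, 3, 5

# PAYOFF[(your_move, buddy_move)] = (your_years, buddy_years); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def better(a, b):
    """Return True if a sentence of a years is better than one of b years."""
    return a < b


# T > R > P > S in the "is better than" sense is 0 < 1 < 3 < 5 in years.
ordering_holds = better(T, R) and better(R, P) and better(P, S)
print(ordering_holds)  # True
```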

First let's consider why this is not a zero-sum game. In a zero-sum game, one player's gain is exactly the other's loss, so the payoffs in every cell add up to zero (or, more generally, to the same constant). Looking at each cell here, we can tell that none of the payoffs for you and your buddy sum to zero, and the totals differ from cell to cell. Every outcome results in some net jail time for one or both of you, although some outcomes are more favorable than others. For instance, if both of you cooperate, the total time served by the two of you will be two years, which is as close to win-win as this situation can get (after all, you did just rob a bank). If both of you defect, the total jail time for the two of you will be six years, a lose-lose scenario that is a good deal worse than the best case. The other two scenarios result in a total of five years of jail time between the two of you. So, if you and your buddy could only agree to keep quiet, as a team you would be better off. The dilemma comes from the fact that neither of you has any individual incentive to keep such an agreement.
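The cell totals can be checked with a short Python sketch. It repeats the same illustrative payoff table (the names are ours, not standard) and tests whether every outcome has the same total, which is what a zero-sum or constant-sum game would require.

```python
T, R, P, S = 0, 1, 3, 5
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

# Total jail time served by the two of you in each outcome.
totals = {moves: sum(years) for moves, years in PAYOFF.items()}
for moves, total in totals.items():
    print(moves, total)

# In a zero-sum (or constant-sum) game every cell would have the same total.
is_constant_sum = len(set(totals.values())) == 1
print(is_constant_sum)  # False
```

The loop prints totals of 2, 5, 5, and 6 years, matching the figures in the paragraph above.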

You have no idea whether your buddy is going to cooperate. Even if you have discussed a situation like this with him beforehand, you cannot be sure that he won't betray you. As a rational being, you are going to make the decision that minimizes your potential downside, or your personal maximum penalty. If you choose to cooperate with your buddy, the maximum penalty you could receive is five years, and your best-case scenario is a one-year prison sentence. If you choose to defect, your maximum penalty is three years, and there is a chance that you could get off with no jail time. In fact, defecting is better for you whatever your buddy does: if he keeps quiet, defecting gets you zero years instead of one; if he talks, it gets you three years instead of five. Your rational buddy faces the same set of options and the same reasoning. So you will choose to defect, and so will your buddy, and these actions result in the lose-lose scenario.
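Both lines of reasoning, minimizing your worst case and comparing your options against each possible move by your buddy, can be sketched in Python. `YOUR_YEARS` is a hypothetical table holding only your own sentence for each pair of moves.

```python
T, R, P, S = 0, 1, 3, 5
# YOUR_YEARS[(your_move, buddy_move)] = your jail time.
YOUR_YEARS = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
MOVES = ("C", "D")

# Worst case (maximum penalty) for each of your choices.
worst_case = {m: max(YOUR_YEARS[(m, b)] for b in MOVES) for m in MOVES}
minimax_choice = min(MOVES, key=worst_case.get)  # minimize the maximum penalty

# Your best reply to each thing your buddy might do.
best_response = {b: min(MOVES, key=lambda m: YOUR_YEARS[(m, b)]) for b in MOVES}

print(worst_case)      # {'C': 5, 'D': 3}
print(minimax_choice)  # D
print(best_response)   # {'C': 'D', 'D': 'D'}
```

Both approaches point to defection: it has the smaller worst case, and it is the best reply whichever move your buddy makes.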

What is so interesting about the Prisoner's Dilemma is that it is an example in which the equilibrium solution is not the same as the optimal solution. The equilibrium solution, remember, is the state in which neither player has anything to gain by switching strategies as long as the other player also doesn't switch. The optimal solution is the scenario in which the greatest good, or utility, is realized. In the Prisoner's Dilemma the greatest good, on the whole, comes about when both players cooperate. This scenario is unstable, however, because each player, acting alone, can shorten his own sentence by switching to defect. On the other hand, if both players defect, neither has anything to gain by changing strategy if the other doesn't, so the defect-defect solution is stable. Game theorists would say that the defect strategy is strictly dominant over the cooperate strategy: it is better for you whatever the other player does. This holds whenever T > R and P > S, which the ordering T > R > P > S guarantees.
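A brute-force check makes the gap between equilibrium and optimum concrete. This sketch assumes, for illustration, that the "optimal" outcome is the one with the least total jail time; `is_equilibrium` is our own helper, not a library function.

```python
T, R, P, S = 0, 1, 3, 5
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}
MOVES = ("C", "D")


def is_equilibrium(a, b):
    """True if neither player can cut his own jail time by switching alone."""
    you, buddy = PAYOFF[(a, b)]
    you_can_improve = any(PAYOFF[(alt, b)][0] < you for alt in MOVES)
    buddy_can_improve = any(PAYOFF[(a, alt)][1] < buddy for alt in MOVES)
    return not (you_can_improve or buddy_can_improve)


equilibria = [cell for cell in PAYOFF if is_equilibrium(*cell)]
optimum = min(PAYOFF, key=lambda cell: sum(PAYOFF[cell]))  # least total jail time

print(equilibria)  # [('D', 'D')]
print(optimum)     # ('C', 'C')
```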

When versions of the Prisoner's Dilemma are posed to actual people, the results do not always match the mathematical predictions. Real people do not always act rationally, and even if they did, it is very rare that a game is ever played just once in real life. As an example, let's say that you decide to cooperate, but your buddy decides to defect. After you serve your sentence, your buddy offers to rob another bank with you to help you get back on your feet (with friends like this, who needs enemies?!). You agree, and both of you get caught again. This situation is not exactly like the first time you got caught, however, because now each of you has a reputation, a track record. Your buddy might realize that you have already cooperated once and that if you cooperate again, and he chooses to cooperate this time also, then both of you will be better off. On the other hand, you might have revenge on your mind and decide that because your buddy burned you the last time, you will retaliate this time. These kinds of considerations make the Iterated Prisoner's Dilemma more complicated than the one-shot version.

## DÉJÀ VU

• The Iterated Prisoner's Dilemma admits a wide variety of equilibrium outcomes, depending on the mix of strategies adopted by the players.
• In computer tournaments, strategies that are neither always generous nor always punitive tend to fare the best.

If the Prisoner's Dilemma is to be played over and over again, it is best that the number of rounds is not predetermined; otherwise, everyone should just defect, as in the one-round version. The reasoning goes like this: you should defect in the last round, because there is no chance for retaliation afterward. Knowing that your buddy will reason the same way, and so will defect in the last round no matter what, you should defect in the second-to-last round as well. This reasoning extends all the way back to the first move, so everyone should always defect. If, however, players play without knowing when the game will end, strategies other than "always defect" become viable and can even outperform it. One such alternative is the random strategy, in which a player randomly cooperates or defects, with no consideration given to what has happened in previous rounds. Another is retaliation: always do to your opponent what she did to you the last time.
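The unraveling argument can be written as a loop that solves the game from the last round backward. This is a sketch under strong assumptions: both players know the exact number of rounds, both reason identically, and play in the rounds already solved does not react to earlier moves (which is exactly what the argument establishes). `backward_induction` is a hypothetical helper.

```python
T, R, P, S = 0, 1, 3, 5
# YOUR_YEARS[(your_move, buddy_move)] = your jail time for one round.
YOUR_YEARS = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
MOVES = ("C", "D")


def backward_induction(n_rounds):
    """Plan a game of known length by solving it from the last round backward.

    Returns the planned moves (first round first) and the jail time they produce.
    """
    plan = []
    future_years = 0  # jail time from the rounds already solved (the later ones)
    for _ in range(n_rounds):  # each pass solves one more round, moving backward
        best = None
        for move in MOVES:
            other = "D" if move == "C" else "C"
            # Later play is already settled and will not react to this move, so
            # future_years is the same constant on both sides of the comparison.
            if all(YOUR_YEARS[(move, b)] + future_years
                   < YOUR_YEARS[(other, b)] + future_years for b in MOVES):
                best = move
        if best is None:
            raise ValueError("no strictly dominant move; the argument does not apply")
        plan.append(best)
        # Your buddy reasons the same way, so both of you play `best` this round.
        future_years += YOUR_YEARS[(best, best)]
    plan.reverse()
    return plan, future_years


print(backward_induction(5))  # (['D', 'D', 'D', 'D', 'D'], 15)
```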

There are many strategies, some clearly better than others and some whose merits are hard to judge. To put these strategies to the test, Robert Axelrod of the University of Michigan organized a computer tournament around 1980 in which different Iterated Prisoner's Dilemma strategies competed against each other over the course of many rounds. In our terms, the winning strategy was to be the one with the lowest accumulated jail time in the end.

One might suspect that "always-defecting" would still be the best strategy in such a tournament. If two players played five rounds of the always-defecting strategy, their individual scores at the end of five rounds would be 15 years. (In our score-keeping, lower scores are better.)

Results of "Pure Defect" vs. "Pure Defect"

P1 STRATEGY DDDDD
P2 STRATEGY DDDDD

P1 PAYOFF = PPPPP = 3 + 3 + 3 + 3 + 3 = 15 years
P2 PAYOFF = PPPPP = 3 + 3 + 3 + 3 + 3 = 15 years

On the other hand, if two players who were "always-cooperating" played each other for five rounds, each player's accumulated score would be five years.

Results of "Pure Cooperate" vs. "Pure Cooperate"

P1 STRATEGY CCCCC
P2 STRATEGY CCCCC

P1 PAYOFF = RRRRR = 1 + 1 + 1 + 1 + 1 = 5 years
P2 PAYOFF = RRRRR = 1 + 1 + 1 + 1 + 1 = 5 years

So, there is clearly something to be gained by not defecting all the time if you can get into a repeated mutual cooperation situation, such as that shown above. The question is: how can you get into such a situation, especially when a "Pure Defect" strategy will dominate a "Pure Cooperate" strategy?

Results of "Pure Defect" vs. "Pure Cooperate"

P1 STRATEGY DDDDD
P2 STRATEGY CCCCC

P1 PAYOFF = TTTTT = 0 + 0 + 0 + 0 + 0 = 0 years
P2 PAYOFF = SSSSS = 5 + 5 + 5 + 5 + 5 = 25 years
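The three five-round matches above can be reproduced with a small simulator. In this sketch, a strategy is any function that receives its own past moves and its opponent's past moves and returns "C" or "D"; that interface, and the names `play_match`, `always_defect`, and `always_cooperate`, are illustrative assumptions.

```python
T, R, P, S = 0, 1, 3, 5
YEARS = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


def always_defect(my_history, their_history):
    return "D"


def always_cooperate(my_history, their_history):
    return "C"


def play_match(strategy1, strategy2, rounds=5):
    """Play repeated rounds; return both move strings and both total jail times."""
    moves1, moves2 = [], []
    years1 = years2 = 0
    for _ in range(rounds):
        # Both moves are chosen before either is recorded (simultaneous play).
        m1 = strategy1(moves1, moves2)
        m2 = strategy2(moves2, moves1)
        y1, y2 = YEARS[(m1, m2)]
        years1 += y1
        years2 += y2
        moves1.append(m1)
        moves2.append(m2)
    return "".join(moves1), "".join(moves2), years1, years2


print(play_match(always_defect, always_defect))        # ('DDDDD', 'DDDDD', 15, 15)
print(play_match(always_cooperate, always_cooperate))  # ('CCCCC', 'CCCCC', 5, 5)
print(play_match(always_defect, always_cooperate))     # ('DDDDD', 'CCCCC', 0, 25)
```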

Analysis of the strategies that fared best in Axelrod's tournament indeed provided answers to this question. Some were very complicated, based on analyzing specific sequences of prior moves to prescribe the next sequence of moves. Others were very simple, such as the Tit-For-Tat (TFT) strategy. As its name implies, TFT relies simply on doing to your opponent what he last did to you. So, if your opponent cooperates on the first turn, then you should cooperate on the second turn. This can lead to the nice "always-cooperating" cycle if two TFT players start off cooperating, while protecting the player from getting too many sucker's payoffs. However, TFT can also lead to the "always-defecting" situation, if two TFT players start off by defecting.
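Tit-For-Tat fits the same strategy-as-function interface. TFT is usually defined to open with cooperation; the `first_move` parameter below is our own addition so that both scenarios just described can be shown, and the other names are likewise hypothetical.

```python
T, R, P, S = 0, 1, 3, 5
YEARS = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


def make_tit_for_tat(first_move="C"):
    """TFT: open with first_move, then copy the opponent's previous move."""
    def tit_for_tat(my_history, their_history):
        return their_history[-1] if their_history else first_move
    return tit_for_tat


def always_defect(my_history, their_history):
    return "D"


def play_match(strategy1, strategy2, rounds=5):
    """Return both move strings and both total jail times."""
    moves1, moves2 = [], []
    years1 = years2 = 0
    for _ in range(rounds):
        m1, m2 = strategy1(moves1, moves2), strategy2(moves2, moves1)
        y1, y2 = YEARS[(m1, m2)]
        years1, years2 = years1 + y1, years2 + y2
        moves1.append(m1)
        moves2.append(m2)
    return "".join(moves1), "".join(moves2), years1, years2


print(play_match(make_tit_for_tat("C"), make_tit_for_tat("C")))  # ('CCCCC', 'CCCCC', 5, 5)
print(play_match(make_tit_for_tat("D"), make_tit_for_tat("D")))  # ('DDDDD', 'DDDDD', 15, 15)
print(play_match(make_tit_for_tat("C"), always_defect))          # ('CDDDD', 'DDDDD', 17, 12)
```

The last match shows the protective side of TFT: against a constant defector it takes the sucker's payoff only once.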

The best strategies were variants of Tit-For-Tat with Forgiveness (TFTWF). This strategy is basically the same as regular TFT except that some small percentage of the time, you forgive your opponent's prior defection and do not mimic it. This provides a mechanism for breaking the "always-defecting" trap.

Results of "Tit-For-Tat-with-Forgiveness" vs. "Tit-For-Tat":

P1 (TFTWF) DDCCC
P2 (TFT) DDDCC

P1 PAYOFF = PPSRR = 3 + 3 + 5 + 1 + 1 = 13 years
P2 PAYOFF = PPTRR = 3 + 3 + 0 + 1 + 1 = 8 years
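The payoff arithmetic for this match can be checked mechanically by mapping each pair of moves to its payoff letter. The helper `score` is a hypothetical name, and it assumes both move strings have the same length.

```python
T, R, P, S = 0, 1, 3, 5


def score(p1_moves, p2_moves):
    """Return (payoff letters, total years) for each player, given two move strings."""
    # Letter for the player whose move comes first in the key.
    letter = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}
    value = {"T": T, "R": R, "P": P, "S": S}
    letters1 = "".join(letter[(a, b)] for a, b in zip(p1_moves, p2_moves))
    letters2 = "".join(letter[(b, a)] for a, b in zip(p1_moves, p2_moves))
    return ((letters1, sum(value[c] for c in letters1)),
            (letters2, sum(value[c] for c in letters2)))


print(score("DDCCC", "DDDCC"))  # (('PPSRR', 13), ('PPTRR', 8))
```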

Note that in this particular match, TFTWF loses to TFT. Remember, however, that the tournament consists of many matches against many different strategies. TFT can get stuck in "always-defecting" cycles that it never breaks on its own, whereas TFTWF eventually escapes them, which gives it an advantage over TFT in the long run.
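A quick simulation illustrates the escape. In this sketch, TFTWF answers a defection with cooperation with some probability (`forgiveness`, set to 10% here purely for illustration) and otherwise behaves like TFT; both players open by defecting, as in the match above. All names are hypothetical, and the random seed is fixed only so that runs are repeatable.

```python
import random


def make_tit_for_tat(first_move="C"):
    """TFT: open with first_move, then copy the opponent's previous move."""
    def tit_for_tat(my_history, their_history, rng):
        return their_history[-1] if their_history else first_move
    return tit_for_tat


def make_tftwf(forgiveness=0.1, first_move="C"):
    """TFT, except that a defection is answered with C with probability `forgiveness`."""
    def tftwf(my_history, their_history, rng):
        if not their_history:
            return first_move
        if their_history[-1] == "D" and rng.random() < forgiveness:
            return "C"  # forgive instead of retaliating
        return their_history[-1]
    return tftwf


def play_moves(strategy1, strategy2, rounds, seed=0):
    """Return both players' move strings; a seeded RNG keeps runs repeatable."""
    rng = random.Random(seed)
    moves1, moves2 = [], []
    for _ in range(rounds):
        m1 = strategy1(moves1, moves2, rng)
        m2 = strategy2(moves2, moves1, rng)
        moves1.append(m1)
        moves2.append(m2)
    return "".join(moves1), "".join(moves2)


# Both players open by defecting.
tft_pair = play_moves(make_tit_for_tat("D"), make_tit_for_tat("D"), rounds=1000)
mixed_pair = play_moves(make_tftwf(0.1, "D"), make_tit_for_tat("D"), rounds=1000)
print(tft_pair[0][-5:], tft_pair[1][-5:])      # DDDDD DDDDD
print(mixed_pair[0][-5:], mixed_pair[1][-5:])  # CCCCC CCCCC
```

Tracing the rules shows why this works: once TFTWF forgives, the pair falls into an alternating pattern, and a second act of forgiveness lands them in mutual cooperation, which neither rule ever leaves. Pure TFT, by contrast, never breaks the cycle.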

Most of the successful strategies in Axelrod's tournament were based on some amount of altruistic behavior. It was a stunning mathematical indication that aggression and vindictiveness do not always prevail. It seems that a truly selfish strategy, in the sense that it is designed to maximize one's own benefit, must include some element of forgiveness. In fact, Axelrod found that successful strategies had four common traits, which he described anthropomorphically in this way:

• First, the strategy should be "nice." This means that it will not defect unless its opponent defects first.
• Second, the strategy should retaliate against defectors to avoid being exploited by "always defectors."
• Third, the strategy should be forgiving. After retaliating against a defection, it should begin to cooperate again as soon as its opponent cooperates.
• Fourth, the strategy should not try to score more than its opponent; it should be non-envious. This is because the strength of cooperation lies in the fact that both parties benefit from it equally.
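A toy round-robin shows these traits at work. The field below (always defect, always cooperate, and a "nice" TFT that opens with C) is our own tiny example, not a reconstruction of Axelrod's actual entries; each strategy also plays a copy of itself, and lower totals are better. All function names are hypothetical.

```python
T, R, P, S = 0, 1, 3, 5
YEARS = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"  # "nice": never defects first


def match_years(s1, s2, rounds):
    """Total jail time for each side over one match."""
    moves1, moves2 = [], []
    years1 = years2 = 0
    for _ in range(rounds):
        m1, m2 = s1(moves1, moves2), s2(moves2, moves1)
        y1, y2 = YEARS[(m1, m2)]
        years1, years2 = years1 + y1, years2 + y2
        moves1.append(m1)
        moves2.append(m2)
    return years1, years2


def round_robin(strategies, rounds=10):
    """Each strategy meets every other once plus a copy of itself; lower totals win."""
    names = list(strategies)
    totals = dict.fromkeys(names, 0)
    for i, a in enumerate(names):
        for b in names[i:]:  # names[i:] starts with a itself: the self-play match
            ya, yb = match_years(strategies[a], strategies[b], rounds)
            totals[a] += ya
            if b != a:
                totals[b] += yb  # a copy of a strategy only counts once
    return totals


FIELD = {
    "always_defect": always_defect,
    "always_cooperate": always_cooperate,
    "tit_for_tat": tit_for_tat,
}
print(round_robin(FIELD))  # {'always_defect': 57, 'always_cooperate': 70, 'tit_for_tat': 52}
```

With 10 rounds per match, TFT finishes with the lowest total even though it never serves less time than its opponent in any single match, which is the non-envious trait in action. Results in a tournament like this depend heavily on which strategies are entered.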

These traits are fascinating if not heartwarming, showing us that cooperation and altruism really do have a place in a world as starkly defined as that of the Iterated Prisoner's Dilemma. This suggests that studying games can help us to understand some of the behavioral aspects of our natural world, such as why certain types of animals live in cooperative societies and others live as solitary aggressors. This world of conflicting living strategies is characterized by the game called the "Hawks and Doves," and it is to this game that we will next turn our attention.