The Evolution of Cooperation. Robert Axelrod, Professor of Political Science and Public Policy, University of Michigan, Ann Arbor. Basic Books, Inc., Publishers, New York.
Five mechanisms for the evolution of cooperation are kin selection, direct reciprocity, indirect reciprocity, spatial selection, and group selection. The evolution of cooperation can refer to the study of how cooperation can emerge and persist. See Axelrod, Robert; Hamilton, William D. (27 March), "The Evolution of Cooperation", Science.
Copyright by the American Association for the Advancement of Science.
Should a friend keep providing favors to another friend who never reciprocates?
Should a business provide prompt service to another business that is about to be bankrupt? The interests of the players are not in total conflict. Both players can do well by getting the reward, R, for mutual cooperation or both can do poorly by getting the punishment, P, for mutual defection. Using the assumption that the other player will always make the move you fear most will lead you to expect that the other will never cooperate, which in turn will lead you to defect, causing unending punishment.
So unlike chess, in the Prisoner's Dilemma it is not safe to assume that the other player is out to get you. In fact, in the Prisoner's Dilemma, the strategy that works best depends directly on what strategy the other player is using and, in particular, on whether this strategy leaves room for the development of mutual cooperation.
This principle is based on the weight of the next move relative to the current move being sufficiently large to make the future important. In other words, the discount parameter, w, must be large enough to make the future loom large in the calculation of total payoffs. After all, if you are unlikely to meet the other person again, or if you care little about future payoffs, then you might as well defect now and not worry about the consequences for the future.
This leads to the first formal proposition. It is the sad news that if the future is important, there is no one best strategy. Proposition 1. If the discount parameter, w, is sufficiently high, there is no best strategy independent of the strategy used by the other player. The proof itself is not hard. Suppose that the other player is using ALL D, the strategy of always defecting.
If the other player will never cooperate, the best you can do is always to defect yourself. Now suppose, on the other hand, that the other player is using a strategy of "permanent retaliation," that is, cooperating until you defect and then always defecting after that. In that case, your best strategy is never to defect, provided that the temptation to defect on the first move will eventually be more than compensated for by the long-term disadvantage of getting nothing but the punishment, P, rather than the reward, R, on future moves.
This will be true whenever the discount parameter, w, is sufficiently great. Therefore, if w is sufficiently large, there is no one best strategy. In the case of a legislature such as the U.S. Senate, this proposition says that if there is a large enough chance that a member of the legislature will interact again with another member, there is no one best strategy to use independently of the strategy being used by the other person. It would be best to cooperate with someone who will reciprocate that cooperation in the future, but not with someone whose future behavior will not be very much affected by this interaction (see, for example, Hinckley). The very possibility of achieving stable mutual cooperation depends upon there being a good chance of a continuing interaction, as measured by the magnitude of w.
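The two cases in the proof of Proposition 1 can be checked with a little arithmetic. The sketch below is only an illustration, not part of the original argument. It assumes the payoff values T=5, R=3, P=1, S=0 and assumes that a payoff k moves in the future is weighted by w to the power k, so that a constant payoff x on every move is worth x / (1 - w) in total. The function names are hypothetical.

```python
# Assumed illustrative payoffs: temptation, reward, punishment, sucker's payoff.
T, R, P, S = 5, 3, 1, 0


def never_defect_value(w):
    """Total discounted payoff for always cooperating against permanent
    retaliation: R + w*R + w^2*R + ... = R / (1 - w)."""
    return R / (1 - w)


def defect_at_once_value(w):
    """Total discounted payoff for defecting from the first move against
    permanent retaliation: T now, then P on every later move."""
    return T + w * P / (1 - w)


def all_d_best_reply_gap(w):
    """Against ALL D, always defecting earns P per move and always
    cooperating earns S per move; the gap is positive for every w."""
    return P / (1 - w) - S / (1 - w)


def cooperation_threshold():
    """Smallest w for which never defecting is at least as good as
    defecting against permanent retaliation: (T - R) / (T - P)."""
    return (T - R) / (T - P)


if __name__ == "__main__":
    print("threshold w:", cooperation_threshold())
    for w in (0.2, 0.9):
        print(w, never_defect_value(w), defect_at_once_value(w))
```

Under these assumed payoffs the threshold is w = 0.5. Above it, never defecting is the better reply to permanent retaliation, while always defecting remains the better reply to ALL D, so no single strategy is best against both.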
As it happens, in the case of Congress, the chance of two members having a continuing interaction has increased dramatically as the biennial turnover rates have fallen from about 40 percent in the first forty years of the republic to about 20 percent or less in recent years (Young).
However, saying that a continuing chance of interaction is necessary for the development of cooperation is not the same as saying that it is sufficient. The demonstration that there is not a single best strategy leaves open the question of what patterns of behavior can be expected to emerge when there actually is a sufficiently high probability of continuing interaction between two individuals.
Before going on to study the behavior that can be expected to emerge, it is a good idea to take a closer look at which features of reality the Prisoner's Dilemma framework is, and is not, able to encompass. Fortunately, the very simplicity of the framework makes it possible to avoid many restrictive assumptions that would otherwise limit the analysis: The payoffs of the players need not be comparable at all.
For example, a journalist might get rewarded with another inside story, while the cooperating bureaucrat might be rewarded with a chance to have a policy argument presented in a favorable light. The payoffs certainly do not have to be symmetric.
It is a convenience to think of the interaction as exactly equivalent from the perspective of the two players, but this is not necessary. One does not have to assume, for example, that the reward for mutual cooperation, or any of the other three payoff parameters, have the same magnitude for both players. As mentioned earlier, one does not even have to assume that they are measured in comparable units.
The only thing that has to be assumed is that, for each player, the four payoffs are ordered as required for the definition of the Prisoner's Dilemma. The payoffs of a player do not have to be measured on an absolute scale. They need only be measured relative to each other.
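The points above about asymmetric and relative payoffs can be illustrated with a short check. This is a sketch under stated assumptions: the excerpt does not spell out the required ordering, so the code uses the standard definition of the Prisoner's Dilemma (T > R > P > S, together with R greater than the average of T and S). The function name and the second player's payoff numbers are hypothetical.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check one player's four payoffs against the standard ordering:
    temptation > reward > punishment > sucker's payoff, and mutual
    cooperation beats taking turns exploiting each other."""
    return T > R > P > S and 2 * R > T + S


# Each player's payoffs are checked separately, so the two players may use
# different magnitudes or even different units.
player_one = dict(T=5, R=3, P=1, S=0)
player_two = dict(T=50, R=40, P=10, S=-20)  # hypothetical values

# Only relative size matters: rescaling and shifting all four payoffs of a
# player (with a positive scale factor) leaves the ordering intact.
rescaled_one = {name: 10 * value + 7 for name, value in player_one.items()}

if __name__ == "__main__":
    for payoffs in (player_one, player_two, rescaled_one):
        print(payoffs, is_prisoners_dilemma(**payoffs))
```

A set of payoffs such as T=5, R=2, P=1, S=0 fails the check, because alternating exploitation would then pay as much as mutual cooperation.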
Cooperation need not be considered desirable from the point of view of the rest of the world. There are times when one wants to retard, rather than foster, cooperation between players. Collusive business practices are good for the businesses involved but not so good for the rest of society. In fact, most forms of corruption are welcome instances of cooperation for the participants but are unwelcome to everyone else. So, on occasion, the theory will be used in reverse to show how to prevent, rather than to promote, cooperation.
There is no need to assume that the players are rational. They need not be trying to maximize their rewards. Their strategies may simply reflect standard operating procedures, rules of thumb, instincts, habits, or imitation (Simon; Cyert and March). The actions that players take are not necessarily even conscious choices.
A person who sometimes returns a favor, and sometimes does not, may not think about what strategy is being used. There is no need to assume deliberate choice at all. Nations certainly take actions which can be interpreted as choices in a Prisoner's Dilemma—as in the raising or lowering of tariffs. It is not necessary to assume that such actions are rational or are the outcome of a unified actor pursuing a single goal.
On the contrary, they might well be the result of an incredibly complex bureaucratic politics involving complicated information processing and shifting political coalitions (Allison). Likewise, at the other extreme, an organism does not need a brain to play a game.
Bacteria, for example, are highly responsive to selected aspects of their chemical environment. They can therefore respond differentially to what other organisms are doing, and these conditional strategies of behavior can be inherited. Moreover, the behavior of a bacterium can affect the fitness of other organisms around it, just as the behavior of other organisms can affect the fitness of a bacterium.
But biological applications are best saved for chapter 5. For now the main interest will be in people and organizations. Therefore, it is good to know that for the sake of generality, it is not necessary to assume very much about how deliberate and insightful people are.
Nor is it necessary to assume, as the sociobiologists do, that important aspects of human behavior are guided by one's genes. The approach here is strategic rather than genetic. Of course, the abstract formulation of the problem of cooperation as a Prisoner's Dilemma puts aside many vital features that make any actual interaction unique. Examples of what is left out by this formal abstraction include the possibility of verbal communication, the direct influence of third parties, the problems of implementing a choice, and the uncertainty about what the other player actually did on the preceding move.
In chapter 8 some of these complicating factors are added to the basic model. It is clear that the list of potentially relevant factors that have been left out could be extended almost indefinitely. Certainly, no intelligent person should make an important choice without trying to take such complicating factors into account. The value of an analysis without them is that it can help to clarify some of the subtle features of the interaction—features which might otherwise be lost in the maze of complexities of the highly particular circumstances in which choice must actually be made.
It is the very complexity of reality which makes the analysis of an abstract interaction so helpful as an aid to understanding. The next chapter explores the emergence of cooperation through a study of what is a good strategy to employ if confronted with an iterated Prisoner's Dilemma.
This exploration has been done in a novel way, with a computer tournament. Professional game theorists were invited to submit their favorite strategy, and each of these decision rules was paired off with each of the others to see which would do best overall. Amazingly enough, the winner was the simplest of all strategies submitted. A second round of the tournament was conducted in which many more entries were submitted by amateurs and professionals alike, all of whom were aware of the results of the first round.
The analysis of the data from these tournaments reveals four properties which tend to make a decision rule successful. These results from the tournaments demonstrate that under suitable conditions, cooperation can indeed emerge in a world of egoists without central authority.
To see just how widely these results apply, a theoretical approach is taken in chapter 3. A series of propositions are proved that not only demonstrate the requirements for the emergence of cooperation but also provide the chronological story of the evolution of cooperation.
Here is the argument in a nutshell. The evolution of cooperation requires that individuals have a sufficiently large chance to meet again so that they have a stake in their future interaction. If this is true, cooperation can evolve in three stages. The beginning of the story is that cooperation can get started even in a world of unconditional defection.
The development cannot take place if it is tried only by scattered individuals who have virtually no chance to interact with each other. However, cooperation can evolve from small clusters of individuals who base their cooperation on reciprocity and have even a small proportion of their interactions with each other. The middle of the story is that a strategy based on reciprocity can thrive in a world where many different kinds of strategies are being tried.
The end of the story is that cooperation, once established on the basis of reciprocity, can protect itself from invasion by less cooperative strategies. Thus, the gear wheels of social evolution have a ratchet.
Chapters 4 and 5 take concrete settings to demonstrate just how widely these results apply. Chapter 4 is devoted to the fascinating case of the "live and let live" system which emerged during the trench warfare of World War I.
In the midst of this bitter conflict, the front-line soldiers often refrained from shooting to kill—provided their restraint was reciprocated by the soldiers on the other side. What made this mutual restraint possible was the static nature of trench warfare, where the same small units faced each other for extended periods of time. The soldiers of these opposing small units actually violated orders from their own high commands in order to achieve tacit cooperation with each other. A detailed look at this case shows that when the conditions are present for the emergence of cooperation, cooperation can get started and prove stable in situations which otherwise appear extraordinarily unpromising.
In particular, the "live and let live" system demonstrates that friendship is hardly necessary for the development of cooperation.
Under suitable conditions, cooperation based upon reciprocity can develop even between antagonists. Chapter 5, written with evolutionary biologist William D. Hamilton, demonstrates that cooperation can emerge even without foresight. This is done by showing that Cooperation Theory can account for the patterns of behavior found in a wide range of biological systems, from bacteria to birds.
Cooperation in biological systems can occur even when the participants are not related, and even when they are unable to appreciate the consequences of their own behavior. What makes this possible are the evolutionary mechanisms of genetics and survival of the fittest. An individual able to achieve a beneficial response from another is more likely to have offspring that survive and that continue the pattern of behavior which elicited beneficial responses from others.
Thus, under suitable conditions, cooperation based upon reciprocity proves stable in the biological world. Potential applications are spelled out for specific aspects of territoriality, mating, and disease. The conclusion is that Darwin's emphasis on individual advantage can, in fact, account for the presence of cooperation between individuals of the same or even different species. As long as the proper conditions are present, cooperation can get started, thrive, and prove stable.
While foresight is not necessary for the evolution of cooperation, it can certainly be helpful. Therefore chapters 6 and 7 are devoted to offering advice to participants and reformers, respectively. Chapter 6 spells out the implications of Cooperation Theory for anyone who is in a Prisoner's Dilemma. From the participant's point of view, the object is to do as well as possible, regardless of how well the other player does.
Based upon the tournament results and the formal propositions, four simple suggestions are offered for individual choice. Understanding the perspective of a participant can also serve as the foundation for seeing what can be done to make it easier for cooperation to develop among egoists.
Thus, chapter 7 takes the Olympian perspective of a reformer who wants to alter the very terms of the interactions so as to promote the emergence of cooperation. A wide variety of methods are considered, such as making the interactions between the players more durable and frequent, teaching the participants to care about each other, and teaching them to understand the value of reciprocity.
This reformer's perspective provides insights into a wide variety of topics, from the strength of bureaucracy to the difficulties of Gypsies, and from the morality of TIT FOR TAT to the art of writing treaties. Chapter 8 extends the implications of Cooperation Theory into new domains.
It shows how different kinds of social structure affect the way cooperation can develop. For example, people often relate to each other in ways that are influenced by observable features, such as sex, age, skin color, and style of dress.
These cues can lead to social structures based on stereotyping and status hierarchies. As another example of social structure, the role of reputation is considered. The struggle to establish and maintain one's reputation can be a major feature of intense conflicts.
For example, the American government's escalation of the war in Vietnam was mainly due to its desire to deter other challenges to its interests by maintaining its reputation on the world stage. This chapter also considers a government's concern for maintaining its reputation with its own citizens. To be effective, a government cannot enforce any standards it chooses but must elicit compliance from a majority of the governed.
To do this requires setting the rules so that most of the governed find it profitable to obey most of the time. The implications of this approach are fundamental to the operation of authority, and are illustrated by the regulation of industrial pollution and the supervision of divorce settlements. By the final chapter, the discussion has developed from the study of the emergence of cooperation among egoists without central authority to an analysis of what happens when people actually do care about each other and what happens when there is central authority.
But the basic approach is always the same. This approach allows more than the understanding of the perspective of a single player. It also provides an appreciation of what it takes to promote the stability of mutual cooperation in a given setting. The most promising finding is that if the facts of Cooperation Theory are known by participants with foresight, the evolution of cooperation can be speeded up.
However, the proposition of the previous chapter demonstrates that there is no one best strategy to use.
What is best depends in part on what the other player is likely to be doing. Further, what the other is likely to be doing may well depend on what the player expects you to do.
To get out of this tangle, help can be sought by combing the research already done concerning the Prisoner's Dilemma for useful advice. Fortunately, a great deal of research has been done in this area. Psychologists using experimental subjects have found that, in the iterated Prisoner's Dilemma, the amount of cooperation attained, and the specific pattern for attaining it, depend on a wide variety of factors relating to the context of the game, the attributes of the individual players, and the relationship between the players.
Since behavior in the game reflects so many important factors about people, it has become a standard way to explore questions in social psychology, from the effects of westernization in Central Africa (Bethlehem) to the existence or nonexistence of aggression in career-oriented women (Baefsky and Berger), and to the differential consequences of abstract versus concrete thinking styles (Nydegger). In the last fifteen years, there have been hundreds of articles on the Prisoner's Dilemma cited in Psychological Abstracts.
The iterated Prisoner's Dilemma has become the E. coli of social psychology. Just as important as its use as an experimental test bed is the use of the Prisoner's Dilemma as the conceptual foundation for models of important social processes. Richardson's model of the arms race is based on an interaction which is essentially a Prisoner's Dilemma, played once a year with the budgets of the competing nations (Richardson; Zinnes).
Oligopolistic competition can also be modeled as a Prisoner's Dilemma (Samuelson). The ubiquitous problems of collective action to produce a collective good are analyzable as Prisoner's Dilemmas with many players (G. Hardin). Even vote trading has been modeled as a Prisoner's Dilemma (Riker and Brams). In fact, many of the best-developed models of important political, social, and economic processes have the Prisoner's Dilemma as their foundation. There is yet a third literature about the Prisoner's Dilemma. This literature goes beyond the empirical questions of the laboratory or the real world, and instead uses the abstract game to analyze the features of some fundamental strategic issues, such as the meaning of rationality (Luce and Raiffa), choices which affect other people (Schelling), and cooperation without enforcement (Taylor). Unfortunately, none of these three literatures on the Prisoner's Dilemma reveals very much about how to play the game well.
The experimental literature is not much help, because virtually all of it is based on analyzing the choices made by players who are seeing the formal game for the first time. Their appreciation of the strategic subtleties is bound to be restricted. Although the experimental subjects may have plenty of experience with everyday occurrences of the Prisoner's Dilemma, their ability to call on this experience in a formal setting may be limited.
The choices of experienced economic and political elites in natural settings are studied in some of the applied literature of Prisoner's Dilemma, but the evidence is of limited help because of the relatively slow pace of most high-level interactions and the difficulty of controlling for changing circumstances. All together, no more than a few dozen choices have been identified and analyzed this way. Finally, the abstract literature of strategic interaction usually studies variants of the iterated Prisoner's Dilemma designed to eliminate the dilemma itself by introducing changes in the game, such as allowing interdependent choices (Howard; Rapoport), or putting a tax on defection (Tideman and Tullock; Clarke). To learn more about how to choose effectively in an iterated Prisoner's Dilemma, a new approach is needed.
Such an approach would have to draw on people who have a rich understanding of the strategic possibilities inherent in a non-zero-sum setting, a situation in which the interests of the participants partially coincide and partially conflict.
Two important facts about non-zero-sum settings would have to be taken into account. First, the proposition of the previous chapter demonstrates that what is effective depends not only upon the characteristics of a particular strategy, but also upon the nature of the other strategies with which it must interact. The second point follows directly from the first. An effective strategy must be able at any point to take into account the history of the interaction as it has developed so far.
A computer tournament for the study of effective choice in the iterated Prisoner's Dilemma meets these needs. In a computer tournament, each entrant writes a program that embodies a rule to select the cooperative or noncooperative choice on each move. The program has available to it the history of the game so far, and may use this history in making a choice.
If the participants are recruited primarily from those who are familiar with the Prisoner's Dilemma, the entrants can be assured that their decision rule will be facing rules of other informed entrants. Such recruitment would also guarantee that the state of the art is represented in the tournament.
Wanting to find out what would happen, I invited professional game theorists to send in entries to just such a computer tournament. It was structured as a round robin, meaning that each entry was paired with each other entry. As announced in the rules of the tournament, each entry was also paired with its own twin and with RANDOM, a program that randomly cooperates and defects with equal probability. Each game consisted of exactly two hundred moves.
The payoff matrix awarded both players 3 points for mutual cooperation, and 1 point for mutual defection. If one player defected while the other player cooperated, the defecting player received 5 points and the cooperating player received 0 points. No entry was disqualified for exceeding the allotted time. In fact, the entire round robin tournament was run five times to get a more stable estimate of the scores for each pair of players.
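The tournament format described above can be sketched in a few lines. This is a minimal illustration, not the original tournament code: it assumes only the stated payoffs (3, 1, 5, 0), the 200-move game length, and the round robin pairing that includes each rule's own twin. The rule implementations, function names, and the seeded random generator are hypothetical choices made for the example.

```python
import random
from itertools import combinations_with_replacement

# Points for (my move, other's move) as (my points, other's points).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return "C" if not other_history else other_history[-1]


def all_d(own_history, other_history):
    """Always defect."""
    return "D"


def make_random_rule(seed):
    """RANDOM: cooperate or defect with equal probability (seeded here)."""
    rng = random.Random(seed)

    def random_rule(own_history, other_history):
        return rng.choice("CD")

    return random_rule


def play_game(rule_a, rule_b, moves=200):
    """Play one game; each rule sees the full history of the game so far."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        a = rule_a(hist_a, hist_b)
        b = rule_b(hist_b, hist_a)
        points_a, points_b = PAYOFF[(a, b)]
        score_a += points_a
        score_b += points_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(rules, moves=200):
    """Pair every rule with every other rule and with its own twin,
    returning each rule's average score per game."""
    totals = {name: 0 for name in rules}
    games = {name: 0 for name in rules}
    for name_a, name_b in combinations_with_replacement(list(rules), 2):
        score_a, score_b = play_game(rules[name_a], rules[name_b], moves)
        totals[name_a] += score_a
        games[name_a] += 1
        if name_a != name_b:  # a twin game counts once for the rule
            totals[name_b] += score_b
            games[name_b] += 1
    return {name: totals[name] / games[name] for name in rules}


if __name__ == "__main__":
    rules = {"TIT FOR TAT": tit_for_tat, "ALL D": all_d,
             "RANDOM": make_random_rule(seed=0)}
    print(round_robin(rules))
```

With only these three illustrative rules the ranking says little about the real tournament, whose outcome depended on the full set of fourteen entries plus RANDOM.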
In all, there were 120,000 moves, making for 240,000 separate choices. The fourteen submitted entries came from five disciplines. Appendix A lists the names and affiliations of the people who submitted these entries, and it gives the rank and score of their entries. One remarkable aspect of the tournament was that it allowed people from different disciplines to interact with each other in a common format and language. Most of the entrants were recruited from those who had published articles on game theory in general or the Prisoner's Dilemma in particular.
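The winner was TIT FOR TAT, the rule of cooperating on the first move and then doing whatever the other player did on the previous move. This was the simplest of all submitted programs and it turned out to be the best! This decision rule is probably the most widely known and most discussed rule for playing the Prisoner's Dilemma.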
It is easily understood and easily programmed. It is known to elicit a good degree of cooperation when played with humans (Oskamp; W. Wilson). As an entry in a computer tournament, it has the desirable properties that it is not very exploitable and that it does well with its own twin. It has the disadvantage that it is too generous with the RANDOM rule, which was known by the participants to be entered in the tournament. All of these facts were known to most of the people designing programs for the Computer Prisoner's Dilemma Tournament, because they were sent copies of a description of the preliminary tournament.
This result contrasts with computer chess tournaments, where complexity is obviously needed. But his modification, like the others, just lowered the performance of the decision rule. Analysis of the results showed that neither the discipline of the author, the brevity of the program, nor its length accounts for a rule's relative success. What does? Before answering this question, a remark on the interpretation of numerical scores is in order.
In a game of 200 moves, a useful benchmark for very good performance is 600 points, which is equivalent to the score attained by a player when both sides always cooperate with each other. A useful benchmark for very poor performance is 200 points, which is equivalent to the score attained by a player when both sides never cooperate with each other.
Most scores range between 200 and 600 points, although scores from 0 to 1000 points are possible. Surprisingly, there is a single property which distinguishes the relatively high-scoring entries from the relatively low-scoring entries.
This is the property of being nice, which is to say never being the first to defect. For the sake of analyzing this tournament, the definition of a nice rule will be relaxed to include rules which will not be the first to defect before the last few moves. Each of the eight top-ranking entries or rules is nice.
None of the other entries is. There is even a substantial gap in the score between the nice entries and the others: every nice entry received a higher tournament average than the best of the entries that were not nice. Thus, not being the first to defect, at least until virtually the end of the game, was a property which, all by itself, separated the more successful rules from the less successful rules in this Computer Prisoner's Dilemma Tournament.
Each of the nice rules got about 600 points with each of the other seven nice rules and with its own twin. This is because when two nice rules play, they are sure to cooperate with each other until virtually the end of the game. Actually the minor variations in end-game tactics did not account for much variation in the scores. Since the nice rules all got within a few points of 600 with each other, the thing that distinguished the relative rankings among the nice rules was their scores with the rules which are not nice.
This much is obvious. What is not obvious is that the relative ranking of the eight top rules was largely determined by just two of the other seven rules. These two rules are kingmakers because they do not do very well for themselves, but they largely determine the rankings among the top contenders.
The most important kingmaker was based on an "outcome maximization" principle originally developed as a possible interpretation of what human subjects do in the Prisoner's Dilemma laboratory experiments (Downing). It is well worth studying as an example of a decision rule which is based upon a quite sophisticated idea.
This rule, called DOWNING, is based on a deliberate attempt to understand the other player and then to make the choice that will yield the best long-term score based upon this understanding.
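For each move, it updates its estimates of two conditional probabilities, the probability that the other player cooperates after it cooperates and the probability that the other player cooperates after it defects, and then selects the choice which will maximize its own long-term payoff under the assumption that it has correctly modeled the other player.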
This is a fairly sophisticated decision rule, but its implementation does have one flaw: it starts from initial assumptions that treat the other player as unresponsive. The nice rules did well in the tournament largely because they did so well with each other, and because there were enough of them to raise substantially each other's average score. As long as the other player did not defect, each of the nice rules was certain to continue cooperating until virtually the end of the game.
But what happened if there was a defection? Different rules responded quite differently, and their response was important in determining their overall success. A key concept in this regard is the forgiveness of a decision rule. Forgiveness of a rule can be informally described as its propensity to cooperate in the moves after the other player has defected.
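TIT FOR TAT, for example, is unforgiving for one move but thereafter totally forgiving of that defection. After one punishment, it lets bygones be bygones. One of the main reasons why the rules that are not nice did not do well in the tournament is that most of the rules in the tournament were not very forgiving.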
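A concrete illustration will help. Consider the case of JOSS, a sneaky rule that tries to get away with an occasional defection. Like TIT FOR TAT, it always defects immediately after the other player defects. But instead of always cooperating after the other player cooperates, 10 percent of the time it defects after the other player cooperates. Thus it tries to sneak in an occasional exploitation of the other player. In a game against TIT FOR TAT, at first both players cooperated, but on the sixth move JOSS selected one of its probabilistic defections. TIT FOR TAT retaliated on the next move, and the two rules then echoed each other, with JOSS defecting on the even numbered moves and TIT FOR TAT on the odd numbered moves. On the twenty-fifth move, JOSS selected another of its probabilistic defections.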
This second defection set off another echo, which had JOSS defecting on the odd numbered moves. Together these two echoes resulted in both players defecting on every move after move 25. This string of mutual defections meant that for the rest of the game they both got only one point per turn. A major lesson of this tournament is the importance of minimizing echo effects in an environment of mutual power. When a single defection can set off a long string of recriminations and counterrecriminations, both sides suffer. A sophisticated analysis of choice must go at least three levels deep to take account of these echo effects.
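The echo effect in the JOSS example can be replayed step by step. The sketch below is an illustration under stated assumptions: JOSS's random 10 percent defections are replaced by a fixed script (moves 6 and 25, the two moves named in the text) so the run is repeatable, JOSS otherwise copies the other player's previous move, and the payoffs are the tournament's 3, 1, 5, 0. The function names are hypothetical.

```python
# Points for (TIT FOR TAT's move, JOSS's move).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return "C" if not other_history else other_history[-1]


def play_tft_vs_scripted_joss(sneaky_moves=(6, 25), moves=200):
    """TIT FOR TAT against a JOSS whose 'probabilistic' defections are
    scripted to fall on the given move numbers (1-based)."""
    tft_hist, joss_hist = [], []
    tft_score = joss_score = 0
    for move in range(1, moves + 1):
        tft = tit_for_tat(joss_hist)
        # JOSS behaves like TIT FOR TAT except on its sneaky moves.
        joss = "D" if move in sneaky_moves else tit_for_tat(tft_hist)
        tft_points, joss_points = PAYOFF[(tft, joss)]
        tft_score += tft_points
        joss_score += joss_points
        tft_hist.append(tft)
        joss_hist.append(joss)
    return tft_hist, joss_hist, tft_score, joss_score


if __name__ == "__main__":
    tft_hist, joss_hist, tft_score, joss_score = play_tft_vs_scripted_joss()
    for move in range(1, 31):
        print(move, tft_hist[move - 1], joss_hist[move - 1])
    print("TIT FOR TAT:", tft_score, "JOSS:", joss_score)
```

In this scripted run the first defection produces alternating defections, the second locks both players into mutual defection from move 25 on, and the final scores are 236 for TIT FOR TAT and 241 for JOSS, far below the 600 each would earn from unbroken cooperation.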
The first level of analysis is the direct effect of a choice. This is easy, since a defection always earns more than a cooperation. The second level considers the indirect effects, taking into account that the other side may or may not punish a defection. This much of the analysis was certainly appreciated by many of the entrants. But the third level goes deeper and takes into account the fact that in responding to the defections of the other side, one may be repeating or even amplifying one's own previous exploitative choice.
Thus a single defection may be successful when analyzed for its direct effects, and perhaps even when its secondary effects are taken into account. But the real costs may be in the tertiary effects when one's own isolated defections turn into unending mutual recriminations. Without their realizing it, many of these rules actually wound up punishing themselves.
With the other player serving as a mechanism to delay the self-punishment by a few moves, this aspect of self-punishment was not picked up by many of the decision rules. The existence of these rules should serve as a warning against the facile belief that an eye for an eye is necessarily the best strategy. There are at least three rules that would have won the tournament if submitted. The sample program sent to prospective contestants to show them how to make a submission would in fact have won the tournament if anyone had simply clipped it and mailed it in!
But no one did. The sample program defects only if the other player defected on the previous two moves. The implication of this finding is striking, since it suggests that even expert strategists do not give sufficient weight to the importance of forgiveness. Another rule which would have won the tournament was also available to most of the contestants.
This was the rule which won the preliminary tournament, a report of which was used in recruiting the contestants. It is interesting that artificial intelligence techniques could have inspired a rule which was in fact better than any of the rules designed by game theorists specifically for the Prisoner's Dilemma. If DOWNING had started with initial assumptions that the other players would be responsive rather than unresponsive, it too would have won and won by a large margin.
A kingmaker could have been king. It turned out that optimism about their responsiveness would not only have been more accurate but would also have led to more successful performance. It would have resulted in first place rather than tenth place. In the first place, many of them defected early in the game without provocation, a characteristic which was very costly in the long run.
In the second place, the optimal amount of forgiveness was considerably greater than displayed by any of the entries except possibly DOWNING. And in the third place, the entry that was most different from the others, DOWNING, floundered on its own misplaced pessimism regarding the initial responsiveness of the others.
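To see why extra forgiveness can pay, here is a hedged sketch of the sample program's rule as the text states it: defect only if the other player defected on the previous two moves. It is paired with a JOSS-like opponent whose unprovoked defections are scripted on moves 6 and 25 (an assumption made for repeatability; the real rule defects at random). The names `sample_program`, `make_scripted_joss`, and `play` are illustrative.

```python
def sample_program(own, opp):
    """Defect only if the other player defected on both of the last two moves."""
    if len(opp) >= 2 and opp[-1] == "D" and opp[-2] == "D":
        return "D"
    return "C"


def make_scripted_joss(extra_defections):
    """Echo the other player's last move, plus scripted unprovoked defections."""
    def joss(own, opp):
        if len(own) + 1 in extra_defections:
            return "D"
        return opp[-1] if opp else "C"
    return joss


def play(rule_a, rule_b, n_moves):
    """Play n_moves and return both histories as strings of 'C' and 'D'."""
    a, b = [], []
    for _ in range(n_moves):
        move_a = rule_a(a, b)
        move_b = rule_b(b, a)
        a.append(move_a)
        b.append(move_b)
    return "".join(a), "".join(b)


sample_moves, joss_moves = play(sample_program, make_scripted_joss({6, 25}), 40)
print("sample program", sample_moves)
print("JOSS          ", joss_moves)
```

Because an isolated defection is never answered, no echo starts in this run: the forgiving rule cooperates throughout and the only defections are the two scripted ones. The cost is that each isolated defection goes unpunished, which is the kind of opening that exploitative rules look for.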
The analysis of the tournament results indicates that there is a lot to be learned about coping in an environment of mutual power. Even expert strategists from political science, sociology, economics, psychology, and mathematics made the systematic errors of being too competitive for their own good, not being forgiving enough, and being too pessimistic about the responsiveness of the other side. The effectiveness of a particular strategy depends not only on its own characteristics, but also on the nature of the other strategies with which it must interact.
For this reason, the results of a single tournament are not definitive. Therefore, a second round of the tournament was conducted. The results of the second round provide substantially better grounds for insight into the nature of effective choice in the Prisoner's Dilemma. The reason is that the entrants to the second round were all given the detailed analysis of the first round, including a discussion of the supplemental rules that would have done very well in the environment of the first round.
Thus they were aware not only of the outcome of the first round, but also of the concepts used to analyze success, and the strategic pitfalls that were discovered. Moreover, they each knew that the others knew these things.
Therefore, the second round presumably began at a much higher level of sophistication than the first round, and its results could be expected to be that much more valuable as a guide to effective choice in the Prisoner's Dilemma. The second round was also a dramatic improvement over the first round in sheer size of the tournament. The response was far greater than anticipated. There was a total of sixty-two entries from six countries. The contestants were largely recruited through announcements in journals for users of small computers.
The game theorists who participated in the first round of the tournament were also invited to try again. The contestants ranged from a ten-year-old computer hobbyist to professors of computer science, physics, economics, psychology, mathematics, sociology, political science, and evolutionary biology.
The second round provided a chance both to test the validity of the themes developed in the analysis of the first round and to develop new concepts to explain successes and failures. The entrants also drew their own lessons from the experience of the first round.
But different people drew different lessons.
What is particularly illuminating in the second round is the way the entries based on different lessons actually interact. TIT FOR TAT was the simplest submission in the second round, and it won the second round. This decision rule was known to all of the entrants to the second round because they all had the report of the earlier round, showing that TIT FOR TAT was the most successful rule so far.
They had read the arguments about how it was known to elicit a good degree of cooperation when played with humans, how it is not very exploitable, how it did well in the preliminary tournament, and how it won the first round. The report on the first round also explained some of the reasons for its success, pointing in particular to its property of never being the first to defect ("niceness") and its propensity to cooperate after the other player defected ("forgiveness"), with the exception of a single punishment.
This was Anatol Rapoport, who submitted it the first time. The second round of the tournament was conducted in the same manner as the first round, except that minor endgame effects were eliminated. As announced in the rules, the length of the games was determined probabilistically with a 0.
Since no one knew exactly when the last move would come, end-game effects were successfully avoided in the second round. Once again, none of the personal attributes of the contestants correlated significantly with the performance of the rules. The professors did not do significantly better than the others, nor did the Americans. The names of the contestants are shown in the order of their success in appendix A along with some information about them and their programs. But on the other hand, neither did long programs with their greater complexity do any better than short programs.
The determination of what does account for success in the second round is not easy because there were ways the 63 rules including RANDOM were paired in the round robin tournament. This very large tournament score matrix is given in Appendix A along with information about the entrants and their programs.
In all, there were over a million moves in the second round. As in the first round, it paid to be nice. Being the first to defect was usually quite costly.
More than half of the entries were nice, so obviously most of the contestants got the message from the first round that it did not pay to be the first to defect. In the second round, there was once again a substantial correlation between whether a rule was nice and how well it did.
Of the top fifteen rules, all but one were nice and that one ranked eighth. Of the bottom fifteen rules, all but one were not nice. The overall correlation between whether a rule was nice and its tournament score was a substantial.
A property that distinguishes well among the nice rules themselves is how promptly and how reliably they responded to a challenge by the other player. A rule can be called retaliatory if it immediately defects after an "uncalled for" defection from the other.
Exactly what is meant by "uncalled for" is not precisely determined. The point, however, is that unless a strategy is incited to an immediate response by a challenge from the other player, the other player may simply take more and more frequent advantage of such an easygoing strategy.
There were a number of rules in the second round of the tournament that deliberately used controlled numbers of defections to see what they could get away with. To a large extent, what determined the actual rankings of the nice rules was how well they were able to cope with these challengers. One such challenger is TESTER. It is designed to look for softies, but is prepared to back off if the other player shows it won't be exploited. The rule is unusual in that it defects on the very first move in order to test the other's response.
If the other player ever defects, it apologizes by cooperating and playing tit-for-tat for the rest of the game. Otherwise, it cooperates on the second and third moves but defects every other move after that. TESTER did a good job of exploiting several supplementary rules that would have done quite well in the environment of the first round of the tournament.
It did, however, provide low scores for some of the more easygoing rules. As another example of how TESTER causes problems for some rules which had done well in the first round, consider the three variants of Leslie Downing's outcome maximization principle. These came from Stanley F. Quayle and Leslie Downing himself. A slightly modified version came from a youthful competitor, eleven-year-old Steve Newman. However, all three were exploited by TESTER since they all calculated that the best thing to do with a program that cooperated just over half the time after one's own cooperation was to keep on cooperating.
It first seeks to establish a mutually rewarding relationship with the other player, and only then does it cautiously try to see if it will be allowed to get away with something. The rule normally cooperates but is ready to defect if the other player defects too often. Thus the rule tends to cooperate for the first dozen or two dozen moves if the other player is cooperating. Only then does it throw in an unprovoked defection. By waiting until a pattern of mutual cooperation has been developed, it hopes to lull the other side into being forgiving of occasional defections.
If the other player continues to cooperate, then defections become more frequent. It tries to avoid pressing its luck too far. So while it pays to be nice, it also pays to be retaliatory. TIT FOR TAT combines these properties: it is nice, forgiving, and retaliatory. It is never the first to defect; it forgives an isolated defection after a single response; but it is always incited by a defection no matter how good the interaction has been so far.
The lessons of the first round of the tournament affected the environment of the second round, since the contestants were familiar with the results. The report on the first round of the Computer Prisoner's Dilemma Tournament (Axelrod a) concluded that it paid to be not only nice but also forgiving. Of the sixty-two entries, thirty-nine were nice, and nearly all of them were at least somewhat forgiving.
But it came in only twenty-fourth. But in the second round, it was in the bottom half of the tournament. What seems to have happened is an interesting interaction between people who drew one lesson and people who drew another from the first round.
Lesson One was: But the people who drew Lesson Two did not themselves do very well either. The reason is that in trying to exploit other rules, they often eventually got punished enough to make the whole game less rewarding for both players than pure mutual cooperation would have been.
None of the other entries that tried to apply the exploitative conclusion of Lesson Two ranked near the top either. While the use of Lesson Two tended to invalidate Lesson One, no entrants were able to benefit more than they were hurt in the tournament by their attempt to exploit the easygoing rules. Would the results of the second round have been much different if the distribution of entries had been substantially different? That is to say, is it robust?
A good way to examine this question is to construct a series of hypothetical tournaments, each with a very different distribution of the types of rules participating. The method of constructing these drastically modified tournaments is explained in appendix A. Another way to examine the robustness of the results is to construct a whole sequence of hypothetical future rounds of the tournament.
Some of the rules were so unsuccessful that they would be unlikely to be tried again in future tournaments, while others were successful enough that their continued presence in later tournaments would be likely. For this reason, it would be helpful to analyze what would happen over a series of tournaments if the more successful rules became a larger part of the environment for each rule, and the less successful rules were met less often.
This analysis would be a strong test of a rule's performance, because continued success would require a rule to do well with other successful rules. Imagine that there are many animals of a single species which interact with each other quite often.
Suppose the interactions take the form of a Prisoner's Dilemma. When two animals meet, they can cooperate with each other, not cooperate with each other, or one animal could exploit the other. Suppose further that each animal can recognize individuals it has already interacted with and can remember salient aspects of their interaction, such as whether the other has usually cooperated.
A round of the tournament can then be regarded as a simulation of a single generation of such animals, with each decision rule being employed by large numbers of individuals. One convenient implication of this interpretation is that a given animal can interact with another animal using its own decision rule, just as it can run into an animal using some other rule.
The value of this analogy is that it allows a simulation of future generations of a tournament. The idea is that the more successful entries are more likely to be submitted in the next round, and the less successful entries are less likely to be submitted again.
To make this precise, we can say that the number of copies or offspring of a given entry will be proportional to that entry's tournament score. We simply have to interpret the average payoff received by an individual as proportional to the individual's expected number of offspring.
For example, if one rule gets twice as high a tournament score in the initial round as another rule, then it will be twice as well-represented in the next round. One possibility is that a player will try different strategies over time, and then stick with what seems to work best.
Another possibility is that a person using a rule sees that other strategies are more successful and therefore switches to one of those strategies. Still another possibility is that a person occupying a key role, such as a member of Congress or the manager of a business, would be removed from that role if the strategy being followed was not very successful.
Thus, learning, imitation, and selection can all operate in human affairs to produce a process which makes relatively unsuccessful strategies less likely to appear later. The simulation of this process for the Prisoner's Dilemma tournament is actually quite straightforward. The tournament matrix gives the score each strategy gets with each of the other strategies. Starting with the proportions of each type in a given generation, it is only necessary to calculate the proportions which will exist in the next generation.
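The generation-to-generation calculation just described can be sketched as follows. Each rule's share in the next generation is taken to be proportional to its current share times its average score against the current mix of rules. The three-rule score matrix is hypothetical, not the tournament matrix: it assumes 200-move games with payoffs of 5, 3, 1, and 0 points, and the function names are illustrative.

```python
def next_generation(proportions, scores):
    """Advance one generation: shares grow in proportion to weighted score."""
    n = len(proportions)
    # Average score of rule i against the current population mix.
    fitness = [
        sum(scores[i][j] * proportions[j] for j in range(n)) for i in range(n)
    ]
    weighted = [proportions[i] * fitness[i] for i in range(n)]
    total = sum(weighted)
    return [w / total for w in weighted]


def simulate(proportions, scores, generations):
    """Return the list of proportion vectors, starting with the initial one."""
    history = [list(proportions)]
    for _ in range(generations):
        history.append(next_generation(history[-1], scores))
    return history


# Hypothetical matrix: SCORES[i][j] is the score rule i gets against rule j.
RULES = ["nice and retaliatory", "always defect", "always cooperate"]
SCORES = [
    [600, 199, 600],
    [204, 200, 1000],
    [600, 0, 600],
]

history = simulate([1 / 3, 1 / 3, 1 / 3], SCORES, 100)
for generation in (0, 1, 10, 100):
    shares = ", ".join(
        f"{name}: {share:.3f}" for name, share in zip(RULES, history[generation])
    )
    print(f"generation {generation}: {shares}")
```

In this toy population the always-defect rule gains a little ground in the first generation by exploiting the unconditional cooperators, then declines as they become scarce, while the nice and retaliatory rule ends up with the largest share.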
The results provide an interesting story. The first thing that happens is that the lowest-ranking eleven entries fall to half their initial size by the fifth generation while the middle-ranking entries tend to hold their own and the top-ranking entries slowly grow in size. By the fiftieth generation, the rules that ranked in the bottom third of the tournament have virtually disappeared, while most of those in the middle third have started to shrink, and those in the top third are continuing to grow (see figure 2).
This process simulates survival of the fittest. At first, a rule that is successful with all sorts of rules will proliferate, but later as the unsuccessful rules disappear, success requires good performance with other successful rules. This simulation provides an ecological perspective because there are no new rules of behavior introduced. It differs from an evolutionary perspective, which would allow mutations to introduce new strategies into the environment. In the ecological perspective there is a changing distribution of given types of rules.
The less successful rules become less common and the more successful rules proliferate. The statistical distribution of types of individuals changes in each generation, and this changes the environment with which each of the individual types has to interact. At first, poor programs and good programs are represented in equal proportions.
But as time passes, the poorer ones begin to drop out and the good ones thrive. Success breeds more success, provided that the success derives from interactions with other successful rules.
If, on the other hand, a decision rule's success derives from its ability to exploit other rules, then as these exploited rules die out, the exploiter's base of support becomes eroded and the exploiter suffers a similar fate.
By the two hundredth generation or so, things began to take a noticeable turn. The ecological analysis shows that doing well with rules that do not score well themselves is eventually a self-defeating process. Not being nice may look promising at first, but in the long run it can destroy the very environment it needs for its own success. By the one-thousandth generation it was the most successful rule and still growing at a faster rate than any other rule.
It also achieved the highest score in five of the six hypothetical tournaments which were constructed by magnifying the effects of different types of rules from the second round. And in the sixth hypothetical tournament it came in second. Added to its victory in the first round of the tournament, and its fairly good performance in laboratory experiments with human subjects, TIT FOR TAT is clearly a very successful strategy.
Proposition 1 says that there is no absolutely best rule independent of the environment. Part of its success might be that other rules anticipate its presence and are designed to do well with it. While such exploitation is occasionally fruitful, over a wide range of environments the problems with trying to exploit others are manifold.
In the first place, if a rule defects to see what it can get away with, it risks retaliation from the rules that are provocable. In the second place, once mutual recriminations set in, it can be difficult to extract oneself. Being able to exploit the exploitable without paying too high a cost with the others is a task which was not successfully accomplished by any of the entries in round two of the tournament.
Its niceness prevents it from getting into unnecessary trouble. Its retaliation discourages the other side from persisting whenever defection is tried. Its forgiveness helps restore mutual cooperation. And its clarity makes it intelligible to the other player, thereby eliciting long-term cooperation. Moreover, the ecological analysis which simulated future rounds of the tournament suggested that TIT FOR TAT would continue to thrive, and that eventually it might be used by virtually everyone.
What would happen then? Suppose that everyone came to be using the same strategy. Would there be any reason for someone to use a different strategy, or would the popular strategy remain the choice of all? A very useful approach to this question has been developed by an evolutionary biologist, John Maynard Smith. This approach imagines the existence of a whole population of individuals employing a certain strategy, and a single mutant individual employing a different strategy.
The mutant strategy is said to invade the population if the mutant can get a higher payoff than the typical member of the population gets. Put in other terms, the whole population can be imagined to be using a single strategy, while a single individual enters the population with a new strategy. The newcomer will then be interacting only with individuals using the native strategy.
Moreover, a native will almost certainly be interacting with another native since the single newcomer is a negligible part of the population. Therefore a new strategy is said to invade a native strategy if the newcomer gets a higher score with a native than a native gets with another native.
Since the natives are virtually the entire population, the concept of invasion is equivalent to the single mutant individual being able to do better than the population average. This leads directly to the key concept of the evolutionary approach.
A strategy is collectively stable if no strategy can invade it. All mutations are possible; and if any could invade a given population, this mutation presumably would have the chance to do so. For this reason, only a collectively stable strategy is expected to be able to maintain itself in the long-run equilibrium as the strategy used by all. Biological applications will be discussed in chapter 5, but for now the point is that collectively stable strategies are important because they are the only ones that an entire population can maintain in the long run in the face of any possible mutant.
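The invasion test defined above reduces to a single comparison, which the following sketch makes explicit. It uses a hypothetical three-rule score matrix (assuming 200-move games with payoffs of 5, 3, 1, and 0 points) and checks only this finite list of candidates. Collective stability in the text's sense requires resisting every possible strategy, so passing this check is necessary but not sufficient.

```python
def invades(scores, newcomer, native):
    """True if the newcomer scores more against a native than natives do together."""
    return scores[newcomer][native] > scores[native][native]


def resists_all_listed(scores, native):
    """True if none of the other listed rules can invade the native rule."""
    return not any(
        invades(scores, other, native)
        for other in range(len(scores))
        if other != native
    )


# Hypothetical matrix: SCORES[i][j] is the score rule i gets against rule j.
RULES = ["nice and retaliatory", "always defect", "always cooperate"]
SCORES = [
    [600, 199, 600],
    [204, 200, 1000],
    [600, 0, 600],
]

for index, name in enumerate(RULES):
    print(f"{name}: resists listed rivals = {resists_all_listed(SCORES, index)}")
```

With these numbers, always defecting invades a population of unconditional cooperators, while neither of the other two populations can be invaded by a single newcomer drawn from this list.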
The motivation of applying collective stability to the analysis of people's behavior is to discover which kinds of strategies can be maintained by a group in the face of any possible alternative strategy. If a successful alternative strategy exists, it may be found by the "mutant" individual through conscious deliberation, or through trial and error, or through just plain luck. If everyone is using a given strategy and some other strategy can do better in the environment of the current population, then someone is sure to find this better strategy sooner or later.
Thus only a strategy that cannot be invaded can maintain itself as the strategy used by all. A warning is in order about this definition of a collectively stable strategy. It assumes that the individuals who are trying out novel strategies do not interact too much with one another. A difficulty with this concept of collective stability when applied to the iterated Prisoner's Dilemma is that it can be very hard actually to determine which strategies have it and which do not.
Others have dealt with this difficulty by restricting the analysis to situations where the strategies are particularly simple, or by considering only some arbitrarily limited set of strategies. The characterization is given in Appendix B.