The American Naturalist, vol. 175, no. 2, February 2010
The Evolution of Reciprocity: Social Types or Social Incentives?

Jean-Baptiste André*

Ecologie et Evolution, Unité Mixte de Recherche 7625, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Bâtiment A, 7ème étage, Case 237, 7 Quai Saint Bernard, F-75252 Paris Cedex 05, France

Submitted May 26, 2009; Accepted October 14, 2009; Electronically published December 16, 2009

Online enhancement: appendix.

Abstract: The vast majority of human beings regularly engage in reciprocal cooperation with nonrelated conspecifics, and yet the current evolutionary understanding of these behaviors is insufficient. Intuitively, reciprocity should evolve if past behavior conveys information about future behavior, but it is not straightforward to understand why this should be an outcome of evolution. Most evolutionary models assume that individuals' past behavior informs others about their stable social type (defector, cooperator, reciprocator, etc.), which makes it sensible to reciprocate. In this article, after describing the central source of difficulty in the evolutionary understanding of reciprocity, I put forward an alternative explanation based on work by O. Leimar. It consists of taking into account the fact that the payoffs individuals receive in social interactions can change through time. This offers a solution because individuals' past behavior then signals their payoffs, which also makes it sensible to reciprocate. Even though the overwhelming majority of evolutionary models implicitly endorse the social types mechanism, I argue that the social incentives mechanism may underlie reciprocity in humans.

Keywords: cooperation, signaling, human behavior, payoff variability.

Introduction

The vast majority of human beings regularly express cooperative behaviors toward nonrelated conspecifics. Reciprocity and other mechanisms of social feedback are involved in these behaviors. This article will deal with the simplest possible social feedback: the repeated exchange of cooperative behaviors between two individuals, called direct reciprocity. The importance of direct reciprocity is still unclear in nonhuman animals (see Hammerstein 2003; Stevens and Hauser 2004; but see also Mitani 2005; Krams et al. 2008; Melis et al. 2008), but it is beyond doubt in humans, where it plays a role in many aspects of social life, from durable intimate relationships to brief economic interactions. Yet despite its manifest ubiquity in human behavior, the current evolutionary understanding of direct reciprocity is insufficient.

* E-mail: [email protected].

Am. Nat. 2010. Vol. 175, pp. 000–000. © 2009 by The University of Chicago. 0003-0147/2010/17502-51299$15.00. All rights reserved. DOI: 10.1086/649597

Instead of explaining how direct reciprocity may have been shaped by evolution, numerous evolutionary analyses in fact show that it is unlikely to be a stable outcome (Boyd and Lorberbaum 1987; Farrell and Ware 1989; Lorberbaum 1994; see also Bendor and Swistak 1995). This results from a fundamental lack of information about a partner's future behavior. To illustrate, consider the following argument. Suppose that in a given round of a pairwise interaction, one of the two players defects. Why should the other player reply with defection? After all, her partner could well have defected only once and gone back to reciprocation afterward, in which case cooperation would still be the best option. Conversely, suppose that one of the two players cooperates in a given round. Why should the other cooperate in the following round? Cooperation pays only if one's partner reciprocates it afterward, not because she has cooperated in the past. There is no obvious reason for an individual's past behavior to convey any specific information that helps predict his or her future behavior.

In this article, I will first illustrate the evolutionary difficulty with reciprocity in a more formal manner. I will then present the assumptions that evolutionary models classically have to make, at least implicitly, in order to solve this difficulty. Most importantly, I will show that the majority of these models restrict a priori the space of possible strategies, in such a way that past behaviors always convey some reliable information about future behaviors, by assuming that individuals are characterized by a stable social type (cooperator, defector, reciprocator, etc.). I will then propose an alternative solution based on a model by Leimar (1997b). I will show that as soon as some payoff variability is introduced, reciprocity becomes a stable endpoint of evolution under assumptions that are less constraining than the social types approach.
In the “Discussion,” I will then argue in favor of this solution, which leads to a novel understanding of reciprocity, and I will discuss some of its consequences.

The Puzzle of Reciprocity and Its Traditional Solutions

Here, the evolutionary difficulty with reciprocity is presented, together with the usual assumptions that are made



to solve it. This will illustrate why an alternative solution is necessary.

Consider an alternate indefinitely repeated prisoner's dilemma (IRPD) between two players. Each player makes a move on every other round, and at each move she can choose to either cooperate, with cost c to herself and benefit b > c for her partner, or defect, with neither cost nor benefit. Following the standard terminology (see, e.g., Aumann and Maschler 1995, p. 73), a history at time t for the interaction between players i and j is a sequence of past actions by both players from the beginning of the interaction until t. A strategy consists of a collection of rules stipulating the action(s) one carries out for each possible history. For instance, unconditional cooperation (AllC) stipulates cooperation for every history, tit for tat (TFT) stipulates defection for histories that include a defection by one's partner in the last round and cooperation for all other histories, and so on. In general, strategies need not be consistent, in the sense that individuals can reciprocate at time t, defect later, cooperate afterward, and so on.

Unreached Histories and Twin Strategies

Consider a population fixed with a strategy called the resident. Define a deviant as a strategy differing from the resident for at least one history (i.e., there exists at least one sequence of past actions after which the deviant strategy does not stipulate the same action that the resident does). In order for the resident to be a stable evolutionary endpoint, a first necessary condition is that the resident is a best reply to itself (i.e., it is a Nash equilibrium). However, this condition is not sufficient to guarantee that the resident is a stable evolutionary endpoint. In particular, it does not rule out the possibility that the resident population can be invaded by combinations of rare deviants (Boyd and Lorberbaum 1987; Lorberbaum 1994).
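The notion of a strategy as a map from histories to actions can be made concrete with a small sketch (illustrative, not from the article; for simplicity a history here is just the sequence of the partner's past moves, which suffices for these strategies):

```python
# 'C' = cooperate, 'D' = defect. Each strategy is a function from the
# partner's past moves (a tuple) to the action to play next.

def all_c(history):
    """Unconditional cooperation (AllC): cooperate for every history."""
    return 'C'

def tit_for_tat(history):
    """TFT: cooperate first, then repeat the partner's last move."""
    return 'C' if not history else history[-1]

def suspicious_tft(history):
    """STFT: defect first, then play TFT (a 'temporary defector')."""
    return 'D' if not history else history[-1]

# Example: TFT's reply after the partner played C, C, D.
print(tit_for_tat(('C', 'C', 'D')))  # D
```

Nothing here constrains a strategy to be consistent: a function of the full history may reciprocate at one point, defect later, and cooperate again afterward.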
This issue is summed up in the following (see also “Susceptibility to Invasion by Deviants” in the appendix in the online edition of the American Naturalist). Assume that there exist certain histories, called deviant histories, that cannot be reached by two resident partners (i.e., they are attainable only if at least one partner deviates). The actions undertaken for these histories, called silent actions, are neutral in a purely resident population because they are never implemented (Selten 1975, 1983; Boyd and Lorberbaum 1987; Lorberbaum 1994). Strategies that differ from the resident only at silent actions are called twins of the resident (Lorberbaum 1994). The existence of twins has two important consequences. (1) Because the resident and its twins are strictly equivalent when played against one another, the fitness difference between them depends exclusively on the nature of other deviants in front of which silent actions are expressed. (2)

For any resident, one can construct at least one pair of strategies (a twin and another deviant) such that, owing to the mere presence of the deviant (even in small frequency), the average payoff of the twin is larger than the average payoff of the resident (Boyd and Lorberbaum 1987; Farrell and Ware 1989; Lorberbaum 1994). This led Lorberbaum (1994) to conclude that there is no evolutionarily stable strategy (ESS) in the IRPD (where the term ESS does not correspond to the original definition of the concept). There is, however, a stronger case that needs to be made. The above result states that a resident strategy can always be destabilized by the introduction of a twin and a deviant, but this potentially includes cases in which the deviant needs to self-sacrifice in order to favor the twin, and such deviants do not really destabilize the resident. True, they may cause the rise of the twin, but once the twin is fixed and becomes the new resident, the vast majority of behaviors in the population are unchanged because the twin is essentially identical to the resident, and the deviant remains rare. In this article, a resident will therefore be said to be susceptible to invasion by deviants only in a stronger sense: when there exist paired twin and deviant strategies that are mutually favorable deviations (see “Susceptibility to Invasion by Deviants” in the appendix). In such a case, the twin rises in frequency owing to the presence of the rare deviant; when the twin has reached a sufficiently high frequency, the deviant becomes favored, and as a result, behaviors have changed in the population. Well-known Nash equilibrium strategies in the IRPD are all susceptible to invasion by deviants in this strong sense. As an example, consider a population fixed with TFT. In such a population, only deviants ever defect, reaction to defection is thus silent, and the fitness differential between TFT and AllC (a twin of TFT) depends on the average nature of defecting deviants. 
If most defections are due to hardheaded defectors, then TFT has a larger fitness than AllC. However, if most defections are due to temporary defectors (e.g., suspicious tit for tat [STFT], defecting once and then playing TFT; see Boyd and Lorberbaum 1987), then AllC is better. Overall, TFT is hence susceptible to invasion (in a strong sense) by a combination of STFT and AllC. The same reasoning holds (in the simultaneous version of the game) for a population fixed with generous TFT (Nowak and Sigmund 1992), tat for tit (Binmore and Samuelson 1992), Pavlov (Fudenberg and Maskin 1990; Nowak and Sigmund 1993), and contrite TFT (Sugden 1986; Boerlijst et al. 1997). All are susceptible to invasion by a combination of STFT and AllC (except tat for tit, which is susceptible to invasion by a combination of STFT and suspicious AllC).
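The destabilization of TFT by a combination of STFT and AllC can be checked numerically. The sketch below uses the simultaneous repeated PD (simpler than the article's alternating version) with assumed payoff values b = 3, c = 1 and an assumed deviant frequency of 5%:

```python
# With a small fraction of STFT deviants in a TFT resident population,
# the twin AllC earns strictly more than TFT itself.

def all_c(history):
    return 'C'

def tft(history):
    return 'C' if not history else history[-1]

def stft(history):
    """Suspicious TFT: defect once, then play TFT."""
    return 'D' if not history else history[-1]

def payoff(s1, s2, rounds=50, b=3.0, c=1.0):
    """Player 1's total payoff against player 2 in a repeated PD."""
    h1, h2, total = [], [], 0.0
    for _ in range(rounds):
        a1, a2 = s1(tuple(h2)), s2(tuple(h1))
        total += (b if a2 == 'C' else 0.0) - (c if a1 == 'C' else 0.0)
        h1.append(a1)
        h2.append(a2)
    return total

eps = 0.05  # frequency of STFT deviants
fitness_tft = (1 - eps) * payoff(tft, tft) + eps * payoff(tft, stft)
fitness_allc = (1 - eps) * payoff(all_c, tft) + eps * payoff(all_c, stft)
print(fitness_allc > fitness_tft)  # True
```

Against an STFT partner, TFT locks into alternating retaliation while AllC absorbs a single defection and then restores full cooperation, which is why AllC outcompetes TFT whenever defecting deviants are mostly temporary.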

Conclusion: The Problem of Deviants' Unpredictability

When one's partner's behavior is incompatible with her being a resident (i.e., when a deviant history has been reached), she is certainly a deviant (e.g., STFT). The problem then is that her future behavior is unpredictable in the absence of hypotheses about the deviants' nature (e.g., assuming that most defections are due to unconditional defectors, AllD; see fig. 1). Of course, one's partner's behavior is always unpredictable in one way or another; that is, she can always move in an unpredictable way. However, when her behavior is compatible with her being a resident, she is most likely to actually be a resident, because deviants are rare. Natural selection thus operates on the resident's action at such histories so as to maximize its payoff when playing itself. The true problem of a partner's unpredictability occurs when a partner is certainly (or very likely to be) a deviant, because then natural selection on the resident's action depends on specific hypotheses about deviants' nature.

In sum, the unpredictability of deviants is the essence of the evolutionary difficulty with reciprocity. To my knowledge, it has never been formally demonstrated that this difficulty cannot be solved in the absence of supplementary hypotheses. What is clear is that in the evolutionary models in which reciprocity is a stable evolutionary outcome, supplementary hypotheses are always made, at least implicitly. These hypotheses are of two types. The first and most common consists of assumptions about the nature of deviants. The second consists in building models in such a way that the resident strategy generates, on its own, every possible history, so that being paired with a deviant is never a likely possibility. I will now present the implementation of both approaches in the traditional evolutionary models of reciprocity. The alternative solution, which I favor, will be presented separately.
Conferring Some Informative Value to Deviants' Behaviors: Reciprocity and the Variability of Social Types

The most common way to deal with the unpredictability of deviants consists in making some assumptions about their nature. Most typically, this entails restrictions on the attainable strategy space, such that the past behavior of genuine deviants allows one to predict their future behavior (fig. 1). This then determines the direction of selection for every action, including those silent actions that are expressed exclusively when confronted with deviants. The restriction of strategy space can take various forms of varying complexity. The simplest and most frequent consists in considering only strategies that can remember plays by their fellow player for one round in the past. Owing to their memory limitation, these strategies always

react in the same manner after a given event in the past round (e.g., Nowak and Sigmund 1992, 1993; Roberts and Sherratt 1998; McNamara et al. 1999; Wahl and Nowak 1999; Taylor and Day 2004; André and Day 2007). All individuals, whether they are residents or deviants, are thus categorized according to simple social types: defectors, reciprocators, cooperators, and so on. In such a case, reciprocity is a stable evolutionary endpoint, because past behavior always reveals the social type to which one belongs, even when one deviates from the resident strategy. There are other, more complex ways of constructing the space of possible strategies that allow reciprocity to be a stable evolutionary endpoint. In principle, even though this has never been systematically explored, it should be the case with any restriction to strategies with finite memory size (see Hauert and Schuster 1997). As long as the expected duration of interactions is sufficiently larger than the memory window of an individual, one can eventually gather enough information at the beginning of an interaction to predict a partner's behavior. However, if memory size is large, the necessary duration of interactions soon becomes unrealistic. Even strategy spaces in which individuals react to a potentially infinite number of rounds can still be constructed in such a way that deviants' behavior conveys some information. For instance, in the adaptor strategies considered by Hauert and Stenull (2002), individuals update an internal state after each round, and this state influences their behavior. However, the precise way in which one updates one's internal state is a priori assumed to be constant across all social interactions, and past behavior thus conveys reliable information about this very process.
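The memory-one restriction can be sketched concretely (an illustration, not a construction from the article): each strategy reduces to an opening move plus fixed replies to C and to D, so the whole deterministic strategy space collapses to eight stable "types" that one observation per contingency suffices to identify.

```python
from itertools import product

ACTIONS = ('C', 'D')

# Each deterministic memory-one reactive strategy is a triple:
# (opening move, reply to partner's C, reply to partner's D).
TYPES = {first + after_c + after_d: (first, after_c, after_d)
         for first, after_c, after_d in product(ACTIONS, repeat=3)}

def move(strategy, partner_last):
    """The type's next action, given the partner's last move (None = opening)."""
    first, after_c, after_d = strategy
    if partner_last is None:
        return first
    return after_c if partner_last == 'C' else after_d

# 'CCD' is TFT, 'CCC' unconditional cooperation, 'DDD' full defection.
print(len(TYPES), move(TYPES['CCD'], 'D'))  # 8 D
```

Because the reply rule is constant over the whole interaction, past behavior always reveals the type, even for deviants; this is precisely what makes reciprocation sensible within this restricted space.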
An alternative approach, in the same vein, would consist in introducing an explicit cost for strategic complexity (e.g., a cost of memory or a cost of behavioral conditionality) or in introducing mutation biases toward certain strategies. To my knowledge, such an approach has never been undertaken (but see Binmore and Samuelson 1992 for a very weak cost of complexity). Overall, it is rather straightforward to understand that once the informative value of behavior is taken as a given (because strategy space is limited to somehow consistent strategies), then reciprocating to a partner’s behavior makes sense. This can explain the discrepancy between models that have no ESS (e.g., Lorberbaum 1994) and models claiming that a particular strategy wins the evolutionary contest in the IRPD (e.g., Nowak and Sigmund 1992, 1993). The discrepancy comes from the a priori assumptions on the informative value of behaviors that are made in the latter, but not in the former, family of models.

Figure 1: Nature of information conveyed by one’s partner’s past behavior in traditional approaches to reciprocity. From Bayesian principles, when one’s partner has never deviated from the resident strategy, she is most likely to be actually a resident, and her future behavior is predictable (first pathway). When one’s partner has deviated from the resident strategy but most deviations are assumed to be plain mistakes, she is still most likely to be a (trembling) resident, and her future behavior is also predictable, especially if residents react to their own deviations (second pathway). When one’s partner has deviated from the resident strategy and most deviations are not assumed to be plain mistakes, she is then a genuine deviant, and her future behavior depends on specific hypotheses on the average nature of deviants (third and fourth pathways).

Neglecting Genuine Deviants: Behavioral Deviations as Mistakes

Rather than making some assumptions on the nature of deviants, an alternative solution is to build models so that the likelihood of confronting a genuine deviant is negligible at all histories. However, whereas in static games this entails only the assumption that deviants are rare, an additional hypothesis that I call the trembling hand assumption must be made in dynamic games. It consists of assuming that, because of the very frequent occurrence of perceptual or implementation errors (i.e., trembling hand), even when one’s partner behaves in a deviant way, it is always more likely for her to be a trembling resident than a genuine deviant (fig. 1). Under such an assumption, one has reliable information about one’s partner in all circumstances: she is a resident. But it is unclear, at first, how this makes it sensible to react to a partner’s behavior. If behavioral variations are the products of mistakes, one should not use behavior as a source of information at all. A partner’s behavior is relevant information in this case, provided she responds to her own behavior, because it is sensible to react to her mistakes if she reacts to them as well (fig. 1). Strategies that are stable evolutionary endpoints under this assumption are hence strategies such as grim or Pavlov, which respond to both their partner’s and their own behavior. The trembling hand assumption plays a relatively minor role in evolutionary analyses (except for Leimar 1997a, 1997b; Lorberbaum et al. 2002). It is most notably at work outside of evolutionary biology per se in the application of a concept from game theory, the concept of trembling hand perfection (Selten 1965, 1973, 1975), leading to the notion of limit ESS (Selten 1983, 1988). A limit ESS is a stable evolutionary endpoint if, at every history, one’s partner is more likely to be a trembling resident than a genuine deviant. Yet one must realize the strength of the assumption behind this approach. 
The trembling hand approach does not merely require that individuals make mistakes. It requires that any deviation is more likely to be the product of mistakes than a symptom of genuine deviance. This is a very costly and unrealistic assumption. When one’s partner has deviated twice, thrice, or more from the resident strategy, she is very likely to be a genuine deviant, differing from the resident for inherent reasons. Her future behavior thus has no reason to be similar to the behavior of a resident. Because of the strength of this assumption, the trembling hand approach will not be discussed further.

The Alternative Solution: Reciprocity and the Variability of Social Incentives

The alternative solution is inspired by the evolutionary theory of costly signaling, which aims to explain why phenotypes can come to bear an informative value for observers (Zahavi 1975). In signaling theory, the reliability of a given signal is sustained by an underlying variability in the net payoffs that individuals gain by sending that signal (Spence 1974; Grafen 1990). Otherwise, evolution leads every individual to either send or not send the signal, and the signal ceases to be informative. For instance, the tail length of male barn swallows signals their quality because the net benefit of carrying a long tail depends on male quality. Strong males receive a net benefit from their long tail, but for weaker males, the cost of carrying the tail is larger than the sexual gain. Therefore, tail length evolves in such a way that it reliably signals male quality. Applying this reasoning to the evolution of reciprocity leads to the conclusion that we need to consider some variability in the payoffs received by individuals in their social interactions. I will now describe a way in which this can be achieved, called a social incentives approach to reciprocity. I will present a model analogous to Leimar's (1997b) but developing a different analysis and reaching somewhat different conclusions.
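The costly-signaling logic can be reduced to a toy computation (all numbers are assumptions for illustration): the net payoff of carrying the signal is positive only for high-quality senders, so the signal stays honest at equilibrium.

```python
# Toy costly-signaling check: a long tail pays off only for strong males,
# so tail length remains a reliable signal of male quality.

def net_payoff(quality, long_tail):
    """Sexual gain minus carrying cost; the cost falls with quality."""
    sexual_gain = 4.0 if long_tail else 0.0
    carrying_cost = (6.0 / quality) if long_tail else 0.0
    return sexual_gain - carrying_cost

strong, weak = 3.0, 1.0
print(net_payoff(strong, True) > net_payoff(strong, False))  # True
print(net_payoff(weak, True) > net_payoff(weak, False))      # False
```

The quality-dependent cost is the whole mechanism: if the cost were the same for everyone, all males would make the same choice and tail length would carry no information.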

A Social Life Twice as Complex as Usual

Consider a population of individuals playing an alternate IRPD. The interaction is illustrated in figure 2. Individuals can be, at every moment, in one of two situations vis-à-vis each of their partners (fig. 3). When an individual is in a so-called cooperation-prone situation vis-à-vis one of his partners, he receives a net benefit from reciprocal cooperation with this partner (b1 − c1 > 0, where bi is the benefit of receiving help and ci the cost of helping when in situation i). When an individual is in a non-cooperation-prone situation, he pays a cost of reciprocal cooperation with this partner (b2 − c2 ≤ 0). Importantly, (1) each individual can be engaged at any time in several IRPDs with different partners, and his situation is specific to each partner; (2) one's situation is always known by oneself, but it is not observable by one's partner (i.e., it is private information); and (3) each individual's situation vis-à-vis each partner can change with constant probabilities in both directions. In all cases, recall that defection is the strictly dominant strategy of every stage game, irrespective of individuals' situations.
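The payoff structure and situational dynamics can be sketched as follows (parameter values are illustrative assumptions, not taken from the article):

```python
import random

# Situation 1 is cooperation-prone, situation 2 is not; situations switch
# between one's moves with constant probabilities sigma_ij.
B = {1: 3.0, 2: 1.0}   # b_i: benefit of receiving help in situation i
C = {1: 1.0, 2: 2.0}   # c_i: cost of helping in situation i
SIGMA = {(1, 2): 0.05, (2, 1): 0.05}

assert B[1] - C[1] > 0    # b1 - c1 > 0: reciprocal cooperation pays in situation 1
assert B[2] - C[2] <= 0   # b2 - c2 <= 0: it does not in situation 2

def step_situation(i, rng=random):
    """One stochastic situational change, applied between two of one's moves."""
    j = 2 if i == 1 else 1
    return j if rng.random() < SIGMA[(i, j)] else i
```

The small transition probabilities give situations the inertia the argument requires: a partner's current situation predicts her near-future situation, while still allowing change within an interaction.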



Figure 2: Schema of the alternate indefinitely repeated prisoner's dilemma. Fk and Pk represent, respectively, the focal player's and the partner's moves (cooperate or defect) in round k. Because the game is alternating, one of the two players has the specific role of initiating the interaction, and this role is attributed randomly (i.e., by "nature's move"). Each partner can leave the interaction with a constant probability d and/or move from situation i to j with constant probability σij between any two of his moves. These events are supposed to occur just before one makes a move (dotted lines). The end of the interaction is definitive and immediately noticeable by one's partner, whereas situational changes are not.

Situation-Dependent Reciprocity

Consider the strategy called situation-dependent reciprocity (SDR). For most histories, it simply consists of repeating a partner's last move when one is in a cooperation-prone situation and defecting unconditionally otherwise (Leimar [1997b] calls it state-dependent reciprocity). SDR leads to cooperation via reciprocal behaviors. This strategy is described more formally in "Situation-Dependent Reciprocity" in the appendix and illustrated in figure 4 with an example. In a different model, Leimar (1997b) shows that SDR is a limit ESS in Selten's (1983) terminology. Here, I show that SDR is immune to invasion by any combination of rare deviants. The two following sections aim at explaining why this result holds (see also fig. 5 and more details in "Situation-Dependent Reciprocity Is a Stable Evolutionary Endpoint" in the appendix).
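A simplified rendering of SDR can reproduce a figure-4-style play. The full definition is in the article's appendix; the "reinstate cooperation" rule below is an assumption of this sketch (a player whose situation has just switched back to cooperation-prone cooperates rather than echoing the partner's defection), as are the scripted situational changes.

```python
def sdr_move(situation, partner_last, just_switched_back):
    """SDR, simplified: defect in situation 2; in situation 1, repeat the
    partner's last move, except cooperate right after switching back to 1."""
    if situation == 2:
        return 'D'
    if just_switched_back:           # signal the return to situation 1
        return 'C'
    return partner_last if partner_last is not None else 'C'

def focal_situation(k):
    """Scripted changes: focal is in situation 2 for rounds 4-7, else 1."""
    return 2 if 4 <= k <= 7 else 1

history = []
focal_last = None
for k in range(1, 11):
    p = sdr_move(1, focal_last, False)              # partner stays in situation 1, moves first
    f = sdr_move(focal_situation(k), p, k == 8)     # focal replies within the round
    focal_last = f
    history.append((k, p, f))

print(history[3], history[7])  # (4, 'C', 'D') (8, 'D', 'C')
```

The trace matches the figure-4 narrative: reciprocal cooperation until the focal player's situation changes and he defects (round 4), retaliatory defection by the still cooperation-prone partner, and reinstated cooperation once the focal player's situation switches back (round 8).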

SDR Is a Best Reply to Itself

Under reasonable conditions on parameter values, SDR is a best reply to itself (for details, see "SDR Is a Nash Equilibrium" in the appendix). This can be understood by evaluating three things. First, when one's partner follows SDR, her cooperation/defection signals her situation. As a result, as long as situations have sufficient inertia, it is sensible to react to her behavior. Second, in a cooperation-prone situation, the optimal behavior is to cooperate because it provides one with the benefit of reciprocal cooperation (b1 − c1 > 0). Third, in a non-cooperation-prone situation, the optimal behavior is to defect because b2 − c2 ≤ 0; that is, the future benefit of cooperation is lower than its immediate cost. In brief, social behavior remains a reliable signal of one's situation because there is no gain in faking this signal.

SDR Is Resistant to All Deviant-Deviant Combinations

Besides SDR being a Nash equilibrium, a population fixed with SDR is also robust to challenges by any combination of rare deviants (in the sense defined in "Susceptibility to Invasion by Deviants" in the appendix). This comes from two different mechanisms.

Behavioral Variability Is Part of the Equilibrium Path. First, and most importantly, SDR generates, on its own, a large amount of behavioral variability, in such a way that few histories remain unreached by SDR residents (as with the trembling hand assumption). For instance, if one's partner starts by defecting, then cooperates, then defects again, as long as deviants are sufficiently rare in the population, the most likely scenario is that she is following SDR but has changed situation. A best reply is thus to follow SDR also, because SDR is a best reply to itself. Overall, the natural generation of behavioral variability (as a reflection of the variability of incentives) is central to the SDR strategy. It makes it possible for individuals to

Figure 3: Two situations determining the payoffs received in the interaction. Transition can occur from situation i to situation j at rate σij.


Figure 4: Typical play between two players following situation-dependent reciprocity. Players’ situations are indicated in italics (1, 2), and their moves are in bold. Actual situational changes are indicated by thick dotted lines. Both players are in situation 1 in the first round, and partner has the role of initiating the interaction. Reciprocal cooperation goes on until focal defects because his situation has changed (round 4). Partner is still in situation 1 but nevertheless defects in return. Defection goes on until focal reinstates cooperation after his situation has changed back to 1 (round 8).

use partner's behaviors as signals and not as mere symptoms of deviance. Key to this result is therefore the assumption that individuals' situations can change during interactions. If individuals' situations were definitively fixed at the beginning of each interaction (as in classic economic approaches to repeated games with incomplete information; Harsanyi 1967–1968; Aumann and Maschler 1995; see also Boyd 1992), then numerous histories would remain unreached, and the selective pressure stemming from true deviants would remain key at these histories. It is because situations can potentially change—even though they have some inertia—that each behavioral change can be interpreted as a signal.

At Every Unreached History, the Only Mutually Beneficial Behavior Is Defection. There is actually one family of histories that cannot be reached by two SDR partners. When one player has spontaneously defected, he is overtly in a non-cooperation-prone situation, and any cooperation on the part of his partner is a strict deviation from SDR. This poses a problem because the direction of selection on one's reaction to such a deviation depends exclusively on the average nature of the deviants that produce it (for details, see "The Puzzle of Reciprocity and Its Traditional Solutions"). Destabilization of SDR is prevented under two sufficient conditions. First, the product of the costs of cooperation should be larger than the product of the benefits (c1c2 ≥ b1b2; see "SDR Is Resistant to All Combinations of Rare Deviants" in the appendix). This guarantees that when a player is in a non-cooperation-prone situation, the only outcome that is mutually beneficial, in the short run, for his partner and for him is mutual defection. Second, the probability that a situational change occurs before the end of an interaction should be lower than a certain threshold (condition [A14] in "SDR Is Resistant to All Combinations

of Rare Deviants" in the appendix). This guarantees that when a player is in a non-cooperation-prone situation, the only outcome that is mutually beneficial, even in the long run, for his partner and for him is mutual defection. If both conditions hold, then there may be some pairs of strategies (twin, deviant) that lead to the replacement of SDR by a twin, but they do not destabilize SDR's behavior because they do not constitute mutually profitable deviations (see "Susceptibility to Invasion by Deviants" and "SDR Is Resistant to All Combinations of Rare Deviants" in the appendix).

Leimar (1997b) does not address the problem of deviant histories in the same manner. His analysis still requires the trembling hand assumption, which makes it always more likely to be confronting a trembling resident than a genuine deviant, and thus avoids constraints on parameter values. However, as a result, Leimar's (1997b) approach suffers from the general weakness of the trembling hand assumption. If an individual cooperates twice, thrice, or more, although SDR stipulates that she should defect, Leimar (1997b) assumes that she is a trembling SDR player, whereas she is very likely to be a genuine deviant. For that reason, Leimar (1997b) does not firmly show that the existence of payoff variability allows reciprocity to be a stable evolutionary endpoint. First, the trembling hand assumption is very costly, and second, this assumption would be sufficient on its own for reciprocity to be a stable evolutionary endpoint anyhow (e.g., in grim or Pavlov strategies). The present analysis shows that reciprocity can be a stable evolutionary endpoint in a very demanding sense when payoffs are variable and in the absence of assumptions about the nature of deviations, but this is true only under certain constraints on parameter values.

Evolutionary Dynamics

The results presented so far concern the stability of SDR once it is fixed. They do not tell us anything about the

Figure 5: Best reply to one's partner's behavior when situation-dependent reciprocity (SDR) is the resident. When one's partner behaves in line with SDR, from Bayesian principles she is most likely to actually be an SDR resident, and her past behavior reveals her situation, which makes it sensible to reciprocate (first and second pathways). One's partner behaves in a way that is incompatible with SDR when, and only when, she cooperates after one has spontaneously defected. In this case, the only behavior that is mutually beneficial for both players is to follow SDR and defect. More precisely, two things are possible. (1) When facing the deviant, the best reply is to stick to SDR and defect, because one does not benefit sufficiently from reciprocal cooperation (third pathway). (2) The deviant overreciprocates in such an advantageous way that the best reply is to deviate from SDR and cooperate, but it is then the deviant that obtains a negative payoff (i.e., she would have been better off defecting); such a deviant is therefore unable to increase in frequency (fourth pathway). In either case, in the global population, the majority of behaviors thus remain in line with SDR.

evolutionary dynamics leading (or not leading) to its fixation. There are two points worth mentioning on this issue. First, evolutionary dynamics can potentially lead to the fixation of SDR. A population of full defectors (AllD) can indeed be invaded by a combination of SDR and another strategy that follows SDR only if her partner cooperates first (called SDR#). The dynamics of the three strategies (AllD, SDR#, and SDR) easily lead to the complete fixation of SDR. Second, it is important to note that SDR is not the only equilibrium strategy in this game and that the evolutionary dynamics could well reach other strategies. Determining the likelihood of each possible outcome is beyond the scope of this article.

Discussion

Understanding reciprocity has been a challenge for evolutionary biologists since Williams (1966) and Trivers (1971) introduced this idea in the field. Many models have been developed, based on the IRPD as a paradigm. Surprisingly, few of these models explicitly raise the fundamental question of the rationale for reciprocity: why does a partner’s past social behavior convey reliable information about her future behavior? Answering this question should be the very first step of an evolutionary account of reciprocity.

Behaviors Inform about Preferences

Proximally, reciprocity is based on the fact that another’s social behavior informs one about his preferences (in a psychological sense). When Peter helps John, John infers that Peter had the intention to help him, and if Peter preferred helping John at time t, he is likely to do so in the future. Yet this raises more questions than it answers. It is not always true that current preferences provide information about future preferences. If Peter wears a blue T-shirt on Monday, John does not infer that he will consistently do so every day. Worse, even social preferences change. What, then, constitutes reliable information from social behaviors?
Moreover, from an evolutionary perspective, even taking for granted that certain behaviors do constitute reliable sources of information, the ultimate question remains unanswered: how can evolution have led to an equilibrium state in which certain behaviors reveal the future behaviors of their performers?

What Do Preferences Inform About, and Why?

Even though I consider here only direct reciprocity, let us take a more general perspective for a moment. In a flexible species, the preferences and resulting behaviors of an individual are plastic responses, calculated according to a strategy, as a function of a set of variables, called the individual’s situation, which determine the payoffs the individual obtains from each decision. This variable set includes local features of the situation at hand—for example, being in a prison versus being free—but also features that are more durable and specific to the individual, such as the individual’s health, corpulence, existing social bonds, social background, wealth, and skills (note that the term “situation” in this sense has a broader meaning than in the social psychology literature; e.g., Doris 2002; Zimbardo 2008). The behavior of an individual may thus convey two types of information. First, it may convey information on specific features of the individual’s strategy. Concerning reciprocity, this is the option chosen in the vast majority of evolutionary models, called social types models. Second, it may convey information on specific features of the payoffs experienced by the individual (also called the individual’s incentives). Concerning reciprocity, this is the option chosen in the alternative evolutionary approach, called the social incentives theory of reciprocity. The two types of information that behavior can convey in a repeated interaction may sometimes be difficult to disentangle in proximate terms, but their evolutionary implications differ considerably. From an evolutionary perspective, the question is not only to decide whether social behavior informs about X or Y but also to understand how this can be an outcome of evolution. Therefore, deciding between the social types and the social incentives theories of reciprocity amounts to answering the following question: at the evolutionary equilibrium of a repeated interaction, can one’s social behavior convey information about some specific features of one’s strategy or incentives?
Social Typology as a Rationale for Reciprocity: The Social Types Theory

The idea that the past behavior of a partner informs us about her future behavior via the disclosure of specific features of her strategy, or social type, and that this constitutes the basis of reciprocity is the one contemplated by the vast majority of evolutionary models (e.g., Nowak and Sigmund 1992, 1993; Roberts and Sherratt 1998; McNamara et al. 1999; Wahl and Nowak 1999; Hauert and Stenull 2002; Taylor and Day 2004; André and Day 2007). It also fits well with the commonsense understanding of reciprocity as a way to reward intrinsically good individuals (for a recent example, see, e.g., Nowak 2008). Yet this view of reciprocity relies on a very costly hypothesis that often remains largely unnoticed. In proximate terms, it is sufficient for simple social types to merely exist in order for current behaviors to convey enough information to sustain reciprocity. For instance, if a population is made of both consistently reciprocating and consistently defecting individuals, then individuals’ past behavior does reveal their future behavior. In ultimate terms, however, a much stronger assumption is required: there must be extrinsic constraints favoring simple social types over more flexible strategies. Otherwise, past behaviors do not remain indexes of future behaviors at evolutionary equilibrium (see “The Puzzle of Reciprocity and Its Traditional Solutions”). Implicitly, the idea behind the social types theory of reciprocity is hence that a cost of flexibility forces strategies to be somehow consistent. One’s past actions inform about one’s future actions because it would be impossible, difficult, or costly to change in the meantime. This very idea, more than the precise assumptions of any particular model stemming from this approach, is worth discussing.

The Cost of Flexibility Is Unlikely to Explain Behavioral Reciprocity. Regarding social behavior, human beings, for instance, are able to entertain numerous social interactions with different partners at the same time, to produce appropriate behaviors with each most of the time, and even to change behavior with a given partner. There is therefore no good reason for behavioral changes to be extremely costly or difficult. Said differently, flexibility is exactly what brains have evolved to produce: their role is to adapt phenotypes to current and varying incentives. Behavioral flexibility is, therefore, unlikely to be very costly for a species with a complex brain. Whereas the cost of cognitive flexibility can be taken into account in refined evolutionary models, it is very implausible for such a cost to be the single cause behind the evolution of a ubiquitous social phenomenon such as reciprocity. Reciprocity based on the existence of a physiological linkage between past and future actions, as assumed in social types models, may actually be evolutionarily stable when the linkage is mechanistically simple and hence strongly constrained (see Crowley and Sargent 1996).
This could be the case in certain nonbehavioral interactions (e.g., the plant-rhizobium interaction). Because cognitive systems are made precisely for flexibility, it is unlikely to be the case in interactions involving behavioral decisions. Here, it is illustrative to compare real behaviors with cheap talk (i.e., a promise to reciprocate). Evolutionarily and rationally speaking, it is well accepted that cheap talk is unreliable because it can easily be disconnected from behavior (promises are not binding; Schelling 1960). This raises a serious question for the social types theory of reciprocity: if promises are not reliable sources of information, why should real behaviors be? After all, both behaviors and promises involve mental representations that can be physiologically costly to disconnect from future behaviors. If the social types theory of reciprocity is valid, and if it is really the cost of cognitive flexibility that gives an informative value to past behaviors, then there should
be no clear-cut difference between cheap talk and real behavior in terms of their respective trustworthiness. But the truth is that there is a clear-cut difference, and it comes from the fact that real behaviors are costly. Because they are costly, real behaviors reliably inform about future behaviors, but they do so by conveying information on one’s payoffs, not by conveying information on one’s social type. This corresponds to the social incentives theory of reciprocity, not to the social types theory.

The Variability of Payoffs as a Rationale for Reciprocity: The Social Incentives Theory

Every individual receives specific benefits and pays specific costs in each social interaction (see Boyd 1992; Leimar 1997a, 1997b). The reciprocal exchange of help may yield a net benefit for certain individuals in certain social interactions and a net cost for other individuals, or for the same individuals in other social interactions. By developing a model (modified from Leimar 1997b) in which individuals can be in either of two situations (one in which they do benefit from reciprocal cooperation and one in which they do not) and in which situations can change in the very course of social interactions, this article shows that reciprocity can be a stable endpoint of evolution (see also Leimar 1997b). The idea behind this result is very simple. When individuals vary in their payoffs, behaviors evolve in such a way that they signal individuals’ current payoffs. It is then the inertia of payoffs, rather than the constrained inertia of behavior, that sustains reciprocity. One’s partner’s cooperation signals that she benefits from the reciprocal interaction and that she is likely to cooperate in the future. This makes it rational to cooperate in response. In contrast, one’s partner’s defection signals that she does not benefit from the reciprocal interaction, and this makes it rational to defect in response. But why is cooperation/defection a reliable signal?
After all, cooperation always bears a cost, and one’s partner could well defect even when she does benefit from reciprocal cooperation. If cooperation/defection reliably signals one’s underlying payoff (rather than one’s foolishness), it is in fact because reciprocity is present. Owing to the existence of reciprocity, only individuals who really do not benefit from reciprocal cooperation are likely to defect (for others, the likelihood of a retaliation is a deterrent). In turn, reciprocity is present because cooperation reliably signals one’s underlying payoff. The evolutionary mechanism stabilizing reciprocity is thus circular, and such circularity is actually general to every instance of costly signaling. In barn swallows, one cannot separate (1) the growth of males’ tails as a function of their body condition and (2) females’ response to tail length as a signal of body
condition. Both are interconnected because males would never benefit from producing a long tail if females did not happen to use it as a signal. The above mechanism is likely to take place in real life. First, payoff variability in social interactions exists. Individuals differ in skills, social background, wealth, physical attractiveness, and so on, and there is no reason why they should all benefit identically from every social interaction. Second, because the amount of time and resources available for social life is necessarily limited, and because some partners are more valuable than others (see Noë and Hammerstein 1995), there are certainly some social interactions that bear a net cost because they waste resources that could be used more efficiently in other interactions. Third, social payoffs do have some inertia. The situation one faces in a particular social interaction is an objective property of the world, not the product of a decision. Skills, wealth, social background, and so on cannot be changed at the individual’s whim, and hence they consistently affect his payoffs. Fourth, even though they have some inertia, social incentives do change. Individuals can change jobs, meet new friends, and so on, and this leads to changes in the payoff landscape they face. In sum, whereas the emergence of reciprocity as the outcome of social types involves questionable assumptions on the physiology and/or genetics of behavior, all the elements are likely to be present for reciprocity to emerge as the outcome of the variability of social incentives. When individuals have private information about their own payoffs, their social behavior evolves in such a way that it discloses this information, and this favors reciprocity (see also Leimar 1997b).

Reciprocity Is Based on Costly Signaling

Throughout the article, I have implicitly considered reciprocity to be the outcome of some form of costly signaling.
This calls for a justification. First, behaviors in the repeated prisoner’s dilemma are signals and responses under Maynard Smith and Harper’s (1995) definition (and under Scott-Phillips’s [2008] more stringent definition) because, in a reciprocal interaction, two conditions are met. (1) One’s behavior has evolved owing to its effects on one’s partner’s future behavior. (2) One’s reaction to a partner’s behavior has evolved to be influenced by this behavior (because a partner’s behavior conveys some information on her future behavior). This is true both in the social types and in the social incentives theories of reciprocity. Note that in either case, reaction to a partner is both a response and a signal intended to influence her future behavior; reciprocity is hence a succession of communication events. At odds with the above arguments, in an article aiming
at defining biological communication, Scott-Phillips (2008) argued that reciprocity is not the outcome of signaling because cooperative behaviors have not evolved to be influenced by a partner’s past behaviors but rather only to influence her future behaviors. This point of view is in line with the traditional opinion on reciprocity: a partner’s behavior conveys no information; it just so happens that she will reciprocate in the future, and therefore one is better off cooperating to trigger more cooperation. However, this has an immediate consequence: if a partner happens to defect rather than cooperate, one should not retaliate, because no information has been conveyed. Instead, one should still cooperate in order to influence her next behavior (Boyd and Lorberbaum 1987; see “The Puzzle of Reciprocity and Its Traditional Solutions”). Contra this traditional opinion (and contra Scott-Phillips 2008), under both the social types and the social incentives mechanisms, reciprocity evolves because others’ behavior conveys some information, and reciprocity then constitutes a genuine response to their behavior. A secondary question is whether the mechanism that renders signals reliable in reciprocity is really akin to costly signaling. This question has two different answers, depending on whether one considers social types or social incentives models. In the social types approach, current behaviors reliably reveal future behaviors because they constitute physiologically hard-to-fake signals (i.e., because current and future behaviors are difficult to disconnect for physiological reasons). The evolutionary mechanism at the basis of reciprocity in this approach is hence costly signaling in a broad sense (strictly speaking, the expression “hard-to-fake signaling” would be more appropriate).
In the social incentives mechanism, current behaviors reliably reveal future behaviors because faking the signal brings more cost than benefit (when one does not benefit from reciprocal cooperation, it is not worth cooperating to make one’s partner believe that one does). The evolutionary mechanism at the basis of reciprocity in this approach is hence costly signaling in the strict sense.

Some Consequences of the Social Incentives Theory of Reciprocity

On empirical grounds, the social types theory implies that reciprocity is based on the fact that past behaviors reveal individuals’ stable personality traits. Several important observations on human behavior are at odds with this implication. First, social psychology has shown at great length that personality traits explain but a moderate fraction of behavioral variance (e.g., Milgram 1963; Doris 2002; Zimbardo 2008). Second, the observed variance in personality (at least in humans) does not correspond to the range of simplistic types posited in social types models
(defectors, reciprocators, etc.). Third, firm reciprocity can occur long before two partners come to understand each other’s personality, including among partners who do not even attempt to understand it (e.g., in professional life). Fourth, partners frequently change behavior in the very course of interactions. In particular, many instances of reciprocity consist of responding to novel behaviors of already intimate partners, even though these behaviors are unlikely to reveal something new about their personality. The social incentives theory of reciprocity accounts much better for these observations. First, it explains why individuals can use their partner’s past behavior as a source of information, even when this behavior is in large part determined by situational factors. Second, it does not depend on the existence of a range of simplistic personality types. Third, and most importantly, it can account both for the stability and for the potential versatility of reciprocal interactions. Social incentives lie on a continuum of generality and stability, from very temporary social interests that may rapidly change to very durable preferences. Even stable personality traits can be understood as adaptive responses to lifelong incentives (see, e.g., Dall et al. 2004; Sih et al. 2004; Wolf et al. 2007). In sum, the social incentives theory of reciprocity implies neither that social behaviors should always be stable nor that they should always be versatile; rather, it predicts that they should be based on a continuum of social cues of varying degrees of generality concerning each partner.

Social Evaluations. Direct reciprocity is but one example of social feedback. We let our behaviors with potential partners depend on our knowledge of many of their other behaviors, including behaviors with third parties.
Understanding social evaluations in general—that is, the way we infer social properties of our partners from the knowledge of their behaviors—is a key challenge for the evolutionary study of social behavior. Considering the variability of social behavior as a reflection of the variability of social payoffs, rather than of social types, offers, I believe, a promising pathway in this endeavor. The principle is very simple: if an individual has performed behavior X in the past, one can infer that she is also likely to perform Y in the future, because some underlying incentives are common to both decisions. This principle could be at work in both direct and indirect reciprocity. Consider the example of cross-domain evaluations, the fact that one’s behavior in a given domain may help observers to evaluate one’s trustworthiness in other domains (e.g., marital life impacts professional life; see, e.g., Henrich and Henrich 2007, pp. 131–132). Social types theory explains cross-domain evaluations by the existence of constraints (i.e., a physiological linkage between behaviors in two domains), and this cannot help us to understand them,
at least not with evolutionary game theory. In the social incentives theory, cross-domain evaluations occur because some social incentives (but not all) are common to the two domains, and this is potentially understandable. For instance, the natural relationship we often tend to draw between one’s trustworthiness in pairwise interactions and one’s groupwise behaviors (e.g., contribution to a public good) can be understood in this framework if certain social incentives are common to pairwise and groupwise decisions (e.g., one’s benefit from having a good reputation in one’s local group).

Conclusion

In this article, I have shown that the current evolutionary understanding of reciprocity is based on the often implicit assumption that others’ past behaviors disclose some information about their stable social type. I have then proposed an alternative approach in which others’ social behaviors reveal the social incentives they face, which also makes it rational to reciprocate. Even though definitively deciding between these two alternatives is, in large part, an empirical issue, in the “Discussion” I have put forward several arguments that, I believe, support the social incentives approach. Finally, whereas social types models have generated abundant theoretical results, I have argued that surprisingly few of them match the subtlety of actual social behaviors. By contrast, based on the reasoning that social behaviors reveal the payoffs faced by individuals, the social incentives theory of reciprocity has the potential to help shed light on the subtlety of the evaluations that form the basis of social life.

Acknowledgments

Above all, I thank Y. Viossat for his invaluable help in game theory. I also thank T. Day for his kind help with English; N. Baumard and H. Mercier for their comments on a previous version of this work; D.
Sperber and the group Naturalisme et Sciences Humaines for many stimulating discussions on related topics; and three anonymous reviewers for their very constructive remarks. I thank the Centre National de la Recherche Scientifique for funding.

Literature Cited

André, J. B., and T. Day. 2007. Perfect reciprocity is the only evolutionarily stable strategy in the continuous iterated prisoner’s dilemma. Journal of Theoretical Biology 247:11–22.
Aumann, R. J., and M. B. Maschler. 1995. Repeated games with incomplete information. MIT Press, Cambridge, MA.
Bendor, J., and P. Swistak. 1995. Types of evolutionary stability and the problem of cooperation. Proceedings of the National Academy of Sciences of the USA 92:3596–3600.
Binmore, K., and L. Samuelson. 1992. Evolutionary stability in repeated games played by finite automata. Journal of Economic Theory 57:278–305.
Boerlijst, M. C., M. A. Nowak, and K. Sigmund. 1997. The logic of contrition. Journal of Theoretical Biology 185:281–293.
Boyd, R. 1992. The evolution of reciprocity when conditions vary. Pages 473–489 in A. H. Harcourt and F. B. M. de Waal, eds. Coalitions and alliances in humans and other animals. Oxford University Press, Oxford.
Boyd, R., and J. P. Lorberbaum. 1987. No pure strategy is evolutionarily stable in the repeated prisoner’s dilemma game. Nature 327:58–59.
Crowley, P. H., and R. C. Sargent. 1996. Whence tit-for-tat? Evolutionary Ecology 10:499–516.
Dall, S. R. X., A. I. Houston, and J. M. McNamara. 2004. The behavioural ecology of personality: consistent individual differences from an adaptive perspective. Ecology Letters 7:734–739.
Doris, J. M. 2002. Lack of character: personality and moral behavior. Cambridge University Press, Cambridge.
Farrell, J., and R. Ware. 1989. Evolutionary stability in the repeated prisoner’s dilemma. Theoretical Population Biology 36:161–166.
Fudenberg, D., and E. Maskin. 1990. Evolution and cooperation in noisy repeated games. American Economic Review 80:274–279.
Fudenberg, D., and J. Tirole. 1991. Game theory. MIT Press, Cambridge, MA.
Grafen, A. 1990. Biological signals as handicaps. Journal of Theoretical Biology 144:517–546.
Hammerstein, P. 2003. Why is reciprocity so rare in social animals? a protestant appeal. Pages 55–82 in P. Hammerstein, ed. Genetic and cultural evolution of cooperation. MIT Press, Cambridge, MA.
Harsanyi, J. 1967–1968. Games with incomplete information played by “Bayesian” players. Parts I–III. Management Science 14:159–182, 320–334, 486–502.
Hauert, C., and H. G. Schuster. 1997. Effects of increasing the number of players and memory size in the iterated prisoner’s dilemma: a numerical approach. Proceedings of the Royal Society B: Biological Sciences 264:513–519.
Hauert, C., and O. Stenull.
2002. Simple adaptive strategy wins the prisoner’s dilemma. Journal of Theoretical Biology 218:261–272.
Henrich, J., and N. Henrich. 2007. Why humans cooperate: evolution and cognition. Oxford University Press, Oxford.
Kandori, M., and I. Obara. 2006. Efficiency in repeated games revisited: the role of private strategies. Econometrica 74:499–519.
Krams, I., T. Krama, K. Igaune, and R. Mänd. 2008. Experimental evidence of reciprocal altruism in the pied flycatcher. Behavioral Ecology and Sociobiology 62:599–605.
Leimar, O. 1997a. Reciprocity and communication of partner quality. Proceedings of the Royal Society B: Biological Sciences 264:1209–1215.
———. 1997b. Repeated games: a state space approach. Journal of Theoretical Biology 184:471–498.
Lorberbaum, J. 1994. No strategy is evolutionarily stable in the repeated prisoner’s dilemma. Journal of Theoretical Biology 168:117–130.
Lorberbaum, J. P., D. E. Bohning, A. Shastri, and L. E. Sine. 2002. Are there really no evolutionarily stable strategies in the iterated prisoner’s dilemma? Journal of Theoretical Biology 214:155–169.
Maynard Smith, J. 1982. Evolution and the theory of games. Cambridge University Press, Cambridge.
Maynard Smith, J., and D. G. C. Harper. 1995. Animal signals: models and terminology. Journal of Theoretical Biology 177:305–311.

Maynard Smith, J., and G. R. Price. 1973. The logic of animal conflict. Nature 246:15–18.
McNamara, J., C. Gasson, and A. Houston. 1999. Incorporating rules for responding into evolutionary games. Nature 401:368–371.
Melis, A. P., B. Hare, and M. Tomasello. 2008. Do chimpanzees reciprocate received favours? Animal Behaviour 76:951–962.
Milgram, S. 1963. Behavioral study of obedience. Journal of Abnormal Psychology 67:371–378.
Mitani, J. 2005. Reciprocal exchange in chimpanzees and other primates. Pages 101–113 in P. M. Kappeler and C. P. van Schaik, eds. Cooperation in primates and humans: mechanisms and evolution. Springer, Heidelberg.
Noë, R., and P. Hammerstein. 1995. Biological markets. Trends in Ecology & Evolution 10:336–339.
Nowak, M. A. 2008. Generosity: a winner’s advice. Nature 456:579.
Nowak, M. A., and K. Sigmund. 1992. Tit for tat in heterogeneous populations. Nature 355:250–253.
———. 1993. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature 364:56–58.
Roberts, G., and T. N. Sherratt. 1998. Development of cooperative relationships through increasing investment. Nature 394:175–179.
Schelling, T. C. 1960. The strategy of conflict. Harvard University Press, Cambridge, MA.
Scott-Phillips, T. C. 2008. Defining biological communication. Journal of Evolutionary Biology 21:387–395.
Selten, R. 1965. Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit. Zeitschrift für die gesamte Staatswissenschaft 121:301–324.
———. 1973. A simple model of imperfect competition, where 4 are few and 6 are many. International Journal of Game Theory 2:141–201.
———. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory 4:25–55.
———. 1983. Evolutionary stability in extensive two-person games. Mathematical Social Sciences 5:269–363.
———. 1988. Evolutionary stability in extensive two-person games: correction and further development. Mathematical Social Sciences 16:223–266.
Sih, A., A. Bell, and J. C. Johnson. 2004. Behavioral syndromes: an ecological and evolutionary overview. Trends in Ecology & Evolution 19:372–378.
Spence, A. M. 1974. Market signaling: informational transfer in hiring and related screening processes. Harvard University Press, Cambridge, MA.
Stevens, J. R., and M. D. Hauser. 2004. Why be nice? psychological constraints on the evolution of cooperation. Trends in Cognitive Sciences 8:60–65.
Sugden, R. 1986. The economics of rights, cooperation and welfare. Blackwell, Oxford.
Taylor, P. D., and T. Day. 2004. Stability in negotiation games and the emergence of cooperation. Proceedings of the Royal Society B: Biological Sciences 271:669–674.
Trivers, R. 1971. The evolution of reciprocal altruism. Quarterly Review of Biology 46:35–57.
Wahl, L. M., and M. A. Nowak. 1999. The continuous prisoner’s dilemma. I. Linear reactive strategies. Journal of Theoretical Biology 200:307–321.
Whitlock, M. C., B. H. Davis, and S. Yeaman. 2007. The costs and
benefits of resource sharing: reciprocity requires resource heterogeneity. Journal of Evolutionary Biology 20:1772–1782.
Williams, G. C. 1966. Adaptation and natural selection. Princeton University Press, Princeton, NJ.
Wolf, M., G. S. van Doorn, O. Leimar, and F. J. Weissing. 2007. Life-history trade-offs favour the evolution of animal personalities. Nature 447:581–584.

Zahavi, A. 1975. Mate selection: a selection for a handicap. Journal of Theoretical Biology 53:205–214.
Zimbardo, P. G. 2008. The Lucifer effect: understanding how good people turn evil. Random House, New York.

Associate Editor: Stuart West
Editor: Ruth G. Shaw


© 2009 by The University of Chicago. All rights reserved. DOI: 10.1086/649597

Appendix from J.-B. André, “The Evolution of Reciprocity: Social Types or Social Incentives?” (Am. Nat., vol. 175, no. 2, p. 000)

Supplemental Information

Susceptibility to Invasion by Deviants

Consider an alternate indefinitely repeated prisoner’s dilemma (IRPD) between two partners. P(S1, S2) is the expected payoff of an individual playing strategy S1 when confronted with an individual playing S2. Consider a resident strategy, called resident, and a twin of this strategy, called twin.

Nash Equilibrium

Resident is a Nash equilibrium if and only if

∀S ≠ resident, P(S, resident) ≤ P(resident, resident). (A1)

If equation (A1) is fulfilled with a strict inequality for every S, resident is a strict Nash equilibrium and a strict evolutionarily stable strategy (ESS) by virtue of Maynard Smith’s first condition (Maynard Smith and Price 1973). Resident is also a stable evolutionary endpoint: as long as deviants are sufficiently rare, any deviant obtains a strictly lower payoff than resident. However, if resident has twins (i.e., strategies deviating only at unreached histories), then equation (A1) is fulfilled with mere equality for the twin(s); that is, P(twin, resident) p P(resident, resident). Maynard Smith’s first condition is therefore neutral. Also, Maynard Smith’s second condition is not helpful. It consists in comparing the respective payoffs of resident and twin in front of twin, and these payoffs are also equal (i.e., P(resident, twin) p P(twin, twin)). Therefore, as long as resident has twin(s), it is at most a neutrally stable strategy (Maynard Smith 1982), because Maynard Smith’s first and second conditions are both neutral. Susceptibility to Combinations of Deviants Besides being at most a neutrally stable strategy and, more significantly, when resident has twins, it is at risk of being destabilized because the fitness difference between twin and resident depends on the nature of other deviants. Consider then a second deviant strategy, called deviant, that may deviate from resident at any history. Resident is said to be susceptible to invasion by the couple (deviant, twin) if and only if P(twin, deviant) 1 P(resident, deviant),

(A2)

P(deviant, twin) > P(resident, twin).

(A3)

Conditions (A2) and (A3) stipulate that the two deviant strategies are mutually beneficial deviations (see “Unreached Histories and Twin Strategies” in the main text).
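The definitions above can be sketched as executable checks. The payoff numbers below are invented toy values (not from the model), chosen only so that resident is a Nash equilibrium, twin ties it exactly (as a strategy deviating only at unreached histories does), and the pair (deviant, twin) satisfies conditions (A2) and (A3):

```python
# P[(A, B)]: expected payoff of strategy A when confronted with strategy B.
# All numbers are arbitrary toy values used only to illustrate the definitions.

def is_nash(P, resident, strategies):
    """Condition (A1): no strategy does strictly better against resident."""
    return all(P[(S, resident)] <= P[(resident, resident)] for S in strategies)

def pair_invades(P, resident, deviant, twin):
    """Conditions (A2) and (A3): the two deviations are mutually beneficial."""
    return (P[(twin, deviant)] > P[(resident, deviant)]
            and P[(deviant, twin)] > P[(resident, twin)])

P = {
    ("res", "res"): 3.0, ("twin", "res"): 3.0, ("dev", "res"): 2.0,
    ("res", "twin"): 3.0, ("twin", "twin"): 3.0, ("dev", "twin"): 4.0,
    ("res", "dev"): 1.0, ("twin", "dev"): 2.0, ("dev", "dev"): 1.5,
}
strategies = ["res", "twin", "dev"]
```

With these numbers, resident passes the Nash test (A1) and yet is susceptible to invasion by the couple (deviant, twin).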

The Indefinitely Repeated Prisoner's Dilemma with Two Situations

Consider a pairwise alternate IRPD with perfect information (i.e., the past sequence of moves is fully recalled by both players). Each partner can leave the interaction with a constant probability d before each of his moves (dotted lines in fig. 2). Each individual can be, at every moment, in one of two situations (see fig. 3), and its situation can change with constant probabilities j12 and j21, from situation 1 to 2 and backward (dotted lines in fig. 2); j12 and j21 do not depend on the history of the interaction (in contrast with, e.g., Whitlock et al. 2007).


In situation 1, individuals pay a cost c1 for cooperating and receive a benefit b1 > c1 from their partner's cooperation. In situation 2, individuals pay a cost c2 for cooperating and receive a benefit b2 ≤ c2 from their partner's cooperation.

Situation-Dependent Reciprocity Is a Stable Evolutionary Endpoint

Situation-Dependent Reciprocity

The strategy called situation-dependent reciprocity (SDR) is defined as follows (see also fig. 4): When in situation 2, always defect. When in situation 1, defect if you were the last of the pair who cooperated in the past,

(A4)

cooperate otherwise.

In the following, it will be shown that SDR is a Nash equilibrium and that it is not susceptible to combinations of rare deviants (in the sense defined in "Susceptibility to Invasion by Deviants"). Other strategies also fulfill these two conditions, but they are not considered in this article.

SDR Is a Nash Equilibrium

Define an information set at time t for player i in its interaction with player j as a set of states of the world that player i cannot distinguish. In this model, an information set at t for player i is one among all the possible series of past moves by both players (called a history of the interaction) plus player i's own situation at t. Consider a focal player, called focal, interacting with a given partner, called partner, who follows the SDR strategy. The aim is to show that there is no better strategy for focal than SDR itself, that is, that focal has no incentive to deviate once or several times from SDR. This problem is simplified as follows.

Proposition: SDR is a Nash equilibrium if, for any round t at which focal has to make a move and any information set h_t focal can be facing if partner has always followed SDR (i.e., h_t may include some past deviations by focal but not by partner), there is no single deviation that can be profitable to focal.

This proposition is proven by its contrapositive: if SDR is not a Nash equilibrium, there exists h_t such that focal can gain by deviating once at t (see, e.g., Fudenberg and Tirole 1991, p. 110). The proof is as follows: if SDR is not a Nash equilibrium, there exists an alternative strategy ŝ deviating at one or several information sets and reaching a larger payoff than SDR. As a result, there also exists a time t′ such that a strategy ŝ′, playing like ŝ until t′ and like SDR afterward, reaches a larger payoff than SDR. This strategy ŝ′ has a last profitable deviation, which is therefore a single profitable deviation.
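The decision rule (A4) can be written as a minimal executable sketch. The history encoding (a list of (player, move) pairs, with moves "C" or "D") is my own convention, not from the article:

```python
def sdr_move(situation, history, me):
    """SDR's move (A4): "C" (cooperate) or "D" (defect).

    situation: 1 or 2 (the mover's current situation).
    history: list of (player, move) pairs, oldest first.
    me: identifier of the player about to move.
    """
    if situation == 2:          # in situation 2, always defect
        return "D"
    # in situation 1: defect iff I performed the last cooperation
    for player, move in reversed(history):
        if move == "C":
            return "D" if player == me else "C"
    return "C"                  # nobody has cooperated yet: initiate
```

For example, a player in situation 1 cooperates to initiate an interaction, keeps cooperating as long as the partner cooperated last, and defects while waiting for a response after having performed the last cooperation.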
As a result, in order to show that SDR is a Nash equilibrium, I will consider every possible information set h_t (just before focal makes his move) at which partner has always followed SDR, assume that partner will always follow SDR in the future, and check that focal cannot benefit from deviating once at h_t and following SDR afterward.

Before proceeding with this analysis, let me define two useful variables. Consider focal just before he makes his move in an interaction with partner, and define focal's expected future payoff, if both of them follow the SDR strategy in the future, in two cases. (1) P_i^C(p) is focal's expected future payoff when he is in situation i, partner is the last of the pair who has cooperated, and partner was in situation 1 with probability p in the past round. For instance, P_1^C(1) is focal's expected payoff when partner has just cooperated in the past round. (2) P_{i,2C} is focal's expected future payoff when he is in situation i and focal is the last of the pair who has cooperated.

Partner Performed the Last Cooperation

Here, I consider information sets at which partner is not overtly in situation 2. This includes both the cases in which the last cooperative act was performed by partner (either in the preceding round or several rounds ago) and the case in which focal has to initiate the interaction. Call p the probability that partner is in situation 1 in round t − 1, just before focal makes his move. If partner's last cooperative act just occurred in round t − 1, then p = 1. If the interaction is beginning, the probability that partner is in situation 1 is the average probability for a random new partner to be in situation 1 (p = p1^0 = j21/(j12 + j21)). As long as j12 + j21 ≤ 1 (a condition that will be required later on for other reasons), all the other cases are intermediate: when partner's last cooperative act occurred several rounds ago, the probability that partner is in situation 1 lies between p1^0 and 1. For mathematical convenience, I will introduce p′, the probability that partner is in situation 1 in round t + 1, when she responds to focal's behavior. This last probability is p′ = p(1 − j12) + (1 − p)j21.

Recall that the analysis aims at computing and comparing focal's expected future payoffs if he defects or cooperates at t (assuming that both focal and partner will follow SDR afterward). By considering all the events that can occur in rounds t + 1 and t + 2, the payoff focal obtains if he cooperates at t can be expressed as

$$K_i^C(p) = -c_i + p'(1-d)\left[b_i + (1-j_{ij})(1-d)P_i^C(1) + j_{ij}(1-d)P_j^C(1)\right] + (1-p')(1-d)\left[(1-j_{ij})(1-d)P_{i,2C} + j_{ij}(1-d)P_{j,2C}\right].$$

The payoff he obtains if he defects can be expressed as

$$Q_i^C(p) = p'(1-d)\left[(1-j_{ij})(1-d)P_i^C(1) + j_{ij}(1-d)P_j^C(1)\right] + (1-p')(1-d)\left[(1-j_{ij})(1-d)P_i^C(0) + j_{ij}(1-d)P_j^C(0)\right].$$

The difference between the two payoffs is D_i^C(p) = K_i^C(p) − Q_i^C(p), which yields

$$D_i^C(p) = -c_i + p'(1-d)b_i + (1-p')(1-d)^2\left[(1-j_{ij})\Delta_i + j_{ij}\Delta_j\right],$$

(A5)

with Δ_i = P_{i,2C} − P_i^C(0). Equation (A5) has three terms. If focal cooperates rather than defects in round t, he (1) pays a cost c_i and (2) receives a benefit b_i if his partner is still present and in situation 1 at the moment of responding (probability p′(1 − d)). (3) The third term has to do with the benefit of information (see also Kandori and Obara 2006). After focal cooperates at t, partner's behavior at t + 1 reveals her situation with certainty (situation 1 if she cooperates, situation 2 if she defects), which is not the case if focal defects. This generates an extra benefit, gained when partner is in situation 2 and the interaction lasts at least until t + 2 (probability (1 − p′)(1 − d)²), that is equal to (1 − j_ij)Δ_i + j_ij Δ_j, where Δ_i represents the difference between focal's payoffs when partner is in situation 2 and focal in situation i, depending on whether focal has the information about partner's situation. These two differences are both positive and can be computed by extending the analysis to rounds t + 3 and t + 4:

$$\Delta_1 = c_1, \qquad \Delta_2 = \frac{j_{21}(1-d)\left[b_2 + (1-j_{21})(1-d)c_1\right]}{1-(1-j_{21})^2(1-d)^2}.$$

(A6)
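As a quick numerical sanity check of equation (A6), with arbitrary sample parameter values (not from the article), both information benefits come out nonnegative, as claimed above:

```python
def deltas(b2, c1, j21, d):
    """Information benefits (A6): Delta_1 and Delta_2."""
    d1 = c1
    d2 = (j21 * (1 - d) * (b2 + (1 - j21) * (1 - d) * c1)
          / (1 - (1 - j21) ** 2 * (1 - d) ** 2))
    return d1, d2

# Arbitrary sample values: b2 = 0.5, c1 = 1.0, j21 = 0.1, d = 0.2.
d1, d2 = deltas(0.5, 1.0, 0.1, 0.2)
```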

Focal is in situation 1. In this case, SDR stipulates that focal should cooperate. From equations (A5) and (A6), the sufficient and necessary condition guaranteeing that cooperation is a best reply is

−c1 + p′(1 − d)b1 + (1 − p′)(1 − d)²[(1 − j12)c1 + j12Δ2] ≥ 0.

(A7)

Acknowledging the fact that Δ2 ≥ 0, it is possible to derive the sufficient condition

$$\frac{b_1}{c_1} \ge \frac{1-(1-p')(1-j_{12})(1-d)^2}{p'(1-d)}.$$

(A8)

At worst, partner is unknown and her likelihood of being in situation 1 is p′ = p1^0 = j21/(j12 + j21). Equation (A8) thus leads to the sufficient condition

$$\frac{b_1}{c_1} \ge \frac{j_{12}+j_{21}-j_{12}(1-j_{12})(1-d)^2}{j_{21}(1-d)}.$$

(A9)
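As a consistency check, substituting p′ = p1^0 = j21/(j12 + j21) into the right-hand side of (A8) should reproduce the right-hand side of (A9). A quick numerical verification, with arbitrary parameter values:

```python
def rhs_a8(pprime, j12, d):
    """Right-hand side of the bound (A8)."""
    return (1 - (1 - pprime) * (1 - j12) * (1 - d) ** 2) / (pprime * (1 - d))

def rhs_a9(j12, j21, d):
    """Right-hand side of the bound (A9)."""
    return (j12 + j21 - j12 * (1 - j12) * (1 - d) ** 2) / (j21 * (1 - d))
```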

Inequality (A9) is a sufficient (but not necessary) condition for cooperation to be a best reply at every information set where focal is in situation 1 and partner is not overtly in situation 2. It stipulates that cooperative interactions must be sufficiently stable (j12 and d not too high) and that the relative proportion of individuals in situation 1 (j21/j12) must not be too low. The sufficient and necessary condition for SDR to be a best reply in this case is given by equation (A7).

Focal is in situation 2. In this case, SDR is a best reply if and only if D_2^C(p) ≤ 0. Equations (A5) and (A6) yield

$$D_2^C(p) \le \frac{b_2-c_2}{2-j_{21}}(1-j_{21}+p') + \frac{1-p'}{2-j_{21}}(c_1-c_2).$$

(A10)

The second term on the right-hand side of equation (A10), [(1 − p′)/(2 − j21)](c1 − c2), is strictly positive iff c2 < c1. This raises a problem, because it might then fail to guarantee D_2^C(p) ≤ 0. Again, cooperating provides one with information about one's partner at a cost c_i. If this cost is strictly lower in situation 2 than in situation 1 (c2 < c1), then it might be beneficial to test one's partner when in situation 2 in order to obtain the information at a cheaper price. In other words, the benefit of information could outweigh the net cost of cooperation. A sufficient condition for this not to occur is that the upper bound of D_2^C(p) given in equation (A10) be negative. This yields the condition

$$\frac{c_1-c_2}{c_2-b_2} \le \frac{1-j_{21}+p'}{1-p'}.$$

(A11)

At worst, partner is unknown and p′ = p1^0 = j21/(j12 + j21). Equation (A11) thus leads to the sufficient condition

$$\frac{c_1-c_2}{c_2-b_2} \le (1-j_{21}) + \frac{j_{21}}{j_{12}}(2-j_{21}).$$

(A12)
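The same kind of consistency check applies here: substituting p′ = p1^0 = j21/(j12 + j21) into the right-hand side of (A11) should reproduce the right-hand side of (A12). A quick numerical verification, with arbitrary parameter values:

```python
def rhs_a11(pprime, j21):
    """Right-hand side of the bound (A11)."""
    return (1 - j21 + pprime) / (1 - pprime)

def rhs_a12(j12, j21):
    """Right-hand side of the bound (A12)."""
    return (1 - j21) + (j21 / j12) * (2 - j21)
```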

Focal Performed the Last Cooperation

Here, I consider information sets at which partner is overtly in situation 2: both players were cooperating, or partner had the role of initiating the interaction, and partner started to defect and never reinstated cooperation afterward. In this case, if focal defects in round t, he obtains the same payoff as if he cooperates, except that he does not pay the cost c_i. Defection is thus the best option.

SDR Is Resistant to All Combinations of Rare Deviants

"SDR Is a Nash Equilibrium" has demonstrated that, under reasonable constraints on parameter values (summed up in "Summary of the Constraints on Parameter Values"), there is never any incentive to deviate from the SDR strategy when confronting a partner who follows it; SDR is a Nash equilibrium. This is not sufficient, however, to guarantee that SDR is a stable evolutionary endpoint. One must also check that SDR is not susceptible to combinations of rare deviants (in the sense defined in "Susceptibility to Invasion by Deviants").

Unreached Information Sets

SDR generates, on its own, a large fraction of the possible behavioral variability. As long as the two transition rates (j_ij) are strictly positive (and significantly larger than mutation rates), a behavioral change on the part of one's partner is more likely to be the symptom of a situational change than of genuine deviance. There is, however, an important family of information sets that cannot be reached when one's partner is following SDR. After a focal player spontaneously defects, SDR stipulates that his partner should defect in response and wait for the focal player to possibly reinstate cooperation later on (see, e.g., fig. 4). If the partner cooperates instead, then she is necessarily a deviant. Call such deviants insistent individuals. The reaction to insistent individuals is neutral in front of SDR, because it is never implemented.
This raises a difficulty, because it is exclusively the average nature of insistent deviants that drives selection on this reaction. Confronting this difficulty, Leimar (1997b) adopts the traditional trembling hand assumption. In such a perspective, by Bayesian principles, insistent individuals are most likely to be strict SDR individuals making a mistake, and the best reaction to them is to follow SDR. This explains why, in the article by Leimar (1997b), SDR is a limit ESS in Selten's (1983) sense. Avoiding the trembling hand assumption, as I wish to do, raises a critical difficulty. Consider the following possibility. A deviant follows SDR except that, when she is in a cooperation-prone situation and her partner spontaneously defects, she still insists on cooperating once (or twice or more) and would agree to cooperate forever, provided her partner cooperates every other round (or every k rounds). Insistent cooperation is a "secret handshake" that the deviant uses to propose a mutually beneficial, asymmetrical deal to her partner. The best reaction to the handshake might well be to accept the deal and thus deviate from SDR. This may, in turn, fully destabilize SDR, because it would then become advantageous to lie and pretend to be in a non-cooperation-prone situation even when one is not.

In the absence of the trembling hand assumption, SDR may still be evolutionarily robust, but this requires additional constraints. Consider a deviant individual, called deviant, that signals himself by cooperating when SDR stipulates that he should defect. Consider also a twin of SDR, called twin, that responds to deviant's first deviation by deviating in the continuation game. In the following, constraints on parameter values are found such that there does not exist any combination (twin, deviant) for which conditions (A2) and (A3) are both met. When deviant cooperates when he should have defected, he pays an immediate cost. For condition (A3) to be met, this cost must be compensated by a deviation in response (a cooperation) on the part of twin. For condition (A2) to be met, twin must benefit from her own deviation. This can occur in two distinct ways. Case 1: both deviations can be mutually beneficial even in the absence of any situational change, that is, even if deviant remains in situation 1 and twin in situation 2 until the end of the interaction.
Case 2: the two deviations may be mutually beneficial only if at least one situational change occurs before the end of the interaction (deviant moves to situation 2 and/or twin to situation 1). In the following, I find conditions on parameter values such that neither of these two possibilities can occur.

Let me first consider case 1. We must find conditions on parameter values such that cooperation between deviant in situation 1 and twin in situation 2 cannot be mutually beneficial. The initial cooperation by deviant costs him c1 and gives twin a benefit b2. The compensation by twin consists in an expected number k of cooperation events (k can be lower than 1). The initial cooperation and its payment are mutually profitable deviations if and only if c1 < kb1 and b2 > kc2. These conditions are violated for every k if and only if

c1c2 ≥ b1b2.

(A13)
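The argument above reduces to an interval condition: a compensation level k with c1 < kb1 and b2 > kc2 exists iff c1/b1 < b2/c2, that is, iff c1c2 < b1b2. A minimal check, with arbitrary sample values:

```python
def mutually_profitable_k_exists(b1, c1, b2, c2):
    """True iff some k in the open interval (c1/b1, b2/c2) exists,
    i.e., iff condition (A13), c1*c2 >= b1*b2, is violated."""
    return c1 / b1 < b2 / c2
```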

Condition (A13) implies that cooperation in situation 2 requires an extremely large back payment to become advantageous, a payment that would cancel out the benefit of cooperation for one's partner.

Let me now consider case 2. Under this scenario, cooperation between deviant in situation 1 and twin in situation 2 cannot be mutually profitable: at least one of the two players has a negative payoff at the end of this episode (i.e., a payoff strictly lower than what he would have obtained had he not deviated from SDR). But this initial cost of deviation is subsequently compensated by a payment occurring after a situational change on the part of one player (a payment that may hence never occur). This second possibility is impossible too, provided situational changes are sufficiently slow. First, because any interaction's expected duration is finite and one's payoff at each round is finite, the maximal possible expected payment is itself finite. Second, a payment occurring after a situational change has a small probability of taking place if the rates of situational change are both low (because the interaction might end beforehand). As a result, for any strictly positive initial cost, there exist threshold rates of situational change below which the expected future payment is lower than this initial cost.

Let me now find a conservative estimate of this threshold. First, the expected payment one can obtain from an interaction is lower than Bmax = max(b1, b2)/d (where 1/d is the expected duration of the interaction). Second, at the end of the episode of interaction between deviant in situation 1 and twin in situation 2, the cost that the worst-off player has paid is necessarily larger than Cmin = min(c1c2/b1 − b2, c1 − b1b2/c2) (Cmin is a highly conservative minimum). Third, the probability that the interaction ends before the first situational change (or in the same round) is k = d/(j12 + j21 + d − j12j21 − dj12 − dj21 + dj12j21). Therefore, a sufficient condition for the expected payment to be lower than the initial cost is

Cmin ≥ Bmax × (1 − k).

(A14)

Assuming that the three probabilities are small, such that their products can be neglected, yields the more illustrative condition


$$\frac{j_{12}+j_{21}}{j_{12}+j_{21}+d} \le \frac{\min(c_1c_2/b_1-b_2,\ c_1-b_1b_2/c_2)}{\max(b_1,b_2)} \times d.$$

(A15)
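A quick check (with arbitrary small rates) that neglecting products of probabilities in k gives 1 − k ≈ (j12 + j21)/(j12 + j21 + d), the approximation used to pass from condition (A14) to condition (A15):

```python
def kappa_exact(j12, j21, d):
    """Exact probability k that the interaction ends before (or with)
    the first situational change."""
    return d / (j12 + j21 + d - j12 * j21 - d * j12 - d * j21 + d * j12 * j21)

def one_minus_kappa_approx(j12, j21, d):
    """Approximation of 1 - k obtained by neglecting products of
    the three probabilities."""
    return (j12 + j21) / (j12 + j21 + d)
```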

Note that condition (A15) implies condition (A13) (i.e., c1c2 − b1b2 must be positive). If condition (A15) is fulfilled, then there is no couple (twin, deviant) such that both are profitable deviations in front of each other. If twin is strictly better than SDR in front of deviant (condition [A2] is met), then SDR is necessarily better than deviant in front of twin (condition [A3] is violated). Vice versa, if deviant is strictly better than SDR in front of twin (condition [A3] is met), then SDR is necessarily better than twin in front of deviant (condition [A2] is violated). This has the following consequence: there might exist a twin better than SDR in front of a given deviant, and this twin can rise to fixation (if condition [A2] is met). But if twin becomes fixed and becomes the new resident, deviant is not favored and remains rare (because condition [A3] is then necessarily violated). Therefore, twin does not express its difference relative to SDR. As a result, the vast majority of actions expressed at equilibrium remain exactly identical to the set of actions expressed by SDR in front of itself.

Summary of the Constraints on Parameter Values

This section summarizes the sufficient conditions for SDR to be a stable evolutionary endpoint. (1) Cooperation must be beneficial in situation 1 but not in situation 2; that is, b1 > c1 and b2 ≤ c2. (2) Cooperative interactions must be frequent and stable enough for cooperation to be paid back; this leads to the sufficient condition (A9) (the sufficient and necessary condition is condition [A7]). (3) The benefit of information must not outweigh the cost of cooperation in situation 2; this leads to the sufficient condition (A12). (4) Mutually beneficial cooperation must be impossible when a player is in situation 2; this implies the sufficient and necessary condition (A13). (5) Payments after a future situational change must be unattractive.
This implies the sufficient condition (A14) and (under the assumption that d, j12, and j21 are all small) the more illustrative sufficient condition (A15).

Overall, these five conditions are sufficient, but not necessary, for SDR to be a stable evolutionary endpoint. Even though each is realistic, taken together they are indeed restrictive. Let me consider them one by one. Condition 1 does not raise any particular issue: in any given biological system, it is always possible to categorize social interactions depending on whether they are beneficial or costly. Condition 2 is only moderately constraining. It shows that reciprocal cooperation evolves only if interactions are sufficiently stable and if there are not too few individuals who benefit from it. This is sensible and should easily be met in at least some biological systems. Condition 5 has two consequences. First, it restricts the applicability of the model to cases where interactions are short relative to the rate at which individuals change situation. Otherwise, "metareciprocity" could evolve, in which payments occur after players' situations have changed. Second, together with conditions 3 and 4, it restricts the range of allowed values for bi and ci. This probably constitutes the strongest limitation, because it prevents the model from being applicable per se to social behavior. In real life, individuals are not in either one of two opposite situations but rather lie on a continuum, and at some point an individual would arise with bi and ci that violate condition 3, 4, or 5. This constitutes an undeniable limitation of the model's applicability. I believe that the solution to this last problem lies in further developing the model in the direction of more continuous payoff variability, such that individuals' behavior can inform others about subtle, quantitative aspects of their social incentives rather than in an all-or-nothing manner.
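The five conditions above can be gathered into a single checker. This is a sketch using the sufficient forms (A9), (A12), (A13), and (A15); it assumes b2 < c2 strictly (so the division in condition 3 is defined), and the parameter values in the test are arbitrary ones chosen to satisfy all five conditions:

```python
def sdr_conditions(b1, c1, b2, c2, j12, j21, d):
    """Return the five sufficient conditions for SDR stability as booleans.
    Uses the sufficient forms (A9), (A12), (A13), (A15); assumes b2 < c2."""
    cond1 = b1 > c1 and b2 <= c2
    # (A9): cooperation paid back often enough
    cond2 = b1 / c1 >= (j12 + j21 - j12 * (1 - j12) * (1 - d) ** 2) / (j21 * (1 - d))
    # (A12): information benefit does not justify testing in situation 2
    cond3 = (c1 - c2) / (c2 - b2) <= (1 - j21) + (j21 / j12) * (2 - j21)
    # (A13): no mutually beneficial deal with a situation-2 player
    cond4 = c1 * c2 >= b1 * b2
    # (A15): payments after a situational change are unattractive
    cmin = min(c1 * c2 / b1 - b2, c1 - b1 * b2 / c2)
    cond5 = (j12 + j21) / (j12 + j21 + d) <= cmin / max(b1, b2) * d
    return cond1, cond2, cond3, cond4, cond5
```

For instance, sdr_conditions(10, 1, 0.5, 20, 0.0003, 0.0003, 0.1) satisfies all five conditions, illustrating that the parameter region, while restrictive, is nonempty.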
The idea put forward in this article is that social behaviors reveal something about individuals’ incentives. Yet the simplicity of the present model still makes it possible for certain behaviors to reveal nothing but deviance. In a more comprehensive model, any behavioral variation should be interpretable as a signal of some underlying incentives.
