Iterated Prisoners Dilemma with limited attention

How does attention scarcity affect the outcome of a game? We present our findings on a version of the Iterated Prisoners Dilemma (IPD) game in which players can accept or refuse to play with their partner. We study the effect of memory size on determining the right partner to interact with, and we investigate the conditions under which cooperators are more likely to be at an advantage over defectors. This work demonstrates that, in order to beat defection, players do not need to memorize every action of every opponent. There exists a critical attention capacity threshold above which cooperators beat defectors. This threshold depends not only on the ratio of defectors in the population but also on the attention allocation strategy of the players.


Introduction
Games and economic models are more closely interrelated than one might expect [1]. This is also the case for social interactions. A simple virtual setting for simulating trust in e-commerce is the Iterated Prisoners Dilemma game, which is by its nature closely related to the evolution of trust [2,3]. Each transaction in an e-commerce setting can be viewed as a round in an iterated prisoner's dilemma game. Adherence to electronic contracts or providing services of good quality can be considered as cooperation, while the temptation to act deceptively for immediate gain can be considered as defection.
Economics is the study of how to allocate scarce resources. According to Davenport, the scarcest resource today is attention [4]. Attention scarcity was first described by Herbert Simon, who wrote: "What information consumes is rather obvious: it consumes the attention of its recipients" [5]. The digital age has brought a vast amount of immediately available information that exceeds our information processing power. Thus, attention scarcity is a natural consequence of the huge amount of information. Attention is critical to any kind of interaction, especially in the era of digital technologies. The conventional economy has been transforming itself into the attention economy [4,6,7,8]. Games should do the same. Little work has been done on games with limited attention. How does attention scarcity affect a game? We discuss attention games in the specific context of the Iterated Prisoners Dilemma.

Iterated Prisoners Dilemma game
The Prisoners Dilemma game is one of the most commonly studied social experiments [2,3,9,10,11,12]. Two players simultaneously select one of two actions, cooperation or defection, and play accordingly with each other. Depending on their choices, they receive different payoffs, as seen in figure 1.
The payoff matrix can be described by the following simple rules. In the case of mutual cooperation, both players receive the reward payoff, R. If one cooperates while the other defects, the cooperator gets the sucker's payoff, S, while the defector gets the temptation payoff, T. In the case of mutual defection, both get the punishment payoff, P. The payoff matrix should satisfy the inequality S < P < R < T and, for repeated interactions, the additional constraint T + S < 2R. Individual rationality leads to defection, because R < T and S < P make defection better than cooperation whatever the opponent does. At the same time, P < R implies that mutual cooperation is superior to mutual defection. So individual rationality fails to produce the best joint outcome, and this situation is referred to as a dilemma. It is well known that defection is the individually reasonable behavior that leads to a situation in which everyone is worse off [2]. On the other hand, cooperation maximizes the joint outcome [11].
If two players play the prisoners dilemma more than once, remember the previous actions of their opponent and change their strategy accordingly, the game is called the Iterated Prisoners Dilemma (IPD) [12]. Despite its level of abstraction, a large variety of situations, from daily life (e.g., stop or go on when the light is red?) to socio-economic relations (e.g., fulfill or renege on trade obligations?), may be represented as an IPD game. It has been shown that repeated encounters between the same individuals foster cooperation. This is often referred to as the shadow of the future: if individuals are likely to interact again in the future, an altruistic act can be returned [2, 10].
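The two conditions on the payoff matrix and the dominance argument can be checked mechanically. The following is a minimal Python sketch, assuming the payoff values T = 5, R = 3, P = 1 and S = 0 that this study adopts later; the names `PAYOFF`, `is_prisoners_dilemma`, `defection_dominates` and `mutual_cooperation_is_better` are illustrative and do not come from the original model description.

```python
# Illustrative payoff values (T, R, P, S); any values satisfying the
# inequalities checked below would define a Prisoner's Dilemma.
T, R, P, S = 5, 3, 1, 0

# Payoff of the row player for (own move, opponent's move).
# "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def is_prisoners_dilemma(t, r, p, s):
    """Check S < P < R < T and the repeated-game constraint T + S < 2R."""
    return s < p < r < t and t + s < 2 * r


def defection_dominates(payoff):
    """Defection is dominant if it pays more against either opponent move."""
    return all(payoff[("D", other)] > payoff[("C", other)] for other in "CD")


def mutual_cooperation_is_better(payoff):
    """The dilemma: both players earn more under mutual cooperation."""
    return payoff[("C", "C")] > payoff[("D", "D")]


print(is_prisoners_dilemma(T, R, P, S))
print(defection_dominates(PAYOFF))
print(mutual_cooperation_is_better(PAYOFF))
```

All three checks hold for the values above, which is exactly the tension the text describes: defecting is individually better in every case, yet both players would earn more by cooperating.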

Attention in games
In general, a player is not capable of knowing all the players in an interacting environment and usually acts on limited information. One reason could be the huge number of players; another could be that players have a very limited memory, too small to keep track of all the others [13,14]. For example, in real life, a market has a few market leaders and many small brands, whose number is in general simply too large for a consumer to remember all of them. Therefore, a consumer effectively has access to only a limited number of service providers. The essence of any game is to interact with other players and get a chance to improve one's payoff. To interact with others, one should first capture their attention in a positive manner. When we give our attention to something, we always take it away from something else. We can think of having someone's attention as owning a kind of property, located in the memory of that player.

IPD game under limited attention
In many studies of the IPD game, it is assumed that there is enough memory to remember all previously encountered players and their actions. Memory is an important aspect, because knowing the identity and history of an opponent allows one to respond appropriately. We use the term limited attention to indicate the existence of an upper bound on how many distinct opponents are remembered by a player. Following reference [14], we ask: what if the memory size is limited? The same question can be reformulated as: what if attention capacity is limited? In this study, we introduce attention capacity as a parameter for investigating the dynamics of the game.

The model
Tesfatsion introduced the notions of choice and refusal into IPD games [3]. In order to choose or refuse an opponent, players should be able to remember the identity of each player and their past behavior. It is known that choice helps players to find cooperators, while refusal lets them escape from defectors [3]. In our very simple model, we consider two types of players: cooperators, who always cooperate, and defectors, who always defect. We combine these pure strategies with a simple choice-and-refusal rule: if a player knows that the opponent is a defector, then he or she refuses to play; otherwise, he or she plays.

Each round of the IPD game consumes part of the limited attention of its players. We assume that every player has the same attention capacity M. When a player encounters an opponent, he stores the necessary information about the opponent's action in his memory. After playing with M different opponents, the attention capacity is used up. As the player encounters more opponents, he faces the problem of attention scarcity: he has to forget some of the previously encountered ones. To use one's memory efficiently, one needs to decide whom to forget. In this respect, in section 4.1 we discuss five different attention allocation strategies. Like the rest of the literature, we focus on the conditions under which the cooperative move becomes more favorable. However, our research considers a game that takes place in a world of limited attention.
The personality of a player (cooperator or defector) is randomly set. Once the personality is set, it never changes. In each iteration, two individuals are randomly chosen to play the game. In this respect, there is no spatial pattern; in other words, the underlying interaction graph is a complete graph.
Let C and D denote the sets of cooperator and defector players, respectively. Let 𝒩 denote the set of all players, that is, 𝒩 = C ∪ D, and let N = |𝒩| be the population size. The number of defectors is denoted by |D|. Thus, the remaining |C| = N − |D| players are the cooperators. We define our model parameters, the attention capacity ratio and the defector ratio, as µ = M/N and δ = |D|/N, respectively. Hence, we have 0 ≤ µ ≤ 1 and 0 ≤ δ ≤ 1.
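The interplay of bounded memory and the choice-and-refusal rule can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Player`, the dictionary-based memory, and the placeholder rule of forgetting a uniformly random entry when the memory is full are all hypothetical choices, not the authors' implementation.

```python
import random


class Player:
    """A pure cooperator ("C") or pure defector ("D") with bounded memory."""

    def __init__(self, kind, capacity, rng=None):
        self.kind = kind            # fixed for the whole game
        self.capacity = capacity    # attention capacity M
        self.memory = {}            # opponent id -> last observed action
        self.rng = rng or random.Random()

    def accepts(self, opponent_id):
        """Refuse only opponents remembered as defectors; play otherwise."""
        return self.memory.get(opponent_id) != "D"

    def remember(self, opponent_id, action):
        """Store the opponent's action, forgetting someone if memory is full."""
        if self.capacity == 0:
            return  # no attention at all: nothing is ever stored
        is_new = opponent_id not in self.memory
        if is_new and len(self.memory) >= self.capacity:
            # Placeholder eviction rule (an assumption): forget at random.
            forgotten = self.rng.choice(list(self.memory))
            del self.memory[forgotten]
        self.memory[opponent_id] = action


player = Player("C", capacity=2, rng=random.Random(0))
player.remember(7, "D")
print(player.accepts(7))  # False: opponent 7 is remembered as a defector
print(player.accepts(8))  # True: opponent 8 is unknown, so the game is played
```

An unknown opponent is always accepted, which is why a defector who has been forgotten can exploit the same player again.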
We use the de facto payoff values of T = 5, R = 3, P = 1, and S = 0 throughout this study.

Evaluation metrics
Social welfare can be measured by the average payoff of players. The payoffs of all the encounters are added up to obtain the final outcome of each player. To compare defectors with cooperators, we take the average outcome of each group. Let c_i and d_i be the numbers of games that player i plays with cooperators and defectors, respectively. Using the payoff matrix given in figure 1, the total payoff of player i is R c_i + S d_i if i is a cooperator and T c_i + P d_i if i is a defector. We evaluate our results by comparing the average performance of the cooperators with that of the defectors. Our performance metrics are the average total payoff of the cooperators, $\bar{P}_C$, and the average total payoff of the defectors, $\bar{P}_D$. Although further investigation calls for simulations, some analytical investigation of the average performances is possible.
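The bookkeeping behind these metrics can be sketched in a few lines of Python. The function names `total_payoff` and `average_performance`, and the representation of a player as a `(kind, c_i, d_i)` tuple, are assumptions made for illustration only; the payoff values are the ones used in this study.

```python
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff


def total_payoff(kind, c_i, d_i):
    """Total payoff after c_i games with cooperators and d_i with defectors."""
    if kind == "C":
        return R * c_i + S * d_i
    if kind == "D":
        return T * c_i + P * d_i
    raise ValueError(f"unknown player kind: {kind!r}")


def average_performance(players, kind):
    """Mean total payoff over all players of the given kind."""
    totals = [total_payoff(k, c, d) for k, c, d in players if k == kind]
    if not totals:
        raise ValueError(f"no players of kind {kind!r}")
    return sum(totals) / len(totals)


# Toy counts, chosen only to exercise the functions (not simulation data).
players = [("C", 4, 2), ("C", 2, 0), ("D", 3, 1)]
print(average_performance(players, "C"))  # 9.0
print(average_performance(players, "D"))  # 16.0
```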

Cooperator's average performance
The cooperators' average performance, $\bar{P}_C$, can be found analytically. For a cooperator, playing with a defector brings no gain, since the sucker's payoff is zero, that is, S = 0. $\bar{P}_C$ can only increase when two cooperators play a round with each other, in which case each of them gets R = 3 points. The probability of matching two cooperators is (1 − δ)². Among the T = τN²/2 rounds (here T denotes the number of rounds, not the temptation payoff), only (1 − δ)²T are expected to take place between two cooperators. As a result, the |C| = (1 − δ)N cooperators share a total payoff of (R + R)(1 − δ)²τN²/2. In other words,

$$\bar{P}_C = \frac{2R\,(1-\delta)^2\,\tau N^2/2}{(1-\delta)\,N} = R\,(1-\delta)\,\tau N .$$

Without any further investigation, we can conclude that increasing τ, N and R is favorable for $\bar{P}_C$, while increasing δ is not. Note that neither the attention capacity M nor the attention allocation strategy has any effect in this setting. If the population is composed of only cooperators, that is, |C| = N and δ = 0, then $\bar{P}_C = R\tau N$.
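The counting argument above can be reproduced step by step with exact arithmetic. The sketch below is illustrative: the function names and the example values N = 100, |D| = 30 and τ = 2 are assumptions chosen for the demonstration, and the calculation requires at least one cooperator.

```python
from fractions import Fraction


def cooperator_average_by_counting(n_players, n_defectors, tau, reward=3):
    """Follow the text: expected C-C rounds, shared payoff, then the average."""
    delta = Fraction(n_defectors, n_players)
    rounds = Fraction(tau * n_players ** 2, 2)   # total rounds, tau * N^2 / 2
    cc_rounds = (1 - delta) ** 2 * rounds        # expected rounds between two cooperators
    shared = 2 * reward * cc_rounds              # each such round pays R to both players
    return shared / (n_players - n_defectors)    # divide among the cooperators


def cooperator_average_closed_form(n_players, n_defectors, tau, reward=3):
    """Simplified result of the same calculation: R (1 - delta) tau N."""
    delta = Fraction(n_defectors, n_players)
    return reward * (1 - delta) * tau * n_players


print(cooperator_average_by_counting(100, 30, 2))   # 420
print(cooperator_average_closed_form(100, 30, 2))   # 420
```

Both routes give the same value, and the attention capacity M does not appear anywhere in the calculation.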

Defector's average performance
Due to the choice-and-refusal rule, if an opponent is known to be a defector, no player plays with him. Therefore, in order to obtain the defectors' average performance, $\bar{P}_D$, we need the probability that a defector j ∈ D is unknown to a player i ∈ 𝒩. This probability cannot be found analytically except for the special cases of players without memory and players with unlimited memory.

Players without memory
When players have no memory, i.e., the attention capacity is zero, they are totally forgetful and remember nothing. Note that this case corresponds to players playing the prisoners dilemma without realizing that they are playing repeatedly. As a result, players continue to play with defectors in spite of the choice-and-refusal rule. The probability of matching a defector with a cooperator is 2δ(1 − δ), while the probability of matching two defectors is δ². A defector earns T in the former case, and both defectors earn P in the latter. Therefore, for the special case of µ = 0, we have

$$\bar{P}_D = \frac{\left[\,2\delta(1-\delta)\,T + 2\,\delta^2 P\,\right]\tau N^2/2}{\delta N} = \left[(1-\delta)\,T + \delta P\right]\tau N = (5 - 4\delta)\,\tau N$$

for T = 5 and P = 1. We observe that increasing the number of defectors is not favorable even for defectors. Nevertheless, it is easy to verify that for µ = 0, $\bar{P}_D$ is always greater than $\bar{P}_C$; that is, defection is the favorable action against players with no memory.
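The memoryless case can be checked numerically from the matching probabilities alone. The following sketch uses exact fractions; the function name `memoryless_averages` and the population values are assumptions for illustration, and the calculation is only defined when both types are present (0 < |D| < N).

```python
from fractions import Fraction

TEMPTATION, REWARD, PUNISHMENT, SUCKER = 5, 3, 1, 0


def memoryless_averages(n_players, n_defectors, tau):
    """Expected (cooperator, defector) average payoffs when nobody remembers."""
    delta = Fraction(n_defectors, n_players)
    rounds = Fraction(tau * n_players ** 2, 2)
    cc = (1 - delta) ** 2 * rounds         # cooperator meets cooperator
    cd = 2 * delta * (1 - delta) * rounds  # cooperator meets defector
    dd = delta ** 2 * rounds               # defector meets defector
    coop_total = 2 * REWARD * cc + SUCKER * cd
    defe_total = TEMPTATION * cd + 2 * PUNISHMENT * dd
    return coop_total / (n_players - n_defectors), defe_total / n_defectors


coop, defe = memoryless_averages(100, 30, 2)
print(coop, defe)  # 420 760

# Defection pays more at every mixed composition of the population.
print(all(d > c for c, d in (memoryless_averages(100, k, 2) for k in range(1, 100))))
```

The defectors' average falls as their number grows, yet it stays above the cooperators' average for every mixed population, in line with the statement above.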

Players with unlimited memory
For the special case of M ≥ N, the players are no longer forgetful and they are able to remember each opponent's last action. Due to the choice-and-refusal rule, any defector can play at most |C| rounds with cooperators and |D| − 1 rounds with defectors. Therefore, for a sufficiently large τ, we have

$$\bar{P}_D = T\,|C| + P\,(|D| - 1) = 5\,(1-\delta)\,N + \delta N - 1 .$$

We can conclude that as we increase the number of defectors in this setting, the average payoff of the defectors again decreases.
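With unlimited memory, each defector can exploit every other player at most once, which caps the defectors' payoff. The sketch below compares that cap with the cooperators' expected average R(1 − δ)τN from the earlier analysis. It is a rough comparison under stated assumptions: the cap is only reached for sufficiently large τ, and the function names and the values N = 100, τ = 2 are illustrative.

```python
def defector_cap_full_memory(n_players, n_defectors, temptation=5, punishment=1):
    """Largest total payoff a defector can collect when nobody forgets."""
    if not 0 < n_defectors <= n_players:
        raise ValueError("need at least one defector and at most N")
    n_cooperators = n_players - n_defectors
    # One exploited game per cooperator, one mutual defection per other defector.
    return temptation * n_cooperators + punishment * (n_defectors - 1)


def cooperator_average(n_players, n_defectors, tau, reward=3):
    """Expected cooperator average R (1 - delta) tau N, in integer form."""
    return reward * (n_players - n_defectors) * tau


for n_defectors in (10, 30, 50, 70, 90):
    cap = defector_cap_full_memory(100, n_defectors)
    coop = cooperator_average(100, n_defectors, tau=2)
    print(n_defectors, cap, coop, "cooperation" if coop > cap else "defection")
```

Because the cap does not grow with τ while the cooperators' average does, a longer game favors cooperators in this limit.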

Simulations
The dynamics of the system is further investigated by simulation while the attention capacity ratio µ and the defector ratio δ vary. The model is simulated for every possible attention capacity M (from 0 to N) and for every possible number of defectors (from 0 to N). We study a population of N = 100.
The number of iterations, T, is another critical issue. It is set to T = τ × N²/2, since there are N(N − 1)/2 ≈ N²/2 distinct pairs of players, where τ, the third model parameter, is the expected number of encounters per pair of players. Note that, when τ = 1, no two players are expected to meet again during the simulation. This situation corresponds to a non-iterated version of the game. In order to see the effect of time, τ is set to 2 and 5. The results were averaged over 20 independent realizations for every combination of parameter values.
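A single realization of such a simulation might be organized as in the sketch below. It is not the authors' code. The assumptions are: a refused encounter still counts as an iteration but yields no payoff and no memory update, a full memory forgets a uniformly random entry, and the memory stores the opponent's last action. The function name `simulate` and its parameters are hypothetical.

```python
import random

# Row player's payoff for (own kind, opponent's kind) with T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def simulate(n_players, n_defectors, capacity, tau, seed=0):
    """Run one realization; return (cooperators' average, defectors' average)."""
    rng = random.Random(seed)
    kinds = ["D"] * n_defectors + ["C"] * (n_players - n_defectors)
    memory = [dict() for _ in range(n_players)]  # opponent id -> last action
    payoff = [0] * n_players

    def remember(i, j):
        """Player i stores the action of opponent j, forgetting if needed."""
        if capacity == 0:
            return
        mem = memory[i]
        if j not in mem and len(mem) >= capacity:
            del mem[rng.choice(list(mem))]  # assumed rule: forget at random
        mem[j] = kinds[j]                   # pure strategies: action equals kind

    rounds = tau * n_players ** 2 // 2
    for _ in range(rounds):
        i, j = rng.sample(range(n_players), 2)  # two distinct random players
        # Choice and refusal: no game if either side remembers a defector.
        if memory[i].get(j) == "D" or memory[j].get(i) == "D":
            continue
        payoff[i] += PAYOFF[(kinds[i], kinds[j])]
        payoff[j] += PAYOFF[(kinds[j], kinds[i])]
        remember(i, j)
        remember(j, i)

    def average(kind):
        values = [p for p, k in zip(payoff, kinds) if k == kind]
        return sum(values) / len(values) if values else float("nan")

    return average("C"), average("D")


if __name__ == "__main__":
    for capacity in (0, 25, 100):
        print(capacity, simulate(100, 30, capacity, tau=2, seed=1))
```

Sweeping `capacity` and `n_defectors` over their full ranges, and averaging over several seeds, would give one grid of results per forgetting rule.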

Attention allocation strategies
Some people are positive and remember only good memories. Others, on the contrary, remember bad events and live to get their revenge. Motivated by these, we compare five simple attention allocation strategies based on forgetting mechanisms: (i) Players that prefer to forget only cooperators, denoted by FOC. (ii) Players that prefer to forget only defectors, denoted by FOD. (iii) When players have no preference, they can select someone to forget uniformly at random. We call this strategy FAR. (iv) Players may also flip a coin to decide which type of player, cooperator or defector, to forget. Once the type is decided, someone of this type is randomly selected and forgotten. Let FEQ denote this "equal probability to types" approach. (v) If the knowledge of which type has the majority is available, this extra information can be used in devising a strategy. One possibly effective strategy is to assume that the opponent is of the majority type and, hence, to pay attention to the minority only. That is, one prefers to forget the majority, which we call the FMJ strategy.
We investigate the average performances of cooperators and defectors when they use the same strategy.
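The five forgetting rules can be expressed as one selection function over the content of a full memory. This is a sketch under assumptions: the function name `choose_forgotten`, the dictionary memory layout, and the fallback of forgetting anyone when the preferred type is absent from memory are all illustrative choices that the text does not specify.

```python
import random


def choose_forgotten(memory, strategy, rng, majority=None):
    """Pick the opponent id to forget from a non-empty memory.

    memory:   dict mapping opponent id -> remembered action ("C" or "D")
    strategy: one of "FOC", "FOD", "FAR", "FEQ", "FMJ"
    majority: "C" or "D", the majority type in the population (FMJ only)
    """
    cooperators = [k for k, action in memory.items() if action == "C"]
    defectors = [k for k, action in memory.items() if action == "D"]
    everyone = cooperators + defectors

    if strategy == "FAR":    # uniform over remembered opponents
        return rng.choice(everyone)
    if strategy == "FOC":    # forget only cooperators
        preferred = cooperators
    elif strategy == "FOD":  # forget only defectors
        preferred = defectors
    elif strategy == "FEQ":  # fair coin chooses the type first
        preferred = cooperators if rng.random() < 0.5 else defectors
    elif strategy == "FMJ":  # forget the majority type
        if majority not in ("C", "D"):
            raise ValueError("FMJ needs majority='C' or majority='D'")
        preferred = cooperators if majority == "C" else defectors
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumed fallback: if the preferred type is not in memory, forget anyone.
    return rng.choice(preferred or everyone)


rng = random.Random(0)
memory = {1: "C", 2: "D", 3: "D"}
print(choose_forgotten(memory, "FOC", rng))                # 1, the only cooperator
print(choose_forgotten(memory, "FMJ", rng, majority="C"))  # 1, cooperators are the majority
```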

Observations
In this section, for a more general view, we present our observations based on our simulation data.
With our essential parameters of µ, δ, and τ along with the different attention allocation strategies, we can determine the conditions under which cooperation is more favorable than defection.
Simulation results for various values of the attention capacity ratio µ and the defector ratio δ are given in figure 2.

Average performance of defectors
The second row of figure 2 can be interpreted as follows: (i) Greater attention capacity, i.e., an increase in µ, helps players to remember the defectors. As a result, defectors experience social isolation and their average payoff severely diminishes. (ii) An increase in the number of defectors, i.e., an increase in δ, leads to competition among them. Thus, the defectors' average payoff again diminishes. (iii) Note that all five plots agree with our discussion in section 3.2.1 and section 3.2.2 for the special cases of µ = 0 and µ = 1.

Attention boundaries
We refer to the $\bar{P}_C - \bar{P}_D = 0$ contour lines, seen in the third row of figure 2, as the attention boundaries. An attention boundary determines the favorable action. If a pair (µ, δ) lies inside the attention boundary, then $\bar{P}_C - \bar{P}_D > 0$ and cooperation is the favorable action; otherwise, defection is the favorable action. The attention boundaries for the five attention allocation strategies seen in figure 2 are superposed in figure 3(b) for the sake of comparison.
For a given defector ratio, we observe that there is a critical threshold for attention capacity, below which defection is advantageous and above which cooperation becomes the favorable action. With a smaller attention capacity, defectors are easily overlooked. A greater attention capacity, along with the choice-and-refusal rule, does not let defectors improve their payoffs. Because the defectors' performance degrades, the average payoff of cooperators manages to exceed that of defectors when players have a greater attention capacity.

Attention allocation strategies
We consider a strategy better if it has a larger area in the (µ, δ) plane where cooperators do better than defectors. That is, a better strategy has more (µ, δ) pairs below its attention boundary. From this perspective, the best strategy is FOC, and the worst one is FOD. All the remaining strategies are located in between these two.
The forget-majority strategy, FMJ, is a mixed strategy. When 0.5 < δ, defectors are the majority and FMJ acts like FOD, forgetting only the defectors. When δ < 0.5, cooperators are the majority and FMJ switches to forgetting only cooperators, like FOC. Therefore, its plot is similar to that of FOC for 0 < δ < 0.5 and that of FOD for 0.5 < δ < 1. The FMJ strategy can also be described as allocating attention to the minority. One might think that this strategy is better than the rest, since scarcity, in general, triggers the perception of greater importance.

Nevertheless, figure 3(b) contradicts this intuition. The optimal strategy is to forget only the cooperators. By doing so, players manage to allocate their memories only to defectors. In other words, they keep their enemies closer and thus become more wary of defectors. On the other hand, forgetting defectors seems to be the most wasteful and careless way of spending attention. We observe that the information needed for refusing the defectors is discarded when the FOD strategy is applied.
The critical value of δ = 0.5 determines which strategy is superior, except for the two extreme strategies of FOD and FOC. FEQ does better than FAR when 0.5 < δ and FAR does better than FEQ when δ < 0.5.
Although the FAR strategy seems identical to the FEQ strategy, there is a slight difference between them. Notice that forgetting at random depends on the content of the memory, while forgetting with equal probability does not. A higher defector ratio, that is, 0.5 < δ, causes one to encounter more defectors. In that case, the memories of the players are filled mostly with defectors. Thus, forgetting at random is biased towards FOD. Similarly, forgetting at random is biased towards FOC when δ < 0.5.
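The difference between the two rules shows up in the probability that the forgotten entry is a defector. The sketch below computes it for a given memory content; the function names and the handling of a memory that holds only one type are assumptions made for illustration.

```python
from fractions import Fraction


def p_forget_defector_far(n_coop, n_def):
    """FAR: every remembered opponent is equally likely to be forgotten."""
    return Fraction(n_def, n_coop + n_def)


def p_forget_defector_feq(n_coop, n_def):
    """FEQ: a fair coin picks the type first, whatever the memory holds."""
    if n_coop == 0:
        return Fraction(1)  # assumed: only defectors are available to forget
    if n_def == 0:
        return Fraction(0)  # assumed: only cooperators are available to forget
    return Fraction(1, 2)


# A memory dominated by defectors, as expected when delta > 0.5.
print(p_forget_defector_far(2, 8))  # 4/5
print(p_forget_defector_feq(2, 8))  # 1/2
```

With a defector-heavy memory, FAR discards a defector more often than FEQ does, so it drifts towards FOD; with a cooperator-heavy memory the bias reverses and FAR drifts towards FOC.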

Effect of time
The literature on the IPD game suggests that as the number of iterations increases, cooperative behavior also increases among the players [2,10]. This is also verified by our simulations. The shadow of the future can be quantified by the parameter τ. A short shadow of the future (smaller τ) hinders the detection of defectors. When the shadow of the future is longer, a smaller attention capacity is sufficient for cooperators to beat defectors. As τ increases, the defectors' performance gets worse in comparison with that of the cooperators. Attention boundaries obtained by setting τ = 2 and τ = 5 are given in figure 3(a) and figure 3(b), respectively. The area inside the attention boundaries is much larger in figure 3(b) than in figure 3(a). This finding suggests that the shadow of the future fosters cooperation.

Conclusions
We observe that as the proportion of defectors increases, the average payoff of any player decreases. On the other hand, an increase in the attention capacity has different outcomes for cooperators and defectors. As attention capacity increases, the change in the cooperators' overall performance is almost negligible, but the defectors' performance diminishes significantly. The rule of choice-and-refusal plays an important role in this situation. Nevertheless, it is worth pointing out that even choice-and-refusal alone cannot fulfill the desired goal unless the attention capacity exceeds some threshold value. As the attention capacity increases, or the shadow of the future gets longer, the detection of defectors becomes feasible, and consequently the defectors face social isolation due to the rule of choice-and-refusal. As a result, the cooperators' performance exceeds the defectors' performance. Thus, cooperation becomes the favorable action. This work demonstrates that, in order to beat defection, players do not need to memorize every action of every opponent. This finding is especially important in a world of limited attention. We also investigate five different attention allocation strategies and find that the best strategy is "forgetting only the cooperators". By applying this strategy, one becomes more wary of deceptive actions. In conclusion, attention should be selective, and it should be directed towards the defectors and their defective moves.
In the present work, players are pure cooperators or pure defectors. They never change their character. Various forgetting strategies are investigated, but both cooperators and defectors use the same strategy in a game. The situation in which cooperators use one strategy and defectors another is left for future work. It would also be interesting to study the effect of a biased payoff matrix. As future work, we plan to investigate other means of fostering cooperation, even under conditions of attention scarcity. To achieve this goal, players can make use of the experiences of others by taking recommendations to determine with whom to play. But from whom to take advice is critical and must be studied carefully to clarify which collaboration strategy is better. We will also extend our work to mixed strategies for interaction, such as "mostly defect" and "mostly cooperate".