# The prisoner’s dilemma

The most frequently cited game in game theory is the so-called prisoner’s dilemma. Its name comes from the following scenario: you and your gangmate have been busted. The police have only very weak evidence, so each of you is facing “only” one year in prison. But the inspector wants to show more success, so he offers you a deal: if you squeal on your gangmate (which will earn him a three-year sentence), the prosecutor will “let you go free”, or more precisely, one year will be taken off your sentence. This sounds good until you realize that your gangmate is being offered the same deal, which would mean two extra years in prison for you. If both of you betray each other, each of you receives a three-year sentence reduced by one year for betraying. That is two years in prison for each of you.

The payoff matrix of the prisoner’s dilemma (each cell lists B’s payoff first, then A’s):

| B, A | B cooperates | B defects |
| --- | --- | --- |
| A cooperates | -1, -1 | 0, -3 |
| A defects | -3, 0 | -2, -2 |

(In this game-theoretical context, cooperation means cooperating with the other participant, not with the police 🙂)

For the sake of pure mathematical modelling, the following conditions are usually added to the discussion of the dilemma:

- The game is played one time;
- The participants will never meet again.

So there is no chance of retaliation.

Under these conditions the dilemma has two important traits:

- From the individual’s point of view, defection always yields a better result (one year less in prison) than cooperation. By defecting you receive zero years instead of one if the other participant cooperates, and two years instead of three if the other participant defects. This means that defecting is the so-called “dominant strategy” in this game.

  If both parties defect, neither of them can do better, from their own selfish point of view, by changing their decision while the other participant keeps his. This situation is called a Nash equilibrium, after the mathematician John Nash. The Nash equilibrium of the prisoner’s dilemma is mutual defection.

- From the community’s point of view, cooperation always yields a better result (one year less in prison in total) than defection. When one participant cooperates, the community receives two years instead of three if the other participant cooperates, and three years instead of four if the other participant defects. This means that from the community’s point of view cooperating is the dominant strategy in this game.

  From the community’s point of view, mutual cooperation is not only an equilibrium but also the optimum.
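The two traits can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the names `PAYOFF`, `individual`, `community`, `dominant` and `nash_equilibria` are made up for this illustration, and the payoffs are the prison years of the story written as negative numbers.

```python
from itertools import product

C, D = "cooperate", "defect"
MOVES = (C, D)

# Payoffs keyed by (A's move, B's move), stored as (A's payoff, B's payoff).
# Assumption for this sketch: A's value is stored first here.
PAYOFF = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}


def individual(a, b):
    """A's own payoff when A plays a and B plays b."""
    return PAYOFF[(a, b)][0]


def community(a, b):
    """Sum of both participants' payoffs."""
    return sum(PAYOFF[(a, b)])


def dominant(value):
    """Return A's move that is strictly better under `value` whatever B does, or None."""
    for mine in MOVES:
        alternatives = [m for m in MOVES if m != mine]
        if all(value(mine, b) > value(alt, b) for alt in alternatives for b in MOVES):
            return mine
    return None


def nash_equilibria():
    """Move pairs where neither participant gains by changing only his own move."""
    found = []
    for a, b in product(MOVES, MOVES):
        a_stays = all(PAYOFF[(a, b)][0] >= PAYOFF[(x, b)][0] for x in MOVES)
        b_stays = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, y)][1] for y in MOVES)
        if a_stays and b_stays:
            found.append((a, b))
    return found


print(dominant(individual))  # defect
print(dominant(community))   # cooperate
print(nash_equilibria())     # [('defect', 'defect')]
```

Because the game is symmetric, checking dominance from A's side is enough; the same result holds for B.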

**We can summarize these two traits of the dilemma in one sentence: the interest of the community and the interest of the individual are in conflict for both participants, whatever the other party decides. This is what makes the “prisoner’s dilemma” a dilemma.**

There is a generalized form of the payoff matrix in which the values are represented by letters. If both players cooperate, they both receive the reward R for cooperating. If both players defect, they both receive the punishment payoff P. If one of them defects while the other cooperates, the defector receives the temptation payoff T, while the cooperator receives the “sucker’s” payoff S.

From the individual’s point of view:

| B, A | B cooperates | B defects |
| --- | --- | --- |
| A cooperates | R, R | T, S |
| A defects | S, T | P, P |

From the community’s point of view:

| Community | B cooperates | B defects |
| --- | --- | --- |
| A cooperates | R + R | S + T |
| A defects | S + T | P + P |

The two important traits can be summarized in four inequalities:

- T > R and P > S mean that defection is always in the individual’s self-interest.
- 2R > T + S and T + S > 2P mean that cooperation always serves the community’s interest.

From these relations one can derive the commonly cited condition of the dilemma: T > R > P > S (the two community inequalities together give 2R > 2P, that is, R > P). But the original inequalities do not follow from this commonly used condition. For example, if T > 2R - S, the ordering of T, R, P and S does not change, but the community’s payoff is greater when exactly one of the participants defects than when both cooperate. The temptation (T) is too high, and defecting is even morally defensible: “you can do me this small favour, since my payoff grows by more than you lose”. A similar situation occurs when S < 2P - T. S is still the lowest of the four payoffs, but the “sucker’s” reasoning is morally acceptable: “No one should expect such a big sacrifice from me. If I defect, your result will be a little worse, but mine will be much better, so you can grant me that.” In this case the community’s payoff is higher under mutual defection than when one of the participants cooperates.
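To see that the ordering T > R > P > S alone is weaker than the four inequalities, one can test a few payoff sets. This is only an illustrative sketch: the function names and the two counterexample value sets are assumptions chosen for the demonstration, while the first set is the one from the story.

```python
def individual_prefers_defection(T, R, P, S):
    """T > R and P > S: defecting pays for the individual whatever the other does."""
    return T > R and P > S


def community_prefers_cooperation(T, R, P, S):
    """2R > T + S and T + S > 2P: every extra cooperator raises the total payoff."""
    return 2 * R > T + S > 2 * P


def usual_ordering(T, R, P, S):
    """The commonly cited condition T > R > P > S."""
    return T > R > P > S


# (T, R, P, S); only the first row comes from the story, the others are made up.
examples = {
    "original story": (0, -1, -2, -3),
    "temptation too high": (2, -1, -2, -3),      # T > 2R - S
    "sucker's payoff too low": (0, -1, -2, -5),  # S < 2P - T
}

for name, payoffs in examples.items():
    print(
        name,
        usual_ordering(*payoffs),
        individual_prefers_defection(*payoffs),
        community_prefers_cooperation(*payoffs),
    )
```

All three sets satisfy the usual ordering and the individual inequalities, but only the first one also satisfies the community inequalities.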

# The prisoner’s certainty

At first sight it seems that, from the individual’s point of view, the only rational decision is to defect. But on second thought one realizes that the other participant is also a human being, not a cow, a mouse or a fly. What is more, in the original story of the prisoner’s dilemma the other participant is not just any human being but a guy from the same gang, who was socialized in the same district, on the same streets and with the same gangmates. So there is no reason to believe that the other participant will think differently from me. If the other thinks the same way I do, then the lower-left and upper-right cells of the payoff matrix are not applicable. In that case it is clear that the rational decision is to cooperate.

| B, A | B cooperates | B defects |
| --- | --- | --- |
| A cooperates | -1, -1 | NA |
| A defects | NA | -2, -2 |
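The difference between the two ways of reasoning can be written down in a few lines. This is a sketch under the assumption stated in the text, namely that both participants end up with the same move; `PAYOFF_A`, `best_reply` and `same_thinking_choice` are names invented for the illustration.

```python
MOVES = ("cooperate", "defect")

# A's payoff keyed by (A's move, B's move), using the prison years of the story.
PAYOFF_A = {
    ("cooperate", "cooperate"): -1,
    ("cooperate", "defect"): -3,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): -2,
}


def best_reply(other_move):
    """Classical reasoning: B's move is treated as fixed and independent of mine."""
    return max(MOVES, key=lambda mine: PAYOFF_A[(mine, other_move)])


def same_thinking_choice():
    """Assumed model: B reaches the same decision as A, so only the diagonal cells remain."""
    return max(MOVES, key=lambda move: PAYOFF_A[(move, move)])


print([best_reply(other) for other in MOVES])  # ['defect', 'defect']
print(same_thinking_choice())                  # cooperate
```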

It was Douglas Hofstadter who first wrote about this kind of solution to the dilemma, calling this type of rational thinking “superrational”. Yet the only requirement for earning this title is to realize that the other party is also able to think.

On the other hand, one could argue like this: “We humans are so different! The assumption that two of us will think in the same way is not realistic.” Yes, we can agree with this pragmatic approach, but here we are talking about mathematical models, and in the model the participants are alike. In reality they may differ, but reality also includes other circumstances, such as direct and indirect reciprocity, retaliation and reputation, which are excluded from the model. If we do math, we build models. Not surprisingly, in experiments humans display a systemic bias towards cooperation. Its motive is probably not only “superrational” thinking, but also that we can hardly imagine that there will be no further rounds, no chance of retaliation and no such thing as reputation. We can’t play the game “cleanly”, without our social background and without our stereotypes.

**So the good news is that we humans do think, and we are socialized for cooperation.**

### Questions:

In Axelrod’s iterated prisoner’s dilemma (IPD) tournament the winning strategy was Tit for Tat, which is well known from the Old Testament: an eye for an eye, a tooth for a tooth. From the individual’s point of view it is a very good strategy. It is also clear that cooperation is a very good strategy from the community’s point of view, because it always increases the payoff at the community level. But so far we have not considered learning! Is setting a good example of cooperation enough to teach the defectors?

So the question is: in the long term, taking the effect of teaching into account, what is the best strategy from the community’s point of view? How can we model teaching and learning? Which is the best teaching strategy?
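As a possible starting point for such modelling, here is a minimal sketch of an iterated game with Tit for Tat. It is not a reconstruction of Axelrod’s tournament and it does not model learning yet: the payoffs are the prison years of the story above, and the round count, the function names and the choice of opponents are assumptions made only for illustration.

```python
# Payoffs keyed by (A's move, B's move), stored as (A's payoff, B's payoff).
PAYOFF = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}


def tit_for_tat(own_history, other_history):
    """Cooperate in the first round, then repeat the other's previous move."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    return "D"


def always_cooperate(own_history, other_history):
    return "C"


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the total payoffs (A, B)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (-10, -10)
print(play(tit_for_tat, always_defect))    # (-21, -18)
print(play(always_defect, always_defect))  # (-20, -20)
```

A teaching or learning rule could be added on top of this by letting a strategy change its behaviour based on the histories it receives; how to do that well is exactly the open question.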