Strategies
In the tit-for-tat (TfT) strategy, the agent begins by cooperating and then plays whatever the other agent played on the previous round. For example, let P1 be an arbitrary agent and let P2 be the tit-for-tat agent. Then a sequence of choices might look like this:
Round | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
P1 | D | C | C | D | D |
TfT | C | D | C | C | D |
Notice how P2 starts off cooperating and then simply copies P1's choice from the previous round.
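The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `tit_for_tat` and `play` are hypothetical, and moves are assumed to be encoded as the characters `"C"` and `"D"`.

```python
def tit_for_tat(opponent_history):
    """Return the TfT move given the opponent's past moves (oldest first)."""
    if not opponent_history:
        return "C"               # first round: cooperate
    return opponent_history[-1]  # afterwards: copy the opponent's last move


def play(strategy, opponent_moves):
    """Replay a fixed opponent sequence and collect the strategy's responses.

    On round i the strategy only sees the opponent's moves from rounds 0..i-1.
    """
    return [strategy(opponent_moves[:i]) for i in range(len(opponent_moves))]


# P1's moves from the table above
print("".join(play(tit_for_tat, "DCCDD")))  # CDCCD
```

The printed string matches the TfT row of the table.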
The tit-for-two-tats (Tf2T) strategy is similar, except that P2 defects only if the other agent defected twice in a row, and returns to cooperating immediately after the other agent cooperates. While fewer than two of the other agent's choices are available, Tf2T cooperates. So, if I let the sequence (A1,A2) represent the past two choices from the other agent, then Tf2T plays the following strategy:
Their Sequence | My Response |
---|---|
(C,C) | C |
(C,D) | C |
(D,C) | C |
(D,D) | D |
Thus, the following example shows how a Tf2T agent responds to a given sequence of choices by P1:
Round | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
P1 | D | C | C | D | D | C | C | C |
Tf2T | C | C | C | C | C | D | C | C |
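The Tf2T response table can be written as a single check on the other agent's last two moves. The sketch below uses hypothetical names (`tit_for_two_tats`, `respond_to`) and assumes moves are encoded as `"C"` and `"D"`.

```python
def tit_for_two_tats(opponent_history):
    """Return the Tf2T move given the opponent's past moves (oldest first)."""
    # Defect only when the two most recent opponent moves are both D.
    # With fewer than two moves of history the comparison fails, so we cooperate.
    if tuple(opponent_history[-2:]) == ("D", "D"):
        return "D"
    return "C"


def respond_to(opponent_moves):
    """Replay a fixed opponent sequence; round i sees only rounds 0..i-1."""
    return [tit_for_two_tats(opponent_moves[:i]) for i in range(len(opponent_moves))]


# P1's moves from the table above
print("".join(respond_to("DCCDDCCC")))  # CCCCCDCC
```

The single D in the output falls in round 5, right after P1's two consecutive defections in rounds 3 and 4.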
The Pavlov strategy begins by cooperating. On subsequent turns, the Pavlov agent defects whenever the two agents made different choices on the previous round; otherwise, it cooperates. So, if I let the pair <A,B> represent the choices that the two players made on the last round, then Pavlov plays the following strategy:
<Your last choice,My last choice> | My Response |
---|---|
<C,C> | C |
<C,D> | D |
<D,C> | D |
<D,D> | C |
This says that if we both cooperated last time, then I'll cooperate this time. If we both defected, I'll take a chance that you might want to cooperate this time. If you defected and I cooperated, I'll defect this time. And if you cooperated and I defected, I'll try to get away with defecting again this time.
An example sequence of how Pavlov would play against a made-up agent is shown below:
Round | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
P1 | D | D | C | D | D | D |
Pav | C | D | C | C | D | C |
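Unlike the strategies that look only at the other agent, Pavlov needs both players' previous choices. A minimal sketch, with hypothetical names `pavlov` and `play_pavlov` and moves assumed to be encoded as `"C"` and `"D"`:

```python
def pavlov(their_last, my_last):
    """Return the Pavlov move given both players' choices on the last round.

    Both arguments are None on the first round.
    """
    if my_last is None:
        return "C"  # first round: cooperate
    # Cooperate if the two choices matched last round, otherwise defect.
    return "C" if their_last == my_last else "D"


def play_pavlov(opponent_moves):
    """Replay a fixed opponent sequence and collect Pavlov's responses."""
    my_moves = []
    their_last = my_last = None
    for their_move in opponent_moves:
        my_move = pavlov(their_last, my_last)
        my_moves.append(my_move)
        their_last, my_last = their_move, my_move
    return my_moves


# P1's moves from the table above
print("".join(play_pavlov("DDCDDD")))  # CDCCDC
```

The printed string matches the Pav row of the example table.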
The WinStay/LoseShift (WSLS) strategy begins by cooperating, but then changes its behavior (C goes to D or D goes to C) whenever the agent doesn't win. The agent wins whenever it gets either its most preferred or its next most preferred result. For example, let P1 be an arbitrary agent and let P2 be the WSLS agent. Then a sequence of choices might look like this:
Round | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
P1 | D | D | C | C | D | D |
WSLS | C | D | C | C | C | D |
Whenever WSLS wins (CC or CD, where P1's choice is listed first and P2's choice is listed second), the WSLS agent replays its choice (C or D, respectively); that is, it stays with the choice. Whenever WSLS loses (DC or DD), the WSLS agent changes its choice (to D or C, respectively); that is, it shifts its choice.
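The stay/shift rule can be sketched directly from the win and loss outcomes listed above. The names `WINS`, `wsls`, and `play_wsls` are hypothetical, and each outcome is written as (P1's choice, WSLS's choice), following the ordering used in the text.

```python
# Outcomes that count as a win for the WSLS agent: (P1's choice, WSLS's choice).
WINS = {("C", "C"), ("C", "D")}


def wsls(their_last, my_last):
    """Return the WSLS move given both players' choices on the last round.

    Both arguments are None on the first round.
    """
    if my_last is None:
        return "C"                         # first round: cooperate
    if (their_last, my_last) in WINS:
        return my_last                     # win: stay with the same choice
    return "D" if my_last == "C" else "C"  # loss: shift to the other choice


def play_wsls(opponent_moves):
    """Replay a fixed opponent sequence and collect the WSLS responses."""
    my_moves = []
    their_last = my_last = None
    for their_move in opponent_moves:
        my_move = wsls(their_last, my_last)
        my_moves.append(my_move)
        their_last, my_last = their_move, my_move
    return my_moves


# P1's moves from the table above
print("".join(play_wsls("DDCCDD")))  # CDCCCD
```

Under this reading of "win", the four possible outcomes produce the same responses as the Pavlov table earlier in this section.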