Diplomacy - EP Chapter 2 - Issue 298 From: nick@sunburn.uwaterloo.ca (Nick Fitzpatrick) Date: Mon, 26 Oct 1992 14:27:10 +0000 Issue #298 of Chapter Two of the Electronic Protocol By Nicholas Fitzpatrick (nick@sunburn.uwaterloo.ca) October 26, 1992 --------------------------------------------------------------- Electronic Protocol Games played on the Diplomacy Adjudicator --------------------------------------------------------------- **** TABLE OF CONTENTS **** PART ONE - Opinions, Letters, and Editorials: Challenges in Playing Multiplayer Games This is a special, issue of EP Chapter 2, that features an article that Danny Loeb has written in respect to programming the Diplomat, and the Bordeaux Diplomat. I have reformated it from the LATEX text which Danny sent me (many weeks ago), and have presented it below, there are a couple of places where I have probably messed up the translation. . . . Danny wrote the article for publication in a journal, however I do not know which one. If you want further information of the DPP, or the Bordeaux Diplomat, or a LATEX copy of this article, please contact Danny Loeb. ***** PART ONE ***** ***CHALLENGES IN PLAYING MULTIPLAYER GAMES *** by Daniel Loeb (loeb@geocub.greco-prog.fr) LABRI Universite de Bordeaux I 33205 Talence, France Dedicated to my daughter Gabrielle Jeanette Loeb Whereas a great deal of work has been done on two-player games, multiplayer games are only beginning to become understood both on a theoretical and on a practical level. In so far as one studies games in an attempt to model the real world, this is a terrible loss, for large number of participants are involved in most real world conflicts. The game of Diplomacy was invented in 1959 by Allan Calhammer to simulate one such conflict: the First World War. This game was chosen as a test bed for our multi-player game playing theories. As in any multi-player game, positions are extremely difficult to analyze. The mathematical machinery developed to analyze two-player games crumbles, and must be rebuilt from scratch. In the final analysis, the most important factor in evaluating a position of a multi-player game is not located on the game board, but rather in the minds of the players. For this reason, in any multi-player game negotiation become necessary in order to determine, and perhaps modify these attitudes. Diplomacy is an extremely rich example of a multi-player game. The number of players allows a large number of free-wheeling alliances without degenerating into total chaos. Many heuristics exists by which to measure a player's progress to victory, and there are a large number of points which need to be negotiated during the formation and continued operation of an alliance. Furthermore, just as real life, Diplomacy is not a sequential game. Moves must be submitted simultaneously for all players for all of their units. Thus, normal search techniques either no longer apply or get bogged down in the tremendous size of the game tree, for it is impossible to complete even a one-ply search. Finally, Diplomacy is an extremely suitable testbed for different approaches to multi-player games, since it is sufficiently popular to encourage unrelated groups of researchers to work on the game. According to legend, the first play-by-mail diplomacy game (1963A) included a diplomat (diplomacy automata) by Dave McDaniel which was eliminated in 1903 (Berch, 1980). In 1982, the Avalon Hill Game Corporations marketed ``Computer Diplomacy.'' The program is designed primarily as a game master, but has very limited playing capabilities as well. At the 1989 Conference on Computer Games, Kraus, Ephrati, and Lehmann introduced a diplomat they developed over three years at Hebrew University (Kraus, 1989A, 1989B). Their program is very good, and negotiates like a human. It constitutes the first serious attempt at a computer diplomacy program. During the 1991 Conference on Computer Games (Hall, 1992), Michael Hall and the author elaborated several ideas by which a computer could play the game Diplomacy. Whereas the Hebrew University diplomat is based primarily on the negotiation, our diplomat is based primarily on the strategy of the game. (See Hall, 1992, for details. The present article outlines the Diplomacy Programming Project's progress since last year and connects our ideas with a new mathematical theory developed for multiplayer games. (See also Loeb, this volume.) In order to make the article as accessible as possible, we have preferred to omit all references to the detailed rules of the game. (Calhammer, 1956) In fact, most of the ideas discussed below can be be applied to any multi-person conflict. I would like to express my appreciation to Constantin Staykoff for his dedication and hard work in implementing the strategic module of the Bordeaux Diplomat, and Ken Lowe for the use of his Diplomacy Adjudicator. I would further like to thank all the members of the Diplomacy Programming Project for their encouragement and help. PLAYER POSITION EVALUATION Traditional Techniques In theory, the minimax algorithm can be used to play any two-player, deterministic, sequential-move game perfectly. ----------------------------------------------------- Minimax Algorithm A tree is drawn representing the game with the current position as root. The children of each node represent the various positions obtained after making all of the possible moves. The type of a node is given by the player whose turn it is to move. Assign position values to all leaves. Increasing values indicate better results for player A. For example, in a game with win, loss, and draw, we can assign win=1, loss= -1 and draw =0. For each internal node, calculate the value of the position assuming perfect play on both sides. For an A-node, the value is the maximum of the children's values. For a B-node, the value is the minimum of the children's values. A should choose a move from the root node with maximum value. ------------------------------------------------------ However, due to time constraints, it is seldom possible to analyze the entire game tree. Therefore, game programs often use various search techniques (such as alpha-beta (Knuth, 1975) and approximative evaluation functions which analyze the position using different heuristics, and thus need not search to the end of the game. However, the above analysis guarantees that there exists a perfect strategy to which any approximation thereof can be compared. However, what happens when a third player is added? For simplicity, assume that one player will win, and the other two players lose. However, similar remarks hold in the absence of such assumptions. A game tree can still be formed as above with values assigned to the leaves. However, problems are encountered when minimaxing the values back up the tree. Let n be an A-node in the tree whose children have all been evaluated. v(n) must now be calculated. Obviously, if n has a child whose value is equal to an A-win, then this child is a good strategy for A, and the value of $n$ will be a win for A. Similarly, if all of the children of n are wins for say B, then n is itself a win for B. However, what happens if some of the children of n are wins for B and some are wins for C? It is then impossible to evaluate n without some psychological information regarding A's attitude to such situations. His choice of move is indifferent, yet greatly affects the strategy of the other players. Queer Games Jim Propp calls such positions ``queer'' (Propp, Unpublished). For example, in the game of Nim played with 3-players, Players take turns removing stones from piles. Any number of stones (at least one) can be removed from a pile. The player to make the last move wins. 0+0 is a win for the previous player. Thus, 1+0, 0+1, 2+0 and 0+2 are wins for the next player. Hence, 1+1 is a win for the following player. Therefore, the position 1+2 is queer as it yields a choice for the next player betweens wins for the previous player and the following player. Are queer positions very common? If not, then we can hope to frequently apply the traditional techniques mentioned above. Unfortunately, given any impartial game (An impartial game is one in which to answer the question ``What moves are possible?'') you need not pose the question ``Whose turn is it.'' Thus, Chess is not an impartial game, since only White can move White's pieces G, the game 2+2+G is queer (Propp, Unpublished). Thus, impartial games large enough to contain a 2+2 component are queer. A simple probabilistic argument shows that virtually all games are queer. Let a,b,c,q represent the probabilities that a position is a win for any given player or that it is queer respectively. Without loss of generality, we consider games in which all internal node have two children. By symmetry, a=b=c. Thus, the probability to be queer q=1-3a is equal to the probability that neither choices is a win for the current player, yet both choices are not wins for the same opponent (1-a)^2- 2a^2. The only solution that represents a valid probability is a=0. Thus, q=1. Sums of Games A traditional technique in analyzing games has been to break down the game into a sum of simpler games. The game G+H is the game in which players may play in either G or H but not both. The game is over when play is no longer possible in either summand. The last player to have played is the winner. Typically, the most powerful results have been found for impartial games. The two possible results of a two-player game are N win for the next player, and P win for the previous player. Adding Two-Player Impartial Games + N P ----------------- N N or P N P N P The above table gives the sums of all two player impartial games except N+N. In fact, the sum of two N's depends on the particular games, yet a theorem of Grundy allows us to without loss of generality consider only sums of games of Nim with one pile. In these trivial games, the P positions are equivalent to empty piles, and N positions are equivalent to non-empty piles. The sum of two N positions is N itself if and only if the sizes of their associated Nim piles are distinct (Berlekamp, 1982). For three player impartial games, the possible sums are quite complex. The sums are calculated on the premise that play rotates NFPNFPNFP . . . Adding Three Player Impartial Games + P N F Q ---------------------------------- P PQ NQ FQ Q N NQ NFQ PNQ NQ F FQ PNQ NQ NFQ Q Q NQ NFQ F Only in one particular case (P+Q) is the sum predetermined. To determine the value of the sum, there are two further problems. First, there is no generalized theorem of Grundy. Grundy's proof supposes that all ``reversible'' moves can be ignored. However, in three-player impartial games even if P's move is reversed by N, he may have gained Zugzwang, and placed F in a difficult position. Second, as can be seen below, even the game of Nim is no longer trivial once there are three players. Classifying Queer Games In the above classification, queer games are all considered to have the same value. Indeed, it has been believed (Neumann, 1928) that all three-player games are strictly-determined or symmetric non-strictly determined. However, are queer games always symmetric? For example, suppose N could choose between two queer positions: one where F would name N or P as winner, and one where N would name F or P as winner. Perhaps, N would prefer the first choice even though both are queer games, since he would then retain some chance of winning. The two positions above must then be given different values: f_1 and n_1 respectively. Obviously, p_1 can be defined in a similar manner. Now, denote forced wins by f_0, p_0, and n_0 respectively. N has the following preferences n_0 > f_1 , p_1 > f_0 , p_0. He is indifferent regarding f_0 and p_0. A position leading to such a choice can not be simplified by his opponents, and must be labeled simply n_1. Is N necessarily indifferent between f_1 and p_1? Unfortunately, we can not answer this question until we clarify how a player makes a choice between equally interesting possibilities. One assumption is that he makes a random move. In that case, f_1 may be preferred to p_1 or visa versa. However, consider a game with additional moves equivalent to certain other moves. For example, in Chess allowing the Rooks to be moved by either hand and all other pieces to be moved only by the left hand. Is it then more likely the player will move his rook? Although plausible, this is not a very pleasing model since it does not allow us to prune redundant branches from our game tree. Instead, I propose the model by which the players establish equivalence classes of positions. Two positions are equivalent if the equivalence classes of their non-dominated options are in one-to-one correspondence via equivalence. Moves are made by randomly choosing a non-dominated equivalence class, and then selecting a move within this class. In this way, f_1 and p_1 are equivalence classes of equal interest to N. He makes his choice randomly between the two. This choice between F and P to select the eventual winner can be represented as n_2. Similarly, n_3 is the choice by N of who will chose the person who will chose the person who selects the winner. In fact, it can be shown that any position in a 3-player finite sequential perfect-knowlege game is equivalent to exactly one of the equivalence classes n_i,p_i,f_i where i is an integer. n_i+1 for example is defined to be a choice for N between p_i and f_i. Thus, we have the counter-intuitive result that no game is symmetric under the hypotheses above. Calculations with these refined classes become even more complicated. For example, whereas Nim with two piles is completely trivial. The next player wins unless the two piles are of equal size. with two player, there is no apparent rhyme or reason in the 3-player 2-pile Nim table. More players Until now, we have only discussed games with three players. When a fourth player is added, the refined classification (n_i,f_i,p_i) breaks down completely. However, the crude classification scheme N,F,P,Q can be extended. Instead of four types of games, now twelve are needed. [1-4] Forced win for a certain player. [5-8] An alliance of three players is needed to stop a certain player from winning. [9-12] One player plays no significant role whatsoever and the 3 remaining players participate in a ``Queer'' game. (In other word, any of the 3 players can be stopped by the other 2.) Under this view, game types are in one-to-one correspondence with voting schemes. (Isbbell, No Date) Number of Voting Schemes n 0 1 2 3 4 5 6 7 ---------------------------------------------- v(n) 0 1 2 4 12 81 2646 1422564 A voting scheme consists of a set of voters V, and a set of winning coalitions W subject to the condition that if V\supseteq B\supseteq A\in W then B in W, and if A=V-B then exactly one of A and B can be a member of W. In other words, a voting scheme is an order preserving map f from the subsets of V to -1,1 such that f(A)=-f(V-A). Given a game G and a partition pi-P of its players, we define the quotient game G/pi to be a new game with one player for each block (or alliance) of pi. The player B in pi in the game G/pi must make a move whenever it is the turn of some p in B. Thus, in G/pi there might not be simple rotation of play even if in G the players take turns playing in simple rotation. The moves available to B in the game G/pi are the same as those available to p in the game G. B wins the game G/pi if any p in B wins. A is said to be a winning (resp. losing) alliance of players in the game G if the quotient game G/A,P-A is a win (resp. loss) for the alliance A. The set of winning games is a voting scheme, since adding players to an alliance can only strengthen it, and given a pair of opposing coalitions exactly one can win while the other must lose. If the game includes results other the simple wins and losses, then the notion of a voting scheme must be generalized. A generalized voting scheme is a order preserving map from the subsets of V into the set of results {-n,-n+2, . . .,n-2,n} such that f(A)=-f(V-A). The play of a game can be viewed as the selection of a winning coalition. (Shapely, No Date) The difference between the coarse classification and the fine classification of game is that when a winning alliance is formed in fact the alliance itself does not win, but rather only one member of the alliance. Thus, in the end, most alliances are unstable. It is this unstability which the fine types f_i,n_i,p_i attempt to measure in 3-player games. In conclusion, just as two-dimensional projections are often insufficient to recreate a three-dimensional object, these coarse classifications based on two-player projections of a multi-player game do not contain the subtle nuances which give the game its interest. However, it is difficult to develop a single complete and rigorous theory as was the case for the theory of two-player games. NEGOTIATION The Theoretical Basis Three-player games are much more complicated than two-player games. Whereas, in a two-player game, there is only one adversarial relationship, in a three-player game, any of the three players can be opposed by an alliance of the other two players. The exact strategy derived depends on many ``psychological'' assumptions on the behavior of the other player's when confronted with choices of equal interest. For example, when a loss seems certain, many players will choose to share the defeat with the player most responsible for their loss. While maximizing your position against such a psychology, one must be careful not to give the impression that you are minimizing the position of any other player. Whereas, when a victory seems possible, players chose their winning coalition according to their estimated chances of having the victory accorded to them. Thus, all critical members of an alliance must be assured of the likelihood of victory for them. In any two player game, players can only be pleasantly surprised when their opponent makes a bad move. In that way, the player will often find a previously unavailable winning strategy. However, this is not the case in three-player games. Suppose for example that, N chooses f_0 (a forced win for F) when he could have chosen f_1 (by which F would have decided whether P or N would win). This is a bad move for N since he no longer has any chance to win. Nevertheless, P also loses from this error. Thus, in any three player game, negotiation is necessary for several reasons. First, as we saw above, each player will attempt to shift the blame for the other player's misfortunes on the third player. Here we discuss matters of loyalty and treason, and try to place an appropriate ``spin'' on the moves that we have made. Secondly, negotiation can be used to avoid the sort of error that N made in the example above. If P had convinced N of the danger that he was in, then perhaps N would have made the move that was in their mutual interest. Obviously, these two types of negotiation are difficult to separate. By giving a player useful suggestions, one can hope that his trust for you will be raised. On the other hand, you can suggest a move to a gullible player which is more in your interest than his. If you have promoted his trust in you up to now, then perhaps he will accept your advice (against his better judgement). On the other hand, if the other player no longer has any trust in you, then he will be less receptive to any advice you might give. In an extreme case, he might ignore his own normal preferences, and simply hand the victory to the third player. In a game with a very complicated search tree, communication does not merely serve for negotiation itself, but rather these suggestions allow a pair of players temporarily working together to exchange information, and thus examine the search tree in parallel. Diplomacy In order to better study the relationship between negotiation and multi-player games, we chose the game Diplomacy. Obviously, many other multi-player games exist. However, certain prohibit negotiation among players. For example, in the game of Poker, perhaps some information could usefully be exchanged among the player. However, this is normally forbidden. Some games such as 3-player Nim are so simple that there is little to discuss. Moreover, in any impartial game, there is a great symmetry which overshadows the negotiation. In a position n_1, what can the player F say to N different from what P might say in order to convince N to attribute the victory to him. On the other hand, although there is no inherent symmetry in the game Diplomacy, statistics show (Walker, 1979) that each country has an approximately equal chance to win the game. In fact, according to Alan Calhammer creator of the game, a perfect game of Diplomacy is one in which no country is eliminated, but rather everyone allies against the current leader until he is weakened and a new leader appears. All of this coalition building and rebuilding, generally involves much detailed negotiation. Furthermore, an effective alliance will require coordinated movements. Through these negotiations the participants in any alliance will determine which of the units will make attacks, and which will support those attacks. Moreover, the strategy of the game diplomacy is very rich. Thus, in proposing alliances with Turkey, Russia and Austria-Hungary will each be able to offer a number of advantages and disadvantages to Turkey. Depending on the temperment of the Turkish player, he can accept one of the offers, hold off while waiting further developments, or he could even pretend to accept both offers while intending to soon betray the trusts of one or both of the other players. Diplomacy Programming Project When testing game programs, we have learned much more from machine versus machine competitions than from machine versus human competitions. The humans are very unpredictable. Thus, a victory one day might not mean that you have improved your program, but rather that the human is tired, or has gotten overconfident, etc. On the other hand, computer tests are repeatable. One variable can be changed at a time, while hundreds of test games are played automatically. Using techniques such as this we are able to automatically fine-tune the position evaluation heuristics of a computer game program. Some programs use automated learning techniques; in this case, it is vital that the program be able to play thousands of games automatically. For a game like Diplomacy, it is even more important to emphasize automated testing. Since there are seven players, an ``average'' program should win about one seventh of the time. Thus, as compared with a two-player game, approximately three or four times as many test games should be run in order to statistically prove a similar percentage increase in the number of wins. Furthermore, the heuristic position evaluation used by the Bordeaux Diplomat contains a large number of parameters whose values should eventually be optimized through repeated tests. When testing a Diplomacy program with human players, we have seen that each player has many preconceived notions about the computer. These preconceptions leads to unwarranted prejudices for or against the computer which in turn influences the alliance formation process much more than any purely game-related factor. Thus, the real capacity of the program is clouded by a large number of extraneous issues. To avoid this test games against humans have been organized using Gunboat rules. By these rules, the identity of each player is kept secret, and no communication is allowed other than through the movements themselves. However, by eliminating communication, we have eliminated the negotiation which was our main object of study. Indeed, the ideal solution is to test our program against computer opponents. The first opponents one might think of are copies of our program, or alternate versions of our program. However, this is an extremely inadequate solution. Obviously, programs could recognize copies of themselves by sending special messages at the beginning of the game. The program would then ally with copies of itself, eliminate any remaining players, and finally declare a draw. Here, we assume that the programmer is trying to recognize his own program. However, similar effects can occur without such intentions on the part of the programmer. For example, it is reasonable, all things being equal, to follow advice from an intelligent player, yet how should a program judge the intelligence of others? The most reasonable criterion is that an intelligent player would suggest moves that you consider very strong. However, if the other program is a copy of yourself, then it will evaluate moves the same way you do, and then will probably suggest to you the very moves you would have thought of making yourself. Under such conditions, your program would inevitably ally with itself. Once again the subject at hand is being clouded by a host of irrelevant details. To avoid these problems, any diplomacy program should be tested against a set of programs written by different groups of programmers using completely different techniques. But how can seven diplomats programmed by different people carry out any sort of meaningful dialog? How should messages be transmitted between programs? And how can these messages be understood? In many computer game competitions, moves are transmitted by hand. The computers sit side-by-side while operators enter moves. This guarantees compatibility, but in a game of diplomacy, where for each turn 34 movements must be entered, and may be preceded by arbitrarily many diplomatic messages, the game would inevitably slow down to a standstill. Indeed, the test at the 1992 Computer Olympiad (Loeb, this volume) was terminated after 12 movement phases due to lack of time whereas it is important to run a large number of complete game in order to accurately measure the potential of a diplomacy program. To run the necessary number of games in a reasonable amount of time, the computers communicate directly. The Diplomat Interface has been developed by the Diplomacy Programming Project to facilitate such communication. Although written in LCS, (LCS is a variant of SML developed by Bernard Berthomeiu and the Outil et Logiciel pour le Communication group at the Laboratoire d'Automatique et de l'Analyse de Systeme in Toulouse.) LCS differs from SML in that it allows the management of parallel processing. it is capable of launching programs written in any language within the UNIX operating system. The standard output of each player-diplomat is rerouted via the use of an LCS instream to the LCS agent responsible for the communication with that player-diplomat. This agent then can output in the corresponding LCS outstream where it is available on a FIFO basis as standard input for the diplomat. To simulate the play of humans against machines, the Diplomacy Interface is capable of opening X or suntools windows for specified countries. Input and output for these windows are handled in a manner similar to that described above. Actually, it is impossible for a diplomat to determine who is playing the other countries: a human or a machine. Thus, no prejudices against humans can be built into the diplomats, and humans must perform a sort of Turing test if they wish to distinguish the machines from other humans. Each player communicates only with the Diplomat Interface. The Diplomat Interface acts as a game master. That is to say, players can submit their moves with the ``SUB'' command. When all the moves are in and all players are ready to go, or when the deadline arrive, results are computed and distributed with the ``ORD'' message. The current position is announced via the ``NOW'' message, the missing orders for next turn are requested with the ``MIS'' message. It is the Diplomat Interface that declares the game over either because a player has reached the victory condition of 18 supply centers, or because the players have agreed on a draw with the ``DRW''command. However, the Diplomat Interface is also a real interface; that is to say, it can pass messages from one player to another. It was decided that this communication should not be made in a natural language. The study of natural languages is an entirely separate and difficult field of research while we wish to concentrate on the problems directly related to negotiation and game playing. Thus, a simplified language called the DPP Protocol was devised by which the players would be required to communicate. (Loeb 1992A, 1992B) Tests using actual negotiation transcripts show that this simple language allows the expression of nearly all of the concepts employed usual when negotiating in Diplomacy. It is impossible to say that it is sunny outside, but it is easy to threaten a player with war if he takes one of your cities. (France: ``SND ENG (IFF DMZ(PAR BRE MAR) THN PCE ELS NOT(PCE))''.) This language was developed independently of the diplomat we are developing. Thus, one can hope that other programmers will find the language as convenient as we are. In order to make the language as rich as possible, a several dozen of predicates have been defined. In addition to their grammar, definitions of their suggested meaning and examples of their use are given. However, there are no guarantees that other programs will interpret the messages accordingly. One can always write a psychopathic program which attacks only those who propose peace. However, one could expect that like in the real world, psychopaths would have a terribly short life expectancy in Diplomacy. In consequence, most reasonable diplomats will start all relationships with the tentative assumption that its partner will react ``normally'' to any signals made by the diplomat. Certain programs might not understand all possible constructions in the protocol. However, for the most part, they should understand the messages we indicate in the dictionary as the most basic. This includes the message ``TRY''. Its purpose is that upon receipt of an unknown message, we respond with ``TRY'' and a list of constructions understood by our program. In this way, some sort of communication can be held between programs of widely differing levels of sophistication. THE BORDEAUX DIPLOMAT Division of Control The Diplomat proposed by Hall (Hall, 1992) is based on a strategic core. This lower brain of the diplomat does all the calculations and thus forms the basis for the decisions taken by the upper brain, or negotiator part of the diplomat. The negotiator does not know the details of the map, nor the rules of the game. On the other hand, it must be able to communicate with the other players, distinguish between friends and enemies, form alliances, and so on. To make decisions, the negotiators poses concrete questions to the strategic core, and following the answers to these questions, it choses the appropriate moves to submit, or messages to send out. The strategic core does not know what country the diplomat is playing. It is simply a disinterested, objective observes who calculates the best moves countries given a certain number of constraints along with the value of the resulting position of all countries. This strategic core can be tested separately from the rest of the program (as was done at the 1992 Computer Olympiad). However, its failure to communicate and to grasp the notions of balance of power, greatly limit its potential. On the other hand, the programs division between the strategic core and the negotiator is completely modular. Thus, it would be possible to redesign the Bordeaux diplomat simply be replacing one of the two modules with a new program. Given a collection of several negotiators and several strategic cores based on different methods, we can easily construct an enormous number of full diplomats. The strategic core is given the current position including the supply center distribution, and a set of ``concerned'' countries which has been pre-divided into alliances by the negotiator. Each alliance can optionally have a list of ``mortal enemies'' designated. Next, the length and depth of the search is specified. Certain countries forbidden to enter certain ``demilitarized zones,'' or forbidden or conversely required to make certain moves. Finally, a ``seed'' can be specified for the search. In return, the strategic core must return for each concerned country the move that maximizes the total interest of their alliance (while minimizing the interest of their enemies) subject to the constraints given along with the values themselves. Obviously the strategic core is used to select the moves to be used subject to the constraints of all of the treaties which we have agreed to, and decided to follow. However, the core has many other applications: - In response to a proposed new alliance, two problems can be submitted: One with the current alliances, and one with the new alliances. A good alliance is one in which both players gain. - Given a rumor or a threat, we introduce the moves in question as a ``seed''. If after several iterations, the move is still being considered, then the move is plausible. -Suggestions can be evaluated similarly. - To detect new alliances, we send the actual moves used last turn to the strategic core along with various conjectured alliances. The actual alliances are those for which the actual moves used are the most stable. - In case of a backstab, we calculate the actual gain of the player with the moves used, and the gain he would have made with the moves proposed by the alliance. The difference gives an upper bound on his reliability. During turns in which he does not stab, a similar calculation gives a lower bound on his reliability. A player is expected to keep his word until his potential gains exceeds his reliability. Further discussion of the diplomatic aspects of our program are postponed to a later article as continued work progresses on the negotiation module of our program by Per Westling (Linkoping, Sweden). The Strategic Core Many difficulties were encountered in writing the strategic core of our program. It was first outlined by Hall (Hall, 1992), but several difficulties were noted in the search routine. In the end, a new algorithm was implemented by Constantin Staykoff. As we have seen above, perfect-knowledge position evaluation is much more difficult to define in a multi-player environment---especially in a game such as Diplomacy with as many as seven players. Already in two-player games such as Chess, the strategy is sufficiently rich to prevent programs from playing perfectly. Hence, a search is made through the most ``important'' parts of the game tree, and value of the leave positions are then ``calculated.'' The selection of tree paths and the evaluation of the terminal positions is left are governed by various heuristics. The game tree of diplomacy is much more complex. Every unit must move simultanously. Thus, the game is not a sequential game, but rather a 7-dimensional matricial game. For example, the number of possible non-equivalent openings is 4 430 690 040 914 420 (Green, No Date) as compared to 20 in chess. In chess, the average number of moves mounts to about three dozen during the mid-game. However, in Diplomacy, the number of available moves increases by an additional factor of several billion during the movement phases as the number of units and possible supports rises. An efficient simplex algorithm has been invented by Lemke and Howson (Lemke, 1964) to find Nash equilibria in two-dimension matricial games. However, even if this method could be generalized to multi-player games, there would not be sufficient time to input the entire game matrix to the algorithm. Hence, whereas it is possible to search several turns ahead in chess, it is impossible to conduct a complete one-ply search during a movement phase of the game diplomacy. Instead, we must carefully converge on the Nash equilibria in the enormous matrix without looking at all the possible moves. It is senseless to consider looking at two movement phases at the same time. All predictional capacity must then be built into our position evaluation heuristics. Fortunately, during the building and retreat phases, the number of possible moves is never quite so astronomical. It is thus possible to do a complete one-ply analysis of these phases of the game. Optionally, the search of a build or retreat phase could be less complete, but anticipate the following phase via a recursive call to the strategic module of the program. Search Techniques with Simultaneous Movement A complete description of the heuristics of our position evaluator can be found in (Hall, 1992), so we will concentrate on the search techniques itself. What moves should be considered? In what combinations? And in what order? The search technique proposed in (Hall, 1992) EBBFS (Even Better Best-First Search) involved an iterative search for moves for all of the concerned countries in parallel. Moves were considered one at a time by each alliance following a set agenda and adding orders for one unit a time (Best-First). The results of a given move was calculated in relation to the ``current best'' moves for each of the other alliances. The position is kept as current best move or reject according to the output the position evaluator gives to the resulting position. Suppose a move M_A was rejected for the alliance A because the alliance B had considered the move M_B to be its best move at the time A evaluated M_A, but later B rejects M_B for a better move M'_B. According to the above algorithm, the alliance A would not be given a chance to reconsider the interest of the move M_A in light of this development. Given the drawbacks of EBBFS, we propose a new search technique called RES (Refined Evolutionary Search). The name is derived from the fact that this search strategy is similar to that adopted by nature to play the game ``Biology.'' In the game biology, each player must chose DNA sequences families. These sequences then define a certain number of living organisms. If the descendants of these living organisms are numerous after several billion years in a certain prechosen environment, then the player has won. To play the game of biology, a player must in theory test every DNA sequences family in competition against every possible set of rivals. However, such a research is impossible. Thus, nature has adopted the evolutionary search techniques. By this method, animals are mutated one at a time by randomly changing a certain number of genomes. The successful mutations replace the original species, and the unsuccessful mutations are rejected. Similarly, in diplomacy, a large alliance can not possibly consider all of its movements. Thus, the research is started by considering a certain ``seed'' move. This move being considered is then mutated by selected a certain number of units, and randomly selecting new moves for these units. Mutations are made for one alliance after another until the maximum search length has been reached. The value of any given mutation with respect to the current moves of all other alliances is compared to the value of the original move with respect to the current moves of all other alliances. In other words, when an alliance adopts a new strategy, this may changes the value of the strategies already being considered by the other alliances. A certain number of consistency rules cut the loss of time due to the consideration of uninteresting mutations. In all cases where allied forced are ordered to attack each other, the moves are changed into mutual support. Furthermore, as a time saving measure, support is only considered among units belonging to the same alliance. If the search passes through Nash equilibrium, then no alliance will have any interest in changing its moves. Thus, no Nash equilibria will be lost. Nevertheless, the program only considers pure strategies whereas pure Nash equilibria do not exist for all games. Suppose Italy with his fleet in the Tyrrhenian Sea is trying to counter Turkey's incursion into the Ionian Sea. If Italy defends Naples, then Turkey should attack Tunis. However, if Italy defends Tunis, then Turkey should attack Naples. Assuming that Italy and Turkey are the only countries concerned by this situation, the RES search will eventually enter a cyclic pattern of length four as described above. Eventually, we would like our program to be able to detect such cycles. The strategies involved in these cycles could then be extracted and subjected to a detailed matricial game analysis. A mixed Nash equilibrium would then be found by which Italy moves to Naples and Tunis with certain probabilities. At that point, a random choice would be made, and the move evaluated according the expected return. Currently our program does not do such detailed analysis but merely halts at a random point (the search limit) and returns the strategies currently being considered and their actual return. The research is designated as ``refined,'' since that is how the mutation rate is set. At the beginning of the research, all units are reassigned new orders. There is no reason to believe that the old orders are good, so we attempt to examine as many completely different possibilities as we can. Later on in the research, we are more confident in the moves being considered, and therefore restrict ourselves to smaller and smaller modifications. As the search limit approaches, we only consider changing the order of one unit at a time, or permute the supports used. -------------------------------------- Refined Evolutionary Search Calculate best moves and their values for each given alliance with respect to a certain position (units and supply centers) in the game Diplomacy and the following constraints: demilitarized zones, required moves, forbidden moves, search depth, search width. - (Initialize.) Set best moves BestM, and best raw moves RBestM to the default moves. Calculate the position P resulting from BestM. For each alliance k, let VBestM_k be the value of P for alliance k. For each unit u, create a list of legal moves LM(u) given the map, demilitarized zones, forbidden moves, and required moves. Abort if LM(u) is empty. Set i=1. - (Choose Alliance.) Let p be i mod the number of alliances. - (Choose Units.) Let $n$ be the smallest integer greater or equal to the number of units in alliance p times (width - i)/width. Select a set SU of n units belonging to alliance p. - (Make Moves.) Set RM to RBestM_p. For all units u in SU, set RM(u) to a random move or hold from LM(u). Set M to RM. - (Make Supports.) Replace in M all orders of the form A\MovesTo C and B\MovesTo C with appropriate random supports. Replace in M all orders of the form A\MovesTo B and B Holds with appropriate supports. - (Evaluate Moves.) Calculate the position P resulting from M and BestM_j for j\neq p. Let NewV_j be the value of P for the alliance j.^* - (Selection Moves.) If NewV_p>VBestM_p, then set BestM_p to M, set RBestM_p to RM, and for all alliances j, set VBestM_j to NewV_j. - (Test Exit.) Increment i. If i<width, then continue with step 2. Otherwise, output BestM_j and VBestM_j for each alliance, and exit. -------------------------------------- (*) - The value of a position for an alliance is the sum of the values for each member of the alliance minus the value for any indicated mortal enemy. In biology, Mammals have retained their deep-diving instinct to some measure despite having left the oceans millions of years ago. Similarly, in Diplomacy, a player may attempt to defend a space which comes under attack early in the research. In response, the attacking alliance may add enough support that defense become hopeless. However, the defender will persist in his futile support unless there is positive gain by making some other move. When there is a positive gain available from another move, a cyclic situation as described above often results. Such behavior in biology is useful; for example, when people fall into icy waters. Similarly, such a behavior is useful in Diplomacy in order to make life as difficult as possible for the attacking player. CONCLUSIONS The theory of multi-player games and the game Diplomacy in particular is too deep and rich to expect a complete analysis. However, in view of the applications of multi-player games, we must persist in our efforts to model such games. Mathematically, we seem to lack a good language in which to describe such problems. Underlying psychological hypotheses are necessary which opens the game up to the fuzzy world of negotiation. Our view of negotiation is that it must be based on a solid strategic core. Otherwise, the diplomat can not decide what to ask for, nor be able to evaluate any sort of offer. Early versions of this strategic core show good survival skills, but lack the savvy to pull off a win. Nevertheless, in blind tests, players are unable to distinguish the strategic core from human players. In combination with an automatic negotiator which will temper the aggressivity of its strategic alter-ego, we should be able to compete at an equal level with humans in the game of Diplomacy. Simultaneously, we hope that existing diplomats will be made available in conformity with a universal language of communication among Diplomat which we have developed, and that many new programs are introduced. Thus, we will be able to begin the repeated testing necessary to compare various theories, and to optimize our strategies. By attempting to solve these difficult problems resulting from the interactions of intelligent agents, we hope to better understand what sort of capabilities such an agent should have, when it cannot assume that everyone in the world is its friend. Creating programs that can play complex multi-player games is a step towards creating programs that can deal with the complexities of the real world and the craftiness of humans. BIBLIOGRAPHY Mark Berch, The Lexicon of Diplomacy, Diplomacy Digest, 34 (1980). E. R. Berlekamp, J. H. Conway and R. K. Guy, ``Winning Ways for your Mathematical Plays,'' Academic Press, 1982. M. Hall and D. Loeb, Thoughts on Programming a Diplomat, Heuristic Programming in Artificial Intelligence 3: the third computer olympiad, eds: J. van den Herik and V. Allis, Ellis Horwood, 123--145 (1992). A. Calhammer, Rules to Diplomacy, The Avalon Hill Game Company, 1956, 1976. (``Diplomacy'' is a trademark of the Avalon Hill Game Company.) T. Green, private communication. J. Isbell, Combinatorial Mathematics and its Applications, ed. D. J. A. Welsh, 181. D. E. Knuth and R. W. Moore, An Analysis of alpha-beta Pruning, Artificial Intelligence, 6 (4) 293--325 (1975). S. Kraus, D. Lehmann, E. Ephrati, An Automated Diplomacy Player, ``Heuristic Programming in Artificial Intelligence,'' Eds. D. N. L. Levy and D. F. Beal, Ellis Horwood (1989), 136--153. S. Kraus, E. Ephrati and D. Lehmann, Evaluation of Suggestions during Automated Negotiations, Proc of the 11th Cognitive Science Conference, 1989. Lemke and Howson, Equilibrium points of bimatrix games, Journal of Social and Industrial Applications of Mathematics, 1964, 413--423. D. Loeb and C. Staykoff, Diplomacy in London in this volume. D. Loeb, Diplomacy Programming Project: Message Protocol Update, distributed via e-mail, 1992. D. Loeb, Diplomacy Programming Project: Conversations, distributed via e-mail, 1992. J. G. Propp, Three-Person Impartial Games, Unpublished manuscript. L. S. Shapley, A value for n-person games, ``Contributions to the theory of games,'' Vol. 2 (eds. H. Kuhn and A. W. Tucker), 307--317, Princeton University Press, Princeton Press Constantin Staykoff, ``Automate-Calculateur de Mouvements pour le jeu Diplomatie,'' DEA thesis 1992, Universite de Bordeaux I, France. (In French) John von Neumann, Zur Theorie der Gesellschaftsspiele, Mathematische Annalen 100 (1928) 295--320 in German. Translation in English by Sonya Bargmann, On the Theory of Games of Strategy in ``Contributions to the Theory of Games,'' Volume IV, eds. A. W. Tucker and R. D. Luce, Annals of Mathematics Studies 40, Princeton University Press 1959, 13--42. Rod Walker, ``Player's Guide to Diplomacy,'' The Avalon Hill Game Company, Baltimore, 1979. Up