Diplomacy - EP Chapter 2 - Issue 298

From: nick@sunburn.uwaterloo.ca (Nick Fitzpatrick)

Date: Mon, 26 Oct 1992 14:27:10 +0000

Issue #298 of Chapter Two of the Electronic Protocol
By Nicholas Fitzpatrick (nick@sunburn.uwaterloo.ca)
October 26, 1992
---------------------------------------------------------------
Electronic Protocol Games played on the Diplomacy Adjudicator
---------------------------------------------------------------

**** TABLE OF CONTENTS ****
PART ONE - Opinions, Letters, and Editorials:
     Challenges in Playing Multiplayer Games

This is a special, issue of EP Chapter 2, that features an article that
Danny Loeb has written in respect to programming the Diplomat, and the
Bordeaux Diplomat.  I have reformated it from the LATEX text which Danny
sent me (many weeks ago), and have presented it below, there are a couple
of places where I have probably messed up the translation. . . .

Danny wrote the article for publication in a journal, however I do not
know which one.  If you want further information of the DPP, or the Bordeaux
Diplomat, or a LATEX copy of this article, please contact Danny Loeb.

***** PART ONE *****

***CHALLENGES IN PLAYING MULTIPLAYER GAMES ***

by Daniel Loeb (loeb@geocub.greco-prog.fr)
LABRI
Universite de Bordeaux I
33205 Talence, France

Dedicated to my daughter Gabrielle Jeanette Loeb

Whereas a great deal of work has been done on two-player games,
multiplayer games are only beginning to become understood both on a
theoretical and on a practical level.  In so far as one studies games
in an attempt to model the real world, this is a terrible loss, for
large number of participants are involved in most real world conflicts.

The game of Diplomacy was invented in 1959 by Allan Calhammer to
simulate one such conflict: the First World War.  This game was chosen
as a test bed for our multi-player game playing theories.

As in any multi-player game, positions are extremely difficult to
analyze.  The mathematical machinery developed to analyze two-player
games crumbles, and must be rebuilt from scratch.  In the final
analysis, the most important factor in evaluating a position of a
multi-player game is not located on the game board, but rather in the
minds of the players. For this reason, in any multi-player game
negotiation become necessary in order to determine, and perhaps modify
these attitudes.

Diplomacy is an extremely rich example of a multi-player game.  The
number of players allows a large number of free-wheeling alliances
without degenerating into total chaos. Many heuristics exists by which
to measure a player's progress to victory, and there are a large number
of points which need to be negotiated during the formation and
continued operation of an alliance.

Furthermore, just as real life, Diplomacy is not a sequential game.
Moves must be submitted simultaneously for all players for all of their
units. Thus, normal search techniques either no longer apply or get
bogged down in the tremendous size of the game tree, for it is
impossible to complete even a one-ply search.

Finally, Diplomacy is an extremely suitable testbed for different
approaches to multi-player games, since it is sufficiently popular to
encourage unrelated groups of researchers to work on the game.

According to legend, the first play-by-mail diplomacy game (1963A)
included a diplomat (diplomacy automata) by Dave McDaniel which was
eliminated in 1903 (Berch, 1980).  In 1982, the Avalon Hill Game
Corporations marketed ``Computer Diplomacy.'' The program is designed
primarily as a game master, but has very limited playing capabilities
as well.

At the 1989 Conference on Computer Games, Kraus, Ephrati, and Lehmann
introduced a diplomat they developed over three years at Hebrew
University (Kraus, 1989A, 1989B).  Their program is very good, and
negotiates like a human. It constitutes the first serious attempt at a
computer diplomacy program.

During the 1991 Conference on Computer Games (Hall, 1992), Michael Hall
and the author elaborated several ideas by which a computer could play
the game Diplomacy.  Whereas the Hebrew University diplomat is based
primarily on the negotiation, our diplomat is based primarily on the
strategy of the game.  (See Hall, 1992, for details.  The present
article outlines the Diplomacy Programming Project's progress since
last year and connects our ideas with a new mathematical theory
developed for multiplayer games. (See also Loeb, this volume.)

In order to make the article as accessible as possible, we have
preferred to omit all references to the detailed rules of the game.
(Calhammer, 1956)  In fact, most of the ideas discussed below can be
be applied to any multi-person conflict.

I would like to express my appreciation to Constantin Staykoff for his
dedication and hard work in implementing the strategic module of the
Bordeaux Diplomat, and Ken Lowe for the use of his Diplomacy
Adjudicator.  I would further like to thank all the members of the
Diplomacy Programming Project for their encouragement and help.

PLAYER POSITION EVALUATION
Traditional Techniques

In theory, the minimax algorithm can be used to play any two-player,
deterministic, sequential-move game perfectly.

-----------------------------------------------------
Minimax Algorithm

A tree is drawn representing the game with the current position as
root.  The children of each node represent the various positions
obtained after making all of the possible moves.  The type of a node is
given by the player whose turn it is to move.

Assign position values to all leaves.  Increasing values indicate
better results for player A. For example, in a game with win, loss, and
draw, we can assign win=1, loss= -1 and draw =0.  For each internal
node, calculate the value of the position assuming perfect play on both
sides.  For an A-node, the value is the maximum of the children's
values.  For a B-node, the value is the minimum of the children's
values.   A should choose a move from the root node with maximum value.
------------------------------------------------------

However, due to time constraints, it is seldom possible to analyze the
entire game tree.  Therefore, game programs often use various search
techniques (such as alpha-beta (Knuth, 1975) and approximative
evaluation functions which analyze the position using different
heuristics, and thus need not search to the end of the game.

However, the above analysis guarantees that there exists a perfect
strategy to which any approximation thereof can be compared.
However, what happens when a third player is added?

For simplicity, assume that one player will win, and the other two
players lose. However, similar remarks hold in the absence of such
assumptions.

A game tree can still be formed as above with values assigned to the
leaves.  However, problems are encountered when minimaxing the values
back up the tree.

Let n be an A-node in the tree whose children have all been evaluated.
v(n) must now be calculated.
Obviously, if n has a child whose value is equal to an A-win, then this
child is a good strategy for A, and the value of $n$ will be a win for
A.  Similarly, if all of the children of n are wins for say B, then n
is itself a win for B.  However, what happens if some of the children
of n are wins for B and some are wins for C?  It is then impossible to
evaluate n without some psychological information regarding A's
attitude to such situations.  His choice of move is indifferent, yet
greatly affects the strategy of the other players.

Queer Games

Jim Propp calls such positions ``queer'' (Propp, Unpublished).   For
example, in the game of Nim played with 3-players, Players take turns
removing stones from piles.  Any number of stones (at least one) can be
removed from a pile.  The player to make the last move wins. 0+0 is a
win for the previous player.  Thus, 1+0, 0+1, 2+0 and 0+2 are wins for
the next player. Hence, 1+1 is a win for the following player.
Therefore, the position 1+2 is queer as it yields a choice for the next
player betweens wins for the previous player and the following player.

Are queer positions very common? If not, then we can hope to frequently
apply the traditional techniques mentioned above.

Unfortunately, given any impartial game (An impartial game is one in
which to answer the question ``What moves are possible?'') you need not
pose the question ``Whose turn is it.''  Thus, Chess is not an
impartial game, since only White can move White's pieces G, the game
2+2+G is queer (Propp, Unpublished).  Thus, impartial games large
enough to contain a 2+2 component are queer.

A simple probabilistic argument shows that virtually all games are
queer.  Let a,b,c,q represent the probabilities that a position is a
win for any given player or that it is queer respectively.  Without
loss of generality, we consider games in which all internal node have
two children.  By symmetry, a=b=c.  Thus, the probability to be queer
q=1-3a is equal to the probability that neither choices is a win for
the current player, yet both choices are not wins for the same opponent
(1-a)^2- 2a^2.  The only solution that represents a valid probability
is a=0.  Thus, q=1.

Sums of Games

A traditional technique in analyzing games has been to break down the
game into a sum of simpler games.  The game G+H is the game in which
players may play in either G or H but not both.  The game is over when
play is no longer possible in either summand.  The last player to have
played is the winner.

Typically, the most powerful results have been found for impartial
games.  The two possible results of a two-player game are N win for the
next player, and P win for the previous player.

Adding Two-Player Impartial Games

+      N        P
-----------------
N      N or P   N
P      N        P

The above table  gives the sums of all two player impartial games
except N+N.  In fact, the sum of two N's depends on the particular
games, yet a theorem of Grundy allows us to without loss of generality
consider only sums of games of Nim with one pile.  In these trivial
games, the P positions are equivalent to empty piles, and N positions
are equivalent to non-empty piles.  The sum of two N positions is N
itself if and only if the sizes of their associated Nim piles are
distinct (Berlekamp, 1982).

For three player impartial games, the possible sums are quite
complex.  The sums are calculated on the premise that play rotates
NFPNFPNFP . . .

Adding Three Player Impartial Games

+      P      N      F      Q
----------------------------------
P      PQ     NQ     FQ     Q
N      NQ     NFQ    PNQ    NQ
F      FQ     PNQ    NQ     NFQ
Q      Q      NQ     NFQ    F

Only in one particular case (P+Q) is the sum predetermined.  To
determine the value of the sum, there are two further problems.  First,
there is no generalized theorem of Grundy.  Grundy's proof supposes
that all ``reversible'' moves can be ignored. However, in three-player
impartial games even if P's move is reversed by N, he may have gained
Zugzwang, and placed F in a difficult position.  Second, as can be seen
below, even the game of Nim is no longer trivial once there are three
players.

Classifying Queer Games

In the above classification, queer games are all considered to have the
same value.  Indeed, it has been believed (Neumann, 1928) that all
three-player games are strictly-determined or symmetric non-strictly
determined. However, are queer games always symmetric?  For example,
suppose N could choose between two queer positions: one where F would
name N or P as winner, and one where N would name F or P as winner.
Perhaps, N would prefer the first choice even though both are queer
games, since  he would then retain some chance of winning.
The two positions above must then be given different values: f_1
and n_1 respectively.  Obviously, p_1 can be defined in a similar
manner.  Now, denote forced wins by f_0, p_0, and n_0 respectively.  N
has the following preferences n_0 > f_1 , p_1 > f_0 , p_0.  He is
indifferent regarding f_0 and p_0.  A position leading to such a choice
can not be simplified by his opponents, and must be labeled simply n_1.

Is N necessarily indifferent between f_1 and p_1?  Unfortunately, we
can not answer this question until we clarify how a player makes a
choice between equally interesting possibilities.  One assumption is
that he makes a random move.  In that case, f_1 may be preferred to p_1
or visa versa.  However, consider a game with additional moves
equivalent to certain other moves.  For example, in Chess allowing the
Rooks to be moved by either hand and all other pieces to be moved only
by the left hand.  Is it then more likely the player will move his
rook?

Although plausible, this is not a very pleasing model since it does not
allow us to prune redundant branches from our game tree.  Instead, I
propose the model by which the players establish equivalence classes of
positions.  Two positions are equivalent if the equivalence classes of
their non-dominated options are in one-to-one correspondence via
equivalence.

Moves are made by randomly choosing a non-dominated equivalence class,
and then selecting a move within this class.

In this way, f_1 and p_1 are equivalence classes of equal interest to
N.  He makes his choice randomly between the two.  This choice between
F and P to select the eventual winner can be represented as n_2.

Similarly, n_3 is the choice by N of who will chose the person who will
chose the person who selects the winner.  In fact, it can be shown that
any position in a 3-player finite sequential perfect-knowlege game is
equivalent to exactly one of the equivalence classes n_i,p_i,f_i where
i is an integer.  n_i+1 for example is defined to be a choice for N
between p_i and f_i.  Thus, we have the counter-intuitive result that
no game is symmetric under the hypotheses above.

Calculations with these refined classes become even more complicated.
For example, whereas Nim with two piles is completely trivial.  The
next player wins unless the two piles are of equal size. with two
player, there is no apparent rhyme or reason in the 3-player 2-pile Nim
table.

More players

Until now, we have only discussed games with three players.  When a
fourth player is added, the refined classification (n_i,f_i,p_i) breaks
down completely.  However, the crude classification scheme N,F,P,Q can
be extended.  Instead of four types of games, now twelve are needed.

[1-4] Forced win for a certain player.

[5-8] An alliance of three players is needed to stop a certain player
from winning.

[9-12] One player plays no significant role whatsoever and the 3
remaining players participate in a ``Queer'' game. (In other word, any
of the 3 players can be stopped by the other 2.)

Under this view, game types are in one-to-one correspondence with
voting schemes. (Isbbell, No Date)

Number of Voting Schemes
n     0   1   2   3    4    5      6   7
----------------------------------------------
v(n)  0   1   2   4   12   81   2646   1422564

A voting scheme consists of a set of voters V, and a set of winning
coalitions W subject to the condition that if V\supseteq B\supseteq
A\in  W then B in W, and if A=V-B then exactly one of A and B can be a
member of W.  In other words, a voting scheme is an order preserving
map f from the subsets of V to -1,1 such that f(A)=-f(V-A).

Given a game G and a partition pi-P of its players, we define the
quotient game G/pi  to be a new game with one player for each block (or
alliance) of pi.  The player B in pi in the game G/pi must make a move
whenever it is the turn of some p in B.  Thus, in G/pi there might not
be simple rotation of play even if in G the players take turns playing
in simple rotation.  The moves available to B in the game G/pi are the
same as those available to p in the game G.  B wins the game G/pi if
any p in B wins.

A is said to be a winning (resp. losing) alliance of players in the
game G if the quotient game G/A,P-A is a win (resp. loss) for the
alliance A.  The set of winning games is a voting scheme, since adding
players to an alliance can only strengthen it, and given a pair of
opposing coalitions exactly one can win while the other must lose.  If
the game includes results other the simple wins and losses, then the
notion of a voting scheme must be generalized.  A generalized voting
scheme is a order preserving map from the subsets of V into the set of
results {-n,-n+2, . . .,n-2,n} such that f(A)=-f(V-A).

The play of a game can be viewed as the selection of a winning
coalition. (Shapely, No Date)  The difference between the coarse
classification and the fine classification of game is that when a
winning alliance is formed in fact the alliance itself does not win,
but rather only one member of the alliance. Thus, in the end, most
alliances are unstable.  It is this unstability which the fine types
f_i,n_i,p_i attempt to measure in 3-player games.

In conclusion, just as two-dimensional projections are often
insufficient to recreate a three-dimensional object, these coarse
classifications based on two-player projections of a multi-player game
do not contain the subtle nuances which give the game its interest.
However, it is difficult to develop a single complete and rigorous
theory as was the case for the theory of two-player games.

NEGOTIATION
The Theoretical Basis

Three-player games are much more complicated than two-player games.
Whereas, in a two-player game, there is only one adversarial
relationship, in a three-player game, any of the three players can be
opposed by an alliance of the other two players.

The exact strategy derived depends on many ``psychological''
assumptions on the behavior of the other player's when confronted with
choices of equal interest.  For example, when a loss seems certain,
many players will choose to share the defeat with the player most
responsible for their loss.  While maximizing your position against
such a psychology, one must be careful not to give the impression that
you are minimizing the position of any other player.

Whereas, when a victory seems possible, players chose their winning
coalition according to their estimated chances of having the victory
accorded to them.  Thus, all critical members of an alliance must be
assured of the likelihood of victory for them.

In any two player game, players can only be pleasantly surprised when
their opponent makes a bad move.  In that way, the player will often
find a previously unavailable winning strategy.  However, this is not
the case in three-player games.  Suppose for example that, N chooses
f_0 (a forced win for F) when he could have chosen f_1 (by which F
would have decided whether P or N would win).  This is a bad move for
N since he no longer has any chance to win.  Nevertheless, P also loses
from this error.

Thus, in any three player game, negotiation is necessary for several
reasons.  First, as we saw above, each player will attempt to shift the
blame for the other player's misfortunes on the third player.  Here we
discuss matters of loyalty and treason, and try to place an appropriate
``spin'' on the moves that we have made.

Secondly, negotiation can be used to avoid the sort of error that N
made in the example above.  If P had convinced N of the danger that he
was in, then perhaps N would have made the move that was in their
mutual interest.

Obviously, these two types of negotiation are difficult to separate.
By giving a player useful suggestions, one can hope that his trust for
you will be raised.  On the other hand, you can suggest a move to a
gullible player which is more in your interest than his.  If you have
promoted his trust in you up to now, then perhaps he will accept your
advice (against his better judgement).

On the other hand, if the other player no longer has any trust in you,
then he will be less receptive to any advice you might give.  In an
extreme case, he might ignore his own normal preferences, and simply
hand the victory to the third player.

In a game with a very complicated search tree, communication does not
merely serve for negotiation itself, but rather these suggestions allow
a pair of players temporarily working together to exchange information,
and thus examine the search tree in parallel.

Diplomacy

In order to better study the relationship between negotiation and
multi-player games, we chose the game Diplomacy.

Obviously, many other multi-player games exist.  However, certain
prohibit negotiation among players.  For example, in the game of Poker,
perhaps some information could usefully be exchanged among the player.
However, this is normally forbidden.

Some games such as 3-player Nim are so simple that there is little to
discuss.  Moreover, in any impartial game, there is a great symmetry
which overshadows the negotiation.  In a position n_1, what can the
player F say to N different from what P might say in order to convince
N to attribute the victory to him.

On the other hand, although there is no inherent symmetry in the game
Diplomacy, statistics show (Walker, 1979) that each country has an
approximately equal chance to win the game.  In fact, according to Alan
Calhammer creator of the game, a perfect game of Diplomacy is one in
which no country is eliminated, but rather everyone allies against the
current leader until he is weakened and a new leader appears.  All of
this coalition building and rebuilding, generally involves much
detailed negotiation.  Furthermore, an effective alliance will require
coordinated movements.  Through these negotiations the participants in
any alliance will determine which of the units will make attacks, and
which will support those attacks.

Moreover, the strategy of the game diplomacy is very rich.  Thus, in
proposing alliances with Turkey, Russia and Austria-Hungary will each
be able to offer a number of advantages and disadvantages to Turkey.
Depending on the temperment of the Turkish player, he can accept one of
the offers, hold off while waiting further developments, or he could
even pretend to accept both offers while intending to soon betray the
trusts of one or both of the other players.

Diplomacy Programming Project

When testing game programs, we have learned much more from machine
versus machine competitions than from machine versus human
competitions.  The humans are very unpredictable.  Thus, a victory one
day might not mean that you have improved your program, but rather that
the human is tired, or has gotten overconfident, etc.

On the other hand, computer tests are repeatable.  One variable can be
changed at a time, while hundreds of test games are played
automatically.  Using techniques such as this we are able to
automatically fine-tune the position evaluation heuristics of a
computer game program.  Some programs use automated learning
techniques; in this case, it is vital that the program be able to play
thousands of games automatically.

For a game like Diplomacy, it is even more important to emphasize
automated testing.  Since there are seven players, an ``average''
program should win about one seventh of the time.  Thus, as compared
with a two-player game, approximately three or four times as many test
games should be run in order to statistically prove a similar
percentage increase in the number of wins.  Furthermore, the heuristic
position evaluation used by the Bordeaux Diplomat contains a large
number of parameters whose values should eventually be optimized
through repeated tests.

When testing a Diplomacy program with human players, we have seen that
each player has many preconceived notions about the computer.  These
preconceptions leads to unwarranted prejudices for or against the
computer which in turn influences the alliance formation process much
more than any purely game-related factor.  Thus, the real capacity of
the program is clouded by a large number of extraneous issues.

To avoid this test games against humans have been organized using
Gunboat rules.  By these rules, the identity of each player is kept
secret, and no communication is allowed other than through the
movements themselves.  However, by eliminating communication, we have
eliminated the negotiation which was our main object of study.  Indeed,
the ideal solution is to test our program against computer opponents.

The first opponents one might think of are copies of our program, or
alternate versions of our program.  However, this is an extremely
inadequate solution.  Obviously, programs could recognize copies of
themselves by sending special messages at the beginning of the game.
The program would then ally with copies of itself, eliminate any
remaining players, and finally declare a draw.

Here, we assume that the programmer is trying to recognize his own
program.  However, similar effects can occur without such intentions on
the part of the programmer. For example, it is reasonable, all things
being equal, to follow advice from an intelligent player, yet how
should a program judge the intelligence of others?  The most reasonable
criterion is that an intelligent player would suggest moves that you
consider very strong.  However, if the other program is a copy of
yourself, then it will evaluate moves the same way you do, and then
will probably suggest to you the very moves you would have thought of
making yourself.  Under such conditions, your program would inevitably
ally with itself.

Once again the subject at hand is being clouded by a host of irrelevant
details.  To avoid these problems, any diplomacy program should be
tested against a set of programs written by different groups of
programmers using completely different techniques.

But how can seven diplomats programmed by different people carry out
any sort of meaningful dialog?  How should messages be transmitted
between programs?  And how can these messages be understood?

In many computer game competitions, moves are transmitted by hand.  The
computers sit side-by-side while operators enter moves.  This
guarantees compatibility, but in a game of diplomacy, where for each
turn 34 movements must be entered, and may be preceded by arbitrarily
many diplomatic messages, the game would inevitably slow down to a
standstill.  Indeed, the test at the 1992 Computer Olympiad (Loeb, this
volume) was terminated after 12 movement phases due to lack of time
whereas it is important to run a large number of complete game in order
to accurately measure the potential of a diplomacy program.

To run the necessary number of games in a reasonable amount of time,
the computers communicate directly.  The Diplomat Interface has been
developed by the Diplomacy Programming Project to facilitate such
communication.   Although written in LCS, (LCS is a variant of SML
developed by Bernard Berthomeiu and the Outil et Logiciel pour le
Communication group at the Laboratoire d'Automatique et de l'Analyse de
Systeme in Toulouse.)  LCS differs from SML in that it allows the
management of parallel processing. it is capable of launching programs
written in any language within the UNIX operating system.  The standard
output of each player-diplomat is rerouted via the use of an LCS
instream to the LCS agent responsible for the communication with that
player-diplomat.  This agent then can output in the corresponding LCS
outstream where it is available on a FIFO basis as standard input for
the diplomat.

To simulate the play of humans against machines, the Diplomacy
Interface is capable of opening X or suntools windows for specified
countries.  Input and output for these windows are handled in a manner
similar to that described above.  Actually, it is impossible for a
diplomat to determine who is playing the other countries: a human or a
machine.  Thus, no prejudices against humans can be built into the
diplomats, and humans must perform a sort of Turing test if they wish
to distinguish the machines from other humans.

Each player communicates only with the Diplomat Interface.  The
Diplomat Interface acts as a game master.  That is to say, players can
submit their moves with the ``SUB'' command.  When all the moves are
in and all players are ready to go, or when the deadline arrive,
results are computed and distributed with the ``ORD'' message.  The
current position is announced via the ``NOW'' message, the missing
orders for next turn are requested with the ``MIS'' message.  It is the
Diplomat Interface that declares the game over either because a player
has reached the victory condition of 18 supply centers, or because the
players have agreed on a draw with the ``DRW''command.

However, the Diplomat Interface is also a real interface; that is to
say, it can pass messages from one player to another.

It was decided that this communication should not be made in a natural
language.  The study of natural languages is an entirely separate and
difficult field of research while we wish to concentrate on the
problems directly related to negotiation and game playing.  Thus, a
simplified language called the DPP Protocol was devised by which the
players would be required to communicate.  (Loeb 1992A, 1992B)

Tests using actual negotiation transcripts show that this simple
language allows the expression of nearly all of the concepts employed
usual when negotiating in Diplomacy.  It is impossible to say that it
is sunny outside, but it is easy to threaten a player with war if he
takes one of your cities. (France: ``SND ENG (IFF DMZ(PAR BRE MAR) THN
PCE ELS NOT(PCE))''.)

This language was developed independently of the diplomat we are
developing.  Thus, one can hope that other programmers will find the
language as convenient as we are.

In order to make the language as rich as possible, a several dozen of
predicates have been defined.  In addition to their grammar,
definitions of their suggested meaning and examples of their use are
given.  However, there are no guarantees that other programs will
interpret the messages accordingly.  One can always write a
psychopathic program which attacks only those who propose peace.
However, one could expect that like in the real world, psychopaths
would have a terribly short life expectancy in Diplomacy.  In
consequence, most reasonable diplomats will start all relationships
with the tentative assumption that its partner will react ``normally''
to any signals made by the diplomat.

Certain programs might not understand all possible constructions in the
protocol.  However, for the most part, they should understand the
messages we indicate in the dictionary as the most basic.  This
includes the message ``TRY''.  Its purpose is that upon receipt of an
unknown message, we respond with ``TRY'' and a list of constructions
understood by our program.  In this way, some sort of communication can
be held between programs of widely differing levels of sophistication.

THE BORDEAUX DIPLOMAT

Division of Control

The Diplomat proposed by Hall (Hall, 1992) is based on a strategic
core.  This lower brain of the diplomat does all the calculations and
thus forms the basis for the decisions taken by the upper brain, or
negotiator part of the diplomat.

The negotiator does not know the details of the map, nor the rules of
the game.  On the other hand, it must be able to communicate with the
other players, distinguish between friends and enemies, form alliances,
and so on.  To make decisions, the negotiators poses concrete questions
to the strategic core, and following the answers to these questions, it
choses the appropriate moves to submit, or messages to send out.

The strategic core does not know what country the diplomat is playing.
It is simply a disinterested, objective observes who calculates the
best moves countries given a certain number of constraints along with
the value of the resulting position of all countries.

This strategic core can be tested separately from the rest of the
program (as was done at the 1992 Computer Olympiad).  However, its
failure to communicate and to grasp the notions of balance of power,
greatly limit its potential.

On the other hand, the programs division between the strategic core and
the negotiator is completely modular.  Thus, it would be possible to
redesign the Bordeaux diplomat simply be replacing one of the two
modules with a new program.  Given a collection of several negotiators
and several strategic cores based on different methods, we can easily
construct an enormous number of full diplomats.

The strategic core is given the current position including the supply
center distribution, and a set of ``concerned'' countries which has
been pre-divided into alliances by the negotiator.  Each alliance can
optionally have a list of ``mortal enemies'' designated.  Next, the
length and depth of the search is specified.  Certain countries
forbidden to enter certain ``demilitarized zones,'' or  forbidden or
conversely required to make certain moves.  Finally, a ``seed'' can be
specified for the search.

In return, the strategic core must return for each concerned country
the move that maximizes the total interest of their alliance (while
minimizing the interest of their enemies) subject to the constraints
given along with the values themselves.

Obviously the strategic core is used to select the moves to be used
subject to the constraints of all of the treaties which we have agreed
to, and decided to follow.

However, the core has many other applications:
- In response to a proposed new alliance, two problems can be
submitted: One with the current alliances, and one with the new
alliances. A good alliance is one in which both players gain.

- Given a rumor or a threat, we introduce the moves in question as a
``seed''. If after several iterations, the move is still being
considered, then the move is plausible.

-Suggestions can be evaluated similarly.

- To detect new alliances, we send the actual moves used last turn to
the strategic core along with various conjectured alliances.  The
actual alliances are those for which the actual moves used are the
most stable.

- In case of a backstab, we calculate the actual gain of the player
with the moves used, and the gain he would have made with the moves
proposed by the alliance.  The difference gives an upper bound on his
reliability.

During turns in which he does not stab, a similar calculation gives a
lower bound on his reliability.

A player is expected to keep his word until his potential gains exceeds
his reliability.

Further discussion of the diplomatic aspects of our program are
postponed to a later article as continued work progresses on the
negotiation module of our program by Per Westling (Linkoping, Sweden).

The Strategic Core
Many difficulties were encountered in writing the strategic core of our
program.  It was first outlined by Hall (Hall, 1992), but several
difficulties were noted in the search routine. In the end, a new
algorithm was implemented by Constantin Staykoff.

As we have seen above, perfect-knowledge position evaluation is much
more difficult to define in a multi-player environment---especially in
a game such as Diplomacy with as many as seven players.

Already in two-player games such as Chess, the strategy is sufficiently
rich to prevent programs from playing perfectly.  Hence, a search is
made through the most ``important'' parts of the game tree, and value
of the leave positions are then ``calculated.''  The selection of tree
paths and the evaluation of the terminal positions is left are governed
by various heuristics.

The game tree of diplomacy is much more complex.  Every unit must move
simultanously.  Thus, the game is not a sequential game, but rather a
7-dimensional matricial game.  For example, the number of possible
non-equivalent openings is 4 430 690 040 914 420 (Green, No Date) as
compared to 20 in chess.  In chess, the average number of moves mounts
to about three dozen during the mid-game.  However, in Diplomacy, the
number of available moves increases by an additional factor of several
billion during the movement phases as the number of units and possible
supports rises.

An efficient simplex algorithm has been invented by Lemke and Howson
(Lemke, 1964) to find Nash equilibria in two-dimension matricial games.
However, even if this method could be generalized to multi-player
games, there would not be sufficient time to input the entire game
matrix to the algorithm.  Hence, whereas it is possible to search
several turns ahead in chess, it is impossible to conduct a complete
one-ply search during a movement phase of the game diplomacy.  Instead,
we must carefully converge on the Nash equilibria in the enormous
matrix without looking at all the possible moves.

It is senseless to consider looking at two movement phases at the same
time.  All predictional capacity must then be built into our position
evaluation heuristics.

Fortunately, during the building and retreat phases, the number of
possible moves is never quite so astronomical.  It is thus possible to
do a complete one-ply analysis of these phases of the game.

Optionally, the search of a build or retreat phase could be less
complete, but anticipate the following phase via a recursive call to
the strategic module of the program.

Search Techniques with Simultaneous Movement

A complete description of the heuristics of our position evaluator can
be found in (Hall, 1992), so we will concentrate on the search
techniques itself.  What moves should be considered?  In what
combinations?  And in what order?

The search technique proposed in (Hall, 1992) EBBFS (Even Better
Best-First Search) involved an iterative search for moves for all of
the concerned countries in parallel.  Moves were considered one at a
time by each alliance following a set agenda and adding orders for one
unit a time (Best-First).  The results of a given move was calculated
in relation to the ``current best'' moves for each of the other
alliances.  The position is kept as current best move or reject
according to the output the position evaluator gives to the resulting
position.

Suppose a move M_A was rejected for the alliance A because the alliance
B had considered the move M_B to be its best move at the time A
evaluated M_A, but later B rejects M_B for a better move M'_B.
According to the above algorithm, the alliance A would not be given a
chance to reconsider the interest of the move M_A in light of this
development.

Given the drawbacks of EBBFS, we propose a new search technique called
RES (Refined Evolutionary Search).  The name is derived from the fact
that this search strategy is similar to that adopted by nature to play
the game ``Biology.''

In the game biology, each player must chose DNA sequences families.
These sequences then define a certain number of living organisms.  If
the descendants of these living organisms are numerous after several
billion years in a certain prechosen environment, then the player has
won.

To play the game of biology, a player must in theory test every DNA
sequences family in competition against every possible set of rivals.
However, such a research is impossible.  Thus, nature has adopted the
evolutionary search techniques.  By this method, animals are mutated
one at a time by randomly changing a certain number of genomes.  The
successful mutations replace the original species, and the unsuccessful
mutations are rejected.

Similarly, in diplomacy, a large alliance can not possibly consider all
of its movements.  Thus, the research is started by considering a
certain ``seed'' move.  This move being considered is then mutated by
selected a certain number of units, and randomly selecting new moves
for these units. Mutations are made for one alliance after another
until the maximum search length has been reached.

The value of any given mutation with respect to the current moves of
all other alliances is compared to the value of the original move with
respect to the current moves of all other alliances.  In other words,
when an alliance adopts a new strategy, this may changes the value of
the strategies already being considered by the other alliances.

A certain number of consistency rules cut the loss of time due to the
consideration of uninteresting mutations.  In all cases where allied
forced are ordered to attack each other, the moves are changed into
mutual support.  Furthermore, as a time saving measure, support is only
considered among units belonging to the same alliance.

If the search passes through Nash equilibrium, then no alliance will
have any interest in changing its moves.  Thus, no Nash equilibria will
be lost.

Nevertheless, the program only considers pure strategies whereas pure
Nash equilibria do not exist for all games.  Suppose Italy with his
fleet in the Tyrrhenian Sea is trying to counter Turkey's incursion
into the Ionian Sea.  If Italy defends Naples, then Turkey should
attack Tunis. However, if Italy defends Tunis, then Turkey should
attack Naples.

Assuming that Italy and Turkey are the only countries concerned by this
situation, the RES search will eventually enter a cyclic pattern of
length four as described above.  Eventually, we would like our program
to be able to detect such cycles.  The strategies involved in these
cycles could then be extracted and subjected to a detailed matricial
game analysis.  A mixed Nash equilibrium would then be found by which
Italy moves to Naples and Tunis with certain probabilities.  At that
point, a random choice would be made, and the move evaluated according
the expected return.

Currently our program does not do such detailed analysis but merely
halts at a random point (the search limit) and returns the strategies
currently being considered and their actual return.

The research is designated as ``refined,'' since that is how the
mutation rate is set.  At the beginning of the research, all units are
reassigned new orders.  There is no reason to believe that the old
orders are good, so we attempt to examine as many completely different
possibilities as we can.  Later on in the research, we are more
confident in the moves being considered, and therefore restrict
ourselves to smaller and smaller modifications.  As the search limit
approaches, we only consider changing the order of one unit at a time,
or permute the supports used.

--------------------------------------
Refined Evolutionary Search

Calculate best moves and their values for each given alliance with
respect to a certain position (units and supply centers) in the game
Diplomacy and the following constraints: demilitarized zones, required
moves, forbidden moves, search depth, search width.

- (Initialize.)
Set best moves BestM, and best raw moves RBestM to the default moves.
Calculate the position P resulting from BestM.
For each alliance k, let VBestM_k be the value of P for alliance k.
For each unit u, create a list of legal moves LM(u) given the map,
demilitarized zones, forbidden moves, and required moves.
Abort if LM(u) is empty.
Set i=1.

- (Choose Alliance.)
Let p be i mod the number of alliances.

- (Choose Units.)
Let $n$ be the smallest integer greater or equal to the number of units
in alliance p times (width - i)/width.
Select a set SU of n units belonging to alliance p.
 - (Make Moves.)
Set RM to RBestM_p.
For all units u in SU, set RM(u) to a random move or hold from LM(u).
Set M to RM.

-  (Make Supports.)
Replace in M all orders of the form A\MovesTo C and B\MovesTo C with
appropriate random supports.
Replace in M all orders of the form A\MovesTo B and B Holds with
appropriate supports.

- (Evaluate Moves.)
Calculate the position P resulting from M and BestM_j for j\neq p.
Let NewV_j be the value of P for the alliance j.^*

- (Selection Moves.)
If NewV_p>VBestM_p, then set BestM_p to M, set RBestM_p to RM, and for
all alliances j, set VBestM_j to NewV_j.

- (Test Exit.)
Increment i.
If i<width, then continue with step 2.
Otherwise, output BestM_j and VBestM_j for each alliance, and exit.

--------------------------------------
(*) - The value of a position for an alliance is the sum of the values
for each member of the alliance minus the value for any indicated
mortal enemy.

In biology, Mammals have retained their deep-diving instinct to some
measure despite having left the oceans millions of years ago.
Similarly, in Diplomacy, a player may attempt to defend a space which
comes under attack early in the research.  In response, the attacking
alliance may add enough support that defense become hopeless.  However,
the defender will persist in his futile support unless there is
positive gain by making some other move.  When there is a positive gain
available from another move, a cyclic situation as described above
often results.

Such behavior in biology is useful; for example, when people fall into
icy waters.  Similarly, such a behavior is useful in Diplomacy in order
to make life as difficult as possible for the attacking player.

CONCLUSIONS

The theory of multi-player games and the game Diplomacy in particular
is too deep and rich to expect a complete analysis.  However, in view
of the applications of multi-player games, we must persist in our
efforts to model such games.

Mathematically, we seem to lack a good language in which to describe
such problems.  Underlying psychological hypotheses are necessary which
opens the game up to the fuzzy world of negotiation.

Our view of negotiation is that it must be based on a solid strategic
core.  Otherwise, the diplomat can not decide what to ask for, nor be
able to evaluate any sort of offer.  Early versions of this strategic
core show good survival skills, but lack the savvy to pull off a win.
Nevertheless, in blind tests, players are unable to distinguish the
strategic core from human players.

In combination with an automatic negotiator which will temper the
aggressivity of its strategic alter-ego, we should be able to compete
at an equal level with humans in the game of Diplomacy.
Simultaneously, we hope that existing diplomats will be made available
in conformity with a universal language of communication among Diplomat
which we have developed, and that many new programs are introduced.

Thus, we will be able to begin the repeated testing necessary to
compare various theories, and to optimize our strategies.

By attempting to solve these difficult problems resulting from the
interactions of intelligent agents, we hope to better understand what
sort of capabilities such an agent should have, when it cannot assume
that everyone in the world is its friend.  Creating programs that can
play complex multi-player games is a step towards creating programs that can deal
with the complexities of the real world and the craftiness of humans.

BIBLIOGRAPHY

Mark Berch, The Lexicon of Diplomacy, Diplomacy Digest, 34 (1980).

E. R. Berlekamp, J. H. Conway and R. K. Guy, ``Winning Ways for
your Mathematical Plays,'' Academic Press, 1982.

M. Hall and D. Loeb, Thoughts on Programming a Diplomat, Heuristic
Programming in Artificial Intelligence 3: the third computer olympiad,
eds: J. van den Herik and V. Allis, Ellis Horwood, 123--145 (1992).

A. Calhammer, Rules to Diplomacy, The Avalon Hill Game Company,
1956, 1976. (``Diplomacy'' is a trademark of the Avalon Hill Game
Company.)

T. Green, private communication.

J. Isbell, Combinatorial Mathematics and its Applications, ed. D.
J. A. Welsh, 181.

D. E. Knuth and R. W. Moore, An Analysis of alpha-beta Pruning,
Artificial Intelligence, 6 (4) 293--325 (1975).

S. Kraus, D. Lehmann, E. Ephrati, An Automated Diplomacy Player,
``Heuristic Programming in Artificial Intelligence,'' Eds. D. N. L.
Levy and  D. F. Beal, Ellis Horwood (1989), 136--153.

S. Kraus, E.  Ephrati and D. Lehmann, Evaluation of Suggestions
during Automated Negotiations, Proc of the 11th Cognitive Science
Conference, 1989.

Lemke and Howson, Equilibrium points of bimatrix games, Journal of
Social and Industrial Applications of Mathematics, 1964, 413--423.

D. Loeb and C. Staykoff, Diplomacy in London in this volume.

D. Loeb, Diplomacy Programming Project: Message Protocol Update,
distributed via e-mail, 1992.

D. Loeb, Diplomacy Programming Project: Conversations, distributed
via e-mail, 1992.

J. G. Propp, Three-Person Impartial Games, Unpublished manuscript.

L. S. Shapley, A value for n-person games, ``Contributions to the
theory of games,'' Vol. 2 (eds. H. Kuhn and A. W. Tucker),  307--317,
Princeton University Press, Princeton Press

Constantin Staykoff, ``Automate-Calculateur de Mouvements pour le
jeu Diplomatie,'' DEA thesis 1992, Universite de Bordeaux I, France.
(In French)

John von Neumann, Zur Theorie der Gesellschaftsspiele,
Mathematische Annalen 100 (1928) 295--320 in German.  Translation in
English by Sonya Bargmann, On the Theory of Games of Strategy in
``Contributions to the Theory of Games,'' Volume IV, eds. A. W. Tucker
and R. D. Luce, Annals of Mathematics Studies 40, Princeton University
Press 1959, 13--42.

Rod Walker, ``Player's Guide to Diplomacy,'' The Avalon Hill Game
Company, Baltimore, 1979.