RLGO README

1. Contents of this file

    1. Contents
    2. Overview
    3. Building RLGO
    4. Running RLGO
    5. Settings files
    6. Running scripts
    7. Additional information

2. Overview

RLGO is a Go playing program based on reinforcement learning and
simulation-based search. It is built on top of the Fuego Computer Go library.

3. Building RLGO

RLGO requires the following packages in order to build:

    a) Boost C++ libraries (version 1.33.1 or higher) must be installed.
    b) Fuego 0.4 must be downloaded and built.

In addition, RLGO requires the following packages to run experiment scripts:

    c) GoGui must be installed.
    d) GnuGo must be installed.
    e) BayesElo must be installed.
    f) GnuPlot must be installed.

See the INSTALL file for further installation issues.

4. Running RLGO

RLGO uses the Go Text Protocol (GTP)
(http://www.lysator.liu.se/~gunnar/gtp/gtp2-spec-draft2/gtp2-spec.html).
As with any GTP program, RLGO can be run from the command line, or from an
interface such as GoGui.

RLGO supports the following command line options:

    -settings filename   Use specified settings file (see section 5)
    -Foo Bar             Override all settings with the name "Foo" found in
                         the settings file with the value "Bar"
    -Fee.Foo Bar         Override the setting with the name "Foo" in the
                         object "Fee" with the value "Bar"

The value "Bar" must be a single alphanumeric string. The tilde character '~'
will be replaced by a space during the override (to allow for multi-word
strings). If no settings file is specified then RLGO will use tourney.set by
default.

Some particularly important settings to know about are:

    OutputPath: where output files (log files, saved games, etc.) should be
                placed
    BoardSize:  size of board to use
    SelfPlay:   play a series of test-games using self-play, instead of
                waiting for GTP commands
    MaxGames:   how many games to simulate before each real move

By default RLGO is launched in GTP mode. After initialisation, it will wait
to receive GTP commands on the standard input.
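
For readers unfamiliar with GTP, the following is a minimal Python sketch of
the text framing that a controller exchanges with any GTP engine: one command
per line, and a response that starts with '=' (success) or '?' (failure) and
ends with a blank line. The helper names format_command and parse_response
are hypothetical and are not part of RLGO or Fuego; the sketch only formats
and parses strings and does not launch RLGO.

```python
from typing import NamedTuple, Optional


class GtpResponse(NamedTuple):
    ok: bool                   # True for '=' responses, False for '?'
    command_id: Optional[int]  # echoed id, if the command carried one
    text: str                  # result or error message


def format_command(name: str, *args: str,
                   command_id: Optional[int] = None) -> str:
    """Build one GTP command line: [id] name [arguments], newline-terminated."""
    parts = [] if command_id is None else [str(command_id)]
    parts.append(name)
    parts.extend(args)
    return " ".join(parts) + "\n"


def parse_response(raw: str) -> GtpResponse:
    """Parse one GTP response (status character, optional id, text)."""
    body = raw.strip()
    if not body or body[0] not in "=?":
        raise ValueError("not a GTP response: %r" % raw)
    ok = body[0] == "="
    rest = body[1:]
    digits = ""
    # An optional numeric id follows the status character directly.
    while rest and rest[0] in "0123456789":
        digits += rest[0]
        rest = rest[1:]
    command_id = int(digits) if digits else None
    return GtpResponse(ok, command_id, rest.strip())


if __name__ == "__main__":
    # Ask the engine to generate a move for Black, tagged with id 1.
    print(repr(format_command("genmove", "b", command_id=1)))
    # Parse a reply of the kind an engine might send back.
    print(parse_response("=1 C3\n\n"))
```

In practice these strings would be written to and read from the standard
input and output of the rlgo process, for example through a pipe or through
an interface such as GoGui.
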
To launch RLGO in self-test mode, use the option -SelfPlay SelfPlay.

RLGO must be initialised with the correct board size, using the BoardSize
setting. It is not sufficient to set the board size using a GTP command.

An example of running RLGO is given below:

    rlgomain/rlgo -settings tdlearn -Alpha 0.2 -Trainer BackwardTrainer -LearningRule LambdaReturn -Lambda 0.9

    # Run RLGO directly with tdlearn.set, but overriding the setting Alpha
    # to have a value of 0.2, Lambda to have a value of 0.9, using the
    # lambda return learning rule and training in backward sweeps after
    # each game.

5. Settings files

The setup of RLGO is controlled by a settings file. The settings file not
only specifies the values of parameters, but also which objects are created
by RLGO. The purpose of the settings file is to make it easy to run scripts
that vary the setup of RLGO, for example to run RLGO with three different
learning rules, or to run RLGO with four different step-sizes (see section 6).

Each object is listed in the settings file with the following format:

    Object = <CLASSNAME>
    {
        ID = <OBJECTNAME>
        <SETTING1> = <VALUE1>
        <SETTING2> = <VALUE2>
        ...
    }

Settings may refer to other objects by their identifier <OBJECTNAME>.

For example, RLGO supports different learning rules (the tdlearn.set file
includes the TD0 and LambdaReturn algorithms). When these rules are included
in the settings file, objects of the specified classes are created and
initialised on start-up. However, this does not imply that these objects are
used! In this example, the trainer specifies which learning rule to use, for
example by specifying LearningRule = TD0.

Settings are case-sensitive and order-sensitive. There must be whitespace on
either side of the equals sign " = ". Comments may be included in settings
files by prefixing them with a hash #. The tilde character is converted into
a space internally (this avoids problems in scripts with multi-word settings).
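
The format rules above can be illustrated with a small Python sketch. This is
not RLGO's parser (RLGO reads settings in the LoadSettings methods of its C++
classes); it is a hypothetical reader written only to show the rules in
action. It assumes that a comment runs from '#' to the end of the line, and
it does not handle vector-valued settings. The class names ExampleRule and
ExampleTrainer and the setting Label are placeholders, not real RLGO names.

```python
def tokenize(text):
    """Split settings text into whitespace-delimited tokens, dropping comments."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' comments run to end of line
        tokens.extend(line.split())
    return tokens


def parse_objects(text):
    """Return a list of (class_name, [(setting, value), ...]) in file order.

    Settings are kept as an ordered list of pairs, not a dict, because the
    settings file is order-sensitive.
    """
    tokens = tokenize(text)
    objects = []
    i = 0
    while i < len(tokens):
        # Expect the header tokens: Object = <CLASSNAME> {
        if (tokens[i:i + 2] != ["Object", "="] or i + 3 >= len(tokens)
                or tokens[i + 3] != "{"):
            raise ValueError("expected 'Object = <CLASSNAME> {' at token %d" % i)
        class_name = tokens[i + 2]
        i += 4
        settings = []
        while i < len(tokens) and tokens[i] != "}":
            # Each setting is three tokens: <SETTING> = <VALUE>
            if i + 2 >= len(tokens) or tokens[i + 1] != "=":
                raise ValueError("expected '<SETTING> = <VALUE>' at token %d" % i)
            # The tilde stands in for a space in multi-word values.
            settings.append((tokens[i], tokens[i + 2].replace("~", " ")))
            i += 3
        if i >= len(tokens):
            raise ValueError("missing closing '}'")
        i += 1  # skip the closing brace
        objects.append((class_name, settings))
    return objects


# Hypothetical fragment; the class names are placeholders, not RLGO classes.
SAMPLE = """
# A learning rule and a trainer that refers to it by ID.
Object = ExampleRule
{
    ID = TD0
    Label = one~step   # tilde becomes a space
}
Object = ExampleTrainer
{
    ID = Trainer
    LearningRule = TD0
}
"""

if __name__ == "__main__":
    for class_name, settings in parse_objects(SAMPLE):
        print(class_name, settings)
```

Because every token must be separated by whitespace, a line such as
"Alpha=0.2" would be read as a single token and rejected by this sketch,
which mirrors the whitespace requirement stated above.
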

Some settings expect vectors of objects, in the following format (whitespace
must be included before, between and after each object name):

    <SETTING> = <NUM_OBJECTS> [ <OBJECTNAME1> <OBJECTNAME2> ... ]

There are two special types of object. The RlInclude object includes all
settings from another settings file. The RlOverride object overrides settings
in any subsequent objects. This allows for different variants of a main
settings file to be maintained, by including the general settings and
overriding specific settings.

The meaning of each setting is documented in the corresponding class
declaration. The order in which settings are expected is not currently
documented (@TODO), but is easily identified from the LoadSettings method of
the corresponding class.

Settings may be overridden in the following ways (highest priority listed
first):

    a) Command line (see section 4).
    b) Environment variable. Any environment variables with the prefix "Rl"
       are assumed to be setting overrides. The prefix is stripped to give
       the setting.
    c) RlOverride object (see above).

Standard settings files for RLGO include:

    localshape.set   Local shape features with a long-term memory
                     (no learning, no search)
    tdlearn.set      Temporal-difference learning with a long-term memory
    tdsearch.set     Temporal-difference search with a short-term memory
    dyna2.set        Dyna-2 search with long and short-term memories
    tourney.set      Settings for RLGO with tournament settings
                     (time control, alpha-beta search, pondering)

6. Running scripts

Experiments with RLGO are executed by running the shell scripts found in the
scripts subdirectory. Each script provides its own command line options,
which can be viewed by invoking the script with no arguments.

Some examples of running RLGO using these scripts are provided below:

    scripts/experiment.sh tdlearn localshape output/tdlearn-shapes MaxSize "1 2 3" "Compare local shapes using TD learning" 100000 20000 1

    # Train RLGO using tdlearn.set settings, and test RLGO using
    # localshape.set settings. Compares three different versions of RLGO,
    # using 1x1, 2x2 and 3x3 local shapes. Plays 100000 training games and
    # 20000 testing games, and includes one GnuGo opponent in the testing
    # tournament. Puts the output results and plots into
    # output/tdlearn-shapes.

    scripts/multi-elo.sh tdsearch output/local-shapes MaxGames "100 1000 10000" MaxSize "1 2 3" 9 1000 "Compare local shapes using TD search"

    # Run a tournament between versions of RLGO using tdsearch.set settings.
    # Compares nine different versions of RLGO (three values of MaxSize
    # times three values of MaxGames), using 1x1, 2x2 and 3x3 local shapes,
    # and using 100, 1000 or 10000 simulations per move. Plays 1000 matches
    # on 9x9 boards. Puts the output results and plots into
    # output/local-shapes.

7. Additional information

Further information about the concepts used in RLGO can be found in CONCEPTS.

Pseudocode and experimental results can be found in David Silver's Ph.D.
thesis:

    http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/thesis.pdf