These Windows instructions were contributed by a user of this project. Thanks!

Requirements: 64-bit Windows. This procedure was verified on Windows 8.1.

Note: Windows uses backslashes, not forward slashes, in path names. Change the first line (if necessary) of "src\reversi_zero\agent\player.py" accordingly.
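If you find yourself editing hard-coded paths, pathlib sidesteps the separator difference entirely. A minimal sketch, not code from this project:

```python
from pathlib import Path

# Joining with "/" works on every platform; pathlib renders the
# platform-appropriate separator (backslash on Windows) when the
# path is converted to a string or used for I/O.
player_py = Path("src") / "reversi_zero" / "agent" / "player.py"
print(player_py)  # prints src\reversi_zero\agent\player.py on Windows
```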
Install the 64-bit version of Python 3.5 (the 32-bit version is not sufficient). Anaconda with Python 3.5 is the recommended distribution; follow its installation instructions. Note: for some strange reason, both Python 3.5 and Anaconda are installed in a hidden folder. To access them, first open the Control Panel, select Folder Options, and on the View tab select "Show hidden files, folders, or drives" under the Advanced settings section. Anaconda is installed in C:\ProgramData\Anaconda3.
The direct download option installs Python in (I believe) C:\Users\\AppData\Local\Program\Python. Double-click the downloaded file to run the installer. You could install the entire 2015 version of Visual Studio (not the 2017 version that Microsoft tries to force on you), but that is a large download and install, most of which you don't need.

The Python source code for this project uses numerous f-strings, a feature new to Python 3.6. Since we need Python 3.5 (required by the Windows version of TensorFlow), use your editor's search feature to find every occurrence of an f-string and rewrite it using str.format() (see the before/after sketch at the end of this section).

Install the libraries

From either the Anaconda prompt or from a command window in the top-level folder where you put this distribution, enter the following (I assume the standard requirements file here; the model download line is from the original instructions):

    pip install -r requirements.txt
    download_model.sh 5

Configuration

'AlphaGo Zero' method and 'AlphaZero' method

I think the main difference between 'AlphaGo Zero' and 'AlphaZero' is whether the Evaluator is used or not. You can switch between the two methods by configuration:

* PlayConfig#use_newest_next_generation_model = False: execute the Evaluator to select the best model (the 'AlphaGo Zero' method).
* PlayConfig#use_newest_next_generation_model = True and PlayWithHumanConfig#use_newest_next_generation_model = True: do not use the Evaluator; the newest model is selected as self-play's model (the 'AlphaZero' method).

It seems that the policy (π) data saved by self-play is a distribution in proportion to pow(N, 1/tau), where N is the visit count. After the middle of the game, tau becomes 0, so the distribution is one-hot (a numeric sketch of this appears at the end of this section).

Other important hyper-parameters (I think)

PlayDataConfig:

* save_policy_of_tau_1 = True means that the saved policy's tau is always 1.
* nb_game_in_file, max_file_num: the maximum number of games kept as training data is nb_game_in_file * max_file_num.
* multi_process_num: the number of processes used to generate self-play data.

If you find a good parameter set, please share it in the GitHub issues!
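As promised above, here is the kind of f-string rewrite the Python 3.5 requirement forces; the variable names are invented for illustration:

```python
game_idx = 7
win_rate = 0.5312

# Python 3.6+ f-string (a syntax error on Python 3.5):
#   print(f"game {game_idx}: win_rate={win_rate:.2f}")

# Python 3.5-compatible rewrite using str.format():
print("game {}: win_rate={:.2f}".format(game_idx, win_rate))
```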
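The two methods above boil down to a pair of boolean flags. The sketch below is hypothetical: the attribute names come from the text, but the class layout is an assumption, not the project's actual config module:

```python
class PlayConfig:
    def __init__(self):
        # False -> 'AlphaGo Zero' method: run the Evaluator to pick the best model.
        # True  -> 'AlphaZero' method: always self-play with the newest model.
        self.use_newest_next_generation_model = False


class PlayWithHumanConfig:
    def __init__(self):
        # Set to True together with PlayConfig for the 'AlphaZero' method.
        self.use_newest_next_generation_model = False
```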
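The pow(N, 1/tau) behaviour is easy to check numerically. A minimal sketch, assuming NumPy is available (the visit counts are made up):

```python
import numpy as np

def policy_from_visits(visit_counts, tau):
    """Turn MCTS visit counts N into a policy proportional to N ** (1/tau).

    tau == 1 keeps the policy proportional to the raw counts; at tau == 0
    the policy collapses to one-hot on the most-visited move, matching the
    'after the middle of the game' behaviour described above.
    """
    n = np.asarray(visit_counts, dtype=np.float64)
    if tau == 0:
        pi = np.zeros_like(n)
        pi[np.argmax(n)] = 1.0
        return pi
    weighted = n ** (1.0 / tau)
    return weighted / weighted.sum()

counts = [10, 40, 30, 20]
print(policy_from_visits(counts, tau=1))  # [0.1 0.4 0.3 0.2]
print(policy_from_visits(counts, tau=0))  # [0. 1. 0. 0.]
```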