Distributed Learning

Distributed learning allows remote processes to contribute computing resources to the training of a given population. The environment consists of an experiment server and a number of remote experiment agents (RMAs). Once the experiment server is up, RMAs can register with it to offer their services.

When an experiment server starts, it loads the configuration and creates a population exactly as it would in a normal, non-distributed run. However, the server performs no evaluation itself, so if no RMAs register, nothing happens. When an RMA registers, the server sends it the relevant fitness function and the data it will need. This transfer happens only once, during registration.

Once registered, the RMA requests a population member from the server. When it receives this chromosome, it evaluates the individual using the stored fitness function and data, returns the result to the server, and requests the next chromosome. Once the server has received results for the whole population, it performs the necessary NEAT evolutionary steps. The resulting population is then distributed to the registered agents as before.

Both the server and the RMAs are fault tolerant. This means several things. First, if an agent disappears for whatever reason, the server strikes it from the list of registered agents and no longer distributes chromosomes to it. Second, if an RMA takes too long to return an experiment result, that result is ignored and the experiment is distributed to another RMA. The tardy RMA remains registered and will receive further chromosomes for evaluation. Lastly, if the server fails, all registered RMAs attempt to reconnect every 10 seconds, so that when the server restarts they re-register and the whole experiment starts again.
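
To make the hand-out and timeout rules above concrete, here is a minimal Java sketch of how a server might track chromosomes during one generation. It is not NEAT4J's actual code: the class GenerationDispatcher, its methods, the integer chromosome ids and the timeout value are all hypothetical, and re-queuing the work of a vanished agent is an assumption that the text implies but does not state. Time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of per-generation bookkeeping; none of these names come from NEAT4J. */
class GenerationDispatcher {
    /** A chromosome that has been handed to an agent and is awaiting its result. */
    private static final class Lease {
        final String agentId;
        final long deadlineMillis;
        Lease(String agentId, long deadlineMillis) {
            this.agentId = agentId;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final Queue<Integer> pending = new ArrayDeque<>();   // not yet handed out
    private final Map<Integer, Lease> leased = new HashMap<>();  // handed out, no result yet
    private final Map<Integer, Double> fitness = new HashMap<>(); // accepted results
    private final int populationSize;
    private final long timeoutMillis; // assumed value; the text gives no figure

    GenerationDispatcher(int populationSize, long timeoutMillis) {
        this.populationSize = populationSize;
        this.timeoutMillis = timeoutMillis;
        for (int id = 0; id < populationSize; id++) {
            pending.add(id);
        }
    }

    /** Hands the next waiting chromosome id to an agent, or returns -1 if none is waiting. */
    synchronized int next(String agentId, long nowMillis) {
        requeueExpired(nowMillis);
        Integer id = pending.poll();
        if (id == null) {
            return -1;
        }
        leased.put(id, new Lease(agentId, nowMillis + timeoutMillis));
        return id;
    }

    /** Stores a result; returns false when the result is late or the agent no longer holds the chromosome. */
    synchronized boolean submit(String agentId, int id, double value, long nowMillis) {
        requeueExpired(nowMillis);
        Lease lease = leased.get(id);
        if (lease == null || !lease.agentId.equals(agentId)) {
            return false; // ignored, as the text describes for tardy agents
        }
        leased.remove(id);
        fitness.put(id, value);
        return true;
    }

    /** Assumption: when an agent is struck from the list, its outstanding work goes back in the queue. */
    synchronized void agentLost(String agentId) {
        leased.entrySet().removeIf(e -> {
            if (!e.getValue().agentId.equals(agentId)) return false;
            pending.add(e.getKey());
            return true;
        });
    }

    /** True once every member of the population has an accepted result. */
    synchronized boolean generationComplete() {
        return fitness.size() == populationSize;
    }

    /** Moves every chromosome whose deadline has passed back to the waiting queue. */
    private void requeueExpired(long nowMillis) {
        leased.entrySet().removeIf(e -> {
            if (e.getValue().deadlineMillis > nowMillis) return false;
            pending.add(e.getKey());
            return true;
        });
    }
}
```

In this sketch a server loop would call next for each agent request, submit for each returned result, and run the evolutionary step only once generationComplete returns true. A tardy agent is not removed: its late submit simply returns false, and it can go on calling next.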
Ideally, after a failure the server would resume from the last complete evolutionary cycle rather than restart the experiment, but this is currently on the to-do list.

Starting the Experiment Server

From the neat4j directory, e.g. 'dir you unzipped to'/neat4j_1_0, run the following command:

    java -cp .\neat4j.jar;lib;lib\log4j.jar org.neat4j.core.distribute.ExperimentServer config/xor/xor_neat.ga

This starts the experiment server, ready to run the XOR experiment. The server listens on port 1969, which is not currently configurable.

Starting the RMA

To start an RMA that connects to the server above, run the following command from the same neat4j directory:

    java -cp .\neat4j.jar;lib;lib\log4j.jar org.neat4j.core.distribute.RemoteExperimentAgentClient server_address [number of agents]

The optional second parameter specifies the number of agents to start (in separate threads); if it is omitted, the default is one.

This functionality has not been as thoroughly tested as the rest of NEAT4J, so if you find any issues, please let me know. One issue I do know of is that it does not stop on either of the terminating conditions.
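
As an illustration of the agent side, the sketch below shows one way the optional agent-count argument, the one-thread-per-agent start-up and the 10-second reconnect could fit together. It is a guess at the shape, not the real RemoteExperimentAgentClient: the class AgentLauncherSketch and all of its methods are hypothetical, and the connection attempt is passed in as a function so that no networking code has to be assumed. Only the port number, the default of one agent and the 10-second interval come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical launcher sketch; not the real RemoteExperimentAgentClient. */
class AgentLauncherSketch {
    static final int SERVER_PORT = 1969;       // fixed port stated in the text
    static final long RETRY_MILLIS = 10_000L;  // reconnect interval stated in the text

    /** Reads the optional second argument; defaults to one agent when it is absent. */
    static int agentCount(String[] args) {
        if (args.length < 2) {
            return 1;
        }
        int n = Integer.parseInt(args[1]);
        if (n < 1) {
            throw new IllegalArgumentException("number of agents must be at least 1");
        }
        return n;
    }

    /**
     * Calls tryConnect until it reports success, sleeping retryMillis between
     * attempts. Returns the number of attempts that were made.
     */
    static int connectWithRetry(BooleanSupplier tryConnect, long retryMillis) {
        int attempts = 1;
        while (!tryConnect.getAsBoolean()) {
            try {
                Thread.sleep(retryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to reconnect", e);
            }
            attempts++;
        }
        return attempts;
    }

    /** Starts one thread per agent, each running the same agent body. */
    static List<Thread> startAgents(int count, Runnable agentBody) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(agentBody, "agent-" + i);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    /** Waits for every agent thread to finish. */
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

A main method built on this would call agentCount(args), then startAgents with a body that first runs connectWithRetry (using RETRY_MILLIS and a connection attempt against args[0] on SERVER_PORT) and then enters the request, evaluate and return loop described earlier.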