Evolving Neural Networks with HyperNEAT and Online Training
Artificial neural network research of the past decade has seen significant growth with the advent of genetic algorithms such as NSGA and NEAT to develop neural networks through evolution. Another more recent advance in this technology is the HyperNEAT algorithm, an extension to the highly successful NEAT algorithm, which is capable of capturing the symmetry of a domain. HyperNEAT has been very successful for evolving agent controllers, and as such it seems a good platform for exploring hybrid techniques.
Our research focuses on augmenting HyperNEAT technology for use in agent controllers through strategic application of online learning. Several methods are proposed and explored.
All methodologies are tested using a team gathering task. A simulated environment is set up with gathering robots that must locate resources and work together to carry the resources back to a central base location. The robots are controlled by the networks produced by the HyperNEAT algorithm (referred to as "substrates").
In the first set of experiments, several types of online learning are combined with HyperNEAT. In all cases, evolution proceeds as normal until the evaluation phase; at this point the HyperNEAT substrate is trained in an online fashion using a given training technique. The learning methods explored are: supervised backpropagation, reinforcement backpropagation, Hebbian learning, and temporal difference learning. These are compared against the baseline HyperNEAT algorithm with no online learning.
Next, the methodology of applying online learning is extended in an attempt to find optimal learning rate parameters for each of the learning techniques; this shall be referred to as parameter selection. The HyperNEAT algorithm uses Compositional Pattern Producing Networks (CPPNs) to generate the connection weight values for its substrates. The CPPN is augmented to also generate learning parameters for each of the other training algorithms. The initial set of experiments is repeated using the learning parameter selection approach. One additional training technique is added, the ABC variant of Hebbian learning, which uses additional parameters to control neural plasticity.
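As a rough illustration of parameter selection, the sketch below shows a function queried per connection that returns both a weight and a learning rate, followed by a generalized Hebbian update with three plasticity coefficients. Everything here is an assumption made for illustration: `toy_cppn` is a hand-written stand-in for an evolved CPPN, and the update form `eta * (A*pre*post + B*pre + C*post)` is one common way an "ABC" Hebbian rule is written, not necessarily the exact rule used in this work.

```python
import math

def toy_cppn(x1, y1, x2, y2):
    """Hypothetical stand-in for an evolved CPPN.

    Takes the substrate coordinates of a source and a target node and
    returns (connection weight, learning rate) for that connection.
    """
    d = math.hypot(x2 - x1, y2 - y1)
    weight = math.sin(3.0 * (x1 - x2)) * math.exp(-d * d)
    learning_rate = 0.1 / (1.0 + d)  # assumed: shorter links learn faster
    return weight, learning_rate

def abc_hebbian_update(w, pre, post, eta, a, b, c):
    """Assumed generalized Hebbian rule with plasticity coefficients A, B, C."""
    return w + eta * (a * pre * post + b * pre + c * post)

# Query the toy CPPN for one connection, then apply one online update.
w, eta = toy_cppn(-1.0, 0.0, 1.0, 0.0)
w_new = abc_hebbian_update(w, pre=0.5, post=1.0, eta=eta, a=1.0, b=0.0, c=0.0)
```

With `b = c = 0` and `a = 1` the rule reduces to plain Hebbian learning, which is why the ABC form can be read as adding parameters on top of the basic technique.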
These two sets of experiments are repeated with an additional enhancement, to treat learning ability as a fitness measure. Each substrate is evaluated multiple times, with the agent environment reset between evaluations. The performance of each is recorded, and then the factor of improvement between evaluations (due to the online learning) is measured, and subsequently incorporated into the fitness score for the chromosome that produced the CPPN and substrate. Thus, individuals that demonstrate responsiveness to online learning will be favored, and will be more likely to produce offspring for future generations.
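One way to picture learning ability as a fitness measure is the minimal sketch below. The abstract does not state how the improvement factor is combined with raw performance, so the formula, the function name `learning_aware_fitness`, and the `improvement_weight` parameter are all assumptions.

```python
def learning_aware_fitness(scores, improvement_weight=0.5):
    """Combine raw performance with improvement across repeated evaluations.

    scores: performance of one substrate over successive evaluations,
            with the environment reset between them.
    The combination below is an assumed formula, not the one from the source.
    """
    if len(scores) < 2:
        raise ValueError("need at least two evaluations to measure improvement")
    base = sum(scores) / len(scores)
    first, last = scores[0], scores[-1]
    # Relative gain from the first to the last evaluation; guarded against
    # division by zero, and clipped so regressions are not penalized twice.
    improvement = (last - first) / first if first > 0 else 0.0
    return base * (1.0 + improvement_weight * max(improvement, 0.0))
```

Under this sketch, an individual scoring 10 and then 15 receives a higher fitness than one scoring 12.5 twice, even though their mean performance is identical.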
A different set of experiments is also performed examining a few other approaches. These approaches focus on combining HyperNEAT with a couple of variants of heuristically supervised backpropagation for online learning.
The main variant that is tested involves applying geometric transformations (in this case, rotations) to training samples during backpropagation, in an attempt to take advantage of the substrate's symmetry. This is compared with baseline HyperNEAT; with basic backpropagation; and with repeated backpropagation, where each training sample is issued multiple times. The latter approach is introduced in order to account for the possibility that the performance of rotational backpropagation is enhanced purely due to the number of training iterations performed per sample.
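The difference between rotational and repeated backpropagation can be sketched as follows. This assumes, purely for illustration, that sensors and actuators are laid out in a ring so that rotating the agent's view corresponds to a cyclic shift of the input and target vectors; the actual substrate layout is not described here, and both function names are hypothetical.

```python
def rotate(vec, steps):
    """Cyclically shift a list to the right by `steps` positions."""
    steps %= len(vec)
    return vec[-steps:] + vec[:-steps]

def rotated_samples(inputs, targets, n_rotations=4):
    """Expand one training sample into n_rotations rotated copies.

    Assumes inputs and targets are ring-ordered and their lengths are
    divisible by n_rotations (for example, 4 rotations of 90 degrees).
    """
    if len(inputs) % n_rotations or len(targets) % n_rotations:
        raise ValueError("vector lengths must be divisible by n_rotations")
    in_step = len(inputs) // n_rotations
    out_step = len(targets) // n_rotations
    return [
        (rotate(inputs, k * in_step), rotate(targets, k * out_step))
        for k in range(n_rotations)
    ]

def repeated_samples(inputs, targets, n_repeats=4):
    """Control condition: the same sample issued n_repeats times."""
    return [(list(inputs), list(targets)) for _ in range(n_repeats)]
```

Both helpers yield the same number of training iterations per original sample, which is what makes the repeated variant a fair control for the rotational one.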
Based on the results from this set of experiments, and the different strengths of the original HyperNEAT algorithm versus the addition of online learning, another set of experiments is performed using a technique we call bootstrapping, which uses online learning during the early stages of evolution but switches it off when a certain average level of fitness is achieved. The results of these experiments suggest that some initial online training may produce better results than either constant online training or none.
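The bootstrapping switch might look like the sketch below. The class name, the threshold value, and the choice to keep learning off permanently once the threshold has been reached are assumptions; the abstract only states that online learning is switched off at a certain average fitness.

```python
class BootstrapSchedule:
    """Decides, generation by generation, whether online learning is active."""

    def __init__(self, fitness_threshold):
        self.fitness_threshold = fitness_threshold
        self.online_learning = True

    def update(self, population_fitness):
        """Call once per generation with the fitness of every individual."""
        average = sum(population_fitness) / len(population_fitness)
        if average >= self.fitness_threshold:
            # Assumed to latch: learning stays off even if fitness dips later.
            self.online_learning = False
        return self.online_learning

# Hypothetical run over three generations.
schedule = BootstrapSchedule(fitness_threshold=50.0)
history = [schedule.update(f) for f in ([10, 20], [60, 70], [30, 40])]
```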
One final approach we explore is to attempt to reinforce useful behaviors performed by the agents during evaluations. This approach, referred to as HyperNEAT with training banks, identifies when the agent arrives in a state that should be rewarded, and collects the inputs and outputs that resulted in that state in a repository (the training bank). Then, between HyperNEAT evaluations, the inputs and outputs from the training states are used as training samples, and the network is trained using backpropagation, repeated backpropagation, or rotational backpropagation. The results from these experiments show that networks evolved with HyperNEAT using rotational backpropagation applied via training banks exhibit a higher degree of generalizability than HyperNEAT alone.
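A training bank can be pictured as a bounded store of rewarded input/output pairs that is replayed between evaluations. The sketch below is illustrative only: the class and function names, the capacity limit, and the `train_step` callback (standing in for whichever backpropagation variant is used) are assumptions rather than details taken from this work.

```python
from collections import deque

class TrainingBank:
    """Bounded repository of (inputs, outputs) pairs from rewarded states."""

    def __init__(self, capacity=1000):
        # Oldest samples are discarded once capacity is reached (assumed policy).
        self._samples = deque(maxlen=capacity)

    def record(self, inputs, outputs, rewarded):
        """Store the pair only if it led to a state that should be rewarded."""
        if rewarded:
            self._samples.append((tuple(inputs), tuple(outputs)))

    def samples(self):
        return list(self._samples)

def train_from_bank(bank, train_step, epochs=1):
    """Replay every banked sample through a caller-supplied training step.

    Returns the number of training steps performed.
    """
    steps = 0
    for _ in range(epochs):
        for inputs, outputs in bank.samples():
            train_step(inputs, outputs)
            steps += 1
    return steps

# Hypothetical usage: only the rewarded step ends up in the bank.
bank = TrainingBank(capacity=2)
bank.record([0.1, 0.9], [1.0], rewarded=False)
bank.record([0.8, 0.2], [0.0], rewarded=True)
seen = []
n_steps = train_from_bank(bank, lambda x, y: seen.append((x, y)), epochs=3)
```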