Improving Neuroph Performance
After the recently published benchmark of three leading Java neural network frameworks (Neuroph, Encog and JOONE), we realized that we need to optimize Neuroph in order to provide better performance (faster training) and to support new technologies (multi-core CPUs, GPUs and cluster computing). There is a discussion about all this on our forums.
What We Did
We started with some basic optimizations at the Java level and managed to improve Neuroph's performance significantly.
We did the following:
- Converted all Vectors to ArrayLists and arrays
- Removed boxing for all double variables
- Added an optimized implementation of the WeightedSum input function, since it was one of the bottlenecks we discovered using the NetBeans profiler
These changes alone made Neuroph nearly twice as fast. Not bad, but we need more.
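To illustrate why removing boxing matters for a weighted sum, here is a minimal sketch. The class and method names are hypothetical and are not Neuroph's actual WeightedSum code; it only contrasts a boxed collection with primitive arrays.

```java
import java.util.List;

// Hypothetical illustration; these are not Neuroph's actual classes.
final class WeightedSumSketch {

    // Boxed version: every get() returns a Double object that must be unboxed.
    static double weightedSumBoxed(List<Double> inputs, List<Double> weights) {
        double sum = 0.0;
        for (int i = 0; i < inputs.size(); i++) {
            sum += inputs.get(i) * weights.get(i);
        }
        return sum;
    }

    // Primitive version: plain double[] arrays, no boxing and no synchronization.
    static double weightedSum(double[] inputs, double[] weights) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("inputs and weights must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }
}
```

Both methods compute the same value; the primitive version simply avoids object allocation and unboxing in the inner loop, which is where a profiler would typically show the time being spent.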
The next thing we tried was a matrix based implementation of the neural network (see package org.neuroph.contrib.matrixmlp). In this approach each layer consists of a few arrays that hold the data from all neurons and weights in array/matrix form. Bruce Wooton was the first to suggest this kind of implementation, based on UJMP (Universal Java Matrix Package). We also created an implementation based on plain Java arrays, and it turned out to be pretty fast. The good thing is that we were able to reuse the existing class hierarchies and architecture, and only provide different implementations at the layer and learning rule level.
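The idea of a layer backed by plain arrays can be sketched roughly as follows. This is an assumption-laden illustration: the class name, field names and the sigmoid transfer function are chosen for the example and do not reflect the actual classes in org.neuroph.contrib.matrixmlp.

```java
// Hypothetical sketch of an array-backed layer; not the actual matrixmlp code.
final class MatrixLayerSketch {
    final double[][] weights; // weights[j][i]: weight from input i to neuron j
    final double[] biases;    // one bias per neuron
    final double[] outputs;   // reused output buffer, one value per neuron

    MatrixLayerSketch(double[][] weights, double[] biases) {
        this.weights = weights;
        this.biases = biases;
        this.outputs = new double[weights.length];
    }

    // Forward pass for the whole layer: one weighted sum plus sigmoid per neuron.
    double[] forward(double[] inputs) {
        for (int j = 0; j < weights.length; j++) {
            double net = biases[j];
            double[] row = weights[j];
            for (int i = 0; i < row.length; i++) {
                net += row[i] * inputs[i];
            }
            outputs[j] = 1.0 / (1.0 + Math.exp(-net)); // sigmoid (assumed here)
        }
        return outputs;
    }
}
```

Because all values of a layer live in a few contiguous arrays, the forward pass is two nested loops with no per-neuron or per-connection objects involved.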
We used the same benchmark as here. The benchmark tests how fast a Multi Layer Perceptron with Momentum Backpropagation can push random data forward and backward. The same kind of network and learning rule was used with Encog and JOONE. The benchmark assumes that all networks use the same training algorithm (so each requires the same number of iterations to train), and aims to determine which network is fastest at learning/processing data.
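A throughput benchmark of this kind can be sketched as a simple timing loop over random data. Everything below is hypothetical: the Trainable interface stands in for whatever "one training iteration" means in a given framework, and the names are not taken from the actual benchmarking code.

```java
import java.util.Random;

// Hypothetical timing harness; not the actual benchmark code.
final class BenchmarkSketch {

    // Stand-in for one pass over the training set (forward and backward).
    interface Trainable {
        void trainIteration(double[][] inputs, double[][] targets);
    }

    // Generates rows x cols random values in [0, 1), reproducible for a given seed.
    static double[][] randomData(int rows, int cols, long seed) {
        Random random = new Random(seed);
        double[][] data = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                data[r][c] = random.nextDouble();
            }
        }
        return data;
    }

    // Runs a fixed number of iterations and returns the elapsed time in nanoseconds.
    static long timeIterations(Trainable net, double[][] inputs, double[][] targets, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            net.trainIteration(inputs, targets);
        }
        return System.nanoTime() - start;
    }
}
```

Since the iteration count is fixed, this measures only how fast data flows through the network, not how quickly a network converges. A careful measurement would also run some warm-up iterations first so the JIT compiler has a chance to optimize the hot loops.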
Benchmark settings: training set size, number of iterations; test machine: AMD Phenom II X4 965, 3.4 GHz.
The benchmark results are shown in the table and picture below.
From the benchmark results above we see that:
1. Both the matrix based and the object based implementations of Neuroph 2.5 are faster than JOONE. The object based implementation is about 30% faster, while the matrix based one is about 5-6 times faster.
2. Encog is still the fastest, but the Neuroph matrix implementation is getting close to it. Encog is about 2-3 times faster than the matrix based Neuroph implementation, and about 9 times faster than the object based implementation.
Conclusion: Neuroph 2.5 brings a significant performance improvement over version 2.4, but it is still slower than Encog 2.4. It is also important to note that this benchmark did not use Encog's multi-core support, which makes Encog even faster (see this for more details). Neuroph still does not support multiple cores.
In short, we're faster than JOONE, and next we must beat Encog :)
Neuroph Design and Performance Analysis
The Neuroph philosophy of being intuitive and easy to use, and of following a strong object model that corresponds to the domain model, has been successful so far, but it is obvious that a price is paid in terms of performance. So now we need a way to keep the current architecture and, at the same time, improve performance.
We are going to achieve this by adding a new layer to the current architecture, which should provide high performance calculation while keeping the current API for end users. The current object model will be transformed into a corresponding high performance implementation under the hood. That way Neuroph should offer both a friendly, intuitive API and high performance. We are still discussing this in order to find the best solution; the current matrix based implementations of MultiLayerPerceptron and MomentumBackpropagation (MatrixMultiLayerPerceptron and MatrixMomentumBackpropagation) are an example of this approach.
| Neural Network Model Layer (rich OO API to create and manipulate neural networks) |
| High performance (matrix based?) calculation layer |
Figure: High performance architecture for Neuroph neural networks
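One way to picture the two layers working together is a conversion step that copies weights from the object model into a dense matrix for fast calculation and writes the trained values back afterwards. The sketch below is only an assumption about how this could look; NeuronSketch is a hypothetical stand-in and not Neuroph's Neuron class.

```java
import java.util.List;

// Hypothetical sketch of bridging an object model and a matrix calculation layer.
final class CompiledLayerSketch {

    // Minimal stand-in for an object-model neuron that owns its input weights.
    static final class NeuronSketch {
        final double[] inputWeights;

        NeuronSketch(double... inputWeights) {
            this.inputWeights = inputWeights.clone();
        }
    }

    // Copies every neuron's weights into one matrix: row j holds neuron j's weights.
    static double[][] toWeightMatrix(List<NeuronSketch> layer) {
        double[][] matrix = new double[layer.size()][];
        for (int j = 0; j < layer.size(); j++) {
            matrix[j] = layer.get(j).inputWeights.clone();
        }
        return matrix;
    }

    // Writes the (possibly trained) matrix values back into the object model.
    static void writeBack(double[][] matrix, List<NeuronSketch> layer) {
        for (int j = 0; j < matrix.length; j++) {
            System.arraycopy(matrix[j], 0, layer.get(j).inputWeights, 0, matrix[j].length);
        }
    }
}
```

With a bridge like this, users keep building and inspecting networks through the object API, while training loops run only on the arrays.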
Why is Encog Fast?
The Encog architecture is already optimized for speed. It is layer based, which means that its basic building components are Layers, whereas in Neuroph the basic building components are Neurons (along with Connections and Weights). Encog mostly works with simple arrays, and it goes even further with something called FlatNetwork, which is a Multi Layer Perceptron converted into a few one-dimensional arrays. This makes all operations very fast compared to the basic layered Encog architecture, and even faster compared to the Neuroph object model. There is also an interesting solution in CalculateGradient, which is used to calculate gradients during learning and which supports multi-threading. Check out the source of these classes to get an idea of what is going on inside: BasicNetwork, BasicLayer, WeightedSynapse, BasicTraining, Propagation, Backpropagation, CalculateGradient, FlatNetwork, TrainFlatNetworkBackPropagation.
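The general idea of flattening a Multi Layer Perceptron into one-dimensional arrays can be sketched as below. This is not Encog's FlatNetwork and does not claim to match its memory layout; the class name, the bias-first weight ordering and the sigmoid activation are all assumptions made for the illustration.

```java
// Hypothetical flat MLP sketch; not Encog's actual FlatNetwork implementation.
final class FlatMlpSketch {
    private final int[] layerSizes;    // neurons per layer, input layer first
    private final int[] weightOffsets; // start index of each layer's weights
    final double[] weights;            // all weights of the network in one array

    FlatMlpSketch(int... layerSizes) {
        this.layerSizes = layerSizes.clone();
        this.weightOffsets = new int[layerSizes.length - 1];
        int total = 0;
        for (int l = 0; l < layerSizes.length - 1; l++) {
            weightOffsets[l] = total;
            total += (layerSizes[l] + 1) * layerSizes[l + 1]; // +1 for the bias weight
        }
        this.weights = new double[total];
    }

    // Forward pass that walks the single weight array sequentially.
    double[] compute(double[] input) {
        double[] current = input;
        for (int l = 0; l < weightOffsets.length; l++) {
            int inCount = layerSizes[l];
            int outCount = layerSizes[l + 1];
            double[] next = new double[outCount];
            int w = weightOffsets[l];
            for (int j = 0; j < outCount; j++) {
                double net = weights[w++]; // bias weight is stored first (assumed layout)
                for (int i = 0; i < inCount; i++) {
                    net += weights[w++] * current[i];
                }
                next[j] = 1.0 / (1.0 + Math.exp(-net)); // sigmoid (assumed here)
            }
            current = next;
        }
        return current;
    }
}
```

For example, `new FlatMlpSketch(2, 3, 1)` stores all 13 weights (9 for the hidden layer, 4 for the output layer) in one array, and the forward pass reads them in order, which is friendly to the CPU cache.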
What's Next To Do
1. We will continue to optimize existing code, both matrix and object based implementations, until we reach the best possible performance, while preserving the current architecture.
2. We need to add multi core, GPU and clustering support.
3. We have to develop new improved algorithms such as ResilientBackpropagation, QuickPropagation, batch mode Backpropagation, Delta bar delta, etc.
4. We need to improve the existing benchmarking code and do some more benchmarking for specific learning rules and data sets. These benchmarks will help us make a real performance comparison, since the existing benchmark only measures data flow speed. We also have some preliminary results which suggest that the recently published benchmark results and comparisons (http://www.codeproject.com/KB/recipes/benchmark-neuroph-encog.aspx and http://www.codeproject.com/KB/recipes/xor-encog-neuroph-joone.aspx) might be misleading, but we need to investigate this in more detail.
Download the full NetBeans projects for the development version of Neuroph 2.5 alpha and the benchmarking code below. If you want to play with version 2.5 alpha and experiment with the benchmark, make sure you add a reference to the appropriate project/jar in the benchmark project.
Neuroph 2.5 alpha