An example of a multivariate classification problem using the Neuroph framework
Introduction
Neural networks are applicable in virtually every situation in which a relationship between the predictor variables (independents, inputs) and predicted variables (dependents, outputs) exists, even when that relationship is very complex and not easy to articulate in the usual terms of "correlations" or "differences between groups."
The type of problem amenable to solution by a neural network is defined by the way networks work and the way they are trained. Neural networks work by taking in some input variables and producing some output variables. They can therefore be used where you have some known information and would like to infer some unknown information.
In classification, the objective is to determine to which of a number of discrete classes a given input case belongs. The purpose of the neural network is to assign each case to one of those classes and to estimate the probability that the case belongs to each class.
Introduction to the problem
The objective is to train a neural network to predict to which of six classes of freshly excised breast tissue a sample belongs, given the other attributes as input. The first thing we need is a data set. The data set used in this experiment can be found at
http://archive.ics.uci.edu/ml/datasets.html under the category classification. Its name is the Breast Tissue Database, and it contains electrical impedance measurements made on samples of freshly excised breast tissue.
Several constraints were placed on the selection of instances from a larger database. This database includes 106 instances, each belonging to exactly one class. Six classes of freshly excised tissue were studied using electrical impedance measurements:
- Carcinoma
- Fibro-adenoma
- Mastopathy
- Glandular
- Connective
- Adipose
The characteristics (input attributes) that are used in the prediction process are:
- Impedivity (ohm) at zero frequency (I0)
- Phase angle at 500 kHz (PA500)
- High-frequency slope of phase angle (HFS)
- Impedance distance between spectral ends (DA)
- Area under spectrum (AREA)
- Area normalized by DA (A/DA)
- Maximum of the spectrum (MAX IP)
- Distance between I0 and real part of the maximum frequency point (DR)
- Length of the spectral curve (P)
The data set can be downloaded
here. It cannot be inserted into Neuroph in its original form; if we want to use this data set for classification, we need to normalize the data first. The type of neural network that will be used in this experiment is a multi-layer perceptron trained with backpropagation.
Procedure of training a neural network
In order to train a neural network, there are six steps to be made:
- Normalize the data
- Create a Neuroph project
- Create a training set
- Create a neural network
- Train the network
- Test the network to make sure that it is trained properly
Step 1. Data Normalization
In order to train the neural network, the data set has to be normalized. Normalization means that all values in the data set are rescaled to the range from 0 to 1.
For that purpose we use the standard min-max normalization formula:

B = (A − A_min) / (A_max − A_min) × (C − D) + D

where B is the standardized value, A the given value, and D and C determine the lower and upper ends of the range in which we want our value to lie. In this case, D = 0 and C = 1, so the formula reduces to B = (A − A_min) / (A_max − A_min).
After normalization of original data, values should be saved as .txt file.
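To make the formula concrete, here is a minimal Java sketch of min-max normalization for one attribute column. The class and method names are our own illustration; Neuroph Studio performs this step through its own tools, and nothing here is part of the Neuroph API:

```java
// Minimal min-max normalization sketch: rescales one attribute column to [0, 1].
public final class MinMaxNormalizer {

    // B = (A - Amin) / (Amax - Amin), i.e. the formula above with D = 0 and C = 1.
    public static double[] normalizeColumn(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double a : column) {
            min = Math.min(min, a);
            max = Math.max(max, a);
        }
        double[] normalized = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column would divide by zero; map it to 0 instead.
            normalized[i] = (max == min) ? 0.0 : (column[i] - min) / (max - min);
        }
        return normalized;
    }
}
```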
Step 2. Creating a new Neuroph project
We create a new project in Neuroph Studio by clicking File > New Project, then we choose 'Neuroph Project' and click the 'Next' button.
In the new window we define the project name and location. After clicking 'Finish', the new project is created and appears in the Projects window on the left side of Neuroph Studio.
Step 3. Creating a Training Set
To create a training set, in the main menu we choose Training > New Training Set to open the training set wizard.
Once a network has been structured for a particular application, that network is ready to be trained. To start this process the initial weights are chosen randomly. Then the training begins.
There are two approaches to training - supervised and unsupervised. Supervised training type involves a mechanism of providing the network with the desired output either by manually grading the network's performance or by providing the desired outputs with the inputs. Unsupervised training is where the network has to make sense of the inputs without outside help.
In supervised training, both the inputs and the outputs are provided. The network then processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked.
After clicking 'Next' we need to insert data into the training set table. All the data could be entered manually, but since we have a large number of instances it is much easier to load the data directly from a file. We click 'Choose File' and select the file in which we saved our normalized data set; the values in that file are separated by tabs.
Then we click 'Load' and all the data is loaded into a table with 15 columns. The first 9 columns are the inputs and the last 6 are the outputs of our data set.
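The same training set can also be created programmatically. The sketch below assumes a Neuroph 2.9-style API (org.neuroph.core.data.DataSet) and a tab-separated file; the file name is a placeholder for wherever the normalized data from Step 1 was saved:

```java
import org.neuroph.core.data.DataSet;

public class LoadBreastTissueDataSet {
    public static void main(String[] args) {
        // 9 input columns followed by 6 output columns, separated by tabs.
        DataSet trainingSet = DataSet.createFromFile(
                "breast_tissue_normalized.txt", 9, 6, "\t");
        System.out.println("Loaded " + trainingSet.getRows().size() + " rows.");
    }
}
```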
However, some networks never learn. This could be because the input data does not contain the specific information from which the desired output is derived. Networks also don't converge if there is not enough data to enable complete learning.
If a network simply can't solve the problem, the designer then has to review the input and outputs, the number of layers, the number of elements per layer, the connections between the layers, the summation, transfer, and training functions, and even the initial weights themselves.
Training attempt 1
Step 4.1 Creating a neural network
Now we need to create a neural network. In this experiment we will analyze several architectures. Each network we create will be a Multi Layer Perceptron, and they will differ from one another only in the parameters of the Multi Layer Perceptron.
We create a new neural network by right-clicking the project and choosing New > Neural Network. Then we define the neural network's name and type; we choose the 'Multi Layer Perceptron' type.
The multi layer perceptron (MLP) is a hierarchical structure of several perceptrons, and overcomes the shortcomings of single-layer networks. It is an artificial neural network that learns nonlinear function mappings and is capable of learning a rich variety of nonlinear decision surfaces.
The network has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) the free parameters of the model. Such networks can model functions of almost arbitrary complexity, with the number of layers, and the number of units in each layer, determining the function complexity. Important issues in Multi Layer Perceptrons design include specification of the number of hidden layers and the number of units in these layers.
The number of input and output units is defined by the problem. The number of hidden units to use is far from clear. As good a starting point as any is to use one hidden layer, with the number of units equal to half the sum of the number of input and output units.
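For this data set that rule of thumb gives (9 + 6) / 2, i.e. about 7 or 8 hidden units, which is the architecture we end up trying in training attempt 4.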
One of the most important characteristics of a perceptron network is the number of neurons in the hidden layers. If an inadequate number of neurons is used, the network will be unable to model complex data, and the resulting fit will be poor. If too many neurons are used, the training time may become excessively long and, worse, the network may overfit the data. When overfitting occurs, the network begins to model random noise in the data; the model then fits the training data extremely well but generalizes poorly to new, unseen data. Validation must be used to test for this.
As the learning rule, choose Backpropagation With Momentum. Backpropagation is the best-known training algorithm for neural networks. In backpropagation, the gradient vector of the error surface is calculated. This vector points along the line of steepest descent from the current point, so we know that if we move along it a "short" distance, we will decrease the error. A sequence of such moves will eventually find a minimum of some sort. The difficult part is deciding how large the steps should be.
Large steps may converge more quickly, but may also overstep the solution or go off in the wrong direction. Very small steps may go in the correct direction, but they also require a large number of iterations. In practice, the step size is proportional to the slope and to a special constant: the learning rate. The correct setting for the learning rate is application-dependent, and is typically chosen by experiment. It may also be time-varying, getting smaller as the algorithm progresses.
The algorithm is also usually modified by inclusion of a momentum term: this encourages movement in a fixed direction, so that if several steps are taken in the same direction, the algorithm "picks up speed", which gives it the ability to escape local minima and to move rapidly over flat spots and plateaus. Momentum is a standard technique used to speed up convergence and maintain generalization performance.
In the next window we set the multi layer perceptron's parameters. The numbers of input and output neurons are the same as in the training set; now we have to choose the number of hidden layers and the number of neurons in each layer.
Next, we click 'Finish' and the first neural network is created. In the picture below we can see the graph view of this neural network.
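For reference, an equivalent network can be created directly with the Neuroph API. This is a fragment meant to continue inside the main method of the loading sketch above, again assuming a Neuroph 2.9-style API; sigmoid is, to our knowledge, the transfer function the wizard uses by default:

```java
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

// 9 input neurons, one hidden layer with 2 neurons (this first attempt),
// and 6 output neurons, one per tissue class.
MultiLayerPerceptron neuralNet =
        new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 9, 2, 6);
```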
Step 5.1 Train the neural network
The goal of the training process is to find the set of weight values that will cause the output from the neural network to match the actual target values as closely as possible.
Once the number of layers and number of units in each layer has been selected, the network's weights and thresholds must be set so as to minimize the prediction error made by the network. This is the role of the training algorithms. The historical cases that you have gathered are used to automatically adjust the weights and thresholds in order to minimize this error. This process is equivalent to fitting the model represented by the network to the training data available. The error of a particular configuration of the network can be determined by running all the training cases through the network, comparing the actual output generated with the desired or target outputs.
In the Set Learning Parameters dialog we use the default learning parameters and just click the 'Train' button. The training process starts.
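In code, the same settings can be applied through a MomentumBackpropagation learning rule. This is another hedged Neuroph 2.9-style fragment continuing the sketch above; 0.2 and 0.7 mirror the dialog defaults used in this attempt:

```java
import org.neuroph.nnet.learning.MomentumBackpropagation;

MomentumBackpropagation learningRule = new MomentumBackpropagation();
learningRule.setLearningRate(0.2); // dialog default
learningRule.setMomentum(0.7);     // dialog default
learningRule.setMaxError(0.01);    // stop when total error drops below this
// learningRule.setMaxIterations(50000); // optional cap for runs that never converge
neuralNet.setLearningRule(learningRule);
neuralNet.learn(trainingSet);      // blocks until a stopping condition is met
```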
The total net error remains higher than the value we set: after 10522 iterations the neural network has failed to learn the problem with an error below 0.01. We can still test this network, but the error will be greater than expected.
Step 6.1 Test the neural network
After the network is trained, we click 'Test' in order to see the total error and all the individual errors. The results show a total mean square error of 0.11488525194274858, which is far too high. From this we can conclude that this neural network is not good enough.
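Programmatically, testing amounts to running every row through the trained network and averaging the squared differences between the desired and actual outputs. A sketch, continuing the fragments above (Neuroph Studio's 'Test' button computes this for us, and its exact normalization of the error may differ slightly):

```java
import org.neuroph.core.data.DataSetRow;

double sumSquaredError = 0;
int count = 0;
for (DataSetRow row : trainingSet.getRows()) {
    neuralNet.setInput(row.getInput()); // feed the 9 normalized attributes
    neuralNet.calculate();
    double[] actual = neuralNet.getOutput();
    double[] desired = row.getDesiredOutput();
    for (int i = 0; i < desired.length; i++) {
        double error = desired[i] - actual[i];
        sumSquaredError += error * error;
        count++;
    }
}
System.out.println("Mean square error: " + (sumSquaredError / count));
```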
Training attempt 2
Step 5.2 Train the neural network
So let us try something else. We will increase the learning rate from 0.2 to 0.3. In the network window we click the 'Randomize' button, replace the value 0.2 in the learning rate field with 0.3, and click the 'Train' button.
The total net error is still far above the target value.
Step 6.2 Test the neural network
Now we want to see testing results.
In this attempt the total mean square error is even higher than it was in the previous case. We should try some other architecture in order to get better results.
Training attempt 3
Step 4.3 Creating a neural network
In this attempt we will try to get better results by increasing the number of hidden neurons. The number of hidden neurons is known to be crucial for training success, and this time we will try 5 hidden neurons.
First we create a new neural network. All the parameters are the same as in the first training attempt; we only change the number of hidden neurons.
In the picture below we can see the graph view of this neural network.
Step 5.3 Train the neural network
In the first training of this second neural network architecture we will try the following learning parameters.
The total net error is still higher than the value we set: after 18307 iterations the neural network has failed to learn the problem with an error below 0.01. We can still test this network, but the error will be greater than expected.
Step 6.3 Test the neural network
In this case the total mean square error is still higher than it was in the previous training attempt, and the overall result is still not good. We can also see that many of the individual errors are very high, so we will have to try some other network architecture to get better testing results.
Training attempt 4
Step 4.4 Creating a neural network
This solution will try to give us better results than the previous one by using 8 hidden neurons. All other parameters will be the same as in the previous solutions.
In the picture below we can see the graph view of this neural network.
Step 5.4 Train the neural network
The neural network that we have just created can now be trained. We can try changing the parameters to get a better result. This time we will train the network the same way, but we will lower the momentum to 0.4.
Step 6.4 Test the neural network
After testing the neural network, we see that the total mean square error is 0.060183932603344995, which is better than in the previous attempts but should be much lower. There are also still many individual errors near 1, which is quite bad.
Training attempt 5
Step 5.5 Train the neural network
In this attempt we will use the same network architecture as in the previous training attempt, and we will try to get better results by changing some learning parameters. We now set the learning rate to 0.4, the momentum stays at 0.4, and the max error remains the same (0.01).
A useful conclusion can be drawn from this training: the architecture with eight hidden neurons is not appropriate for this training set, because no matter how long we continue training, we do not get close to the desired max error. The error is still much higher than the desired level.
Step 6.5 Test the neural network
In this case the total mean square error is higher than it was in the last training attempt. We can conclude that we shouldn’t increase the learning rate.
Training attempt 6
Step 4.6 Creating a neural network
Create a new neural network with 10 hidden neurons and select the same options as in the previous architectures. We will use the same training set as above.
Step 5.6 Train the neural network
In this attempt the learning rate will be 0.2, the momentum 0.4, and the maximum error limit will remain 0.01. We click the 'Train' button.
Step 6.6 Test the neural network
The results show that the total mean square error is approximately 0.0392.
The table below presents the results of the training attempts so far:
| Training attempt | Hidden neurons | Learning rate | Momentum | Max error | Iterations | Total net error |
|---|---|---|---|---|---|---|
| 1 | 2 | 0.2 | 0.7 | 0.01 | 10522 | 0.3425 |
| 2 | 2 | 0.3 | 0.7 | 0.01 | 18141 | 0.3394 |
| 3 | 5 | 0.2 | 0.7 | 0.01 | 18307 | 0.3389 |
| 4 | 8 | 0.2 | 0.4 | 0.01 | 16388 | 0.1244 |
| 5 | 8 | 0.4 | 0.4 | 0.01 | 11017 | 0.1319 |
| 6 | 10 | 0.2 | 0.4 | 0.01 | 13809 | 0.0568 |
After several more tries with different architectures and parameters we got the results given in the table below. There is an interesting pattern in the data: if we look at the number of hidden neurons and the total net error, we can see that a higher number of hidden neurons leads to a lower total net error.
| Training attempt | Hidden neurons | Learning rate | Momentum | Max error | Iterations | Total net error |
|---|---|---|---|---|---|---|
| 7 | 10 | 0.3 | 0.4 | 0.01 | 12384 | 0.0582 |
| 8 | 12 | 0.2 | 0.4 | 0.01 | 14853 | 0.0497 |
| 9 | 12 | 0.3 | 0.4 | 0.01 | 24494 | 0.0548 |
| 10 | 12 | 0.2 | 0.3 | 0.01 | 13582 | 0.0521 |
| 11 | 14 | 0.2 | 0.4 | 0.01 | 7386 | 0.0421 |
| 12 | 14 | 0.3 | 0.4 | 0.01 | 9347 | 0.0487 |
| 13 | 16 | 0.2 | 0.4 | 0.01 | 4068 | 0.0381 |
| 14 | 16 | 0.3 | 0.4 | 0.01 | 25676 | 0.0447 |
| 15 | 18 | 0.2 | 0.4 | 0.01 | 31943 | 0.0263 |
| 16 | 18 | 0.25 | 0.4 | 0.01 | 10547 | 0.0451 |
Training attempt 17
Step 4.17 Create a Neural Network
This neural network will contain 20 neurons in the hidden layer and the same options as the previous networks.
Step 5.17 Train the neural network
First we will try the recommended values for the learning rate and momentum: 0.25 for the learning rate and 0.4 for the momentum.
This time we successfully trained the neural network. The total net error descends slowly, with high oscillation, and finally stops when it reaches a level below the given max error (0.01), after 653 iterations.
Step 6.17 Test the neural network
We can see the testing results for this network architecture. So far this is the best result we have: the total mean square error is approximately 0.0145.
Training attempt 18
Step 5.18 Train the neural network
First we will try the recommended values for the learning rate and momentum: 0.2 for the learning rate and 0.4 for the momentum.
We again successfully trained the neural network. The total net error descends slowly and finally stops when it reaches a level below the given max error (0.01), after only 66 iterations.
Step 6.18 Test the neural network
Total mean square error measures the average of the squares of the "errors", where the error is the amount by which the value produced by the estimator differs from the quantity being estimated. A mean square error of zero, meaning that the estimator predicts the observations with perfect accuracy, is the ideal but is practically never achievable. The unbiased model with the smallest mean square error is generally interpreted as the one that best explains the variability in the observations. The test showed that the total mean square error is approximately 0.00586. The goal of experimental design is to construct experiments in such a way that, when the observations are analyzed, the mean square error is close to zero relative to the magnitude of at least one of the estimated treatment effects.
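Written out, the quantity being reported is the ordinary mean square error over all n output values produced during testing (a textbook definition, not Neuroph-specific notation):

MSE = (1/n) × Σ (desired_i − actual_i)², for i = 1 … n

where desired_i is the target output value and actual_i is the value the network produced.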
We also need to examine all the individual errors to make sure that testing was completely successful. We have a large data set, so individual testing can take a lot of time, but at first sight it is obvious that the individual errors are also much smaller than in previous attempts; there are very few extreme cases.
The following tables show the values of the inputs, outputs and errors for 5 randomly selected observations.
Values of inputs
| Observation | I0 | PA500 | HFS | DA | AREA | A/DA | MAX IP | DR | P |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.8436 | 0.49 | 0.816 | 0.8 | 0.961 | 0.8257 | 0.878 | 0.7669 | 0.844 |
| 16 | 0.8936 | 0 | 0.494 | 0.783 | 0.943 | 0.7579 | 0.8544 | 0.751 | 0.873 |
| 39 | 0.8769 | 0.81 | 0.574 | 0.9 | 0.993 | 0.943 | 0.9313 | 0.8712 | 0.889 |
| 89 | 0.4079 | 0.91 | 0.679 | 0.903 | 0.93 | 0.3808 | 0.7376 | 1 | 0.247 |
| 97 | 0.4264 | 0.9 | 0.795 | 0.756 | 0.967 | 0.8792 | 0.8289 | 0.725 | 0.467 |
Values of outputs
| Observation | Output 1 | Output 2 | Output 3 | Output 4 | Output 5 | Output 6 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0.0006 | 0 |
| 16 | 1 | 0 | 0 | 0 | 0 | 0 |
| 39 | 0 | 0 | 0.9999 | 0 | 0 | 0 |
| 89 | 0 | 0 | 0 | 0 | 0.0002 | 0.9986 |
| 97 | 0 | 0 | 0 | 0 | 0 | 1 |
Individual errors
| Observation | Error 1 | Error 2 | Error 3 | Error 4 | Error 5 | Error 6 |
|---|---|---|---|---|---|---|
| 1 | -0 | 0 | 0 | 0 | 0.0006 | 0 |
| 16 | -0 | 0 | 0 | 0 | 0 | 0 |
| 39 | 0 | 0 | -0.0001 | 0 | 0 | 0 |
| 89 | 0 | 0 | 0 | 0 | 0.0002 | -0.0014 |
| 97 | 0 | 0 | 0 | 0 | 0 | -0 |
As we can see in the tables, the network guessed right in all five cases, so we can conclude that this network architecture is very good.
Advanced training techniques
Another technique for training a neural network involves validation and generalization. This type of training is usually used with huge data sets. The idea is to use only a part of the data set when training a network, and then test the network with inputs from the other, unused part of the data set. That way we can determine whether the neural network has the power of generalization.
The traditional training-validation-testing approach fails to give assurance that a neural network will meet the rigorous standards required for high reliability environments and safety-critical systems. For high assurance systems applying a simple testing set may not be adequate. These systems, which require a high degree of system reliability, will need more than the usual limited set of testing data. Automated testing, in combination with test generation algorithms, may help to alleviate this problem and aid in reliability assessment.
Process verification ensures that project planning is adequate, processes are compliant with the governing contract and processes are being executed. It makes sure that standards, procedures and environments for the project are adequate and that the project is staffed with trained personnel.
Training attempt 19
Step 3.19 Creating a Training Set
The idea of this attempt is to use only a part of the data set when training the network, and then test the network with inputs from the other, unused part of the data set. That way we can determine whether the neural network has the power of generalization. We will randomly choose 70% of the instances for training and use the remaining 30% for testing, as sketched in the code below.
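Neuroph Studio produces this split for us; as a rough illustration of what it does, the following Java sketch shuffles the rows of a Neuroph DataSet and divides them 70/30 (the manual split is our own code, not a Neuroph utility, and again assumes a Neuroph 2.9-style API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class SplitDataSet {
    public static void main(String[] args) {
        DataSet full = DataSet.createFromFile(
                "breast_tissue_normalized.txt", 9, 6, "\t");

        // Copy and shuffle the rows so the split is random.
        List<DataSetRow> rows = new ArrayList<>(full.getRows());
        Collections.shuffle(rows);

        int cut = (int) (rows.size() * 0.7); // first 70% go to the training set
        DataSet train = new DataSet(9, 6);
        DataSet test = new DataSet(9, 6);
        for (int i = 0; i < rows.size(); i++) {
            (i < cut ? train : test).addRow(rows.get(i));
        }
        System.out.println(train.getRows().size() + " training rows, "
                + test.getRows().size() + " test rows.");
    }
}
```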
Step 5.19 Train the neural network
Unlike the previous trainings, there is no need to create a new neural network. The advanced training technique consists of examining the performance of an existing architecture using new training and test sets of data.
The parameters we now need to set are the same as in the previous training attempt: maximum error 0.01, learning rate 0.2, and momentum 0.4. We will not limit the maximum number of iterations, and we will check 'Display error graph', since we want to see how the error changes over the iterations.
We again successfully trained the neural network. The total net error descends slowly until iteration 3850, when it finally stops after reaching a level below the given max error (0.01).
Step 6.19 Test the network
After successfully training the neural network, we can test it to see whether the results are as good as in the previous testing.
Unlike the previous practice of training and testing the network on the same data, we will first test with the set that contains 70% of the initial instances, the one used for training. In the network window we select that training set and press the 'Test' button.
The total mean square error is 0.0071, which is below the target error of 0.01.
Next we test with the set that contains the remaining 30% of the initial instances, the ones the network has never seen. Again we select that set in the network window and press 'Test'.
The total mean square error is 0.1144, an order of magnitude above the desired error. We should look at the individual errors to see whether any result is completely mistaken.
From this we conclude that the neural network memorizes the training data well but fails to produce correct output for some of the new test data. The problem may lie in the fact that we used only 32 instances for testing versus 74 instances for training the neural network. So how much data should be used for testing? Some authors suggest that 10% is a practical choice. We will therefore create four new training sets: two sets to train and two sets to test the same architecture. The two training sets will consist of 80% and 90% of the instances of our original training set, and the two test sets will consist of the remaining 20% and 10%. As before, we keep the maximum error at 0.01, the learning rate at 0.2 and the momentum at 0.4.
Advanced training results for the different samples:
| Training attempt | Training set | Testing set | Iterations | Total net error (training) | Total mean square error (testing) |
|---|---|---|---|---|---|
| 19 | 70% | 30% | 3850 | 0.0096 | 0.1144 |
| 20 | 80% | 20% | 643 | 0.0098 | 0.0768 |
| 21 | 90% | 10% | 109 | 0.0095 | 0.0981 |
Conclusion
During this experiment we created several different neural network architectures. We wanted to find out what matters most during neural network training in order to get the best results. We normalized the original data set using min-max normalization (linear scaling). Through six basic steps we explained in detail the creation, training and testing of neural networks. If an architecture uses too few hidden neurons, training runs excessively long and the network fails to reach the target error, no matter what values the training parameters take.
The final results of our experiment are given in the table below. They include the results obtained using standard training techniques and those obtained using advanced training techniques. The best solution is training attempt 18.
Training results for the different samples:
| Training attempt | Hidden neurons | Hidden layers | Training / testing set | Max error | Learning rate | Momentum | Total mean square error | Iterations | Correct guesses | Network trained |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | full | 0.01 | 0.2 | 0.7 | - | 10522 | - | no |
| 2 | 2 | 1 | full | 0.01 | 0.3 | 0.7 | - | 18141 | - | no |
| 3 | 5 | 1 | full | 0.01 | 0.2 | 0.7 | - | 18307 | - | no |
| 4 | 8 | 1 | full | 0.01 | 0.2 | 0.4 | - | 16388 | - | no |
| 5 | 8 | 1 | full | 0.01 | 0.4 | 0.4 | - | 11017 | - | no |
| 6 | 10 | 1 | full | 0.01 | 0.2 | 0.4 | - | 13809 | - | no |
| 7 | 10 | 1 | full | 0.01 | 0.3 | 0.4 | - | 12384 | - | no |
| 8 | 12 | 1 | full | 0.01 | 0.2 | 0.4 | - | 14853 | - | no |
| 9 | 12 | 1 | full | 0.01 | 0.3 | 0.4 | - | 24494 | - | no |
| 10 | 12 | 1 | full | 0.01 | 0.2 | 0.3 | - | 13582 | - | no |
| 11 | 14 | 1 | full | 0.01 | 0.2 | 0.4 | - | 7386 | - | no |
| 12 | 14 | 1 | full | 0.01 | 0.3 | 0.4 | - | 9347 | - | no |
| 13 | 16 | 1 | full | 0.01 | 0.2 | 0.4 | - | 4068 | - | no |
| 14 | 16 | 1 | full | 0.01 | 0.3 | 0.4 | - | 25676 | - | no |
| 15 | 18 | 1 | full | 0.01 | 0.2 | 0.4 | - | 31943 | - | no |
| 16 | 18 | 1 | full | 0.01 | 0.25 | 0.4 | - | 10547 | - | no |
| 17 | 20 | 1 | full | 0.01 | 0.25 | 0.4 | 0.0145 | 653 | - | yes |
| 18 | 20 | 1 | full | 0.01 | 0.2 | 0.4 | 0.0058 | 66 | 5/5 | yes |
| 19 | 20 | 1 | 70% / 30% | 0.01 | 0.2 | 0.4 | 0.1144 | 3850 | 17/32 | yes |
| 20 | 20 | 1 | 80% / 20% | 0.01 | 0.2 | 0.4 | 0.0768 | 643 | 19/22 | yes |
| 21 | 20 | 1 | 90% / 10% | 0.01 | 0.2 | 0.4 | 0.0981 | 109 | 11/11 | yes |
The computing world has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Furthermore there is no need to devise an algorithm in order to perform a specific task. There is no need to understand the internal mechanisms of that task. Perhaps the most exciting aspect of neural networks is the possibility that some day 'conscious' networks might be produced.
Download:
Data set used in this tutorial
Training sets
Neuroph project
See also:
Multi Layer Perceptron Tutorial