PREDICTING THE CLASS OF HABERMAN'S SURVIVAL WITH NEURAL NETWORKS
An example of a multivariate data type classification problem using Neuroph
by Marija Jovanovic, Faculty of Organizational Sciences, University of Belgrade
An experiment for the Intelligent Systems course
Introduction
In this experiment we will show how neural networks and Neuroph Studio are used for classification problems. We will work with several architectures, and determine which ones are good solutions to the problem and which ones are not.
Classification is a task that is often found in everyday life. A classification process involves assigning objects to predefined groups or classes based on a number of observed attributes related to those objects. Although there are more traditional tools for classification, such as certain statistical procedures, neural networks have shown to be an effective solution for this type of problem. There are a number of advantages to using neural networks - they are data driven, they are self-adaptive, and they can approximate any function, linear as well as non-linear (which is quite important in this case because groups often cannot be divided by linear functions). Neural networks classify objects rather simply - they take data as input, derive rules based on those data, and make decisions.
For better understanding of our experiment, we suggest that you first look at the links below:
Neuroph Studio - Getting started
Multi Layer Perceptron
Introducing the problem
The goal is to train the neural network to predict whether a patient survived after breast cancer surgery, when it is given the other attributes as input. The first thing needed in order to do that is a data set. A data set can be found here. The name of the data set is Haberman's Survival Dataset (March 4, 1991). The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
The data set contains 306 instances, and the number of attributes is 3. The first attribute is the age of the patient, the second is the year of the operation, and the third is the number of positive axillary nodes detected. Each instance belongs to one of 2 possible classes (the patient survived 5 years or longer, or the patient died within 5 years).
Input attributes are:
- Age of patient at time of operation (numerical)
- Patient's year of operation (year - 1900, numerical)
- Number of positive axillary nodes detected (numerical)
Output attribute is: Survival status (class attribute)
1 = the patient survived 5 years or longer
2 = the patient died within 5 years
The first three attributes have values from 0 to 100. The last attribute - class, takes the values 1 and 2 (1 the patient survived, 2 the patient died).
When the data set is downloaded, it cannot be inserted into Neuroph in its original form. For it to be able to help us with this classification problem, we first need to normalize the data. The type of neural network that will be used in this experiment is a multi layer perceptron with backpropagation.
Procedure of training a neural network
In order to train a neural network, there are six steps to be taken:
- Prepare the data set
- Create a Neuroph project
- Create a training set
- Create a neural network
- Train the network
- Test the network to make sure that it is trained properly
1. Step Preparing the data set
Any neural network must be trained before it can be considered intelligent and ready to use. Neural networks are trained using training sets, and now a training set will be created to help us with the Haberman's survival classification problem. As we said, we first need to normalize the data.
Values of all attributes are integers, so we use the standard min-max normalization formula for the first three attributes (inputs):
B = (A - min(A)) / (max(A) - min(A)) * (C - D) + D
where:
B is the standardized (normalized) value
A is the given value
D and C determine the range in which we want our value to be. In this case, D = 0 and C = 1, so every input is mapped into the 0-1 range.
The last attribute, the class of survival (output), is not normalized with this method, because the attribute takes the value 1 or 2, and a more appropriate method is to turn the two classes into two outputs. That means one output column is replaced with two columns. If an instance belongs to the first class (number 1 is the output), i.e. the patient survived 5 years or longer, the value of the first output will be 0, and of the second output 1. Similarly, if an instance belongs to the second class (number 2 is the output), i.e. the patient died within 5 years, the value of the first output for that instance will be 1, and the value of the second output will be 0. That way, both of the outputs have values 0 or 1, which fits perfectly in our model where all the data are in the 0-1 range.
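For readers who prefer code, here is a minimal sketch (plain Java) of the preprocessing described above; the attribute minimum and maximum values are assumptions used only for illustration and should be read from the actual data set:

```java
import java.util.Arrays;

// Minimal sketch (plain Java) of the preprocessing described above.
// The attribute minimums and maximums are assumptions for illustration;
// in practice they are read from the actual data set.
public class HabermanPreprocessing {

    // Min-max normalization: maps a value A from [min, max] into the 0-1 range
    static double normalize(double a, double min, double max) {
        return (a - min) / (max - min);
    }

    public static void main(String[] args) {
        // Example raw instance: age 34, year of operation 66 (i.e. 1966),
        // 9 positive axillary nodes, class 1 (survived 5 years or longer)
        double age = 34, year = 66, nodes = 9;
        int survivalClass = 1;

        double[] inputs = {
                normalize(age, 30, 83),   // age of patient (assumed range)
                normalize(year, 58, 69),  // year of operation - 1900 (assumed range)
                normalize(nodes, 0, 52)   // positive axillary nodes (assumed range)
        };

        // Two output columns instead of one class column:
        // class 1 (survived) -> (0, 1), class 2 (died) -> (1, 0)
        double[] outputs = (survivalClass == 1) ? new double[]{0, 1} : new double[]{1, 0};

        System.out.println(Arrays.toString(inputs) + " -> " + Arrays.toString(outputs));
    }
}
```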
2. Step Creating a new Neuroph project
When all the data are standardized, we just need to put them in a .csv file and everything is set for the creation of a new training set. First, a new Neuroph project needs to be created by clicking on the 'File' menu, and then 'New project'.
The project will be called 'HabermanSurvival'.
When we click 'Finish', a new project is created and it appears in the 'Projects' window, in the top left corner of Neuroph Studio.
3. Step Creating a training set
Now we need to create a new training set by right-clicking our project, and selecting 'New', then 'Training set'. We give it a name, and then set the parameters. The type is chosen to be 'Supervised' training, because we want to minimize the prediction error through an iterative procedure. Supervised training is accomplished by giving the neural network a set of sample data along with the anticipated outputs for each of those samples; that sample data will be our data set. Supervised training is the most common way of training a neural network. As supervised training proceeds, the neural network is taken through a number of iterations, until its output matches the anticipated output with a reasonably small error. The error rate we find appropriate for a well-trained network is set just before the training starts; usually, that number will be around 0.01.
Next, we set the number of inputs, which is 3, because there are 3 input attributes, and the number of outputs, which is 2, as explained above.
After clicking 'Next', we need to edit the training set table. In this case we click 'Load from file' to select a file from which the table will be loaded. We click 'Choose file', find the file with the data we need, and then select a values separator. In this case it is a comma, but it can also be a space, tab, or semicolon.
Then we click 'Next', and a window that represents our table of data will appear. We can see that everything is in order - there are 3 input and 2 output columns, and all the data are in the range of 0-1 - so we can now click 'Finish'.
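The same training set can also be created in code; the sketch below assumes a Neuroph 2.9-style API and a hypothetical file name for the normalized CSV:

```java
import org.neuroph.core.data.DataSet;

// Sketch: load the normalized CSV as a supervised training set in code.
// Assumes a Neuroph 2.9-style API; the file name is hypothetical.
public class CreateTrainingSet {
    public static void main(String[] args) {
        // 3 input columns, 2 output columns, comma as the values separator
        DataSet trainingSet = DataSet.createFromFile("habermans_data_normalized.csv", 3, 2, ",");
        System.out.println("Loaded " + trainingSet.getRows().size() + " rows");
    }
}
```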
After we have done this, everything is ready for the creation of neural networks. We will create several neural networks, all with different sets of parameters, and determine which is the best solution for our problem by testing them. This is the reason why there will be several versions of steps 4, 5 and 6.
Training attempt 1
4.1 Step Creating a neural network
To create the optimal neural network architecture for the problem, we will pay particular attention to the number of neurons in the hidden layer. There is no formula that would calculate that number for every possible problem; there are only some rules and guidelines. For example, more neurons make the network more flexible, but also more sensitive to noise. So the answer is to have just enough neurons to solve the problem appropriately, but no more. Some of the rule-of-thumb methods for determining the number of neurons to use in the hidden layers are listed below (a short code sketch after the list applies them to our problem):
- The number of hidden neurons should be between the size of the input layer and the size of the output layer - in this case, the number should be between 2 and 3.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer - applied to this problem: 2/3 * 3 + 2 = 4.
- The number of hidden neurons should be less than twice the size of the input layer - here, less than 6, although we can put in more neurons, since this is not a strict limit.
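A minimal sketch (plain Java) that evaluates the three rules of thumb for our problem:

```java
// Sketch: the three rules of thumb applied to this problem (3 inputs, 2 outputs)
public class HiddenNeuronHeuristics {
    public static void main(String[] args) {
        int inputs = 3, outputs = 2;
        System.out.println("Between output and input layer size: " + outputs + " to " + inputs);
        System.out.println("2/3 of input size plus output size: " + (2.0 / 3 * inputs + outputs)); // 4.0
        System.out.println("Less than twice the input size: < " + (2 * inputs));                   // < 6
    }
}
```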
The first neural network that we will test will be called NewNeuralNetwork1. We create it by right-clicking our project in the 'Projects' window, and then clicking 'New' and 'Neural Network'. We set the name and the type of the network; 'Multi Layer Perceptron' will be selected. The multi layer perceptron is the most widely studied and used neural network classifier. It is capable of modeling complex functions, it is robust (good at ignoring irrelevant inputs and noise), and it can adapt its weights and/or topology in response to environment changes. Another reason we use this type of perceptron is simply that it is very easy to use - it implements a black-box point of view, and can be used with little knowledge about the relationship of the function to be modeled.
When we have selected the type of the perceptron, we can click 'Next'. A new window will appear, where we set some more parameters that are characteristic of the multi layer perceptron. The number of input and output neurons is the same as the number of inputs and outputs in the training set. However, now we also have to select the number of hidden layers, and the number of neurons in each layer. Guided by the rule that problems requiring two hidden layers are rarely encountered (and that there is currently no theoretical reason to use neural networks with more than two hidden layers), we will decide on only one hidden layer. As for the number of units in that layer, since one should only use as many neurons as are needed to solve the problem, we will choose as few as we can for the first experiment - only one. Networks with many hidden neurons can represent functions of any shape, but this flexibility can cause the network to learn the noise in the data. This is called 'overtraining'.
We have checked 'Use Bias Neurons', and chosen the Sigmoid transfer function (because the range of our data is 0-1; had it been -1 to 1, we would have checked 'Tanh'). As the learning rule we have chosen 'Backpropagation with Momentum'. This learning rule will be used in all the networks we create, because backpropagation is the most commonly used technique and is well suited to this type of problem. In this method, the objects in the training set are given to the network one by one in random order, and the weights are updated each time in order to make the current prediction error as small as possible. This process continues until the weights converge. Also, we have chosen to add an extra term, momentum, to the standard backpropagation formula in order to improve the efficiency of the algorithm.
Next, we click 'Finish', and the first neural network which we will test is completed.
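For completeness, the same network could also be built programmatically; this is only a sketch, assuming a Neuroph 2.9-style API:

```java
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

// Sketch: the architecture of training attempt 1 - 3 inputs, one hidden layer
// with a single neuron, 2 outputs, sigmoid transfer function, backpropagation
// with momentum. Assumes a Neuroph 2.9-style API.
public class CreateNetwork {
    public static void main(String[] args) {
        MultiLayerPerceptron neuralNet =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 3, 1, 2);
        neuralNet.setLearningRule(new MomentumBackpropagation());
        neuralNet.save("NewNeuralNetwork1.nnet");
    }
}
```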
If you want to see the neural network as a graph, just select 'Graph View'. The rightmost nodes in the first and second levels are the bias neurons we explained above.
5.1 Step Training the neural network
Now we need to train the network using the training set we have created. We select the training set and click 'Train'. A new window will open, where we need to set the learning parameters. The maximum error will be 0.02, the learning rate 0.3 and the momentum 0.4. The learning rate is basically the size of the 'steps' the algorithm takes when minimizing the error function in an iterative process. We click 'Train' and see what happens.
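In code, the equivalent of this dialog would be configuring the learning rule before calling learn(); again a sketch under the same Neuroph 2.9-style API assumption, reusing the hypothetical CSV file from the earlier sketch:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

// Sketch: training attempt 1 parameters (max error 0.02, learning rate 0.3, momentum 0.4).
// Assumes a Neuroph 2.9-style API and a hypothetical CSV file name.
public class TrainNetwork {
    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile("habermans_data_normalized.csv", 3, 2, ",");
        MultiLayerPerceptron neuralNet =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 3, 1, 2);

        MomentumBackpropagation learningRule = new MomentumBackpropagation();
        learningRule.setMaxError(0.02);
        learningRule.setLearningRate(0.3);
        learningRule.setMomentum(0.4);
        neuralNet.setLearningRule(learningRule);

        neuralNet.learn(trainingSet); // runs until the total net error falls below the max error
        System.out.println("Trained in " + learningRule.getCurrentIteration() + " iterations");
    }
}
```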
The error graph dropped towards a horizontal asymptote immediately, after only 7 iterations. This means that the neural network quickly learned the training set (80 percent of the data set).
6.1 Step Testing the neural network
After the network is trained, we click 'Test' in order to see the total error and all the individual errors. The results show that the total error is 0.297020. This is quite a large error, and perhaps a consequence of the network learning the training set so quickly.
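Testing can be scripted in a similar way; the sketch below (same Neuroph 2.9-style API assumption, hypothetical file names) runs every row of a set through the network, accumulates the squared errors and counts how many instances were classified correctly, taking the larger of the two outputs as the predicted class. The reported value may differ slightly from the total mean square error Neuroph Studio displays, depending on how Neuroph averages the error internally:

```java
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// Sketch: evaluate a trained network on a data set - accumulate squared errors and
// count instances whose winning output neuron matches the desired one.
// Assumes a Neuroph 2.9-style API and hypothetical file names.
public class TestNetwork {
    public static void main(String[] args) {
        NeuralNetwork neuralNet = NeuralNetwork.createFromFile("NewNeuralNetwork1.nnet");
        DataSet testSet = DataSet.createFromFile("habermans_data_normalized.csv", 3, 2, ",");

        double squaredErrorSum = 0;
        int correct = 0;
        for (DataSetRow row : testSet.getRows()) {
            neuralNet.setInput(row.getInput());
            neuralNet.calculate();
            double[] output = neuralNet.getOutput();
            double[] desired = row.getDesiredOutput();

            for (int i = 0; i < output.length; i++) {
                squaredErrorSum += Math.pow(output[i] - desired[i], 2);
            }
            // correct if the larger output is on the same side as the desired 1
            if ((output[0] > output[1]) == (desired[0] > desired[1])) {
                correct++;
            }
        }
        int n = testSet.getRows().size();
        System.out.println("Mean square error: " + squaredErrorSum / (2 * n));
        System.out.println("Correctly classified: " + correct + " / " + n);
    }
}
```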
The final part of testing this network is testing it with several input values. To do that, we will select 5 random input values from our data set.
The outputs the neural network produced for these inputs are shown in the last two columns.
| observation | age of patient | patient's year of operation | number of positive axillary nodes detected | patient survived 5 years or longer | patient died within 5 years | patient survived - obtained output | patient died - obtained output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0 | 0.636364 | 0 | 0 | 1 | 0.212883 | 0.787116 |
| 2. | 0.264151 | 0.818182 | 0.307692 | 0 | 1 | 0.535963 | 0.464037 |
| 3. | 0.358491 | 0.454545 | 0.057692 | 0 | 1 | 0.290078 | 0.709921 |
| 4. | 0.509434 | 0.272727 | 0 | 0 | 1 | 0.236215 | 0.763785 |
| 5. | 0.811321 | 0.909091 | 0 | 0 | 1 | 0.175651 | 0.824348 |
The network guessed correctly in four of the five instances. After this test, we can conclude that this solution does not need to be rejected - it can be expected to give good results in most cases. However, after the error report Neuroph gave, it is clear that in some cases the network will be wrong.
Training attempt 1.1
Let's train the same network with some different values for the learning rate and momentum. We won't be changing the value of the max error - it remains 0.02.
If we enter 0.3 as value for learning rate and 0.5 for momentum, this is what happens:
We see that in this case the total net error is much bigger than the set value for the max error, which means that the training is not complete; after 13800 iterations the network cannot be tested.
Training attempt 1.2
Let's increase the value of learning rate to 0.4, while the value for momentum remains the same. The result is as follows:
We now see that with the increased value of the learning rate the total net error is even bigger, with an increased number of iterations. After this, we conclude that increasing the value of the learning rate leads to oscillations of the objective error function, and the network reaches a state where no useful training takes place.
Training attempt 4
4.4 Step Creating a neural network
Following these rules, we now decide on a neural network that contains 3 hidden neurons in one hidden layer. Again, we type in the standard number of inputs and outputs, check 'Use Bias Neurons', choose the Sigmoid transfer function, and select 'Backpropagation with Momentum' as the learning rule.
In this case, we chose three hidden neurons.
Graphical representation of neural network
5.4 Step Train the network
The neural network that will be used as our second solution to the problem has been created. Like the previous neural network, we will train this one with the training set we created before, with the entire sample. We select 'NewTrainingSet1', click 'Train' and a new window appears, asking us to fill in the parameters. This time, since there are more neurons in the hidden layer, we can set the maximum error to 0.01. We do not limit the maximum number of iterations. As for the learning parameters, the learning rate will be 0.2 and the momentum 0.7. After we click 'Train', the iteration process starts. The total net error falls very fast and the training stops at iteration 23 with an error of 0.002637448922.
6.4 Step Testing the network
The total mean square error measures the average of the squares of the "errors". The error is the amount by which the value implied by the estimator differs from the quantity to be estimated. A mean square error of zero, meaning that the estimator predicts observations of the parameter with perfect accuracy, is the ideal, but is practically never possible. The unbiased model with the smallest mean square error is generally interpreted as best explaining the variability in the observations. The test showed that the total mean square error is 0.16207228477883195. The goal of experimental design is to construct experiments in such a way that, when the observations are analyzed, the mean square error is close to zero relative to the magnitude of at least one of the estimated treatment effects.
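In symbols (this is the standard definition, not anything specific to Neuroph): for n output values with desired outputs y_i and obtained outputs ŷ_i,
MSE = (1/n) * Σ (ŷ_i - y_i)²
and the individual errors listed in the test results correspond to the differences ŷ_i - y_i for each output.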
After examining all the errors in the test results, we find that some error values are around 0.5 and 0.4, while all the other errors are very low - a large majority are around 0.0259.
The only thing left is to test the network with several input values. To do that, we again use the 5 random input values selected from our data set for the first network. The outputs the neural network produced for these inputs are shown in the last two columns of the table below.
| observation | age of patient | patient's year of operation | number of positive axillary nodes detected | patient survived 5 years or longer | patient died within 5 years | patient survived - obtained output | patient died - obtained output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0 | 0.636364 | 0 | 0 | 1 | 0.10344 | 0.89639 |
| 2. | 0.264151 | 0.818182 | 0.307692 | 0 | 1 | 0.39274 | 0.60676 |
| 3. | 0.358491 | 0.454545 | 0.057692 | 0 | 1 | 0.10799 | 0.89188 |
| 4. | 0.509434 | 0.272727 | 0 | 0 | 1 | 0.11041 | 0.88954 |
| 5. | 0.811321 | 0.909091 | 0 | 0 | 1 | 0.13654 | 0.86339 |
The network guessed correctly in all five instances. After this test, we can conclude that this solution does not need to be rejected. We see that the output values approximate those found in the data set; only for the 80th sample are the values a little different from what they should be.
Training attempt 4.1
Now, using the same neural network with 3 hidden neurons, let's run a few more trainings, setting the learning rate to 0.3 and just changing the momentum.
The total net error is still too big.
Training attempt 4.2
We go back to the NewNeuralNetwork window, click the Randomize button, then click the Train button again and enter 0.5 as a new value for momentum. The result is this:
Training attempt 4.3
Let's decrease momentum once again to see what effect it has on the training. Let new momentum be 0.2:
We see that at the constant learning rate of 0.3 and with gradually decreasing momentum (0.7, 0.5, 0.2) the total net error does not change much (it is always around 0.17) and the network learns slowly, as the large number of iterations shows.
Training attempt 4.4
If we set the momentum back to the initial value of 0.7 and use a new value of 0.5 for the learning rate, the following happens:
At iteration 165 the total net error is smaller than the set max error (0.01), the training is complete and we can test the network.
6.4.4 Step Testing the network
After the network is trained, we click 'Test', in order to see the total error, and all the individual errors.
The test showed that the total mean square error is 0.18931258619161326. Looking at the individual errors, we can observe that most of them are at a low level - around 0.0289. However, there are still some cases where those errors are considerably larger (around 0.3622), which means we should try some other neural network.
Values of inputs, outputs and individual errors for 5 randomly selected observations are shown in the table below:
| observation | age of patient (input) | patient's year of operation (input) | number of positive axillary nodes detected (input) | patient survived 5 years or longer (output) | patient died within 5 years (output) | patient survived (individual error) | patient died (individual error) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.6981 | 0.2727 | 0 | 0.0295 | 0.9763 | 0.0295 | -0.0237 |
| 2. | 0.7358 | 0.1818 | 0 | 0.3622 | 0.6323 | 0.3622 | -0.3677 |
| 3. | 0.7547 | 0.8182 | 0 | 0.0289 | 0.9769 | 0.0289 | -0.0231 |
| 4. | 0.8113 | 0.9091 | 0 | 0.0289 | 0.9769 | 0.0289 | -0.0231 |
| 5. | 0.8679 | 0.8182 | 0 | 0.0363 | 0.9703 | 0.0363 | -0.0297 |
The network guessed correctly in all five instances. After this test, we can conclude that this solution does not need to be rejected.
Training attempt 6
4.6 Creating new neural network
The next neural network will have the same number of input and output neurons, but a different number of neurons in the hidden layer - we will use 4 hidden neurons. The network is named NewNeuralNetwork3.
The neural network looks like this:
5.6 Step Training the network
We will train the network the same way, with learning rate value 0.5 and momentum 0.7, and max error 0.01.
The error function first moves almost horizontally for most of the path, then suddenly begins to oscillate, and the training stops at iteration 117.
6.6 Step Testing the network
We clicked 'Test' after the training, to see whether more neurons contribute to better training. As we can see, they do not reduce the error - it is approximately the same as in the previous networks.
Training attempt 6.1
We'll do a few more trainings to see how changing the momentum affects the training results. First, at the learning rate of 0.5 and momentum of 0.6 we have this:
Training attempt 6.2
We decrease momentum to 0.4 now:
Our conclusion is that decreasing the momentum leads to a growing number of iterations, which means that the network learns more slowly at lower momentum values.
Training attempt 10
5.10 Step Training the network
In our project HabermanSurvival, we create a new network NewNeuralNetwork4.
This neural network will contain 6 neurons in the hidden layer, as we see in the picture below, and the same options as the previous networks, i.e. 3 inputs and 2 outputs.
For the learning parameters we set a maximum error of 0.01, a learning rate of 0.2 and a momentum of 0.7.
Then we click "Train" and wait.
The total net error slowly descends, with high oscillation, and stops when it reaches a level lower than the given maximum error (0.01), at iteration 69.
6.10 Step Testing the Neural Network
As explained earlier, the total mean square error measures the average of the squares of the individual errors, and the model with the smallest mean square error is generally interpreted as best explaining the variability in the observations. The test showed that the total mean square error is 0.16044272735952059.
Now we need to examine all the individual errors for every single instance and check if there are any extreme values. When you have a large data set, individual testing requires a lot of time. Instead of testing all 306 observations, we will randomly choose 5 observations to subject to individual testing.
In the introduction we mentioned that the result can belong to one of two groups: if the patient survived 5 years or longer the desired output is 0, 1 and if the patient died within 5 years the desired output is 1, 0. After testing, it would be ideal if the obtained output values were the same as the desired output values. As with other statistical methods, classification using neural networks involves errors that arise during the approximation.
Values of inputs, outputs and individual errors for 5 randomly selected observations are shown in the table below:
| observation | age of patient (input) | patient's year of operation (input) | number of positive axillary nodes detected (input) | patient survived 5 years or longer (output) | patient died within 5 years (output) | patient survived (individual error) | patient died (individual error) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.6981 | 0.7273 | 0 | 0.0058 | 0.9942 | 0.0058 | -0.0058 |
| 2. | 0.7547 | 0 | 0 | 0.4528 | 0.5472 | 0.4528 | -0.4528 |
| 3. | 0.8113 | 0.3636 | 0 | 0.4528 | 0.5472 | 0.4528 | -0.4528 |
| 4. | 0.717 | 0.8182 | 0 | 0.0339 | 0.9661 | 0.0339 | -0.0339 |
| 5. | 0.8679 | 0.8182 | 0 | 0.0018 | 0.9982 | 0.0018 | -0.0018 |
The network guessed all of them right. We can conclude that this network has a good ability to generalize, and the training of this network has been validated.
Training attempt 10.1
Now we will try some variations. Let's set learning rate to 0.4, while momentum remains the same: 0.7. These are the results:
As we can see, the training is not complete because the error is too big. The number of iterations is also huge.
Training attempt 10.2
We can try to increase learning rate to, let's say, 0.6, which could lead to faster learning. This is the result:
As we can see, the number of iterations dropped.
In the following table we show the results of all the trainings done using this neural network. First, we change the values of the learning rate from 0.0 to 1.0 while keeping the momentum fixed (0.7). We keep this value for the momentum fixed because it has previously given the best result. Then we keep the value of the learning rate fixed - the one that has given the best result, which is 0.2 - and change the values of the momentum.
| Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Maximum error | Learning rate | Momentum | Total net error | Number of iterations | Total mean square error | 5 random inputs test - number of correct guesses | Network trained |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 6 | 1 | Full | 0.01 | 0.2 | 0.7 | 0.00865 | 69 | 0.16044 | 5/5 | yes |
| 10.1 | 6 | 1 | Full | 0.01 | 0.4 | 0.7 | 0.17375 | 61143 | - | - | no |
| 10.2 | 6 | 1 | Full | 0.01 | 0.6 | 0.7 | 0.17693 | 15696 | - | - | no |
| 10.3 | 6 | 1 | Full | 0.01 | 0.0 | 0.7 | 0.09439 | 671428 | - | - | no |
| 10.4 | 6 | 1 | Full | 0.01 | 0.1 | 0.7 | 0.15933 | 191993 | - | - | no |
| 10.5 | 6 | 1 | Full | 0.01 | 0.3 | 0.7 | 0.17693 | 16596 | - | - | no |
| 10.6 | 6 | 1 | Full | 0.01 | 0.5 | 0.7 | 0.18518 | 4125 | - | - | no |
| 10.7 | 6 | 1 | Full | 0.01 | 0.7 | 0.7 | 0.19316 | 6568 | - | - | no |
| 10.8 | 6 | 1 | Full | 0.01 | 0.8 | 0.7 | 0.19686 | 6960 | - | - | no |
| 10.9 | 6 | 1 | Full | 0.01 | 0.9 | 0.7 | 0.19431 | 188454 | - | - | no |
| 10.10 | 6 | 1 | Full | 0.01 | 1.0 | 0.7 | 0.19693 | 48296 | - | - | no |
| 10.11 | 6 | 1 | Full | 0.01 | 0.2 | 0.0 | 0.06984 | 320496 | - | - | no |
| 10.12 | 6 | 1 | Full | 0.01 | 0.2 | 0.1 | 0.06144 | 163163 | - | - | no |
| 10.13 | 6 | 1 | Full | 0.01 | 0.2 | 0.2 | 0.00711 | 79 | 0.16942 | 5/5 | yes |
| 10.14 | 6 | 1 | Full | 0.01 | 0.2 | 0.3 | 0.13897 | 121350 | - | - | no |
| 10.15 | 6 | 1 | Full | 0.01 | 0.2 | 0.4 | 0.16768 | 147553 | - | - | no |
| 10.16 | 6 | 1 | Full | 0.01 | 0.2 | 0.5 | 0.15256 | 32627 | - | - | no |
| 10.17 | 6 | 1 | Full | 0.01 | 0.2 | 0.6 | 0.16898 | 28604 | - | - | no |
| 10.18 | 6 | 1 | Full | 0.01 | 0.2 | 0.8 | 0.17167 | 16112 | - | - | no |
| 10.19 | 6 | 1 | Full | 0.01 | 0.2 | 0.9 | 0.00846 | 57 | 0.22436 | 5/5 | yes |
| 10.20 | 6 | 1 | Full | 0.01 | 0.2 | 1.0 | 0.23856 | 755 | - | - | no |
We see from the table that training attempts 10.13 and 10.19 were successful (training attempt 10 was presented before). We tested them both on 5 random inputs each, and the network guessed all of them right in both cases. Here are the graphic presentations:
Training attempt 10.13:
Five different inputs for training attempt 10.13:
| observation | age of patient | patient's year of operation | number of positive axillary nodes detected | patient survived 5 years or longer | patient died within 5 years | patient survived - obtained output | patient died - obtained output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.47169 | 0 | 0.01923 | 0 | 1 | 0.11496 | 0.89144 |
| 2. | 0.49057 | 0.72727 | 0.03846 | 0 | 1 | 0.11496 | 0.89145 |
| 3. | 0.50943 | 0.54545 | 0.17308 | 0 | 1 | 0.40814 | 0.58609 |
| 4. | 0.71698 | 0.90909 | 0 | 0 | 1 | 0.11496 | 0.89144 |
| 5. | 0.84901 | 0.36364 | 0.19231 | 0 | 1 | 0.40814 | 0.58609 |
Graphics for training attempt 10.19:
Five random inputs for training attempt 10.19:
| observation | age of patient | patient's year of operation | number of positive axillary nodes detected | patient survived 5 years or longer | patient died within 5 years | patient survived - obtained output | patient died - obtained output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.37736 | 0.54545 | 0 | 0 | 1 | 0.00445 | 0.99552 |
| 2. | 0.41509 | 1 | 0 | 0 | 1 | 0.00445 | 0.99552 |
| 3. | 0.50943 | 0.54545 | 0.17308 | 0 | 1 | 0.13442 | 0.89227 |
| 4. | 0.67924 | 0 | 0 | 0 | 1 | 0.00445 | 0.99552 |
| 5. | 0.88679 | 0.63636 | 0.05769 | 0 | 1 | 0.13442 | 0.89227 |
Training attempt 11
4.11 Step Creating a neural network
We now decide for a neural network that contains 8 hidden neurons in one hidden layer. Again, we type in the standard number of inputs and outputs, check 'Use Bias Neurons', choose a Sigmoid Transfer function, and select 'Backpropagation with Momentum' as the Learning rule.
In this case, we chose eight hidden neurons.
Graphical representation of neural network
5.11 Step Train the network
The neural network has been created. We will train this one with a training set we created before, namely with 80% of the sample. We select 'HabermanSurvival80', click 'Train' and a new window appears, asking us to fill in the parameters. This time, since there are more neurons in the hidden layer, we can set the maximum error to 0.01. We do not limit the maximum number of iterations. As for the learning parameters, the learning rate will be 0.2 and the momentum 0.3. After we click 'Train', the iteration process starts. The total net error does not reach the maximum error; the training stops at iteration 47696 with an error of 0.2845048425.
Training attempt 11.1
Since the learning rate determines how fast a neural network learns, the smaller its value, the more time it takes the network to learn. So now we can increase the learning rate to 0.6 and keep the momentum as it is, that is 0.3:
As we have previously stated, a higher learning rate speeds up the process of learning, which can be seen here - the number of iterations dropped from 47696 to only 316. Still, the total net error is very high, which means the training is not complete.
Training attempt 11.2
If we now keep the same value of 0.6 for the learning rate and increase the momentum to 0.5, we see that the network is completely trained after 20 iterations and the total net error is smaller than the max error. We can test it.
6.11 Step Testing the network
Values of inputs, outputs and individual errors for 5 randomly selected observations are shown in the table below:
| observation | age of patient | patient's year of operation | number of positive axillary nodes detected | patient survived 5 years or longer | patient died within 5 years | patient survived - obtained output | patient died - obtained output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.37736 | 0.54545 | 0 | 0 | 1 | 0.23923 | 0.76102 |
| 2. | 0.41509 | 1 | 0 | 0 | 1 | 0.37159 | 0.62857 |
| 3. | 0.50943 | 0.54545 | 0.17308 | 0 | 1 | 0.08768 | 0.91255 |
| 4. | 0.67924 | 0 | 0 | 0 | 1 | 0.99148 | 0.00855 |
| 5. | 0.88679 | 0.63636 | 0.05769 | 0 | 1 | 0.08757 | 0.91265 |
We see from the table that the network guessed 4 out of 5 instances correctly.
Below is a table that summarizes this experiment. The two best solutions for the problem are shown in bold.
| Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Maximum error | Learning rate | Momentum | Total mean square error | Number of iterations | 5 random inputs test - number of correct guesses | Network trained |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | 1 | 80% of full data set | 0.02 | 0.3 | 0.4 | 0.29702 | 7 | 4/5 | yes |
| 1.2 | 2 | 1 | 80% of full data set | 0.02 | 0.3 | 0.5 | - | 13800 | - | no |
| 1.3 | 2 | 1 | 80% of full data set | 0.02 | 0.4 | 0.5 | - | 568412 | - | no |
| 2 | 2 | 1 | full | 0.01 | 0.2 | 0.7 | - | 51 | - | no |
| 3 | 2 | 1 | full | 0.01 | 0.3 | 0.7 | - | 2876 | - | no |
| **4** | 3 | 1 | full | 0.01 | 0.2 | 0.7 | 0.16207 | 23 | 5/5 | yes |
| 4.1 | 3 | 1 | full | 0.01 | 0.3 | 0.6 | - | 26138 | - | no |
| 4.2 | 3 | 1 | full | 0.01 | 0.3 | 0.5 | - | 37758 | - | no |
| 4.3 | 3 | 1 | full | 0.01 | 0.3 | 0.2 | - | 19988 | - | no |
| 4.4 | 3 | 1 | full | 0.01 | 0.5 | 0.7 | 0.18931 | 165 | - | yes |
| 5 | 3 | 1 | full | 0.01 | 0.3 | 0.7 | - | 7109 | - | no |
| 6 | 4 | 1 | full | 0.01 | 0.5 | 0.7 | 0.18417 | 117 | - | yes |
| 6.1 | 4 | 1 | full | 0.01 | 0.5 | 0.6 | - | 121094 | - | no |
| 6.2 | 4 | 1 | full | 0.01 | 0.5 | 0.4 | - | 623369 | - | no |
| 6.3 | 4 | 1 | full | 0.01 | 0.4 | 0.6 | - | 10456 | - | no |
| 7 | 4 | 1 | full | 0.01 | 0.2 | 0.7 | - | 42300 | - | no |
| 8 | 4 | 1 | full | 0.01 | 0.3 | 0.7 | - | 20009 | - | no |
| 9 | 4 | 1 | full | 0.01 | 0.7 | 0.7 | - | 9535 | - | no |
| **10** | 6 | 1 | full | 0.01 | 0.2 | 0.7 | 0.16044 | 69 | 5/5 | yes |
| 10.1 | 6 | 1 | full | 0.01 | 0.4 | 0.7 | - | 61143 | - | no |
| 10.2 | 6 | 1 | full | 0.01 | 0.6 | 0.7 | - | 25551 | - | no |
| 10.3 | 6 | 1 | full | 0.01 | 0.0 | 0.7 | - | 671428 | - | no |
| 10.4 | 6 | 1 | full | 0.01 | 0.1 | 0.7 | - | 191993 | - | no |
| 10.5 | 6 | 1 | full | 0.01 | 0.3 | 0.7 | - | 16596 | - | no |
| 10.6 | 6 | 1 | full | 0.01 | 0.5 | 0.7 | - | 4125 | - | no |
| 10.7 | 6 | 1 | full | 0.01 | 0.7 | 0.7 | - | 6568 | - | no |
| 10.8 | 6 | 1 | full | 0.01 | 0.8 | 0.7 | - | 6960 | - | no |
| 10.9 | 6 | 1 | full | 0.01 | 0.9 | 0.7 | - | 188454 | - | no |
| 10.10 | 6 | 1 | full | 0.01 | 1.0 | 0.7 | - | 48296 | - | no |
| 10.11 | 6 | 1 | full | 0.01 | 0.2 | 0.0 | - | 320496 | - | no |
| 10.12 | 6 | 1 | full | 0.01 | 0.2 | 0.1 | - | 163163 | - | no |
| 10.13 | 6 | 1 | full | 0.01 | 0.2 | 0.2 | 0.16942 | 79 | 5/5 | yes |
| 10.14 | 6 | 1 | full | 0.01 | 0.2 | 0.3 | - | 121350 | - | no |
| 10.15 | 6 | 1 | full | 0.01 | 0.2 | 0.4 | - | 147553 | - | no |
| 10.16 | 6 | 1 | full | 0.01 | 0.2 | 0.5 | - | 32627 | - | no |
| 10.17 | 6 | 1 | full | 0.01 | 0.2 | 0.6 | - | 28604 | - | no |
| 10.18 | 6 | 1 | full | 0.01 | 0.2 | 0.8 | - | 16112 | - | no |
| 10.19 | 6 | 1 | full | 0.01 | 0.2 | 0.9 | 0.22436 | 57 | 5/5 | yes |
| 10.20 | 6 | 1 | full | 0.01 | 0.2 | 1.0 | - | 755 | - | no |
| 11 | 8 | 1 | 80% of full data set | 0.01 | 0.2 | 0.3 | - | 47696 | - | no |
| 11.1 | 8 | 1 | 80% of full data set | 0.01 | 0.6 | 0.3 | - | 316 | - | no |
| 11.2 | 8 | 1 | 80% of full data set | 0.01 | 0.6 | 0.5 | 0.18884 | 20 | 4/5 | yes |
| 12 | 8 | 1 | 90% of full data set | 0.01 | 0.5 | 0.2 | - | 153 | - | no |
| 13 | 10 | 1 | full | 0.01 | 0.2 | 0.7 | - | 538 | - | no |
We also tried to train networks with more than 10 hidden neurons, but the results were bad - those networks would not train.
Advanced Training Techniques
We want to check the network's performance when the training is complete. A learning neural network is expected to extract rules from a finite set of examples. It is often the case that the neural network memorizes the training data well, but fails to generate correct output for some of the new test data. Therefore, it is desirable to come up with some form of regularization.
One form of regularization is to split the training set into a new training set and a validation set. After each step through the new training set, the neural network is evaluated on the validation set, and the network with the best performance on the validation set is then used for actual testing. The new training set would consist of, for example, 80% - 90% of the original training set, and the remaining 10% - 20% would form the validation set. The validation error rate is computed periodically during training, and training is stopped when the validation error rate starts to go up. However, the validation error is not a good estimate of the generalization error if the initial set consists of a relatively small number of instances. Our initial set consists of only 306 instances, so the validation set would contain only a few dozen instances, which is an insufficient number to perform validation. In this case, instead of validation, we will estimate the generalization error directly.
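A minimal sketch of such a split in code (Neuroph 2.9-style API assumed; the split is done manually by shuffling the rows, and the 80/20 percentages are only an example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// Sketch: split the original set into a new training set (80%) and a validation set (20%)
// by shuffling the rows. Assumes a Neuroph 2.9-style API and a hypothetical file name.
public class SplitDataSet {
    public static void main(String[] args) {
        DataSet fullSet = DataSet.createFromFile("habermans_data_normalized.csv", 3, 2, ",");

        List<DataSetRow> rows = new ArrayList<>(fullSet.getRows());
        Collections.shuffle(rows); // random selection of instances
        int trainCount = (int) (rows.size() * 0.8);

        DataSet trainingSet = new DataSet(3, 2);
        DataSet validationSet = new DataSet(3, 2);
        for (int i = 0; i < rows.size(); i++) {
            (i < trainCount ? trainingSet : validationSet).addRow(rows.get(i));
        }

        System.out.println("Training rows: " + trainingSet.getRows().size()
                + ", validation rows: " + validationSet.getRows().size());
    }
}
```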
One way to get an appropriate estimate of the generalization error is to run the neural network on a test set of data that is not used at all during the training process. The generalization error is usually defined as the expected value of the square of the difference between the learned function and the exact target.
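In symbols, if f is the exact target function and f̂ is the function learned by the network, the generalization error can be written as
E_gen = E[ (f̂(x) - f(x))² ]
i.e. the expected squared difference over new, unseen inputs x (this is the standard definition restated from the sentence above).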
In the following examples we will check the generalization error: from example to example we will increase the number of instances in the training set used for training, and decrease the number of instances in the sets used for testing.
Training attempt 14
3.14 Step Create a Training Set
We will randomly choose 90% of the instances of the data set for training and the remaining 10% for testing. The first set will be called HebermanSurvival90, and the second HebermanSurvival10.
5.14 Step Train the network
Unlike the previous trainings, now there is no need to create a new neural network. The advanced training techniques consist of examining the performance of existing architectures using new training and test sets of data. We found satisfactory results using the architecture NewNeuralNetwork4, so until the end of this article we will use this architecture, along with the training parameters that previously brought us the desired results. But before opening the existing architecture, we create the new training sets: the first one is named HebermanSurvival90 and the second HebermanSurvival10.
Now we open the neural network NewNeuralNetwork4, select the training set HebermanSurvival90 and in the network window press the 'Train' button. The parameters that we now need to set are the same as in the previous training attempt: the maximum error will be 0.01, the learning rate 0.2, and the momentum 0.7. We will not limit the maximum number of iterations, and we will check 'Display error graph', as we want to see how the error changes throughout the iteration sequence. Then we press the 'Train' button again and see what happens.
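The same retraining can be expressed in code; a sketch assuming the Neuroph 2.9-style API, with the network and the 90% subset exported to hypothetical files:

```java
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.learning.MomentumBackpropagation;

// Sketch: reuse the saved NewNeuralNetwork4 (6 hidden neurons) and train it on the
// 90% subset with the same parameters as before (max error 0.01, learning rate 0.2,
// momentum 0.7). Assumes a Neuroph 2.9-style API; the file names are hypothetical.
public class RetrainNetwork {
    public static void main(String[] args) {
        NeuralNetwork neuralNet = NeuralNetwork.createFromFile("NewNeuralNetwork4.nnet");
        DataSet trainingSet90 = DataSet.createFromFile("HebermanSurvival90.csv", 3, 2, ",");

        MomentumBackpropagation learningRule = new MomentumBackpropagation();
        learningRule.setMaxError(0.01);
        learningRule.setLearningRate(0.2);
        learningRule.setMomentum(0.7);
        neuralNet.setLearningRule(learningRule);

        neuralNet.learn(trainingSet90);
    }
}
```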
The error function does not fluctuate much - it moves almost in a straight horizontal line - and the training stops at iteration 18604 without finding an optimal solution.
We train on 90, 80 and 70 percent of the data set and test on the remaining 10, 20 and 30 percent, randomly selected.
We obtain the following table of the results:
| Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Test set | Maximum error | Learning rate | Momentum | Iterations | Total net error (during training) | Total mean square error (during testing) | Network trained |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 14 | 6 | 1 | 90% | 10% | 0.01 | 0.2 | 0.7 | 18604 | 0.054564 | 0.161705 | no |
| 15 | 6 | 1 | 80% | 20% | 0.01 | 0.2 | 0.7 | 2380 | 0.05456 | 0.16170 | no |
| 16 | 6 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 668 | 0.15566 | 0.28574 | no |
After the 17th training attempt we concluded that there are some cases that make a big impact on the total mean square error, for example errors around 0.8, 0.5 and 0.4.
In the 18th and 19th attempts we found the biggest errors and the instances that were classified with a big error (out of 24). By a big error we mean that the network classified an instance completely wrong (for example, the output is 1, 0 but it should be 0, 1), and such errors make a huge impact on the total mean square error.
Because all of these trainings failed to bring the error below 0.01, we can say that this network failed to generalize this problem.
Conclusion
The different solutions tested in this experiment have shown that the choice of the number of hidden neurons is crucial to the effectiveness of a neural network. Also, the experiment showed that the success of a neural network is very sensitive to the parameters chosen in the training process: the learning rate must not be too high, and the maximum error must not be too low. The results have also shown that the total mean square error does not directly reflect the success of a network. In the end, after including only 10% of instances in the training set, we learned that even that number can be sufficient to make a good training set and a reasonably trained neural network.
DOWNLOAD
Data set used in this tutorial
The prepared data set
Neuroph projects
The samples used for advanced techniques
See also:
Multi Layer Perceptron Tutorial