CLASSIFICATION OF ANIMAL SPECIES USING NEURAL NETWORK - PART 2
An example of a multivariate data type classification problem using Neuroph
by Boris Ruzic, Faculty of Organizational Sciences, University of Belgrade
an experiment for Intelligent Systems course
Introduction
This work represents a continuation of the experiment CLASSIFICATION OF ANIMAL SPECIES USING NEURAL NETWORK.
Classification is one of the most frequently encountered decision-making tasks of human activity. A classification problem occurs when an object needs to be assigned to a predefined group or class based on a number of observed attributes of that object. Similarly, the aim of cluster analysis is to group objects into clusters in such a way that two objects from the same cluster are more similar to each other than to objects from other clusters. Neural networks are a machine learning technology suitable for ill-defined problems such as recognition, prediction, classification, and control. The advantages of neural networks lie in the following aspects. First, they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model, because they are data-driven, self-adaptive methods. Second, neural networks are nonlinear models, which makes them flexible in modeling complex real-world relationships. Finally, neural networks can approximate any function with arbitrary accuracy.
Introduction to the problem
The purpose of this experiment is to present the results from the previous work in graphical form, and to study the feasibility of classifying animal species using neural networks. An animal class is made up of animals that are all alike in important ways, so we need to train a neural network to predict to which group a particular species belongs. Once we have decided on a problem to solve using neural networks, we need to gather data for training purposes. The training data set includes a number of cases, each containing values for a range of input and output variables. The data set that we use in this experiment can be found at http://archive.ics.uci.edu/ml/datasets.html under the category classification. There are many data sets in this category, but for the purposes of this experiment we will use the data set named Zoo. This data set was provided by Richard Forsyth (date donated: 1990-05-15).
This database includes 101 cases, each of which is an animal identified by its name. Each of these animals belongs to one of seven classes.
Each variable is of Boolean type, except the animal name, which is a nominal variable, and legs, which is a numeric variable (set of values: {0, 2, 4, 6, 8}).
Some training attempts will show the relations between certain variables, and at the end of the experiment the relations between three different parameters - learning rate, momentum and number of iterations - will be presented.
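As a minimal sketch of how such a data set can be prepared for training, the snippet below loads a comma-separated file with 16 input columns (the animal name dropped, the legs attribute linearly scaled) and 7 output columns, one per class. This assumes the DataSet API of more recent Neuroph versions; the file name zoo_normalized.txt and the comma delimiter are assumptions about how the prepared file was saved.

import org.neuroph.core.data.DataSet;

public class LoadZooDataSet {

    public static void main(String[] args) {
        // 16 input attributes and 7 output columns, one for each animal class.
        // File name and delimiter are assumptions about the prepared training file.
        DataSet trainingSet = DataSet.createFromFile("zoo_normalized.txt", 16, 7, ",");

        System.out.println("Loaded " + trainingSet.size() + " training rows.");
    }
}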
Training attempt 2
Table 1. Training results for the first architecture
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error
1. | 2 | 0.2 | 0.7 | 0.01 | 19540 | 0.0201
2. | 2 | 0.3 | 0.7 | 0.01 | 19798 | 0.1977
3. | 2 | 0.5 | 0.4 | 0.01 | 25630 | 0.1289
4. | 2 | 0.7 | 0.7 | 0.01 | 20342 | 0.1995
5. | 2 | 0.9 | 0.8 | 0.01 | 20907 | 0.3007
The data in Table 1 show that, regardless of the training parameters, the error does not fall below the specified level, even when the network is trained for a different number of iterations.
This is most likely due to the small number of hidden neurons. In the following solution we will increase the number of hidden neurons.
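For reference, a training attempt like those in Table 1 could be set up in Neuroph roughly as follows. This is a sketch, not the exact code used in the experiment; the file name and the iteration cap are assumptions, and the parameter values are taken from training attempt 1.

import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class TrainingAttemptOne {

    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile("zoo_normalized.txt", 16, 7, ",");

        // Architecture from Table 1: 16 inputs, 2 hidden neurons, 7 outputs, sigmoid units.
        MultiLayerPerceptron network =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 16, 2, 7);

        // Parameters of training attempt 1: learning rate 0.2, momentum 0.7, max error 0.01.
        MomentumBackpropagation rule = new MomentumBackpropagation();
        rule.setLearningRate(0.2);
        rule.setMomentum(0.7);
        rule.setMaxError(0.01);
        rule.setMaxIterations(30000);   // safety cap (assumption), since this run never converged
        network.setLearningRule(rule);

        network.learn(trainingSet);

        System.out.println(rule.getCurrentIteration() + " iterations, total net error "
                + rule.getTotalNetworkError());
    }
}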
Here you can see the relation between learning rate and number of iterations.
As the graph shows, the smallest number of iterations occurs when the learning rate is also the smallest (0.2), while the largest number of iterations occurs when the learning rate has the middle value (0.5). The second graph shows that the total net error tends to grow as the learning rate increases.
Below you can see the relation between momentum and number of iterations.
The number of iterations is largest when the momentum is smallest.
Recommendation: If you do not get the desired results, continue to gradually adjust the training parameters. The neural network should learn the new samples without forgetting the samples it has already learnt.
Conclusion
During this experiment, we created six different architectures, one basic training set, and six training sets derived from it. We normalized the original data set using a linear scaling method. Through graphs we have shown the relations between the major parameters. We concluded that one layer of hidden neurons is enough in this case. The experiment also showed that the success of a neural network is very sensitive to the parameters chosen in the training process. If the architecture uses too few hidden neurons, training runs excessively long and the network fails to reach the specified error no matter what values of training parameters are used. Through the various tests we demonstrated the sensitivity of neural networks to high and low values of the learning parameters. We have shown that the best solution to the problem of classifying animal species into seven groups is the architecture with one hidden layer and six hidden neurons. Finally, the table below shows the overall results of this experiment; the best solution is highlighted in green.
Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Maximum error | Learning rate | Momentum | Total mean square error | Number of iterations | Number of correct guesses | Network trained
1 | 2 | 1 | full | 0.01 | 0.2 | 0.7 | - | 19540 | - | no
2 | 2 | 1 | full | 0.01 | 0.3 | 0.7 | - | 19798 | - | no
3 | 2 | 1 | full | 0.01 | 0.5 | 0.4 | - | 25630 | - | no
4 | 2 | 1 | full | 0.01 | 0.7 | 0.7 | - | 20342 | - | no
5 | 2 | 1 | full | 0.01 | 0.9 | 0.8 | - | 20907 | - | no
6 | 4 | 1 | full | 0.01 | 0.001 | 0.05 | - | 2000 | - | no
7 | 4 | 1 | full | 0.01 | 0.9 | 0.9 | - | 2000 | - | no
8 | 4 | 1 | full | 0.01 | 0.5 | 0.5 | - | 2000 | - | no
9 | 6 | 1 | full | 0.01 | 0.6 | 0.4 | 0.00267 | 71 | 3/5 | yes
10 | 6 | 1 | full | 0.01 | 0.7 | 0.4 | 0.002557 | 1 | 5/5 | yes
11 | 6 | 1 | only 70% of instances used | 0.01 | 0.7 | 0.4 | 0.01526 | 53 | 16/31 | yes
12 | 6 | 1 | only 85% of instances used | 0.01 | 0.7 | 0.4 | 0.02003 | 250 | 15/16 | yes
13 | 6 | 1 | only 90% of instances used | 0.01 | 0.7 | 0.4 | 0.01005 | 119 | 11/11 | yes
14 | 10 | 1 | full | 0.01 | 0.7 | 0.4 | 0.00256 | 62 | 5/5 | yes
15 | 10 | 1 | only 70% of instances used | 0.01 | 0.7 | 0.4 | 0.00251 | 80 | 5/7 | yes
16 | 10 | 1 | only 85% of instances used | 0.01 | 0.7 | 0.4 | 0.00203 | 2000 | 4/6 | yes
17 | 10 | 1 | only 90% of instances used | 0.01 | 0.7 | 0.4 | 0.00191 | 2000 | 11/11 | yes
18 | 18 | 1 | full | 0.01 | 0.7 | 0.4 | 0.00252 | 60 | 4/6 | yes
19 | 18 | 1 | only 70% of instances used | 0.01 | 0.7 | 0.4 | 0.01364 | 37 | 11/12 | yes
20 | 18 | 1 | only 85% of instances used | 0.01 | 0.7 | 0.4 | 0.00993 | 2000 | 12/15 | yes
21 | 18 | 1 | only 90% of instances used | 0.01 | 0.7 | 0.4 | 0.00205 | 2000 | 11/11 | yes
22 | 30 | 1 | full | 0.01 | 0.7 | 0.4 | 0.00252 | 2000 | 6/8 | yes
23 | 30 | 1 | only 70% of instances used | 0.01 | 0.7 | 0.4 | 0.00269 | 2000 | 7/11 | yes
24 | 30 | 1 | only 85% of instances used | 0.01 | 0.7 | 0.4 | 0.00896 | 2000 | 16/18 | yes
25 | 30 | 1 | only 90% of instances used | 0.01 | 0.7 | 0.4 | 0.00401 | 2000 | 11/11 | yes
Dynamic Backpropagation
These are the results of the Dynamic Backpropagation algorithm applied to the best example from our experiment.
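A sketch of how the Dynamic Backpropagation rule might be attached to the best network (one hidden layer, six hidden neurons) is shown below. It assumes a DynamicBackPropagation class in the org.neuroph.nnet.learning package of the Neuroph version used; the file name and the starting parameter values are also assumptions.

import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.DynamicBackPropagation;
import org.neuroph.util.TransferFunctionType;

public class DynamicBackpropagationExample {

    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile("zoo_normalized.txt", 16, 7, ",");

        // Best architecture from the experiment: one hidden layer with six neurons.
        MultiLayerPerceptron network =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 16, 6, 7);

        // Dynamic Backpropagation adjusts the learning rate and momentum during training;
        // the starting values below mirror the best static training attempt (assumption).
        DynamicBackPropagation rule = new DynamicBackPropagation();
        rule.setLearningRate(0.7);
        rule.setMomentum(0.4);
        rule.setMaxError(0.01);
        network.setLearningRule(rule);

        network.learn(trainingSet);

        System.out.println("Total net error: " + rule.getTotalNetworkError());
    }
}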
Training Results:
For this training, we used the Sigmoid transfer function.
The Total Net Error graph looks like this:
Practical Testing:
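For practical testing, a trained network can be fed a single attribute vector and its outputs read back with the standard Neuroph calls setInput, calculate and getOutput. The 16 input values in the sketch below are a hypothetical, already-normalized row, not taken from the Zoo data set.

import java.util.Arrays;
import org.neuroph.core.NeuralNetwork;

public class PracticalTest {

    // Feeds one already-normalized attribute vector through a trained network
    // and prints the seven class outputs. The input values are hypothetical.
    public static void testSingleAnimal(NeuralNetwork<?> network) {
        double[] input = {1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0.5, 0, 0, 1};

        network.setInput(input);
        network.calculate();
        double[] output = network.getOutput();

        System.out.println("Class outputs: " + Arrays.toString(output));
    }
}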
Impact of Learning rate on Number of iterations
The following graphs show the relation between the learning rate and the number of iterations, and the relation between the momentum and the number of iterations.
This graph shows the order in which the learning rate values were chosen and how they affected the number of iterations. We can conclude that, with a well-chosen learning rate, fewer iterations are needed to train the network well. The learning rate is a value between zero and one. Choosing a value very close to zero requires a large number of training cycles, which makes the training process extremely slow. On the other hand, if the learning rate is very large, the weights diverge, the objective error function oscillates heavily, and the network reaches a state where no useful training takes place.
Impact of Momentum on Number of iterations
On this graph we can see that the number of iterations is highest when the momentum is 0.4 and lowest when the momentum is 0.9. Once we found the right value for the momentum, the number of iterations was reduced more and more. The momentum parameter is used to prevent the system from converging to a local minimum or saddle point. A high momentum parameter can also help to increase the speed of convergence of the system. However, setting the momentum parameter too high creates a risk of overshooting the minimum, which can cause the system to become unstable. A momentum coefficient that is too low cannot reliably avoid local minima and can also slow down the training of the system.
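The learning rate and momentum graphs in this section could be reproduced with a small parameter sweep of the following kind. It is a sketch under the same assumptions as above (file name, 2000-iteration cap); the same loop can be repeated over momentum values instead of learning rates.

import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class LearningRateSweep {

    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile("zoo_normalized.txt", 16, 7, ",");
        double[] learningRates = {0.2, 0.3, 0.5, 0.7, 0.9};

        for (double rate : learningRates) {
            // Fresh network for every run so the iteration counts are comparable.
            MultiLayerPerceptron network =
                    new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 16, 6, 7);

            MomentumBackpropagation rule = new MomentumBackpropagation();
            rule.setLearningRate(rate);
            rule.setMomentum(0.4);
            rule.setMaxError(0.01);
            rule.setMaxIterations(2000);   // cap used in the later training attempts
            network.setLearningRule(rule);

            network.learn(trainingSet);

            System.out.println("learning rate " + rate + " -> "
                    + rule.getCurrentIteration() + " iterations, total net error "
                    + rule.getTotalNetworkError());
        }
    }
}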
Impact of Hidden neurons on Number of iterations
Below is a graph that shows the relation between the number of hidden neurons and the number of iterations.
On this graph we can see that by increasing the number of hidden neurons the network can be successfully trained with a smaller number of iterations.
Impact of Number of hidden neurons on Total Net Error
The next graph shows the relation between the number of hidden neurons and the total net error.
The graph shows the impact of the number of hidden neurons on the total net error: the more we increase the number of hidden neurons, the more the total net error decreases.
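The two hidden-neuron graphs could be produced the same way, sweeping the size of the hidden layer instead of a learning parameter; as before, the file name and the iteration cap are assumptions.

import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class HiddenNeuronSweep {

    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile("zoo_normalized.txt", 16, 7, ",");
        int[] hiddenNeurons = {2, 4, 6, 10, 18, 30};

        for (int hidden : hiddenNeurons) {
            MultiLayerPerceptron network =
                    new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 16, hidden, 7);

            MomentumBackpropagation rule = new MomentumBackpropagation();
            rule.setLearningRate(0.7);
            rule.setMomentum(0.4);
            rule.setMaxError(0.01);
            rule.setMaxIterations(2000);
            network.setLearningRule(rule);

            network.learn(trainingSet);

            System.out.println(hidden + " hidden neurons -> " + rule.getCurrentIteration()
                    + " iterations, total net error " + rule.getTotalNetworkError());
        }
    }
}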
DOWNLOAD
Data set used in this tutorial
Training sets
Neuroph project
See also:
Multi Layer Perceptron Tutorial