Zoo database
An example of a multivariate data type classification problem using Neuroph
by Nevena Jovanovic, Faculty of Organizational Sciences, University of Belgrade
An experiment for the Intelligent Systems course
Introduction
In this example we will be testing Neuroph 2.4 with the Zoo database, which can be found here. Several architectures will be tried out, and we will determine which ones represent a good solution to the problem and which ones do not.
First, here is some useful information about our Zoo database:
Data Set Characteristics: Multivariate
Number of Instances: 101
Attribute Characteristics: Categorical, Integer
Number of Attributes: 17
Associated Tasks: Classification
Introducing the problem
In this project I will work on the classification of animals according to some of their characteristics. The goal is to train a neural network to recognize which class an animal belongs to, based on its attributes.
Each of these animals belongs to one of seven classes:
Class # (number of animals) -- set of animals:
1. (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion, squirrel, vampire, vole, wallaby, wolf
2. (20) chicken, crow, dove, duck, flamingo, gull, hawk, kiwi, lark, ostrich, parakeet, penguin, pheasant, rhea, skimmer, skua, sparrow, swan, vulture, wren
3. (5) pitviper, seasnake, slowworm, tortoise, tuatara
4. (13) bass, carp, catfish, chub, dogfish, haddock, herring, pike, piranha, seahorse, sole, stingray, tuna
5. (4) frog, frog, newt, toad
6. (8) flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
7. (10) clam, crab, crayfish, lobster, octopus, scorpion, seawasp, slug, starfish, worm
This variable, named type, represents the output variable. In addition to the output variable, there are 17 input variables for each animal species. The input variables are:
- animal name
- hair
- feathers
- eggs
- milk
- airborne
- aquatic
- predator
- toothed
- backbone
- breathes
- venomous
- fins
- legs
- tail
- domestic
- catsize
Each variable is Boolean, except animal name, which is a nominal variable, and legs, which is a numeric variable (set of values: {0, 2, 4, 5, 6, 8}).
Handling non-numeric data, such as Boolean = {true, false}, is more difficult, but such variables can be represented numerically: the value true is replaced with 1, and the value false with 0. We will not use the animal name variable in the experiment, because it is unique for each case.
In this example we will be using 70%, 85% and 90% of the data for training the network and, correspondingly, 30%, 15% and 10% for testing it.
To be able to use this data set in Neuroph, it is necessary to normalize the data. Besides the standard normalization formula, we assign one neuron to each possible value of every multi-valued attribute.
In the case of the legs attribute there are 6 different values, so that single column is replaced with 6 columns filled with combinations of 1 and 0. The output variable type is handled the same way, with 7 columns, one per class, the last column referring to the seventh class. In this way our dataset contains only 0 and 1 values.
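A minimal sketch of this encoding in plain Java (not tied to the Neuroph API; the class and method names are illustrative, and we assume the six legs values {0, 2, 4, 5, 6, 8}):

```java
// one-hot encoding sketch for the legs attribute and the type output
public class ZooEncoding {
    // possible values of the legs attribute in the Zoo data set (assumed)
    private static final int[] LEG_VALUES = {0, 2, 4, 5, 6, 8};

    // encode legs as six 0/1 columns, one per possible value
    public static double[] encodeLegs(int legs) {
        double[] columns = new double[LEG_VALUES.length];
        for (int i = 0; i < LEG_VALUES.length; i++) {
            columns[i] = (LEG_VALUES[i] == legs) ? 1 : 0;
        }
        return columns;
    }

    // encode the output class (1..7) as seven 0/1 columns
    public static double[] encodeType(int type) {
        double[] columns = new double[7];
        columns[type - 1] = 1;
        return columns;
    }
}
```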
Before you start reading about our experiment, we suggest that you first get more familiar with Neuroph Studio and the Multi Layer Perceptron. You can do that by clicking on the links below:
Neuroph Studio Getting Started
Multi Layer Perceptron
Network design
Here you can see the structure of our network, with its inputs, outputs and hidden neurons in the middle layer.
Training attempt 1
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 15
Training Parameters:
Learning Rate: 0.2
Momentum: 0.7
Max. Error: 0.01
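In Neuroph Studio this configuration is set up through the GUI, but roughly the same setup could be written in code. The sketch below assumes the data-set classes of more recent Neuroph releases (DataSet instead of the TrainingSet class of Neuroph 2.4) and an illustrative file name:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class ZooTraining {
    public static void main(String[] args) {
        // 21 inputs, 15 hidden neurons, 7 outputs, Sigmoid transfer function
        MultiLayerPerceptron net =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 21, 15, 7);

        // Backpropagation with Momentum, parameters of training attempt 1
        MomentumBackpropagation rule = new MomentumBackpropagation();
        rule.setLearningRate(0.2);
        rule.setMomentum(0.7);
        rule.setMaxError(0.01);
        net.setLearningRule(rule);

        // normalized zoo data prepared as described above (illustrative file name)
        DataSet trainingSet = DataSet.createFromFile("zoo-train.csv", 21, 7, ",");
        net.learn(trainingSet);
    }
}
```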
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 33 iterations to train. The Total Net Error of 0.0095 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The final part of testing this network is testing it with several input values. To do that, we will select 4 random instances from our data set. Those are:
| No. | hair | feathers | eggs | milk | airborne | aquatic | predator | toothed | backbone | venomous | fins | legs | tail | domestic | catsize | Desired output | Real output |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0,0,0,1,0,0,0 | 1 | 1 | 1 | 1,0,0,0,0,0,0 | 1, 0, 0, 0, 0, 0, 0 |
| 2. | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0,1,0,0,0,0,0 | 1 | 0 | 0 | 0,0,1,0,0,0,0 | 0, 0, 0.78, 0.0019, 0, 0, 0.0329 |
| 3. | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0,1,0,0,0,0,0 | 1 | 0 | 0 | 0,0,1,0,0,0,0 | 0, 0, 0.997, 0, 0, 0, 0.008 |
| 4. | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0,0,0,0,0,1,0 | 0 | 0 | 0 | 0,0,0,0,0,1,0 | 0, 0, 0, 0, 0, 0.9994, 0.0223 |
The network guessed correctly in all four instances. After this test, we can conclude that this solution does not need to be rejected.
It can be used to give good results in most cases.
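In code, a single instance from the table can be fed to the trained network along these lines (a sketch continuing the training snippet above; the input layout follows the columns of the table):

```java
// query the trained network with the inputs of instance 1 from the table
double[] input = {1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,  // Boolean attributes
                  0, 0, 0, 1, 0, 0, 0,               // legs columns
                  1, 1, 1};                          // tail, domestic, catsize
net.setInput(input);
net.calculate();
double[] output = net.getOutput();  // expected to be close to {1, 0, 0, 0, 0, 0, 0}
```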
In our next experiment we will be using the same network, but some of the parameters will be different, and we will see how the result is going to change.
Training attempt 2
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 14
Training Parameters:
Learning Rate: 0.5
Momentum: 0.01
Max. Error: 0.01
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 15 iterations to train. The Total Net Error of 0.009 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The only thing left is to feed the random inputs stated above into the neural network. The results of the test are shown in the table below.
The network guessed correctly in all cases.
| No. | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Category 6 | Category 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.9705 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2. | 0 | 0 | 0.9985 | 0 | 0 | 0 | 0 |
| 3. | 0 | 0 | 0 | 0 | 0 | 0.8136 | 0 |
This table shows the output values obtained for the corresponding random test instances. Based on the results, we can conclude that the deviations were smaller than in the previous case.
In the next two attempts we will make a new neural network. The main difference will be the number of hidden neurons in its structure; other parameters will also be changed.
Training attempt 3
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 10
Training Parameters:
Learning Rate: 0.3
Momentum: 0.9
Max. Error: 0.01
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 7 iterations to train. The Total Net Error of 0.008169 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The final part of testing this network is testing it with several input values.
To do that, we will select 4 random instances from our data set. Those are:
| No. | hair | feathers | eggs | milk | airborne | aquatic | predator | toothed | backbone | venomous | fins | legs | tail | domestic | catsize | Desired output | Real output |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0,0,0,1,0,0,0 | 1 | 1 | 1 | 1,0,0,0,0,0,0 | 0.9964, 0.001, 0.0019, 0.0007, 0.0029, 0.003, 0.0002 |
| 2. | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0,0,1,0,0,0,0 | 1 | 0 | 0 | 0,1,0,0,0,0,0 | 0.0089, 0.9824, 0.0089, 0.0031, 0.0009, 0.0053, 0.0052 |
| 3. | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0,0,0,1,0,0,0 | 1 | 0 | 0 | 1,0,0,0,0,0,0 | 0.9965, 0.0002, 0.0016, 0.0003, 0.0045, 0.0084, 0.0003 |
| 4. | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0,0,1,0,0,0,0 | 0 | 0 | 0 | 0,0,0,0,0,1,0 | 0.0025, 0.069, 0.0276, 0.0163, 0.0056, 0.7659, 0.0403 |
Based on this test, we conclude that the results are worse than in the previous tests and that the error is greater.
In our next experiment we will be using the same network, but some of the parameters will be different, and we will see how the result is going to change.
Training attempt 4
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 10
Training Parameters:
Learning Rate: 0.5
Momentum: 0.4
Max. Error: 0.01
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 24 iterations to train. The Total Net Error of 0.0099 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The only thing left is to feed the random inputs stated above into the neural network.
The results of the test are shown in the table below. The network guessed correctly in all four cases.
| No. | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Category 6 | Category 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 0.006 | 0.975 | 0.0123 | 0.0006 | 0.0018 | 0.0166 | 0.0014 |
| 2. | 0.0013 | 0.0129 | 0.0231 | 0.0396 | 0.0385 | 0.0429 | 0.8808 |
| 3. | 0.9598 | 0.0408 | 0.0099 | 0.0003 | 0.0028 | 0.0072 | 0 |
| 4. | 0.0079 | 0.0616 | 0.021 | 0.0023 | 0.0104 | 0.6888 | 0.0767 |
Training attempt 5
This time we will be making some more significant changes in the structure of our network. Now we will try to train a network with 5 neurons in its hidden layer.
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 5
Training Parameters:
Learning Rate: 0.2
Momentum: 0.7
Max. Error: 0.01
Here we use 85% of the data set for training.
Training Results:
For this training, we used the Sigmoid transfer function.
Training was completed in 826 iterations, but the error is large, so this is not a good combination of network and training set.
The Total Net Error graph looks like this:
So the conclusion of this experiment is that the choice of the number of hidden neurons is crucial to the effectiveness of a neural network.
One of the "rules" for determining the correct number of hidden neurons is that it should be between the size of the input layer and the size of the output layer.
The formula that we used looks like this: ((number of inputs + number of outputs) / 2) + 1. In our case that gives ((21 + 7) / 2) + 1 = 15 hidden neurons, and with that value we made a good network that showed great results. Then we made a network with fewer neurons in its hidden layer, and the results were not as good as before.
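Expressed in code, the rule of thumb is a one-line computation:

```java
int inputs = 21, outputs = 7;
int hiddenNeurons = (inputs + outputs) / 2 + 1;  // = (21 + 7) / 2 + 1 = 15
```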
So, in the next example we are going to see how the network will react to a greater number of hidden neurons.
Training attempt 6
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 10
Here we use 85% of the data set for training and 15% for testing.
Training Parameters:
Learning Rate: 0.02
Momentum: 0.7
Max. Error: 0.01
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 23 iterations to train. The Total Net Error of 0.0099 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The final part of testing this network is testing it with several input values. To do that, we will select 4 random instances from our data set. Those are:
| No. | hair | feathers | eggs | milk | airborne | aquatic | predator | toothed | backbone | breathes | venomous | fins | legs | tail | domestic | catsize | Desired output | Real output |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0,0,1,0,0,0 | 0 | 0 | 1 | 1,0,0,0,0,0,0 | 1, 0, 0.0009, 0, 0, 0 |
| 2. | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1,0,0,0,0,0 | 1 | 0 | 0 | 0,0,0,1,0,0,0 | 0, 0, 0.0291, 1, 0, 0, 0 |
| 3. | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1,0,0,0,0,0 | 1 | 1 | 0 | 0,0,0,1,0,0,0 | 0, 0, 0.0455, 1, 0, 0, 0 |
| 4. | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1,0,0,0,0,0 | 0 | 0 | 0 | 0,0,0,0,0,0,1 | 0, 0, 0.0003, 0, 0, 0, 1 |
We can conclude that the decrease in the number of hidden neurons reduced the total error in testing, and that the outputs in the test deviate less than anticipated.
As you can see, this number of hidden neurons with an appropriate combination of parameters also gave good results, except for the third instance, where the deviation on the third output is 0.0455.
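The check we perform by hand here can also be scripted. Below is a sketch of a helper that computes the mean squared error of a trained network over a test set, assuming the DataSet/DataSetRow classes of recent Neuroph releases:

```java
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class ZooEvaluation {
    // mean squared error of the network's outputs against the desired outputs
    public static double meanSquaredError(NeuralNetwork net, DataSet testSet) {
        double sum = 0;
        int count = 0;
        for (DataSetRow row : testSet.getRows()) {
            net.setInput(row.getInput());
            net.calculate();
            double[] actual = net.getOutput();
            double[] desired = row.getDesiredOutput();
            for (int i = 0; i < desired.length; i++) {
                double error = desired[i] - actual[i];  // individual error
                sum += error * error;
                count++;
            }
        }
        return sum / count;
    }
}
```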
Training attempt 7
Now we will see how the same network works with a different set of parameters.
Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 21
Number of outputs: 7
Hidden neurons: 10
Training Parameters:
Learning Rate: 0.01
Momentum: 0.7
Max. Error: 0.01
Training Results:
For this training, we used the Sigmoid transfer function.
As you can see, the neural network took 463 iterations to train. The Total Net Error of 0.0099 is acceptable.
The Total Net Error graph looks like this:
Practical Testing:
The only thing left is to feed the random inputs stated above into the neural network. The results of the test are shown in the table below. The network guessed correctly in all four cases, although the deviations from the desired values, visible both in the individual errors and in the total error of the test, are higher in this case.
| Outputs | Individual errors |
| --- | --- |
| 0.9925, 0.0032, 0.0085, 0.0011, 0.0133, 0.004, 0.0001 | -0.0075, 0.0032, 0.0085, 0.0011, 0.0133, 0.004, 0.0001 |
| 0.0097, 0.9667, 0.0091, 0.0012, 0.003, 0.0117, 0.0098 | 0.0097, -0.0333, 0.0091, 0.0012, 0.003, 0.0117, 0.0098 |
| 0.3141, 0.0941, 0.0133, 0.0004, 0.0215, 0.027, 0.0184 | 0.3141, 0.0941, 0.0133, 0.0004, 0.0215, 0.027, -0.9816 |
In this table you can see a few cases from the test results, with their outputs and individual errors. The individual error is the difference between the actual and the desired output; for example, for the first output of the first case, 0.9925 - 1 = -0.0075.
Although in this example we used a considerably different set of parameters, the network gave good results in the test.
Some statistics
The different solutions tested in this experiment have shown that the choice of the number of hidden neurons is very important for the effectiveness of a neural network.
We have concluded that one layer of hidden neurons is enough in this case.
Also, the experiment showed that the success of a neural network is very sensitive to parameters chosen in the training process.
The learning rate must not be too high, and the maximum error must not be too low.
Below is a table that summarizes this experiment. The best solution for the problem is training attempt 6, which has the smallest total mean square error.
| Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Maximum error | Learning rate | Momentum | Total mean square error | Number of iterations | Test set | Network trained |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 15 | 1 | 70% of full data set | 0.01 | 0.2 | 0.7 | 0.00084 | 23 | 30% of full data set | yes |
| 2 | 15 | 1 | 70% of full data set | 0.01 | 0.5 | 0.01 | 0.042 | 15 | 30% of full data set | yes |
| 3 | 10 | 1 | 70% of full data set | 0.01 | 0.3 | 0.9 | 0.0411 | 7 | 30% of full data set | yes |
| 4 | 10 | 1 | 70% of full data set | 0.01 | 0.5 | 0.4 | 0.0419 | 24 | 30% of full data set | yes |
| 5 | 5 | 1 | 85% of full data set | 0.01 | 0.2 | 0.7 | / | 826 | 15% of full data set | yes |
| 6 | 10 | 1 | 85% of full data set | 0.01 | 0.02 | 0.7 | 0.000045 | 94 | 15% of full data set | yes |
| 7 | 10 | 1 | 70% of full data set | 0.01 | 0.01 | 0.7 | 0.0099 | 463 | 30% of full data set | yes |
Advanced Training Techniques
When the training is complete, you will want to check the network performance. A learning neural network is expected to extract rules from a finite set of examples. It is often the case that the neural network memorizes the training data well, but fails to generate correct output for some of the new test data. Therefore, it is desirable to come up with some form of regularization.
One form of regularization is to split the training set into a new training set and a validation set. After each pass through the new training set, the neural network is evaluated on the validation set, and the network with the best performance on the validation set is then used for actual testing. The new training set would consist of, say, 80%-90% of the original training set, and the remaining 10%-20% would form the validation set. The validation error rate is then computed periodically during training, and training stops when the validation error rate starts to go up. However, validation error is not a good estimate of the generalization error if the initial set consists of a relatively small number of instances. Our initial set, which we named TS1, consists of only 101 instances (animal species); 10% or 20% of it would be just 10 or 20 instances, which is an insufficient number of instances to perform validation. In this case, instead of validation, we will use a generalization check as a form of regularization.
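A sketch of such a split in plain Java (this generic helper is illustrative and not part of the Neuroph API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DataSplit {
    // split a list of data rows into a new training set and a validation set;
    // ratio is the training fraction, e.g. 0.9 for a 90/10 split
    public static <T> List<List<T>> split(List<T> rows, double ratio) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled);  // randomize before splitting
        int cut = (int) (shuffled.size() * ratio);
        List<List<T>> parts = new ArrayList<>();
        parts.add(shuffled.subList(0, cut));                // new training set
        parts.add(shuffled.subList(cut, shuffled.size()));  // validation set
        return parts;
    }
}
```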
In the following examples we will check the generalization error: from example to example we will increase the number of instances in the training set and decrease the number of instances in the sets used for testing.
Training attempt 8
In this case we will create a new data set covering a 10% sample, whose instances are not contained in the training data. That part of the data you can get from this link: TS1.
First we train the network on 90% of the data, and then test it on the new 10% test set. On this basis we will see how well the network handles data that was not in the training set. We will train the network that we created earlier, using the parameters: learning rate 0.21, momentum 0.7 and maximum error 0.01. Of course, it was first necessary to isolate from the total dataset the 10 instances that we placed in TS1.
The test results are:
The Total Net Error graph looks like this:
Based on the error, which is small, we can conclude that our network performs very well even on data on which it was not previously trained.
Here we have an error of 0.002, which is acceptable, and the individual errors are also negligible. We can also conclude that a decrease in the number of hidden neurons mainly reduces the total error.
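Continuing the earlier sketches, this generalization check could be scripted as follows (the file name is illustrative; meanSquaredError is the helper sketched earlier, and net is the trained network):

```java
// evaluate the trained network on the 10% of instances held out in TS1
DataSet unseen = DataSet.createFromFile("ts1-holdout.csv", 21, 7, ",");
double generalizationError = ZooEvaluation.meanSquaredError(net, unseen);
System.out.println("Generalization error: " + generalizationError);
```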
Dynamic Backpropagation
These are the results of the Dynamic Backpropagation algorithm used on the best example in our experiment.
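As a hedged sketch, the dynamic variant could be configured roughly as below; the DynamicBackPropagation class ships with some Neuroph releases, but the dynamic-rate setter names are assumptions to verify against your version:

```java
import org.neuroph.nnet.learning.DynamicBackPropagation;

// dynamic backpropagation adjusts the learning rate during training (sketch)
DynamicBackPropagation rule = new DynamicBackPropagation();
rule.setMaxError(0.01);  // stop condition, inherited from backpropagation
// assumed setters for the dynamic learning rate (verify for your Neuroph version):
// rule.setUseDynamicLearningRate(true);
// rule.setMinLearningRate(0.01);
// rule.setMaxLearningRate(0.5);
net.setLearningRule(rule);
net.learn(trainingSet);
```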
Training Results:
For this training, we used the Sigmoid transfer function.
The Total Net Error graph looks like this:
Practical Testing:
Graph
In this graph we see the relationship between the learning rate and the number of iterations: the number of iterations generally increases as the learning rate decreases.
Conclusion
During this experiment, we created three different architectures, one basic training set and six training sets derived from the basic training set. We normalized the original data set using a linear scaling method. Through six basic steps we explained in detail the creation, training and testing of neural networks. If the network architecture uses a small number of hidden neurons, training becomes excessively long and the network may overfit no matter what the values of the training parameters are. We pointed out major differences between the Perceptron and the MultiLayerPerceptron as network types. Through the various tests we demonstrated the sensitivity of neural networks to high and low values of the learning parameters. We have shown that the best solution to the problem of classifying animal species into seven different groups is an architecture with one hidden layer and ten hidden neurons. Finally, we explained the importance of generalization and pointed to validation as an important form of regularization. The overall results of this experiment can be seen in the summary table above, where the best solution, training attempt 6, has the smallest total mean square error.
DOWNLOAD
See also:
Multi Layer Perceptron Tutorial