An example of a multivariate classification problem using the Neuroph framework
Introduction
Classification is one of the most frequently encountered decision-making tasks of human activity. A classification problem occurs when an object needs to be assigned to a predefined group or class based on a number of observed attributes of that object. A related task, cluster analysis, aims to group objects into clusters in such a way that two objects of the same cluster are more similar to each other than to objects of other clusters. The objects can have various characteristics: it is possible to cluster animals, plants, text documents, economic data, etc. Besides neural networks, there are other statistical methods dealing with the problem of classification, such as discriminant analysis. One major limitation of the statistical models is that they work well only when their underlying assumptions are satisfied. The advantage of neural networks lies in the following aspects. First, they are data-driven, self-adaptive methods: they adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model. Second, neural networks are nonlinear models, which makes them flexible in modeling complex real-world relationships. Finally, neural networks can approximate any function with arbitrary accuracy.
Introduction to the problem
The purpose of this experiment is to determine the classes of Italian wines grown in the same region but derived from different cultivars.
The data set contains 13 different attributes used for comparison, and 3 different output classes.
The attributes are:
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
The data set contains 178 instances. Each instance belongs to one of 3 possible classes: 1, 2, and 3.
Procedure of training a neural network
In order to train a neural network, there are six steps to be taken:
- Normalize the data
- Create a Neuroph project
- Create a training set
- Create a neural network
- Train the network
- Test the network to make sure that it is trained properly
In this experiment we will demonstrate the use of some standard and advanced training techniques. Several architectures will be tried out, based on which we will be able to determine what brings us the best results for our problem.
Step 1. Data Normalization
Any neural network must be trained before it can be considered intelligent and ready to use. Neural networks are trained using training sets, and now a training set will be created to help us with the wine classification problem. As mentioned above, we first need to normalize the data.
B = (A - min(A)) / (max(A) - min(A)) * (C - D) + D

where B is the standardized value, A the given value, min(A) and max(A) the smallest and largest values of the attribute, and D and C determine the range in which we want our value to be. In this case, D = 0 and C = 1.
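For illustration, here is a minimal Java sketch of this formula applied to one attribute column (the method name and array layout are our own, not part of Neuroph):

public static double[] normalize(double[] a, double d, double c) {
    // find the attribute's minimum and maximum
    double min = a[0], max = a[0];
    for (double v : a) {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    // map each value A into [D, C] using B = (A - min)/(max - min) * (C - D) + D
    double[] b = new double[a.length];
    for (int i = 0; i < a.length; i++) {
        b[i] = (a[i] - min) / (max - min) * (c - d) + d;
    }
    return b;
}

For example, normalize(alcoholColumn, 0, 1) rescales the Alcohol attribute to the [0, 1] range; the same is done for each of the 13 attributes.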
Step 2. Creating a new Neuroph project
To create a new Neuroph project, do the following:
Click File > New Project.
In the new window we define the project name and location. After that we click 'Finish', and the new project appears in the projects window on the left side of Neuroph Studio.
Step 3. Creating a Training Set
To create a training set, in the main menu we choose Training > New Training Set to open the training set wizard. Then we enter the name of the training set and the number of inputs and outputs. In this case there will be 13 inputs and 3 outputs, and we will set the type of training to supervised, as the most common way of training a neural network.
As supervised training proceeds, the neural network is taken through a number of iterations until its output matches the anticipated output with a reasonably small error.
After clicking 'Next' we need to insert data into the training set table. All data could be inserted manually, but we have a large number of instances and it is much easier to load all the data directly from a file. We click 'Choose File' and select the file in which we saved our normalized data set. The values in that file are separated by tabs.
Then we click 'Load' and all the data is loaded into the table. We can see that this table has 16 columns: the first 13 represent inputs and the last 3 represent outputs from our data set.
After clicking 'Finish', the new training set appears in our project.
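The same training set can also be created through the Neuroph API rather than the wizard. A minimal sketch, assuming the Neuroph 2.x classes DataSet and DataSetRow and a tab-separated file of normalized values (the file name is ours):

import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// load a supervised data set with 13 inputs and 3 outputs from a tab-separated file
DataSet trainingSet = DataSet.createFromFile("wine-normalized.txt", 13, 3, "\t");

// rows can also be added one instance at a time; the 3 outputs are the
// one-hot encoding of the class (here class 1 -> 1, 0, 0)
trainingSet.addRow(new DataSetRow(
        new double[] {0.4184, 0.6344, 0.1925, 0.4639, 0.4783, 0.3724, 0.5042,
                      0.5094, 0.5552, 0.7406, 0.5447, 0.3919, 0.674},
        new double[] {1, 0, 0}));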
To be able to decide which solution is best for our problem, we will create several neural networks with different sets of parameters; most of them will be based on this training set.
Training attempt 1
Step 4.1 Creating a neural network
Now we need to create a neural network. In this experiment we will analyze several architectures. Each neural network we create will be of the Multi Layer Perceptron type, and they will differ from one another in the parameters of the Multi Layer Perceptron.
Why Multi Layer Perceptron?
This is perhaps the most popular network architecture in use today: the units each perform a biased weighted sum of their inputs and pass this activation level through a transfer function to produce their output, and the units are arranged in a layered feedforward topology. The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) the free parameters of the model. Such networks can model functions of almost arbitrary complexity, with the number of layers, and the number of units in each layer, determining the function complexity.
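In other words, each unit computes output = f(bias + Σ wi·xi). A small sketch of that computation for a single sigmoid unit (the helper method is ours, for illustration only):

// output of one unit: biased weighted sum passed through a sigmoid transfer function
static double unitOutput(double[] inputs, double[] weights, double bias) {
    double sum = bias;
    for (int i = 0; i < inputs.length; i++) {
        sum += weights[i] * inputs[i];
    }
    return 1.0 / (1.0 + Math.exp(-sum)); // sigmoid squashes the sum into (0, 1)
}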
In the next window we set the Multi Layer Perceptron's parameters. The numbers of input and output neurons are the same as in the training set. Now we have to choose the number of hidden layers and the number of neurons in each layer.
Problems that require more than one hidden layer are rarely encountered. For many practical problems there is no reason to use more than one hidden layer: one layer can approximate any function that contains a continuous mapping from one finite space to another. Deciding the number of hidden layers is only a small part of the problem; we must also determine how many neurons will be in each of these hidden layers. Both the number of hidden layers and the number of neurons in each of them must be carefully considered.
Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.
Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network.
Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.
We've decided to use 1 hidden layer with 3 neurons in this first training attempt.
Then we check the 'Use Bias Neurons' option and choose 'Sigmoid' as the transfer function. For the learning rule we choose 'Backpropagation with Momentum'. The momentum is added to speed up the learning process and to improve the efficiency of the algorithm.
The bias neuron is very important: an error-backpropagation network without bias neurons in the hidden layer does not learn. The bias weights control the shape, orientation, and steepness of all types of sigmoidal functions through the data mapping space. A bias input always has the value of 1; without a bias, if all inputs are 0, the only output ever possible would be zero.
Next, we click 'Finish' and the first neural network is created. In the picture below we can see the graph view of this neural network.
The figure shows the input, output, and hidden neurons and how they are connected with each other. Except for two neurons with an activation level of 1, all neurons have an activation level of 0. Those two are the bias neurons explained above.
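The same network can also be created in code; a sketch assuming the Neuroph 2.x API:

import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

// 13 inputs, one hidden layer with 3 neurons, 3 outputs, sigmoid transfer
// function; bias neurons are added automatically
MultiLayerPerceptron network =
        new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 13, 3, 3);
network.setLearningRule(new MomentumBackpropagation());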
Step 5.1 Train the neural network
After we have created the training set and the neural network, we can train the network.
First, we select training set, click 'Train', and then we have to set learning parameters for training.
Learning rate is a control parameter of training algorithms, which controls the step size when weights are iteratively adjusted.
To help avoid settling into a local minimum, a momentum rate allows the network to potentially skip through local minima. A momentum rate set at the maximum of 1.0 may result in training that is highly unstable, never settling even into a local minimum, or the network may take an inordinate amount of training time. If set at a low of 0.0, momentum is not considered and the network is more likely to settle into a local minimum.
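For reference, the standard weight update with momentum has the form

Δw(t) = -η · ∂E/∂w + μ · Δw(t-1)

where η is the learning rate and μ the momentum rate: a fraction of the previous weight change is added to the current one, which smooths the search and can carry it through shallow local minima.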
Training is complete when the Total Net Error value drops below the max error; the smaller the error, the better the approximation.
In this first case the maximum error will be 0.01, the learning rate 0.2, and the momentum 0.7.
Then we click on the 'Next' button and the training process starts.
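In code, the same parameters would be set on the learning rule before training; a sketch assuming the network and trainingSet objects from the earlier sketches:

import org.neuroph.nnet.learning.MomentumBackpropagation;

MomentumBackpropagation learningRule =
        (MomentumBackpropagation) network.getLearningRule();
learningRule.setMaxError(0.01);    // training stops when Total Net Error drops below this
learningRule.setLearningRate(0.2); // step size for the weight adjustments
learningRule.setMomentum(0.7);     // fraction of the previous weight change added

network.learn(trainingSet);        // runs until the max error is reached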
After a certain number of iterations, the Total Net Error drops below the specified level, which means that the training process was successful and we can now test this neural network.
Step 6.1 Test the neural network
We test the neural network by clicking the 'Test' button, and then we can see the testing results. We can see that the Total Mean Square Error is 0.0205746. That is a good result for a first attempt, so we will continue experimenting with the training parameters and try to find the optimal solution.
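Neuroph Studio reports this error for us; the sketch below shows one way the same kind of mean square error could be computed by hand through the API (the helper method is ours, and conventions for averaging vary):

import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// mean square error of a trained network over a data set
static double meanSquareError(NeuralNetwork<?> net, DataSet set) {
    double sum = 0;
    int count = 0;
    for (DataSetRow row : set.getRows()) {
        net.setInput(row.getInput());  // feed one instance
        net.calculate();               // forward pass
        double[] actual = net.getOutput();
        double[] desired = row.getDesiredOutput();
        for (int i = 0; i < actual.length; i++) {
            double e = desired[i] - actual[i];
            sum += e * e;
            count++;
        }
    }
    return sum / count;
}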
Training attempt 2
Step 5.2 Train the neural network
In our second attempt we will change only some learning parameters and see what happens. The learning rate will be 0.1, and the momentum will be set to 0.5.
After training is complete, we can test the network to see if we can get the optimal solution with this set of parameters.
Step 6.2 Test the neural network
Now we want to see testing results.
In this attempt the total mean square error is 0.006158, which is a great result. Still, we should try other architectures in order to find the optimal network with the smallest possible number of hidden neurons.
Training attempt 3
Step 4.3 Creating a neural network
In this attempt we will try to decrease the number of hidden neurons. It is known that the number of hidden neurons is crucial for the success of network training, and now we will try with 1 hidden neuron.
First we have to create a new neural network. All the parameters are the same as in the first training attempt; we will only change the number of hidden neurons.
Step 5.3 Train the neural network
In the first training of this second neural network architecture we will try the following learning parameters.
After training is complete, we can test the newly built architecture.
Step 6.3 Test the neural network
Now we will test this neural network and see the results for this architecture.
In this case the total mean square error is much higher than in the first attempt. We will now try to change the training parameters to see if we can get a good result with this architecture.
Training attempt 4
Step 5.4 Train the neural network
With this attempt we will try to get better results than with the previous one.
This time the total mean square error is smaller than in the previous attempt, but it is not even close to the optimal solution. There is no use in further experimenting with this architecture.
Training attempt 5
Step 4.5 Creating a neural network
We create a new neural network with 2 hidden neurons in one layer, to see if we can strike the right balance between the number of hidden neurons and the required error.
The image below shows the architecture of the newly created network.
Step 5.5 Train the neural network
This neural network architecture has 2 hidden neurons, which we think should be enough for the network to reach the maximum error of 0.01. The learning rate will be 0.2 and the momentum 0.7. Now we train the network and see what happens.
The network was successfully trained and reached the desired error, below 0.01. In the total network error graph we can see that the error decreases continuously throughout the whole training cycle.
Step 6.5 Test the neural network
We are very interested in the testing results for this neural network architecture. In the training process the total network error fell below 0.01, which could also indicate better testing results.
After we have finished testing, we can see that the total mean square error in this case is 0.0037453183520599247. The individual errors are all close to 0, and we have now found the optimal architecture for our neural network.
Now we need to examine the individual errors of every single instance and check whether there are any extreme values. With a large data set, individual testing takes a lot of time, so instead of testing all 178 observations we will randomly choose 5 observations for individual testing. The following three tables show the input values, output values, and errors for the 5 randomly selected observations; the values are taken from the Test Results window. A code sketch after the tables shows how such a single-instance test can be run through the API.
Table 1. Values of inputs

Observation | Input values (attributes 1-13)
5   | 0.4184; 0.6344; 0.1925; 0.4639; 0.4783; 0.3724; 0.5042; 0.5094; 0.5552; 0.7406; 0.5447; 0.3919; 0.674
38  | 0.4684; 0.8202; 0.3636; 0.6186; 0.6957; 0.4931; 0.5591; 0.6981; 0.6751; 0.7466; 0.4797; 0.5458;
87  | 0.7026; 0.8281; 0.492; 0.3711; 0.7826; 0.7241; 0.7152; 0.434; 0.6372; 0.9002; 0.3089; 0.6374; 0.8452
108 | 0.5553; 0.8004; 0.508; 0.3866; 0.8478; 0.8621; 0.7004; 0.3396; 0.6151; 0.8276; 0.6748; 0.5788; 0.8502
156 | 0.4368; 0.1206; 0.4866; 0.4124; 0.75; 0.7379; 0.9388; 0.0943; 0.6404; 0.4352; 0.9024; 0.9231; 0.6812
Table 2. Values of outputs

Observation | Output values
5   | 0.9369; 0.0318; 0.015
38  | 0.9602; 0.0219; 0.0148
87  | 0.0056; 0.9776; 0.0088
108 | 0.0015; 0.9353; 0.0407
156 | 0.0106; 0.0002; 0.9763
Table 3. Individual errors

Observation | Error values
5   | -0.0631; 0.0318; 0.015
38  |
87  | 0.0056; -0.0224; 0.0088
108 | 0.0015; -0.0647; 0.0407
156 | 0.0106; 0.0002; -0.0237
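As announced above, a single instance can be run through the trained network directly. A sketch assuming the Neuroph 2.x API, using the inputs of observation 5 from Table 1:

// test one normalized instance (observation 5 from Table 1)
double[] input = {0.4184, 0.6344, 0.1925, 0.4639, 0.4783, 0.3724, 0.5042,
                  0.5094, 0.5552, 0.7406, 0.5447, 0.3919, 0.674};
network.setInput(input);
network.calculate();
double[] output = network.getOutput(); // should be close to {1, 0, 0}, i.e. class 1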
Training attempt 6
Step 4.6 Creating a neural network
In this attempt we will create a different type of neural network: we want to see what happens if we create a network with two hidden layers.
First we create a new neural network; the type will be Multi Layer Perceptron, as in the previous attempts.
Now we have to set the network parameters. We will put 1 neuron in the first hidden layer and 1 in the second. The learning rule will be Backpropagation with Momentum.
The new neural network has been created, and the image below shows its structure.
Step 5.6 Train the neural network
Now we will try to train this neural network. First we have to set the training parameters. We will limit the maximum error to 0.01, because we think this number of neurons in two hidden layers should be enough for the network to reach that error. Next we click 'Train', and the training process starts.
Initially the error increased, but afterwards it started to oscillate. We let the process continue, and after 455 iterations it completed.
After testing, we got a rather large mean square error of 0.3690.
We can only conclude that more than one hidden layer is not necessary for our problem, and that we can get better results using just one layer.
Advanced training techniques
Neural networks represent a class of systems that do not fit into the current paradigms of software development and certification. Instead of being programmed, a learning algorithm "teaches" a neural network using a set of data. Often, because of the non-deterministic result of the adaptation, the neural network is considered a "black box" and its response may not be predictable. Testing the neural network with data similar to that used in the training set is one of the few methods used to verify that the network has adequately learned the input domain.

In most instances, such traditional testing techniques prove adequate for the acceptance of a neural network system. However, in more complex, safety- and mission-critical systems, the standard neural network training-testing approach is not able to provide a reliable method for their certification.
One of the major advantages of neural networks is their ability to generalize. This means that a trained network could classify data from the same class as the learning data that it has never seen before. In real world applications developers normally have only a small part of all possible patterns for the generation of a neural network. To reach the best generalization, the data set should be split into three parts: validation, training and testing set.
The validation set contains a smaller percentage of instances from the initial data set and is used to determine whether the selected network architecture is good enough. Only if validation is successful do we proceed with training.
The training set is applied to the neural network for learning and adaptation.
The testing set is then used to determine the performance of the neural network by computation of an error metric.
This validating-training-testing approach is the first, and often the only, option system developers consider for the assessment of a neural network. The assessment is accomplished by the repeated application of neural network training data, followed by an application of neural network testing data to determine whether the neural network is acceptable.
Training attempt 7
Step 3.7 Creating a Training Set
The idea of this attempt is to use only a part of the data set when training a network, and then test the network with inputs from the other, unused part of the data set. That way we can determine whether the neural network has the power of generalization.
In the initial training set we have 178 instances. In this attempt we will create a new training set that contains only 20% of the initial instances, picked randomly. First we create a new file containing the new data set; it has 36 instances (12 of class 1, 14 of class 2, 10 of class 3). Then, in Neuroph Studio, we create a new training set with the same parameters as the first one and load the data from the new file.
We will also create a training set containing the remaining 80% of the instances, which we will use later in this attempt for testing the network. This training set contains 142 instances (47 of class 1, 57 of class 2, 38 of class 3).
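The split can be done by hand in the data file, as we did, or programmatically. A minimal sketch, assuming the Neuroph 2.x DataSet API and a fullSet object holding all 178 instances (note that a plain shuffle will not reproduce the exact per-class counts listed above):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// randomly split the full data set: 20% for training, 80% for testing
List<DataSetRow> rows = new ArrayList<>(fullSet.getRows());
Collections.shuffle(rows);
int cut = (int) (rows.size() * 0.2);

DataSet smallTrainingSet = new DataSet(13, 3);
DataSet testSet = new DataSet(13, 3);
for (int i = 0; i < rows.size(); i++) {
    (i < cut ? smallTrainingSet : testSet).addRow(rows.get(i));
}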
Step 5.7 Train the neural network
Unlike in previous attempts, we will now train an already created neural network, but this time with the newly created training set that contains 20% of the instances of the initial one. For this training we will use the neural network with 2 hidden neurons. The learning rate will be 0.2 and the momentum 0.7. We click the 'Train' button and wait for the training process to finish.
Step 6.7 Test the neural network
After successful training, we can test the neural network. First we test it with the training set that contains only 20% of the initial instances. In this case the total error is 0.0, which is the best result so far.
But the idea was to test the neural network with the other 80% of the data set, which was not used for training. So now we perform that test, using the training set that contains the remaining 80% of the instances.
When testing has completed, we can see that the total error is 0.004694835680751174, which is a great result considering that we tested the network with data that was not used during training.
Now we will analyze individual errors by selecting some random inputs, to see whether the network predicted the output correctly in every case. We randomly choose 5 observations for individual testing. Those observations and their testing results are listed below:
Input: 0.1868; 0.8538; 0.4866; 0.6804; 0.7283; 0.5793; 0.5591; 0.7547; 0.6341; 0.6826; 0.439; 0.4322; 0.2853; Output: 1; 0; 0; Error: 0; 0; 0;
Input: 0.3763; 0.3735; 0.4011; 0.3608; 0.6522; 0.7172; 0.9135; 0.434; 0.6845; 0.4863; 0.8211; 0.8938; 0.6633; Output: 0; 0; 1; Error: 0; 0; 0;
Input: 1; 0.8478; 0.5508; 0.4381; 0.837; 0.4897; 0.6139; 0.2642; 0.4953; 0.9471; 0; 0.4139; 0.908; Output: 0; 1; 0; Error: 0; 0; 0;
Input: 0.4368; 0.6344; 0.4599; 0.5155; 0.4565; 0.769; 0.9283; 0.2453; 0.6688; 0.3157; 0.9024; 0.8718; 0.5991; Output: 0; 0; 1; Error: 0; 0; 0;
Input: 0.1211; 0.7609; 0.3904; 0.6804; 0.5326; 0.0103; 0.3354; 0.7925; 0.4416; 0.4437; 0.6911; 0.2015; 0.1427; Output: 1; 0; 0; Error: 0; 0; 0;
As we can see, in all 5 cases the network correctly predicted the output.
Since this network adapts well to different conditions, we will experiment further, training and testing it with different architectures and training parameters. These results tell us about the network's behaviour under different circumstances. The new attempts will not be described in detail; the table below summarizes all of the architectures, the training parameters, and the errors that correspond to them. The best solution is the architecture with two hidden neurons in a single layer, whose ability to generalize is confirmed by attempt 18.
Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Maximum error | Learning rate | Momentum | Total mean square error | Number of correct guesses | Network trained
1  | 1   | 1 | full | 0.01 | 0.2 | 0.7 | 0.199591 | /   | yes
2  | 1   | 1 | full | 0.01 | 0.1 | 0.5 | 0.170353 | /   | yes
3  | 2   | 1 | full | 0.01 | 0.2 | 0.7 | 0.003745 | /   | yes
4  | 2   | 1 | full | 0.01 | 0.1 | 0.5 | 0.003745 | /   | yes
5  | 3   | 1 | full | 0.01 | 0.2 | 0.7 | 0.020574 | /   | yes
6  | 3   | 1 | full | 0.01 | 0.1 | 0.5 | 0.006158 | /   | yes
7  | 6   | 1 | full | 0.01 | 0.2 | 0.7 | 0.021021 | /   | yes
8  | 6   | 1 | full | 0.01 | 0.1 | 0.5 | 0.003849 | /   | yes
9  | 6   | 1 | full | 0.01 | 0.5 | 0.5 | 0.007453 | /   | yes
10 | 8   | 1 | full | 0.01 | 0.2 | 0.7 | 0.021229 | /   | yes
11 | 8   | 1 | full | 0.01 | 0.1 | 0.5 | 0.008439 | /   | yes
12 | 8   | 1 | full | 0.01 | 0.5 | 0.5 | 0.010804 | /   | yes
13 | 10  | 1 | full | 0.01 | 0.2 | 0.7 | 0.003769 | /   | yes
14 | 10  | 1 | full | 0.01 | 0.1 | 0.5 | 0.005287 | /   | yes
15 | 10  | 1 | full | 0.01 | 0.5 | 0.5 | 0.014981 | /   | yes
16 | 1+1 | 2 | full | 0.01 | 0.2 | 0.7 | 0.369047 | /   | yes
17 | 1+1 | 2 | full | 0.01 | 0.1 | 0.5 | 0.251890 | /   | yes
18 | 2   | 1 | 20% (tested on 80%) | 0.01 | 0.2 | 0.7 | 0.004695 | 5/5 | yes
Conclusion
During this experiment, we created 6 different architectures, one basic training set, and two training sets derived from the basic one. Through the basic steps we have explained in detail the creation, training, and testing of Multi Layer Perceptron neural networks. This network uses a small number of neurons, which proved sufficient for every task we set it. The error was smallest with two hidden neurons. The common opinion is that the error decreases as the number of hidden neurons increases, but that is not a rule, as this experiment has shown: with more hidden neurons the error grew. The network also gave its best results with a momentum of 0.5, and one layer of hidden neurons clearly suffices for this problem.
In the end, you can try out every neural network that we have built.
Download
See also:
Multi Layer Perceptron Tutorial