
TEACHING ASSISTANT EVALUATION WITH NEURAL NETWORKS

An example of a multivariate data type classification problem using Neuroph

By Ivan Petković, Faculty of Organization Sciences, University of Belgrade

An experiment for the Intelligent Systems course

 

Introduction

In this experiment we will be using Neuroph Studio to show how neural networks and Neuroph can be applied to classification problems. But what is it? Neuroph Studio is a Java neural network development environment built on top of the NetBeans Platform and the Neuroph Framework. It is an IDE-like environment customized for neural network development. Neuroph Studio is a GUI that sits on top of the Neuroph Framework, a full-featured Java framework that provides classes for building neural networks. It also has support for image recognition, text character recognition, and handwritten letter recognition.

Neural networks are a machine learning technology suitable for ill-defined problems such as recognition, prediction, classification, and control. Their advantages lie in the following aspects. First, they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model, because they are data-driven, self-adaptive methods. Second, neural networks are nonlinear models, which makes them flexible in modeling real-world complex relationships. Finally, neural networks can approximate any function with arbitrary accuracy. In this article several architectures will be tried out, and it will be determined which ones represent a good solution to the problem and which ones do not.

 

Introducing the problem

The objective is to train the neural network with data which can be found here. The data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories ("low", "medium", and "high") to form the class variable.

First, here is some useful information about our Teaching Assistant data set:

Data Set Characteristics: Multivariate
Number of Instances: 151
Attribute Characteristics: Categorical, Integer
Number of Attributes: 5
Associated Tasks: Classification

Attribute Information:

1. Whether or not the TA is a native English speaker (binary); 1=English speaker, 2=non-English speaker
2. Course instructor (categorical, 25 categories)
3. Course (categorical, 26 categories)
4. Summer or regular semester (binary); 1=Summer, 2=Regular
5. Class size (numerical)
6. Class attribute (categorical); 1=Low, 2=Medium, 3=High

For this experiment to work we had to transform our data set into binary format (0, 1), replacing the categorical attribute values with suitable binary combinations; the data set cannot be inserted into Neuroph in its original form. Before Neuroph can help us with this classification problem, we need to prepare the data first. The type of neural network that will be used in this experiment is a multilayer perceptron with backpropagation.
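As an illustration, here is a minimal sketch of the kind of one-hot encoding we performed (the class and method names are hypothetical, not part of Neuroph). Encoding each attribute this way also explains the 56 inputs used later: 2 (English speaker) + 25 (instructor) + 26 (course) + 2 (semester) + 1 (normalized class size) = 56.

import java.util.Arrays;

// Hypothetical helper: one-hot encode a 1-based category index
// as a binary vector with a single 1.
public class OneHotEncoder {

    public static double[] encode(int category, int numCategories) {
        double[] vector = new double[numCategories];
        vector[category - 1] = 1.0; // e.g. instructor 3 of 25 -> 0,0,1,0,...,0
        return vector;
    }

    public static void main(String[] args) {
        double[] english = encode(2, 2); // attribute 1: non-English speaker
        System.out.println(Arrays.toString(english)); // [0.0, 1.0]
    }
}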

 

Procedure for training a neural network

In order to train a neural network, there are six steps to be made:

1. Prepare the data

2. Create a Neuroph project

3. Create a training set

4. Create a neural network

5. Train the network

6. Test the network to make sure that it is trained properly

 

Step 1. Prepare the data

Some of the input attributes (class size is numerical) have values that can be very distant from each other. To prevent that, we will normalize the data set using the Max-Min normalization formula:

Xn = (X - Xmin) / (Xmax - Xmin)

Where:

X - value that should be normalized
Xn - normalized value
Xmin - minimum value of X
Xmax - maximum value of X
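A minimal Java sketch of this normalization (the class-size bounds in the example are hypothetical, chosen only for illustration):

public class MinMaxNormalizer {

    // Xn = (X - Xmin) / (Xmax - Xmin), mapping values into [0, 1].
    public static double normalize(double x, double xMin, double xMax) {
        return (x - xMin) / (xMax - xMin);
    }

    public static void main(String[] args) {
        // Hypothetical bounds: smallest class 3 students, largest 66.
        System.out.println(normalize(49.0, 3.0, 66.0)); // ~0.7302
    }
}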

Step 2. Creating a new Neuroph project

We create a new project in Neuroph Studio by clicking File > New Project, then we choose Neuroph Project and click the 'Next' button.

In the next window we define the project name and location. After that we click 'Finish'; the new project is created and will appear in the Projects window, on the left side of Neuroph Studio.

Step 3. Create a Training Set

To create a training set, in the main menu we choose Training > New Training Set to open the training set wizard. Then we enter the name of the training set and the number of inputs and outputs. In this case there will be 56 inputs and 3 outputs, and we will set the type of training to supervised, as the most common way of training a neural network.

After that we insert data into the training set table. Because we have a large number of data instances, it is a lot easier to load all the data directly from a file. We click 'Choose File' and select the file in which we saved our normalized data set. Values in that file are separated by commas.

We click 'Load' and all the data is loaded into the table. We cannot see all of the data in the table because we have 56 inputs and 3 outputs, which is too many columns to display, but we can see from the picture that one of the values in a field is 1.0.
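The same loading step can also be done in code; a minimal sketch, assuming the Neuroph 2.x API (the file name is a placeholder):

import org.neuroph.core.data.DataSet;

public class LoadTrainingSet {
    public static void main(String[] args) {
        // Load a comma-separated file with 56 input columns and 3 output columns.
        DataSet trainingSet = DataSet.createFromFile(
                "ta_evaluation_normalized.csv", 56, 3, ",");
        System.out.println("Loaded " + trainingSet.size() + " rows");
    }
}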

After completing this, everything is ready for the creation of neural networks. We will create several neural networks, all with different sets of parameters, and determine which is the best solution for our problem by testing them. This is the reason why there will be several options for steps 4, 5 and 6.

Training attempt 1

Step 4.1 Create a Neural Network

Now we create a neural network by right-clicking the "Neural Network" folder and then New > Neural Network. Each neural network we create will be of type Multi Layer Perceptron.

A Multilayer Perceptron is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate output. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training the network. MLP is a modification of the standard linear perceptron, which can distinguish data that is not linearly separable.

The multilayer perceptron consists of three or more layers (an input and an output layer with one or more hidden layers) of nonlinearly-activating nodes.

  • Input Layer - A vector of predictor variable values (x1...xp) is presented to the input layer. The input layer (or processing before the input layer) standardizes these values so that the range of each variable is -1 to 1. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias, that is fed to each of the hidden neurons; the bias is multiplied by a weight and added to the sum going into the neuron.

  • Hidden Layer - Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight (wji), and the resulting weighted values are added together, producing a combined value uj. The weighted sum (uj) is fed into a transfer function, σ, which outputs a value hj. The outputs from the hidden layer are distributed to the output layer (see the equations after this list).

  • Output Layer - Arriving at a neuron in the output layer, the value from each hidden layer neuron is multiplied by a weight (wkj), and the resulting weighted values are added together, producing a combined value vk. The weighted sum (vk) is fed into a transfer function, σ, which outputs a value yk. The y values are the outputs of the network.
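Written out as equations, the forward pass described in the list above is (a standard MLP formulation; σ is the sigmoid transfer function chosen later in this experiment):

u_j = \sum_{i} w_{ji} x_i + b_j, \qquad h_j = \sigma(u_j)

v_k = \sum_{j} w_{kj} h_j + b_k, \qquad y_k = \sigma(v_k), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}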

In the next dialog we enter the number of neurons. The number of input and output neurons is the same as in the training set, so we enter 56 as the number of input neurons and 3 as the number of output neurons.

A key issue to consider is the number of hidden neurons: we need to find an optimum number. Too many hidden neurons make the network needlessly complex, while too few result in long training times and a large number of iterations. We start with 2 hidden neurons when creating the first network.

"Use Bias Neuron"-we check this option because bias neurons are added to neural networks to help them learn patterns. One is the most common bias activation.

Then we choose "Sigmoid" as the transfer function (in our data set, values are in the interval between 0 and 1). For the learning rule we choose "Backpropagation with Momentum". The momentum is added to speed up the process of learning and to improve the efficiency of the algorithm.

Then we click 'Finish' and the first neural network is created. Now we can see the graph view of this neural network; the many circles in the first layer are the inputs.
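The same network can also be created programmatically with the Neuroph Framework; a minimal sketch, assuming the Neuroph 2.x API (the file name is a placeholder):

import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

public class CreateNetwork {
    public static void main(String[] args) {
        // 56 input neurons, 2 hidden neurons, 3 output neurons,
        // sigmoid transfer function, matching the wizard settings above.
        MultiLayerPerceptron neuralNet =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 56, 2, 3);
        neuralNet.save("ta_mlp.nnet"); // persist the network for later use
    }
}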

Step 5.1 Train the network

First, we select training set, click 'Train', and then we have to set learning parameters for training.

  • Max error - when the Total Net Error value drops below the max error, the training is complete. A smaller max error gives a better approximation. The limit is in the range 0.01-0.05.

  • Learning rate - a constant in the algorithm of a neural network that affects the speed of learning. It applies a smaller or larger proportion of the current adjustment to the previous weight. The higher the rate is set, the faster the network will learn, but if there is large variability in the input, the network will not learn very well, if at all. The limit is in the range 0.01-0.9.

  • Momentum - the momentum rate allows the network to potentially skip through local minima. The limit is in the range 0.01-0.9.


Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 2
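In code, configuring these parameters and starting the training might look like this sketch (assuming the Neuroph 2.x API; the learning parameters are those of training attempt 1, see Table 2):

import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class TrainNetwork {
    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile(
                "ta_evaluation_normalized.csv", 56, 3, ",");
        MultiLayerPerceptron neuralNet =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 56, 2, 3);

        // Backpropagation with Momentum, parameters from training attempt 1.
        MomentumBackpropagation learningRule = new MomentumBackpropagation();
        learningRule.setLearningRate(0.2);
        learningRule.setMomentum(0.7);
        learningRule.setMaxError(0.01);
        neuralNet.setLearningRule(learningRule);

        // learn() blocks until the total net error drops below maxError.
        neuralNet.learn(trainingSet);
        neuralNet.save("ta_mlp.nnet");
    }
}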

Now, click 'Train' button and see what happens.

Training Results:
For this training, we used the Sigmoid transfer function. We can see in the pictures below that training was unsuccessful: after 34312 iterations, the neural network failed to learn the problem with an error less than 0.01.

The Total Net Error graph looks like this:

Step 6.1. Testing the Neural Network

We click "Test", in order to see the total error, and the individual error of last result.

Table 1. The final part of testing this network is testing it with several input values. To do that, we select 4 random input instances from our data set. Those are:

Inputs Class Testing results
Number English speaker Course instructor Course Semester Class size Low Medium High Outputs (Low, Medium, High)
1. 0,1 0,0,..1(23),0..(25) 0,0,..1(3),0..(26) 1,0 0.7302 1 0 0 0.9582 0.0439 0.0222
2. 1,0 0,0,..1(3),0..(25) 0,0,..1(2),0..(26) 1,0 0.3651 0 0 1 0.0002 0.6119 0.4819
3. 1,0 0,0,..1(14),0..(25) 0,0,..1(15),0..(26) 1,0 0.5238 1 0 0 0.0014 0.1667 0.7071
4. 0,1 0,0,..1(13),0..(26) 0,0,..1(3),0..(26) 0,1 0.1587 0 0 1 0.9582 0.0439 0.0222

In the last six columns we see the outputs defined in the data set and the outputs produced by the trained network, so that deviations can be compared.
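In code, testing an individual input could look like this sketch (assuming the Neuroph 2.x API; the file name and the position of the class-size input are assumptions for illustration):

import java.util.Arrays;
import org.neuroph.core.NeuralNetwork;

public class TestNetwork {
    public static void main(String[] args) {
        // Load the previously saved network (placeholder file name).
        NeuralNetwork neuralNet = NeuralNetwork.createFromFile("ta_mlp.nnet");

        double[] input = new double[56]; // one one-hot encoded instance
        input[1] = 1.0;     // e.g. non-English speaker (0,1)
        // ... set the instructor, course and semester bits here ...
        input[55] = 0.7302; // normalized class size, assumed to be the last input

        neuralNet.setInput(input);
        neuralNet.calculate();
        double[] output = neuralNet.getOutput(); // [low, medium, high]
        System.out.println(Arrays.toString(output));
    }
}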

Training attempt 2

Step 5.2. Train the network

We will try again with different training parameters.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 2

Training Parameters:
Learning Rate: 0.3
Momentum: 0.7
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

After training the network with these parameters, we got better results.

The Total Net Error graph looks like this:

Step 6.2. Testing the Neural Network

The table below also presents the results of the next two training sessions for the first architecture. Graphs are not given for those trainings.

Table 2. Training results for the first architecture

Training attempt  Hidden Neurons  Learning Rate  Momentum  Max Error  Number of iterations  Total Net Error

1. 2 0.2 0.7 0.01 34312 0.060154
2. 2 0.3 0.7 0.01 9660 0.017362
3. 2 0.6 0.5 0.01 32972 0.077339
4. 2 0.8 0.8 0.01 14150 0.104862

In the following solution we will increase the number of hidden neurons and try to achieve better results.

Training attempt 5

Step 4.5 Create a Neural Network

The next neural network will have the same number of input and output neurons but a different number of neurons in the hidden layer. We will use 6 hidden neurons.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 6

Step 5.5. Train the network

We will start the first training of the second architecture with extremely low values of the learning rate and momentum. In the 'Set Learning parameters' dialog, enter 0.05 for 'Learning rate' and 0.1 for 'Momentum'.

Training Parameters:
Learning Rate: 0.05
Momentum: 0.1
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

As you can see, the neural network took 4622 iterations to train. The Total Net Error is still higher than the set value.

From the graph it can be seen that from iteration to iteration there are no large shifts in the prediction, and the fluctuations are very small. The reason for such small fluctuations is that the learning rate is very close to zero. On the other hand, the small value of momentum slows down the training of the system.

Training attempt 6

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 6

Step 5.6. Train the network

Training Parameters:
Learning Rate: 0.3
Momentum: 0.6
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

As you can see, the neural network took 4774 iterations to train. The Total Net Error is still higher than the set value.

The Total Net Error graph looks like this:

The objective error function heavily oscillates and the network reaches a state where no useful training takes place.

Training attempt 7

Now we will try something completely different: for the training parameters we will take the upper limits, so the learning rate will be 0.9, the momentum 0.9, and the max error 0.05.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 6

Step 5.7. Train the network

Training Parameters:
Learning Rate: 0.9
Momentum: 0.9
Max. Error: 0.05

Training Results:
For this training, we used the Sigmoid transfer function.

As you can see, the neural network took 2174 iterations to train. The Total Net Error is still higher than the set max error value (0.05).

The Total Net Error graph looks like this:

In the picture we can see the difference between small and large values of the learning parameters. We set the momentum parameter too high and created a risk of overshooting the minimum, which caused the system to become unstable. On top of that, because the learning rate is very large, the weights diverge and the objective error function oscillates heavily.

Table 3. Training results for the architecture with 6 hidden neurons

Training attempt  Hidden Neurons  Learning Rate  Momentum  Max Error  Number of iterations  Total Net Error

5. 6 0.05 0.1 0.01 4622 0.039424
6. 6 0.3 0.6 0.01 4774 0.034662
7. 6 0.9 0.9 0.05 2174 0.080210

 

After several attempts with different architectures and parameters, we got the results given in Table 3. There is an interesting pattern in the data: if we look at the number of hidden neurons and the total net error, we can see that a higher number of neurons leads to a lower total net error.

Table 4. Training results for other architectures

Training attempt  Hidden Neurons  Learning Rate  Momentum  Max Error  Number of iterations  Total Net Error

8. 10 0.01 0.6 0.01 7626 0.220573
9. 10 0.7 0.7 0.01 2251 0.178550
10. 14 0.2 0.7 0.01 2399 0.047535
11. 14 0.3 0.7 0.01 2160 0.074713
12. 18 0.2 0.7 0.01 2743 0.009246
13. 18 0.3 0.6 0.01 2702 0.011334

 

Training attempt 14

Step 4.14. Create a Neural Network

One of the "rules" for determining the correct number of neurons to use in the hidden layers is that the number of hidden neurons should be between the size of the input layer and the size of the output layer. Formula that we used looks like this: ((number of inputs number of outputs)/2)+1. So,in the next example we are going to see how will the network react with a greater number of hidden neurons. This neural network will contain 30 neurons in hidden layer, as we see in picture below.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 30

Step 5.14. Train the network

Training Parameters:
Learning Rate: 0.5
Momentum: 0.7
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

As you can see, the neural network took 5218 iterations to train. The Total Net Error is an acceptable 0.004704.

The Total Net Error graph looks like this:

The total net error descends slowly, with high oscillation, and finally stops when it reaches a level lower than the given maximum (0.01), at iteration 5218.

6.14. Test the network

The Total Mean Square Error measures the average of the squares of the "errors". A mean squared error of zero, meaning that the estimator predicts observations of the parameter with perfect accuracy, is the ideal but is practically never possible. The test showed that the total mean squared error is 0.010975562511674352. The goal of experimental design is to construct experiments in such a way that, when the observations are analyzed, the mean squared error is close to zero relative to the magnitude of at least one of the estimated treatment effects.
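For reference, here is a sketch of how such a total mean squared error can be computed over a test set with the Neuroph 2.x API (the helper class is hypothetical; Neuroph Studio computes this value for us):

import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class MeanSquaredError {

    // Average of the squared differences between desired and actual outputs.
    public static double totalMse(NeuralNetwork net, DataSet testSet) {
        double sum = 0.0;
        int count = 0;
        for (DataSetRow row : testSet.getRows()) {
            net.setInput(row.getInput());
            net.calculate();
            double[] actual = net.getOutput();
            double[] desired = row.getDesiredOutput();
            for (int i = 0; i < desired.length; i++) {
                double error = desired[i] - actual[i];
                sum += error * error;
                count++;
            }
        }
        return sum / count;
    }
}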

We also need to examine all the individual errors to make sure that testing was completely successful. We have a large data set, so individual testing can require a lot of time. But at first sight it is obvious that in this case the individual errors are also much smaller than in previous attempts; there are very few extreme cases. This time, we will randomly choose 5 observations to be subjected to individual testing.

Table 5. Observations and their testing results are in the following table:

Inputs Class Testing results
Number English speaker Course instructor Course Semester Class size Low Medium High Outputs (Low, Medium, High) Errors (Low, Medium, High)
1. 1,0 0,0,..1(6),0..(25) 0,0,..1(17),0..(26) 1,0 0.5714 1 0 0 0.9992 0.0087 0 -0.0008 0.0087 0
2. 1,0 0,0,..1(15),0..(25) 0,0,..1(3),0..(26) 0,1 0.2222 1 0 0 1 0 0 0 0 0
3. 1,0 0,0,..1(14),0..(25) 0,0,..1(15),0..(26) 1,0 0.5238 1 0 0 0.507 0.4929 0 -0.493 0.4929 0
4. 1,0 0,0,..1(7),0..(26) 0,0,..1(11),0..(26) 1,0 0.1587 0 1 0 0 0.9999 0 0 -0.0001 0
5. 0,1 0,0,..1(11),0..(26) 0,0,..1(11),0..(26) 1,0 0.5079 0 1 0 0 1 0 0 0 0

Advanced training techniques

One of the major advantages of neural networks is their ability to generalize. This means that a trained network can classify data from the same class as the learning data that it has never seen before. In real-world applications, developers normally have only a small part of all possible patterns available for the generation of a neural network. To reach the best generalization, the data set should be split into three parts: a validation, a training, and a testing set.

The validation set contains a smaller percentage of instances from the initial data set and is used to determine whether the selected network architecture is good enough; only if validation is successful do we proceed with training. The training set is applied to the neural network for learning and adaptation. The testing set is then used to determine the performance of the neural network by computing an error metric.

This chapter will show another technique for training a neural network that involves validation and generalization. So far we have been training the network with 90% of our data and testing it with the rest (10%). Now we will try different combinations, such as 80%:20% and 70%:30%, and see if we can find any noticeable differences between them.
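A minimal sketch of such a random split with the Neuroph 2.x API (the helper class is hypothetical; newer Neuroph versions also ship built-in sampling utilities for this):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class SplitDataSet {

    // Randomly split a data set, e.g. trainRatio = 0.7 for a 70%:30% split.
    public static DataSet[] split(DataSet full, double trainRatio) {
        List<DataSetRow> rows = new ArrayList<>(full.getRows());
        Collections.shuffle(rows);
        int cut = (int) (rows.size() * trainRatio);
        DataSet train = new DataSet(full.getInputSize(), full.getOutputSize());
        DataSet test = new DataSet(full.getInputSize(), full.getOutputSize());
        for (int i = 0; i < rows.size(); i++) {
            (i < cut ? train : test).addRow(rows.get(i));
        }
        return new DataSet[] { train, test };
    }
}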


Training attempt 15

Step 3.15. Create a Training Set

We will randomly choose 70% of the instances for training and the remaining 30% for testing.

Step 5.15. Train the network

Unlike in previous trainings, there is now no need to create a new neural network. The advanced training technique consists in examining the performance of an existing architecture using new training and test sets of data.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 18

Training Parameters:
Learning Rate: 0.2
Momentum: 0.7
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

After training the network, we can see that the error is of approximately acceptable value.

The Total Net Error graph looks like this:


Interestingly, from the graph we can see that there are no error oscillations and that the error decreases steadily throughout the training of the network.

Step 6.15. Test the network

After successfully training the neural network, we can test it to discover whether the results will be as good as in the previous testing. Unlike our previous practice, where we trained and tested the neural network using the same training set, we will now test the network with the second set, which contains data that the neural network has not seen.



Training attempt 16

Step 3.16. Create a Training Set

We will randomly choose 80% of the instances for training and the remaining 20% for testing.

Step 5.16. Train the network

As in the previous training, there is no need to create a new neural network. We will now train the network with 80% randomly selected data from the full set and see the achieved results.

Network Type: Multi Layer Perceptron
Training Algorithm: Backpropagation with Momentum
Number of inputs: 56
Number of outputs: 3 ("low", "medium", and "high")
Hidden neurons: 18

Training Parameters:
Learning Rate: 0.4
Momentum: 0.6
Max. Error: 0.01

Training Results:
For this training, we used the Sigmoid transfer function.

From the picture above we can see that after 4210 iterations the value of the total net error is 0.009289, so we can conclude that the training was successful.

The Total Net Error graph looks like this:


As we can see in the image above, the network was successfully trained. It took 4210 iterations for the training process to finish.

Step 6.16. Test the network

After successful training, we can now test the neural network. We will test the network with the set that contains the remaining 20% of the initial instances.


From the previous picture we can see that the total mean square error has a value of 0.0118, so we can say that the value is acceptable. If we look at the right section of the picture, we can see that the individual errors are acceptable and that the result of the test is successful. After this training, we can conclude that there is no need to train the network further, as the results were almost the same as in the previous training, so we will briefly summarize the facts and make a final conclusion.


Conclusion

The six different architectures tested in this experiment have shown that the choice of the number of hidden neurons is very important for the effectiveness of a neural network, as can be seen from the graphic below. As it turned out, in our experiment it was better to use more neurons: we tried 2, 6, 10, 14, and 18 hidden neurons, but we got the best results using 30. We have concluded that one layer of hidden neurons is enough in this case. The experiment also showed that the success of a neural network is very sensitive to the parameters chosen in the training process: the learning rate must not be too high, and the maximum error must not be too low.


Below is a table that summarizes this experiment. The best solution for the problem (training attempt 14, with the lowest total mean square error) is marked in the table.

Table 6. Training techniques

Training attempt Number of hidden neurons Number of hidden layers Training set Maximum error Learning rate Momentum Number of iterations Total mean square error Test set
1 2 1 90% of full data set 0.01 0.2 0.7 34312 0.060154 10% of full data set
2 2 1 90% of full data set 0.01 0.3 0.7 9660 0.017362 10% of full data set
3 2 1 90% of full data set 0.01 0.6 0.5 32972 0.077339 10% of full data set
4 2 1 90% of full data set 0.01 0.8 0.8 14150 0.104862 10% of full data set
5 6 1 90% of full data set 0.01 0.05 0.1 4622 0.039424 10% of full data set
6 6 1 90% of full data set 0.01 0.3 0.6 4774 0.034662 10% of full data set
7 6 1 90% of full data set 0.05 0.9 0.9 2174 0.080210 10% of full data set
8 10 1 90% of full data set 0.01 0.01 0.6 7626 0.220573 10% of full data set
9 10 1 90% of full data set 0.01 0.7 0.7 2251 0.178550 10% of full data set
10 14 1 90% of full data set 0.01 0.2 0.7 2399 0.047535 10% of full data set
11 14 1 90% of full data set 0.01 0.3 0.7 2160 0.074713 10% of full data set
12 18 1 90% of full data set 0.01 0.2 0.7 2743 0.009246 10% of full data set
13 18 1 90% of full data set 0.01 0.3 0.6 2702 0.011334 10% of full data set
14 30 1 90% of full data set 0.01 0.5 0.7 5218 0.004704 10% of full data set (best solution)
15 18 1 70% of full data set 0.01 0.2 0.7 5712 0.012648 30% of full data set
16 18 1 80% of full data set 0.01 0.3 0.6 2702 0.011334 20% of full data set



DOWNLOAD

Data set used in this tutorial

Normalized data set

Neuroph project


See also:
Multi Layer Perceptron Tutorial

 
