Predicting relative performance of computer processors with neural networks
By Ivan Jovanovic, Faculty of Organization Sciences, University of Belgrade
An experiment for Intelligent Systems course
Introduction
This experiment shows how neural networks and Neuroph Studio can be used for problems of approximation and prediction. It also shows how this approach differs from the linear regression approach used for predicting relative CPU performance.
Prediction is making claims about something that will happen, often based on information from the past and from the current state. In technical domains, the predictable parameters of a system can often be expressed and evaluated using equations; prediction is then simply the evaluation or solution of such equations. In practice, however, we face problems where such a description would be too complicated or not possible at all. It is possible to use various approximations, for example a regression of the dependency of the predicted variable on other events that is then extrapolated to the future, but finding such an approximation can also be difficult. This approach generally means creating a model of the predicted event.
Neural networks can be used for prediction with various levels of success. Their advantage is that they learn dependencies automatically from measured data alone, without any need to supply further information (such as the type of dependency, as with regression), and that is the reason this approach is used here.
Introduction to the problem
The objective is to train a neural network to predict the relative performance of a CPU from a set of characteristics used as inputs, and then to compare that result with the published relative performance and with the relative performance estimated using a linear regression method.
The data set used in this experiment can be found here. Its name is Computer Hardware (1987-10-01) and the data was donated by Phillip Ein-Dor and Jacob Feldmesser of Tel Aviv University.
The data set contains 209 instances with a total of 10 attributes. The first two attributes are the vendor name and model name of the processor; the rest of the attributes are as follows:
- machine cycle time in nanoseconds (integer)
- minimum main memory in kilobytes (integer)
- maximum main memory in kilobytes (integer)
- cache memory in kilobytes (integer)
- minimum channels in units (integer)
- maximum channels in units (integer)
- published relative performance (integer)
- relative performance from the original article (integer)
For this experiment the data downloaded from the above site cannot be used in its original form because of the differences in units and value ranges between attributes. For instance, machine cycle time ranges between 17 and 1500 nanoseconds, while minimum channels ranges only between 0 and 52 units. That is why the data must first be normalized.
Procedure for training a neural network
In order to train a neural network, there are six steps to be made:
1. Normalize the data
2. Create a Neuroph project
3. Create a training set
4. Create a neural network
5. Train the network
6. Test the network to make sure that it is trained properly
Step 1. Normalize the data
Of the ten attributes, only seven will be normalized. The first two are of no significant interest in this experiment and are not integer values, and the last attribute (estimated relative performance) will be compared with our results, which will later be converted back using the same formula in the other direction.
The following formula is used when normalizing the data:
B = (A - min(A)) / (max(A) - min(A)) * ( D - C ) + C
where B is the normalized value, and C and D determine the range in which we want our values to lie. In this case, C = 0 and D = 1. There exist many different normalization formulae, but this one is quite sufficient.
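Solving the formula for A gives the inverse mapping that will be used later to convert network outputs back to the original scale; with C = 0 and D = 1 it reduces to:

A = B * (max(A) - min(A)) + min(A)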
For fast normalization, a program was written in C# which stores the normalized values in a new .txt file.
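The program itself is not reproduced here; the sketch below shows the same two-pass min-max normalization in Java rather than C# (the file names are illustrative, and the vendor and model name columns are assumed to have been removed already). For example, a machine cycle time of 350 ns would normalize to (350 - 17) / (1500 - 17) ≈ 0.225.

```
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Minimal sketch: reads comma-separated numeric rows, applies
// B = (A - min(A)) / (max(A) - min(A)) to every column, writes the result.
public class Normalize {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("raw_data.txt"));
        int cols = lines.get(0).split(",").length;
        double[] min = new double[cols], max = new double[cols];
        Arrays.fill(min, Double.MAX_VALUE);
        Arrays.fill(max, -Double.MAX_VALUE);

        // First pass: find min(A) and max(A) for each attribute.
        for (String line : lines) {
            String[] parts = line.split(",");
            for (int i = 0; i < cols; i++) {
                double v = Double.parseDouble(parts[i]);
                min[i] = Math.min(min[i], v);
                max[i] = Math.max(max[i], v);
            }
        }

        // Second pass: normalize each value into [0, 1] (C = 0, D = 1).
        try (PrintWriter out = new PrintWriter("normalized_data.txt")) {
            for (String line : lines) {
                String[] parts = line.split(",");
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < cols; i++) {
                    double v = Double.parseDouble(parts[i]);
                    sb.append((v - min[i]) / (max[i] - min[i]));
                    if (i < cols - 1) sb.append(',');
                }
                out.println(sb);
            }
        }
    }
}
```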
Step 2. Create a Neuroph project
In order to create a new project open Neuroph Studio and Click File -> New Project.
Then in the Categories list choose Neuroph and from Project list choose Neuroph Project and then click Next.
Enter your project name and location and click Finish. Our project will be called ComputerHardware.
Neuroph Studio automatically creates three folders in your project:
- Neural Networks – the folder where we will create all our networks
- Training Sets – the folder that will contain our data set
- Test Sets – the folder for test sets; it will not be used in this project
Step 3. Create a training set
Next we will add a training set to our project. A training set is a list of data representing inputs and outputs that will be used to train the neural network. Training simply means finding the weights of the connections between the neurons of our artificial network so that the error between the expected and the produced results is minimal, or below a certain threshold.
This is done by right-clicking the Training Sets folder and choosing New -> Training Set.
In the window that opens we give our training set the name ComputerHardwareTrainingSet1 and select its type. There are two types of training sets: supervised and unsupervised. In supervised learning, the network user assembles a set of training data containing examples of inputs together with the corresponding outputs, and the network learns to infer the relationship between the two; supervised learning is used for tasks such as classification and regression. For an unsupervised learning rule, the training set consists of input patterns only; unsupervised learning, on the other hand, is used for clustering. We will use supervised learning.
We will have 6 inputs and 1 output corresponding to our normalized data file.
After clicking Next we will insert our data.
There are two ways to do that: manually, or, as we will do, by loading from a file. Click Load From File and choose the file in which we stored our normalized data. As the value separator we choose a comma, since that is the character that separates our data.
After clicking Finish, the training set is created. The following three steps will be repeated multiple times in order to find the neural network that best suits our needs.
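For readers who prefer the Neuroph API to the Studio GUI, the same training set can be created in code. This is a minimal sketch, assuming a recent Neuroph 2.x release; the file name is illustrative:

```
import org.neuroph.core.data.DataSet;

public class CreateTrainingSet {
    public static void main(String[] args) {
        // 6 input columns, 1 output column, comma as the value separator.
        DataSet trainingSet =
                DataSet.createFromFile("normalized_data.txt", 6, 1, ",");
        System.out.println("Loaded " + trainingSet.size() + " rows");
    }
}
```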
Training attempt 1
Step 4.1. Create a Neural Network
The first step is to create a new neural network. We do this by right-clicking the Neural Networks folder and choosing New -> Neural Network.
A window appears where we must choose the type of neural network to use. For the rest of this project we will use a Multi Layer Perceptron, a type of artificial neural network used extensively for a number of different problems, including pattern recognition, interpolation and classification.
It is a development of the Perceptron model, originally developed in the early 1960s and today almost synonymous with the term neural network.
After giving our network a name and clicking Next, we set its parameters. We will always have 6 input neurons and 1 output neuron, corresponding to our dataset, but we will vary the number of hidden neurons. We also choose the sigmoid as the transfer function, because our data lies between 0 and 1, the same range as the function's output. The transfer function, or activation function, maps the weighted sum of a neuron's inputs to its output, and the sigmoid is the most widely used choice.
As the learning rule we will use Backpropagation with Momentum. The term backpropagation is an abbreviation of "backward propagation of errors": data is sent from the input layer to the output layer through the hidden layers, and the error between the produced and the expected outputs is then propagated backwards through the network to adjust the weights, teaching the network to distinguish correct from incorrect behavior. This type of network is the most widely used and is considered a "classic" among neural networks. When momentum is used, the current direction of weight movement is influenced by previous weight changes; put simply, once the weights start moving in a particular direction in weight space, they tend to continue moving in that direction. Training with momentum is usually much more efficient than without it.
As the number of hidden neurons in our first attempt we will use two, and then work upwards by adding neurons and layers. Our goal is to achieve a minimal error with a minimal number of neurons, as quickly as we can. More than two hidden layers are rarely used and will not be considered in this research. We will also check the option to use a bias neuron, a neuron that lies in one layer and connects to all neurons in the next layer but to none in the previous one. It always emits an output of 1, which allows the activation function to be shifted to the left or right when necessary.
After clicking Finish our network is created.
The bias neurons can be seen colored in red, with an activation level of 1. In the beginning, all other activation levels are set to 0 by default.
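The same architecture can also be created through the Neuroph API rather than the GUI; a minimal sketch, again assuming a recent Neuroph 2.x release:

```
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

public class CreateNetwork {
    public static void main(String[] args) {
        // 6 inputs, 2 hidden neurons, 1 output, sigmoid transfer function.
        // Neuroph adds bias neurons to a MultiLayerPerceptron by default.
        MultiLayerPerceptron network =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 6, 2, 1);
        network.save("ComputerHardware.nnet");
    }
}
```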
Step 5.1. Train the network
The first thing we need to do is connect our training set to the network, by simply clicking on the training set name while the neural network is selected. To train the network, we click the Train button. A small window appears in which we set the learning parameters, such as Learning Rate and Momentum.
The learning rate controls the size of the changes made to the weights stored in the synapses on each update. It is a constant used by the learning algorithm and must be a positive number less than 1. Generally, setting the learning rate to a larger value makes training progress faster, though too large a value can cause the network to never converge. Another technique is to start with a relatively high learning rate and decrease it as training progresses; this allows rapid initial training of the neural network that is then "fine tuned". In our first attempt we will use a learning rate of 0.2.
The purpose of momentum is to accelerate the convergence of the error backpropagation learning algorithm. The method supplements the current weight adjustment with a fraction of the most recent weight adjustment. Typically, momentum is chosen between 0.1 and 0.8, though values outside this range can work; in our first training attempt we will use 0.9 and try smaller values later.
We can also choose between two types of stopping criteria: limit the number of iterations to complete, or stop when the error falls below a chosen maximum error. We will choose the latter and set the max error to 0.01, since our ultimate goal is to bring the total mean squared error under 0.01. The checkbox that displays the error graph should also be checked, so that we can better see what our network does over time.
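Equivalently, the whole training setup can be expressed through the Neuroph API. This is a minimal sketch, assuming a recent Neuroph 2.x release; the file name is illustrative:

```
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class TrainNetwork {
    public static void main(String[] args) {
        DataSet trainingSet =
                DataSet.createFromFile("normalized_data.txt", 6, 1, ",");
        MultiLayerPerceptron network =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 6, 2, 1);

        // The same parameters as set in the Studio dialog.
        MomentumBackpropagation learningRule = new MomentumBackpropagation();
        learningRule.setLearningRate(0.2);
        learningRule.setMomentum(0.9);
        learningRule.setMaxError(0.01);  // stop when total error drops below this
        network.setLearningRule(learningRule);

        network.learn(trainingSet);      // blocks until the stop condition is met
        network.save("ComputerHardware.nnet");
    }
}
```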
After clicking Train, we see that our network was trained in 10 iterations. By clicking Test we can read the error for each input individually and, at the end, the total net error, which is approximately 0.01797.
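Programmatically, testing amounts to running each row through the network and comparing its output with the desired output. Below is a sketch under the same Neuroph 2.x assumption; note that the exact "total net error" figure Neuroph Studio reports may be computed slightly differently from this plain mean squared error:

```
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class TestNetwork {
    public static void main(String[] args) {
        NeuralNetwork network = NeuralNetwork.createFromFile("ComputerHardware.nnet");
        DataSet testSet = DataSet.createFromFile("normalized_data.txt", 6, 1, ",");

        double sumSquaredError = 0;
        for (DataSetRow row : testSet.getRows()) {
            network.setInput(row.getInput());
            network.calculate();
            double output = network.getOutput()[0];
            double expected = row.getDesiredOutput()[0];
            double error = expected - output;
            System.out.printf("output=%.6f expected=%.6f error=%.6f%n",
                              output, expected, error);
            sumSquaredError += error * error;
        }
        System.out.println("MSE: " + sumSquaredError / testSet.size());
    }
}
```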
Training attempt 2
Step 5.2. Train the network
In the next few attempts we will first change the momentum and then the learning rate, trying to find better results. Before retraining the network we must randomize its weights by clicking the Randomize button. The results after training are shown in the table below.
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error
--- | --- | --- | --- | --- | --- | ---
1. | 2 | 0.2 | 0.9 | 0.01 | 10 | 0.01797
2. | 2 | 0.2 | 0.7 | 0.01 | 31 | 0.01885
3. | 2 | 0.2 | 0.8 | 0.01 | 35 | 0.02173
4. | 2 | 0.4 | 0.9 | 0.01 | 9 | 0.01522
5. | 2 | 0.1 | 0.9 | 0.01 | 56 | 0.01655
As can be seen from the table, none of the five attempts reached the desired error. Decreasing the momentum from 0.9 made the total net error bigger, while changing the learning rate reduced it somewhat, in attempt 5 at the cost of a considerably larger number of iterations, and only to a certain point.
These observations lead us in another direction – we should increase the number of neurons in our hidden layer. With this approach we should get the desired error that is less than 0.01 but it should be done carefully because we do not want more neurons than we need.
Training attempt 6
Step 4.6. Create a Neural Network
We need to create a new neural network in order to test our theory. This process is completed in the same way as in step 4.1. but we will set 4 as the number of hidden neurons.
Step 5.6. Train the network
After clicking Train we choose 0.9 as our momentum and 0.1 as our learning rate. After 22 iterations the error is 0.01499, which is the best result so far but still not under 0.01.
Training attempt 7
Step 5.7. Train the network
Now we will do the same thing we did with our previous network: we will change the momentum to 0.8 and then to 0.7 and see where that leads. In all of these attempts the network converges in under 40 iterations. Our first attempt, with momentum 0.9 and learning rate 0.1, is displayed below together with two more attempts.
The summary of all observations can be seen below
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error
--- | --- | --- | --- | --- | --- | ---
6. | 4 | 0.1 | 0.9 | 0.01 | 22 | 0.01499
7. | 4 | 0.1 | 0.8 | 0.01 | 30 | 0.01696
8. | 4 | 0.1 | 0.7 | 0.01 | 37 | 0.02109
As can be seen, the smallest error is still above our desired level, and changing the momentum didn't help. Notice that we didn't change the learning rate; that is something we will do in our next attempt.
Training attempt 9
Step 5.9. Train the network
Now we will try lowering the learning rate instead. With a learning rate of 0.005 we get a total net error of 0.01969 in 37 iterations, which suggests that we need more neurons in our hidden layer.
Training attempt 10
Step 4.10. Create a Neural Network
We will choose 8 neurons in hidden layer as our final architecture. The other parameters should stay the same.
Step 5.10. Train the network
Once again we start with a learning rate of 0.2 and a momentum of 0.9, and after 30 iterations we get an error of 0.005472, which is exactly what we want.
The data from the previous two attempts is displayed in the next table:
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error
--- | --- | --- | --- | --- | --- | ---
9. | 4 | 0.005 | 0.8 | 0.01 | 37 | 0.01969
10. | 8 | 0.2 | 0.9 | 0.01 | 30 | 0.00547
Step 6.10. Test the network
Now that we have an acceptable solution, we will test our neural network and see how it compares with the linear regression model included in our dataset. We will choose 5 random inputs from our data set, calculate the average error of the network's outputs, and compare it with the error of the linear regression estimates for the same 5 instances. The results are in the table below:
Network output | Expected value | Linear regression estimate | Network error | Regression error
--- | --- | --- | --- | ---
0.012417508 | 0.02972028 | 0.007358953 | 0.017302772 | 0.022361326
0.027087257 | 0.022727273 | 0.010629599 | 0.004359985 | 0.012097673
0.069840154 | 0.11013986 | 0.05478332 | 0.040299706 | 0.05535654
0.005748127 | 0.004370629 | 0.001635323 | 0.001377497 | 0.002735306
0.00538984 | 0.01048951 | 0.004088307 | 0.005099671 | 0.006401203
Average error | | | 0.013687926 | 0.01979041
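Each error column is simply the absolute difference between the estimate and the expected value; for the first row, for example, the network error is |0.012417508 - 0.02972028| = 0.017302772 and the regression error is |0.007358953 - 0.02972028| = 0.022361326. The averages in the last row are the means of the five errors in each column.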
As can be seen above, for all 5 of the randomly chosen inputs our neural network achieved a better result than the linear regression method, and it can be used as an estimator of the output, which in this case is the relative performance of a CPU.
Advanced learning techniques
The real power of neural networks lies in predicting outputs for inputs they were not trained on. In our case this means that we can take novel data describing computer processors and predict the relative performance of those CPUs with reasonable confidence. Such predictions could later be used, for example, by a CPU vendor to set the prices of those processors and compare them with other vendors' products.
In order to test how our network behaves with novel data, we will divide our data set into two parts, a training set and a test set. When dividing a dataset like this there must be enough data to train the network, but not too much: when the network is overtrained (overfitted), any minor deviation from the training data produces a larger error than it should. Such networks are poor at prediction, which is exactly what we want to avoid.
In this research we will try two different proportions, 70/30 and 80/20; in the first case, for example, about 70% of the data will form our training set and the rest will be used to test the network. The instances are chosen randomly, because if we picked only similar data the network would not perform well on data that differs significantly from it.
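The split itself can be done with a short helper program. The sketch below shuffles the normalized data and writes the ND70.txt and ND30.txt files used in the following steps (the input file name is illustrative):

```
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitDataSet {
    public static void main(String[] args) throws IOException {
        // Shuffle the normalized rows, then write 70% / 30% to two files.
        List<String> lines = Files.readAllLines(Paths.get("normalized_data.txt"));
        Collections.shuffle(lines, new Random());
        int cut = (int) (lines.size() * 0.7);
        Files.write(Paths.get("ND70.txt"), lines.subList(0, cut));
        Files.write(Paths.get("ND30.txt"), lines.subList(cut, lines.size()));
    }
}
```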
Training attempt 11
Step 3.11. Create a training set
The first thing we need to do is create a new training set. We once again do this by right-clicking Training Sets -> New -> Training Set. We call it ComputerHardware70 and choose the other parameters the same as in all previous cases. We then load a file called ND70.txt, which contains 70% of the original data set, in the same manner as in the beginning.
Step 5.11. Train the network
After selecting our last neural network with 8 neurons and changing its training set to the new one, we can retrain the network. After just 9 iterations it trained with an error of 0.00591, slightly worse than with the whole dataset. But that alone does not tell us much; to evaluate the network properly we need to create a new test set.
Step 6.11. Test the network
Now we need to add another data set to our Training Sets folder to use as a test set. The file to load is ND30.txt, which contains the remaining 30% of our data set. This time, instead of clicking Train, we click Test. The results are displayed below:
As we can see, our error is 0.02037, which is not a good result. However, looking at the individual results we see only a few extreme errors, mainly in the first few test cases: errors of 0.55, 0.33 and 0.23, which are isolated cases, as can be seen in the picture. This suggests that those cases should perhaps be in the training set instead of the test set.
That means that in our last attempt we will use 80% of our dataset as our training set and the rest as test set. The results are displayed below:
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error (training) | Total Net Error (test)
--- | --- | --- | --- | --- | --- | --- | ---
11. | 8 | 0.2 | 0.9 | 0.01 | 9 | 0.00591 | 0.02037
12. | 8 | 0.2 | 0.9 | 0.01 | 13 | 0.01382 | 0.02566
The results are not promising; we did not succeed in training this network to generalize well to novel data. This is probably because there was little data to begin with, or because our network needs more neurons.
Conclusion
We created three different neural network architectures in this experiment and concluded that, for this problem, the best choice is 8 neurons in a single hidden layer. In all of our test cases the network's estimates were better than the linear regression estimates included in the dataset, so the network can be used to estimate the relative performance of CPUs. The total results of our research can be seen in the tables below.
Table 1. Standard training techniques
Training attempt | Hidden Neurons | Learning Rate | Momentum | Max Error | Number of iterations | Total Net Error | 5 random inputs test - number of correct guesses | Network trained
--- | --- | --- | --- | --- | --- | --- | --- | ---
1. | 2 | 0.2 | 0.9 | 0.01 | 10 | 0.01797 | / | yes
2. | 2 | 0.2 | 0.7 | 0.01 | 31 | 0.01885 | / | yes
3. | 2 | 0.2 | 0.8 | 0.01 | 35 | 0.02173 | / | yes
4. | 2 | 0.4 | 0.9 | 0.01 | 9 | 0.01522 | / | yes
5. | 2 | 0.1 | 0.9 | 0.01 | 56 | 0.01655 | / | yes
6. | 4 | 0.1 | 0.9 | 0.01 | 22 | 0.01499 | / | yes
7. | 4 | 0.1 | 0.8 | 0.01 | 30 | 0.01696 | / | yes
8. | 4 | 0.1 | 0.7 | 0.01 | 37 | 0.02109 | / | yes
9. | 4 | 0.005 | 0.8 | 0.01 | 37 | 0.01969 | / | yes
10. | 8 | 0.2 | 0.9 | 0.01 | 30 | 0.00547 | 5/5 | yes
Table 2. Advanced training techniques
Training attempt | Hidden Neurons | Number of hidden layers | Training set | Test set | Learning Rate | Momentum | Max Error | Number of iterations | Network trained | 5 random inputs test | Total Net Error (training) | Total Net Error (test)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
11. | 8 | 1 | 70% | 30% | 0.2 | 0.9 | 0.01 | 9 | yes | 1/5 | 0.00591 | 0.02037
12. | 8 | 1 | 80% | 20% | 0.2 | 0.9 | 0.01 | 13 | yes | 1/5 | 0.01382 | 0.02566
Below you can download and try these networks, and perhaps enhance the precision of the results.
DOWNLOAD
See also:
Multi Layer Perceptron Tutorial