STOCK MARKET PREDICTION USING NEURAL NETWORKS
An example for timeseries prediction
by Dr. Valentin Steinhauer
Short description
Time series prediction plays an important role in economics. Stock
market prices, as well as energy consumption, can be predicted to
support decision making. This tutorial shows one possible approach to using neural networks for this kind of prediction. It extends the Neuroph tutorial
"Time Series Prediction", which gives a good theoretical basis
for prediction. To show how it works, we trained the network
with DAX (German stock index) data for one month (March 2009, from
the 2nd to the 30th) to predict the value on 31.03.2009. As a strategy,
we take sequences of 4 days and predict each 5th day; in
the training set, the 5th day is the supervised value. The DAX data can be downloaded, for example, from the following URL:
http://download.finance.yahoo.com/d/quotes.csv?s=^GDAXI&f=sl1d1t1c1ohgv&e=.cs
A training set generator (StockFileReader, StockSocketReader and TrainingData)
is available for download as part of the NetBeans project; it is not integrated into the
main program, in order to keep the source code simple. The test data set used:

double[][] days = {
    {2, 3, 2009, 3710.07},  {3, 3, 2009, 3690.72},  {4, 3, 2009, 3890.94},
    {5, 3, 2009, 3695.49},  {6, 3, 2009, 3666.41},  {9, 3, 2009, 3692.03},
    {10, 3, 2009, 3886.98}, {11, 3, 2009, 3914.1},  {12, 3, 2009, 3956.22},
    {13, 3, 2009, 3953.6},  {16, 3, 2009, 4044.54}, {17, 3, 2009, 3987.77},
    {18, 3, 2009, 3996.32}, {19, 3, 2009, 4043.46}, {20, 3, 2009, 4068.74},
    {23, 3, 2009, 4176.37}, {24, 3, 2009, 4187.36}, {25, 3, 2009, 4223.29},
    {26, 3, 2009, 4259.37}, {27, 3, 2009, 4203.55}, {30, 3, 2009, 3989.23},
    {31, 3, 2009, 4084.76}
};
The first three values in each record give the date; the last value
is the DAX level. The next step is the normalization of the
training data into the range (0, 1). It is done in two steps:
 Find the maximum DAX value: maxDax = max(days[k][3], k = 0 .. days.length - 1)
 Calculate the normalized values: daxnorm[i] = (days[i][3] / maxDax) * 0.8 + 0.1,
where the factors 0.8 and 0.1 keep the values away from the very small
(0.0...) and very large (0.999...) extremes. In this example we used a
simplification and just divided by 10000.
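The normalization described above can be sketched in plain Java; the class and method names here are illustrative, not part of Neuroph:

```java
// Sketch of the two normalization variants described above (illustrative names).
public class DaxNormalization {

    // Scaled normalization: maps values into (0.1, 0.9) to avoid
    // the extremes near 0.0 and 1.0.
    static double[] normalizeScaled(double[] dax) {
        double maxDax = 0.0;
        for (double v : dax) {
            if (v > maxDax) maxDax = v;
        }
        double[] norm = new double[dax.length];
        for (int i = 0; i < dax.length; i++) {
            norm[i] = (dax[i] / maxDax) * 0.8 + 0.1;
        }
        return norm;
    }

    // Simplified normalization used in the tutorial: divide by 10000.
    static double[] normalizeSimple(double[] dax) {
        double[] norm = new double[dax.length];
        for (int i = 0; i < dax.length; i++) {
            norm[i] = dax[i] / 10000.0;
        }
        return norm;
    }
}
```

The simplified variant works here because all DAX levels in the sample lie well below 10000, so the results stay inside (0, 1).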
Next, the network topology is defined:
the type of network, the number of layers and the number of neurons per
layer. There is no strict rule for this; it is usually determined experimentally. However, the common type of network used for prediction is
a multi-layer perceptron. A common recommendation is 2n+1 nodes
in the hidden layer, where n is the number of input nodes. The
output layer has a single node in this case. Good results were obtained with
the following topology and parameter set: maxIterations = 10000, learningRate = 0.7, maxError = 0.0001. The training set is organized as follows:
TrainingSet<SupervisedTrainingElement> trainingSet = new TrainingSet<SupervisedTrainingElement>(4, 1);
// Build 4-day input windows; the 5th day is the supervised output.
for (int i = 0; i < daxnorm.length - 5; i++) {
    double[] in = new double[4];   // fresh arrays per pattern,
    double[] out = new double[1];  // so elements do not share state
    for (int j = i; j < i + 4; j++) {
        in[j - i] = daxnorm[j];
    }
    out[0] = daxnorm[i + 4];
    trainingSet.addElement(new SupervisedTrainingElement(in, out));
}

The resulting 17 training patterns (four input values followed by the supervised fifth value):
3710.07 3690.72 3890.94 3695.49 3666.41
3690.72 3890.94 3695.49 3666.41 3692.03
3890.94 3695.49 3666.41 3692.03 3886.98
3695.49 3666.41 3692.03 3886.98 3914.10
3666.41 3692.03 3886.98 3914.10 3956.22
3692.03 3886.98 3914.10 3956.22 3953.60
3886.98 3914.10 3956.22 3953.60 4044.54
3914.10 3956.22 3953.60 4044.54 3987.77
3956.22 3953.60 4044.54 3987.77 3996.32
3953.60 4044.54 3987.77 3996.32 4043.46
4044.54 3987.77 3996.32 4043.46 4068.74
3987.77 3996.32 4043.46 4068.74 4176.37
3996.32 4043.46 4068.74 4176.37 4187.36
4043.46 4068.74 4176.37 4187.36 4223.29
4068.74 4176.37 4187.36 4223.29 4259.37
4176.37 4187.36 4223.29 4259.37 4203.55
4187.36 4223.29 4259.37 4203.55 3989.23
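The sliding-window construction that produces these patterns can be reproduced without Neuroph; a minimal sketch in plain Java (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the 4-in / 1-out sliding-window construction used above.
public class WindowBuilder {

    // Each pattern: the first 4 entries are inputs, the 5th is the supervised value.
    static List<double[]> buildPatterns(double[] series) {
        List<double[]> patterns = new ArrayList<>();
        // Stop at length - 5 so the last value of the series (the value
        // to predict) is never used as a supervised target.
        for (int i = 0; i < series.length - 5; i++) {
            double[] p = new double[5];
            for (int j = 0; j < 4; j++) {
                p[j] = series[i + j];
            }
            p[4] = series[i + 4];
            patterns.add(p);
        }
        return patterns;
    }
}
```

With the 22 values of the sample series this yields exactly the 17 patterns shown above.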

At this point, we are ready to train and test the network. For testing we use a prepared data set with the DAX values of the last four trading days before the target (25.03 to 30.03.2009)
to predict the value on 31.03.2009.

neuralNet.learn(trainingSet);

TrainingSet<TrainingElement> testSet = new TrainingSet<TrainingElement>();
testSet.addElement(new TrainingElement(new double[]{4223.0D / 10000.0D,
        4259.0D / 10000.0D, 4203.0D / 10000.0D, 3989.0D / 10000.0D}));

for (TrainingElement testElement : testSet.trainingElements()) {
    neuralNet.setInput(testElement.getInput());
    neuralNet.calculate();
    double[] networkOutput = neuralNet.getOutput();
    // De-normalize: multiply by 10000 to get the predicted DAX level.
    System.out.println("Predicted DAX: " + networkOutput[0] * 10000.0D);
}
Since the network is initialized with random weight values,
the test results differ from run to run. Five
test runs produced the following predictions for 31.03.2009:
4084.61; 4081.28; 4073.08; 4075.22; 4087.42.
This is a so-called committee: a collection of
differently initialized neural networks that together handle the same example. It
gives a much better result than a single
network. The value officially announced on that day
was 4084.76. We are still far from a truly
usable result, although the calculations with Neuroph
already look good. Good results were also obtained with the Neuroph package in several other market predictions.
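Combining the runs above into a committee result can be sketched as follows. The averaging is straightforward; the relative-scattering measure shown here (spread relative to the mean, in percent) is one plausible definition and an assumption, not necessarily the exact formula used in the tutorial packages:

```java
// Sketch: combine several independently trained runs into a committee
// result by averaging, and measure how much the members disagree.
public class Committee {

    static double mean(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions) sum += p;
        return sum / predictions.length;
    }

    // Relative scattering in percent: (max - min) / mean * 100.
    // Assumed definition, used here only as an illustration.
    static double scatteringPercent(double[] predictions) {
        double min = predictions[0], max = predictions[0];
        for (double p : predictions) {
            if (p < min) min = p;
            if (p > max) max = p;
        }
        return (max - min) / mean(predictions) * 100.0;
    }
}
```

For the five runs listed above, the committee mean is about 4080.3, within 0.2% of the announced value 4084.76.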
The next step toward better quantitative results is to change the sequence of
calculations used in the previous example: we can run the committee members concurrently. The
committee not only adds stability, it also allows an
effective relative control of the training conditions; the relative scattering of the
committee results serves as the figure of merit in this case. For the concurrency we used the jetlang package. The
following table was produced with 10 committee members.
Topology | Max Error | Learning Rate | Scattering % | Predicted value | Max Iterations
4,2,1    | 0.0001    | 0.6           | 0.04         | 4029            | 10000
4,3,1    | 0.0001    | 0.6           | 0.06         | 4041            | 10000
4,4,1    | 0.0001    | 0.6           | 0.08         | 4047            | 10000
4,9,1    | 0.0001    | 0.6           | 0.15         | 4084            | 10000
4,15,1   | 0.0001    | 0.6           | 0.09         | 4123            | 10000
4,31,1   | 0.0001    | 0.6           | 0.03         | 4145            | 10000
The maximum scattering of 0.15% can be interpreted as the point of
maximum sensitivity of the multilayer perceptron to the
given training set. The corresponding topology has 4 input neurons, 9
hidden neurons and 1 output neuron. In that case the number of iterations was not enough to reach the target accuracy.
One step ahead prediction and "river waves"
The previous part is a very simplified introduction. We accepted that we
predict only "one step ahead", and we are concerned with predicting more precisely and minimizing the risk. To bound the task we need a theoretical
stock market model. In this case the model is defined as follows:
every point in an ocean of waves is a forecast destination, as a
function of other times and values.
This dynamic picture was simulated with CUDA; it shows only a fragment of these dynamics at one moment.
Periodic signals prevail in this model. In
this ocean, a point is the value to predict. But which periodic
signal is dominant? Where does the perceptron underfit or overfit?
Is automatic prediction possible in this model? We'll show an algorithm with autocorrection that demonstrates some basic development ideas. For simplicity, we take
only one direction in the ocean: "river waves".

- We use more than one sequence of calculations (see "committee" above).

- The main loop varies the number of points (N) in the time-prediction window. A leading period is determined by N in every variation of the "river waves".

- The number of hidden neurons is set automatically to 2*N + 1.

- As figure of merit we use the so-called R-factor over the full time series:
  R = Σ |F_obs - F_calc| / Σ F_obs

- For each N the training sets are varied: elements are removed consecutively to reach the minimum R-factor and an optimal balance between underfitting and overfitting.

- The mean value over the committee is the result of this simple automatic flow.
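The R-factor defined above translates directly into code; a minimal sketch in plain Java (illustrative names, not part of the downloadable packages):

```java
// Sketch of the R-factor figure of merit:
// R = sum(|F_obs - F_calc|) / sum(F_obs), over the full time series.
public class RFactor {

    static double rFactor(double[] observed, double[] calculated) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < observed.length; i++) {
            num += Math.abs(observed[i] - calculated[i]);
            den += observed[i];
        }
        return num / den;
    }
}
```

A lower R means the calculated series tracks the observed one more closely; R = 0 is a perfect fit.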
How well this model and the simple "river waves"
prediction algorithm suit your tasks, you should decide yourself. The
simple DAX tests showed good results. The package is named "Stock
Market RiverWaves", and a Fourier analyser is included for control. The package does not include the
committee feature, because the committee is already provided in the StockMarketCommittee example.
DOWNLOADS
Note: These download packages are for versions of Neuroph older than 2.6; however, the differences are minor.
1. Neuroph
framework with easyNeurons application
2. NetBeans project for Stock Market Prediction example
3. NetBeans project for Stock Market Committee example
4. NetBeans project for Stock Market RiverWaves example
See also:
Time Series Prediction Tutorial
Chicken Prices Prediction Tutorial
Multi Layer Perceptron Tutorial 