STOCK MARKET PREDICTION USING NEURAL NETWORKS
An example for timeseries prediction
by Dr. Valentin Steinhauer
Short description
Time series prediction plays an important role in economics. Stock
market prices, as well as energy consumption, can be predicted to
support decision making. This tutorial shows one possible approach to using neural networks for this kind of prediction. It extends the Neuroph tutorial
"Time Series Prediction", which gives a good theoretical basis
for prediction. To show how it works, we trained the network
with DAX (German stock index) data for one month (March 2009, from
the 2nd to the 30th) to predict the value on 31.03.2009. As a strategy,
we take sequences of 4 days and predict each 5th day; in
the training set, the 5th day is the supervised value. The DAX data can be downloaded, for example, from the following URL:
http://download.finance.yahoo.com/d/quotes.csv?s=^GDAXI&f=sl1d1t1c1ohgv&e=.cs
A training set generator (StockFileReader, StockSocketReader and TrainingData)
is available for download as part of the NetBeans project; it is not integrated into the
main program, in order to keep the source code simple. The test data set used:

double[][] days = {
    {2, 3, 2009, 3710.07},  {3, 3, 2009, 3690.72},  {4, 3, 2009, 3890.94},
    {5, 3, 2009, 3695.49},  {6, 3, 2009, 3666.41},  {9, 3, 2009, 3692.03},
    {10, 3, 2009, 3886.98}, {11, 3, 2009, 3914.1},  {12, 3, 2009, 3956.22},
    {13, 3, 2009, 3953.6},  {16, 3, 2009, 4044.54}, {17, 3, 2009, 3987.77},
    {18, 3, 2009, 3996.32}, {19, 3, 2009, 4043.46}, {20, 3, 2009, 4068.74},
    {23, 3, 2009, 4176.37}, {24, 3, 2009, 4187.36}, {25, 3, 2009, 4223.29},
    {26, 3, 2009, 4259.37}, {27, 3, 2009, 4203.55}, {30, 3, 2009, 3989.23},
    {31, 3, 2009, 4084.76}
};
The first three values in each record give the date; the last value
is the DAX level. The next step is the normalization of the
training data into the range (0, 1). It is done in two steps:
 Find the maximum DAX value: maxDax = max(days[k][3], k = 0 .. days.length - 1)
 Calculate the normalized values: daxnorm[i] = (days[i][3] / maxDax) * 0.8 + 0.1,
where the factors 0.8 and 0.1 keep the values away from the very small
(0.0...) and very large (0.999...) extremes. In this example we used a
simplification and just divided by 10000.
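The normalization described above can be sketched in plain Java; the class and method names here are illustrative, not part of Neuroph:

```java
// Sketch of the two normalization variants described above (illustrative names).
public class DaxNormalization {

    // Scaled normalization: maps values into (0.1, 0.9) to avoid
    // the extremes near 0.0 and 1.0.
    static double[] normalizeScaled(double[] dax) {
        double maxDax = 0.0;
        for (double v : dax) {
            if (v > maxDax) maxDax = v;
        }
        double[] norm = new double[dax.length];
        for (int i = 0; i < dax.length; i++) {
            norm[i] = (dax[i] / maxDax) * 0.8 + 0.1;
        }
        return norm;
    }

    // Simplified normalization used in the tutorial: divide by 10000.
    static double[] normalizeSimple(double[] dax) {
        double[] norm = new double[dax.length];
        for (int i = 0; i < dax.length; i++) {
            norm[i] = dax[i] / 10000.0;
        }
        return norm;
    }
}
```

The simplified variant works here because all DAX levels in the sample lie well below 10000, so the results stay inside (0, 1).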
Next, the network topology is defined:
the type of network, the number of layers and the number of neurons per
layer. There is no strict rule for this; it is usually determined experimentally. However, the common type of network used for prediction is
a multi-layer perceptron. A common recommendation is 2n+1 nodes
in the hidden layer, where n is the number of input nodes. The
output layer has a single node in this case. Good results were obtained with
the following topology and parameter set: maxIterations = 10000, learningRate = 0.7, maxError = 0.0001. The training set is organized as follows:
TrainingSet<SupervisedTrainingElement> trainingSet = new TrainingSet<SupervisedTrainingElement>(4, 1);
// Build 4-day input windows; the 5th day is the supervised output.
for (int i = 0; i < daxnorm.length - 5; i++) {
    double[] in = new double[4];   // fresh arrays per pattern,
    double[] out = new double[1];  // so elements do not share state
    for (int j = i; j < i + 4; j++) {
        in[j - i] = daxnorm[j];
    }
    out[0] = daxnorm[i + 4];
    trainingSet.addElement(new SupervisedTrainingElement(in, out));
}

The resulting 17 training patterns (four input values followed by the supervised fifth value):
3710.07 3690.72 3890.94 3695.49 3666.41
3690.72 3890.94 3695.49 3666.41 3692.03
3890.94 3695.49 3666.41 3692.03 3886.98
3695.49 3666.41 3692.03 3886.98 3914.10
3666.41 3692.03 3886.98 3914.10 3956.22
3692.03 3886.98 3914.10 3956.22 3953.60
3886.98 3914.10 3956.22 3953.60 4044.54
3914.10 3956.22 3953.60 4044.54 3987.77
3956.22 3953.60 4044.54 3987.77 3996.32
3953.60 4044.54 3987.77 3996.32 4043.46
4044.54 3987.77 3996.32 4043.46 4068.74
3987.77 3996.32 4043.46 4068.74 4176.37
3996.32 4043.46 4068.74 4176.37 4187.36
4043.46 4068.74 4176.37 4187.36 4223.29
4068.74 4176.37 4187.36 4223.29 4259.37
4176.37 4187.36 4223.29 4259.37 4203.55
4187.36 4223.29 4259.37 4203.55 3989.23
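The sliding-window construction that produces these patterns can be reproduced without Neuroph; a minimal sketch in plain Java (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the 4-in / 1-out sliding-window construction used above.
public class WindowBuilder {

    // Each pattern: the first 4 entries are inputs, the 5th is the supervised value.
    static List<double[]> buildPatterns(double[] series) {
        List<double[]> patterns = new ArrayList<>();
        // Stop at length - 5 so the last value of the series (the value
        // to predict) is never used as a supervised target.
        for (int i = 0; i < series.length - 5; i++) {
            double[] p = new double[5];
            for (int j = 0; j < 4; j++) {
                p[j] = series[i + j];
            }
            p[4] = series[i + 4];
            patterns.add(p);
        }
        return patterns;
    }
}
```

With the 22 values of the sample series this yields exactly the 17 patterns shown above.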

At this point, we are ready to train and test the network. For testing we use a prepared data set with the DAX values of the last four trading days before the target (25.03 to 30.03.2009)
to predict the value on 31.03.2009.

neuralNet.learn(trainingSet);

TrainingSet<TrainingElement> testSet = new TrainingSet<TrainingElement>();
testSet.addElement(new TrainingElement(new double[]{4223.0D / 10000.0D,
        4259.0D / 10000.0D, 4203.0D / 10000.0D, 3989.0D / 10000.0D}));

for (TrainingElement testElement : testSet.trainingElements()) {
    neuralNet.setInput(testElement.getInput());
    neuralNet.calculate();
    double[] networkOutput = neuralNet.getOutput();
    // De-normalize: multiply by 10000 to get the predicted DAX level.
    System.out.println("Predicted DAX: " + networkOutput[0] * 10000.0D);
}
Since the network is initialized with random weight values,
the test results differ from run to run. Five
test runs produced the following predictions for 31.03.2009:
4084.61; 4081.28; 4073.08; 4075.22; 4087.42.
This is a so-called committee: a collection of
differently initialized neural networks that together handle the same example. It
gives a much better result than a single
network. The value officially announced on that day
was 4084.76. We are still far from a truly
usable result, although the calculations with Neuroph
already look good. Good results were also obtained with the Neuroph package in several other market predictions.
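Combining the runs above into a committee result can be sketched as follows. The averaging is straightforward; the relative-scattering measure shown here (spread relative to the mean, in percent) is one plausible definition and an assumption, not necessarily the exact formula used in the tutorial packages:

```java
// Sketch: combine several independently trained runs into a committee
// result by averaging, and measure how much the members disagree.
public class Committee {

    static double mean(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions) sum += p;
        return sum / predictions.length;
    }

    // Relative scattering in percent: (max - min) / mean * 100.
    // Assumed definition, used here only as an illustration.
    static double scatteringPercent(double[] predictions) {
        double min = predictions[0], max = predictions[0];
        for (double p : predictions) {
            if (p < min) min = p;
            if (p > max) max = p;
        }
        return (max - min) / mean(predictions) * 100.0;
    }
}
```

For the five runs listed above, the committee mean is about 4080.3, within 0.2% of the announced value 4084.76.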
The next step toward better quantitative results is to change the sequence of
calculations used in the previous example: we can run the committee members concurrently. The
committee not only adds stability, it also allows an
effective relative control of the training conditions; the relative scattering of the
committee results serves as the figure of merit in this case. For the concurrency we used the jetlang package. The
following table was produced with 10 committee members.
Topology | Max Error | Learning Rate | Scattering % | Predicted value | Max Iterations
4,2,1    | 0.0001    | 0.6           | 0.04         | 4029            | 10000
4,3,1    | 0.0001    | 0.6           | 0.06         | 4041            | 10000
4,4,1    | 0.0001    | 0.6           | 0.08         | 4047            | 10000
4,9,1    | 0.0001    | 0.6           | 0.15         | 4084            | 10000
4,15,1   | 0.0001    | 0.6           | 0.09         | 4123            | 10000
4,31,1   | 0.0001    | 0.6           | 0.03         | 4145            | 10000
The maximum scattering of 0.15% can be interpreted as the point of
maximum sensitivity of the multilayer perceptron to the
given training set. The corresponding topology has 4 input neurons, 9
hidden neurons and 1 output neuron. In that case the number of iterations was not enough to reach the target accuracy.
One step ahead prediction and "river waves"
The previous part is a very simplified introduction. We accepted that we
predict only "one step ahead", and we are concerned with predicting more precisely and minimizing the risk. To bound the task we need a theoretical
stock market model. In this case the model is defined as follows:
every point in an ocean of waves is a forecast destination, as a
function of other times and values.
This dynamic picture was simulated with CUDA; it shows only a fragment of these dynamics at one moment.
Periodic signals prevail in this model. In
this ocean, a point is the value to predict. But which periodic
signal is dominant? Where does the perceptron underfit or overfit?
Is automatic prediction possible in this model? We'll show an algorithm with autocorrection that demonstrates some basic development ideas. For simplicity, we take
only one direction in the ocean: "river waves".

- We use more than one sequence of calculations (see "committee" above).

- The main loop varies the number of points (N) in the time-prediction window. A leading period is determined by N in every variation of the "river waves".

- The number of hidden neurons is set automatically to 2*N + 1.

- As figure of merit we use the so-called R-factor over the full time series:
  R = Σ |F_obs - F_calc| / Σ F_obs

- For each N the training sets are varied: elements are removed consecutively to reach the minimum R-factor and an optimal balance between underfitting and overfitting.

- The mean value over the committee is the result of this simple automatic flow.
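The R-factor defined above translates directly into code; a minimal sketch in plain Java (illustrative names, not part of the downloadable packages):

```java
// Sketch of the R-factor figure of merit:
// R = sum(|F_obs - F_calc|) / sum(F_obs), over the full time series.
public class RFactor {

    static double rFactor(double[] observed, double[] calculated) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < observed.length; i++) {
            num += Math.abs(observed[i] - calculated[i]);
            den += observed[i];
        }
        return num / den;
    }
}
```

A lower R means the calculated series tracks the observed one more closely; R = 0 is a perfect fit.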
How well this model and the simple "river waves"
prediction algorithm suit your tasks, you should decide yourself. The
simple DAX tests showed good results. The package is named "Stock
Market RiverWaves", and a Fourier analyser is included for control. The package does not include the
committee feature, because the committee is already provided in the StockMarketCommittee example.
DOWNLOADS
Note: These download packages are for versions of Neuroph older than 2.6; however, the differences are minor.
1. Neuroph
framework with easyNeurons application
2. NetBeans project for Stock Market Prediction example
3. NetBeans project for Stock Market Committee example
4. NetBeans project for Stock Market RiverWaves example
See also:
Time Series Prediction Tutorial
Chicken Prices Prediction Tutorial
Multi Layer Perceptron Tutorial 