Basically, I want to use a sliding window over the historic prices of several companies or funds to predict the price of a single company. In the end, I achieve decent results with a 10-day window when predicting 5 days into the future. However, my error is still several euros :)

The code can be found on GitHub.

### Learning Problem, Features and Modeling

First we define the return on investment (ROI) as:

$roi(t, t + 1) = \frac{x_{t + 1} - x_t}{x_t}$

This is the relative earnings at time $t + 1$ with respect to an investment made at time $t$. To extract the features, we compute sliding windows over our historic prices. For each sample in a window, we compute the ROI relative to the start of the window, which represents the earnings since the start of the window. For a window of $T$ steps and $n$ stocks, we flatten the sliding window and get a feature vector of $T \times n$ ROI entries. The target variable we want to predict is a stock price $d$ days after the last day of the window. The target is converted to an ROI, too. So we try to predict the earnings $d$ days after the last day of the window, which represents a potential investment.
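The feature extraction described above could look roughly like the sketch below. This is not the code from the repository: the names `roi` and `make_windows` are made up for illustration, the ROI is taken to be positive when the price rises, and I assume the target ROI is measured against the last price in the window.

```python
import numpy as np

def roi(earlier, later):
    """Relative change from `earlier` to `later` (positive when the price rose)."""
    return (later - earlier) / earlier

def make_windows(prices, target_col, T, d):
    """Build ROI features and targets from a (days, n) price matrix.

    Returns X with shape (samples, T * n) and y with shape (samples,).
    """
    prices = np.asarray(prices, dtype=float)
    days, n = prices.shape
    X, y = [], []
    for start in range(days - T - d + 1):
        window = prices[start:start + T]           # shape (T, n)
        features = roi(window[0], window)          # ROI of every day vs. window start
        last = window[-1, target_col]              # assumed day of the investment
        future = prices[start + T - 1 + d, target_col]
        X.append(features.ravel())                 # flatten to T * n entries
        y.append(roi(last, future))                # target: ROI d days later
    return np.array(X), np.array(y)

# Tiny made-up example: 5 days, 2 stocks, window of 2 days, 1 day ahead.
prices = [[100, 10], [110, 10], [121, 20], [100, 20], [150, 40]]
X, y = make_windows(prices, target_col=0, T=2, d=1)
```

Note that the first $n$ entries of every feature vector are zero in this sketch, since the first day of a window is compared with itself.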

### Some Experimental Results

First we downloaded several historic price datasets from Yahoo Finance and loaded them using pandas.
We take the date column as the index and linearly interpolate missing values:

    import pandas as pd

    def load_prices(path):
        # First column (the date) becomes the index; 'null' entries are
        # treated as missing and filled by linear interpolation.
        return pd.read_csv(path, index_col=0, na_values='null').interpolate('linear')

    data = [
        ('euroStoxx50', load_prices('data/stoxx50e.csv')),
        ('dax', load_prices('data/EL4A.F.csv')),
        ('us', load_prices('data/EL4Z.F.csv')),
        ('xing', load_prices('data/O1BC.F.csv')),
        ('google', load_prices('data/GOOGL.csv')),
        ('facebook', load_prices('data/FB2A.DE.csv')),
        ('amazon', load_prices('data/AMZN.csv')),
    ]
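To build windows across several stocks, the individual tables need to share the same dates. One possible way to line them up is sketched below. It assumes each table has a `Close` column (Yahoo Finance exports usually do), and `align_close_prices` is a hypothetical helper, not something from the original code.

```python
import pandas as pd

def align_close_prices(named_frames, column="Close"):
    """Combine (name, DataFrame) pairs into one table with a column per stock.

    Only dates present in every table are kept (inner join on the index).
    """
    series = [frame[column].rename(name) for name, frame in named_frames]
    return pd.concat(series, axis=1, join="inner").sort_index()

# Made-up stand-ins for two loaded price tables.
a = pd.DataFrame({"Close": [1.0, 2.0, 3.0]},
                 index=["2018-01-01", "2018-01-02", "2018-01-03"])
b = pd.DataFrame({"Close": [10.0, 20.0, 30.0]},
                 index=["2018-01-02", "2018-01-03", "2018-01-04"])
aligned = align_close_prices([("a", a), ("b", b)])
```

The resulting table has one row per shared date and one column per stock, which is the shape a sliding window over $n$ stocks needs.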

We train the following regressors on our data.

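    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF
    from sklearn.neighbors import KNeighborsRegressor

    # KerasPredictor is not a scikit-learn or Keras class; it is presumably
    # a wrapper defined in the accompanying code.
    predictors = [
        ('RF', RandomForestRegressor(n_estimators=250)),
        ('GP', GaussianProcessRegressor(kernel=RBF(length_scale=2.5))),
        ('NN', KNeighborsRegressor(n_neighbors=80)),
        ('NE', KerasPredictor(model, 10, 512, False)),
        ('GB', GradientBoostingRegressor(n_estimators=250)),
    ]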

The neural network's architecture (named model above):

    from keras.layers import Dense
    from keras.models import Sequential

    hidden = [256, 128, 64, 32]
    inp = (len(data.stoxx) * WIN,)

    # Fully connected network: four ReLU layers and one linear output.
    model = Sequential()
    model.add(Dense(hidden[0], activation='relu', input_shape=inp))
    for h in hidden[1:]:
        model.add(Dense(h, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile('adam', 'mse')
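The scikit-learn regressors from the list above can be fitted and compared in a simple loop. The sketch below is only an illustration: the chronological 80/20 split, the helper name `evaluate` and the synthetic data are my assumptions, since the post does not say how the data was split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

def evaluate(predictors, X, y, train_fraction=0.8):
    """Fit each (name, estimator) pair on the earliest samples, score on the rest."""
    split = int(len(X) * train_fraction)
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    scores = {}
    for name, estimator in predictors:
        estimator.fit(X_train, y_train)
        residual = estimator.predict(X_test) - y_test
        scores[name] = float(np.sqrt(np.mean(residual ** 2)))  # RMSE on the ROI scale
    return scores

# Synthetic stand-in for the ROI features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 0.5 * X[:, 0] + rng.normal(scale=0.01, size=200)

scores = evaluate([
    ('RF', RandomForestRegressor(n_estimators=20, random_state=0)),
    ('NN', KNeighborsRegressor(n_neighbors=5)),
], X, y)
```

Keeping the samples in time order for the split avoids training on days that come after the test period.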

Below we show some prediction results for our regressors. We also note the root mean square error in euros.
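Since the models predict ROI values, reporting an error in euros presumably means converting the predictions back to prices first. A minimal sketch of that conversion, assuming the ROI is measured against the last price in the window and is positive for gains (the function names are made up for illustration):

```python
import numpy as np

def roi_to_price(last_price, roi):
    """Invert roi = (future - last) / last to recover the future price."""
    return last_price * (1.0 + roi)

def rmse_in_euros(last_prices, roi_true, roi_pred):
    """Root mean square error between true and predicted prices."""
    last_prices = np.asarray(last_prices, dtype=float)
    true_prices = roi_to_price(last_prices, np.asarray(roi_true, dtype=float))
    pred_prices = roi_to_price(last_prices, np.asarray(roi_pred, dtype=float))
    return float(np.sqrt(np.mean((true_prices - pred_prices) ** 2)))

# Made-up numbers: true prices are 110 and 200, predicted prices 100 and 210.
error = rmse_in_euros([100.0, 200.0], roi_true=[0.1, 0.0], roi_pred=[0.0, 0.05])
```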
