In this post I look at Gaussian processes, a probabilistic kernel model. I am basically following the book "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.
The Basics
For inputs x_i and outputs y_i = f(x_i), we infer a distribution over functions given the data: p(f | X, y). In order to make a prediction y_* for a new data point x_* given all the previous observations, we compute:

p(y_*|x_*, X, y) = \int p(y_*|f, x_*) p(f|X, y) df

For a Gaussian process this integral can be evaluated in closed form, since all the distributions involved are Gaussian.
For our data X = x_1 ... x_N, the Gaussian process assumption is that the distribution over function values p(f_1 ... f_N) is jointly Gaussian with mean \mu(x) and a covariance \Sigma given by \Sigma_{ij} = k(x_i, x_j) for a kernel k.
Covariance structure
It is quite common to leave the mean function at zero. Given a kernel, we then need to compute the covariance between all training and test instances. We structure the resulting kernel matrix in the following way.
The sub-matrix K describes the affinity between all training instances, the sub-matrix K_* describes the affinity between training and test instances, and the sub-matrix K_{**} describes the affinity among the test instances themselves. With that, the training outputs f and the test outputs f_* are jointly Gaussian: (f, f_*) ~ N(0, \Sigma).
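Written out, \Sigma has the following block structure:

\Sigma = \begin{pmatrix} K & K_* \\ K_*^T & K_{**} \end{pmatrix}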
In order to obtain the prediction, we calculate the mean: f_* = K_*^T \bullet K^{-1} \bullet f.
If we want to find the variance of the predictions, we calculate the diagonal of the predictive covariance: diag(K_{**} - K_* ^T \bullet K^{-1} \bullet K_*).
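Both prediction equations are instances of the standard rule for conditioning a partitioned Gaussian (covered in Murphy's book in the chapter on Gaussian models): if

(a, b) ~ N(0, \begin{pmatrix} A & C \\ C^T & B \end{pmatrix})

then

a | b ~ N(C B^{-1} b, A - C B^{-1} C^T)

Plugging in a = f_*, b = f, A = K_{**}, B = K and C = K_*^T recovers the mean and variance above.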
In my opinion the method is very similar to support vector machines, and to instance-based learning methods in general. The difference is that an SVM uses only a small subset of the training examples, called support vectors, while the Gaussian process compares every example to all others in order to make a prediction.
A Naive Gaussian Process
Below I implemented a naive Gaussian process using numpy. Basically, I am building the three components of the covariance structure and then solve the prediction equations above. I call my method naive since computing the inverse has a worst-case complexity of O(n^3), so it is not advisable to use this method for larger datasets.
import numpy as np

class GaussianProcess:
    def __init__(self, kernel):
        self.kernel = kernel

    def predict(self, train_x, train_y, test_x):
        (n, _) = train_x.shape
        (m, _) = test_x.shape
        # The three blocks of the joint covariance: K, K_* and K_**
        covar_train = np.zeros((n, n))
        covar_train_test = np.zeros((n, m))
        covar_test = np.zeros((m, m))
        # K: affinity between all training instances
        for i in range(0, n):
            for j in range(0, n):
                covar_train[i, j] = self.kernel.k(train_x[i], train_x[j], i == j)
        # K_**: affinity among the test instances
        for i in range(0, m):
            for j in range(0, m):
                covar_test[i, j] = self.kernel.k(test_x[i], test_x[j], i == j)
        # K_*: affinity between training and test instances
        for i in range(0, n):
            for j in range(0, m):
                covar_train_test[i, j] = self.kernel.k(train_x[i], test_x[j], False)
        # Naive O(n^3) inversion of the training covariance
        covar_inv = np.linalg.inv(covar_train)
        # Mean prediction: f_* = K_*^T K^{-1} f
        prediction = np.dot(np.dot(covar_train_test.T, covar_inv), train_y)
        # Predictive variance: diag(K_** - K_*^T K^{-1} K_*)
        confidence = covar_test - np.dot(covar_train_test.T, np.dot(covar_inv, covar_train_test))
        confidence = np.sqrt(np.array([confidence[i, i] for i in range(0, m)]))
        return prediction, confidence
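As an aside, the same predictions can be computed without forming K^{-1} explicitly, by factoring K = L L^T with a Cholesky decomposition, which is more stable numerically. Below is a minimal sketch using the same covariance blocks as in the class above; the function predict_cholesky and its signature are my own for illustration:

import numpy as np

def predict_cholesky(covar_train, covar_train_test, covar_test, train_y):
    # Factor K = L L^T instead of inverting K directly.
    chol = np.linalg.cholesky(covar_train)
    # alpha = K^{-1} f, computed via two triangular solves.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, train_y))
    # Mean prediction: f_* = K_*^T alpha
    prediction = np.dot(covar_train_test.T, alpha)
    # v = L^{-1} K_*, so that K_*^T K^{-1} K_* = v^T v.
    v = np.linalg.solve(chol, covar_train_test)
    covar_posterior = covar_test - np.dot(v.T, v)
    confidence = np.sqrt(np.diag(covar_posterior))
    return prediction, confidence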
Squared Exponential Kernel
For my experiments I use a squared exponential kernel and also add a variance parameter for noisy observations:

k(x_i, x_j) = \sigma_f^2 e^{-\frac{1}{2l^2} || x_i - x_j ||_2^2} + \sigma_y^2 \mathbb{I}(i == j)
The parameters are as follows:
- \sigma_f^2: Signal variance giving the scale of the output
- \sigma_y^2 \mathbb{I}(i == j): Noise variance applied to the diagonal
- l: Length scale controlling how quickly the correlation decays with input distance
class SquaredExponential:
    def __init__(self, variance, alpha, noise):
        # variance = sigma_f^2, alpha = length scale l, noise = sigma_y^2
        self.alpha = alpha
        self.variance = variance
        self.noise = noise
        self.scaler = -1.0 / (2 * alpha**2)

    def k(self, a, b, on_diag):
        # Squared euclidean distance ||a - b||^2
        error = np.sum(np.square(a - b))
        f = np.exp(self.scaler * error)
        scaled = self.variance * f
        if on_diag:
            # Add the noise variance on the diagonal
            scaled += self.noise
        return scaled
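To see both classes in action, here is a minimal usage sketch on a sine wave, matching the first experiment below; the hyperparameter values are ones I picked for illustration, not tuned values:

import numpy as np

# Ten training points sampled from one period of a sine wave.
train_x = np.linspace(0, 2 * np.pi, 10).reshape(-1, 1)
train_y = np.sin(train_x).ravel()
# A finer grid of test inputs to interpolate between them.
test_x = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)

kernel = SquaredExponential(variance=1.0, alpha=1.0, noise=1e-4)
gp = GaussianProcess(kernel)
prediction, confidence = gp.predict(train_x, train_y, test_x)

The confidence values shrink near the training points and grow between them, which is exactly the uncertainty behavior we want from a Gaussian process.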
Results on some simple example datasets
I tried my method on three example datasets: twice I am interpolating stock market data, and once a sine wave. Each time the index is the input to the method, and for the stock market data the output is the closing price.

Sine Wave Interpolation:

Stock Interpolation Dax:

Stock Interpolation US:
