I'm currently working on this. You can find the preliminary source code of a C++ implementation here.
I recommend standardizing only the shortest time frame of data used for training the model, using the following formula: Dt = Di - mean(Di), then Dt = Dt / median(abs(Dt)) / Rt, where Dt denotes the data used to train the model, Di is the original input data, and Rt is the preferred range of the training data.
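As a sketch, that standardization step could be implemented as below, assuming the data fits in a std::vector<double> and Rt is supplied by the caller; the function name is illustrative and not part of the Tempus sources:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Standardize as described above: subtract the mean, then divide by the
// median absolute deviation and by the preferred range Rt.
std::vector<double> standardize(std::vector<double> d, const double rt)
{
    const double mean = std::accumulate(d.begin(), d.end(), 0.0) / d.size();
    for (auto &v : d) v -= mean;                      // Dt = Di - mean(Di)

    // (Upper) median of the absolute deviations.
    std::vector<double> abs_dev(d.size());
    std::transform(d.begin(), d.end(), abs_dev.begin(),
                   [](const double v) { return std::fabs(v); });
    std::nth_element(abs_dev.begin(), abs_dev.begin() + abs_dev.size() / 2, abs_dev.end());
    const double mad = abs_dev[abs_dev.size() / 2];

    for (auto &v : d) v /= mad * rt;                  // Dt = Dt / median(abs(Dt)) / Rt
    return d;
}
```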
The ideal kernel matrix of a support vector machine should be antisymmetric rather than positive semi-definite (PSD), contrary to the requirement stated in the original SVM paper by Vapnik and Chervonenkis. The proof follows below.
My work on Tempus during the past 5 years included developing a multi-layered SVM that uses a support vector regressor as the kernel of an SVM. The support vector regression machine used as a kernel is trained on a dataset produced from the ideal kernel matrix for the given problem; this dataset is called a support vectors manifold. An implementation of a manifold dataset can be seen in manifolds.cpp, while the prediction method is implemented in predict.cpp. The ideal kernel matrix is antisymmetric and is generated from the differences of labels (L1 - L2 as well as L2 - L1, where K(1, 2) = L1 - L2 = -K(2, 1) = -(L2 - L1)) together with the concatenated feature vectors, i.e. L1 - L2 -> F1 § F2 and L2 - L1 -> F2 § F1, where § denotes concatenation of the feature vectors. You can see a preliminary implementation in OnlineMIMOSVR::get_reference_kernel_matrix().
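A minimal sketch of how such a manifold dataset could be assembled, assuming the labels and feature vectors live in plain std::vector containers; the struct and function names below are illustrative and are not taken from manifolds.cpp or OnlineMIMOSVR::get_reference_kernel_matrix():

```cpp
#include <vector>

struct ManifoldSample {
    std::vector<double> features;  // F_i § F_j (concatenated feature vectors)
    double label;                  // K(i, j) = L_i - L_j
};

// Build the antisymmetric reference dataset: for every ordered pair (i, j)
// emit the concatenated features and the label difference, so that
// K(i, j) = L_i - L_j = -K(j, i).
std::vector<ManifoldSample> build_manifold(
    const std::vector<std::vector<double>> &features,
    const std::vector<double> &labels)
{
    std::vector<ManifoldSample> manifold;
    for (size_t i = 0; i < labels.size(); ++i)
        for (size_t j = 0; j < labels.size(); ++j) {
            ManifoldSample s;
            s.features = features[i];                                   // F_i
            s.features.insert(s.features.end(),
                              features[j].begin(), features[j].end());  // § F_j
            s.label = labels[i] - labels[j];                            // L_i - L_j
            manifold.push_back(s);
        }
    return manifold;
}
```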
Update: The ideal kernel matrix cannot be positive semi-definite, and support vector machine theory should be amended so that it no longer has to obey the Karush-Kuhn-Tucker conditions, since having an ideal kernel function removes the need for a weights vector. The kernel distances can be negative and, consequently, the weights vector is always set to 1. Therefore, the prediction process for a 3x3 kernel matrix should look like this:
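The original illustration of that step is not reproduced here; below is a minimal sketch of one way to read it, assuming the ideal kernel gives K(x, x_i) = L(x) - L_i and all weights are fixed to 1 (the function and variable names are illustrative):

```cpp
#include <vector>

// Sketch: with K(x, x_i) = L(x) - L_i and unit weights, averaging the kernel
// row and adding back the mean of the training labels recovers the label of
// the query point:
//   (1/n) * sum_i K(x, x_i) + mean(L) = L(x) - mean(L) + mean(L) = L(x)
double predict(const std::vector<double> &kernel_row,   // K(x, x_i), i = 1..n
               const std::vector<double> &labels)       // L_i,       i = 1..n
{
    double k_sum = 0.0, l_sum = 0.0;
    for (size_t i = 0; i < labels.size(); ++i) {
        k_sum += kernel_row[i];   // weights are all 1, so no alpha_i factor
        l_sum += labels[i];
    }
    return (k_sum + l_sum) / labels.size();
}
```

For the 3x3 case above, kernel_row would hold the three ideal-kernel values of the query point against the three training samples, and the averaged sum recovers the query label exactly.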
Prof. Emanouil Atanasov proposed that the ideal kernel matrix could be turned into a positive semi-definite matrix by applying exp(-lambda*A) element-wise to the distance matrix, where lambda is a parameter.
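As a sketch, the proposed transform amounts to the following element-wise operation; the Matrix alias and function name are illustrative, and the choice of lambda is left to the caller:

```cpp
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Apply exp(-lambda * a) to every element of the distance matrix, as proposed
// above; lambda is a tuning parameter.
Matrix exponentiate(const Matrix &a, const double lambda)
{
    Matrix out = a;
    for (auto &row : out)
        for (auto &v : row)
            v = std::exp(-lambda * v);
    return out;
}
```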
Prediction alpha is measured by comparing the L1 error of a model against the error of using the last known value at the time of training as the prediction of the variable. Example:
MAE = mean(|Pv - Av|) is the mean absolute error of the predicted values Pv versus the actual values Av.
MAE_LK = mean(|Lv - Av|) is the mean absolute error of using the last known values Lv as predictions.
Ap = 100 * (MAE_LK / MAE - 1) is the prediction alpha in percentage points: the mean absolute error of the last-known-value baseline divided by the L1 error of the model, minus one, multiplied by a hundred.
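A minimal sketch of this computation, assuming the predicted, last-known and actual values are aligned vectors of equal length; the function names are illustrative:

```cpp
#include <cmath>
#include <vector>

// Mean absolute error between two aligned series.
double mae(const std::vector<double> &a, const std::vector<double> &b)
{
    double sum = 0.0;
    for (size_t i = 0; i < a.size(); ++i) sum += std::fabs(a[i] - b[i]);
    return sum / a.size();
}

// Prediction alpha in percentage points: Ap = 100 * (MAE_LK / MAE - 1).
double prediction_alpha(const std::vector<double> &predicted,   // Pv
                        const std::vector<double> &last_known,  // Lv
                        const std::vector<double> &actual)      // Av
{
    return 100.0 * (mae(last_known, actual) / mae(predicted, actual) - 1.0);
}
```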
Converting analog data and high-precision digital sampling to time-series data of particular resolution inevitably leads to loss of information. Here I describe the problem, its solution or alleviation, and its application to financial data. I'll work on this text as soon as I have permission from my clients.
I'll write more on this subject once permitted by my clients.
A higher-than-appropriate gamma will lead to predictions gravitating towards the training set mean, as seen in the figure below left. The figures below show screenshots of the LibSVM Java applet from Chih-Chung Chang's website predicting labels marked with dense white dots, while the blue squares show the training set used to train the support vector machine before doing the forecast. The kernel used is RBF, with a cost of 8000 and an epsilon of 0. The features and the labels start from 0 and increment by 1.
Fig. 1 - SVR with RBF kernel, gamma is too high
Fig. 2 - SVR with RBF kernel, gamma is appropriate
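For reference, a setup similar to the one in the figures above could be reproduced with the libsvm C/C++ API roughly as follows; the gamma values in the usage comment are placeholders to contrast a too-high setting with an appropriate one, not the exact values used in the applet:

```cpp
#include "svm.h"   // libsvm header

// Epsilon-SVR with an RBF kernel, cost 8000 and epsilon 0, as in the figures.
svm_parameter make_params(const double gamma)
{
    svm_parameter p{};
    p.svm_type    = EPSILON_SVR;
    p.kernel_type = RBF;
    p.C           = 8000;
    p.p           = 0;        // epsilon in the epsilon-insensitive loss
    p.gamma       = gamma;    // too high -> predictions collapse to the training set mean
    p.cache_size  = 100;      // MB
    p.eps         = 1e-3;     // termination tolerance
    return p;
}

// Hypothetical usage: a large gamma (e.g. 10) reproduces the behaviour of
// Fig. 1, while a small gamma (e.g. 0.01) behaves like Fig. 2.
```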