Parameter Estimation for a linear operator using pyGPs

When starting to work on this project, we were looking for a suitable Python package providing the Gaussian Process capabilities we needed. We therefore implemented the early code using the pyGPs package [17], which is best suited for classical Gaussian Process regression and classification. After installing pyGPs and testing it with the provided test data, we encountered an 'Intel MKL FATAL ERROR: Cannot load mkl_intel_thread.dll' error, which we were not able to resolve. Switching from Windows to a Linux cluster circumvented this issue.

Our approach with pyGPs went as follows:

  1. We assume Gaussian Priors with zero mean and an RBF Kernel for u and f, i.e.:

    \[\begin{split}u(x) \sim \mathcal{GP}(0, k_{uu}(x, x'; \sigma_u, l_u)) \\ f(x) \sim \mathcal{GP}(0, k_{ff}(x, x'; \sigma_f, l_f))\end{split}\]
  2. Given the data \(\{X_u, Y_u\}\) and \(\{X_f, Y_f\}\), we can now optimize the hyperparameters of the two Gaussian Processes separately using pyGPs (see the sketch after this list).

  3. Since \(f=\mathcal{L}_x^{\phi} u(x)\), we know that the covariance matrix for f is given by \(k_f = \mathcal{L}_{x'}^{\phi} \mathcal{L}_x^{\phi} k_{uu}\) (cf. Chapter 1.3.1). As an approximation, we set \(k_f = k_{ff}\). As a consequence,

    \[k_f(x_i, x_i) = k_{ff}(x_i, x_i)\]

    must also hold for all data points \(x_i\). Rearranging leads to some function

    \[\phi = g(\sigma_u, \sigma_f, l_u),\]

    which we can evaluate.
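
The following minimal sketch (our own naming; it assumes the data from the example implementation below) illustrates steps 1 and 2, fitting two independent zero-mean GPs with RBF kernels and reading off their optimized hyperparameters:

import numpy as np
import pyGPs

# Data sampled from u and f = L u (cf. the example implementation below):
x = np.linspace(0, 2*np.pi, 15)
y_u = np.sqrt(x)
y_f = 12.0*np.sqrt(x)

def fit_rbf_gp(x, y):
    # Zero-mean GP with an RBF kernel; the hyperparameters are optimized
    # by maximizing the marginal likelihood.
    model = pyGPs.GPR()
    model.setPrior(kernel=pyGPs.cov.RBF())
    model.setData(x, y)
    model.optimize()
    # pyGPs stores the log-hyperparameters: hyp = [log(l), log(sigma)]
    return np.exp(model.covfunc.hyp[0]), np.exp(model.covfunc.hyp[1])

l_u, sigma_u = fit_rbf_gp(x, y_u)   # hyperparameters of the GP for u
l_f, sigma_f = fit_rbf_gp(x, y_f)   # hyperparameters of the GP for f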

This approach worked for simple examples (e.g., the example implementation below), but it failed for more complicated ones. In an attempt to resolve this problem, we wanted to avoid the approximation in step 3 and work with the correct covariance matrix instead. The pyGPs package allows the user to define custom covariance functions. This was of no avail, mainly due to incongruities in the pyGPs source code regarding the derivatives of covariance functions, insufficient documentation in that regard, and the resulting complexity of the task. As an example, we have included a custom covariance function in the appendix. We finally stopped pursuing this approach, since it is not clear whether the independent optimization of the hyperparameters for the functions u and f would yield the result we are striving for.
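
For orientation, a rough skeleton of such a custom covariance function is sketched below. This is our own sketch, not working pyGPs code: the class and method names mirror those of the built-in pyGPs kernels, but the exact conventions (in particular for the derivative matrices) may differ between versions, which is precisely where we got stuck.

import numpy as np
import pyGPs

class CustomRBF(pyGPs.cov.Kernel):
    # Sketch of a custom kernel; pyGPs keeps the log-hyperparameters
    # in self.hyp, here hyp = [log(l), log(sigma)].
    def __init__(self, log_ell=0., log_sigma=0.):
        self.hyp = [log_ell, log_sigma]

    def getCovMatrix(self, x=None, z=None, mode=None):
        # Covariance matrix; 'mode' distinguishes the training matrix
        # ('train'), cross-covariances ('cross') and the diagonal of the
        # test covariances ('self_test'), as in the built-in kernels.
        # Inputs are assumed to be column vectors of shape (n, 1).
        ell = np.exp(self.hyp[0])
        sf2 = np.exp(2.0*self.hyp[1])
        if mode == 'self_test':
            return sf2*np.ones((z.shape[0], 1))
        B = x if mode == 'train' else z
        return sf2*np.exp(-0.5*((x - B.T)/ell)**2)

    def getDerMatrix(self, x=None, z=None, mode=None, der=None):
        # Derivative of the covariance matrix with respect to the
        # log-hyperparameter number 'der'; the conventions expected here
        # are the part we found to be inconsistently documented.
        raise NotImplementedError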

Example implementation

Assumptions:

\(\mathcal{L}_x^{\phi}u(x) = f(x)\)

\(u(x) \sim \mathcal{GP}(0, k_{uu}(x, x'; \theta))\)

\(f(x) \sim \mathcal{GP}(0, k_{ff}(x, x'; \theta, \phi))\)

\(\theta = \{\sigma, l\}\)

Chosen operator: \(\mathcal{L}_x^{\phi}u(x) = \phi*u(x)\)

We choose the following two functions to sample the data (the factor 12 can be varied):

\(u(x) = \sqrt{x}\)

\(f(x) = 12\sqrt{x}\)

Problem at hand: Given \(\{X_u, y_u\}\) and \(\{X_f, y_f\}\), estimate \(\phi\).

We therefore expect the estimate for \(\phi\) to be as close to 12 as possible.

We will get an estimate of 12.05 using 15 evenly spaced data samples in the interval \([0, 2\pi]\).

We employ a GP with an RBF kernel for both u and f:

\(k_{uu}(x_i, x_j; \theta_u) = \sigma_u^2 \exp(-\frac{1}{2l_u^2}(x_i - x_j)^2)\)

\(k_{ff}(x_i, x_j; \theta_f) = \sigma_f^2 \exp(-\frac{1}{2l_f^2}(x_i - x_j)^2)\)

We use the known transformation behavior of Gaussian Processes:

\(k_{ff}(x_i, x_j; \theta, \phi) \\ = \mathcal{L}_{x_i}^{\phi}\mathcal{L}_{x_j}^{\phi}k_{uu}(x_i, x_j; \theta) \\ = \phi^2\sigma_u^2 \exp(-\frac{1}{2l_u^2}(x_i - x_j)^2)\)

Equating the two expressions we have for \(k_{ff}\) and comparing a diagonal entry (where \(x_i = x_j\)), it follows that \(\sigma_f^2 = \phi^2\sigma_u^2\), i.e. (taking the positive root):

\(\phi = \frac{\sigma_f}{\sigma_u}\)
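
This relation can be checked symbolically as well; a minimal sketch using sympy (not part of the original pyGPs code):

import sympy as sp

x_i, x_j, phi, sigma_u, l_u = sp.symbols('x_i x_j phi sigma_u l_u', positive=True)

# RBF kernel of u; applying L (multiplication by phi) in both arguments
# yields k_ff = phi^2 * k_uu:
k_uu = sigma_u**2*sp.exp(-(x_i - x_j)**2/(2*l_u**2))
k_ff = phi**2*k_uu

# On the diagonal (x_i = x_j) the exponential drops out:
print(k_ff.subs(x_j, x_i))   # phi**2*sigma_u**2, i.e. sigma_f**2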

In [9]:
def main():

    import numpy as np
    import pyGPs

    # Generating data:
    # We keep the example as simple as possible. Note that when using
    # sine instead of sqrt, the pyGPs optimizer is not able to find the
    # optimal hyperparameters, independent of the optimization method.
    x_u = np.linspace(0, 2*np.pi, 15)
    y_u = np.sqrt(x_u)

    x_f = x_u
    y_f = 12.0*np.sqrt(x_f)

    # The function u is assumed to be a Gaussian Process.
    # After a linear transformation, f has to be a Gaussian Process as well.

    model_u = pyGPs.GPR()
    model_u.setData(x_u, y_u)
    model_u.optimize(x_u, y_u)

    model_f = pyGPs.GPR()
    model_f.setData(x_f, y_f)
    model_f.optimize(x_f, y_f)

    # Note that hyp stores only the logarithms of the hyperparameters!
    # Characteristic length-scale l equals np.exp(hyp[0]) (default: 1)
    # Signal standard deviation sigma equals np.exp(hyp[1]) (default: 1)

    # Since sigma_f = phi*sigma_u, this should print a value close to 12:
    print(np.exp(model_f.covfunc.hyp[1])/np.exp(model_u.covfunc.hyp[1]))

# Prevents execution by Sphinx:
if __name__ == '__main__':

    main()
Number of line searches 40
Number of line searches 40
12.049201003414321
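
The two 'Number of line searches 40' lines are printed by the pyGPs optimizer during hyperparameter optimization; the final line is the resulting estimate \(\sigma_f/\sigma_u \approx 12.05\) for \(\phi\).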