On the impact of prior distributions on efficiency of sparse Gaussian process regression

Mohsen Esmaeilbeigi, Omid Chatrabgoun, Alireza Daneshkhah, Maryam Shafa

Research output: Contribution to journal › Article › peer-review



Gaussian process regression (GPR) is a kernel-based learning model that becomes computationally intractable on irregular domains and large datasets because its kernel matrix is full. In this paper, we propose a novel method that uses compact support radial kernels (CSRKs) to produce a sparse kernel matrix, so that the GPR can be learned efficiently from large datasets. CSRKs avoid an ill-conditioned, full kernel matrix during GPR training and prediction, which reduces computational cost and memory requirements. In practice, interest in CSRKs waned slightly once it became evident that compactly supported kernels are subject to a trade-off principle: accuracy and sparsity conflict. Hence, when kernels with compact support are used, GPR training focuses mainly on achieving a high level of accuracy, and the advantage of a sparse covariance matrix almost disappears, as the numerical results show. This trade-off has led authors to search for an “optimal” value of the scale parameter. Accordingly, by selecting suitable priors on the kernel hyperparameters and estimating the hyperparameters with a modified version of maximum likelihood estimation (MLE), the GPR model derived from the CSRKs attains maximal accuracy while still maintaining a sparse covariance matrix. In GPR training, the modified MLE objective is proportional to the product of the likelihood and a suitable prior distribution on the hyperparameters, which provides an efficient method for learning. The misspecification of prior distributions and its impact on the predictability of the sparse GPR models are also comprehensively investigated in several empirical studies. In a comparative study, the proposed approach is applied to noisy test functions on some irregular domains in 2D datasets. Finally, we investigate the effect of the prior on the predictability of GPR models using a real dataset. The results suggest that the proposed method leads to sparser, well-conditioned kernel matrices in all cases.
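The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the Wendland C2 function stands in for a CSRK, the prior on the support radius is an unnormalised Gamma density with hypothetical `shape` and `rate` values, the noise variance is fixed, and a coarse grid search replaces a proper optimiser. The sketch compares the support radius chosen by the plain marginal likelihood with the one chosen when the likelihood is multiplied by the prior, and reports how full the resulting kernel matrix is.

```python
import numpy as np

def wendland_c2(r, support):
    """Wendland C2 kernel (assumed CSRK): (1 - r/s)^4 (4 r/s + 1) for r < s, else 0."""
    t = r / support
    return np.where(t < 1.0, (1.0 - t) ** 4 * (4.0 * t + 1.0), 0.0)

def kernel_matrix(X, support):
    """Kernel matrix from pairwise Euclidean distances (dense array, for clarity)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return wendland_c2(d, support)

def neg_log_marginal_likelihood(X, y, support, noise=1e-2):
    """Negative log marginal likelihood of a zero-mean GP with fixed noise variance."""
    K = kernel_matrix(X, support) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

def neg_log_prior(support, shape=2.0, rate=8.0):
    """Unnormalised negative log Gamma(shape, rate) density (hypothetical prior)."""
    return -((shape - 1.0) * np.log(support) - rate * support)

def neg_log_posterior(X, y, support):
    """Likelihood times prior, on the negative log scale."""
    return neg_log_marginal_likelihood(X, y, support) + neg_log_prior(support)

def fill_fraction(X, support):
    """Fraction of nonzero entries in the kernel matrix (lower means sparser)."""
    return float(np.mean(kernel_matrix(X, support) > 0))

# Synthetic noisy 2D test function (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1]) + 0.1 * rng.standard_normal(200)

# Coarse grid over the support radius; this prior's penalty increases on this grid.
grid = np.linspace(0.15, 1.5, 15)
s_mle = grid[np.argmin([neg_log_marginal_likelihood(X, y, s) for s in grid])]
s_map = grid[np.argmin([neg_log_posterior(X, y, s) for s in grid])]

print("likelihood only  : support", s_mle, "fill", fill_fraction(X, s_mle))
print("likelihood*prior : support", s_map, "fill", fill_fraction(X, s_map))
```

Because the assumed prior penalises large support radii over the whole grid, the radius selected with the prior can never exceed the one selected by the likelihood alone, so the kernel matrix is at least as sparse. For large datasets the dense array would be replaced by a sparse matrix format and a sparse Cholesky factorisation; the dense version is used here only to keep the sketch short.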

Original language: English
Pages (from-to): 2905-2925
Number of pages: 21
Journal: Engineering with Computers
Issue number: 4
Early online date: 26 Jun 2022
Publication status: Published - Aug 2023

Bibliographical note

The final publication is available at Springer via http://dx.doi.org/10.1007/s00366-022-01686-7

Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.


Keywords

  • Compact support radial kernels
  • Gaussian process
  • Hyperparameter
  • Maximum likelihood estimation
  • Priors

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Engineering(all)
  • Computer Science Applications

