Instructor
Dr. Melih Kandemir
Berlinerstrasse 43 (Mathematikon Bauteil B)
Room: MB 118
name [dot] surname [at] iwr [dot] uni-heidelberg [dot] de
Course Location and Time
Fridays, 09:15-11:00
Berlinerstrasse 43 (Mathematikon Bauteil B, 3rd Floor)
Room: MB 128
Spring 2016
Announcements
- As announced in the lectures earlier, the mid-way project deadline is 29 July 2016, 23:59 Berlin time. Students must email me the status of their projects (source code and/or report) by that time. Only those who have finished 50% of their projects by then will be granted extra time.
- The final project submission deadline will be announced after the course starts. Students are encouraged to define their own project topics; as guidance, I will also provide a list of suggestions.
Course Description
This course aims to help students develop a strong background in Gaussian processes (GPs), which are effective discriminative Bayesian learners proven useful in many real-world applications. The course starts from the basics of the GP prior and then works through solving a wide spectrum of hard machine learning problems with GPs, including inference under various difficulties such as non-conjugate likelihoods and big data.
Credits
The course is offered for 5 ECTS by default. However, students are welcome to earn more credits with a more labor-intensive project on demand. REMARK: Only the projects of students who attend at least 9 of the 13 lectures will be graded. Students who are absent for 5 or more weeks will not earn any credits from this course.
Tentative Syllabus (dd/mm)
- 1) Introduction (29/04) [PDF]: Intro to probability theory, Bayesian modeling, the kernel trick, links of GPs to kernel linear regression and Kriging, definition of a GP, and properties of a GP prior.
- 2) Learning with Kernels (06/05): [PDF (by A. Gretton)] A brief theory of Reproducing Kernel Hilbert Spaces (RKHS) and how they help in devising powerful learners.
- 3) GP Regression (13/05): [PDF (by Rasmussen and Williams)] Predicting real-valued output with GPs, calculation of the marginal likelihood, the predictive distribution, and principled tuning of kernel hyperparameters.
- 4) GP Regression Cont'd (20/05) [Visual material distributed internally]: Calculating the posterior and the predictive distributions of a Bayesian linear regression model, how to follow the same path for GP regression with a Gaussian likelihood, how to use the Gaussian integral to calculate predictive distributions, and fast implementation of GP regression using Cholesky decomposition and forward/backward substitution (see the code sketch after the syllabus).
- 5) Approximate Bayesian Inference 1 (27/05): Bayesian model selection and hyperparameter tuning in GPs [PDF], Variational Inference [PDF].
- 6) Approximate Bayesian Inference 2 (03/06): Variational Inference recap, Laplace Approximation [PDF (by MacKay)].
- 7) Classification with GPs (10/06): GP classification with Laplace Approximation [PDF (by Rasmussen and Williams)].
- 8) Classification with GPs 2 (17/06): GP classification with Expectation Propagation [PDF (by Rasmussen and Williams)].
- 9) Classification with GPs 3 (24/06): GP classification with Expectation Propagation Cont'd [PDF (by Rasmussen and Williams)].
- 10) Time Series Modeling with GPs (01/07): Recent approaches to building latent state space models [3] and learning correlated trends with GPs [4].
- 11) GPs for Big Data (08/07): How to scale GPs up to millions of data points. Fully Independent Training Conditional (FITC) approximation, Stochastic Variational Inference (SVI) with application to GPs.
- 12) Dimensionality Reduction with GPs (15/07): Gaussian Process Latent Variable Model (GPLVM) [2], its Bayesian extension, and fast inference.
- 13) Deep Learning with GPs (22/07): Building networks of GPs [5, 6] and combining GPs with deep convolutional kernels [7].
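As a companion to syllabus items 3 and 4, below is a minimal NumPy/SciPy sketch of GP regression with a squared-exponential kernel: it computes the predictive mean, predictive variance, and log marginal likelihood via a Cholesky factorization and forward/backward substitution, in the spirit of Algorithm 2.1 in [14]. The function names and hyperparameter values are illustrative choices, not part of the lecture material.

```python
# A minimal sketch of GP regression (illustrative, not the lecture code).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_predict(X, y, X_star, noise_var=0.1):
    """Predictive mean/variance and log marginal likelihood of GP regression."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = cholesky(K, lower=True)                                  # K = L L^T
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True))  # K^{-1} y
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha                                         # predictive mean
    v = solve_triangular(L, K_s, lower=True)                     # forward substitution
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v**2, axis=0)  # predictive variance
    log_marginal = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
                    - 0.5 * len(X) * np.log(2 * np.pi))          # used for hyperparameter tuning
    return mean, var, log_marginal

# Toy usage: noisy observations of a sine function.
X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
mean, var, lml = gp_predict(X, y, np.linspace(0, 5, 100)[:, None])
```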
Course Material
I will not follow any particular textbook, but will either present my own slides or use external visual material. Please refer to the links below for further reading:
[1] Seeger, M. PAC-Bayesian generalisation error bounds for Gaussian process classification. JMLR, 2002.
[2] Lawrence, N.D. Gaussian process latent variable models for visualisation of high dimensional data. In NIPS, 2004.
[3] Frigola, R., Chen, Y., and Rasmussen, C.E. Variational Gaussian process state-space models. In NIPS, 2014.
[4] Wilson, A. G. and Ghahramani, Z. Generalised Wishart Processes. In UAI, 2011.
[5] Damianou, A. and Lawrence, N.D. Deep Gaussian processes. In UAI, 2013.
[6] Kandemir, M. and Hamprecht, F. A. The deep feed-forward Gaussian process: An effective generalization to covariance priors. JMLR, 2015.
[7] Wilson, A.G., Hu, Z., Salakhutdinov, R., and Xing, E.P. Deep kernel learning. In AISTATS, 2016.
[8] Shah, A., Wilson, A.G., and Ghahramani, Z. Student-t processes as alternatives to Gaussian processes. In AISTATS, 2014.
[9] Matthews, A.G. de G., Filippone, M., Hensman, J., and Ghahramani, Z. MCMC for variationally sparse Gaussian processes. In NIPS, 2015.
[10] Bonilla, E., Chai, K.M., and Williams, C.I. Multi-task Gaussian process prediction. In NIPS, 2007.
[11] Nguyen, T. and Bonilla, E. Collaborative multi-output Gaussian processes. In UAI, 2014.
[12] Titsias, M.K. and Bonilla, E. Spike and slab variational inference for multi-task and multiple kernel learning. In NIPS, 2011.
[13] Kim, M. and de la Torre, F. Gaussian process multiple instance learning. In ICML, 2010.
[14] Rasmussen, C.E. and Williams, C.I. Gaussian Processes for Machine Learning. MIT Press, 2006.