Bayesian Methods for Machine Learning

Start Date: 11/17/2019

Course Type: Common Course

Course Link: https://www.coursera.org/learn/bayesian-methods-in-machine-learning

Course Syllabus

Welcome to the first week of our course! Today we will discuss what Bayesian methods are and what probabilistic models are. We will see how they can be used to model real-life situations and how to draw conclusions from them. We will also learn about conjugate priors, a class of models where all the math becomes really simple.
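As a quick illustration of why conjugate priors make the math simple (a minimal sketch, not part of the course materials; the function name and the Beta(1, 1) prior are just illustrative choices): for Bernoulli observations, a Beta prior yields a Beta posterior, so the update reduces to counting successes and failures.

```python
import numpy as np

# Conjugate-prior sketch: Beta prior + Bernoulli likelihood.
# The posterior is again a Beta distribution, so updating it is just counting.

def beta_bernoulli_update(alpha, beta, data):
    """Return the posterior (alpha, beta) after observing 0/1 outcomes in `data`."""
    successes = int(np.sum(data))
    failures = len(data) - successes
    return alpha + successes, beta + failures

# Example: a uniform Beta(1, 1) prior and ten simulated coin flips.
rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10)
a_post, b_post = beta_bernoulli_update(1.0, 1.0, flips)
print("Posterior mean of the success probability:", a_post / (a_post + b_post))
```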

Course Introduction

Bayesian methods are used in lots of fields: from game development to drug discovery. They give superpowers to many machine learning algorithms: handling missing data and extracting much more information from small datasets.

Course Tag

Bayesian Optimization, Gaussian Process, Markov Chain Monte Carlo (MCMC), Variational Bayesian Methods

Related Wiki Topic

Article Example
Variational Bayesian methods Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables (usually termed "data") as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As is typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:
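The excerpt above does not spell out the objective, but variational Bayesian methods typically maximize the evidence lower bound (ELBO). A standard statement of the decomposition for observed data x and latent variables z (the notation is assumed here, not taken from the excerpt):

```latex
% Evidence lower bound (ELBO) for observed data x and latent variables z.
% Since the KL term is non-negative, maximizing the ELBO over a tractable
% family q(z) both tightens the bound on log p(x) and pulls q(z) toward
% the true posterior p(z | x).
\log p(x) =
  \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)
```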
Quantum machine learning The term quantum machine learning is also used for approaches that apply classical methods of machine learning to the study of quantum systems, for instance in the context of quantum information theory or for the development of quantum technologies. For example, when experimentalists have to deal with incomplete information on a quantum system or source, Bayesian methods and concepts of algorithmic learning can be fruitfully applied. This includes the application of machine learning to tackle quantum state classification, Hamiltonian learning, or learning an unknown unitary transformation.
Bayesian inference In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and an increasing interest in nonstandard, complex applications. Despite growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, such as for example in the field of machine learning.
Machine learning Some statisticians have adopted methods from machine learning, leading to a combined field that they call "statistical learning".
Quantum machine learning Quantum machine learning is an emerging interdisciplinary research area at the intersection of quantum physics and machine learning. One can distinguish four different ways of merging the two parent disciplines. Quantum machine learning algorithms can use the advantages of quantum computation in order to improve classical methods of machine learning, for example by developing efficient implementations of expensive classical algorithms on a quantum computer. On the other hand, one can apply classical methods of machine learning to analyse quantum systems. Most generally, one can consider situations wherein both the learning device and the system under study are fully quantum.
Bayesian probability In the 1980s there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods and the consequent removal of many of the computational problems, and to an increasing interest in nonstandard, complex applications. While frequentist statistics remains strong (as seen by the fact that most undergraduate teaching is still based on it), Bayesian methods are widely accepted and used, e.g., in the field of machine learning.
Bayesian Program Synthesis Bayesian probabilities is a strategy to learn distributions over Bayesian programs. Gamalon, a machine learning company, invented the term as describing their framework for using Bayesian probabilistic programs to learn specialized probabilistic programs based on input data.
Bayesian network Automatically learning the graph structure of a Bayesian network is a challenge pursued within machine learning. The basic idea goes back to a recovery algorithm
Bayesian Program Synthesis In machine learning, Bayesian Program Synthesis (BPS), Bayesian Programs write (synthesize) new Bayesian programs. This is in contrast to the field of probabilistic programs where humans write new probabilistic (Bayesian) programs.
Machine learning Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities and then used to find meaningful anomalies.
Relevance vector machine In mathematics, a Relevance Vector Machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification.
Machine learning Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on "known" properties learned from the training data, data mining focuses on the discovery of (previously) "unknown" properties in the data (this is the analysis step of Knowledge Discovery in Databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to "reproduce known" knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously "unknown" knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Bayesian interpretation of kernel regularization In machine learning, kernel methods arise from the assumption of an inner product space or similarity structure on inputs. For some such methods, such as support vector machines (SVMs), the original formulation and its regularization were not Bayesian in nature. It is helpful to understand them from a Bayesian perspective. Because the kernels are not necessarily positive semidefinite, the underlying structure may not be inner product spaces, but instead more general reproducing kernel Hilbert spaces. In Bayesian probability kernel methods are a key component of Gaussian processes, where the kernel function is known as the covariance function. Kernel methods have traditionally been used in supervised learning problems where the "input space" is usually a "space of vectors" while the "output space" is a "space of scalars". More recently these methods have been extended to problems that deal with multiple outputs such as in multi-task learning.
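To make the "kernel as covariance function" point concrete, here is a minimal Gaussian-process regression sketch in NumPy (illustrative only; the RBF kernel, length scale, noise variance, and toy data are assumed choices, not from the article):

```python
import numpy as np

# Gaussian-process regression sketch: the kernel acts as the covariance
# function of a GP prior over functions.

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Toy training data and query points.
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 7)

noise = 1e-2  # assumed observation-noise variance
K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
K_s = rbf_kernel(x_test, x_train)

# GP posterior mean at the query points: K_s K^{-1} y.
posterior_mean = K_s @ np.linalg.solve(K, y_train)
print(posterior_mean)
```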
Transduction (machine learning) this category is the Bayesian Committee Machine (BCM).
Zoubin Ghahramani Ghahramani has made significant contributions in the areas of Bayesian machine learning (particularly variational methods for approximate Bayesian inference), as well as graphical models and computational neuroscience. His current research focuses on nonparametric Bayesian modelling and statistical machine learning. He has also worked on artificial intelligence, information retrieval, bioinformatics and statistics which provide the mathematical foundations for handling uncertainty, making decisions, and designing learning systems. He has published over 200 papers, receiving over 30,000 citations (an h-index of 74).
Bayesian Bayesian methods have been also applied to the interpretation of quantum mechanics:
Machine learning Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves 'rules' to store, manipulate, or apply knowledge. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learners that commonly identify a singular model that can be universally applied to any instance in order to make a prediction. Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.
Quantum machine learning Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes.
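For context on "training reduces to solving a linear system of equations": in ordinary least-squares regression that system is the normal equations (X^T X) w = X^T y, which a quantum matrix-inversion routine would target. A purely classical sketch (synthetic data, NumPy only; the sizes and weights are illustrative):

```python
import numpy as np

# Least-squares linear regression as a linear system: solve (X^T X) w = X^T y.

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))          # synthetic design matrix
true_w = np.array([0.5, -1.0, 2.0])   # ground-truth weights for the toy data
y = X @ true_w + 0.1 * rng.normal(size=50)

w = np.linalg.solve(X.T @ X, X.T @ y)
print("Estimated weights:", w)
```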
Variable-order Bayesian network Variable-order Bayesian network (VOBN) models provide an important extension of both the Bayesian network models and the variable-order Markov models. VOBN models are used in machine learning in general and have shown great potential in bioinformatics applications.
Bayesian Bayesian refers to methods in probability and statistics named after Thomas Bayes (c. 1702–61), in particular methods related to statistical inference: