Probabilistic Graphical Models 3: Learning

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

Probabilistic graphical models (PGMs) are a rich framework for encoding probability distributions over complex domains: joint (multivariate) distributions over large numbers of random variables that interact with each other. These representations sit at the intersection of statistics and computer science, relying on concepts from probability theory, graph algorithms, machine learning, and more. They are the basis for the state-of-the-art methods in a wide variety of applications, such as medical diagnosis, image understanding, speech recognition, natural language processing, and many, many more. They are also a foundational tool in formulating many machine learning problems. This course is the third in a sequence of three. Following the first course, which focused on representation, and the second, which focused on inference, this course addresses the question of learning: how a PGM can be learned from a data set of examples. The course discusses the key problems of parameter estimation in both directed and undirected models, as well as the structure learning task for directed models. The (highly recommended) honors track contains two hands-on programming assignments, in which key routines of two commonly used learning algorithms are implemented and applied to a real-world problem.

Course Syllabus

This module discusses the simples and most basic of the learning problems in probabilistic graphical models: that of parameter estimation in a Bayesian network. We discuss maximum likelihood estimation, and the issues with it. We then discuss Bayesian estimation and how it can ameliorate these problems.

Coursera Plus banner featuring three learners and university partner logos

Course Introduction

Probabilistic Graphical Models 3: Learning and Models This course is the third and last course in the specialization about Probabilistic Graphical Models (PDMs). These are state-of-the-art models that solve for a large number of parameters (loops) and are designed to span a wide variety of applications. We will learn two new modelling techniques, supervised (overlapping layers) and unsupervised (clustering). We will also learn how to use PDEs for clustering and for finding the best models. Finally, we will look at the challenges associated with clustering in the US and importances in Python programs. Note that the sample case study uses Python 2.7.Getting Started and Building Model Solvers Trains and Sends approximations Mean-Variance Analysis Discrete Optimization Probabilistic Graphical Models 1: Representation, Decomposition, and Visualization This course is the first course in a sequence that covers Probabilistic Graphical Models (PDMs), which are state-of-the-art models that solve for a large number of parameters (loops) and are designed to span a wide variety of applications. We will learn two new modelling techniques, variational (overlapping layers) and unsupervised (clustering). We will also learn how to use PDEs for clustering and for finding the best models.

Course Tag

Algorithms Expectation–Maximization (EM) Algorithm Graphical Model Markov Random Field

Related Wiki Topic

Article Example
Graphical model A graphical model or probabilistic graphical model (PGM) is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.
Russ Salakhutdinov He specializes in deep learning, probabilistic graphical models, and large-scale optimization.
Probabilistic programming language A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and then perform inference in those models. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. Probabilistic programming represents an attempt to "[unify] general purpose programming with probabilistic modeling."
Probabilistic soft logic Probabilistic soft logic (PSL) is a framework for collective, probabilistic reasoning in relational domains. PSL uses first order logic rules as a template language for graphical models over random variables with soft truth values from the interval [0,1].
Graphical model This type of graphical model is known as a directed graphical model, Bayesian network, or belief network. Classic machine learning models like hidden Markov models, neural networks and newer models such as variable-order Markov models can be considered special cases of Bayesian networks.
Probabilistic classification Some classification models, such as naive Bayes, logistic regression and multilayer perceptrons (when trained under an appropriate loss function) are naturally probabilistic. Other models such as support vector machines are not, but methods exist to turn them into probabilistic classifiers.
Probabilistic classification Binary probabilistic classifiers are also called binomial regression models in statistics. In econometrics, probabilistic classification in general is called discrete choice.
Machine learning A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
Probabilistic programming language A probabilistic relational programming language (PRPL) is a PPL specially designed to describe and infer with probabilistic relational models (PRMs).
Probabilistic soft logic In recent years there has been a rise in the approaches that combine graphical models and first-order logic to allow the development of complex probabilistic models with relational structures. A notable example of such approaches is Markov logic networks (MLNs). Like MLNs PSL is a modelling language (with an accompanying implementation) for learning and predicting in relational domains. Unlike MLNs, PSL uses soft truth values for predicates in an interval between [0,1]. This allows for the integration of similarity functions in the into models. This is useful in problems such as Ontology Mapping and Entity Resolution. Also, in PSL the formula syntax is restricted to rules with conjunctive bodies.
Estimation of distribution algorithm Bivariate and multivariate distributions are usually represented as Probabilistic Graphical Models (graphs), in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables. To learn the structure of a PGM from data linkage-learning is employed.
Graphical models for protein structure Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a multivariate probability distribution over continuous variables. Each family of distribution will then impose certain properties on the graphical model. Multivariate Gaussian distribution is one of the most convenient distributions in this problem. The simple form of the probability, and the direct relation with the corresponding graphical model makes it a popular choice among researchers.
Graphical models for protein structure Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction and free energy calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein protein interactions, protein-drug interaction, and free energy calculations.
Graphical models for protein structure Gaussian graphical models are multivariate probability distributions encoding a network of dependencies among variables. Let formula_15 be a set of formula_16 variables, such as formula_16 dihedral angles, and let formula_18 be the value of the probability density function at a particular value "D". A multivariate Gaussian graphical model defines this probability as follows:
Supervised learning Although formula_12 and formula_18 can be any space of functions, many learning algorithms are probabilistic models where formula_11 takes the form of a conditional probability model formula_22, or formula_23 takes the form of a joint probability model formula_24. For example, naive Bayes and linear discriminant analysis are joint probability models, whereas logistic regression is a conditional probability model.
Probabilistic programming language Probabilistic reasoning is a foundational technology of machine learning. It is used by companies such as Google, and Microsoft. Probabilistic reasoning has been used for predicting stock prices, recommending movies, diagnosing computers, detecting cyber intrusions and image detection.
Statistical relational learning As is evident from the characterization above, the field is not strictly limited to learning aspects; it is equally concerned with reasoning (specifically probabilistic inference) and knowledge representation. Therefore, alternative terms that reflect the main foci of the field include "statistical relational learning and reasoning" (emphasizing the importance of reasoning) and "first-order probabilistic languages" (emphasizing the key properties of the languages with which models are represented).
Graphical model Generally, probabilistic graphical models use a graph-based representation as the foundation for encoding a complete distribution over a multi-dimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov random fields. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce.
Occam learning Occam algorithms have also been shown to be successful for PAC learning in the presence of errors, probabilistic concepts, function learning and Markovian non-independent examples.
Learning curve A learning curve is a graphical representation of the increase of learning (vertical axis) with experience (horizontal axis).