Supervised Machine Learning: Classification

Start Date: 07/11/2021

Course Type: Common Course

Course Link:

About Course

This course introduces you to one of the main families of supervised machine learning models: classification. You will learn how to train predictive models to classify categorical outcomes and how to use error metrics to compare models. The hands-on section of this course focuses on best practices for classification, including train/test splits and handling data sets with unbalanced classes.
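The workflow described above can be sketched end to end with scikit-learn: split the data, fit a classifier, and compare error metrics. This is a minimal illustration, not course material; the synthetic data set and the choice of logistic regression with balanced class weights are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Synthetic unbalanced data set: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

# Train/test split, stratified so both splits keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# class_weight="balanced" reweights the loss to compensate for the imbalance.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# On unbalanced data, accuracy alone is misleading; F1 weighs the minority class.
print("accuracy:", accuracy_score(y_test, pred))
print("F1:", f1_score(y_test, pred))
```

Note that accuracy can look high on unbalanced data even for a classifier that ignores the minority class, which is why the course pairs the split with multiple error metrics.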

Course Syllabus

Logistic Regression
Support Vector Machines
Ensemble Models
Modeling Unbalanced Classes


Course Introduction

This course introduces you to one of the main families of supervised machine learning models: classification.

Course Tag

Related Wiki Topic

Article Example
Supervised learning Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).
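The idea of inferring a function from input/label pairs and applying it to unseen inputs can be shown with a tiny from-scratch example. The 1-nearest-neighbor rule below is one illustrative choice of inferred function, not anything specific to the excerpt; the data points are made up.

```python
# A minimal supervised learner: given labeled (input, label) pairs,
# predict the label of an unseen input by copying its nearest neighbor.
def nearest_neighbor_predict(training_pairs, x):
    def sq_dist(a, b):
        # squared Euclidean distance between two input vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # The "inferred function" here is simply: return the label of the
    # training example closest to x.
    return min(training_pairs, key=lambda pair: sq_dist(pair[0], x))[1]

pairs = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos"), ((0.9, 1.2), "pos")]
print(nearest_neighbor_predict(pairs, (1.1, 0.8)))  # → pos
```

Generalizing well to unseen points (rather than memorizing the pairs) is exactly the inductive-bias question the excerpt raises.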
Semi-supervised learning As in the supervised learning framework, we are given a set of l independently identically distributed examples x_1, ..., x_l with corresponding labels y_1, ..., y_l. Additionally, we are given u unlabeled examples x_(l+1), ..., x_(l+u). Semi-supervised learning attempts to make use of this combined information to surpass the classification performance that could be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning.
Supervised learning A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems (see the No free lunch theorem).
Supervised learning There are four major issues to consider in supervised learning:
Binary classification Statistical classification is a problem studied in machine learning. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.
Supervised learning In empirical risk minimization, the supervised learning algorithm seeks the function g that minimizes the empirical risk R_emp(g). Hence, a supervised learning algorithm can be constructed by applying an optimization algorithm to find g.
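Empirical risk minimization can be made concrete with a toy example: measure each candidate function's average loss on the training data and pick the minimizer. The finite hypothesis set of threshold rules and the 0-1 loss below are illustrative assumptions; in practice the search is done by an optimization algorithm over a parameterized family.

```python
# Empirical risk of a candidate function g under 0-1 loss:
# the fraction of training examples it misclassifies.
def empirical_risk(g, data):
    return sum(1 for x, y in data if g(x) != y) / len(data)

# A small, hypothetical hypothesis set: threshold classifiers on one feature.
hypotheses = [lambda x: x > 0, lambda x: x > 1, lambda x: x > 2]

data = [(-1.0, False), (0.5, True), (1.5, True), (3.0, True)]

# ERM: choose the hypothesis with the lowest empirical risk.
best = min(hypotheses, key=lambda g: empirical_risk(g, data))
print(empirical_risk(best, data))  # → 0.0
```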
Semi-supervised learning Generative approaches to statistical learning first seek to estimate p(x|y), the distribution of data points belonging to each class. The probability p(y|x) that a given point x has label y is then proportional to p(x|y)p(y) by Bayes' rule. Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about p(x)) or as an extension of unsupervised learning (clustering plus some labels).
Supervised learning The supervised learning optimization problem is to find the function g that minimizes the (possibly regularized) empirical risk.
Supervised learning There are several ways in which the standard supervised learning problem can be generalized:
Semi-supervised learning Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render a fully labeled training set infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning.
Machine learning Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component (e.g. typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.
Boosting (machine learning) Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms which convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant (1988, 1989): can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
Computational learning theory Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning.
Supervised learning In order to solve a given problem of supervised learning, one has to perform the following steps:
Active learning (machine learning) Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design.
Adversarial machine learning Attacks against (supervised) machine learning algorithms have been categorized along three primary axes: their "influence" on the classifier, the "security violation" they cause, and their "specificity".
Semi-supervised learning "Self-training" is a wrapper method for semi-supervised learning. First a supervised learning algorithm is trained based on the labeled data only. This classifier is then applied to the unlabeled data to generate more labeled examples as input for the supervised learning algorithm. Generally only the labels the classifier is most confident of are added at each step.
Semi-supervised learning Some methods for semi-supervised learning are not intrinsically geared to learning from both unlabeled and labeled data, but instead make use of unlabeled data within a supervised learning framework. For instance, the labeled and unlabeled examples formula_45 may inform a choice of representation, distance metric, or kernel for the data in an unsupervised first step. Then supervised learning proceeds from only the labeled examples.
Unsupervised learning Unsupervised machine learning is the machine learning task of inferring a function to describe hidden structure from "unlabeled" data (a classification or categorization is not included in the observations). Since the examples given to the learner are unlabeled, there is no evaluation of the accuracy of the structure that is output by the relevant algorithm—which is one way of distinguishing unsupervised learning from supervised learning and reinforcement learning.
Statistical classification In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available. The corresponding unsupervised procedure is known as clustering, and involves grouping data into categories based on some measure of inherent similarity or distance.
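The contrast the excerpt draws — classification uses correctly identified observations, clustering groups by similarity alone — can be seen side by side on the same data. The tiny data set and the choice of k-nearest-neighbors and k-means are assumptions made for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Four points forming two obvious groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])  # labels, available only in the supervised setting

# Supervised: a classifier trained on the labeled observations.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Unsupervised: clustering sees only X and groups by distance;
# its cluster ids need not match the label names 0/1.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clf.predict([[5.0, 5.2]]), clusters)
```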