Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

Start Date: 03/17/2019

Course Type: Common Course

Course Link: https://www.coursera.org/learn/deep-neural-network


About Course

This course will teach you the "magic" of getting deep learning to work well. Rather than treating the deep learning process as a black box, you will understand what drives performance and be able to get good results more systematically. You will also learn TensorFlow.

After 3 weeks, you will:
- Understand industry best practices for building deep learning applications.
- Be able to effectively use common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization, and gradient checking.
- Be able to implement and apply a variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop, and Adam, and check for their convergence.
- Understand best practices for the deep learning era of how to set up train/dev/test sets and analyze bias/variance.
- Be able to implement a neural network in TensorFlow.

This is the second course of the Deep Learning Specialization.
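As a rough, illustrative sketch (not code from the course itself), the snippet below shows how several of the techniques listed above could be combined in TensorFlow's Keras API; the layer sizes, placeholder variable names, and hyperparameter values are arbitrary choices.

    import tensorflow as tf

    # A small fully connected classifier combining several of the techniques named
    # above: weight initialization, L2 and dropout regularization, batch
    # normalization, and the Adam optimizer. All sizes and values are placeholders.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            64, activation="relu", input_shape=(20,),
            kernel_initializer="he_normal",                      # initialization
            kernel_regularizer=tf.keras.regularizers.l2(1e-3)),  # L2 regularization
        tf.keras.layers.BatchNormalization(),                    # batch normalization
        tf.keras.layers.Dropout(0.5),                            # dropout regularization
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Mini-batch gradient descent happens through batch_size; a held-out dev set
    # (x_dev, y_dev are placeholders) supports bias/variance analysis:
    # model.fit(x_train, y_train, batch_size=64, epochs=10,
    #           validation_data=(x_dev, y_dev))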

Deep Learning Specialization on Coursera

Course Introduction

This course will teach you the "magic" of getting deep learning to work well.

Course Tag

Deep Learning, AI, Machine Learning, Andrew Ng, Deep Neural Networks, Hyperparameter Tuning, Regularization, Optimization, Hyperparameter, TensorFlow, Hyperparameter Optimization

Related Wiki Topic

Hyperparameter optimization: In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set. Cross-validation is often used to estimate this generalization performance. Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model their inputs well, while hyperparameter optimization ensures the model does not overfit its data, for example by tuning the amount of regularization.
Hyperparameter optimization: The traditional way of performing hyperparameter optimization has been "grid search", or a "parameter sweep", which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search must be guided by some performance metric, typically measured by cross-validation on the training set.
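To make the idea concrete, here is a minimal sketch of such a parameter sweep guided by cross-validation, assuming scikit-learn is available; the digits dataset, the MLPClassifier, and the particular hyperparameter values are illustrative choices, not part of the article above.

    import itertools
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)

    # Manually specified subset of the hyperparameter space (the "grid").
    learning_rates = [1e-3, 1e-2]
    l2_penalties = [1e-4, 1e-2]

    best = None
    for lr, alpha in itertools.product(learning_rates, l2_penalties):
        model = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=lr,
                              alpha=alpha, max_iter=300, random_state=0)
        # Cross-validation estimates generalization performance for this setting.
        score = cross_val_score(model, X, y, cv=3).mean()
        if best is None or score > best[0]:
            best = (score, {"learning_rate_init": lr, "alpha": alpha})

    print("best CV accuracy %.3f with %s" % best)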
Hyperparameter optimization: Bayesian optimization is a methodology for the global optimization of noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization consists of developing a statistical model of the function from hyperparameter values to the objective evaluated on a validation set. Intuitively, the methodology assumes that there is some smooth but noisy function that acts as a mapping from hyperparameters to the objective. In Bayesian optimization, one aims to gather observations in such a manner as to evaluate the machine learning model the least number of times while revealing as much information as possible about this function and, in particular, the location of the optimum. Bayesian optimization relies on assuming a very general prior over functions which when combined with observed hyperparameter values and corresponding outputs yields a distribution over functions. The methodology proceeds by iteratively picking hyperparameters to observe (experiments to run) in a manner that trades off exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters which are expected to have a good outcome). In practice, Bayesian optimization has been shown to obtain better results in fewer experiments than grid search and random search, due to the ability to reason about the quality of experiments before they are run.
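The loop below is a minimal sketch of this idea, assuming scikit-learn and SciPy: a Gaussian-process surrogate is fitted to the observations so far, and the next hyperparameter is chosen by maximizing expected improvement. The synthetic objective function stands in for an expensive model evaluation and is not part of the article.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(log_lr):
        # Stand-in for an expensive evaluation, e.g. the validation error of a
        # model trained with learning rate 10**log_lr.
        return (log_lr + 2.5) ** 2 + 0.1 * np.random.randn()

    rng = np.random.default_rng(0)
    bounds = (-5.0, 0.0)                       # search over log10(learning rate)
    X = rng.uniform(*bounds, size=(3, 1))      # a few initial random evaluations
    y = np.array([objective(x[0]) for x in X])

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

    for _ in range(10):
        gp.fit(X, y)                           # statistical model of the objective
        cand = np.linspace(*bounds, 200).reshape(-1, 1)
        mu, sigma = gp.predict(cand, return_std=True)
        # Expected improvement trades off exploitation (low predicted mean)
        # against exploration (high predictive uncertainty).
        best = y.min()
        z = (best - mu) / np.maximum(sigma, 1e-9)
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))

    print("best log10(learning rate) found:", X[np.argmin(y)][0])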
Dropout (neural networks): Dropout is a regularization technique for reducing overfitting in neural networks by preventing complex co-adaptations on training data. It is a very efficient way of performing model averaging with neural networks. The term "dropout" refers to dropping out units (both hidden and visible) in a neural network.
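A common way to implement this is "inverted" dropout, where the surviving activations are rescaled during training so that nothing changes at test time; the sketch below (plain NumPy, with an arbitrary keep probability) illustrates the idea.

    import numpy as np

    def dropout_forward(a, keep_prob=0.8, training=True):
        """Inverted dropout: randomly zero units and rescale the survivors so the
        expected activation is unchanged; at test time dropout is disabled."""
        if not training:
            return a
        mask = np.random.rand(*a.shape) < keep_prob    # which units to keep
        return a * mask / keep_prob                    # rescale to preserve the expectation

    # Example: drop roughly 20% of the hidden units in one forward pass.
    activations = np.random.randn(4, 5)
    print(dropout_forward(activations, keep_prob=0.8))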
Hyperparameter optimization: For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant "C" and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search one selects a finite set of "reasonable" values for each.
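For instance, with scikit-learn such a grid over "C" and γ could be searched like this (the iris dataset and the grid values below are only placeholders):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # A finite set of "reasonable" values for each continuous hyperparameter.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)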
Deep learning: Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks.
Artificial neural network: Support vector machines and other, much simpler methods such as linear classifiers gradually overtook neural networks in machine learning popularity. As earlier challenges in training deep neural networks were successfully addressed with methods such as unsupervised pre-training and computing power increased through the use of GPUs and distributed computing, neural networks were again deployed on a large scale, particularly in image and visual recognition problems. This became known as "deep learning", although deep learning is not strictly synonymous with deep neural networks.
Hyperparameter: One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior.
Hyperparameter optimization: Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because typically the hyperparameter settings it evaluates are independent of each other.
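Because each grid point can be scored independently, the sweep parallelizes trivially; the sketch below uses Python's standard-library process pool (the SVC grid and dataset are placeholder choices, not from the article):

    import itertools
    from concurrent.futures import ProcessPoolExecutor

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def evaluate(setting):
        """Score one grid point; each evaluation is independent of the others."""
        C, gamma = setting
        X, y = load_iris(return_X_y=True)
        return setting, cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

    if __name__ == "__main__":
        # The grid grows multiplicatively with every added hyperparameter
        # (the curse of dimensionality), but its points can be scored in parallel.
        grid = list(itertools.product([0.1, 1, 10], [0.01, 0.1, 1]))
        with ProcessPoolExecutor() as pool:
            scores = list(pool.map(evaluate, grid))
        print(max(scores, key=lambda s: s[1]))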
Hyperparameter optimization: For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression.
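As a toy illustration of the idea (not the neural-network setting the article refers to), ridge regression has a closed-form solution, so the gradient of a validation loss with respect to the regularization strength can be computed exactly and followed by gradient descent; all data below is synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic train/validation split standing in for real data.
    Xtr, Xval = rng.standard_normal((80, 10)), rng.standard_normal((40, 10))
    w_true = rng.standard_normal(10)
    ytr = Xtr @ w_true + 0.5 * rng.standard_normal(80)
    yval = Xval @ w_true + 0.5 * rng.standard_normal(40)

    lam = 1.0        # the hyperparameter being optimized (ridge penalty)
    lr = 0.1         # step size for the hyperparameter updates
    for _ in range(200):
        A = Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1])
        w = np.linalg.solve(A, Xtr.T @ ytr)        # ridge solution for this lambda
        # Exact hypergradient: dw/dlam = -A^{-1} w, chained through the
        # validation loss ||Xval w - yval||^2.
        dw_dlam = -np.linalg.solve(A, w)
        resid = Xval @ w - yval
        dL_dlam = 2.0 * resid @ (Xval @ dw_dlam)
        lam = max(lam - lr * dL_dlam, 1e-8)        # keep the penalty positive

    print("tuned regularization strength:", lam)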
Deep learning: With the advent of the back-propagation algorithm based on automatic differentiation, many researchers tried to train supervised deep artificial neural networks from scratch, initially with little success. Sepp Hochreiter's diploma thesis of 1991 formally identified the reason for this failure as the vanishing gradient problem, which affects many-layered feedforward networks and recurrent neural networks. Recurrent networks are trained by unfolding them into very deep feedforward networks, where a new layer is created for each time step of an input sequence processed by the network. As errors propagate from layer to layer, they shrink exponentially with the number of layers, impeding the tuning of neuron weights which is based on those errors.
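The effect is easy to reproduce numerically: the sketch below (plain NumPy, with arbitrary depth, width, and weight scale) backpropagates an error signal through a stack of sigmoid layers and prints how quickly its norm decays.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    depth, width = 30, 50

    # Forward pass through many sigmoid layers, keeping pre-activations for backprop.
    weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(depth)]
    a = rng.standard_normal(width)
    pre_acts = []
    for W in weights:
        z = W @ a
        pre_acts.append(z)
        a = sigmoid(z)

    # Backward pass: each layer multiplies the error signal by W^T diag(sigmoid'(z)),
    # so the signal shrinks roughly exponentially with the number of layers.
    grad = np.ones(width)            # stand-in for dLoss/d(top-layer activation)
    for layer in reversed(range(depth)):
        s = sigmoid(pre_acts[layer])
        grad = weights[layer].T @ (grad * s * (1.0 - s))
        if layer % 10 == 0:
            print(f"gradient norm at layer {layer}: {np.linalg.norm(grad):.2e}")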
Artificial neural network: Evolutionary methods, gene expression programming, simulated annealing, expectation-maximization, non-parametric methods and particle swarm optimization are some other methods for training neural networks.
Rectifier (neural networks): In 2011, the use of the rectifier as a non-linearity was shown for the first time to enable training deep supervised neural networks without requiring unsupervised pre-training.
Rectifier (neural networks): Rectified linear units find applications in computer vision and speech recognition using deep neural nets.
Neural Networks (journal): Neural Networks is a monthly peer-reviewed scientific journal and an official journal of the International Neural Network Society, European Neural Network Society, and Japanese Neural Network Society. It was established in 1988 and is published by Elsevier. The journal covers all aspects of research on artificial neural networks. The founding editor-in-chief was Stephen Grossberg (Boston University), the current editors-in-chief are DeLiang Wang (Ohio State University) and Kenji Doya (Okinawa Institute of Science and Technology). The journal is abstracted and indexed in Scopus and the Science Citation Index. According to the "Journal Citation Reports", the journal has a 2012 impact factor of 1.927.
Hyperparameter: One often uses a prior which comes from a parametric family of probability distributions – this is done partly for explicitness (so one can write down a distribution, and choose the form by varying the hyperparameter, rather than trying to produce an arbitrary function), and partly so that one can "vary" the hyperparameter, particularly in the method of "conjugate priors," or for "sensitivity analysis."
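The classic worked example is the Beta prior for a Bernoulli likelihood, where conjugacy makes "varying the hyperparameter" a matter of changing two numbers (the values chosen below are arbitrary, and SciPy is assumed):

    from scipy import stats

    # Beta(alpha, beta) is the conjugate prior for a Bernoulli likelihood, so the
    # posterior stays in the same family and its hyperparameters update in closed form.
    alpha, beta = 2.0, 2.0          # prior hyperparameters (arbitrary choice)
    heads, tails = 7, 3             # observed coin flips

    posterior = stats.beta(alpha + heads, beta + tails)
    print("posterior mean:", posterior.mean())

    # Sensitivity analysis by varying the hyperparameter: a stronger symmetric
    # prior pulls the posterior mean back toward 0.5.
    for a0 in (1.0, 2.0, 10.0):
        print(a0, stats.beta(a0 + heads, a0 + tails).mean())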
Types of artificial neural networks: A committee of machines (CoM) is a collection of different neural networks that together "vote" on a given example. This generally gives a much better result compared to other neural network models. Because neural networks suffer from local minima, starting with the same architecture and training but using different initial random weights often gives vastly different networks. A CoM tends to stabilize the result.
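A minimal sketch of the idea, assuming scikit-learn: train the same small network several times with different random initial weights and let the copies vote (the dataset, architecture, and number of committee members are arbitrary choices).

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X_train, X_test, y_train, y_test = train_test_split(
        *load_digits(return_X_y=True), random_state=0)

    # Same architecture and training data, different random initial weights.
    nets = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=400,
                          random_state=seed).fit(X_train, y_train)
            for seed in range(5)]

    votes = np.stack([net.predict(X_test) for net in nets])      # (n_nets, n_samples)
    majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

    for i, net in enumerate(nets):
        print(f"net {i} accuracy: {net.score(X_test, y_test):.3f}")
    print("committee accuracy:", (majority == y_test).mean())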
ExtraHop Networks: ExtraHop Networks, Inc. is an enterprise technology company headquartered in Seattle, Washington. ExtraHop sells network appliances that perform real-time analysis of wire data for performance troubleshooting, security detection, optimization and tuning, and business analytics.
Instantaneously trained neural networks: Instantaneously trained neural networks have been proposed as models of short term learning and used in web search, and financial time series prediction applications. They have also been used in instant classification of documents and for deep learning and data mining.
Deep learning: "Deep learning" has been characterized as a buzzword, or a rebranding of neural networks.