Deep Learning in Computer Vision

Start Date: 11/05/2018

Course Type: Common Course

Course Link:

Explore 1600+ online courses from top universities. Join Coursera today to learn data science, programming, business strategy, and more.

About Course

Deep learning added a huge boost to the already rapidly developing field of computer vision. With deep learning, a lot of new applications of computer vision techniques have been introduced and are now becoming parts of our everyday lives. These include face recognition and indexing, photo stylization or machine vision in self-driving cars. The goal of this course is to introduce students to computer vision, starting from basics and then turning to more modern deep learning models. We will cover both image and video recognition, including image classification and annotation, object recognition and image search, various object detection techniques, motion estimation, object tracking in video, human action recognition, and finally image stylization, editing and new image generation. In course project, students will learn how to build face recognition and manipulation system to understand the internal mechanics of this technology, probably the most renown and oftenly demonstrated in movies and TV-shows example of computer vision and AI.

Course Syllabus

Welcome to the "Deep Learning for Computer Vision“ course! In the first introductory week, you'll learn about the purpose of computer vision, digital images, and operations that can be applied to them, like brightness and contrast correction, convolution and linear filtering. These simple image processing methods solve as building blocks for all the deep learning employed in the field of computer vision. Let’s get started!

Deep Learning Specialization on Coursera

Course Introduction

Deep learning added a huge boost to the already rapidly developing field of computer vision. With deep learning, a lot of new applications of computer vision techniques have been introduced and are now becoming parts of our everyday lives.

Course Tag

Deep Learning Computer Vision AI Deep Learning Models Image Recognition Video Recognition Image classification

Related Wiki Topic

Article Example
Deep learning Since its resurgence, deep learning has become part of many state-of-the-art systems in various disciplines, particularly computer vision and automatic speech recognition (ASR). Results on commonly used evaluation sets such as TIMIT (ASR) and MNIST (image classification), as well as a range of large-vocabulary speech recognition tasks are constantly being improved with new applications of deep learning. Recently, it was shown that deep learning architectures in the form of convolutional neural networks have been nearly best performing; however, these are more widely used in computer vision than in ASR, and modern large scale speech recognition is typically based on CTC for LSTM.
Computer vision Artificial intelligence and computer vision share other topics such as pattern recognition and learning techniques. Consequently, computer vision is sometimes seen as a part of the artificial intelligence field or the computer science field in general.
Deep learning Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks.
Computer vision In many computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common. Examples of applications of computer vision include systems for:
Deep learning Compound hierarchical-deep models compose deep networks with non-parametric Bayesian models. Features can be learned using deep architectures such as DBNs, DBMs, deep auto encoders, convolutional variants, ssRBMs, deep coding networks, DBNs with sparse feature learning, recursive neural networks, conditional DBNs, de-noising auto encoders. This provides a better representation, allowing faster learning and more accurate classification with high-dimensional data. However, these architectures are poor at learning novel classes with few examples, because all network units are involved in representing the input (a "distributed representation") and must be adjusted together (high degree of freedom). Limiting the degree of freedom reduces the number of parameters to learn, facilitating learning of new classes from few examples. "Hierarchical Bayesian (HB)" models allow learning from few examples, for example for computer vision, statistics, and cognitive science.
Computer vision A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to processing of visual stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems operate in order to solve certain vision related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behavior of biological systems, at different levels of complexity. Also, some of the learning-based methods developed within computer vision ("e.g." neural net and deep learning based image and feature analysis and classification) have their background in biology.
Deep learning Deep learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a class
Computer vision Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, object pose estimation, learning, indexing, motion estimation, and image restoration.
Deep learning The first general, working learning algorithm for supervised deep feedforward multilayer perceptrons was published by Ivakhnenko and Lapa in 1965. A 1971 paper described a deep network with 8 layers trained by the Group method of data handling algorithm which is still popular in the current millennium. These ideas were implemented in a computer identification system "Alpha", which demonstrated the learning process. Other Deep Learning working architectures, specifically those built from artificial neural networks (ANN), date back to the Neocognitron introduced by Kunihiko Fukushima in 1980. The ANNs themselves date back even further. The challenge was how to train networks with multiple layers.
Computer vision Photogrammetry also overlaps with computer vision, e.g., stereophotogrammetry vs. stereo computer vision.
Deep learning A deep Q-network (DQN) is a type of deep learning model developed at Google DeepMind which combines a deep convolutional neural network with Q-learning, a form of reinforcement learning. Unlike earlier reinforcement learning agents, DQNs can learn directly from high-dimensional sensory inputs. Preliminary results were presented in 2014, with a paper published in February 2015 in Nature The application discussed in this paper is limited to Atari 2600 gaming, although it has implications for other applications. However, much before this work, there had been a number of reinforcement learning models that apply deep learning approaches (e.g.,).
Machine learning Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of the concept of Deep learning which consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
Deep learning Recommendation systems have used deep learning to extract meaningful deep features for latent factor model for content-based recommendation for music. Recently, a more general approach for learning user preferences from multiple domains using multiview deep learning has been introduced. The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.
Comparison of deep learning software The following table compares some of the most popular software frameworks, libraries and computer programs for deep learning.
Deep learning These definitions have in common (1) multiple layers of nonlinear processing units and (2) the supervised or unsupervised learning of feature representations in each layer, with the layers forming a hierarchy from low-level to high-level features. The composition of a layer of nonlinear processing units used in a deep learning algorithm depends on the problem to be solved. Layers that have been used in deep learning include hidden layers of an artificial neural network and sets of complicated propositional formulas. They may also include latent variables organized layer-wise in deep generative models such as the nodes in Deep Belief Networks and Deep Boltzmann Machines.
Computer vision Yet another field related to computer vision is signal processing. Many methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. However, because of the specific nature of images there are many methods developed within computer vision which have no counterpart in processing of one-variable signals. Together with the multi-dimensionality of the signal, this defines a subfield in signal processing as a part of computer vision.
Computer vision Some strands of computer vision research are closely related to the study of biological vision – indeed, just as many strands of AI research are closely tied with research into human consciousness, and the use of stored knowledge to interpret, integrate and utilize visual information. The field of biological vision studies and models the physiological processes behind visual perception in humans and other animals. Computer vision, on the other hand, studies and describes the processes implemented in software and hardware behind artificial vision systems. Interdisciplinary exchange between biological and computer vision has proven fruitful for both fields.
Deep learning Significant additional impact of deep learning in image or object recognition was felt in the years 2011–2012. Although CNNs trained by backpropagation had been around for decades, and GPU implementations of NNs for years, including CNNs, fast implementations of CNNs with max-pooling on GPUs in the style of Dan Ciresan and colleagues were needed to make a dent in computer vision. In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest. Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image segmentation contest.
Deep learning Many deep learning algorithms are applied to unsupervised learning tasks. This is an important benefit because unlabeled data are usually more abundant than labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors and deep belief networks.
Robot learning While machine learning is frequently used by computer vision algorithms employed in the context of robotics, these applications are usually not referred to as "robot learning".