Big Data Applications: Machine Learning at Scale

Start Date: 09/13/2020

Course Type: Common Course

About Course

Machine learning is transforming the world around us. To become successful, you’d better know what kinds of problems can be solved with machine learning, and how they can be solved. Don’t know where to start? The answer is one button away.

During this course you will:
- Identify practical problems that can be solved with machine learning
- Build, tune and apply linear models with Spark MLlib
- Understand methods of text processing
- Fit decision trees and boost them with ensemble learning
- Construct your own recommender system

As practical assignments, you will:
- build and apply linear models for classification and regression tasks;
- learn how to work with texts;
- automatically construct decision trees and improve their performance with ensemble learning;
- finally, build your own recommender system!

With these skills, you will be able to tackle many practical machine learning tasks. We provide the tools; you choose where to apply them to make this world of machines more intelligent.

Special thanks to:
- Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project and the supervisor and mentor of half of the BigData team. He was the one who helped get this show on the road.
- Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. Now he leads the infrastructure team.
- Oleg Ivchenko (PhD student at APT dept., MIPT), Pavel Akhtyamov (MSc student at APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), superbrains who developed and now maintain the infrastructure used for practical assignments in this course.
- Asya Roitberg, Eugene Baulin, Marina Sudarikova. These people never sleep, babysitting this course day and night to make your learning experience productive, smooth and exciting.

Course Introduction

Big Data Applications: Machine Learning at Scale. This course covers the topics of linear machine learning, vector machines, and deep neural networks. You will learn about the design of linear models for data augmentation. You will also study optimization, gradient descent, and convolutional architectures for machine learning. You will then be able to implement machine learning algorithms for applications ranging from audio and image recognition to speech recognition. Finally, you will learn how to use these algorithms to classify real-world data. You will also learn the current and historical (2017 onward) state of the art in the field, with a focus on theoretical foundations and practical applications. You will meet some interesting people from NYU, Google, Intel, Stanford, and many more.

Programming Applications in Machine Learning; Linear Models; Optimization; Classification at Scale; Bimodal Language Design: Sentence-Based Representations

In this course, you will learn about the structure of sentences and their derivations, their semantics and syntactic sugar, and how they are read and evaluated. You will be able to recognize sentence structure and syntactic sugar when you see it, and you will be able to evaluate the meaning of all sentences as well as their syntax.

Related Wiki Topic

Article Example
Big data Multidimensional big data can also be represented as tensors, which can be more efficiently handled by tensor-based computation, such as multilinear subspace learning. Additional technologies being applied to big data include massively parallel-processing (MPP) databases, search-based applications, data mining, distributed file systems, distributed databases, cloud-based infrastructure (applications, storage and computing resources) and the Internet.
Quantum machine learning Sampling from high-dimensional probability distributions is at the core of a wide spectrum of computational techniques with important applications across science, engineering, and society. Examples include deep learning, probabilistic programming, and other machine learning and artificial intelligence applications.
Machine learning Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959, gives "computers the ability to learn without being explicitly programmed." Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data – such algorithms go beyond following strictly static program instructions by making data-driven predictions or decisions through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), learning to rank and computer vision.
Machine learning Machine learning poses a host of ethical questions. Systems trained on datasets collected with biases may exhibit these biases upon use, thus digitizing cultural prejudices. Responsible collection of data is therefore a critical part of machine learning.
Data mining It also is a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The book "Data mining: Practical machine learning tools and techniques with Java" (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons. Often the more general terms ("large scale") "data analysis" and "analytics" – or, when referring to actual methods, "artificial intelligence" and "machine learning" – are more appropriate.
Big data Real or near-real-time information delivery is one of the defining characteristics of big data analytics. Latency is therefore avoided whenever and wherever possible. Data in memory is good; data on spinning disk at the other end of an FC SAN connection is not. The cost of a SAN at the scale needed for analytics applications is much higher than that of other storage techniques.
Active learning (machine learning) Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design.
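A minimal sketch of pool-based active learning with uncertainty sampling may make the "interactively query" step concrete. It assumes 1-D inputs, two separable classes, a simple threshold classifier, and a hypothetical `oracle` function standing in for the user or other information source; none of these names come from the text above, and uncertainty sampling is only one of several query strategies.

```python
# Pool-based active learning with uncertainty sampling (illustrative sketch).
# Assumptions: 1-D inputs, separable classes 0/1, and a hypothetical `oracle`
# callable standing in for the human annotator.


def fit_threshold(labeled):
    """Put the decision threshold midway between the largest x labeled 0
    and the smallest x labeled 1."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2


def active_learn(pool, oracle, seed, n_queries):
    """Label the seed points, then repeatedly query the oracle on the pool
    point the current model is least certain about (closest to the threshold)."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(n_queries):
        if not unlabeled:
            break
        t = fit_threshold(labeled)
        x = min(unlabeled, key=lambda u: abs(u - t))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
    return fit_threshold(labeled), labeled


def oracle(x):
    # Hypothetical hidden rule that only the "user" knows.
    return int(x >= 6.3)


pool = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
threshold, labeled = active_learn(pool, oracle, seed=[0.0, 10.0], n_queries=10)
```

Because each query lands near the current boundary, the search behaves roughly like a bisection: a handful of labels pins the threshold between 6.2 and 6.3, whereas labeling the pool at random would usually need many more queries to reach that resolution.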
Massive Online Analysis The algorithms in Massive Online Analysis (MOA) are designed for large-scale machine learning, dealing with concept drift and big data streams in real time.
Online machine learning In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring out-of-core algorithms. It is also used in situations where the algorithm must dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g. stock price prediction.
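One common way to realize online learning is stochastic gradient descent (SGD), where the model is nudged after every arriving sample and the sample can then be discarded. The sketch below assumes a 1-D linear model, squared-error loss, a fixed learning rate of 0.1, and a hypothetical noise-free data stream generated from y = 2x + 1; these are illustrative choices, not details from the text.

```python
# Online learning sketch: a linear model y ≈ w*x + b updated one sample at a
# time by stochastic gradient descent (SGD) on squared error. Each sample is
# used once and then dropped, so the full dataset never has to be in memory.


class OnlineLinearRegressor:
    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr  # learning rate (illustrative value)

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """One SGD step on the per-sample loss 0.5 * (predict(x) - y) ** 2."""
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def stream(n):
    """Hypothetical data stream: x cycles through -1.0 .. 1.0, y = 2x + 1."""
    for step in range(n):
        x = ((step % 21) - 10) / 10
        yield x, 2 * x + 1


model = OnlineLinearRegressor()
for x, y in stream(5000):
    model.update(x, y)  # the predictor is refined after every arriving sample
```

A batch learner would instead collect all 5000 pairs and solve for `w` and `b` in one pass over the stored data; the online version keeps only two numbers of state, which is what makes it usable out of core or on data that never stops arriving.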
Machine learning Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities and then used to find meaningful anomalies.
Machine learning Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on "known" properties learned from the training data, data mining focuses on the discovery of (previously) "unknown" properties in the data (this is the analysis step of Knowledge Discovery in Databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to "reproduce known" knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously "unknown" knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Machine learning Leo Breiman distinguished two statistical modelling paradigms, the data model and the algorithmic model, where "algorithmic model" means, more or less, machine learning algorithms such as random forests.
Machine learning Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.
Big data Big data analytics for manufacturing applications is marketed as a 5C architecture (connection, conversion, cyber, cognition, and configuration).
Machine learning Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder to call the overall field.
Adversarial machine learning Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings like spam filtering, malware detection and biometric recognition.
Adversarial machine learning Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. For instance, intrusion detection systems (IDSs) are often re-trained on a set of samples collected during network operation. In this scenario, an attacker may poison the training data by injecting carefully designed samples to eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. Poisoning attacks against machine learning algorithms (including learning in the presence of worst-case adversarial label flips in the training data) have been demonstrated in the research literature.
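To show how a few injected samples can move a re-trained model, here is a deliberately simple, hypothetical example rather than any published attack: a 1-D classifier whose threshold sits halfway between the two class means, re-trained after an attacker adds two crafted points that lie deep in class-1 territory but carry label 0. The data, the classifier, and the attacker's points are all assumptions made for illustration.

```python
# Illustrative poisoning sketch (not a published attack): a 1-D classifier whose
# threshold sits halfway between the two class means, re-trained after an
# attacker injects a few crafted, mislabeled samples into the training data.


def fit_mean_threshold(data):
    """Return the midpoint between the mean x of class 0 and of class 1."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def predict(threshold, x):
    return int(x >= threshold)


clean = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
# Hypothetical attacker samples: deep in class-1 territory but labeled 0.
poison = [(10.0, 0), (10.0, 0)]

t_clean = fit_mean_threshold(clean)              # (1.0 + 9.0) / 2 = 5.0
t_poisoned = fit_mean_threshold(clean + poison)  # (4.6 + 9.0) / 2 ≈ 6.8

# A point the clean model assigns to class 1 flips to class 0 after re-training.
flipped = (predict(t_clean, 6.0), predict(t_poisoned, 6.0))  # (1, 0)
```

Two contaminated samples out of eight are enough to shift the boundary by 1.8 units here, because class means are sensitive to outliers; more robust estimators (e.g. medians) would blunt this particular attack, though not poisoning in general.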
Online machine learning Progressive learning is an effective learning model exemplified by the human learning process: the process of learning continuously from direct experience. A progressive learning technique (PLT) in machine learning can learn new classes/labels dynamically on the fly. Although online learning can learn "new samples" of data that arrive sequentially, it cannot learn "new classes" of data being introduced to the model. The progressive learning paradigm is independent of any constraint on the number of classes, and it can learn new classes while still retaining the knowledge of previous classes. Whenever a new class (not native to the knowledge learnt thus far) is encountered, the classifier is remodeled automatically and its parameters are calculated so that it retains the knowledge learnt thus far. This technique is suitable for real-world applications where the number of classes is often unknown and online learning from real-time data is required.
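The key property described above, accepting a previously unseen class mid-stream without forgetting the old ones, can be illustrated with a much simpler model than PLT. The sketch below is a hypothetical stand-in, not the PLT algorithm: an incremental nearest-centroid classifier that keeps a running mean per class and simply adds a new centroid when an unknown label arrives. It assumes numeric feature vectors of equal length and Python 3.8+ for `math.dist`.

```python
import math


class IncrementalNearestCentroid:
    """Nearest-centroid classifier that learns from one sample at a time and
    adds a new class the first time an unseen label arrives. An illustrative
    stand-in for the idea, not the PLT algorithm itself."""

    def __init__(self):
        self.centroids = {}  # label -> running mean of that class's feature vectors
        self.counts = {}     # label -> number of samples seen for that class

    def learn_one(self, x, label):
        if label not in self.centroids:
            # New class: add a centroid without modifying the existing ones,
            # so what was learnt about earlier classes is retained.
            self.centroids[label] = [float(v) for v in x]
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        c = self.centroids[label]
        for i, v in enumerate(x):
            c[i] += (v - c[i]) / n  # incremental update of the running mean

    def predict_one(self, x):
        # Assign x to the class whose centroid is nearest (Euclidean distance).
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


clf = IncrementalNearestCentroid()
for x, label in [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b")]:
    clf.learn_one(x, label)
before = clf.predict_one((9, 1))  # only "a" and "b" are known so far -> "b"
clf.learn_one((10, 0), "c")       # class "c" appears mid-stream
after = clf.predict_one((9, 1))   # -> "c"
```

Adding a class here costs one new centroid and leaves the others untouched, which is the retention property in its simplest form; the published technique remodels a richer classifier, so treat this only as intuition.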
Big data Big data requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse, complex, and of a massive scale.
Waffles (machine learning) The Waffles machine learning toolkit contains command-line tools for performing various operations related to machine learning, data mining, and predictive modeling. The primary focus of Waffles is to provide tools that are simple to use in scripted experiments or processes. For example, the supervised learning algorithms included in Waffles are all designed to support multi-dimensional labels, classification and regression, automatically impute missing values, and automatically apply necessary filters to transform the data to a type that the algorithm can support, such that arbitrary learning algorithms can be used with arbitrary data sets. Many other machine learning toolkits provide similar functionality, but require the user to explicitly configure data filters and transformations to make it compatible with a particular learning algorithm. The algorithms provided in Waffles also have the ability to automatically tune their own parameters (with the cost of additional computational overhead).
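The paragraph says Waffles imputes missing values automatically but does not say how. As a hedged illustration only, the sketch below uses column-mean imputation, one common simple strategy; it is not Waffles' own code or command-line interface, and marking a missing value with `None` is an assumption.

```python
def impute_column_means(rows):
    """Replace each None with the mean of the observed values in its column.
    Columns with no observed values fall back to 0.0 (an arbitrary choice).
    Returns new lists; the input rows are left unchanged."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_column_means(data)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Mean imputation keeps every row usable by algorithms that cannot handle gaps, at the cost of shrinking the variance of each imputed column; toolkits that tune themselves may choose more elaborate schemes.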