Machine Learning for Data Analysis

Start Date: 11/03/2019

Course Type: Common Course

Course Link: https://www.coursera.org/learn/machine-learning-data-analysis

Course Syllabus

In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target (response) variable. Decision trees segment the data into subgroups by repeatedly applying simple rules or criteria, choosing at each step the combination of variables that best predicts the target variable.
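As a rough illustration of the idea above (not taken from the course materials), the following minimal sketch grows a small tree in plain Python. It assumes Gini impurity as the splitting criterion and uses a made-up toy dataset; the names `gini`, `best_split`, `build_tree`, and `predict`, as well as the feature columns, are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the rule `row[feature] <= threshold` with the lowest weighted impurity.

    Returns (feature_index, threshold, score), or None if no rule separates the data.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # this rule does not create two subgroups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Apply the best simple rule repeatedly until subgroups are pure or too deep."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: columns are [age, weekly_exercise_hours, unrelated_noise].
rows = [[25, 1, 7], [52, 0, 3], [47, 5, 7], [31, 6, 2], [38, 4, 9], [29, 2, 1]]
labels = ["no", "no", "yes", "yes", "yes", "no"]

tree = build_tree(rows, labels)
print(best_split(rows, labels)[:2])  # which variable and threshold were selected
print(predict(tree, [40, 3, 5]))
```

On this toy data only the second column separates the classes cleanly, so the sketch selects it and ignores the other two, which is the variable-selection behavior the syllabus describes. A production analysis would more likely rely on an established library than on hand-written code like this.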

Course Introduction

Are you interested in predicting future outcomes using your data? This course helps you do just that.

Course Tag

Data Analysis, Python Programming, Machine Learning, Exploratory Data Analysis

Related Wiki Topic

Article Example
Machine learning Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
Machine learning Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities and then used to find meaningful anomalies.
Data mining It also is a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence, machine learning, and business intelligence. The book "Data mining: Practical machine learning tools and techniques with Java" (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons. Often the more general terms ("large scale") "data analysis" and "analytics" – or, when referring to actual methods, "artificial intelligence" and "machine learning" – are more appropriate.
Semantic analysis (machine learning) In machine learning, semantic analysis of a corpus is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents.
Machine learning Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on "known" properties learned from the training data, data mining focuses on the discovery of (previously) "unknown" properties in the data (this is the analysis step of Knowledge Discovery in Databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to "reproduce known" knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously "unknown" knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Machine learning Machine Learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use, thus digitizing cultural prejudices. Responsible collection of data thus is a critical part of machine learning.
Smoothed analysis Since its introduction in 2001, smoothed analysis has been used as a basis for considerable research, for problems ranging from mathematical programming and numerical analysis to machine learning and data mining.
Data analysis Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term "data analysis" is sometimes used as a synonym for data modeling.
Data analysis techniques for fraud detection To go beyond, a data analysis system has to be equipped with a substantial amount of background knowledge, and be able to perform reasoning tasks involving that knowledge and the data provided. In an effort to meet this goal, researchers have turned to ideas from the machine learning field. This is a natural source of ideas, since the machine learning task can be described as turning background knowledge and examples (input) into knowledge (output).
Machine-dependent software Huang, J., Li, Y. F., & Xie, M., 2015, An empirical analysis of data preprocessing for machine learning-based software cost estimation, "Information and Software Technology", 67, 108-127
Statistics There are two applications for machine learning and data mining: data management and data analysis. Statistics tools are necessary for the data analysis.
Machine learning Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model, wherein 'algorithmic model' means more or less the machine learning algorithms like Random forest.
Online machine learning In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, which calls for out-of-core algorithms. It is also used in situations where the algorithm must dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g. stock price prediction.
Link analysis With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories – statistical (models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).
Data analysis Data initially obtained must be processed or organised for analysis. For instance, these may involve placing data into rows and columns in a table format (i.e., structured data) for further analysis, such as within a spreadsheet or statistical software.
Adversarial machine learning Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. For instance, intrusion detection systems (IDSs) are often re-trained on a set of samples collected during network operation. Within this scenario, an attacker may poison the training data by injecting carefully designed samples to eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. Examples of poisoning attacks against machine learning algorithms (including learning in the presence of worst-case adversarial label flips in the training data) can be found in.
Machine learning Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.
Waffles (machine learning) The Waffles machine learning toolkit contains command-line tools for performing various operations related to machine learning, data mining, and predictive modeling. The primary focus of Waffles is to provide tools that are simple to use in scripted experiments or processes. For example, the supervised learning algorithms included in Waffles are all designed to support multi-dimensional labels, classification and regression, automatically impute missing values, and automatically apply necessary filters to transform the data to a type that the algorithm can support, such that arbitrary learning algorithms can be used with arbitrary data sets. Many other machine learning toolkits provide similar functionality, but require the user to explicitly configure data filters and transformations to make the data compatible with a particular learning algorithm. The algorithms provided in Waffles also have the ability to automatically tune their own parameters (at the cost of additional computational overhead).
Learning analytics "Learning analytics is the use of intelligent data, learner-produced data, and analysis models to discover information and social connections for predicting and advising people's learning."
Topological data analysis One of the main fields of data analysis today is machine learning. Some examples of machine learning in TDA can be found in Adcock et al. A conference is dedicated to the link between TDA and machine learning. In order to apply tools from machine learning, the information obtained from TDA should be represented in vector form. An ongoing and promising attempt is the persistence landscape discussed above. Another attempt uses the concept of persistence images. However, one problem of this method is the loss of stability, since the hard stability theorem depends on the barcode representation.