## Project: Linear Regression with NumPy and Python

Start Date: 03/01/2020

 Course Type: Common Course

Explore 1600+ online courses from top universities. Join Coursera today to learn data science, programming, business strategy, and more.

Welcome to this project-based course on Linear Regression with NumPy and Python. In this project, you will do all the machine learning without using any of the popular machine learning libraries such as scikit-learn and statsmodels. The aim of this project and is to implement all the machinery, including gradient descent and linear regression, of the various learning algorithms yourself, so you have a deeper understanding of the fundamentals. This course runs on Coursera's hands-on project platform called Rhyme. On Rhyme, you do projects in a hands-on manner in your browser. You will get instant access to pre-configured cloud desktops containing all of the software and data you need for the project. Everything is already set up directly in your internet browser so you can just focus on learning. For this project, you’ll get instant access to a cloud desktop with Python, Jupyter, and scikit-learn pre-installed.

#### Course Syllabus

Project: Linear Regression with NumPy and Python

#### Course Introduction

Welcome to this project-based course on Linear Regression with NumPy and Python. I

#### Related Wiki Topic

Article Example
NumPy Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations.
Linear regression The very simplest case of a single scalar predictor variable "x" and a single scalar response variable "y" is known as "simple linear regression". The extension to multiple and/or vector-valued predictor variables (denoted with a capital "X") is known as "multiple linear regression", also known as "multivariable linear regression". Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable "y" is still a scalar. Another term "multivariate linear regression" refers to cases where "y" is a vector, i.e., the same as "general linear regression".
NumPy NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents. NumPy address the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays, requiring (re)writing some code, mostly inner loops using NumPy.
NumPy NumPy (pronounced () or sometimes ()) is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
NumPy In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in version 1.5.0.
Linear regression In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable "y" and one or more explanatory variables (or independent variables) denoted "X". The case of one explanatory variable is called "simple linear regression". For more than one explanatory variable, the process is called "multiple linear regression". (This term is distinct from "multivariate linear regression", where multiple correlated dependent variables are predicted, rather than a single scalar variable.)
NumPy Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data.
Linear regression The following are the major assumptions made by standard linear regression models with standard estimation techniques (e.g. ordinary least squares):
Linear regression Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares loss function as in ridge regression ("L"-norm penalty) and lasso ("L"-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
Linear regression In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called "linear models". Most commonly, the conditional mean of "y" given the value of "X" is assumed to be an affine function of "X"; less commonly, the median or some other quantile of the conditional distribution of "y" given "X" is expressed as a linear function of "X". Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of "y" given "X", rather than on the joint probability distribution of "y" and "X", which is the domain of multivariate analysis.
Linear regression The general linear model considers the situation when the response variable "Y" is not a scalar but a vector. Conditional linearity of "E"("y"|"x") = "Bx" is still assumed, with a matrix "B" replacing the vector "β" of the classical linear regression model. Multivariate analogues of Ordinary Least-Squares (OLS) and Generalized Least-Squares (GLS) have been developed. "General linear models" are also called "multivariate linear models". These are not the same as multivariable linear models (also called "multiple linear models").
Linear regression Some of the more common estimation techniques for linear regression are summarized below.
Linear regression The capital asset pricing model uses linear regression as well as the concept of beta for analyzing and quantifying the systematic risk of an investment. This comes directly from the beta coefficient of the linear regression model that relates the return on the investment to the return on all risky assets.
Simple linear regression In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the "x" and "y" coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variables.
Linear regression In statistics and numerical analysis, the problem of numerical methods for linear least squares is an important one because linear regression models are one of the most important types of model, both as formal statistical models and for exploration of data sets. The majority of statistical computer packages contain facilities for regression analysis that make use of linear least squares computations. Hence it is appropriate that considerable effort has been devoted to the task of ensuring that these computations are undertaken efficiently and with due regard to numerical precision.
NumPy An implementation of a matrix package was completed by Jim Fulton, then generalized by Jim Hugunin to become "Numeric", also variously called Numerical Python extensions or NumPy.
Bayesian multivariate linear regression where Y and E are formula_14 matrices. The design matrix X is an formula_15 matrix with the observations stacked vertically, as in the standard linear regression setup:
Linear regression Linear regression has many practical uses. Most applications fall into one of the following two broad categories:
NumPy Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include codice_1, numexpr and Numba. Cython is a static-compiling alternative to these.
Bayesian linear regression In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that have a normal distribution, and if a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model's parameters.