## Regression Models

Start Date: 07/05/2020

 Course Type: Common Course

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models, including scatterplot smoothing.
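
As a rough illustration of the least-squares idea mentioned above, the following minimal sketch fits a one-predictor regression line by minimizing the sum of squared residuals. It is written in Python purely for illustration (the course itself works in R), and the function names `fit_simple_ols` and `residuals`, as well as the toy data, are hypothetical and not taken from the course materials.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed outcome minus fitted value, one entry per observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
res = residuals(x, y, intercept, slope)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

When the model includes an intercept, the residuals sum to zero (up to floating-point error), which is one of the simplest residual checks.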

#### Course Introduction

Regression Models for Data Analysis is the third course in the Data Analysis in R programming specialization. It is designed to teach you how to use regression models for data analysis. We will be using the free software R® (from the R® Foundation) for modeling and analysis, and we will also rely on a variety of free statistical packages to get you started.

In this course you will get an introduction to regression modeling. We will cover the basics of regression, including logistic regression, one-variable regression, and multiple-variable regression. You will also learn how to use Excel for regression modeling. Regression models are a powerful tool for data analysis and are often the entry point for new data analysis programs. You will also learn how to use regression discontinuity and time-varying covariance models. These concepts will help you model and interpret data appropriately.

At the end of this course you will be able to:

- Model your data using regression
- Execute regression models for data analysis
- Communicate results from your models effectively
- Evaluate and interpret results from your models effectively
- Construct a linear regression model
- Compute the model's fit
- Execute the fit for different types of predictors
- Evaluate the fit for different types of predictors and time-varying covariance models
- Evaluate the fit

#### Course Tag

Model Selection, Generalized Linear Model, Linear Regression, Regression Analysis

#### Related Wiki Topic

| Article | Example |
| --- | --- |
| Regression analysis | Regression models involve the following variables: |
| Nonparametric regression | Nonparametric regression models generally require larger data sets than parametric models. |
| Censored regression model | These and other censored regression models are often confused with truncated regression models. Truncated regression models are used for data where whole observations are missing, so that the values for the dependent and the independent variables are unknown. Censored regression models are used for data where only the value for the dependent variable (hours of work in the example above) is unknown while the values of the independent variables (age, education, family status) are still available. |
| Truncated regression model | Truncated regression models are often confused with censored regression models, where only the value of the dependent variable is clustered at a lower threshold, an upper threshold, or both, while the value for independent variables is available. |
| Linear regression | The following are the major assumptions made by standard linear regression models with standard estimation techniques (e.g. ordinary least squares): |
| Logistic regression | Discrimination in linear regression models is generally measured using R². Since this has no direct analog in logistic regression, various methods including the following can be used instead. |
| Errors-in-variables models | In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses. |
| Poisson regression | Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution function as the assumed probability distribution of the response. |
| Regression dilution | Hughes (1993) shows that the regression dilution ratio methods apply approximately in survival models. Rosner (1992) shows that the ratio methods apply approximately to logistic regression models. Carroll et al. (1995) give more detail on regression dilution in nonlinear models, presenting the regression dilution ratio methods as the simplest case of "regression calibration" methods, in which additional covariates may also be incorporated. |
| Truncated regression model | Estimation of truncated regression models is usually done via parametric, semi-parametric and non-parametric maximum likelihood methods. |
| Unit-weighted regression | Andreas Graefe applied an equal weighting approach to nine established multiple regression models for forecasting U.S. presidential elections. Across the ten elections from 1976 to 2012, equally weighted predictors reduced the forecast error of the original regression models on average by four percent. An equal-weights model that includes all variables provided well-calibrated forecasts that reduced the error of the most accurate regression model by 29 percent. |
| Semiparametric regression | In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and may be misspecified and inconsistent, just like a fully parametric model. |
| Vector generalized linear model | and include 3 of the most important statistical regression models: |
| Regression dilution | The case of multiple predictor variables (possibly correlated) subject to variability (possibly correlated) has been well-studied for linear regression, and for some non-linear regression models. Other non-linear models, such as proportional hazards models for survival analysis, have been considered only with a single predictor subject to variability. |
| Least-angle regression | In statistics, least-angle regression (LARS) is an algorithm for fitting linear regression models to high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. |
| Generalized linear model | GLMs with this setup are logistic regression models (or "logit models"). |
| Backfitting algorithm | Additive models are a class of non-parametric regression models of the form: |
| History of network traffic models | Regression models define explicitly the next random variable in the sequence by previous ones within a specified time window and a moving average of a white noise. |
| Quantitative structure–activity relationship | Quantitative structure–activity relationship models (QSAR models) are regression or classification models used in the chemical and biological sciences and engineering. Like other regression models, QSAR regression models relate a set of "predictor" variables (X) to the potency of the response variable (Y), while classification QSAR models relate the predictor variables to a categorical value of the response variable. |
| Local regression | LOESS and LOWESS (locally weighted scatterplot smoothing) are two strongly related non-parametric regression methods that combine multiple regression models in a "k"-nearest-neighbor-based meta-model. "LOESS" is a later generalization of LOWESS; although it is not a true initialism, it may be understood as standing for "LOcal regrESSion". |
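
The distinction between censored and truncated data drawn in the excerpts above can be made concrete with a small sketch. It assumes a lower threshold and a list of hypothetical `(age, latent_hours)` pairs; the helper names `censor_below` and `truncate_below` are illustrative only and do not come from any library or from the course. It shows how the two kinds of data sets arise, not how the corresponding regression models are estimated.

```python
def censor_below(rows, threshold):
    """Censoring: keep every observation, but record the outcome as the
    threshold whenever the underlying value falls below it."""
    return [(x, max(y, threshold)) for x, y in rows]


def truncate_below(rows, threshold):
    """Truncation: drop the whole observation (predictor and outcome)
    whenever the underlying value is not above the threshold."""
    return [(x, y) for x, y in rows if y > threshold]


# Hypothetical (age, latent hours of work) pairs, for illustration only.
rows = [(25, -4.0), (32, 12.5), (41, 38.0), (58, -1.5)]

censored = censor_below(rows, 0.0)
truncated = truncate_below(rows, 0.0)

print(censored)   # all four ages remain; low outcomes are recorded as 0.0
print(truncated)  # only the two observations above the threshold remain
```

In the censored sample the independent variable (age) is still available for every observation, whereas the truncated sample loses those observations entirely, which matches the distinction drawn above.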