Introduction to Probability and Data

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.

Course Syllabus

This course introduces you to sampling and exploring data, as well as basic probability theory. You will examine various types of sampling methods and discuss how such methods can impact the utility of a data analysis. The concepts in this module will serve as building blocks for our later courses.

Each lesson comes with a set of learning objectives that will be covered in a series of short videos. Supplementary readings and practice problems will also be suggested from OpenIntro Statistics, 3rd Edition (a free online introductory statistics textbook that I co-authored). There will be weekly quizzes designed to assess your learning and mastery of the material covered that week in the videos. In addition, each week will feature a lab assignment, in which you will use R to apply what you are learning to real data. There will also be a data analysis project designed to enable you to answer research questions of your own choosing.

Since this is a Coursera course, you are welcome to participate as much or as little as you’d like, though I hope that you will begin by participating fully. One of the most rewarding aspects of a Coursera course is participation in forum discussions about the course materials. Please take advantage of other students' feedback and insight and contribute your own perspective where you see fit to do so. You can also check out the resource page listing useful resources for this course.

Thank you for joining the Introduction to Probability and Data community! Say hello in the Discussion Forums. We are looking forward to your participation in the course.


Course Introduction

This course introduces the basic concepts of probability and data, including sampling and the normal distribution. You will learn about the types of sampling and how to compute probabilities using the normal distribution. You will also learn the basic concepts of probability and how to compute confidence intervals. You will then use these concepts to design a model or decision tree, implement a series of simple experiments, and analyze your model. This course is designed to help you build practical experience using probability and data in scientific study and modeling.

At the end of this course, you will be able to:
1. Describe the concepts of sampling and the normal distribution
2. Compute confidence intervals
3. Interpret data from a normal distribution
4. Use probability to build and evaluate models
5. Apply the normal distribution to find a linear trend
6. Use sampling methods to compute confidence intervals
7. Compute logarithms

This course is part of the iMBA offered by the University of Illinois, a flexible, fully-accredited online MBA at an incredibly competitive price. For more information, please see the Resource page in this course.

Module 1: Sizing Up on Data
Module 2: Normal Distribution
Module 3: Compute Sampling
Module 4: Pick the One
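The course introduction lists computing confidence intervals and interpreting data from a normal distribution among its outcomes. The course itself uses R. The following is only a minimal Python sketch of a large-sample confidence interval for a mean, assuming the normal approximation is adequate (for small samples a t-based interval is usually preferred). The function name `z_confidence_interval` and the data values are hypothetical illustrations:

```python
from statistics import NormalDist, fmean, stdev

def z_confidence_interval(sample, level=0.95):
    """Large-sample confidence interval for a population mean.

    Computes mean +/- z* * s / sqrt(n), where z* is taken from the
    standard normal distribution (normal approximation assumed).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for a 95% level
    center = fmean(sample)
    half_width = z_star * stdev(sample) / n ** 0.5
    return center - half_width, center + half_width

# Hypothetical measurements, for illustration only.
data = [4.1, 5.0, 5.6, 4.8, 5.2, 4.9, 5.3, 4.7, 5.1, 5.4]
low, high = z_confidence_interval(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `level` to 0.99 widens the interval, because a larger z* is needed to capture more of the normal distribution.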

Course Tag

Statistics, R Programming, RStudio, Exploratory Data Analysis

Related Wiki Topic

Article Example
Coverage probability The "probability" in "coverage probability" is interpreted with respect to a set of hypothetical repetitions of the entire data collection and analysis procedure. In these hypothetical repetitions, independent data sets following the same probability distribution as the actual data are considered, and a confidence interval is computed from each of these data sets; see Neyman construction.
Bayesian probability Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated to a posterior probability in the light of new, relevant data (evidence). The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.
Inverse probability Today, the problem of determining an unobserved variable (by whatever method) is called inferential statistics, the method of inverse probability (assigning a probability distribution to an unobserved variable) is called Bayesian probability, the "distribution" of data given the unobserved variable is rather the likelihood function (which is not a probability distribution), and the distribution of an unobserved variable, given both data and a prior distribution, is the posterior distribution. The development of the field and terminology from "inverse probability" to "Bayesian probability" is described by Fienberg (2006).
Probability There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation (see probability space), sets are interpreted as events and probability itself as a measure on a class of sets. In Cox's theorem, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions. In both cases, the laws of probability are the same, except for technical details.
Algorithmic probability Algorithmic probability is closely related to the concept of Kolmogorov complexity. Kolmogorov's introduction of complexity was motivated by information theory and problems in randomness, while Solomonoff introduced algorithmic complexity for a different reason: inductive reasoning. A single universal prior probability that can be substituted for each actual prior probability in Bayes’s rule was invented by Solomonoff with Kolmogorov complexity as a side product.
Probability The discovery of rigorous methods to assess and combine probability assessments has changed society. It is important for most citizens to understand how probability assessments are made, and how they contribute to decisions.
Probability and statistics Probability and statistics (also called statistics and probability) are two related but separate academic disciplines. Statistical analysis often uses probability distributions, and the two topics are often studied together. However, probability theory contains much that is mostly of mathematical interest and not directly relevant to statistics. Moreover, many topics in statistics are independent of probability theory.
Inductive probability When data are considered as a string of bits, the prior probabilities of 1 and 0 are taken to be equal. Therefore, each extra bit halves the probability of a sequence of bits.
Probability Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty). The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin. Since the coin is unbiased, the two outcomes ("head" and "tail") are equally probable; the probability of "head" equals the probability of "tail". Since no other outcomes are possible, the probability of each outcome, "head" or "tail", is 1/2 (or 50%). In other words, the probability of "head" is 1 out of 2 outcomes and the probability of "tail" is also 1 out of 2 outcomes, expressed as 0.5 when converted to decimal, with the above-mentioned quantification system. This type of probability is also called a priori probability.
Probability distribution fitting Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
Posterior probability Posterior probability is a conditional probability conditioned on randomly observed data. Hence it is a random variable. For a random variable, it is important to summarize its amount of uncertainty. One way to achieve this goal is to provide a credible interval of the posterior probability.
History of probability Probability is distinguished from statistics (see history of statistics). While statistics deals with data and inferences from it, (stochastic) probability deals with the stochastic (random) processes which lie behind data or outcomes.
Data dredging Ultimately, the statistical significance of a test and the statistical confidence of a finding are joint properties of data and the method used to examine the data. Thus, if someone says that a certain event has a probability of 20% ± 2%, 19 times out of 20, this means that if the probability of the event is estimated "by the same method" used to obtain the 20% estimate, the result is between 18% and 22% with probability 0.95. No claim of statistical significance can be made by only looking, without due regard to the method used to assess the data.
Inductive probability But in reality each individual does not have the same information. And in general the probability of each outcome is not equal. The dice may be loaded, and this loading needs to be inferred from the data.
Probability The probability of an event $A$ is written as $P(A)$, $p(A)$, or $\Pr(A)$. This mathematical definition of probability can extend to infinite sample spaces, and even uncountable sample spaces, using the concept of a measure.
Probability space A probability space $(\Omega, \mathcal{F}, P)$ is said to be a complete probability space if for all $B \in \mathcal{F}$ with $P(B) = 0$ and all $A \subset B$ one has $A \in \mathcal{F}$. Often, the study of probability spaces is restricted to complete probability spaces.
Probability Conditional probability is written $P(A \mid B)$, and is read "the probability of A, given B". It is defined by $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Applied probability Applied probability is the application of probability theory to statistical problems and other scientific and engineering domains.
Probability theory Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure theory-based treatment of probability covers the discrete, continuous, a mix of the two, and more.
Inverse probability weighting Inverse probability weighting is also used to account for missing data when subjects with missing data cannot be included in the primary analysis.
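The coverage probability excerpt describes hypothetical repetitions of the whole data collection and analysis procedure. A minimal Python sketch can make this concrete by simulating those repetitions. It assumes normally distributed data with a known standard deviation, so each interval is mean ± 1.96σ/√n. The function name and all parameter values are arbitrary illustrations:

```python
import random
import statistics

def coverage_probability(true_mean=10.0, sigma=2.0, n=30, reps=10_000, z=1.96, seed=1):
    """Estimate how often a z-interval for the mean contains true_mean.

    Each repetition draws a fresh, independent data set from the same
    normal distribution and builds mean +/- z * sigma / sqrt(n),
    treating sigma as known (a simplifying assumption).
    """
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps

print(coverage_probability())  # expected to be close to the nominal 0.95
```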
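The Bayesian probability excerpt describes updating a prior probability to a posterior probability in light of new data. Below is a minimal Python sketch of that update with Bayes' rule over a finite set of hypotheses. The coin scenario (a fair coin versus a hypothetical coin with P(heads) = 3/4, equal priors, three heads observed) is an assumed illustration:

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return the posterior distribution over a finite set of hypotheses.

    prior: dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(data | H)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: w / evidence for h, w in unnormalized.items()}

prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {"fair": Fraction(1, 2) ** 3, "biased": Fraction(3, 4) ** 3}  # 3 heads in 3 tosses
posterior = bayes_update(prior, likelihood)
print(posterior)  # fair -> 8/35, biased -> 27/35
```

Exact fractions keep the arithmetic transparent; with many hypotheses or long data sequences, floating point or log-probabilities would be the usual choice.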
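For the distribution fitting excerpt, one simple case is fitting a normal distribution to repeated measurements. Python's `statistics.NormalDist.from_samples` estimates the mean and standard deviation from the data. The measurement values here are hypothetical, and assuming a normal model is itself a modeling choice that should be checked:

```python
from statistics import NormalDist

# Hypothetical repeated measurements of one quantity (illustrative values only).
measurements = [9.8, 10.1, 10.0, 9.7, 10.4, 10.2, 9.9, 10.0]

# Fit a normal distribution: mu is the sample mean, sigma the sample standard deviation.
fitted = NormalDist.from_samples(measurements)
print(round(fitted.mean, 3), round(fitted.stdev, 3))

# The fitted model can then answer probability questions, e.g. P(X > 10.3).
print(round(1 - fitted.cdf(10.3), 3))
```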
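The posterior probability excerpt suggests summarizing posterior uncertainty with a credible interval. Below is a minimal Python sketch for a proportion. It assumes a uniform Beta(1, 1) prior, so that after observing successes and failures the posterior is Beta(1 + successes, 1 + failures). It approximates an equal-tailed interval by Monte Carlo sampling with `random.betavariate`. The counts used are hypothetical:

```python
import random

def beta_credible_interval(successes, failures, level=0.95, draws=100_000, seed=0):
    """Equal-tailed credible interval for a proportion, estimated by Monte Carlo.

    Assumes a uniform Beta(1, 1) prior, giving a Beta(1 + successes,
    1 + failures) posterior; quantiles are read off sorted posterior draws.
    """
    rng = random.Random(seed)
    a, b = 1 + successes, 1 + failures
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

lo, hi = beta_credible_interval(successes=7, failures=3)
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

With more data, the posterior concentrates and the interval narrows.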
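The data dredging excerpt quotes an estimate of "20% ± 2%, 19 times out of 20". One common way such a statement arises is as a 95% interval for a proportion. Below is a minimal Python sketch of that arithmetic. It assumes a simple random sample and the normal approximation with z = 1.96, neither of which the excerpt states. The helper names are hypothetical:

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for_margin(p_hat, margin, z=1.96):
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

n = sample_size_for_margin(0.20, 0.02)
print(n, round(margin_of_error(0.20, n), 4))  # 1537 0.02
```

Under these assumptions, a ±2% margin around 20% needs roughly 1,500 observations. As the excerpt stresses, the figure means nothing without knowing the method that produced it.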
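Conditional probability is defined as P(A | B) = P(A ∩ B) / P(B) for P(B) > 0. Below is a minimal Python sketch of that definition by exact enumeration over two fair dice. The events "the sum is 8" and "the first die is even" are illustrative choices:

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_die_even(o):
    return o[0] % 2 == 0

print(prob(sum_is_8), conditional(sum_is_8, first_die_even))  # 5/36 1/6
```

Conditioning on B shrinks the sample space to the 18 outcomes with an even first die. Only 3 of them sum to 8, so P(A | B) = 3/18 = 1/6.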
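The inverse probability weighting excerpt mentions accounting for missing data. The idea is that each complete case is weighted by 1 / P(observed), so under-observed kinds of subjects count for more. Below is a minimal Python sketch of the normalized (Hájek-style) weighted mean. It assumes the observation probabilities are known; in practice they are usually estimated. The function name and data are hypothetical:

```python
def ipw_mean(observed_values, p_observed):
    """Normalized (Hajek-style) inverse probability weighted mean.

    observed_values: outcomes of the subjects whose data are not missing
    p_observed: each of those subjects' probability of being observed,
        assumed known here
    """
    if len(observed_values) != len(p_observed):
        raise ValueError("inputs must have the same length")
    if any(not 0 < p <= 1 for p in p_observed):
        raise ValueError("probabilities must lie in (0, 1]")
    weights = [1 / p for p in p_observed]
    return sum(w * y for w, y in zip(weights, observed_values)) / sum(weights)

# Hypothetical complete cases: subjects like the first two are observed 80% of
# the time, subjects like the third only 40% of the time.
values = [10, 12, 20]
probs = [0.8, 0.8, 0.4]
complete_case_mean = sum(values) / len(values)
weighted_mean = ipw_mean(values, probs)
print(complete_case_mean, weighted_mean)  # 14.0 15.5
```

The third subject's weight (2.5) is twice that of the others (1.25 each), pulling the estimate toward the under-observed group.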