Reproducible Research

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility lets people focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools, which let one publish a data analysis in a single document so that others can easily execute the same analysis and obtain the same results.
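As a rough illustration of what "execute the same analysis to obtain the same results" can mean in practice, here is a minimal sketch in Python. It is not taken from the course materials, which center on R and knitr. The function name `run_analysis`, the seed value, and the measurements are all hypothetical, chosen only to show the idea of recording the data fingerprint, the settings, and the results together.

```python
import hashlib
import json
import random
import statistics


def run_analysis(raw_values, seed=2020):
    """Return a small report that records inputs, settings, and results together."""
    # Fingerprint the input data so a reader can confirm they hold the same dataset.
    data_bytes = json.dumps(raw_values).encode("utf-8")
    data_sha256 = hashlib.sha256(data_bytes).hexdigest()

    # A dedicated, seeded generator makes the resampling step repeatable.
    rng = random.Random(seed)
    resampled_means = [
        statistics.mean(rng.choices(raw_values, k=len(raw_values)))
        for _ in range(200)
    ]

    return {
        "data_sha256": data_sha256,
        "seed": seed,
        "n": len(raw_values),
        "mean": statistics.mean(raw_values),
        "bootstrap_se": statistics.stdev(resampled_means),
    }


measurements = [4.1, 3.9, 5.2, 4.8, 4.4, 5.0]  # made-up illustrative values
report = run_analysis(measurements)
print(json.dumps(report, indent=2))
```

Running the script twice on the same data should print an identical report, and anyone holding the same values and the same seed should be able to regenerate it. A literate tool goes one step further by weaving code like this into the written narrative itself.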

Course Syllabus

This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn't going to ruin the story.

Course Introduction

Reproducible research means reproducibility of research forms. Research is based on reproducibility, or the assumption that research results are valid. In this course, we look at three types of reproducibility assumptions: quantitative, qualitative, and cultural. We will apply the principles of reproducibility to (1) quantitative research, (2) qualitative research, and (3) cultural research. We will look at reproducibility in the fields of biology, psychology, computer science, and computer engineering. All of this is done with the goal of making reproducible research a reality for everyone.

By the end of this course, you will be able to:
• Understand reproducibility as a conceptual framework
• Describe the three types of reproducibility assumptions
• Explain the principles of reproducibility in research
• Show how the three types of reproducibility assumptions are met
• Explain the differences in the paradigms of qualitative and quantitative research
• Explain how cultural and quantitative research fit together
• Identify the nature of the data used in research

All of this will position you to be an expert on reproducibility in research. Reproducibility is important because it provides an affordable and scalable standard for data analysis and interpretation. In this course, you will learn how to use reproducibility tools to make data analyses and interpretation more efficient. You will also get a

Course Tag

Knitr, Data Analysis, R Programming, Markup Language

Related Wiki Topic

Article Example
Biometrical Journal Biometrical Journal covers statistical methods and their applications in life sciences including medicine, environmental sciences and agriculture. Typical articles contain both the development of methodology and its application. At present, articles are accompanied on the publisher's web site by computer code and illustrative data sets for the sake of reproducible research. The code is checked by an appointed Reproducible Research Editor before it is published as supplementary material.
Data sharing Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy about reproducible research and disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time. In many cases, it was so difficult they gave up the effort. These experiences are what convinced them of the importance of disclosing source code. The philosophy is described:
Reproducibility Reproducible research is key to new discoveries in pharmacology. A Phase I discovery will be followed by Phase II reproductions as a drug develops towards commercial production. In recent decades Phase II success has fallen from 28% to 18%. A 2011 study found that 65% of medical studies were inconsistent when re-tested, and only 6% were completely reproducible.
Reproducibility The term "reproducible research" refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks and full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.
Knitr knitr is an engine for dynamic report generation with R. It is a package in the statistical programming language R that enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to allow reproducible research in R through the means of Literate Programming. It is licensed under the GNU General Public License.
Yihui Xie Yihui Xie created the animation package in R, which allows animation in graphics through R. He then authored the knitr package, which makes reproducible research available from R. Since 2013, he has been working with RStudio, the makers of the RStudio IDE for the R programming language.
Computational science The complexity of computational methods is a threat to the reproducibility of research. Jon Claerbout has become prominent for pointing out that "reproducible research" requires archiving and documenting all raw data and all code used to obtain a result. Nick Barnes, in the "Science Code Manifesto", proposed five principles that should be followed when software is used in open science publication. Tomi Kauppinen et al. established and defined "Linked Open Science", an approach to interconnect scientific assets to enable transparent, reproducible and transdisciplinary research.
Persona (user experience) In terms of scientific logic, it has been argued that because personas are fictional, they have no clear relationship to real customer data and therefore cannot be considered scientific. Chapman & Milham described the purported flaws in considering personas as a scientific research method. They argued that there is no procedure to work reliably from given data to specific personas, and thus such a process is not subject to the scientific method of reproducible research.
Reproducibility Project One earlier study found that around $28 billion worth of research per year in medical fields is non-reproducible.
Flow cytometry bioinformatics GenePattern is a predominantly genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research. Recently, a GenePattern Flow Cytometry Suite has been developed in order to bring advanced flow cytometry data analysis tools to experimentalists without programmatic skills. It contains close to 40 open source GenePattern flow cytometry modules covering methods from basic processing of flow cytometry standard (i.e., FCS) files to advanced algorithms for automated identification of cell populations, normalization and quality assessment. Internally, most of these modules leverage functionality developed in BioConductor.
Insight Journal The "Insight Journal" is based on the concept of open science and provides full access to scientific material under a license allowing readers to create derivative works, the Creative Commons "by-attribution" license. The journal uses open peer review, allows any reader to volunteer as reviewer, and requires reviewers to make their reviews public. Readers can rate reviewers in a similar way that online retailers such as self-regulate their evaluations. The journal is inspired by the concept of Reproducibility and by the Reproducible Research initiative championed by Jon Claerbout.
Research There are two major types of empirical research design: qualitative research and quantitative research. Researchers choose qualitative or quantitative methods according to the nature of the research topic they want to investigate and the research questions they aim to answer:
Research Artistic research, also seen as 'practice-based research', can take form when creative works are considered both the research and the object of research itself. It is the debatable body of thought which offers an alternative to purely scientific methods in research in its search for knowledge and truth.
Research Research is often conducted using the hourglass model structure of research. The hourglass model starts with a broad spectrum for research, focusing in on the required information through the method of the project (like the neck of the hourglass), then expands the research in the form of discussion and results. The major steps in conducting research are:
Research The Society for Artistic Research (SAR) publishes the triannual "Journal for Artistic Research" (JAR), an international, online, open access, and peer-reviewed journal for the identification, publication, and dissemination of artistic research and its methodologies, from all arts disciplines and it runs the "Research Catalogue" (RC), a searchable, documentary database of artistic research, to which anyone can contribute.
Research According to artist Hakan Topal, in artistic research, "perhaps more so than other disciplines, intuition is utilized as a method to identify a wide range of new and unexpected productive modalities". Most writers, whether of fiction or non-fiction books, also have to do research to support their creative work. This may be factual, historical, or background research. Background research could include, for example, geographical or procedural research.
Research Original research is research that is not exclusively based on a summary, review or synthesis of earlier publications on the subject of research. This material is of a primary source character. The purpose of the original research is to produce new knowledge, rather than to present the existing knowledge in a new form (e.g., summarized or classified).
Research In either qualitative or quantitative research, the researcher(s) may collect primary or secondary data. Primary data is data collected specifically for the research, such as through interviews or questionnaires. Secondary data is data that already exists, such as census data, which can be re-used for the research. It is good ethical research practice to use secondary data wherever possible.
Research The steps generally represent the overall process; however, they should be viewed as an ever-changing iterative process rather than a fixed set of steps. Most research begins with a general statement of the problem, or rather, the purpose for engaging in the study. The literature review identifies flaws or holes in previous research which provides justification for the study. Often, a literature review is conducted in a given subject area before a research question is identified. A gap in the current literature, as identified by a researcher, then engenders a research question. The research question may be parallel to the hypothesis. The hypothesis is the supposition to be tested. The researcher(s) collects data to test the hypothesis. The researcher(s) then analyzes and interprets the data via a variety of statistical methods, engaging in what is known as empirical research. The results of the data analysis in rejecting or failing to reject the null hypothesis are then reported and evaluated. At the end, the researcher may discuss avenues for further research. However, some researchers advocate for the reverse approach: starting with articulating findings and discussion of them, moving "up" to identification of a research problem that emerges in the findings and literature review. The reverse approach is justified by the transactional nature of the research endeavor where research inquiry, research questions, research method, relevant research literature, and so on are not fully known until the findings have fully emerged and been interpreted.
Research Most funding for scientific research comes from three major sources: corporate research and development departments; private foundations, for example, the Bill and Melinda Gates Foundation; and government research councils such as the National Institutes of Health in the USA and the Medical Research Council in the UK. These are managed primarily through universities and in some cases through military contractors. Many senior researchers (such as group leaders) spend a significant amount of their time applying for grants for research funds. These grants are necessary not only for researchers to carry out their research but also as a source of merit.