Reproducible Research

Start Date: 11/05/2018

Course Type: Common Course

About Course

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools, which let one publish a data analysis as a single document that others can easily execute to obtain the same results.

Course Syllabus

This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn't going to ruin the story.
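
One possible way to organize such a project is sketched below in Python. The folder names (data/raw, data/processed, figures, code/scripts, text) are illustrative assumptions, not a layout prescribed by the course. The idea they illustrate is that raw data stays untouched and everything else can be regenerated by scripts.

```python
from pathlib import Path
import tempfile

# Hypothetical project layout; these names are illustrative assumptions,
# not a required standard.
LAYOUT = [
    "data/raw",             # original data, never edited by hand
    "data/processed",       # cleaned data produced by scripts
    "figures/exploratory",  # quick plots made while exploring
    "figures/final",        # polished figures used in the report
    "code/scripts",         # scripts that turn raw data into results
    "text",                 # README and the written report
]


def create_project(root: Path) -> list[Path]:
    """Create the skeleton under `root` (safe to call twice) and return its directories."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # A README records how to regenerate every result from the raw data.
    readme = root / "text" / "README.md"
    if not readme.exists():
        readme.write_text("Run the scripts in code/scripts in order to reproduce all results.\n")
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for d in create_project(Path(tmp) / "my_analysis"):
            print(d.relative_to(tmp))
```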

Biometrical Journal: Biometrical Journal covers statistical methods and their applications in the life sciences, including medicine, environmental sciences and agriculture. Typical articles contain both the development of methodology and its application. At present, articles are accompanied on the publisher's website by computer code and illustrative data sets for the sake of reproducible research. The code is checked by an appointed Reproducible Research Editor before it is published as supplementary material.
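
The journal's actual checking procedure is not described here. As a rough sketch of the idea, assuming the submitted analysis is a deterministic function of its data and a random seed, a checker could re-run it and compare a fingerprint of the result with the one the authors reported. The names analysis, fingerprint and check_reproducible, and the data values, are hypothetical.

```python
import hashlib
import json
import random
import statistics

# Illustrative data; not taken from any real study.
DATA = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]


def analysis(seed: int, n_boot: int = 1000) -> dict:
    """Hypothetical analysis: sample mean plus a bootstrap standard error."""
    rng = random.Random(seed)  # private generator: the seed alone fixes every draw
    boot_means = [
        statistics.mean(rng.choices(DATA, k=len(DATA))) for _ in range(n_boot)
    ]
    return {"mean": statistics.mean(DATA), "boot_se": statistics.stdev(boot_means)}


def fingerprint(result: dict) -> str:
    """Hash a canonical JSON form (sorted keys) so equal results hash identically."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_reproducible(seed: int, reported: str) -> bool:
    """Re-run the analysis and compare with the fingerprint the authors reported."""
    return fingerprint(analysis(seed)) == reported


if __name__ == "__main__":
    reported = fingerprint(analysis(seed=42))  # what the authors would submit
    print(check_reproducible(42, reported))    # True: same seed, same result
    print(check_reproducible(7, reported))     # different seed, different bootstrap draws
```

An exact hash comparison only works when the computation is bit-for-bit deterministic. Floating-point results can differ across platforms or library versions, so a practical checker may instead compare numbers within a tolerance.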
Data sharing: Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy about reproducible research and disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time. In many cases, it was so difficult that they gave up. These experiences are what convinced them of the importance of disclosing source code. The philosophy is described:
Reproducibility: Reproducible research is key to new discoveries in pharmacology. A Phase I discovery will be followed by Phase II reproductions as a drug develops towards commercial production. In recent decades, Phase II success has fallen from 28% to 18%. A 2011 study found that 65% of medical studies were inconsistent when re-tested, and only 6% were completely reproducible.
Reproducibility: The term "reproducible research" refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks and the full computational environment used to produce its results, such as the code and data, which can be used to reproduce those results and to create new work based on the research.
knitr: knitr is an engine for dynamic report generation with R. It is a package for the statistical programming language R that enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to enable reproducible research in R by means of literate programming. It is licensed under the GNU General Public License.
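
knitr itself is an R package, and its API is not shown here. The toy Python weaver below only sketches the literate-programming idea it implements: prose and code chunks live in one source file, each chunk is executed in order, and its printed output is spliced into the rendered report. The `<<>>=` and `@` chunk delimiters mimic the Noweb syntax of .Rnw files and the `##` prefix mimics knitr's default output comment; the weave function and everything else are illustrative assumptions. Like any such tool, it runs whatever code the document contains.

```python
import contextlib
import io

CHUNK_START = "<<>>="  # Noweb-style chunk opener, mimicking .Rnw files
CHUNK_END = "@"


def weave(source: str) -> str:
    """Toy weaver: run each code chunk in order and splice its printed output into the text."""
    namespace = {}   # shared across chunks, so later chunks see earlier variables
    out_lines = []
    chunk = None     # None while reading prose, a list of lines while inside a chunk
    for line in source.splitlines():
        if chunk is None and line.strip() == CHUNK_START:
            chunk = []
        elif chunk is not None and line.strip() == CHUNK_END:
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec("\n".join(chunk), namespace)
            # Echo the code, then its captured output, prefixed with "##".
            out_lines.extend("    " + c for c in chunk)
            out_lines.extend("    ## " + o for o in buf.getvalue().splitlines())
            chunk = None
        elif chunk is not None:
            chunk.append(line)
        else:
            out_lines.append(line)
    return "\n".join(out_lines)


if __name__ == "__main__":
    report = "\n".join([
        "Summary of the measurements.",
        "<<>>=",
        "values = [3, 5, 7]",
        "print(sum(values) / len(values))",
        "@",
        "The mean is shown above.",
    ])
    print(weave(report))
```

Running it prints the prose unchanged, the indented chunk code, and the line `    ## 5.0` in place of the chunk. Because the report is regenerated from code every time, the number in the text cannot drift out of sync with the data.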
Yihui Xie: Yihui Xie created the animation package for R, which makes it possible to produce animated graphics in R. He then authored the knitr package, which brings reproducible research to R. Since 2013, he has been working with RStudio, the makers of the RStudio IDE for the R programming language.
Computational science: The complexity of computational methods is a threat to the reproducibility of research. Jon Claerbout has become prominent for pointing out that "reproducible research" requires archiving and documenting all raw data and all code used to obtain a result. Nick Barnes, in the "Science Code Manifesto", proposed five principles that should be followed when software is used in open science publication. Tomi Kauppinen et al. established and defined "Linked Open Science", an approach to interconnect scientific assets to enable transparent, reproducible and transdisciplinary research.
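
Part of Claerbout's requirement, documenting exactly which raw data and code produced a result, can be automated. The sketch below uses assumed helper names (sha256_of, build_manifest, verify_manifest) to record a SHA-256 checksum for every file in a project, together with the Python version and platform, and later reports which files have changed. It documents what was used; it does not by itself archive the files.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 64 KiB blocks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum for every file under root plus the interpreter and platform."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "files": {p.relative_to(root).as_posix(): sha256_of(p) for p in files},
    }


def verify_manifest(root: Path, manifest: dict) -> list[str]:
    """Return the relative paths whose contents changed or that went missing."""
    changed = []
    for rel, digest in manifest["files"].items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            changed.append(rel)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "raw.csv").write_text("x\n1\n2\n")
        (root / "analysis.py").write_text("print('analysis')\n")
        manifest = build_manifest(root)
        print(json.dumps(manifest, indent=2))
        # Simulate an undocumented edit to the raw data.
        (root / "data" / "raw.csv").write_text("x\n1\n3\n")
        print(verify_manifest(root, manifest))  # ['data/raw.csv']
```

Storing such a manifest alongside a paper's supplementary material would let a reader confirm that the files they downloaded are the ones the results were computed from.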
Persona (user experience): In terms of scientific logic, it has been argued that because personas are fictional, they have no clear relationship to real customer data and therefore cannot be considered scientific. Chapman & Milham described the purported flaws in treating personas as a scientific research method. They argued that there is no procedure to work reliably from given data to specific personas, and thus such a process is not subject to the scientific method of reproducible research.
Reproducibility Project: One earlier study found that around $28 billion worth of research per year in medical fields is non-reproducible.
Flow cytometry bioinformatics: GenePattern is a predominantly genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines, enabling reproducible research. Recently, a GenePattern Flow Cytometry Suite has been developed to bring advanced flow cytometry data analysis tools to experimentalists without programming skills. It contains close to 40 open-source GenePattern flow cytometry modules, covering methods from basic processing of Flow Cytometry Standard (FCS) files to advanced algorithms for automated identification of cell populations, normalization and quality assessment. Internally, most of these modules leverage functionality developed in Bioconductor.
Insight Journal: The "Insight Journal" is based on the concept of open science and provides full access to scientific material under a license allowing readers to create derivative works, the Creative Commons "by-attribution" license. The journal uses open peer review, allows any reader to volunteer as a reviewer, and requires reviewers to make their reviews public. Readers can rate reviewers, in the same way that some online retailers self-regulate their customer evaluations. The journal is inspired by the concept of reproducibility and by the Reproducible Research initiative championed by Jon Claerbout.
