Genomic Data Science with Galaxy

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

Learn to use the tools that are available from the Galaxy Project. This is the second course in the Genomic Big Data Science Specialization.

Course Introduction

This course is all about data science and genomics, and it gives you an introduction to the science of the genomic revolution. We will learn the techniques used to perform genome-wide SNP data analyses and RNA-seq data analyses, as well as the methods used for microarray-based SNP genotyping. We will also describe the data science and engineering challenges that come with analyzing large SNP datasets. After completing this course, you will be able to:
1. Describe the science behind the data science tools
2. Leverage the tools offered by the scientific community
3. Meet the prerequisites for getting started with data science
4. Dig into the concepts and algorithms involved in data science
5. Explain the data science involved in RNA-seq and SNP analyses
6. Explore the different types of analyses performed
7. Describe how data scientists approach genome-wide SNP data

Galaxy is a free, open, web-based platform that gives anyone with a web browser access to genomic data and analysis tools. It presents genetic information in a way that is easy to follow and provides access to the data in a way that is safe for all. In this course, we will use Galaxy as the reference platform for the tasks we need to complete to access and analyze the data.

Course Tag

Bioinformatics Data Analysis Genome Genomics

Related Wiki Topic

Article Example
Galaxy (computational biology) Galaxy is "an open, web-based platform for performing accessible, reproducible, and transparent genomic science."
Compression of Genomic Re-Sequencing Data High-throughput sequencing technologies have led to a dramatic decline of genome sequencing costs and to an astonishingly rapid accumulation of genomic data. These technologies are enabling ambitious genome sequencing endeavours, such as the 1000 Genomes Project and 1001 ("Arabidopsis thaliana") Genomes Project. The storage and transfer of the tremendous amount of genomic data have become a mainstream problem, motivating the development of high-performance compression tools designed specifically for genomic data. A recent surge of interest in the development of novel algorithms and tools for storing and managing genomic re-sequencing data emphasizes the growing demand for efficient methods for genomic data compression.
Data science Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data.
Data science C. F. Jeff Wu initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians be renamed data scientists.
The Genomic HyperBrowser The Genomic HyperBrowser is a web-based system for statistical analysis of genomic annotation data.
Data science The term "data science" (originally used interchangeably with "datalogy") dates back to 1960, when Peter Naur used it as a substitute for "computer science". In 1974, Naur published "Concise Survey of Computer Methods", which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.
Data science In 2013, the IEEE Task Force on Data Science and Advanced Analytics was launched, and the first international conference on the topic, the IEEE International Conference on Data Science and Advanced Analytics, was launched in 2014. In 2014, the American Statistical Association section on Statistical Learning and Data Mining renamed its journal "Statistical Analysis and Data Mining: The ASA Data Science Journal", and in 2016 it changed its section name to "Statistical Learning and Data Science". In 2015, Springer launched the International Journal on Data Science and Analytics to publish original work on data science and big data analytics. In 2013, the first European Conference on Data Analysis (ECDA) was organised in Luxembourg, and the European Association for Data Science (EuADS) was established in August 2015. In September 2015, at the third ECDA conference at the University of Essex, Colchester, UK, the Gesellschaft für Klassifikation (GfKl) added "Data Science Society" to its name.
Galaxy Science Fiction Novels Galaxy novels, sometimes titled Galaxy Science Fiction Novels, were a series of mostly reprint American science fiction novels published between 1950 and 1961.
Data science In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Volume 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland establishes six technical areas which he believed to encompass the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
Data science Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
Genomic and Medical Data Beacon Project: The Beacon Project is an open web service that tests the willingness of international sites to share genetic data. It is being implemented on the websites of the world's top genomic research organizations.
Data science "Data Scientist" has become a popular occupation, with Harvard Business Review dubbing it "The Sexiest Job of the 21st Century" and McKinsey & Company projecting a global excess demand of 1.5 million new data scientists. Universities are offering master's courses in data science. Shorter private bootcamps also offer data science certificates, ranging from student-paid programs like General Assembly to employer-paid programs like The Data Incubator.
Data science Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge Discovery in Databases (KDD).
Genomic counseling Genomic counseling is the process by which a person gets informed about his or her genome. In contrast to genetic counseling, which focuses on Mendelian diseases and typically involves person-to-person communication with a medical genetics expert, genomic counseling is not limited to currently clinically relevant information and includes other genomic information that is of interest for the informed person, such as increased risk for complex disease (for example diabetes or obesity), genetically determined non-disease related traits (for example baldness), or genetic genealogy data. Given the less sensitive nature of this information, genomic advice can be given impersonally, for example over the internet (virtual genomic counseling).
Data science In April 2002, the International Council for Science: Committee on Data for Science and Technology (CODATA) started the "Data Science Journal", a publication focused on issues such as the description of data systems, their publication on the internet, applications and legal issues. Shortly thereafter, in January 2003, Columbia University began publishing "The Journal of Data Science", which provided a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" whose primary activity is to "conduct creative inquiry and analysis."
Data science Although use of the term "data science" has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician...Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”
ACE (genomic file format) The ACE file format is a specification for storing data about genomic contigs.
The Science Fiction Galaxy The Science Fiction Galaxy is an anthology of science fiction short stories edited by Groff Conklin. It was first published in hardcover by Permabooks in 1950.
Galaxy Science Fiction Science fiction historian and critic Mike Ashley regards "Galaxy"'s success as the main reason for the subsequent boom in science fiction magazines, commenting that it "revolutionized the field overnight". Under Gold "Galaxy" provided a market for social science fiction stories that might not have been accepted by "Astounding" and "Fantasy & Science Fiction", the other leading magazines. Pohl regards "Galaxy" as the place where "the stunning new kinds of science fiction ... flowered, and changed everything in science fiction". In his opinion, Gold's innovation was to ask writers to consider not just new technology, but the subsequent impact of that technology on society. He adds, "What "Galaxy" brought to magazine science fiction was a kind of sophisticated intellectual subtlety. ... After "Galaxy" it was impossible to go on being naive." Science fiction author Brian Stableford argues that "Galaxy" quickly usurped "Astounding"'s position as "pioneer of hardcore sf's progress" because it "embraced and gleefully pursued a new series of challenges to moral orthodoxy." Isaac Asimov, in his memoirs, recalled being deeply impressed by the first issue, adding that many fans, including himself, felt that "Galaxy" became the field's leader almost immediately. In critic John Clute's assessment, "Galaxy" indeed swiftly supplanted "Astounding" and remained the leading magazine in the field until Pohl resigned as editor in 1969.
Compression of Genomic Re-Sequencing Data While standard data compression tools (e.g., zip and rar) are being used to compress sequence data (e.g., GenBank flat files), this approach has been criticized as extravagant, because genomic sequences often contain repetitive content (e.g., microsatellite sequences) and many sequences exhibit high levels of similarity (e.g., multiple genome sequences from the same species). Additionally, the statistical and information-theoretic properties of genomic sequences can potentially be exploited for compressing sequencing data.
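Two of the properties mentioned in the compression entries, the small nucleotide alphabet and the high similarity between re-sequenced genomes of the same species, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a real compression tool: it assumes sequences contain only the bases A, C, G and T (no N or other ambiguity codes), handles substitutions only (no insertions or deletions), and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: only uppercase A, C, G, T occur; real data often contains N, etc.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string at 2 bits per base instead of 8 bits per ASCII character."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        # Four bases fit in one byte; base j occupies bits 2j..2j+1.
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is needed because the last byte may be partial."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


def diff_against_reference(ref: str, sample: str) -> List[Tuple[int, str]]:
    """Keep only the positions where a re-sequenced sample differs from a reference."""
    if len(ref) != len(sample):
        raise ValueError("this sketch handles substitutions only, not indels")
    return [(i, b) for i, (a, b) in enumerate(zip(ref, sample)) if a != b]


def apply_diff(ref: str, diffs: List[Tuple[int, str]]) -> str:
    """Rebuild the sample sequence from the reference plus its substitutions."""
    seq = list(ref)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGT"
    sample = "ACGTACGAACGTACGT"  # one substitution at position 7
    print(len(pack_2bit(ref)), "bytes packed vs", len(ref), "bytes as text")
    print(diff_against_reference(ref, sample))
```

For the 16-base example, `pack_2bit` needs 4 bytes instead of 16, and the sample that differs from the reference at a single position is stored as one `(position, base)` pair. Real re-sequencing compressors must also handle indels, ambiguity codes and quality scores, which this sketch deliberately ignores.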