Genome Sequencing (Bioinformatics II)

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

You may have heard a lot about genome sequencing and its potential to usher in an era of personalized medicine, but what does it mean to sequence a genome? In the first half of the course, we will see that biologists cannot read the 3 billion nucleotides of a human genome as you would read a book from beginning to end. However, they can read shorter fragments of DNA. We will see how graph theory can be used to assemble genomes from these short pieces in what amounts to the largest jigsaw puzzle ever put together. In the second half of the course, we will discuss antibiotics, a topic of great relevance as antimicrobial-resistant bacteria like MRSA are on the rise. You know antibiotics as drugs, but on the molecular level they are short mini-proteins that have been engineered by bacteria to kill their enemies. Determining the sequence of amino acids making up one of these antibiotics is an important research problem, and one that is similar to that of sequencing a genome by assembling tiny fragments of DNA. We will see how brute force algorithms that try every possible solution are able to identify naturally occurring antibiotics so that they can be synthesized in a lab. Finally, you will learn how to apply popular bioinformatics software tools to sequence the genome of a deadly Staphylococcus bacterium that has acquired antibiotic resistance.
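
To make the graph-theory idea concrete, here is a minimal Python sketch of one common formulation, a de Bruijn graph traversed with Hierholzer's algorithm. It is an illustration under strong assumptions (error-free reads, every k-mer present exactly as often as it occurs, a single strand), not necessarily the course's own implementation; the function names `de_bruijn`, `eulerian_path`, and `assemble` are made up for this example.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Build a de Bruijn graph: each k-mer is an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists in the graph."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next(
        (node for node in graph if len(graph[node]) - in_deg[node] == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(kmers):
    """Spell the string described by an Eulerian path through the k-mers."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical toy input: all 3-mers of a 10-nucleotide string.
reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(assemble(reads))
```

Because the 2-mers TG and GC each occur twice in this toy input, more than one string is consistent with exactly the same 3-mers, and the sketch returns just one of them. Repeats cause the same ambiguity, at much larger scale, in real genome assembly.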

Course Syllabus

Welcome to class!

This course will focus on two questions at the forefront of modern computational biology, along with the algorithmic approaches we will use to solve them in parentheses:

  1. Weeks 1-2: How Do We Assemble Genomes? (Graph Algorithms)
  2. How Do We Sequence Antibiotics? (Brute Force Algorithms)
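
As a rough illustration of the brute force approach in the second chapter, the Python sketch below tries every candidate peptide and keeps those whose theoretical cyclic spectrum matches a given spectrum. Several things here are assumptions made for the example: the mass table is cut down to five amino acids with integer masses, the spectrum is taken to be ideal (no missing or false masses), and the function names are hypothetical.

```python
from itertools import product

# Integer masses for a small subset of amino acids (illustrative subset only).
MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def cyclic_spectrum(peptide):
    """Masses of all contiguous subpeptides of a cyclic peptide, plus 0 and the total."""
    masses = [MASS[aa] for aa in peptide]
    n = len(masses)
    doubled = masses + masses  # lets subpeptides wrap around the cycle
    spectrum = [0, sum(masses)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    return sorted(spectrum)


def brute_force_sequence(spectrum, max_len):
    """Try every peptide up to max_len and keep those that reproduce the spectrum."""
    target = sorted(spectrum)
    matches = []
    for n in range(1, max_len + 1):
        for candidate in product(MASS, repeat=n):
            peptide = "".join(candidate)
            if cyclic_spectrum(peptide) == target:
                matches.append(peptide)
    return matches


# Recover a toy cyclic peptide from its own ideal spectrum.
print(brute_force_sequence(cyclic_spectrum("GAS"), max_len=3))
```

The search returns every rotation and reversal of the peptide, since all of them describe the same cycle. The number of candidates grows exponentially with peptide length, so exhaustive search of this kind is only practical for very short peptides.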

Each of the two chapters of content in the class is accompanied by a Bioinformatics Cartoon created by talented San Diego artist Randall Christopher and serving as a chapter header in the Specialization's bestselling print companion. You can find the first chapter's cartoon at the bottom of this message. What do a time machine trip to 1735, a stack of newspapers, a jigsaw puzzle, and a giant ant invading a riverside city have to do with putting together a genome? Start learning today to find out!


Course Introduction

Genome sequencing determines the order of the individual nucleotides in a genome, and computational analysis then locates the genes within that sequence. The main goal of this course is to introduce you to the main algorithms for sequence alignment and annotation of genes in the genome of a bacterium or a human. In the first part of the course, we will focus on the basic algorithms for aligning genes to the genome, which can be used to infer the function of these genes. We will use the BLAST algorithm to search the genome of a bacterium or other organism with short reads that tell the story of the DNA sequence of the bacteria living in that organism. You will learn the essential steps for implementing the tasks in the algorithm and the steps that must be taken to run it. You will be guided through building a database of chloroplast sequences or genes. You will learn how to use this information to predict the function of bacterial genes from the short reads in the genome. In the second part of the course, we will focus on the more advanced algorithms for annotation of genes in the genome of a bacterium or other organism. You will learn how to use this

Course Tag

Algorithms, Python Programming, Whole Genome Sequencing, Dynamic Programming

Related Wiki Topic

Article Example
Whole genome sequencing Whole genome sequencing (also known as WGS, full genome sequencing, complete genome sequencing, or entire genome sequencing) is the process of determining the complete DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.
Cancer genome sequencing Historically, cancer genome sequencing efforts have been divided between transcriptome-based sequencing projects and DNA-centered efforts.
Cancer genome sequencing Whole genome (WG) sequencing, as in J. Craig Venter's and James D. Watson's WG sequencing projects, is typically performed on blood cells, saliva, epithelial cells, or bone. Cancer genome sequencing, by contrast, involves direct sequencing of primary tumor tissue, adjacent or distal normal tissue, the tumor microenvironment such as fibroblast/stromal cells, or metastatic tumor sites.
Cancer genome sequencing Cancer genome sequencing is the whole genome sequencing of a single, homogeneous or heterogeneous group of cancer cells. It is a biochemical laboratory method for the characterization and identification of the DNA or RNA sequences of cancer cell(s).
Whole genome sequencing While capillary sequencing was the first approach to successfully sequence a nearly full human genome, it is still too expensive and takes too long for commercial purposes. Since 2005 capillary sequencing has been progressively displaced by high-throughput (formerly "next-generation") sequencing technologies such as Illumina dye sequencing, pyrosequencing, and SMRT sequencing. All of these technologies continue to employ the basic shotgun strategy, namely, parallelization and template generation via genome fragmentation.
Whole genome sequencing Whole genome sequencing should not be confused with DNA profiling, which only determines the likelihood that genetic material came from a particular individual or group, and does not contain additional information on genetic relationships, origin or susceptibility to specific diseases. In addition, whole genome sequencing should not be confused with methods that sequence specific subsets of the genome - such methods include whole exome sequencing (1% of the genome) or SNP genotyping (<0.1% of the genome). Almost all truly complete genomes are of microbes; the term "full genome" is thus sometimes used loosely to mean "greater than 95%". The remainder of this article focuses on nearly complete human genomes.
Whole genome sequencing A commonly-referenced commercial target for sequencing cost is the $1,000 genome.
Whole genome sequencing In June 2009, Illumina announced that they were launching their own Personal Full Genome Sequencing Service at a depth of 30× for $48,000 per genome.
Whole genome sequencing In May 2011, Illumina lowered its Full Genome Sequencing service to $5,000 per human genome, or $4,000 if ordering 50 or more.
Whole genome sequencing Full genome sequencing provides information on a genome that is orders of magnitude larger than that provided by DNA arrays, the previous leader in genotyping technology.
Bioinformatics The bioinformatics tool BPGA can be used to characterize the pan-genome of bacterial species.
Whole genome sequencing The first bacterial and archaeal genomes, including that of "H. influenzae", were sequenced by shotgun sequencing. In 1996 the first eukaryotic genome ("Saccharomyces cerevisiae") was sequenced. "S. cerevisiae", a model organism in biology, has a genome of only around 12 million nucleotide pairs, and was the first "unicellular" eukaryote to have its whole genome sequenced. The first "multicellular" eukaryote, and animal, to have its whole genome sequenced was the nematode worm "Caenorhabditis elegans" in 1998. Eukaryotic genomes are sequenced by several methods, including shotgun sequencing of short DNA fragments and sequencing of larger DNA clones from DNA libraries such as bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).
Cancer genome sequencing Cancer genome sequencing utilizes the same technology involved in whole genome sequencing. Sequencing has come a long way since it originated in 1977 with two independent methods: Frederick Sanger’s enzymatic dideoxy DNA sequencing technique and the Allan Maxam and Walter Gilbert chemical degradation technique. More than 20 years after these landmark papers, ‘Second Generation’ high-throughput next-generation sequencing (HT-NGS) was born, followed by ‘Third Generation’ HT-NGS technology in 2010. The figures to the right illustrate the general biological pipeline and companies involved in second and third generation HT-NGS sequencing.
Bioinformatics Bioinformatics is a scientific field that is similar to but distinct from biological computation, while it is often considered synonymous with computational biology. Biological computation uses bioengineering and biology to build biological computers, whereas bioinformatics uses computation to better understand biology. Bioinformatics and computational biology involve the analysis of biological data, particularly DNA, RNA, and protein sequences. The field of bioinformatics experienced explosive growth starting in the mid-1990s, driven largely by the Human Genome Project and by rapid advances in DNA sequencing technology.
Whole genome sequencing Single cell genome sequencing is being tested as a method of preimplantation genetic diagnosis, wherein a cell from the embryo created by in vitro fertilization is taken and analyzed before embryo transfer into the uterus. After implantation, cell-free fetal DNA can be taken by simple venipuncture from the mother and used for whole genome sequencing of the fetus.
Whole genome bisulfite sequencing Whole genome bisulfite sequencing (WGBS) is a next-generation sequencing technology used to determine the DNA methylation status of single cytosines by treating the DNA with sodium bisulfite before sequencing. Sodium bisulfite is a chemical compound that converts unmethylated cytosines into uracil. The cytosines that have not been converted to uracil are methylated. After sequencing, the unmethylated cytosines appear as thymines.
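
The read-out logic described here can be sketched in a few lines of Python. This is a toy model only: it assumes a single strand, complete conversion of every unmethylated cytosine, and no sequencing errors, and the function names are invented for the example.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate the sequenced read: unmethylated C reads as T, methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(sequence)
    )


def call_methylation(reference, converted):
    """Return positions where a reference cytosine still reads as C after treatment."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1, 5])
print(read)
print(call_methylation(reference, read))
```

Comparing the treated read against the untreated reference is what makes the call possible: a C that survives treatment is inferred to be methylated, while a reference C that now reads as T is inferred to be unmethylated.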
Cancer genome sequencing These cellular factions could only have been identified through cancer genome sequencing, showing the information that sequencing can yield, and the complexity and heterogeneity of a tumor within one individual.
Whole genome sequencing Other technologies are emerging, including nanopore technology. Though nanopore sequencing technology is still being refined, its portability and potential capability of generating long reads are of relevance to whole-genome sequencing applications.
Sequencing Whereas the methods above describe various sequencing methods, separate related terms are used when a large portion of a genome is sequenced. Several platforms were developed to perform exome sequencing (a subset of all DNA across all chromosomes that encode genes) or whole genome sequencing (sequencing of all the nuclear DNA of a human).
Bioinformatics Most DNA sequencing techniques produce short fragments of sequence that need to be assembled to obtain complete gene or genome sequences. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research (TIGR) to sequence the first bacterial genome, "Haemophilus influenzae") generates the sequences of many thousands of small DNA fragments (ranging from 35 to 900 nucleotides long, depending on the sequencing technology). The ends of these fragments overlap and, when aligned properly by a genome assembly program, can be used to reconstruct the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly usually contains numerous gaps that must be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.
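
The idea of reconstructing a sequence from fragments whose ends overlap can be illustrated with a small greedy merge in Python. This is a deliberately simplified sketch, not how production assemblers work: it assumes error-free fragments from one strand, ignores fragments fully contained in others, and uses made-up function names and a made-up minimum overlap of 3.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest suffix/prefix overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: what is left stays as separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments with overlapping ends.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

When no overlaps remain, the function returns several contigs rather than one sequence, which loosely mirrors the gaps mentioned above. The all-pairs comparison is quadratic in the number of fragments, one reason naive approaches do not scale to large genomes.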