Algorithms for DNA Sequencing

Start Date: 04/18/2021

Course Type: Common Course

Course Link:

About Course

We will learn computational methods -- algorithms and data structures -- for analyzing DNA sequencing data. We will learn a little about DNA, genomics, and how DNA sequencing is used. We will use Python to implement key algorithms and data structures and to analyze real genomes and DNA sequencing datasets.

Coursera Plus banner featuring three learners and university partner logos

Course Introduction

Algorithms for DNA Sequencing If you have ever used a computer to solve a problem, you have used algorithms. This is the same kind of thing as programming—you are solving a problem using computer code. So this is the course for. In this class, you will learn the basic concepts of algorithms and data structures. You will start out with a program that will read in data from another file in a DNA sequencing database, read out genes from a sequence database, and manipulate these genes in different ways depending on whether they are expressed or not. You will then learn how to use functions to implement different algorithms and data structures. You will then learn how to use a hardware-based microprocessor to read in and manipulate DNA. You will then use the processor to optimize a DNA sequencing run. You will then use the program to execute various algorithms for different types of DNA sequencing, including a full-text search, a brute force search, a brute force collision-theorem solving, and a brute force enrichment-based search. You will then learn how to use a tool to evaluate the strength of your search algorithm. After taking this course, you will be able to: 1. Describe the background of DNA sequencing 2. Write algorithms for DNA sequencing 3. Implement various algorithms for DNA sequencing 4. Use an assembler to run DNA sequencing 5. Evaluate algorithms for DNA sequencing 6. Implement various algorithms to read in DNA 7. Write algorithms for DNA manipulation In

Course Tag

Bioinformatics Algorithms Algorithms Python Programming Algorithms On Strings

Related Wiki Topic

Article Example
Edico Genome DRAGEN is a reconfigurable Bio-IT Processor that is integrated on a PCIe card with accompanying software sold as a platform as a service. The processor is loaded with algorithms for DNA sequencing.
DNA sequencing DNA sequencing may be used along with DNA profiling methods for forensic identification and paternity testing.
DNA sequencing The success of a DNA sequencing protocol is dependent on the sample preparation. A successful DNA extraction will yield a sample with long, non-degraded strands of DNA which require further preparation according to the sequencing technology to be used. For Sanger sequencing, either cloning procedures or PCR are required prior to sequencing. In the case of next generation sequencing methods, library preparation is required before processing.
DNA sequencing Several new methods for DNA sequencing were developed in the mid to late 1990s and were implemented in commercial DNA sequencers by the year 2000. Together these were called the "next-generation" sequencing methods.
DNA sequencing DNA nanoball sequencing is a type of high throughput sequencing technology used to determine the entire genomic sequence of an organism. The company Complete Genomics uses this technology to sequence samples submitted by independent researchers. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Unchained sequencing by ligation is then used to determine the nucleotide sequence. This method of DNA sequencing allows large numbers of DNA nanoballs to be sequenced per run and at low reagent costs compared to other high-throughput sequencing platforms. However, only short sequences of DNA are determined from each DNA nanoball which makes mapping the short reads to a reference genome difficult. This technology has been used for multiple genome sequencing projects and is scheduled to be used for more.
DNA sequencing Knowledge of DNA sequences has become indispensable for basic biological research, and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematics. The rapid speed of sequencing attained with modern DNA sequencing technology has been instrumental in the sequencing of complete DNA sequences, or genomes of numerous types and species of life, including the human genome and other complete DNA sequences of many animal, plant, and microbial species.
DNA sequencing Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes. This method requires the target DNA to be broken into random fragments. After sequencing individual fragments, the sequences can be reassembled on the basis of their overlapping regions.
DNA sequencing Applied Biosystems' (now a Life Technologies brand) SOLiD technology employs sequencing by ligation. Here, a pool of all possible oligonucleotides of a fixed length are labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each containing single copies of the same DNA molecule, are deposited on a glass slide. The result is sequences of quantities and lengths comparable to Illumina sequencing. This sequencing by ligation method has been reported to have some issue sequencing palindromic sequences.
DNA sequencing The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences concurrently. High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods. In ultra-high-throughput sequencing as many as 500,000 sequencing-by-synthesis operations may be run in parallel.
DNA sequencing DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases—adenine, guanine, cytosine, and thymine—in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
DNA sequencing The first method for determining DNA sequences involved a location-specific primer extension strategy established by Ray Wu at Cornell University in 1970. DNA polymerase catalysis and specific nucleotide labeling, both of which figure prominently in current sequencing schemes, were used to sequence the cohesive ends of lambda phage DNA. Between 1970 and 1973, Wu, R Padmanabhan and colleagues demonstrated that this method can be employed to determine any DNA sequence using synthetic location-specific primers. Frederick Sanger then adopted this primer-extension strategy to develop more rapid DNA sequencing methods at the MRC Centre, Cambridge, UK and published a method for "DNA sequencing with chain-terminating inhibitors" in 1977. Walter Gilbert and Allan Maxam at Harvard also developed sequencing methods, including one for "DNA sequencing by chemical degradation". In 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis. Advancements in sequencing were aided by the concurrent development of recombinant DNA technology, allowing DNA samples to be isolated from sources other than viruses.
DNA sequencing Large-scale sequencing often aims at sequencing very long DNA pieces, such as whole chromosomes, although large-scale sequencing can also be used to generate very large numbers of short sequences, such as found in phage display. For longer targets such as chromosomes, common approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA may then be cloned into a DNA vector and amplified in a bacterial host such as "Escherichia coli". Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. Studies have shown that adding a size selection step to collect DNA fragments of uniform size can improve sequencing efficiency and accuracy of the genome assembly. In these studies, automated sizing has proven to be more reproducible and precise than manual gel sizing.
DNA sequencing DNA sequencing may be used to determine the sequence of individual genes, larger genetic regions (i.e. clusters of genes or operons), full chromosomes or entire genomes, of any organism. DNA sequencing is also the most efficient way to sequence RNA or proteins (via their open reading frames). In fact, DNA sequencing has become a key technology in many areas of biology and other sciences such as medicine, forensics, or anthropology.
DNA sequencing The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with a DNA sequencer, DNA sequencing has become easier and orders of magnitude faster.
DNA sequencing Most sequencing approaches use an "in vitro" cloning step to amplify individual DNA molecules, because their molecular detection methods are not sensitive enough for single molecule sequencing. Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule followed by immobilization for later sequencing. Emulsion PCR is used in the methods developed by Marguilis et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "Polony sequencing") and SOLiD sequencing, (developed by Agencourt, later Applied Biosystems, now Life Technologies). Emulsion PCR is also used in the GemCode and Chromium platforms developed by 10x Genomics.
DNA sequencing theory DNA sequencing theory is the broad body of work that attempts to lay analytical foundations for determining the order of specific nucleotides in a sequence of DNA, otherwise known as DNA sequencing. The practical aspects revolve around designing and optimizing sequencing projects (known as "strategic genomics"), predicting project performance, troubleshooting experimental results, characterizing factors such as sequence bias and the effects of software processing algorithms, and comparing various sequencing methods to one another. In this sense, it could be considered a branch of systems engineering or operations research. The permanent archive of work is primarily mathematical, although numerical calculations are often conducted for particular problems too. DNA sequencing theory addresses "physical processes" related to sequencing DNA and should not be confused with theories of analyzing resultant DNA sequences, e.g. sequence alignment. Publications sometimes do not make a careful distinction, but the latter are primarily concerned with algorithmic issues. Sequencing theory is based on elements of mathematics, biology, and systems engineering, so it is highly interdisciplinary. The subject may be studied within the context of computational biology.
DNA sequencing Human genetics have been included within the field of bioethics since the early 1970s and the growth in the use of DNA sequencing (particularly high-throughput sequencing) has introduced a number of ethical issues. One key issue is the ownership of an individual's DNA and the data produced when that DNA is sequenced. Regarding the DNA molecule itself, the leading legal case on this topic, "Moore v. Regents of the University of California" (1990) ruled that individuals have no property rights to discarded cells or any profits made using these cells (for instance, as a patented cell line). However, individuals have a right to informed consent regarding removal and use of cells. Regarding the data produced through DNA sequencing, "Moore" gives the individual no rights to the information derived from their DNA.
DNA sequencing A non-radioactive method for transferring the DNA molecules of sequencing reaction mixtures onto an immobilizing matrix during electrophoresis was developed by Pohl and co-workers in the early 1980s. Followed by the commercialization of the DNA sequencer "Direct-Blotting-Electrophoresis-System GATC 1500" by GATC Biotech, which was intensively used in the framework of the EU genome-sequencing programme, the complete DNA sequence of the yeast "Saccharomyces cerevisiae" chromosome II. Leroy E. Hood's laboratory at the California Institute of Technology announced the first semi-automated DNA sequencing machine in 1986. This was followed by Applied Biosystems' marketing of the first fully automated sequencing machine, the ABI 370, in 1987 and by Dupont's Genesis 2000 which used a novel fluorescent labeling technique enabling all four dideoxynucleotides to be identified in a single lane. By 1990, the U.S. National Institutes of Health (NIH) had begun large-scale sequencing trials on "Mycoplasma capricolum", "Escherichia coli", "Caenorhabditis elegans", and "Saccharomyces cerevisiae" at a cost of US$0.75 per base. Meanwhile, sequencing of human cDNA sequences called expressed sequence tags began in Craig Venter's lab, an attempt to capture the coding fraction of the human genome. In 1995, Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) published the first complete genome of a free-living organism, the bacterium "Haemophilus influenzae". The circular chromosome contains 1,830,137 bases and its publication in the journal Science marked the first published use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts.
DNA sequencing On April 1, 1997, Pascal Mayer and Laurent Farinelli submitted patents to the World Intellectual Property Organization describing DNA colony sequencing. The DNA sample preparation and random surface-PCR arraying methods described in this patent, coupled to Roger Tsien et al.'s "base-by-base" sequencing method, is now implemented in Illumina's Hi-Seq genome sequencers.
Transmission electron microscopy DNA sequencing Compared to other second- and third-generation DNA sequencing technologies, transmission electron microscopy DNA sequencing has a number of potential key strengths and weaknesses, which will ultimately determine its usefulness and prominence as a future DNA sequencing technology.