Comparing Genes, Proteins, and Genomes (Bioinformatics III)

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

Having sequenced genomes in the previous course, we would now like to compare them to determine how species have evolved and what makes them different. In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins. In the second half of the course, we will "zoom out" to compare entire genomes, where we see large-scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution. Looking at the human and mouse genomes, we will ask ourselves: just as earthquakes are much more likely to occur along fault lines, are there locations in our genome that are "fragile" and more susceptible to being broken as part of genome rearrangements? We will see how combinatorial algorithms help us answer this question. Finally, you will learn how to apply popular bioinformatics software tools, including BLAST, to solve problems in sequence alignment.
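To make the dynamic programming idea concrete, here is a minimal sketch in Python. It computes the edit distance between two sequences: the smallest number of single-symbol substitutions, insertions, and deletions that turns one into the other. This is only a simple stand-in for "the number of mutations"; the function name edit_distance and the unit cost per operation are assumptions made for illustration, not necessarily the scoring used in the course.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b.

    Assumption for illustration: every operation costs 1.
    """
    # prev[j] holds the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, symbol_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if symbol_a == symbol_b else 1)
            deletion = prev[j] + 1       # drop symbol_a
            insertion = curr[j - 1] + 1  # add symbol_b
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ATGC", "AGC"))
```

Each table entry is built from three smaller subproblems, which is what makes this dynamic programming rather than brute-force enumeration. Only two rows are kept in memory, so the sketch uses space proportional to the length of the second sequence.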

Course Syllabus

Welcome to class!

If you joined us in the previous course in this Specialization, then you became an expert at assembling genomes and sequencing antibiotics. The next natural question to ask is how to compare DNA and amino acid sequences. This question will motivate this week's discussion of sequence alignment, which is the first of two questions that we will ask in this class (the algorithmic methods used to answer them are shown in parentheses):

  1. How Do We Compare DNA Sequences? (Dynamic Programming)
  2. Are There Fragile Regions in the Human Genome? (Combinatorial Algorithms)
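As a taste of the combinatorial side of the second question, the sketch below counts breakpoints in a signed permutation, a common way to model one genome's blocks of genes relative to another's. The representation (a list of signed integers) and the function name count_breakpoints are assumptions made for illustration; the course may set the problem up differently.

```python
from typing import Sequence


def count_breakpoints(permutation: Sequence[int]) -> int:
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed with 0 on the left and n + 1 on the right.
    A consecutive pair (x, y) is an adjacency when y - x == 1; every
    other pair is counted as a breakpoint.
    """
    n = len(permutation)
    extended = [0, *permutation, n + 1]
    return sum(
        1 for left, right in zip(extended, extended[1:]) if right - left != 1
    )


if __name__ == "__main__":
    # The identity permutation has no breakpoints.
    print(count_breakpoints([1, 2, 3, 4]))
    # Reversing the whole genome (and flipping signs) leaves breakpoints only at the two ends.
    print(count_breakpoints([-3, -2, -1]))
```

Intuitively, the more rearrangements separate two genomes, the more breakpoints tend to appear, which is why counts like this are a natural starting point for asking where genomes break.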

As in previous courses, each of these two chapters is accompanied by a Bioinformatics Cartoon created by talented artist Randall Christopher and serving as a chapter header in the Specialization's bestselling print companion. You can find the first chapter's cartoon at the bottom of this message. Why have taxis suddenly become free of charge in Manhattan? Where did Pavel get so much spare change? And how should you get dressed in the morning so that you aren't late to your job as a crime-stopping superhero? Answers to these questions, and many more, in this week's installment of the course.


Course Introduction

Comparing Genes, Proteins, and Genomes (Bioinformatics III) compares two different biological systems and introduces the iterative process of comparing genotypes. It builds on topics covered earlier in the Specialization: how to identify variants associated with disease states and how to sequence the genome of an organism. The course is designed to cover the biological principles underlying gene–protein interactions through two simple yet useful applications: the comparison of two loci, one genotype against another, and the detection of variants associated with disease states. We hope that the course will stimulate your interest in the topic and, most importantly, give you a solid foundation in comparing the genotypes of different organisms.

Course Tag

Bioinformatics, Graph Theory, Bioinformatics Algorithms, Python Programming

Related Wiki Topic

Article Example
Bioinformatics Pan genomics is a concept introduced in 2005 by Tettelin and Medini which eventually took root in bioinformatics. The pan genome is the complete gene repertoire of a particular taxonomic group: although initially applied to closely related strains of a species, it can be applied to a larger context such as a genus or phylum. It is divided into two parts: the core genome, the set of genes common to all the genomes under study (often housekeeping genes vital for survival), and the dispensable (flexible) genome, the set of genes present in one or some, but not all, of the genomes under study.
Ensembl Genomes Ensembl Genomes allows comparing and visualising user data while browsing karyotypes and genes. Most Ensembl Genomes views include an ‘Add your data’ or ‘Manage your data’ button that will allow the user to upload new tracks containing reads or sequences to Ensembl Genomes or to modify data that has been previously uploaded. The uploaded data can be visualised in region views or over the whole karyotype. The uploaded data can be localised using Chromosome Coordinates or BAC Clone Coordinates.
G3: Genes, Genomes, Genetics G3: Genes, Genomes, Genetics (also styled as "G3: Genes | Genomes | Genetics") is a peer-reviewed open-access scientific mega journal that focuses on rapid publication of research in the fields of genetics and genomics and from a broad range of biological disciplines. It is published by the Genetics Society of America in association with HighWire Press. "G3" was established in 2011 as an outlet for research and experimental resources that is largely unrestricted by criteria such as impact and perceived significance. The journal is abstracted and indexed in MEDLINE, PubMed Central, PubMed, and Science Citation Index Expanded. The founding editor-in-chief is Brenda Andrews (University of Toronto).
Ilham Shahmuradov structure and evolution of eukaryotic genomes; Organization and expression of genes in eukaryotic genomes; Organelle-to-nucleus gene transfer in plants; Development of bioinformatics tools and databases.
Bioinformatics Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in the text mining of biological literature and the development of biological and gene ontologies to organize and query biological data. It also plays a role in the analysis of gene and protein expression and regulation. Bioinformatics tools aid in the comparison of genetic and genomic data and more generally in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalogue the biological pathways and networks that are an important part of systems biology. In structural biology, it aids in the simulation and modeling of DNA, RNA, proteins as well as biomolecular interactions.
Bioinformatics In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. This process needs to be automated because most genomes are too large to annotate by hand, not to mention the desire to annotate as many genomes as possible, as the rate of sequencing has ceased to pose a bottleneck. Annotation is made possible by the fact that genes have recognisable start and stop regions, although the exact sequence found in these regions can vary between genes.
Ensembl Genomes The project is run by the European Bioinformatics Institute, and was launched in 2009 using the Ensembl technology. The main objective of the Ensembl Genomes database is to complement the main Ensembl
Henipavirus Like all mononegaviral genomes, Hendra virus and Nipah virus genomes are non-segmented, single-stranded negative-sense RNA. Both genomes are 18.2 kb in length and contain six genes corresponding to six structural proteins.
List of A1 genes, proteins or receptors This is a list of genes, proteins or receptors named A1 or Alpha-1 :
1000 Plant Genomes Project The goals of the two projects are significantly different. While the 1000 Genomes Project focuses on genetic variation in a single species, the 1000 Plant Genomes Project looks at the evolutionary relationships and genes of 1000 different plant species.
Bioinformatics Bioinformatics is both an umbrella term for the body of biological studies that use computer programming as part of their methodology and a reference to specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidate genes and single nucleotide polymorphisms (SNPs). Often, such identification is made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable properties (esp. in agricultural species), or differences between populations. In a less formal way, bioinformatics also tries to understand the organisational principles within nucleic acid and protein sequences, called proteomics.
FOX proteins Many genes encoding FOX proteins have been identified. For example,
Translational bioinformatics Since the completion of the human genome, new projects are now attempting to systematically analyze all the gene alterations in a disease like cancer rather than focusing on a few genes at a time. In the future, large-scale data will be integrated from different sources in order to extract functional information. The availability of a large number of human genomes will allow for statistical mining of their relation to lifestyles, drug interactions, and other factors. Translational bioinformatics is therefore transforming the search for disease genes and is becoming a crucial component of other areas of medical research including pharmacogenomics.
Bacterial microcompartment Although the carboxysome, propanediol utilizing (PDU), and ethanolamine utilizing (EUT) BMCs encapsulate different enzymes and therefore have different functions, the genes encoding the shell proteins are very similar. Most of the genes (coding for the shell proteins and the encapsulated enzymes) from experimentally characterized BMCs are located near one another in distinct genetic loci or operons. There are currently over 20,000 bacterial genomes sequenced, and bioinformatics methods can be used to find all BMC shell genes and to look at what other genes are in the vicinity, producing a list of potential BMCs. Recently a comprehensive survey identified 23 different loci encoding up to 10 functionally distinct BMCs across 23 bacterial phyla.
Bioinformatics Software tools for bioinformatics range from simple command-line tools, to more complex graphical programs and standalone web-services available from various bioinformatics companies or public institutions.
Bioinformatics Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data. Bioinformatics has been used for "in silico" analyses of biological queries using mathematical and statistical techniques.
Bioinformatics Basic bioinformatics services are classified by the EBI into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment), and BSA (Biological Sequence Analysis). The availability of these service-oriented bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions, which range from a collection of standalone tools with a common data format under a single, standalone or web-based interface, to integrative, distributed and extensible bioinformatics workflow management systems.
Bioinformatics With the breakthroughs that next-generation sequencing technology is providing to the field of bioinformatics, cancer genomics could change drastically. These new methods and software allow bioinformaticians to sequence many cancer genomes quickly and affordably. This could create a more flexible process for classifying types of cancer by analyzing cancer-driven mutations in the genome. Furthermore, tracking patients as the disease progresses may become possible in the future by sequencing cancer samples.
Bioinformatics The range of open-source software packages includes titles such as Bioconductor, BioPerl, Biopython, BioJava, BioJS, BioRuby, Bioclipse, EMBOSS, .NET Bio, Orange with its bioinformatics add-on, Apache Taverna, UGENE and GenoCAD. To maintain this tradition and create further opportunities, the non-profit Open Bioinformatics Foundation has supported the annual Bioinformatics Open Source Conference (BOSC) since 2000.
Major urinary proteins Between 1979 and 1981, it was estimated that Mups are encoded by a gene family of between 15 and 35 genes and pseudogenes in the mouse and by an estimated 20 genes in the rat. In 2008 a more precise number of "Mup" genes in a range of species was determined by analyzing the DNA sequence of whole genomes.