## A Geometrical Approach to Genome Analysis: Skew & Z-Curve

Start Date: 09/20/2020

 Course Type: Common Course

Explore 1600+ online courses from top universities. Join Coursera today to learn data science, programming, business strategy, and more. #### Related Wiki Topic

Article Example
Z curve The Z curve (or Z-curve) method is a bioinformatics algorithm for genome analysis. The Z-curve is a three-dimensional curve that constitutes a unique representation of a DNA sequence, i.e., for the Z-curve and the given DNA sequence each can be uniquely reconstructed from the other.
Z curve The Z curve has also been experimentally used to determine phylogenetic relationships. In one study, a novel coronavirus in China was analyzed using sequence analysis and the Z curve method to determine its phylogenetic relationship to other coronaviruses. It was determined that similarities and differences in related species can quickly by determined by visually examining their Z curves. An algorithm was created to identify the geometric center and other trends in the Z curve of 24 species of coronaviruses. The data was used to create a phylogenetic tree. The results matched the tree that was generated using sequence analysis. The Z curve method proved superior because while sequence analysis creates a phylogenetic tree based solely on coding sequences in the genome, the Z curve method analyzed the entire genome.
Z curve and comparative genomics. Analysis of the Z curve has also been shown to be able to predict if a gene contains introns,
Z curve The Z-curve method has been used in many different areas of genome research, such as replication origin identification,", ab initio" gene prediction,
Curve A plane curve is a curve for which formula_22 is the Euclidean plane—these are the examples first encountered—or in some cases the projective plane. A space curve is a curve for which formula_22 is of three dimensions, usually Euclidean space; a skew curve is a space curve which lies in no plane. These definitions of plane, space and skew curves apply also to real algebraic curves, although the above definition of a curve does not applies (a real algebraic curve may be disconnected).
GC skew The second approach is referred to as cumulative GC skew (CGC skew). This method still uses the sliding window strategy but it takes advantage of the sum of the adjacent windows from an arbitrary start. In this scheme, the entire genome is usually plotted 5' to 3' using an arbitrary start and arbitrary strand. In the cumulative GC skew plot, the peaks corresponds to the switch points (terminus or origin).
Decline curve analysis Before the availability of computers, decline curve analysis was performed by hand on semi-log plot paper. Currently, decline curve analysis software on PC computers is used to plot production decline curves for petroleum economics analysis.
Skew heap With no structural constraints, it may seem that a skew heap would be horribly inefficient. However, amortized complexity analysis can be used to demonstrate that all operations on a skew heap can be done in O(log n).
Z curve The resulting curve has a zigzag shape, hence the name Z-curve.
GC skew The final approach is the Z curve. Unlike the previous methods, this method do not uses the sliding window strategy and is thought to perform better as to finding the origin of replication. In this method each base’s cumulative frequency with respect to the base at the beginning of the sequence is investigated. Z curve uses a three-dimensional representation with the following parameters:
Z curve The Z Curve method was first created in 1994 as a way to visually map a DNA or RNA sequence. Different properties of the Z curve, such as its symmetry and periodicity can give unique information on the DNA sequence. The Z curve is generated from a series of nodes, P, P,…P, with the coordinates x, y, and z (n=0,1,2…N, with N being the length of the DNA sequence). The Z curve is created by connecting each of the nodes sequentially.
Z curve Experiments have shown that the Z curve can be used to identify the replication origin in various organisms. One study analyzed the Z curve for multiple species of Archaea and found that the oriC is located at a sharp peak on the curve followed by a broad base. This region was rich in AT bases and had multiple repeats, which is expected for replication origin sites. This and other similar studies were used to generate a program that could predict the origins of replication using the Z curve.
GC skew The GC skew is proven to be useful as the indicator of the DNA leading strand, lagging strand, replication origin, and replication terminal. Most prokaryotes and archaea contain only one DNA replication origin. The GC skew is positive and negative in the leading strand and in the lagging strand respectively; therefore, it is expected to see a switch in GC skew sign just at the point of DNA replication origin and terminus. GC skew can also be used to study the strand biases and mechanism related to them by calculating the excess of one base over its complementary base in different milieus. Method such as GC skew, CGC skew, and Z-curve are tools that can provide opportunity to better investigate the mechanism of DNA replication in different organisms.
Skew arch Buck's trigonometrical approach allowed every dimension of a skew arch to be calculated without recourse to taking measurements from scale drawings and it allowed him to calculate the theoretical minimum angle of obliquity to which a practical semicircular helicoidal skew bridge could be designed and safely built.
Decline curve analysis Decline curve analysis is a means of predicting future oil well or gas well production based on past production history. Production decline curve analysis is a traditional means of identifying well production problems and predicting well performance and life based on measured oil well production.
Document layout analysis There are two issues common to any approach at document layout analysis: noise and skew. Noise refers to image noise, such as salt and pepper noise or Gaussian noise. Skew refers to the fact that a document image may be rotated in a way so that the text lines are not perfectly horizontal. It is a common assumption in both document layout analysis algorithms and optical character recognition algorithms that the characters in the document image are oriented so that text lines are horizontal. Therefore, if there is skew present then it is important to rotate the document image so as to remove it.
Melting curve analysis Many research and clinical examples exist in the literature that show the use of melting curve analysis to obviate or complement sequencing efforts, and thus reduce costs.
Algebraic curve More generally, one may consider algebraic curves that are not contained in the plane, but in a space of higher dimension. A curve that is not contained in some plane is called a skew curve. The simplest example of a skew algebraic curve is the twisted cubic. One may also consider algebraic curves contained in the projective space and even algebraic curves that are defined independently to any embedding in an affine or projective space. This leads to the most general definition of an algebraic curve:
GC skew The first approach is GC and AT Asymmetry. Jean R. Lobry was the first to illustrate the nucleotide composition asymmetry throughout the genome of three bacterium: "E. coli", "Bacillus subtilis", and "haemophilus influenzae" by using GC and AT bias. This is the most common and traditional way to quantitatively evaluate base composition asymmetry. The original formulas at the time were not named skew, but rather deviation from C or A:
Skew gradient The skew gradient can be defined using complex analysis and the Cauchy–Riemann equations.