Mathematics of Genome Analysis (Cambridge Studies in Mathematical Biology), Jerome K. Percus, ISBN 9780511613197

The massive research effort known as the Human Genome Project is an attempt to record the sequence of the three billion nucleotides that make up the human genome and to identify individual genes within this sequence. The description and classification of sequences is heavily dependent on mathematical and statistical models. This short textbook presents a brief description of several ways in which mathematics and statistics are being used in genome analysis and sequencing.

This book is a short overview of some of the important mathematical techniques used to study genome sequences. In spite of the length of the book, the author does a fine job of introducing these techniques. Students of computational biology will especially benefit from its perusal.
The first section is a brief overview of the structure of DNA, m-RNA, and t-RNA. Because DNA is too large for direct analysis, the second section discusses restriction fragments, with emphasis on the restriction-enzyme fingerprint. The author's goal is to find the probability of occurrence of a 6-letter word in a strand and the mean distance between occurrences of this word (assuming no overlap between the words or the occurrences and equal probabilities for the bases). The effect of successive pair correlation (Markov chain effect) is considered briefly. This is followed by a calculation of the probability that a base pair is contained in a given clone. The author omits any discussion of algorithms for optical mapping, but does give a brief discussion of restriction maps.
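As a minimal sketch of the word-occurrence calculation described above (not the book's own derivation), the snippet below assumes independent bases and ignores self-overlap of the word. The function names and the example word GAATTC (the EcoRI recognition site) are illustrative choices, not taken from the book.

```python
from math import prod

# Assumption: equal base probabilities, as in the review's description.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def word_probability(word, base_freqs=UNIFORM):
    """Probability that `word` starts at a given position (independent bases)."""
    return prod(base_freqs[base] for base in word)

def mean_spacing(word, base_freqs=UNIFORM):
    """Mean distance between occurrences, ignoring self-overlap of the word."""
    return 1.0 / word_probability(word, base_freqs)

print(word_probability("GAATTC"))  # (1/4)**6
print(mean_spacing("GAATTC"))      # 4**6 = 4096 bases
```

Passing a different `base_freqs` dictionary shows how unequal base composition shifts the expected fragment length.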
The mathematics becomes more rigorous in chapter two, wherein the author analyzes a chain that exists as a set of cloned subchains with unknown overlap. This is the 'fingerprint assembly' problem, the object of which is to produce a physical map of the full sequence. The fingerprint of a clone is a collection of lengths of particular restriction fragments. The algorithm involves sequences of contiguous clones called 'islands', and 'contigs', which consist of two or more clones. The average number and size of islands are calculated assuming that the clones have equal length and identical overlap threshold. The method of anchoring is also discussed as a second method for obtaining the physical map of the genome. The author then considers the problem of covering the whole sequence by first placing n markers on a genome and covering by intervals centered at these markers. This is the restriction-fragment-length polymorphism analysis, the combinatorics of which the author solves by using Laplace and Fourier transforms. He also considers adaptive and non-adaptive pooling, in order to find a particular set of proteins on a large fragment.
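For orientation, the island counts under equal clone length and a common overlap threshold can be sketched with the standard Lander-Waterman expressions. This is an assumption about which formulas apply, since the review does not reproduce the book's equations. Here N clones of length L are placed at random on a genome of length G, the coverage is c = NL/G, and theta is the fraction of a clone that must overlap for the overlap to be detected. All function and parameter names are hypothetical.

```python
import math

def coverage(n_clones, clone_len, genome_len):
    """Redundancy of coverage c = N * L / G."""
    return n_clones * clone_len / genome_len

def expected_islands(n_clones, clone_len, genome_len, theta):
    """Expected number of apparent islands: N * exp(-c * (1 - theta))."""
    c = coverage(n_clones, clone_len, genome_len)
    return n_clones * math.exp(-c * (1.0 - theta))

def expected_contigs(n_clones, clone_len, genome_len, theta):
    """Expected number of islands holding two or more clones."""
    c = coverage(n_clones, clone_len, genome_len)
    sigma = 1.0 - theta
    # Total islands minus single-clone islands, N * exp(-2 * c * sigma).
    return n_clones * (math.exp(-c * sigma) - math.exp(-2.0 * c * sigma))

# Illustrative numbers only: 1000 clones of 40 kb on a 4 Mb genome.
print(expected_islands(1000, 40_000, 4_000_000, theta=0.2))
print(expected_contigs(1000, 40_000, 4_000_000, theta=0.2))
```

Raising theta (a stricter overlap requirement) increases the expected number of islands at fixed coverage, which is the trade-off the equal-threshold assumption makes easy to see.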
The third chapter addresses sequence statistics, in particular the nonhomogeneity of sequences and the correlation dependence in the bases. The chi-square test is discussed in some detail, along with the accuracy of the Markov chain assumption. Noting that very long chains would be needed to determine the parameters for the expressions for the conditional correlations, he uses the maximum likelihood method to find the intrinsic correlation length, and then estimates the parameters by modeling the parameter set.
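One simple way a chi-square test enters this kind of analysis is as a test of whether adjacent bases are independent, that is, whether a first-order Markov chain is needed at all. The sketch below is a generic Pearson statistic on dinucleotide counts, offered as an illustration rather than as the test the book actually performs. The function name is hypothetical.

```python
from collections import Counter

def adjacent_pair_chi_square(seq):
    """Pearson chi-square statistic for independence of adjacent bases."""
    pairs = Counter(zip(seq, seq[1:]))  # observed dinucleotide counts
    total = len(seq) - 1                # number of adjacent pairs
    first = Counter(seq[:-1])           # counts of the leading base
    second = Counter(seq[1:])           # counts of the trailing base
    stat = 0.0
    for x in first:
        for y in second:
            expected = first[x] * second[y] / total
            observed = pairs[(x, y)]
            stat += (observed - expected) ** 2 / expected
    return stat

# A strictly alternating sequence is as far from independence as possible.
print(adjacent_pair_chi_square("AC" * 50))
```

With all four bases present, the statistic is compared against a chi-square distribution with (4 - 1)^2 = 9 degrees of freedom.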
The author then studies the isochore regions and discusses their detection via the Jensen-Shannon entropy. Asking whether there are correlations between these long regions and within them motivates him to consider the long-range properties of DNA. This leads to the examination of a long fragment of a single strand of DNA, and with the assumption that strand-symmetry holds, the correlation coefficients are studied, with the decay properties of the auto- and cross-correlation discussed. Then, distinguishing only dual pairs, the author considers the probability that a pair is separated by an integer after an integral number of steps, a calculation that reduces to finding the largest eigenvalue of a 'transfer matrix', a procedure well-known in statistical physics.
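To make the Jensen-Shannon criterion concrete, here is a minimal sketch that scores a candidate boundary between two segments by the length-weighted Jensen-Shannon divergence of their base compositions. A large value suggests the two sides differ in composition. The weighting by segment length and the function names are assumptions for illustration. The review does not give the book's exact formulation.

```python
import math
from collections import Counter

def base_distribution(segment):
    """Relative frequency of each base in a segment."""
    counts = Counter(segment)
    total = len(segment)
    return {b: counts[b] / total for b in "ACGT"}

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def jensen_shannon(left, right):
    """Length-weighted Jensen-Shannon divergence between two segments."""
    n = len(left) + len(right)
    w_left, w_right = len(left) / n, len(right) / n
    p_left, p_right = base_distribution(left), base_distribution(right)
    mixture = {b: w_left * p_left[b] + w_right * p_right[b] for b in "ACGT"}
    return entropy(mixture) - w_left * entropy(p_left) - w_right * entropy(p_right)

print(jensen_shannon("ACGT", "ACGT"))  # identical composition
print(jensen_shannon("AAAA", "CCCC"))  # completely different composition
```

Scanning the split point along a sequence and taking the maximum of this score is one common way to propose a domain boundary.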
Next, a consideration of simple sequence repeats leads to a difference equation that is solved by the method of moments. Windows of bases are then discussed, in order to improve on the statistics. Correlations within and between windows are calculated. Interestingly, the consideration of long-range correlations gives a power-law dependence for the correlations, which is related to the Hurst index for self-similar patterns. In this chapter readers get their first taste of hidden Markov models, which are currently very popular in sequence analysis. Even more interesting is the discussion of walking Markov models, wherein a first-order base-to-base Markov chain is chosen to depend on a hidden parameter, and the time evolution is shown to satisfy a Fokker-Planck (diffusion) equation. Spectral analysis and information-theoretic criteria are also discussed.
In the next chapter of the book, the author considers the most important part of sequence analysis, namely the comparison between sequences according to their linear ordering. The problem is to find the probability of a common subsequence of two linear chains with a given length. The first calculation assumes that the matches are mutually exclusive, and the result is an upper bound on the probability. The author then considers the matches to be independent events, and again bounds are given for the probability (the so-called Chen-Stein estimate). He also gives an estimate of the probability in terms of an asymptotic series. Extreme value methods are then used to calculate the expectation value and the variance of the length of the longest match. An interesting exercise is assigned to the reader, namely finding the effect on the Fourier and Walsh power spectra of assuming that the base correlations are fractal in form. The alignment problem is then generalized to include replication errors, mutations, etc. The chapter ends, appropriately, with a discussion of multisequence comparison. The author poses the problem as one of finding the best match of a word to an n-tuple of words, which he tackles first using 'information content'. The category analysis of separating subsequence configurations into clusters is briefly discussed via simulated annealing, discriminant analysis, Bayesian analysis, and neural networks.
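The longest-match statistic can be explored numerically. The sketch below assumes that "match" means a contiguous exact match (the review's wording leaves this open). It computes the longest common run of two sequences by dynamic programming and compares it with the well-known leading-order growth log_{1/p}(nm) for independent sequences with per-position match probability p. The function names are hypothetical, and correction terms to the leading order are deliberately left out.

```python
import math

def longest_common_run(a, b):
    """Length of the longest contiguous exact match between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def leading_order_length(n, m, p=0.25):
    """Leading-order expected longest match, log base 1/p of n*m."""
    return math.log(n * m) / math.log(1.0 / p)

print(longest_common_run("GATTACA", "TTAC"))
print(leading_order_length(1024, 1024))
```

Running `longest_common_run` on pairs of randomly generated sequences and averaging gives a quick empirical check against the logarithmic estimate.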
The last chapter is a short introduction to the biophysics of DNA. The Hamiltonian for the dynamics of DNA is given, thermal equilibrium is assumed, and the partition function is calculated. This is followed by a discussion of the dynamics at low temperature when the energy is given by RNA polymerase instead of the heat bath, and the dynamics is solved via the Lagrangian using Bessel functions.

Product details

  • Series Cambridge Studies in Mathematical Biology (Book 17)
  • Printed Access Code
  • Publisher Cambridge University Press (December 4, 2009)
  • Language English
  • ISBN-10 0511613199

Reviews

Genome analysis is a huge field - any title that promises to address it all has taken on a huge task.
This brief book does not deliver on the title's promise. It provides a cursory introduction to the assembly problem. That intro is so brief, however, that I don't think a reader will come away understanding what genome assembly is really about.
It continues with a disappointing analysis of nucleotide frequencies. The probability analysis is competent enough, within its limits, but I don't see any mention of why the analysis is interesting, or how to extend the same techniques to proteins. The author proposes spectral analysis as a tool, and argues for Walsh vectors as basis functions. Spectral analysis is offbeat, to say the least, but the author does not explain what (if any) biological insight the technique generates. More mainstream tools, including Markov Models, get little or no mention.
The chapter on sequence comparison is so short and skips so much critical material that I'm tempted to call it negligent.
Perhaps you have a specific reason for wanting the narrow and idiosyncratic view that Percus brings. ...