
Parametric Alignment of Drosophila Genomes

Authors

  • Colin N Dewey
  • Peter M Huggins
  • Kevin Woods
  • Bernd Sturmfels
  • Lior Pachter

Abstract

The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman–Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied at the whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes, where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology addresses robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.

Synopsis

Dewey and colleagues describe a parametric alignment of the genomes of Drosophila melanogaster and Drosophila pseudoobscura. The parametric alignment consists of all optimal alignments of the two Drosophila genomes for all choices of parameters for some widely used scoring schemes. Computing and analyzing the parametric alignment requires integrating ideas from mathematics, algorithms, and biology. Mathematically, the parametric analysis rests on the geometric principle of convexity. In particular, the paper introduces and describes the alignment polytope, which organizes the alignments according to the parameter values for which they are optimal. Algorithmically, efficient procedures are developed for computing alignment polytopes on a large scale and for models with more parameters than had previously been practical. Biologically, the utility of parametric analysis is demonstrated by showing that the degree of conservation between cis-regulatory elements in Drosophila melanogaster and Drosophila pseudoobscura is higher than previously thought, and by assessing the dependence of branch length estimates on alignment parameters.
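The link between a single optimal alignment and the alignment polytope can be illustrated on toy sequences. The sketch below assumes the simplest linear scoring scheme (one match score, one mismatch score, one gap score per gap column, no affine gaps), which is much simpler than the PHMM-based models used in the paper; all function names are hypothetical. `needleman_wunsch` computes the optimal global alignment score for one fixed parameter choice. `alignment_summaries` collects, for every possible alignment, its vector of match, mismatch and gap-column counts. Because the score is the dot product of the parameters with this vector, for generic parameters the optimal summary is a vertex of the convex hull of these vectors. `optimal_vertices` finds vertices by brute-force sampling of integer parameter vectors on a grid, a stand-in for the exact polytope computations the paper describes, and it can miss vertices whose optimal parameter regions are narrow.

```python
from itertools import product

# (matches, mismatches, gap columns): the counts a linear scoring scheme depends on
Summary = tuple[int, int, int]


def needleman_wunsch(a: str, b: str, match: int, mismatch: int, gap: int) -> int:
    """Optimal global alignment score of a and b under a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[n][m]


def alignment_summaries(a: str, b: str) -> set[Summary]:
    """Every (M, X, G) vector realised by some global alignment of a with b."""
    n, m = len(a), len(b)
    # cell[i][j] collects the summaries of all alignments of a[:i] with b[:j]
    cell = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    cell[0][0].add((0, 0, 0))
    for i in range(n + 1):
        for j in range(m + 1):
            here = cell[i][j]
            if i > 0 and j > 0:  # pair a[i-1] with b[j-1]
                same = a[i - 1] == b[j - 1]
                for M, X, G in cell[i - 1][j - 1]:
                    here.add((M + 1, X, G) if same else (M, X + 1, G))
            if i > 0:  # a[i-1] against a gap
                here.update((M, X, G + 1) for M, X, G in cell[i - 1][j])
            if j > 0:  # b[j-1] against a gap
                here.update((M, X, G + 1) for M, X, G in cell[i][j - 1])
    return cell[n][m]


def optimal_vertices(summaries: set[Summary], grid) -> set[Summary]:
    """Summaries that are the unique optimum for at least one parameter vector in grid.

    A point that uniquely maximises a linear score over a finite set is a vertex
    of its convex hull, so every returned summary is a polytope vertex; a coarse
    grid may still miss some vertices.
    """
    vertices = set()
    for w in grid:  # w = (match, mismatch, gap); integers keep tie tests exact
        scores = {s: w[0] * s[0] + w[1] * s[1] + w[2] * s[2] for s in summaries}
        best = max(scores.values())
        winners = [s for s, v in scores.items() if v == best]
        if len(winners) == 1:
            vertices.add(winners[0])
    return vertices


print(needleman_wunsch("GATTACA", "GCATGCU", 1, -1, -1))  # 0
summaries = alignment_summaries("GATTACA", "GCATGCU")
print(sorted(optimal_vertices(summaries, product(range(-4, 5), repeat=3))))
```

For any fixed (match, mismatch, gap), maximising the dot product over `summaries` reproduces the `needleman_wunsch` score; the parametric view simply keeps every summary that could win for some parameter choice instead of committing to one. The set-based recursion is intended only for short toy sequences.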

Suggested Citation

  • Colin N Dewey & Peter M Huggins & Kevin Woods & Bernd Sturmfels & Lior Pachter, 2006. "Parametric Alignment of Drosophila Genomes," PLOS Computational Biology, Public Library of Science, vol. 2(6), pages 1-9, June.
  • Handle: RePEc:plo:pcbi00:0020073
    DOI: 10.1371/journal.pcbi.0020073

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0020073

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0020073&type=printable

    Citations

    Cited by:

    1. Ana Arribas-Gil & Catherine Matias, 2012. "A Context Dependent Pair Hidden Markov Model for Statistical Alignment," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-29, January.