
Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution

Author

Listed:
  • Xin He
  • Xu Ling
  • Saurabh Sinha

Abstract

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. 
Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

Author Summary: Comparison of noncoding DNA sequences across species has the potential to significantly improve our understanding of gene regulation and our ability to annotate regulatory regions of the genome. This potential is evident from recent publications analyzing 12 Drosophila genomes for regulatory annotation. However, because noncoding sequences are much less structured than coding sequences, their interspecies comparison presents technical challenges, such as ambiguity about how to align them and how to predict transcription factor binding sites, which are the fundamental units that make up regulatory sequences. This article describes how to build an integrated probabilistic framework that performs alignment and binding site prediction simultaneously, in the process improving the accuracy of both tasks. It defines a stochastic model for the evolution of entire “cis-regulatory modules,” with its highlight being a novel theoretical treatment of the commonly observed loss and gain of binding sites during evolution. This new evolutionary model forms the backbone of newly developed software for the prediction of new cis-regulatory modules, alignment of known modules to elucidate general principles of cis-regulatory evolution, or both. The new software is demonstrated to provide benefits in performance of these two crucial genomics tasks.

Suggested Citation

  • Xin He & Xu Ling & Saurabh Sinha, 2009. "Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution," PLOS Computational Biology, Public Library of Science, vol. 5(3), pages 1-14, March.
  • Handle: RePEc:plo:pcbi00:1000299
    DOI: 10.1371/journal.pcbi.1000299

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000299
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000299&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pradipta Ray & Suyash Shringarpure & Mladen Kolar & Eric P Xing, 2008. "CSMET: Comparative Genomic Motif Detection via Multi-Resolution Phylogenetic Shadowing," PLOS Computational Biology, Public Library of Science, vol. 4(6), pages 1-20, June.
    2. Ivan Dotu & Scott I Adamson & Benjamin Coleman & Cyril Fournier & Emma Ricart-Altimiras & Eduardo Eyras & Jeffrey H Chuang, 2018. "SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data," PLOS Computational Biology, Public Library of Science, vol. 14(3), pages 1-25, March.
    3. William H Majoros & Uwe Ohler, 2010. "Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-12, December.
    4. Qi Dai & Lihua Li & Xiaoqing Liu & Yuhua Yao & Fukun Zhao & Michael Zhang, 2011. "Integrating Overlapping Structures and Background Information of Words Significantly Improves Biological Sequence Comparison," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-10, November.
    5. Armita Nourmohammad & Michael Lässig, 2011. "Formation of Regulatory Modules by Local Sequence Duplication," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-12, October.
    6. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    7. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    8. Jia Lu & Xiaoyi Cao & Sheng Zhong, 2018. "A likelihood approach to testing hypotheses on the co-evolution of epigenome and genome," PLOS Computational Biology, Public Library of Science, vol. 14(12), pages 1-28, December.
    9. Arribas-Gil Ana & Matias Catherine, 2012. "A Context Dependent Pair Hidden Markov Model for Statistical Alignment," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-29, January.
    10. Saeed Omidi & Mihaela Zavolan & Mikhail Pachkov & Jeremie Breda & Severin Berger & Erik van Nimwegen, 2017. "Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors," PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-22, July.
    11. Aqil M Azmi & Abdulrakeeb Al-Ssulami, 2014. "Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-9, May.
    12. Lauren A. Choate & Gilad Barshad & Pierce W. McMahon & Iskander Said & Edward J. Rice & Paul R. Munn & James J. Lewis & Charles G. Danko, 2021. "Multiple stages of evolutionary change in anthrax toxin receptor expression in humans," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    13. Iksoo Huh & Isabel Mendizabal & Taesung Park & Soojin V Yi, 2018. "Functional conservation of sequence determinants at rapidly evolving regulatory regions across mammals," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-21, October.
    14. Timothy E Reddy & Charles DeLisi & Boris E Shakhnovich, 2007. "Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites," PLOS Computational Biology, Public Library of Science, vol. 3(5), pages 1-11, May.
    15. Mathilde Paris & Tommy Kaplan & Xiao Yong Li & Jacqueline E Villalta & Susan E Lott & Michael B Eisen, 2013. "Extensive Divergence of Transcription Factor Binding in Drosophila Embryos with Highly Conserved Gene Expression," PLOS Genetics, Public Library of Science, vol. 9(9), pages 1-18, September.
    16. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-36, September.
    17. Kenzie D MacIsaac & Ernest Fraenkel, 2006. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs," PLOS Computational Biology, Public Library of Science, vol. 2(4), pages 1-10, April.
