
The Next Generation of Transcription Factor Binding Site Prediction

Author

Listed:
  • Anthony Mathelier
  • Wyeth W Wasserman

Abstract

Finding where transcription factors (TFs) bind to DNA is of key importance for deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible and can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.

Author Summary: Transcription factors are critical proteins for sequence-specific control of transcriptional regulation. Finding where these proteins bind to DNA is of key importance for global efforts to decipher the complex mechanisms of gene regulation. Greater understanding of the regulation of transcription promises to improve human genetic analysis by specifying critical gene components that have eluded investigators. Classically, computational prediction of transcription factor binding sites (TFBSs) is based on models giving weights to each nucleotide at each position. We introduce a novel statistical model for the prediction of TFBSs that tolerates a broader range of TFBS configurations than can be conveniently accommodated by existing methods. The new models are designed to address the confounding properties of nucleotide composition, inter-positional sequence dependence and variable lengths (e.g. variable spacing between half-sites) observed in the more comprehensive experimental data now emerging. The new models generate scores consistent with DNA-protein affinities measured experimentally and can be represented graphically, retaining desirable attributes of past methods. This demonstrates the capacity of the new approach to accurately assess DNA-protein interactions. With the rich experimental data generated from chromatin immunoprecipitation experiments, a greater diversity of TFBS properties has emerged that can now be accommodated within a single predictive approach.
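
The classical PWM baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' TFFM software or any published implementation: the function names (build_pwm, score_window, best_hit), the pseudocount, the uniform background of 0.25, and the toy sites are all assumptions made for illustration. The sketch shows why a PWM treats positions independently: the score of a window is simply a sum of per-column weights.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build log2-odds column weights from equal-length aligned sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def score_window(pwm, window):
    """Sum the per-position weights; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Made-up toy sites, not real ChIP-seq data.
toy_sites = ["CACGTG", "CACGTG", "CACGTT", "CATGTG", "CACGTG"]
pwm = build_pwm(toy_sites)
score, offset = best_hit(pwm, "TTGACACGTGAAT")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

Because the score is additive, changing the base at one position shifts the score by the same amount regardless of the neighbouring bases. That is the independence assumption the abstract says TFFMs relax by modelling dependencies between positions within a hidden Markov model.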

Suggested Citation

  • Anthony Mathelier & Wyeth W Wasserman, 2013. "The Next Generation of Transcription Factor Binding Site Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(9), pages 1-18, September.
  • Handle: RePEc:plo:pcbi00:1003214
    DOI: 10.1371/journal.pcbi.1003214

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003214
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003214&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1003214?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Elizabeth G Wilbanks & Marc T Facciotti, 2010. "Evaluation of Algorithm Performance in ChIP-Seq Peak Detection," PLOS ONE, Public Library of Science, vol. 5(7), pages 1-12, July.
    2. Martin C Frith & Neil F W Saunders & Bostjan Kobe & Timothy L Bailey, 2008. "Discovering Sequence Motifs with Arbitrary Insertions and Deletions," PLOS Computational Biology, Public Library of Science, vol. 4(5), pages 1-12, May.
    3. Liron Levkovitz & Nir Yosef & Marvin C Gershengorn & Eytan Ruppin & Roded Sharan & Yoram Oron, 2010. "A Novel HMM-Based Method for Detecting Enriched Transcription Factor Binding Sites Reveals RUNX3 as a Potential Target in Pancreatic Cancer Biology," PLOS ONE, Public Library of Science, vol. 5(12), pages 1-9, December.
    4. Barbara Felice & Claudia Cattoglio & Davide Cittaro & Anna Testa & Annarita Miccio & Giuliana Ferrari & Lucilla Luzi & Alessandra Recchia & Fulvio Mavilio, 2009. "Transcription Factor Binding Sites Are Genetic Determinants of Retroviral Integration in the Human Genome," PLOS ONE, Public Library of Science, vol. 4(2), pages 1-16, February.
    5. Eran Segal & Yvonne Fondufe-Mittendorf & Lingyi Chen & AnnChristine Thåström & Yair Field & Irene K. Moore & Ji-Ping Z. Wang & Jonathan Widom, 2006. "A genomic code for nucleosome positioning," Nature, Nature, vol. 442(7104), pages 772-778, August.

    Citations


    Cited by:

    1. Saeed Omidi & Mihaela Zavolan & Mikhail Pachkov & Jeremie Breda & Severin Berger & Erik van Nimwegen, 2017. "Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors," PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-22, July.
    2. Miaomiao Li & Tao Yao & Wanru Lin & Will E. Hinckley & Mary Galli & Wellington Muchero & Andrea Gallavotti & Jin-Gui Chen & Shao-shan Carol Huang, 2023. "Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors," Nature Communications, Nature, vol. 14(1), pages 1-19, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ji-Ping Wang & Yvonne Fondufe-Mittendorf & Liqun Xi & Guei-Feng Tsai & Eran Segal & Jonathan Widom, 2008. "Preferentially Quantized Linker DNA Lengths in Saccharomyces cerevisiae," PLOS Computational Biology, Public Library of Science, vol. 4(9), pages 1-10, September.
    2. Zing Tsung-Yeh Tsai & Shin-Han Shiu & Huai-Kuang Tsai, 2015. "Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast," PLOS Computational Biology, Public Library of Science, vol. 11(8), pages 1-22, August.
    3. Caiyan Jia & Matthew B Carson & Yang Wang & Youfang Lin & Hui Lu, 2014. "A New Exhaustive Method and Strategy for Finding Motifs in ChIP-Enriched Regions," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-13, January.
    4. Segal Mark R, 2008. "Re-Cracking the Nucleosome Positioning Code," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-24, April.
    5. Moser Carlee & Gupta Mayetri, 2012. "A Generalized Hidden Markov Model for Determining Sequence-based Predictors of Nucleosome Positioning," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-23, January.
    6. Monica Naughtin & Zofia Haftek-Terreau & Johan Xavier & Sam Meyer & Maud Silvain & Yan Jaszczyszyn & Nicolas Levy & Vincent Miele & Mohamed Salah Benleulmi & Marc Ruff & Vincent Parissi & Cédric Vaill, 2015. "DNA Physical Properties and Nucleosome Positions Are Major Determinants of HIV-1 Integrase Selectivity," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-28, June.
    7. Wolfram Möbius & Ulrich Gerland, 2010. "Quantitative Test of the Barrier Nucleosome Model for Statistical Positioning of Nucleosomes Up- and Downstream of Transcription Start Sites," PLOS Computational Biology, Public Library of Science, vol. 6(8), pages 1-11, August.
    8. Weronika Sikora-Wohlfeld & Marit Ackermann & Eleni G Christodoulou & Kalaimathy Singaravelu & Andreas Beyer, 2013. "Assessing Computational Methods for Transcription Factor Target Gene Identification Based on ChIP-seq Data," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-11, November.
    9. Guo-Cheng Yuan & Jun S Liu, 2008. "Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion," PLOS Computational Biology, Public Library of Science, vol. 4(1), pages 1-11, January.
    10. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
    11. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    12. Matti Annala & Kirsti Laurila & Harri Lähdesmäki & Matti Nykter, 2011. "A Linear Model for Transcription Factor Binding Affinity Prediction in Protein Binding Microarrays," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-13, May.
    13. Daniel Ryan & Laura Jenniches & Sarah Reichardt & Lars Barquist & Alexander J. Westermann, 2020. "A high-resolution transcriptome map identifies small RNA regulation of metabolism in the gut microbe Bacteroides thetaiotaomicron," Nature Communications, Nature, vol. 11(1), pages 1-16, December.
    14. Wei Chen & Hao Lin & Peng-Mian Feng & Chen Ding & Yong-Chun Zuo & Kuo-Chen Chou, 2012. "iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    15. Timothy Bailey & Pawel Krajewski & Istvan Ladunga & Celine Lefebvre & Qunhua Li & Tao Liu & Pedro Madrigal & Cenny Taslim & Jie Zhang, 2013. "Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-8, November.
    16. Dongjun Chung & Dan Park & Kevin Myers & Jeffrey Grass & Patricia Kiley & Robert Landick & Sündüz Keleş, 2013. "dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data," PLOS Computational Biology, Public Library of Science, vol. 9(10), pages 1-13, October.
    17. Joke J F A van Vugt & Martijn de Jager & Magdalena Murawska & Alexander Brehm & John van Noort & Colin Logie, 2009. "Multiple Aspects of ATP-Dependent Nucleosome Translocation by RSC and Mi-2 Are Directed by the Underlying DNA Sequence," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-14, July.
    18. Miguel A Fortuna & Luis Zaman & Charles Ofria & Andreas Wagner, 2017. "The genotype-phenotype map of an evolving digital organism," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-20, February.
    19. Fang Liu & Eivind Tøstesen & Jostein K Sundet & Tor-Kristian Jenssen & Christoph Bock & Geir Ivar Jerstad & William G Thilly & Eivind Hovig, 2007. "The Human Genomic Melting Map," PLOS Computational Biology, Public Library of Science, vol. 3(5), pages 1-13, May.
    20. Harsh Nagpal & Ahmad Ali-Ahmad & Yasuhiro Hirano & Wei Cai & Mario Halic & Tatsuo Fukagawa & Nikolina Sekulić & Beat Fierz, 2023. "CENP-A and CENP-B collaborate to create an open centromeric chromatin state," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
