
High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

Author

Listed:
  • Phaedra Agius
  • Aaron Arvey
  • William Chang
  • William Stafford Noble
  • Christina Leslie

Abstract

Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments is also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel k-mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomic regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments.
Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding.

Author Summary

Transcription factors (TFs) are proteins that bind sites in the non-coding DNA and regulate the expression of targeted genes. Being able to predict the genome-wide binding locations of TFs is an important step in deciphering gene regulatory networks. Historically, there was very limited experimental data on the DNA-binding preferences of most TFs. Computational biologists used known sites to estimate simple binding site motifs, called position-specific scoring matrices, and scan the genome for additional potential binding locations, but this approach often led to many false positive predictions. Here we introduce a machine learning approach to leverage new high resolution data on the binding preferences of TFs, namely, protein binding microarray (PBM) experiments which measure the in vitro binding affinities of TFs with respect to an array of double-stranded DNA probes, and chromatin immunoprecipitation experiments followed by next generation sequencing (ChIP-seq) which measure in vivo genome-wide binding of TFs in a given cell type. We show that by training statistical models on high resolution PBM and ChIP-seq data, we can more accurately represent the subtle DNA binding preferences of TFs and predict their genome-wide binding locations. These results will enable advances in the computational analysis of transcriptional regulation in mammalian genomes.
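As a rough illustration of what a k-mer based string kernel computes, the sketch below implements a plain (k, m)-mismatch kernel in Python. This is not the di-mismatch kernel described in the abstract, whose definition is not given on this page; it is a simpler, related construction. The function names (`mismatch_features`, `mismatch_kernel`) and the default values of `k` and `m` are assumptions chosen for the example only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet for this sketch


def kmers(seq, k):
    """Return every contiguous k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m):
    """Map each possible k-mer to the number of k-mers in seq
    that lie within m mismatches of it (zero counts are omitted)."""
    observed = Counter(kmers(seq, k))
    features = Counter()
    for candidate in map("".join, product(ALPHABET, repeat=k)):
        total = sum(n for kmer, n in observed.items()
                    if hamming(candidate, kmer) <= m)
        if total:
            features[candidate] = total
    return features


def mismatch_kernel(a, b, k=3, m=1):
    """Inner product of the mismatch feature vectors of a and b."""
    fa = mismatch_features(a, k, m)
    fb = mismatch_features(b, k, m)
    # Counter returns 0 for k-mers absent from fb.
    return sum(count * fb[kmer] for kmer, count in fa.items())
```

With `m=0` this reduces to counting shared exact k-mers. A kernel of this general kind can be supplied to a support vector regression solver as a precomputed similarity matrix over probe sequences; the enumeration over all 4^k candidates is only practical for small `k` and is used here for clarity rather than speed.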

Suggested Citation

  • Phaedra Agius & Aaron Arvey & William Chang & William Stafford Noble & Christina Leslie, 2010. "High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions," PLOS Computational Biology, Public Library of Science, vol. 6(9), pages 1-12, September.
  • Handle: RePEc:plo:pcbi00:1000916
    DOI: 10.1371/journal.pcbi.1000916

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000916
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000916&type=printable
    Download Restriction: no


    Citations


    Cited by:

    1. Manu Setty & Christina S Leslie, 2015. "SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-21, May.
    2. Jiangning Song & Hao Tan & Andrew J Perry & Tatsuya Akutsu & Geoffrey I Webb & James C Whisstock & Robert N Pike, 2012. "PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-23, November.
    3. Matti Annala & Kirsti Laurila & Harri Lähdesmäki & Matti Nykter, 2011. "A Linear Model for Transcription Factor Binding Affinity Prediction in Protein Binding Microarrays," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-13, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zing Tsung-Yeh Tsai & Shin-Han Shiu & Huai-Kuang Tsai, 2015. "Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast," PLOS Computational Biology, Public Library of Science, vol. 11(8), pages 1-22, August.
    2. Eitan Gross, 2015. "Effect of environmental stress on regulation of gene expression in the yeast," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 430(C), pages 224-235.
    3. Armita Nourmohammad & Michael Lässig, 2011. "Formation of Regulatory Modules by Local Sequence Duplication," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-12, October.
    4. Wei-Sheng Wu & Fu-Jou Lai, 2016. "Detecting Cooperativity between Transcription Factors Based on Functional Coherence and Similarity of Their Target Gene Sets," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-12, September.
    5. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    6. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    7. Jens Keilwagen & Jan Grau & Ivan A Paponov & Stefan Posch & Marc Strickert & Ivo Grosse, 2011. "De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference," PLOS Computational Biology, Public Library of Science, vol. 7(2), pages 1-13, February.
    8. Guo-Cheng Yuan & Jun S Liu, 2008. "Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion," PLOS Computational Biology, Public Library of Science, vol. 4(1), pages 1-11, January.
    9. Saket Navlakha & Anthony Gitter & Ziv Bar-Joseph, 2012. "A Network-based Approach for Predicting Missing Pathway Interactions," PLOS Computational Biology, Public Library of Science, vol. 8(8), pages 1-13, August.
    10. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    11. Jeremiah J Faith & Boris Hayete & Joshua T Thaden & Ilaria Mogno & Jamey Wierzbowski & Guillaume Cottarel & Simon Kasif & James J Collins & Timothy S Gardner, 2007. "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-13, January.
    12. Joshua S Weitz & Philip N Benfey & Ned S Wingreen, 2007. "Evolution, Interactions, and Biological Networks," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-3, January.
    13. Dana S F Homsi & Vineet Gupta & Gary D Stormo, 2009. "Modeling the Quantitative Specificity of DNA-Binding Proteins from Example Binding Sites," PLOS ONE, Public Library of Science, vol. 4(8), pages 1-9, August.
    14. Manikandan Narayanan & Adrian Vetta & Eric E Schadt & Jun Zhu, 2010. "Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-13, April.
    15. Sourav Bandyopadhyay & Ryan Kelley & Nevan J Krogan & Trey Ideker, 2008. "Functional Maps of Protein Complexes from Quantitative Genetic Interaction Data," PLOS Computational Biology, Public Library of Science, vol. 4(4), pages 1-8, April.
    16. Yue Yuan & Qiang Huo & Ziru Zhang & Qun Wang & Juanxia Wang & Shuaikang Chang & Peng Cai & Karen M. Song & David W. Galbraith & Weixiao Zhang & Long Huang & Rentao Song & Zeyang Ma, 2024. "Decoding the gene regulatory network of endosperm differentiation in maize," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    17. Timothy E Reddy & Charles DeLisi & Boris E Shakhnovich, 2007. "Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites," PLOS Computational Biology, Public Library of Science, vol. 3(5), pages 1-11, May.
    18. Anshul Kundaje & Xiantong Xin & Changgui Lan & Steve Lianoglou & Mei Zhou & Li Zhang & Christina Leslie, 2008. "A Predictive Model of the Oxygen and Heme Regulatory Network in Yeast," PLOS Computational Biology, Public Library of Science, vol. 4(11), pages 1-21, November.
    19. Kyoung-Jae Won & Saurabh Agarwal & Li Shen & Robert Shoemaker & Bing Ren & Wei Wang, 2009. "An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome," PLOS ONE, Public Library of Science, vol. 4(5), pages 1-8, May.
    20. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
