
De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

Author

Listed:
  • Jens Keilwagen
  • Jan Grau
  • Ivan A Paponov
  • Stefan Posch
  • Marc Strickert
  • Ivo Grosse

Abstract

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.

Author Summary: Binding of transcription factors to promoters of genes, and subsequent enhancement or repression of transcription, is one of the main steps of transcriptional gene regulation. Direct or indirect wet-lab experiments allow the identification of approximate regions potentially bound or regulated by a transcription factor. Subsequently, de-novo motif discovery tools can be used for detecting the precise positions of binding sites. Many traditional tools focus on motifs over-represented in the target regions, which often turn out to be similarly over-represented in the entire genome. In contrast, several recent tools focus on differentially abundant motifs in target regions compared to a control set. As binding sites are often located at some preferred distance to the transcription start site, it is favorable to include this information into de-novo motif discovery. Here, we present Dispom, a novel approach for learning differentially abundant motifs and their positional preferences simultaneously, which predicts binding sites with increased accuracy compared to many popular de-novo motif discovery tools. When applying Dispom to promoters of auxin-responsive genes of Arabidopsis thaliana, we find a binding motif slightly different from the canonical auxin-response element, which exhibits a strong positional preference and which is considerably more specific to auxin-responsive genes.
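
Illustrative sketch (not part of the original record): the abstract combines two ideas, scoring candidate sites with a motif model weighted by a positional preference, and comparing motif abundance between target and control sequences. The toy Python example below shows one simple way these two ideas could fit together. It is not Dispom's algorithm and does not use the Jstacs API. The motif matrix, the uniform background, the discretised Gaussian position prior, the threshold, the example sequences, and all function names are assumptions made purely for illustration.

```python
import math

# Toy 4-bp motif as per-position base probabilities (assumed values,
# not the motif reported in the paper).
MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
BG_BASE_PROB = 0.25  # uniform background base probability (assumption)


def motif_log_odds(site):
    """Log-odds of a site under the toy motif versus the uniform background."""
    return sum(math.log(col[base] / BG_BASE_PROB) for col, base in zip(MOTIF, site))


def position_log_prior(start, n_starts, mean, sd):
    """Log-probability of a start position under a discretised Gaussian."""
    weights = [math.exp(-0.5 * ((p - mean) / sd) ** 2) for p in range(n_starts)]
    return math.log(weights[start] / sum(weights))


def best_site(seq, mean, sd):
    """Return (score, start) of the best site, adding motif and position terms."""
    n_starts = len(seq) - len(MOTIF) + 1
    scored = [
        (motif_log_odds(seq[s:s + len(MOTIF)])
         + position_log_prior(s, n_starts, mean, sd), s)
        for s in range(n_starts)
    ]
    return max(scored)


def hit_fraction(seqs, mean, sd, threshold):
    """Fraction of sequences whose best combined score reaches the threshold."""
    return sum(best_site(s, mean, sd)[0] >= threshold for s in seqs) / len(seqs)


# Made-up sequences, all aligned to a common anchor point at index 0.
FOREGROUND = ["AAATGTCAAA", "CCATGTCGGA"]
CONTROL = ["AAAAAAAAAA", "CCGGAACCGG"]

fg = hit_fraction(FOREGROUND, mean=3, sd=1.5, threshold=0.0)
bg = hit_fraction(CONTROL, mean=3, sd=1.5, threshold=0.0)
print(f"foreground hits: {fg:.2f}, control hits: {bg:.2f}")
```

A large gap between the two hit fractions is a crude stand-in for "differential abundance". The sketch fixes the motif and the position prior in advance, whereas the abstract describes a tool that learns the motif, its length, and its positional distribution from the data.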

Suggested Citation

  • Jens Keilwagen & Jan Grau & Ivan A Paponov & Stefan Posch & Marc Strickert & Ivo Grosse, 2011. "De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference," PLOS Computational Biology, Public Library of Science, vol. 7(2), pages 1-13, February.
  • Handle: RePEc:plo:pcbi00:1001070
    DOI: 10.1371/journal.pcbi.1001070

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1001070
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1001070&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. Uday Kamath & Kenneth De Jong & Amarda Shehu, 2014. "Effective Automated Feature Construction and Selection for Classification of Biological Sequences," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-14, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matvei Khoroshkin & Andrey Buyan & Martin Dodel & Albertas Navickas & Johnny Yu & Fathima Trejo & Anthony Doty & Rithvik Baratam & Shaopu Zhou & Sean B. Lee & Tanvi Joshi & Kristle Garcia & Benedict C, 2024. "Systematic identification of post-transcriptional regulatory modules," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    2. Armita Nourmohammad & Michael Lässig, 2011. "Formation of Regulatory Modules by Local Sequence Duplication," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-12, October.
    3. Guo-Cheng Yuan & Jun S Liu, 2008. "Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion," PLOS Computational Biology, Public Library of Science, vol. 4(1), pages 1-11, January.
    4. Joshua S Weitz & Philip N Benfey & Ned S Wingreen, 2007. "Evolution, Interactions, and Biological Networks," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-3, January.
    5. Manikandan Narayanan & Adrian Vetta & Eric E Schadt & Jun Zhu, 2010. "Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-13, April.
    6. Sourav Bandyopadhyay & Ryan Kelley & Nevan J Krogan & Trey Ideker, 2008. "Functional Maps of Protein Complexes from Quantitative Genetic Interaction Data," PLOS Computational Biology, Public Library of Science, vol. 4(4), pages 1-8, April.
    7. Yue Yuan & Qiang Huo & Ziru Zhang & Qun Wang & Juanxia Wang & Shuaikang Chang & Peng Cai & Karen M. Song & David W. Galbraith & Weixiao Zhang & Long Huang & Rentao Song & Zeyang Ma, 2024. "Decoding the gene regulatory network of endosperm differentiation in maize," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    8. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
    9. Xiaoyu Tu & Sibo Ren & Wei Shen & Jianjian Li & Yuxiang Li & Chuanshun Li & Yangmeihui Li & Zhanxiang Zong & Weibo Xie & Donald Grierson & Zhangjun Fei & Jim Giovannoni & Pinghua Li & Silin Zhong, 2022. "Limited conservation in cross-species comparison of GLK transcription factor binding suggested wide-spread cistrome divergence," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    10. Xiaoke Ma & Long Gao & Georgios Karamanlidis & Peng Gao & Chi Fung Lee & Lorena Garcia-Menendez & Rong Tian & Kai Tan, 2015. "Revealing Pathway Dynamics in Heart Diseases by Analyzing Multiple Differential Networks," PLOS Computational Biology, Public Library of Science, vol. 11(6), pages 1-19, June.
    11. Shojaie Ali & Michailidis George, 2010. "Network Enrichment Analysis in Complex Experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-36, May.
    12. Phaedra Agius & Aaron Arvey & William Chang & William Stafford Noble & Christina Leslie, 2010. "High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions," PLOS Computational Biology, Public Library of Science, vol. 6(9), pages 1-12, September.
    13. Kenzie D MacIsaac & Ernest Fraenkel, 2006. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs," PLOS Computational Biology, Public Library of Science, vol. 2(4), pages 1-10, April.
    14. Zing Tsung-Yeh Tsai & Shin-Han Shiu & Huai-Kuang Tsai, 2015. "Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast," PLOS Computational Biology, Public Library of Science, vol. 11(8), pages 1-22, August.
    15. Gross, Eitan, 2015. "Effect of environmental stress on regulation of gene expression in the yeast," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 430(C), pages 224-235.
    16. Kai Deng & Xin Zhang, 2022. "Tensor envelope mixture model for simultaneous clustering and multiway dimension reduction," Biometrics, The International Biometric Society, vol. 78(3), pages 1067-1079, September.
    17. Andreas Wagner, 2001. "Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data," Working Papers 01-09-051, Santa Fe Institute.
    18. Wei-Sheng Wu & Fu-Jou Lai, 2016. "Detecting Cooperativity between Transcription Factors Based on Functional Coherence and Similarity of Their Target Gene Sets," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-12, September.
    19. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    20. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
