
STEME: A Robust, Accurate Motif Finder for Large Data Sets

Authors

  • John E Reid
  • Lorenz Wernisch

Abstract

Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.

Suggested Citation

  • John E Reid & Lorenz Wernisch, 2014. "STEME: A Robust, Accurate Motif Finder for Large Data Sets," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
  • Handle: RePEc:plo:pone00:0090735
    DOI: 10.1371/journal.pone.0090735

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090735
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0090735&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    2. G. Saharidis & I. Androulakis & M. Ierapetritou, 2011. "Model building using bi-level optimization," Journal of Global Optimization, Springer, vol. 49(1), pages 49-67, January.
    3. Emily N Manderson & Mohan Malleshaiah & Stephen W Michnick, 2008. "A Novel Genetic Screen Implicates Elm1 in the Inactivation of the Yeast Transcription Factor SBF," PLOS ONE, Public Library of Science, vol. 3(1), pages 1-9, January.
    4. Cheemeng Tan & Robert Phillip Smith & Ming-Chi Tsai & Russell Schwartz & Lingchong You, 2014. "Phenotypic Signatures Arising from Unbalanced Bacterial Growth," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-10, August.
    5. Kyoung-Jae Won & Saurabh Agarwal & Li Shen & Robert Shoemaker & Bing Ren & Wei Wang, 2009. "An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome," PLOS ONE, Public Library of Science, vol. 4(5), pages 1-8, May.
    6. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
    7. Xun Lan & Christopher Adams & Mark Landers & Miroslav Dudas & Daniel Krissinger & George Marnellos & Russell Bonneville & Maoxiong Xu & Junbai Wang & Tim H-M Huang & Gavin Meredith & Victor X Jin, 2011. "High Resolution Detection and Analysis of CpG Dinucleotides Methylation Using MBD-Seq Technology," PLOS ONE, Public Library of Science, vol. 6(7), pages 1-11, July.
    8. Zhengdong D Zhang & Joel Rozowsky & Michael Snyder & Joseph Chang & Mark Gerstein, 2008. "Modeling ChIP Sequencing In Silico with Applications," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-10, August.
