
Supervised Detection of Regulatory Motifs in DNA Sequences

Authors

  • Keles, Sunduz (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • van der Laan, Mark J. (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Dudoit, Sandrine (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Xing, Biao (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Eisen, Michael B. (Department of Molecular and Cell Biology, University of California, Berkeley; Life Sciences Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley)

Abstract

Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME, the Gibbs motif sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions, each with four cell probabilities (one per nucleotide). We address the problem of supervising the search for DNA binding sites using information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model to a constrained multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constrained optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with that of existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. Some of the transcription factor binding data of Lee et al. (2002) were also analyzed using COMODE, and biologically verified sites were identified.
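
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the COMODE implementation, and every function name in it is hypothetical. It assumes a PWM stored as a list of per-position probability rows in A, C, G, T order, a single background distribution, and a two-component mixture in which every width-W window is treated as an independent draw (a common simplification, not necessarily the paper's exact model). The information content of a position is taken to be 2 minus its Shannon entropy in bits.

```python
import math

BASES = "ACGT"  # assumed column order of each PWM row


def information_content(pwm):
    """Per-position information content in bits: 2 - entropy of the column."""
    profile = []
    for col in pwm:
        entropy = -sum(p * math.log2(p) for p in col if p > 0)
        profile.append(2.0 - entropy)
    return profile


def motif_probability(window, pwm):
    """Probability of a window under independent per-position multinomials."""
    prob = 1.0
    for base, col in zip(window, pwm):
        prob *= col[BASES.index(base)]
    return prob


def background_probability(window, background):
    """Probability of a window under one shared background distribution."""
    prob = 1.0
    for base in window:
        prob *= background[BASES.index(base)]
    return prob


def mixture_log_likelihood(sequences, pwm, background, motif_weight):
    """Log-likelihood of all width-W windows under a motif/background mixture.

    Overlapping windows are treated as independent, which is an assumption
    of this sketch rather than a statement about the paper's model.
    """
    width = len(pwm)
    total = 0.0
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            mix = (motif_weight * motif_probability(window, pwm)
                   + (1.0 - motif_weight) * background_probability(window, background))
            total += math.log(mix)
    return total


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    pwm = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]  # toy motif favoring "AC"
    print(information_content(pwm))
    print(mixture_log_likelihood(["ACAC"], pwm, uniform, 0.5))
```

A constrained variant in the spirit of the abstract would maximize `mixture_log_likelihood` over the PWM entries subject to restrictions on the output of `information_content`; the form of those restrictions and the optimizer are not specified here and are left out of the sketch.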

Suggested Citation

  • Keles, Sunduz & van der Laan, Mark J. & Dudoit, Sandrine & Xing, Biao & Eisen, Michael B., 2003. "Supervised Detection of Regulatory Motifs in DNA Sequences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 2(1), pages 1-40, August.
  • Handle: RePEc:bpj:sagmbi:v:2:y:2003:i:1:n:5
    DOI: 10.2202/1544-6115.1015

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1015
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    References listed on IDEAS

    1. Miloslavsky, Maja & van der Laan, Mark J., 2003. "Fitting of mixtures with unspecified number of components using cross validation distance estimate," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 413-428, January.

    Citations

    Cited by:

    1. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.
    2. Sandrine Dudoit & Mark van der Laan & Sunduz Keles & Annette Molinaro & Sandra Sinisi & Siew Leng Teng, 2004. "Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding," U.C. Berkeley Division of Biostatistics Working Paper Series 1136, Berkeley Electronic Press.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Proust-Lima, Cécile & Joly, Pierre & Dartigues, Jean-François & Jacqmin-Gadda, Hélène, 2009. "Joint modelling of multivariate longitudinal outcomes and a time-to-event: A nonlinear latent class approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1142-1154, February.
    2. Bohning, Dankmar & Seidel, Wilfried, 2003. "Editorial: recent developments in mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 349-357, January.
    3. Hunt, Lynette A. & Basford, Kaye E., 2016. "Comparing classical criteria for selecting intra-class correlated features in Multimix," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 350-366.
    4. Tao, Jian & Shi, Ning-Zhong & Lee, Sik-Yum, 2004. "Drug risk assessment with determining the number of sub-populations under finite mixture normal models," Computational Statistics & Data Analysis, Elsevier, vol. 46(4), pages 661-676, July.
    5. Dankmar Böhning & Ekkehart Dietz & Ronny Kuhnert & Dieter Schön, 2005. "Mixture models for capture-recapture count data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 14(1), pages 29-43, February.
