Printed from https://ideas.repec.org/a/plo/pcbi00/1000761.html

Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development

Authors
  • Xuejing Li
  • Casandra Panea
  • Chris H Wiggins
  • Valerie Reinke
  • Christina Leslie

Abstract

A key problem in understanding transcriptional regulatory networks is deciphering what cis-regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole-organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis-regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns (represented by graphs of k-mers, or “graph-mers”) that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression.
This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.

Author Summary

A major challenge in functional genomics is to decipher the gene regulatory networks operating in multi-cellular organisms, such as the nematode C. elegans. The expression level of a gene is controlled, to a great extent, by regulatory proteins called transcription factors that bind short motifs in the gene's promoter (regulatory region in the non-coding DNA). In a temporal regulatory process, for example in development, the “regulatory logic” of DNA motifs in the promoter largely determines the gene's expression trajectory, as the gene responds over time to changing concentrations of the transcription factors that control it. This study addresses the problem of learning DNA motifs that predict temporal expression profiles, using genome-wide expression data from developmental time series in C. elegans. We developed a novel algorithm based on techniques from multivariate regression that sets up a correspondence between sequence patterns and expression trajectories. Sequence motifs are represented as graphs of sequence-similar k-length subsequences called “graph-mers”. By applying the method to germline development in C. elegans, we found both known and novel DNA motifs associated with oocyte and sperm genes.
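The abstract describes the method only at a high level. The Python/NumPy sketch that follows is a toy illustration built on our own assumptions, not the authors' implementation. It (1) counts overlapping k-mers in each promoter, (2) links k-mers that differ by a single mismatch into a graph, (3) computes a first PLS factor with a graph-Laplacian penalty on the k-mer weights, using the hypothetical objective w'MM'w - lam * w'Lw with M = X'Y on centered data, and (4) reads off "graph-mers" as connected components among the top-weighted k-mers. The penalty form, the one-mismatch graph, the planted motif and all function names (`kmer_counts`, `hamming1_graph`, `graph_pls_factor`, `graph_mers`, `toy_example`) are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def kmer_vocab(k):
    """All 4**k DNA k-mers in lexicographic order (A < C < G < T)."""
    return ["".join(p) for p in itertools.product("ACGT", repeat=k)]


def kmer_counts(seqs, vocab):
    """Genes x k-mers matrix of overlapping k-mer counts."""
    k = len(vocab[0])
    index = {kmer: j for j, kmer in enumerate(vocab)}
    X = np.zeros((len(seqs), len(vocab)))
    for i, seq in enumerate(seqs):
        for start in range(len(seq) - k + 1):
            j = index.get(seq[start:start + k])
            if j is not None:  # skip windows containing N or other symbols
                X[i, j] += 1
    return X


def hamming1_graph(vocab):
    """Adjacency A and Laplacian L = D - A linking k-mers one mismatch apart."""
    n = len(vocab)
    A = np.zeros((n, n))
    for a, b in itertools.combinations(range(n), 2):
        if sum(x != y for x, y in zip(vocab[a], vocab[b])) == 1:
            A[a, b] = A[b, a] = 1.0
    return A, np.diag(A.sum(axis=1)) - A


def graph_pls_factor(X, Y, L, lam=0.0):
    """First factor of an ASSUMED Laplacian-penalized PLS.

    Returns the unit k-mer weight vector w maximizing
    w' M M' w - lam * w' L w (M = centered X' Y) and the matching unit
    trajectory c proportional to M' w. With lam = 0 this is ordinary PLS.
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    M = Xc.T @ Yc
    _, vecs = np.linalg.eigh(M @ M.T - lam * L)
    w = vecs[:, -1]  # eigh returns eigenvalues in ascending order
    c = M.T @ w
    if c.sum() < 0:  # eigenvector sign is arbitrary; fix it for readability
        w, c = -w, -c
    return w, c / np.linalg.norm(c)


def graph_mers(w, A, vocab, top=6):
    """Hypothetical graph-mers: connected components of the mismatch graph
    restricted to the `top` k-mers with the largest |w|."""
    keep = {int(j) for j in np.argsort(-np.abs(w))[:top]}
    seen, groups = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search inside the kept k-mers
            u = stack.pop()
            members.append(vocab[u])
            for v in np.flatnonzero(A[u]):
                v = int(v)
                if v in keep and v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(members))
    return groups


def toy_example(seed=0, k=4, lam=1.0):
    """Synthetic data: genes 0-19 carry a planted degenerate motif
    (CGCGTT / CGCATT) and share a rising 6-point expression trajectory."""
    rng = np.random.default_rng(seed)
    seqs, Y = [], []
    for gene in range(40):
        seq = list(rng.choice(list("ACGT"), size=200))
        profile = np.zeros(6)
        if gene < 20:
            for copy, pos in enumerate(range(10, 190, 30)):
                seq[pos:pos + 6] = "CGCGTT" if copy % 2 == 0 else "CGCATT"
            profile = np.arange(6.0)
        seqs.append("".join(seq))
        Y.append(profile + rng.normal(scale=0.5, size=6))
    vocab = kmer_vocab(k)
    A, L = hamming1_graph(vocab)
    w, c = graph_pls_factor(kmer_counts(seqs, vocab), np.array(Y), L, lam)
    return vocab, w, c, graph_mers(w, A, vocab)


if __name__ == "__main__":
    vocab, w, c, groups = toy_example()
    print("trajectory loading c:", np.round(c, 2))
    print("graph-mers:", groups)
```

Running the file prints the fitted trajectory loading `c` and the graph-mers found on the synthetic data. Setting `lam=0` recovers the ordinary first PLS weight vector; larger values penalize weight differences between k-mers one mismatch apart, which is the intuition for grouping such k-mers into a single graph-mer.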

Suggested Citation

  • Xuejing Li & Casandra Panea & Chris H Wiggins & Valerie Reinke & Christina Leslie, 2010. "Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-13, April.
  • Handle: RePEc:plo:pcbi00:1000761
    DOI: 10.1371/journal.pcbi.1000761

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000761
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000761&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1000761?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shun Adachi, 2017. "Rigid geometry solves “curse of dimensionality” effects in clustering methods: An application to omics data," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-20, June.
    2. Manuel Cambón & Óscar Sánchez, 2022. "Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis," Mathematics, MDPI, vol. 10(13), pages 1-18, June.
    3. Farzaneh Khajouei & Saurabh Sinha, 2018. "An information theoretic treatment of sequence-to-expression modeling," PLOS Computational Biology, Public Library of Science, vol. 14(9), pages 1-24, September.
    4. Amir Shahein & Maria López-Malo & Ivan Istomin & Evan J. Olson & Shiyu Cheng & Sebastian J. Maerkl, 2022. "Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
