IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1006459.html

An information theoretic treatment of sequence-to-expression modeling

Author

Listed:
  • Farzaneh Khajouei
  • Saurabh Sinha

Abstract

Studying a gene’s regulatory mechanisms is a tedious process that involves identification of candidate regulators by transcription factor (TF) knockout or over-expression experiments, delineation of enhancers by reporter assays, and demonstration of direct TF influence by site mutagenesis, among other approaches. Such experiments are often chosen based on the biologist’s intuition, from several testable hypotheses. We pursue the goal of making this process systematic by using ideas from information theory to reason about experiments in gene regulation, in the hope of ultimately enabling rigorous experiment design strategies. For this, we make use of a state-of-the-art mathematical model of gene expression, which provides a way to formalize our current knowledge of cis- as well as trans-regulatory mechanisms of a gene. Ambiguities in such knowledge can be expressed as uncertainties in the model, which we capture formally by building an ensemble of plausible models that fit the existing data and defining a probability distribution over the ensemble. We then characterize the impact of a new experiment on our understanding of the gene’s regulation based on how the ensemble of plausible models and its probability distribution change when challenged with results from that experiment. This allows us to assess the ‘value’ of the experiment retroactively as the reduction in entropy of the distribution (information gain) resulting from the experiment’s results. We fully formalize this novel approach to reasoning about gene regulation experiments and use it to evaluate a variety of perturbation experiments on two developmental genes of D. melanogaster. We also provide objective and ‘biologist-friendly’ descriptions of the information gained from each such experiment.
The rigorously defined information theoretic approaches presented here can be used in the future to formulate systematic strategies for experiment design pertaining to studies of gene regulatory mechanisms.

Author summary

In-depth studies of gene regulatory mechanisms employ a variety of experimental approaches such as identifying a gene’s enhancer(s) and testing its variants through reporter assays, followed by transcription factor mis-expression or knockouts, site mutagenesis, etc. The biologist is often faced with the challenging problem of selecting the ideal next experiment to perform so that its results provide novel mechanistic insights, and has to rely on their intuition about what is currently known on the topic and which experiments may add to that knowledge. We seek to make this intuition-based process more systematic, by borrowing ideas from the mature statistical field of experiment design. Towards this goal, we use the language of mathematical models to formally describe what is known about a gene’s regulatory mechanisms, and how an experiment’s results enhance that knowledge. We use information theoretic ideas to assign a ‘value’ to an experiment as well as explain objectively what is learned from that experiment. We demonstrate use of this novel approach on two extensively studied developmental genes in the fruit fly. We expect our work to lead to systematic strategies for selecting the most informative experiments in a study of gene regulation.
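
The notion of an experiment's 'value' as a reduction in entropy can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small discrete ensemble of candidate models, a hypothetical prior over them, and hypothetical likelihoods giving how well each model explains a new experimental result. The function names (`entropy`, `update`, `information_gain`) and all numbers are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def update(prior, likelihoods):
    """Reweight each model by how well it explains the new result (Bayes' rule)."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("no model in the ensemble is compatible with the result")
    return [u / total for u in unnormalized]


def information_gain(prior, likelihoods):
    """Entropy of the distribution before the experiment minus entropy after it."""
    return entropy(prior) - entropy(update(prior, likelihoods))


# Hypothetical ensemble of four equally plausible models.
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical experiment whose result is incompatible with the last two models.
likelihoods = [1.0, 1.0, 0.0, 0.0]

gain = information_gain(prior, likelihoods)
print(f"information gain: {gain:.2f} bits")
```

In this toy case the experiment halves the set of plausible models, so the entropy drops from 2 bits to 1 bit. An experiment whose result every model predicts equally well would leave the distribution unchanged and yield zero gain.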

Suggested Citation

  • Farzaneh Khajouei & Saurabh Sinha, 2018. "An information theoretic treatment of sequence-to-expression modeling," PLOS Computational Biology, Public Library of Science, vol. 14(9), pages 1-24, September.
  • Handle: RePEc:plo:pcbi00:1006459
    DOI: 10.1371/journal.pcbi.1006459

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006459
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006459&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Chris Fraley & Adrian E. Raftery, 2003. "Enhanced Model-Based Clustering, Density Estimation, and Discriminant Analysis Software: MCLUST," Journal of Classification, Springer;The Classification Society, vol. 20(2), pages 263-286, September.
    2. Jason Gertz & Eric D. Siggia & Barak A. Cohen, 2009. "Analysis of combinatorial cis-regulation in synthetic and genomic promoters," Nature, Nature, vol. 457(7226), pages 215-218, January.
    3. Ryan N Gutenkunst & Joshua J Waterfall & Fergal P Casey & Kevin S Brown & Christopher R Myers & James P Sethna, 2007. "Universally Sloppy Parameter Sensitivities in Systems Biology Models," PLOS Computational Biology, Public Library of Science, vol. 3(10), pages 1-8, October.
    4. Eran Segal & Tali Raveh-Sadka & Mark Schroeder & Ulrich Unnerstall & Ulrike Gaul, 2008. "Predicting expression patterns from regulatory sequence in Drosophila segmentation," Nature, Nature, vol. 451(7178), pages 535-540, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Manuel Cambón & Óscar Sánchez, 2022. "Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis," Mathematics, MDPI, vol. 10(13), pages 1-18, June.
    2. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    3. repec:jss:jstsof:14:i12 is not listed on IDEAS
    4. Samuel Bandara & Johannes P Schlöder & Roland Eils & Hans Georg Bock & Tobias Meyer, 2009. "Optimal Experimental Design for Parameter Estimation of a Cell Signaling Model," PLOS Computational Biology, Public Library of Science, vol. 5(11), pages 1-12, November.
    5. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    6. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    7. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    8. repec:jss:jstsof:18:i06 is not listed on IDEAS
    9. Adel Dayarian & Madalena Chaves & Eduardo D Sontag & Anirvan M Sengupta, 2009. "Shape, Size, and Robustness: Feasible Regions in the Parameter Space of Biochemical Networks," PLOS Computational Biology, Public Library of Science, vol. 5(1), pages 1-12, January.
    10. Chen, Yu & Yu, Hui & Liu, Chengjie & Xie, Jin & Han, Jun & Dai, Houde, 2024. "Synergistic fusion of physical modeling and data-driven approaches for parameter inference to enzymatic biodiesel production system," Applied Energy, Elsevier, vol. 373(C).
    11. Amrita X Sarkar & Eric A Sobie, 2010. "Regression Analysis for Constraining Free Parameters in Electrophysiological Models of Cardiac Cells," PLOS Computational Biology, Public Library of Science, vol. 6(9), pages 1-11, September.
    12. Hongwei Shao & Tao Peng & Zhiwei Ji & Jing Su & Xiaobo Zhou, 2013. "Systematically Studying Kinase Inhibitor Induced Signaling Network Signatures by Integrating Both Therapeutic and Side Effects," PLOS ONE, Public Library of Science, vol. 8(12), pages 1-16, December.
    13. Zhang, Ping & Serban, Nicoleta, 2007. "Discovery, visualization and performance analysis of enterprise workflow," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2670-2687, February.
    14. Hennig, Christian, 2008. "Dissolution point and isolation robustness: Robustness criteria for general cluster analysis methods," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1154-1176, July.
    15. Alireza Yazdani & Lu Lu & Maziar Raissi & George Em Karniadakis, 2020. "Systems biology informed deep learning for inferring parameters and hidden dynamics," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-19, November.
    16. Fridtjof Brauns & Leila Iñigo de la Cruz & Werner K.-G. Daalman & Ilse Bruin & Jacob Halatek & Liedewij Laan & Erwin Frey, 2023. "Redundancy and the role of protein copy numbers in the cell polarization machinery of budding yeast," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    17. Hornik, Kurt, 2005. "A CLUE for CLUster Ensembles," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 14(i12).
    18. Eberhard O Voit & Harald A Martens & Stig W Omholt, 2015. "150 Years of the Mass Action Law," PLOS Computational Biology, Public Library of Science, vol. 11(1), pages 1-7, January.
    19. Céline Christiansen-Jucht & Kamil Erguler & Chee Yan Shek & María-Gloria Basáñez & Paul E. Parham, 2015. "Modelling Anopheles gambiae s.s. Population Dynamics with Temperature- and Age-Dependent Survival," IJERPH, MDPI, vol. 12(6), pages 1-31, May.
    20. Salter-Townshend, Michael & Murphy, Thomas Brendan, 2013. "Variational Bayesian inference for the Latent Position Cluster Model for network data," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 661-671.
    21. Gabriele Lillacci & Mustafa Khammash, 2010. "Parameter Estimation and Model Selection in Computational Biology," PLOS Computational Biology, Public Library of Science, vol. 6(3), pages 1-17, March.
    22. Andrew White & Malachi Tolman & Howard D Thames & Hubert Rodney Withers & Kathy A Mason & Mark K Transtrum, 2016. "The Limitations of Model-Based Experimental Design and Parameter Estimation in Sloppy Systems," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-26, December.
