A Graphical Modelling Approach to the Dissection of Highly Correlated Transcription Factor Binding Site Profiles

Author

Listed:
  • Robert Stojnic
  • Audrey Qiuyan Fu
  • Boris Adryan

Abstract

Inferring the combinatorial regulatory code of transcription factors (TFs) from genome-wide TF binding profiles is challenging. A major reason is that TF binding profiles significantly overlap and are therefore highly correlated. Clustered occurrence of multiple TFs at genomic sites may arise from chromatin accessibility and local cooperation between TFs, or binding sites may simply appear clustered if the profiles are generated from diverse cell populations. Overlaps in TF binding profiles may also result from measurements taken at closely related time intervals. It is thus of great interest to distinguish TFs that directly regulate gene expression from those that are indirectly associated with gene expression. Graphical models, in particular Bayesian networks, provide a powerful mathematical framework to infer different types of dependencies. However, existing methods do not perform well when the features (here: TF binding profiles) are highly correlated, when their association with the biological outcome is weak, and when the sample size is small. Here, we develop a novel computational method, the Neighbourhood Consistent PC (NCPC) algorithms, which deal with these scenarios much more effectively than existing methods do. We further present a novel graphical representation, the Direct Dependence Graph (DDGraph), to better display the complex interactions among variables. NCPC and DDGraph can also be applied to other problems involving highly correlated biological features. Both methods are implemented in the R package ddgraph, available as part of Bioconductor (http://bioconductor.org/packages/2.11/bioc/html/ddgraph.html). Applied to real data, our method identified TFs that specify different classes of cis-regulatory modules (CRMs) in Drosophila mesoderm differentiation. 
Our analysis also found depletion of the early transcription factor Twist binding at the CRMs regulating expression in visceral and somatic muscle cells at later stages, which suggests a CRM-specific repression mechanism that so far has not been characterised for this class of mesodermal CRMs.

Author Summary

Transcription factors (TFs) are proteins that bind to DNA and regulate gene expression. Recent technological advances make it possible to map TF binding patterns across the whole genome. Multiple single-gene studies showed that combinatorial binding of multiple transcription factors determines the gene transcriptional output. A common naive assumption is that correlated binding profiles may indicate combinatorial binding. However, it has been found that many TFs bind to distinct hotspots whose role is currently unclear. It is thus of great interest to find transcription factor combinations whose correlated binding is causally most immediate to gene expression. Building upon theories of statistical dependence and causality, we develop novel graphical model-based algorithms that handle highly correlated transcription factor binding profiles more efficiently and reliably than existing algorithms do. These algorithms can also be applied to other biological areas involving highly correlated variables, such as the analysis of high-throughput gene knock-down experiments.
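
The distinction between direct and indirect association drawn in the abstract can be illustrated with a small conditional-independence check. The sketch below is not the NCPC algorithm and does not use the ddgraph package. It is a toy Python example on synthetic data, and the variable names (`tf_a`, `tf_b`, `expr`), the flip probabilities, and the sample size are all assumptions chosen for illustration. Two simulated binary binding profiles are highly correlated, but only one of them drives the simulated expression outcome. Plug-in estimates of mutual information show that the second profile looks associated with expression on its own, and that this association largely disappears once the first profile is conditioned on.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z): MI within each stratum of z, weighted by stratum size."""
    n = len(z)
    total = 0.0
    for level in set(z):
        idx = [i for i in range(n) if z[i] == level]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        total += (len(idx) / n) * mutual_information(xs, ys)
    return total


def flip(bit, prob, rng):
    """Return the bit inverted with probability prob, otherwise unchanged."""
    return 1 - bit if rng.random() < prob else bit


def simulate(n=5000, seed=0):
    """Synthetic data (assumed, not from the paper): expr depends on tf_a only."""
    rng = random.Random(seed)
    tf_a = [rng.randint(0, 1) for _ in range(n)]
    tf_b = [flip(a, 0.10, rng) for a in tf_a]  # highly correlated with tf_a
    expr = [flip(a, 0.20, rng) for a in tf_a]  # driven by tf_a alone
    return tf_a, tf_b, expr


tf_a, tf_b, expr = simulate()
mi_b = mutual_information(expr, tf_b)
cmi_b_given_a = conditional_mutual_information(expr, tf_b, tf_a)
cmi_a_given_b = conditional_mutual_information(expr, tf_a, tf_b)

print(f"I(expr; tf_b)        = {mi_b:.4f} nats")
print(f"I(expr; tf_b | tf_a) = {cmi_b_given_a:.4f} nats")
print(f"I(expr; tf_a | tf_b) = {cmi_a_given_b:.4f} nats")
```

In this toy setting `tf_b` carries information about `expr` only through `tf_a`, so its conditional estimate falls close to zero while the estimate for `tf_a` stays clearly positive. Constraint-based methods in the PC family rest on tests of this kind. The difficulty the abstract describes arises when such tests become unreliable because of small samples, weak effects, and strong correlation among features.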

Suggested Citation

  • Robert Stojnic & Audrey Qiuyan Fu & Boris Adryan, 2012. "A Graphical Modelling Approach to the Dissection of Highly Correlated Transcription Factor Binding Site Profiles," PLOS Computational Biology, Public Library of Science, vol. 8(11), pages 1-13, November.
  • Handle: RePEc:plo:pcbi00:1002725
    DOI: 10.1371/journal.pcbi.1002725

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002725
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002725&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Oksana Netschitailo & Yidong Wang & Anna Wagner & Vivien Sommer & Eveline C. Verhulst & Martin Beye, 2023. "The function and evolution of a genetic switch controlling sexually dimorphic eye differentiation in honeybees," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    2. Asmundur Oddsson & Patrick Sulem & Gardar Sveinbjornsson & Gudny A. Arnadottir & Valgerdur Steinthorsdottir & Gisli H. Halldorsson & Bjarni A. Atlason & Gudjon R. Oskarsson & Hannes Helgason & Henriet, 2023. "Deficit of homozygosity among 1.52 million individuals and genetic causes of recessive lethality," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    4. Vuong, Quan-Hoang & La, Viet-Phuong, 2019. "The bayesvl R package. User guide v0.8.1," OSF Preprints w5dx6, Center for Open Science.
    5. F. Cugnata & G. Perucca & S. Salini, 2017. "Bayesian networks and the assessment of universities' value added," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(10), pages 1785-1806, July.
    6. Ridvan Eksi & Hong-Dong Li & Rajasree Menon & Yuchen Wen & Gilbert S Omenn & Matthias Kretzler & Yuanfang Guan, 2013. "Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-16, November.
    7. Roland R. Ramsahai, 2020. "Connecting actuarial judgment to probabilistic learning techniques with graph theory," Papers 2007.15475, arXiv.org.
    8. Tang, Kayu & Parsons, David J. & Jude, Simon, 2019. "Comparison of automatic and guided learning for Bayesian networks to analyse pipe failures in the water distribution system," Reliability Engineering and System Safety, Elsevier, vol. 186(C), pages 24-36.
    9. Myriam Patricia Cifuentes & Clara Mercedes Suarez & Ricardo Cifuentes & Noel Malod-Dognin & Sam Windels & Jose Fernando Valderrama & Paul D. Juarez & R. Burciaga Valdez & Cynthia Colen & Charles Phill, 2022. "Big Data to Knowledge Analytics Reveals the Zika Virus Epidemic as Only One of Multiple Factors Contributing to a Year-Over-Year 28-Fold Increase in Microcephaly Incidence," IJERPH, MDPI, vol. 19(15), pages 1-21, July.
    10. Silvia de Juan & Maria Dulce Subida & Andres Ospina-Alvarez & Ainara Aguilar & Miriam Fernandez, 2020. "Disentangling the socio-ecological drivers behind illegal fishing in a small-scale fishery managed by a TURF system," Papers 2012.08970, arXiv.org.
    11. Meineri, Eric & Dahlberg, C. Johan & Hylander, Kristoffer, 2015. "Using Gaussian Bayesian Networks to disentangle direct and indirect associations between landscape physiography, environmental variables and species distribution," Ecological Modelling, Elsevier, vol. 313(C), pages 127-136.
    12. Michail Tsagris, 2021. "A New Scalable Bayesian Network Learning Algorithm with Applications to Economics," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 341-367, January.
    13. Prasanna Katti & Peter T. Ajayi & Angel Aponte & Christopher K. E. Bleck & Brian Glancy, 2022. "Identification of evolutionarily conserved regulators of muscle mitochondrial network organization," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    14. Michael J. Brusco & Douglas Steinley & Ashley L. Watts, 2022. "Disentangling relationships in symptom networks using matrix permutation methods," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 133-155, March.
    15. Sangsung Park & Sunghae Jun, 2020. "Patent Keyword Analysis of Disaster Artificial Intelligence Using Bayesian Network Modeling and Factor Analysis," Sustainability, MDPI, vol. 12(2), pages 1-11, January.
    16. Federica Cugnata & Silvia Salini & Elena Siletti, 2021. "Deepening Well-Being Evaluation with Different Data Sources: A Bayesian Networks Approach," IJERPH, MDPI, vol. 18(15), pages 1-10, July.
    17. Bibartiu, Otto & Dürr, Frank & Rothermel, Kurt & Ottenwälder, Beate & Grau, Andreas, 2021. "Scalable k-out-of-n models for dependability analysis with Bayesian networks," Reliability Engineering and System Safety, Elsevier, vol. 210(C).
    18. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    19. Bruce G. Marcot & Anca M. Hanea, 2021. "What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?," Computational Statistics, Springer, vol. 36(3), pages 2009-2031, September.
    20. Ryan G. Lim & Osama Al-Dalahmah & Jie Wu & Maxwell P. Gold & Jack C. Reidling & Guomei Tang & Miriam Adam & David K. Dansu & Hye-Jin Park & Patrizia Casaccia & Ricardo Miramontes & Andrea M. Reyes-Ort, 2022. "Huntington disease oligodendrocyte maturation deficits revealed by single-nucleus RNAseq are rescued by thiamine-biotin supplementation," Nature Communications, Nature, vol. 13(1), pages 1-23, December.
