Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-34360-z.html

CLIMB: High-dimensional association detection in large scale genomic data

Author

Listed:
  • Hillary Koch

    (Pennsylvania State University)

  • Cheryl A. Keller

    (Pennsylvania State University)

  • Guanjue Xiang

    (Pennsylvania State University)

  • Belinda Giardine

    (Pennsylvania State University)

  • Feipeng Zhang

    (Xi’an Jiaotong University)

  • Yicheng Wang

    (University of British Columbia)

  • Ross C. Hardison

    (Pennsylvania State University)

  • Qunhua Li

    (Pennsylvania State University)

Abstract

Joint analyses of genomic datasets obtained in multiple conditions are essential for understanding the biological mechanisms that drive tissue specificity and cell differentiation, but they remain computationally challenging. To address this, we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.

Suggested Citation

  • Hillary Koch & Cheryl A. Keller & Guanjue Xiang & Belinda Giardine & Feipeng Zhang & Yicheng Wang & Ross C. Hardison & Qunhua Li, 2022. "CLIMB: High-dimensional association detection in large scale genomic data," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-34360-z
    DOI: 10.1038/s41467-022-34360-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-34360-z
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Molenberghs, Geert & Verbeke, Geert & Iddi, Samuel, 2011. "Pseudo-likelihood methodology for partitioned large and complex samples," Statistics & Probability Letters, Elsevier, vol. 81(7), pages 892-901, July.
    2. Papageorgiou, Ioulia & Moustaki, Irini, 2019. "Sampling of pairs in pairwise likelihood estimation for latent variable models with categorical observed variables," LSE Research Online Documents on Economics 87592, London School of Economics and Political Science, LSE Library.
    3. K. Florios & I. Moustaki & D. Rizopoulos & V. Vasdekis, 2015. "A modified weighted pairwise likelihood estimator for a class of random effects models," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 217-228, August.
    4. Cristiano Varin, 2008. "On composite marginal likelihoods," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 92(1), pages 1-28, February.
    5. L-J Kao & C-C Lu & C-C Chiu, 2011. "The training institution efficiency of the semiconductor institute programme in Taiwan—application of spatiotemporal ICA with DEA approach," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(12), pages 2162-2172, December.
    6. Koop, Gary & Poirier, Dale J., 2004. "Bayesian variants of some classical semiparametric regression techniques," Journal of Econometrics, Elsevier, vol. 123(2), pages 259-282, December.
    7. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    8. M.-L. Feddag, 2016. "Pairwise likelihood estimation for the normal ogive model with binary data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 100(2), pages 223-237, April.
    9. Geweke, J. & Joel Horowitz & Pesaran, M.H., 2006. "Econometrics: A Bird’s Eye View," Cambridge Working Papers in Economics 0655, Faculty of Economics, University of Cambridge.
    10. Alexendar R. Perez & Laura Sala & Richard K. Perez & Joana A. Vidigal, 2021. "CSC software corrects off-target mediated gRNA depletion in CRISPR-Cas9 essentiality screens," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    11. Ho-Chuan Huang, 2001. "Bayesian analysis of the SUR Tobit model," Applied Economics Letters, Taylor & Francis Journals, vol. 8(9), pages 617-622.
    12. Lee Fawcett & David Walshaw, 2014. "Estimating the probability of simultaneous rainfall extremes within a region: a spatial approach," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(5), pages 959-976, May.
    13. Jorge E. Araña & Carmelo J. León, 2012. "Scale-perception bias in the valuation of environmental risks," Applied Economics, Taylor & Francis Journals, vol. 44(20), pages 2607-2617, July.
    14. Pakel, Cavit, 2019. "Bias reduction in nonlinear and dynamic panels in the presence of cross-section dependence," Journal of Econometrics, Elsevier, vol. 213(2), pages 459-492.
    15. Singh, Sonika & Ratchford, Brian T. & Prasad, Ashutosh, 2014. "Offline and Online Search in Used Durables Markets," Journal of Retailing, Elsevier, vol. 90(3), pages 301-320.
    16. Ji, Yonggang & Lin, Nan & Zhang, Baoxue, 2012. "Model selection in binary and tobit quantile regression using the Gibbs sampler," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 827-839.
    17. Keane, Michael & Stavrunova, Olena, 2016. "Adverse selection, moral hazard and the demand for Medigap insurance," Journal of Econometrics, Elsevier, vol. 190(1), pages 62-78.
    18. Luc Bauwens & Michel Lubrano, 2007. "Bayesian Inference in Dynamic Disequilibrium Models: An Application to the Polish Credit Market," Econometric Reviews, Taylor & Francis Journals, vol. 26(2-4), pages 469-486.
    19. Koji Miyawaki & Yasuhiro Omori & Akira Hibiki, 2018. "A discrete/continuous choice model on a nonconvex budget set," Econometric Reviews, Taylor & Francis Journals, vol. 37(2), pages 89-113, February.
    20. Bradlow, Eric T. & Gangwar, Manish & Kopalle, Praveen & Voleti, Sudhir, 2017. "The Role of Big Data and Predictive Analytics in Retailing," Journal of Retailing, Elsevier, vol. 93(1), pages 79-95.
