Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-34360-z.html

CLIMB: High-dimensional association detection in large scale genomic data

Author

Listed:
  • Hillary Koch

    (Pennsylvania State University)

  • Cheryl A. Keller

    (Pennsylvania State University)

  • Guanjue Xiang

    (Pennsylvania State University)

  • Belinda Giardine

    (Pennsylvania State University)

  • Feipeng Zhang

    (Xi’an Jiaotong University)

  • Yicheng Wang

    (University of British Columbia)

  • Ross C. Hardison

    (Pennsylvania State University)

  • Qunhua Li

    (Pennsylvania State University)

Abstract

Joint analyses of genomic datasets obtained in multiple conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they remain computationally challenging. To address this, we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.

Suggested Citation

  • Hillary Koch & Cheryl A. Keller & Guanjue Xiang & Belinda Giardine & Feipeng Zhang & Yicheng Wang & Ross C. Hardison & Qunhua Li, 2022. "CLIMB: High-dimensional association detection in large scale genomic data," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-34360-z
    DOI: 10.1038/s41467-022-34360-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-34360-z
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Papageorgiou, Ioulia & Moustaki, Irini, 2019. "Sampling of pairs in pairwise likelihood estimation for latent variable models with categorical observed variables," LSE Research Online Documents on Economics 87592, London School of Economics and Political Science, LSE Library.
    2. K. Florios & I. Moustaki & D. Rizopoulos & V. Vasdekis, 2015. "A modified weighted pairwise likelihood estimator for a class of random effects models," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 217-228, August.
    3. Cristiano Varin, 2008. "On composite marginal likelihoods," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 92(1), pages 1-28, February.
    4. Molenberghs, Geert & Verbeke, Geert & Iddi, Samuel, 2011. "Pseudo-likelihood methodology for partitioned large and complex samples," Statistics & Probability Letters, Elsevier, vol. 81(7), pages 892-901, July.
    5. Christian Gouriéroux & Alain Monfort, 2017. "Composite Indirect Inference with Application," Working Papers 2017-07, Center for Research in Economics and Statistics.
    6. L-J Kao & C-C Lu & C-C Chiu, 2011. "The training institution efficiency of the semiconductor institute programme in Taiwan—application of spatiotemporal ICA with DEA approach," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(12), pages 2162-2172, December.
    7. Tatiyana V. Apanasovich & David Ruppert & Joanne R. Lupton & Natasa Popovic & Nancy D. Turner & Robert S. Chapkin & Raymond J. Carroll, 2008. "Aberrant Crypt Foci and Semiparametric Modeling of Correlated Binary Data," Biometrics, The International Biometric Society, vol. 64(2), pages 490-500, June.
    8. Koop, Gary & Poirier, Dale J., 2004. "Bayesian variants of some classical semiparametric regression techniques," Journal of Econometrics, Elsevier, vol. 123(2), pages 259-282, December.
    9. Paik, Jane & Ying, Zhiliang, 2012. "A composite likelihood approach for spatially correlated survival data," Computational Statistics & Data Analysis, Elsevier, vol. 56(1), pages 209-216, January.
    10. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    11. M.-L. Feddag, 2016. "Pairwise likelihood estimation for the normal ogive model with binary data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 100(2), pages 223-237, April.
    12. Liang-Yu Fu & Tao Zhu & Xinkai Zhou & Ranran Yu & Zhaohui He & Peijing Zhang & Zhigui Wu & Ming Chen & Kerstin Kaufmann & Dijun Chen, 2022. "ChIP-Hub provides an integrative platform for exploring plant regulome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    13. Jeonghye Choi & David R. Bell & Leonard M. Lodish, 2012. "Traditional and IS-Enabled Customer Acquisition on the Internet," Management Science, INFORMS, vol. 58(4), pages 754-769, April.
    14. Bhat, Chandra R. & Astroza, Sebastian & Sidharthan, Raghuprasad & Alam, Mohammad Jobair Bin & Khushefati, Waleed H., 2014. "A joint count-continuous model of travel behavior with selection based on a multinomial probit residential density choice model," Transportation Research Part B: Methodological, Elsevier, vol. 68(C), pages 31-51.
    15. Geweke, J. & Joel Horowitz & Pesaran, M.H., 2006. "Econometrics: A Bird’s Eye View," Cambridge Working Papers in Economics 0655, Faculty of Economics, University of Cambridge.
    16. Vassilis Vasdekis & Silvia Cagnone & Irini Moustaki, 2012. "A Composite Likelihood Inference in Latent Variable Models for Ordinal Longitudinal Responses," Psychometrika, Springer;The Psychometric Society, vol. 77(3), pages 425-441, July.
    17. Alexendar R. Perez & Laura Sala & Richard K. Perez & Joana A. Vidigal, 2021. "CSC software corrects off-target mediated gRNA depletion in CRISPR-Cas9 essentiality screens," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    18. Ho-Chuan Huang, 2001. "Bayesian analysis of the SUR Tobit model," Applied Economics Letters, Taylor & Francis Journals, vol. 8(9), pages 617-622.
    19. Sögner, Leopold, 2015. "Learning, convergence and economic constraints," Mathematical Social Sciences, Elsevier, vol. 75(C), pages 27-43.
    20. Fabrice Murtin, 2007. "The Structural Change and the Endogeneity Bias of the College Premium in the United States 1968-2001," Working Papers 2007-14, Center for Research in Economics and Statistics.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.