Printed from https://ideas.repec.org/a/eee/csdana/v197y2024ics0167947324000586.html

Bayesian simultaneous factorization and prediction using multi-omic data

Author

Listed:
  • Samorodnitsky, Sarah
  • Wendt, Chris H.
  • Lock, Eric F.

Abstract

Integrative factorization methods for multi-omic data estimate factors explaining biological variation. Factors can be treated as covariates to predict an outcome, and the factorization can be used to impute missing values. However, no available methods provide a comprehensive framework for statistical inference and uncertainty quantification for these tasks. A novel framework, Bayesian Simultaneous Factorization (BSF), is proposed to decompose multi-omic variation into joint and individual structures simultaneously within a probabilistic framework. BSF uses conjugate normal priors, and the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. BSF is then extended to simultaneously predict a continuous or binary phenotype while estimating latent factors, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation, i.e., imputation during the model-fitting process, and full posterior inference for missing data, including “blockwise” missingness. Simulations show that BSFP is competitive in recovering latent variation structure and demonstrate the importance of accounting for uncertainty in the estimated factorization within the predictive model. The imputation performance of BSF is examined via simulation under missing-at-random and missing-not-at-random assumptions. Finally, BSFP is used to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated obstructive lung disease, revealing multi-omic patterns related to lung function decline and a cluster of patients with obstructive lung disease driven by shared metabolomic and proteomic abundance patterns.
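
The link the abstract draws between a nuclear norm penalty and rank selection can be illustrated in the simplest single-matrix case. The sketch below is not the BSF estimator, whose objective is structured across joint and individual terms for several data sources. It only shows the standard result that minimizing 0.5*||X - Z||_F^2 + lam*||Z||_* is solved by soft-thresholding the singular values of X, so that singular values below lam are set to zero and the rank is chosen automatically. The same nuclear norm penalty arises from squared Frobenius penalties on the two factors of Z, which is how a penalty of this kind corresponds to normal priors on the factors. The function names, the simulated data, and the choice lam = sqrt(p) + sqrt(n) (roughly the largest singular value expected from pure unit-variance noise) are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def objective(X, Z, lam):
    """0.5 * ||X - Z||_F^2 + lam * ||Z||_* for a single matrix."""
    return 0.5 * np.linalg.norm(X - Z, "fro") ** 2 + lam * np.linalg.norm(Z, "nuc")


def soft_threshold_svd(X, lam):
    """Minimize objective(X, Z, lam) over Z by shrinking singular values.

    Returns the minimizer and its rank (number of singular values > lam).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)       # soft-threshold each singular value
    rank = int(np.count_nonzero(s_shrunk))    # components that survive the penalty
    Z = (U[:, :rank] * s_shrunk[:rank]) @ Vt[:rank, :]
    return Z, rank


# Hypothetical simulated data: a rank-3 signal plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
p, n, true_rank = 50, 40, 3
U0, _ = np.linalg.qr(rng.normal(size=(p, true_rank)))  # orthonormal p x 3
V0, _ = np.linalg.qr(rng.normal(size=(n, true_rank)))  # orthonormal n x 3
signal = (U0 * np.array([60.0, 50.0, 40.0])) @ V0.T
X = signal + rng.normal(size=(p, n))

# Assumed penalty level, on the scale of the largest noise singular value.
lam = np.sqrt(p) + np.sqrt(n)

Z_hat, est_rank = soft_threshold_svd(X, lam)
print("estimated rank:", est_rank)
print("objective at Z_hat:", objective(X, Z_hat, lam))
print("objective at Z = X:", objective(X, X, lam))
```

Because the penalty shrinks every retained singular value by lam, the estimate is biased toward zero. In a fully Bayesian treatment such as the one the abstract describes, this point estimate corresponds to a posterior mode, while uncertainty comes from the full posterior rather than from the optimization.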

Suggested Citation

  • Samorodnitsky, Sarah & Wendt, Chris H. & Lock, Eric F., 2024. "Bayesian simultaneous factorization and prediction using multi-omic data," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
  • Handle: RePEc:eee:csdana:v:197:y:2024:i:c:s0167947324000586
    DOI: 10.1016/j.csda.2024.107974

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324000586
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2024.107974?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sylvia Fruhwirth-Schnatter, 2023. "Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis," Papers 2303.00473, arXiv.org.
    2. Kim, Jonathan & Sandri, Brian J. & Rao, Raghavendra B. & Lock, Eric F., 2023. "Bayesian predictive modeling of multi-source multi-way data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    3. Florian Huber & Gary Koop, 2023. "Subspace shrinkage in conjugate Bayesian vector autoregressions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 556-576, June.
    4. Daewon Yang & Taeryon Choi & Eric Lavigne & Yeonseung Chung, 2022. "Non‐parametric Bayesian covariate‐dependent multivariate functional clustering: An application to time‐series data for multiple air pollutants," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1521-1542, November.
    5. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    6. Daniel R. Kowal & Antonio Canale, 2021. "Semiparametric Functional Factor Models with Bayesian Rank Selection," Papers 2108.02151, arXiv.org, revised May 2022.
    7. Sylvia Fruhwirth-Schnatter & Darjus Hosszejni & Hedibert Freitas Lopes, 2023. "When it counts -- Econometric identification of the basic factor model based on GLT structures," Papers 2301.06354, arXiv.org.
    8. Darjus Hosszejni & Sylvia Fruhwirth-Schnatter, 2022. "Cover It Up! Bipartite Graphs Uncover Identifiability in Sparse Factor Analysis," Papers 2211.00671, arXiv.org, revised Nov 2022.
    9. Kastner, Gregor, 2019. "Sparse Bayesian time-varying covariance estimation in many dimensions," Journal of Econometrics, Elsevier, vol. 210(1), pages 98-115.
    10. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    11. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    12. Niko Hauzenberger & Maximilian Bock & Michael Pfarrhofer & Anna Stelzer & Gregor Zens, 2018. "Implications of macroeconomic volatility in the Euro area," Papers 1801.02925, arXiv.org, revised Jun 2018.
    13. Henry Webel & Lili Niu & Annelaura Bach Nielsen & Marie Locard-Paulet & Matthias Mann & Lars Juhl Jensen & Simon Rasmussen, 2024. "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    14. Matthew W. Wheeler, 2019. "Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high‐throughput toxicity testing," Biometrics, The International Biometric Society, vol. 75(1), pages 193-201, March.
    15. Bram Janssens & Matthias Bogaert & Mathijs Maton, 2023. "Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents," Annals of Operations Research, Springer, vol. 325(1), pages 557-588, June.
    16. Bonnie R. Joubert & Marianthi-Anna Kioumourtzoglou & Toccara Chamberlain & Hua Yun Chen & Chris Gennings & Mary E. Turyk & Marie Lynn Miranda & Thomas F. Webster & Katherine B. Ensor & David B. Dunson, 2022. "Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods," IJERPH, MDPI, vol. 19(3), pages 1-24, January.
    17. Chhetri, Netra & Ghimire, Rajiv & Wagner, Melissa & Wang, Meng, 2020. "Global citizen deliberation: Case of world-wide views on climate and energy," Energy Policy, Elsevier, vol. 147(C).
    18. Ieva Burakauskaitė & Andrius Čiginas, 2023. "An Approach to Integrating a Non-Probability Sample in the Population Census," Mathematics, MDPI, vol. 11(8), pages 1-14, April.
    19. Roberta De Vito & Ruggero Bellio & Lorenzo Trippa & Giovanni Parmigiani, 2019. "Multi‐study factor analysis," Biometrics, The International Biometric Society, vol. 75(1), pages 337-346, March.
    20. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
