IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v197y2024ics0167947324000586.html

Bayesian simultaneous factorization and prediction using multi-omic data

Author

Listed:
  • Samorodnitsky, Sarah
  • Wendt, Chris H.
  • Lock, Eric F.

Abstract

Integrative factorization methods for multi-omic data estimate factors explaining biological variation. Factors can be treated as covariates to predict an outcome and the factorization can be used to impute missing values. However, no available methods provide a comprehensive framework for statistical inference and uncertainty quantification for these tasks. A novel framework, Bayesian Simultaneous Factorization (BSF), is proposed to decompose multi-omic variation into joint and individual structures simultaneously within a probabilistic framework. BSF uses conjugate normal priors, and the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. BSF is then extended to simultaneously predict a continuous or binary phenotype while estimating latent factors, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation, i.e., imputation during the model-fitting process, and full posterior inference for missing data, including “blockwise” missingness. Simulations show that BSFP is competitive in recovering latent variation structure and demonstrate the importance of accounting for uncertainty in the estimated factorization within the predictive model. The imputation performance of BSF is examined via simulation under missing-at-random and missing-not-at-random assumptions. Finally, BSFP is used to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated obstructive lung disease, revealing multi-omic patterns related to lung function decline and a cluster of patients with obstructive lung disease driven by shared metabolomic and proteomic abundance patterns.
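The link between a nuclear norm penalty and rank selection mentioned in the abstract can be illustrated on a single matrix. The sketch below is not the BSF or BSFP algorithm; it only shows the standard result that minimizing 0.5*||X - S||_F^2 + lam*||S||_* over S is solved by soft-thresholding the singular values of X, which sets small singular values exactly to zero. The function name `svd_soft_threshold`, the simulated data, and the threshold value are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def svd_soft_threshold(X, lam):
    """Return the minimizer of 0.5*||X - S||_F^2 + lam*||S||_* and its rank.

    Hypothetical helper for illustration; not part of any BSF/BSFP software.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Shrink every singular value by lam and clip at zero.
    d_shrunk = np.maximum(d - lam, 0.0)
    # Singular values are sorted in decreasing order, so the nonzero
    # shrunken values occupy the leading positions.
    rank = int(np.count_nonzero(d_shrunk))
    S = (U[:, :rank] * d_shrunk[:rank]) @ Vt[:rank, :]
    return S, rank


# Simulated example (assumed setup): rank-2 signal plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, true_rank, noise_sd = 50, 30, 2, 0.1
signal = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, p))
X = signal + noise_sd * rng.normal(size=(n, p))

# Illustrative threshold set above the typical size of the largest
# singular value of the noise matrix; this choice is an assumption.
lam = 1.5 * noise_sd * (np.sqrt(n) + np.sqrt(p))

S_hat, est_rank = svd_soft_threshold(X, lam)
print("estimated rank:", est_rank)
```

Because the penalty zeroes out singular values below the threshold, the rank of the estimate is determined by the penalty level rather than fixed in advance, which is the sense in which a nuclear norm-penalized objective "achieves rank selection".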

Suggested Citation

  • Samorodnitsky, Sarah & Wendt, Chris H. & Lock, Eric F., 2024. "Bayesian simultaneous factorization and prediction using multi-omic data," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
  • Handle: RePEc:eee:csdana:v:197:y:2024:i:c:s0167947324000586
    DOI: 10.1016/j.csda.2024.107974

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324000586
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daewon Yang & Taeryon Choi & Eric Lavigne & Yeonseung Chung, 2022. "Non‐parametric Bayesian covariate‐dependent multivariate functional clustering: An application to time‐series data for multiple air pollutants," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1521-1542, November.
    2. Daniel R. Kowal & Antonio Canale, 2021. "Semiparametric Functional Factor Models with Bayesian Rank Selection," Papers 2108.02151, arXiv.org, revised May 2022.
    3. Darjus Hosszejni & Sylvia Fruhwirth-Schnatter, 2022. "Cover It Up! Bipartite Graphs Uncover Identifiability in Sparse Factor Analysis," Papers 2211.00671, arXiv.org, revised Nov 2022.
    4. Sylvia Fruhwirth-Schnatter, 2023. "Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis," Papers 2303.00473, arXiv.org.
    5. Kim, Jonathan & Sandri, Brian J. & Rao, Raghavendra B. & Lock, Eric F., 2023. "Bayesian predictive modeling of multi-source multi-way data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    6. Florian Huber & Gary Koop, 2023. "Subspace shrinkage in conjugate Bayesian vector autoregressions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 556-576, June.
    7. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    8. Sylvia Fruhwirth-Schnatter & Darjus Hosszejni & Hedibert Freitas Lopes, 2023. "When it counts -- Econometric identification of the basic factor model based on GLT structures," Papers 2301.06354, arXiv.org.
    9. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    10. Niko Hauzenberger & Maximilian Bock & Michael Pfarrhofer & Anna Stelzer & Gregor Zens, 2018. "Implications of macroeconomic volatility in the Euro area," Papers 1801.02925, arXiv.org, revised Jun 2018.
    11. Matthew W. Wheeler, 2019. "Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high‐throughput toxicity testing," Biometrics, The International Biometric Society, vol. 75(1), pages 193-201, March.
    12. Chhetri, Netra & Ghimire, Rajiv & Wagner, Melissa & Wang, Meng, 2020. "Global citizen deliberation: Case of world-wide views on climate and energy," Energy Policy, Elsevier, vol. 147(C).
    13. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
    14. Durante, Daniele, 2017. "A note on the multiplicative gamma process," Statistics & Probability Letters, Elsevier, vol. 122(C), pages 198-204.
    15. Carlos Miguel Lemos & Ross Joseph Gore & Ivan Puga-Gonzalez & F LeRon Shults, 2019. "Dimensionality and factorial invariance of religiosity among Christians and the religiously unaffiliated: A cross-cultural analysis based on the International Social Survey Programme," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-36, May.
    16. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.
    17. Philip A. White & Alan E. Gelfand, 2021. "Multivariate functional data modeling with time-varying clustering," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 586-602, September.
    18. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    19. Zhiguang Huo & Li Zhu & Tianzhou Ma & Hongcheng Liu & Song Han & Daiqing Liao & Jinying Zhao & George Tseng, 2020. "Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(1), pages 1-22, April.
    20. Fangting Zhou & Kejun He & Kunbo Wang & Yanxun Xu & Yang Ni, 2023. "Functional Bayesian networks for discovering causality from multivariate functional data," Biometrics, The International Biometric Society, vol. 79(4), pages 3279-3293, December.
