IDEAS home Printed from https://ideas.repec.org/a/inm/orijds/v1y2022i2p115-137.html

Weakly Supervised Multi-output Regression via Correlated Gaussian Processes

Author

Listed:
  • Seokhyun Chung

    (Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109)

  • Raed Al Kontar

    (Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109)

  • Zhenke Wu

    (Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109)

Abstract

Multi-output regression seeks to borrow strength and leverage commonalities across different but related outputs in order to enhance learning and prediction accuracy. A fundamental assumption is that the output/group membership labels for all observations are known. This assumption is often violated in real applications. For instance, in healthcare data sets, sensitive attributes such as ethnicity are often missing or unreported. To this end, we introduce a weakly supervised multi-output model based on dependent Gaussian processes. Our approach is able to leverage data without complete group labels or possibly only prior belief on group memberships to enhance accuracy across all outputs. Through intensive simulations and case studies on insulin, testosterone and body fat data sets, we show that our model excels in multi-output settings with missing labels while being competitive in traditional fully labeled settings. We end by highlighting the possible use of our approach in fair inference and sequential decision making.

Suggested Citation

  • Seokhyun Chung & Raed Al Kontar & Zhenke Wu, 2022. "Weakly Supervised Multi-output Regression via Correlated Gaussian Processes," INFORMS Journal on Data Science, INFORMS, vol. 1(2), pages 115-137, October.
  • Handle: RePEc:inm:orijds:v:1:y:2022:i:2:p:115-137
    DOI: 10.1287/ijds.2022.0018

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2022.0018
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Yihe & Zhao, Sihai Dave, 2021. "A nonparametric empirical Bayes approach to large-scale multivariate regression," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
    2. Andres Alban & Stephen E. Chick & Martin Forster, 2023. "Value-Based Clinical Trials: Selecting Recruitment Rates and Trial Lengths in Different Regulatory Contexts," Management Science, INFORMS, vol. 69(6), pages 3516-3535, June.
    3. Kohei Yoshikawa & Shuichi Kawano, 2023. "Sparse reduced-rank regression for simultaneous rank and variable selection via manifold optimization," Computational Statistics, Springer, vol. 38(1), pages 53-75, March.
    4. Paul Hewson & Keming Yu, 2008. "Quantile regression for binary performance indicators," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 24(5), pages 401-418, September.
    5. Roman Flury & Reinhard Furrer, 2021. "Discussion on Competition for Spatial Statistics for Large Datasets," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(4), pages 599-603, December.
    6. Matthias Katzfuss & Joseph Guinness & Wenlong Gong & Daniel Zilber, 2020. "Vecchia Approximations of Gaussian-Process Predictions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(3), pages 383-414, September.
    7. Fassò, A. & Finazzi, F. & Madonna, F., 2018. "Statistical issues in radiosonde observation of atmospheric temperature and humidity profiles," Statistics & Probability Letters, Elsevier, vol. 136(C), pages 97-100.
    8. Alain-Philippe Fortin & Patrick Gagliardini & O. Scaillet, 2022. "Eigenvalue tests for the number of latent factors in short panels," Swiss Finance Institute Research Paper Series 22-81, Swiss Finance Institute.
    9. Jewson Stephen & Penzer Jeremy, 2006. "Estimating Trends in Weather Series: Consequences for Pricing Derivatives," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 10(3), pages 1-17, September.
    10. Moreno Bevilacqua & Alfredo Alegria & Daira Velandia & Emilio Porcu, 2016. "Composite Likelihood Inference for Multivariate Gaussian Random Fields," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(3), pages 448-469, September.
    11. Hansen, Peter Reinhard, 2003. "Structural changes in the cointegrated vector autoregressive model," Journal of Econometrics, Elsevier, vol. 114(2), pages 261-295, June.
    12. Luebke, Karsten & Czogiel, Irina & Weihs, Claus, 2004. "Latent Factor Prediction Pursuit for Rank Deficient Regressors," Technical Reports 2004,75, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    13. Shen Liu & Hongyan Liu, 2021. "Tagging Items Automatically Based on Both Content Information and Browsing Behaviors," INFORMS Journal on Computing, INFORMS, vol. 33(3), pages 882-897, July.
    14. Bura, Efstathia & Cook, R. Dennis, 2003. "Rank estimation in reduced-rank regression," Journal of Multivariate Analysis, Elsevier, vol. 87(1), pages 159-176, October.
    15. Mark Semelhago & Barry L. Nelson & Eunhye Song & Andreas Wächter, 2021. "Rapid Discrete Optimization via Simulation with Gaussian Markov Random Fields," INFORMS Journal on Computing, INFORMS, vol. 33(3), pages 915-930, July.
    16. Ranadeep Daw & Christopher K. Wikle, 2023. "REDS: Random ensemble deep spatial prediction," Environmetrics, John Wiley & Sons, Ltd., vol. 34(1), February.
    17. Chen, Canyi & Xu, Wangli & Zhu, Liping, 2022. "Distributed estimation in heterogeneous reduced rank regression: With application to order determination in sufficient dimension reduction," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    18. Loaiza-Maya, Rubén & Smith, Michael Stanley & Nott, David J. & Danaher, Peter J., 2022. "Fast and accurate variational inference for models with many latent variables," Journal of Econometrics, Elsevier, vol. 230(2), pages 339-362.
    19. Sun, Ying & Chang, Xiaohui & Guan, Yongtao, 2018. "Flexible and efficient estimating equations for variogram estimation," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 45-58.
    20. Andrés García-Medina & Graciela González Farías, 2020. "Transfer entropy as a variable selection methodology of cryptocurrencies in the framework of a high dimensional predictive model," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-31, January.
