IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i5p777-d1351824.html

Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

Authors
  • Matthew McTeer

    (School of Computing, Faculty of Science, Agriculture & Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK)

  • Robin Henderson

    (School of Mathematics, Statistics and Physics, Faculty of Science, Agriculture & Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK)

  • Quentin M. Anstee

    (Translational & Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK)

  • Paolo Missier

    (School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK)

Abstract

Aims: Overlapping asymmetric data sets arise when a large cohort of observations has only a small amount of information recorded, and within it a smaller cohort has extensive further information available. Imputing the missing data is unwise when the cohort sizes differ substantially; we therefore aim to develop a way of modelling the smaller cohort whilst taking the larger cohort into account. Methods: Starting from traditional once penalized P-Spline approximations, we create a second penalty term from discrepancies in the marginal value of covariates that exist in both cohorts. The resulting twice penalized P-Spline is designed firstly to prevent over- or under-fitting of the smaller cohort and secondly to take the larger cohort into account. Results: Across a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% improvement in model fit for a continuous response and up to a 46% improvement for a binary response, against existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an improvement of over 65% relative to existing methods. Conclusions: We propose a twice penalized P-Spline method which can vastly improve the model fit of overlapping asymmetric data sets upon a common predictive endpoint, without the need for missing data imputation.
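
The twice penalized idea in the Methods can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact form of the second penalty, so the code below assumes a single shared covariate and a quadratic penalty that pulls the small-cohort spline coefficients towards those of a marginal fit on the large cohort. The function names (bspline_basis, fit_pspline), the simulated data, and the penalty weights lam1 and lam2 are all hypothetical.

```python
import numpy as np

def bspline_basis(x, lo, hi, n_seg=10, degree=3):
    """Equally spaced B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    x = np.asarray(x, dtype=float)[:, None]
    # Degree-0 basis: indicator of each knot interval.
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape: (len(x), n_seg + degree)

def fit_pspline(B, y, lam1, order=2, lam2=0.0, target=None):
    """Penalized least squares for spline coefficients a.

    First penalty (standard P-spline): lam1 * ||D a||^2, with D a difference matrix.
    Second penalty (ASSUMED form, for illustration): lam2 * ||a - target||^2,
    where target holds coefficients from a marginal fit on the larger cohort.
    """
    n_coef = B.shape[1]
    D = np.diff(np.eye(n_coef), n=order, axis=0)
    lhs = B.T @ B + lam1 * D.T @ D
    rhs = B.T @ y
    if lam2 > 0 and target is not None:
        lhs = lhs + lam2 * np.eye(n_coef)
        rhs = rhs + lam2 * target
    return np.linalg.solve(lhs, rhs)

# Simulated example: a large cohort and a much smaller one sharing covariate x.
rng = np.random.default_rng(0)
truth = lambda x: np.sin(2 * np.pi * x)
x_big = rng.uniform(0, 1, 2000)
y_big = truth(x_big) + rng.normal(0, 0.3, 2000)
x_small = rng.uniform(0, 1, 40)
y_small = truth(x_small) + rng.normal(0, 0.3, 40)

B_big = bspline_basis(x_big, 0.0, 1.0)
B_small = bspline_basis(x_small, 0.0, 1.0)

a_big = fit_pspline(B_big, y_big, lam1=1.0)        # marginal fit, large cohort
a_once = fit_pspline(B_small, y_small, lam1=1.0)   # once penalized, small cohort
a_twice = fit_pspline(B_small, y_small, lam1=1.0,  # twice penalized, small cohort
                      lam2=5.0, target=a_big)
```

Setting lam2 to zero recovers the once penalized fit, while a very large lam2 forces the small-cohort coefficients onto the large-cohort fit; in practice both weights would need tuning, for example by cross-validation on the smaller cohort.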

Suggested Citation

  • Matthew McTeer & Robin Henderson & Quentin M. Anstee & Paolo Missier, 2024. "Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach," Mathematics, MDPI, vol. 12(5), pages 1-33, March.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:5:p:777-:d:1351824

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/5/777/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/5/777/
    Download Restriction: no

    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Simon N. Wood, 2003. "Thin plate regression splines," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(1), pages 95-114, February.
    3. Brezger, Andreas & Steiner, Winfried J., 2008. "Monotonic Regression Based on Bayesian PSplines: An Application to Estimating Price Response Functions From Store-Level Scanner Data," Journal of Business & Economic Statistics, American Statistical Association, vol. 26, pages 90-104, January.
    4. Bremhorst, Vincent & Lambert, Philippe, 2016. "Flexible estimation in cure survival models using Bayesian P-splines," LIDAM Reprints ISBA 2016002, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    5. Bremhorst, Vincent & Lambert, Philippe, 2016. "Flexible estimation in cure survival models using Bayesian P-splines," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 270-284.
    6. Aris Perperoglou & Paul Eilers, 2010. "Penalized regression with individual deviance effects," Computational Statistics, Springer, vol. 25(2), pages 341-361, June.
    7. Aldrin, Magne, 2006. "Improved predictions penalizing both slope and curvature in additive models," Computational Statistics & Data Analysis, Elsevier, vol. 50(2), pages 267-284, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Narisetty, Naveen & Koenker, Roger, 2022. "Censored quantile regression survival models with a cure proportion," Journal of Econometrics, Elsevier, vol. 226(1), pages 192-203.
    2. Dirick, Lore & Claeskens, Gerda & Vasnev, Andrey & Baesens, Bart, 2022. "A hierarchical mixture cure model with unobserved heterogeneity for credit risk," Econometrics and Statistics, Elsevier, vol. 22(C), pages 39-55.
    3. Michaela Kreyenfeld & Dirk Konietzka & Philippe Lambert & Vincent Jerald Ramos, 2023. "Second Birth Fertility in Germany: Social Class, Gender, and the Role of Economic Uncertainty," European Journal of Population, Springer;European Association for Population Studies, vol. 39(1), pages 1-27, December.
    4. Mohamed Elamin Abdallah Mohamed Elamin Omer & Mohd Rizam Abu Bakar & Mohd Bakri Adam & Mohd Shafie Mustafa, 2020. "Cure Models with Exponentiated Weibull Exponential Distribution for the Analysis of Melanoma Patients," Mathematics, MDPI, vol. 8(11), pages 1-15, November.
    5. Gressani, Oswaldo & Lambert, Philippe, 2016. "Fast Bayesian inference in semi-parametric P-spline cure survival models using Laplace approximations," LIDAM Discussion Papers ISBA 2016041, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    6. Bremhorst, Vincent & Kreyenfeld, Michaela & Lambert, Philippe, 2017. "Nonparametric double additive cure survival models: an application to the estimation of the nonlinear effect of age at first parenthood on fertility progression," LIDAM Discussion Papers ISBA 2017004, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    7. Gressani, Oswaldo & Lambert, Philippe, 2018. "Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 151-167.
    8. Patricio Maturana-Russel & Renate Meyer, 2021. "Bayesian spectral density estimation using P-splines with quantile-based knot placement," Computational Statistics, Springer, vol. 36(3), pages 2055-2077, September.
    9. Gabriel Riutort-Mayol & Virgilio Gómez-Rubio & José Luis Lerma & Julio M. del Hoyo-Meléndez, 2020. "Correlated Functional Models with Derivative Information for Modeling Microfading Spectrometry Data on Rock Art Paintings," Mathematics, MDPI, vol. 8(12), pages 1-25, December.
    10. Philippe Lambert, 2023. "Comments on: Nonparametric estimation in mixture cure models with covariates," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 506-509, June.
    11. Lambert, Philippe & Kreyenfeld, Michaela, 2023. "Exogenous time-varying covariates in double additive cure survival model with application to fertility," LIDAM Discussion Papers ISBA 2023006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    12. Philippe Lambert & Vincent Bremhorst, 2020. "Inclusion of time‐varying covariates in cure survival models with an application in fertility studies," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 333-354, January.
    13. Minnie M. Joo & Brandon Bolte & Nguyen Huynh & Bumba Mukherjee, 2023. "Bayesian Spatial Split-Population Survival Model with Applications to Democratic Regime Failure and Civil War Recurrence," Mathematics, MDPI, vol. 11(8), pages 1-23, April.
    14. Vincent Bremhorst & Michaela Kreyenfeld & Philippe Lambert, 2016. "Fertility progression in Germany: An analysis using flexible nonparametric cure survival models," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 35(18), pages 505-534.
    15. M.L. Nores & M.P. Díaz, 2016. "Bootstrap hypothesis testing in generalized additive models for comparing curves of treatments in longitudinal studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(5), pages 810-826, April.
    16. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    17. Abhilash Bandam & Eedris Busari & Chloi Syranidou & Jochen Linssen & Detlef Stolten, 2022. "Classification of Building Types in Germany: A Data-Driven Modeling Approach," Data, MDPI, vol. 7(4), pages 1-23, April.
    18. Georgios Gioldasis & Antonio Musolesi & Michel Simioni, 2020. "Model uncertainty, nonlinearities and out-of-sample comparison: evidence from international technology diffusion," Working Papers hal-02790523, HAL.
    19. Boonstra, Philip S. & Little, Roderick J.A. & West, Brady T. & Andridge, Rebecca R. & Alvarado-Leiton, Fernanda, 2021. "A Simulation Study of Diagnostics for Selection Bias," Journal of Official Statistics, Sciendo, vol. 37(3), pages 751-769, September.
    20. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
