Printed from https://ideas.repec.org/a/eee/csdana/v202y2025ics0167947324001270.html

Spline regression with automatic knot selection

Author

Listed:
  • Goepp, Vivien
  • Bouaziz, Olivier
  • Nuel, Grégory

Abstract

Spline regression has proven to be a useful tool for nonparametric regression. The flexibility of this function family comes from base points, called knots, at which the behavior of the function changes. The question of how many knots to use and where to place them is usually circumvented by imposing a smoothness penalty on the whole spline (e.g. P-splines). However, there are areas of application where finding the best knot placement is itself of interest. A new method is introduced for automatically selecting knots in spline regression. The approach consists of setting many initial knots and fitting the spline regression through a penalized likelihood procedure called adaptive ridge, which discards the least relevant knots. The method, called A-splines (for adaptive splines), compares favorably with other knot selection methods: it runs much faster (roughly 10 to 400 times faster) than comparable methods and has nearly equal predictive performance. A-splines are applied to both simulated and real datasets.
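
The idea of starting from many candidate knots and letting an adaptive ridge penalty discard the least relevant ones can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a degree-1 truncated power basis (one hinge term per candidate knot) for simplicity, reweights the ridge penalty on each hinge coefficient as 1 / (coefficient^2 + eps^2), and keeps a knot when its weighted squared coefficient exceeds 0.5. The function name `adaptive_ridge_knots`, the defaults for `lam` and `eps`, and the 0.5 threshold are all hypothetical choices made for this sketch.

```python
import numpy as np


def adaptive_ridge_knots(x, y, knots, lam=1e-3, eps=1e-5, n_iter=200, tol=1e-8):
    """Illustrative adaptive-ridge knot selection for a piecewise-linear fit.

    Assumption: degree-1 truncated power basis, f(x) = b0 + b1*x + sum_k c_k*(x - t_k)_+.
    Only the hinge coefficients c_k are penalized.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = np.asarray(knots, dtype=float)

    # Design matrix: intercept, slope, then one hinge column per candidate knot.
    hinges = [np.maximum(x - t, 0.0) for t in knots]
    X = np.column_stack([np.ones_like(x), x] + hinges)
    XtX = X.T @ X
    Xty = X.T @ y

    w = np.ones(len(knots))          # adaptive weights, start as plain ridge
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Weighted ridge step; intercept and slope stay unpenalized.
        penalty = np.zeros(X.shape[1])
        penalty[2:] = lam * w
        new_coef = np.linalg.solve(XtX + np.diag(penalty), Xty)

        # Reweighting: small coefficients get large weights and are pushed to zero.
        w = 1.0 / (new_coef[2:] ** 2 + eps ** 2)

        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break

    # w * c^2 is close to 1 for retained knots and close to 0 for discarded ones.
    selected = w * coef[2:] ** 2 > 0.5
    return knots[selected], coef
```

For example, calling `adaptive_ridge_knots(x, y, np.linspace(0.1, 0.9, 9))` on data with a single change of slope should retain only the candidate knots near that change. The penalty strength `lam` controls how many knots survive and would normally be chosen by a model selection criterion, which this sketch leaves out.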

Suggested Citation

  • Goepp, Vivien & Bouaziz, Olivier & Nuel, Grégory, 2025. "Spline regression with automatic knot selection," Computational Statistics & Data Analysis, Elsevier, vol. 202(C).
  • Handle: RePEc:eee:csdana:v:202:y:2025:i:c:s0167947324001270
    DOI: 10.1016/j.csda.2024.108043

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324001270
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Morteza Amini & Mahdi Roozbeh & Nur Anisah Mohamed, 2024. "Separation of the Linear and Nonlinear Covariates in the Sparse Semi-Parametric Regression Model in the Presence of Outliers," Mathematics, MDPI, vol. 12(2), pages 1-17, January.
    2. Maximilian Osterhaus, 2024. "A Sparse Grid Approach for the Nonparametric Estimation of High-Dimensional Random Coefficient Models," Papers 2408.07185, arXiv.org.
    3. Lee, Wang-Sheng & McKinnish, Terra, 2019. "Locus of control and marital satisfaction: Couple perspectives using Australian data," Journal of Economic Psychology, Elsevier, vol. 74(C).
    4. Sun, Shilin & Li, Qi & Hu, Wenyang & Liang, Zhongchao & Wang, Tianyang & Chu, Fulei, 2023. "Wind turbine blade breakage detection based on environment-adapted contrastive learning," Renewable Energy, Elsevier, vol. 219(P2).
    5. Arūnas P. Verbyla & Joanne Faveri & John D. Wilkie & Tom Lewis, 2018. "Tensor Cubic Smoothing Splines in Designed Experiments Requiring Residual Modelling," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(4), pages 478-508, December.
    6. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    7. Leitenstorfer, Florian & Tutz, Gerhard, 2007. "Knot selection by boosting techniques," Computational Statistics & Data Analysis, Elsevier, vol. 51(9), pages 4605-4621, May.
    8. Akdeniz Duran, Esra & Härdle, Wolfgang Karl & Osipenko, Maria, 2012. "Difference based ridge and Liu type estimators in semiparametric regression models," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 164-175.
    9. Kalogridis, Ioannis & Van Aelst, Stefan, 2023. "Robust penalized estimators for functional linear regression," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
    10. Elizabeth Goult & Laura Andrea Barrero Guevara & Michael Briga & Matthieu Domenech de Cellès, 2024. "Estimating the optimal age for infant measles vaccination," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    11. Caldeira, João F. & Santos, André A.P. & Torrent, Hudson S., 2023. "Semiparametric portfolios: Improving portfolio performance by exploiting non-linearities in firm characteristics," Economic Modelling, Elsevier, vol. 122(C).
    12. Hübler, Olaf, 2017. "Health and weight – gender-specific linkages under heterogeneity, interdependence and resilience factors," Economics & Human Biology, Elsevier, vol. 26(C), pages 96-111.
    13. Schmid, Matthias & Hothorn, Torsten, 2008. "Boosting additive models using component-wise P-Splines," Computational Statistics & Data Analysis, Elsevier, vol. 53(2), pages 298-311, December.
    14. Yu Yue & Paul Speckman & Dongchu Sun, 2012. "Priors for Bayesian adaptive spline smoothing," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(3), pages 577-613, June.
    15. Shirun Shen & Huiya Zhou & Kejun He & Lan Zhou, 2024. "Principal Component Analysis of Two-dimensional Functional Data with Serial Correlation," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 29(3), pages 601-620, September.
    16. Liu, Jingyuan & Lou, Lejia & Li, Runze, 2018. "Variable selection for partially linear models via partial correlation," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 418-434.
    17. Afonso, António & Alves, José & Beck, Krzysztof & Jackson, Karen, 2024. "Financial, institutional, and macroeconomic determinants of cross-country portfolio equity flows: The case of developed countries," Economic Modelling, Elsevier, vol. 141(C).
    18. Mark J. Meyer & Haobo Cheng & Katherine Hobbs Knutson, 2023. "Bayesian Analysis of Multivariate Matched Proportions with Sparse Response," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(2), pages 490-509, July.
    19. Giancarlo Aquila & Lucas Barros Scianni Morais & Victor Augusto Durães de Faria & José Wanderley Marangon Lima & Luana Medeiros Marangon Lima & Anderson Rodrigo de Queiroz, 2023. "An Overview of Short-Term Load Forecasting for Electricity Systems Operational Planning: Machine Learning Methods and the Brazilian Experience," Energies, MDPI, vol. 16(21), pages 1-35, November.
    20. David O'Donnell & Alastair Rushworth & Adrian W. Bowman & E. Marian Scott & Mark Hallard, 2014. "Flexible regression models over river networks," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(1), pages 47-63, January.
