Printed from https://ideas.repec.org/a/plo/pone00/0144439.html

Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features

Authors

Listed:
  • Gregor Stiglic
  • Petra Povalej Brzan
  • Nino Fijacko
  • Fei Wang
  • Boris Delibasic
  • Alexandros Kalousis
  • Zoran Obradovic

Abstract

Several studies have demonstrated the importance of comorbidities for understanding the origin and evolution of medical complications. This study focuses on improving the interpretability of predictive models by using simple logical features that represent comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step in which simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represents a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation shows that the proposed method reduces the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Our results also show improved comprehensibility of the final predictive model when simple comorbidity based terms are used in logistic regression.
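
The last two steps described in the abstract (adding simple logic terms, then shrinking the feature set with lasso logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary comorbidity flags, replaces the group lasso interaction discovery with an exhaustive list of pairwise AND terms, and fits the L1-penalized model with a plain proximal gradient loop. The comorbidity names, the toy data, and the penalty value are all hypothetical.

```python
import math
from itertools import combinations


def add_and_terms(rows, names):
    """Append the logical AND of every pair of binary flags as a new feature."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = list(names) + [f"{names[i]} AND {names[j]}" for i, j in pairs]
    new_rows = [list(r) + [r[i] & r[j] for i, j in pairs] for r in rows]
    return new_rows, new_names


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(rows, labels, penalty, step=0.1, iterations=5000):
    """Minimize mean log-loss + penalty * ||w||_1 by proximal gradient descent.

    The intercept is left unpenalized.
    """
    n, p = len(rows), len(rows[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(iterations):
        # Residuals (predicted probability minus label) at the current iterate.
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            weights[j] = soft_threshold(weights[j] - step * grad, step * penalty)
    return intercept, weights


# Hypothetical toy data: every combination of three comorbidity flags, with
# readmission occurring only when the first two conditions are both present.
names = ["asthma", "diabetes", "epilepsy"]
rows = [[a, d, e] for a in (0, 1) for d in (0, 1) for e in (0, 1)]
labels = [a & d for a, d, _ in rows]

features, feature_names = add_and_terms(rows, names)
intercept, weights = fit_lasso_logistic(features, labels, penalty=0.15)

# Keep only the features whose coefficients were not shrunk to exactly zero.
selected = {n: round(w, 3) for n, w in zip(feature_names, weights) if w != 0.0}
print(selected)
```

On this toy data the penalty removes the three single-condition flags and leaves only the "asthma AND diabetes" term, which is the kind of compact, readable model the abstract describes. A real analysis would use a tested solver and choose the penalty by cross-validation.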

Suggested Citation

  • Gregor Stiglic & Petra Povalej Brzan & Nino Fijacko & Fei Wang & Boris Delibasic & Alexandros Kalousis & Zoran Obradovic, 2015. "Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-11, December.
  • Handle: RePEc:plo:pone00:0144439
    DOI: 10.1371/journal.pone.0144439

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144439
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0144439&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Feihan Lu & Yao Zheng & Harrington Cleveland & Chris Burton & David Madigan, 2018. "Bayesian hierarchical vector autoregressive models for patient-level predictive modeling," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-27, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yao Dong & He Jiang, 2018. "A Two-Stage Regularization Method for Variable Selection and Forecasting in High-Order Interaction Model," Complexity, Hindawi, vol. 2018, pages 1-12, November.
    2. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    3. Li Yun & O’Connor George T. & Dupuis Josée & Kolaczyk Eric, 2015. "Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(3), pages 265-277, June.
    4. Bhatnagar, Sahir R. & Lu, Tianyuan & Lovato, Amanda & Olds, David L. & Kobor, Michael S. & Meaney, Michael J. & O'Donnell, Kieran & Yang, Archer Y. & Greenwood, Celia M.T., 2023. "A sparse additive model for high-dimensional interactions with an exposure variable," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    5. Jonathan Boss & Alexander Rix & Yin‐Hsiu Chen & Naveen N. Narisetty & Zhenke Wu & Kelly K. Ferguson & Thomas F. McElrath & John D. Meeker & Bhramar Mukherjee, 2021. "A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    6. Wang, Cheng & Chen, Haozhe & Jiang, Binyan, 2024. "HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    7. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    8. Feng Li & Yajie Li & Sanying Feng, 2021. "Estimation for Varying Coefficient Models with Hierarchical Structure," Mathematics, MDPI, vol. 9(2), pages 1-18, January.
    9. Wang, Lu & Shen, Jincheng & Thall, Peter F., 2014. "A modified adaptive Lasso for identifying interactions in the Cox model with the heredity constraint," Statistics & Probability Letters, Elsevier, vol. 93(C), pages 126-133.
    10. Radchenko, Peter, 2015. "High dimensional single index models," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 266-282.
    11. Han Li & Yashu Liu & Pinghua Gong & Changshui Zhang & Jieping Ye & for the Alzheimers Disease Neuroimaging Initiative, 2014. "Hierarchical Interactions Model for Predicting Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) Conversion," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-11, January.
    12. He Jiang, 2022. "A novel robust structural quadratic forecasting model and applications," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1156-1180, September.
    13. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    14. Dewei Zhang & Yin Liu & Sam Davanloo Tajbakhsh, 2022. "A First-Order Optimization Algorithm for Statistical Learning with Hierarchical Sparsity Structure," INFORMS Journal on Computing, INFORMS, vol. 34(2), pages 1126-1140, March.
    15. Florian Ziel, 2015. "Iteratively reweighted adaptive lasso for conditional heteroscedastic time series with applications to AR-ARCH type processes," Papers 1502.06557, arXiv.org, revised Dec 2015.
    16. Fu, Penghui & Tan, Zhiqiang, 2024. "Block-wise primal-dual algorithms for large-scale doubly penalized ANOVA modeling," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    17. Justin B. Post & Howard D. Bondell, 2013. "Factor Selection and Structural Identification in the Interaction ANOVA Model," Biometrics, The International Biometric Society, vol. 69(1), pages 70-79, March.
    18. Iason Kynigakis & Ekaterini Panopoulou, 2022. "Does model complexity add value to asset allocation? Evidence from machine learning forecasting models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(3), pages 603-639, April.
    19. Ziel, Florian, 2016. "Iteratively reweighted adaptive lasso for conditional heteroscedastic time series with applications to AR–ARCH type processes," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 773-793.
    20. Fildes, Robert & Ma, Shaohui & Kolassa, Stephan, 2022. "Retail forecasting: Research and practice," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1283-1318.
