IDEAS home Printed from https://ideas.repec.org/a/spr/jglopt/v77y2020i3d10.1007_s10898-020-00876-1.html
   My bibliography  Save this article

Subset selection for multiple linear regression via optimization

Author

Listed:
  • Young Woong Park

    (Iowa State University)

  • Diego Klabjan

    (Northwestern University)

Abstract

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy–maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big M values and are compared against the algorithms in the literature. For high dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. From the computational experiments, we find that our models quickly find a quality solution while the rest of the time is spent to prove optimality; the iterative algorithms find solutions in a relatively short time and are competitive compared to state-of-the-art algorithms; using ad-hoc big M values is not recommended.

Suggested Citation

  • Young Woong Park & Diego Klabjan, 2020. "Subset selection for multiple linear regression via optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 543-574, July.
  • Handle: RePEc:spr:jglopt:v:77:y:2020:i:3:d:10.1007_s10898-020-00876-1
    DOI: 10.1007/s10898-020-00876-1
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10898-020-00876-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10898-020-00876-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. A. Charnes & W. W. Cooper & R. O. Ferguson, 1955. "Optimal Estimation of Executive Compensation by Linear Programming," Management Science, INFORMS, vol. 1(2), pages 138-151, January.
    2. Kyoungmi Hwang & Dohyun Kim & Kyungsik Lee & Chungmok Lee & Sungsoo Park, 2017. "Embedded variable selection method using signomial classification," Annals of Operations Research, Springer, vol. 254(1), pages 89-109, July.
    3. Dimitris Bertsimas & Romy Shioda, 2009. "Algorithm for cardinality-constrained quadratic optimization," Computational Optimization and Applications, Springer, vol. 43(1), pages 1-22, May.
    4. P. S. Bradley & O. L. Mangasarian & W. N. Street, 1998. "Feature Selection via Mathematical Programming," INFORMS Journal on Computing, INFORMS, vol. 10(2), pages 209-217, May.
    5. Dimitris Bertsimas & Angela King, 2016. "OR Forum—An Algorithmic Approach to Linear Regression," Operations Research, INFORMS, vol. 64(1), pages 2-16, February.
    6. DE FARIAS, Ismael R. & NEMHAUSER, Georges L., 2003. "A polyhedral study of the cardinality constrained knapsack problem," LIDAM Reprints CORE 1634, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    7. Fred Glover, 1975. "Improved Linear Integer Programming Formulations of Nonlinear Integer Problems," Management Science, INFORMS, vol. 22(4), pages 455-460, December.
    8. Miyashiro, Ryuhei & Takano, Yuichi, 2015. "Mixed integer second-order cone programming formulations for variable selection in linear regression," European Journal of Operational Research, Elsevier, vol. 247(3), pages 721-731.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Serrano, Breno & Minner, Stefan & Schiffer, Maximilian & Vidal, Thibaut, 2024. "Bilevel optimization for feature selection in the data-driven newsvendor problem," European Journal of Operational Research, Elsevier, vol. 315(2), pages 703-714.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ricardo M. Lima & Ignacio E. Grossmann, 2017. "On the solution of nonconvex cardinality Boolean quadratic programming problems: a computational study," Computational Optimization and Applications, Springer, vol. 66(1), pages 1-37, January.
    2. Andrés Gómez & Oleg A. Prokopyev, 2021. "A Mixed-Integer Fractional Optimization Approach to Best Subset Selection," INFORMS Journal on Computing, INFORMS, vol. 33(2), pages 551-565, May.
    3. Dimitris Bertsimas & Ryan Cory-Wright, 2022. "A Scalable Algorithm for Sparse Portfolio Selection," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1489-1511, May.
    4. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.
    5. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    6. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
    7. Joey Huchette & Joey Huchette, 2019. "A Combinatorial Approach for Small and Strong Formulations of Disjunctive Constraints," Mathematics of Operations Research, INFORMS, vol. 44(3), pages 793-820, August.
    8. Lili Pan & Ziyan Luo & Naihua Xiu, 2017. "Restricted Robinson Constraint Qualification and Optimality for Cardinality-Constrained Cone Programming," Journal of Optimization Theory and Applications, Springer, vol. 175(1), pages 104-118, October.
    9. Wojtek Michalowski & Włodzimierz Ogryczak, 2001. "Extending the MAD portfolio optimization model to incorporate downside risk aversion," Naval Research Logistics (NRL), John Wiley & Sons, vol. 48(3), pages 185-200, April.
    10. Kadziński, MiŁosz & Greco, Salvatore & SŁowiński, Roman, 2012. "Extreme ranking analysis in robust ordinal regression," Omega, Elsevier, vol. 40(4), pages 488-501.
    11. Thomas L. Saaty, 2013. "The Modern Science of Multicriteria Decision Making and Its Practical Applications: The AHP/ANP Approach," Operations Research, INFORMS, vol. 61(5), pages 1101-1118, October.
    12. Serrano, Breno & Minner, Stefan & Schiffer, Maximilian & Vidal, Thibaut, 2024. "Bilevel optimization for feature selection in the data-driven newsvendor problem," European Journal of Operational Research, Elsevier, vol. 315(2), pages 703-714.
    13. Yanagida, John F. & Book, Don N., 1984. "Application of the Least Absolute Value Technique as a Data Filter for Detecting Structural Change in the Demand for Meat," Northeastern Journal of Agricultural and Resource Economics, Northeastern Agricultural and Resource Economics Association, vol. 0(Number 1), pages 1-5, April.
    14. Yokoyama, Ryohei & Kitano, Hiroyuki & Wakui, Tetsuya, 2017. "Optimal operation of heat supply systems with piping network," Energy, Elsevier, vol. 137(C), pages 888-897.
    15. Hamalainen, Raimo P. & Mantysaari, Juha, 2002. "Dynamic multi-objective heating optimization," European Journal of Operational Research, Elsevier, vol. 142(1), pages 1-15, October.
    16. Christodoulos Floudas & Xiaoxia Lin, 2005. "Mixed Integer Linear Programming in Process Scheduling: Modeling, Algorithms, and Applications," Annals of Operations Research, Springer, vol. 139(1), pages 131-162, October.
    17. Gupta, Renu & Bandopadhyaya, Lakshmisree & Puri, M. C., 1996. "Ranking in quadratic integer programming problems," European Journal of Operational Research, Elsevier, vol. 95(1), pages 231-236, November.
    18. Osman, Hany & Demirli, Kudret, 2010. "A bilinear goal programming model and a modified Benders decomposition algorithm for supply chain reconfiguration and supplier selection," International Journal of Production Economics, Elsevier, vol. 124(1), pages 97-105, March.
    19. Demirci, Mehmet & Bettinger, Pete, 2015. "Using mixed integer multi-objective goal programming for stand tending block designation: A case study from Turkey," Forest Policy and Economics, Elsevier, vol. 55(C), pages 28-36.
    20. Tsai, Wen-Hsien & Hsu, Jui-Ling, 2008. "Corporate social responsibility programs choice and costs assessment in the airline industry—A hybrid model," Journal of Air Transport Management, Elsevier, vol. 14(4), pages 188-196.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jglopt:v:77:y:2020:i:3:d:10.1007_s10898-020-00876-1. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.