
Near-optimal feature selection for large databases

Authors

  • J Yang

    (Chonbuk National University)

  • S Ólafsson

    (Iowa State University)

Abstract

We analyse a new optimization-based approach to feature selection that uses the nested partitions method for combinatorial optimization as a heuristic search procedure for identifying good feature subsets. In particular, we show how to improve the performance of the nested partitions method by randomly sampling instances. The new approach uses a two-stage sampling scheme that determines the sample size required to guarantee convergence to a near-optimal solution, so it also has attractive theoretical properties: when the algorithm terminates in finite time, rigorous statements can be made about the quality of the final feature subset. Numerical results illustrate the key findings and show that the new approach is considerably faster than both the original nested partitions method and other feature selection methods.
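The search described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions introduced here: the hypothetical per-instance quality function `instance_score`, uniform sampling of subsets within each region, backtracking to the immediate superregion, and a Stein-type two-stage rule that picks a total sample size N with h * S / sqrt(N) <= delta. The paper's own sample-size formula and convergence conditions may differ.

```python
import math
import random
import statistics


def two_stage_sample_size(first_stage, delta, h):
    """Stein-type two-stage rule (an assumption, not the paper's formula):
    smallest N >= n0 with h * S / sqrt(N) <= delta, S = first-stage std dev."""
    s = statistics.stdev(first_stage)
    return max(len(first_stage), math.ceil((h * s / delta) ** 2))


def np_feature_selection(n_features, instance_score, n_instances,
                         n0=20, delta=0.05, h=2.0, m=5, max_iter=200, seed=0):
    """Nested partitions over feature subsets encoded as tuples of bools.

    instance_score(subset, i) is a hypothetical per-instance quality measure,
    e.g. 1.0 if instance i is classified correctly using only `subset`.
    Returns (estimated quality, subset) for the best subset sampled.
    """
    rng = random.Random(seed)
    fixed = []  # include/exclude decisions for the first len(fixed) features
    best = (-math.inf, None)

    def complete(prefix):
        # Sample a subset uniformly from the region defined by `prefix`.
        tail = [rng.random() < 0.5 for _ in range(n_features - len(prefix))]
        return tuple(prefix) + tuple(tail)

    def outside(prefix):
        # Rejection-sample a subset from the surrounding region (complement).
        while True:
            s = complete([])
            if list(s[:len(prefix)]) != prefix:
                return s

    def estimate(subset):
        # Stage 1: score the subset on n0 instances drawn with replacement.
        idx = rng.choices(range(n_instances), k=n0)
        scores = [instance_score(subset, i) for i in idx]
        # Stage 2: draw the extra instances the two-stage rule asks for.
        need = min(two_stage_sample_size(scores, delta, h), n_instances)
        idx = rng.choices(range(n_instances), k=max(0, need - n0))
        scores += [instance_score(subset, i) for i in idx]
        return statistics.fmean(scores)

    for _ in range(max_iter):
        if len(fixed) == n_features:
            break  # the current region is a single feature subset
        # Partition: the next undecided feature is either included or excluded.
        region_best = {}
        for choice in (True, False):
            for _ in range(m):
                s = complete(fixed + [choice])
                cand = (estimate(s), s)
                best = max(best, cand)
                region_best[choice] = max(region_best.get(choice, cand), cand)
        # Sample the surrounding region so the search can backtrack.
        surround = -math.inf
        if fixed:
            for _ in range(m):
                s = outside(fixed)
                cand = (estimate(s), s)
                best = max(best, cand)
                surround = max(surround, cand[0])
        choice = max(region_best, key=lambda c: region_best[c][0])
        if surround > region_best[choice][0]:
            fixed.pop()  # backtrack: move to the superregion
        else:
            fixed.append(choice)  # move into the most promising subregion
    return best
```

With a 0/1 correctness indicator as `instance_score`, each candidate subset is evaluated only on as many sampled instances as the two-stage rule requests rather than on the whole database, which illustrates how instance sampling can reduce the cost of each evaluation.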

Suggested Citation

  • J Yang & S Ólafsson, 2009. "Near-optimal feature selection for large databases," Journal of the Operational Research Society, Palgrave Macmillan; The OR Society, vol. 60(8), pages 1045-1055, August.
  • Handle: RePEc:pal:jorsoc:v:60:y:2009:i:8:d:10.1057_palgrave.jors.2602651
    DOI: 10.1057/palgrave.jors.2602651

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/palgrave.jors.2602651
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    2. Leyuan Shi & Sigurdur Ólafsson, 2000. "Nested Partitions Method for Global Optimization," Operations Research, INFORMS, vol. 48(3), pages 390-407, June.
    3. Sigurdur Ólafsson & Jaekyung Yang, 2005. "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 17(3), pages 339-355, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    2. Meisel, Stephan & Mattfeld, Dirk, 2010. "Synergies of Operations Research and Data Mining," European Journal of Operational Research, Elsevier, vol. 206(1), pages 1-10, October.
    3. Chang, Kuo-Hao & Kuo, Po-Yi, 2018. "An efficient simulation optimization method for the generalized redundancy allocation problem," European Journal of Operational Research, Elsevier, vol. 265(3), pages 1094-1101.
    4. Lee, Loo Hay & Chew, Ek Peng & Manikam, Puvaneswari, 2006. "A general framework on the simulation-based optimization under fixed computing budget," European Journal of Operational Research, Elsevier, vol. 174(3), pages 1828-1841, November.
    5. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    6. Zhang, Zhiwang & Gao, Guangxia & Shi, Yong, 2014. "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, Elsevier, vol. 237(1), pages 335-348.
    7. Maysam Eftekhary & Peyman Gholami & Saeed Safari & Mohammad Shojaee, 2012. "Ranking Normalization Methods for Improving the Accuracy of SVM Algorithm by DEA Method," Modern Applied Science, Canadian Center of Science and Education, vol. 6(10), pages 1-26, October.
    8. David R. Morrison & Jason J. Sauppe & Wenda Zhang & Sheldon H. Jacobson & Edward C. Sewell, 2017. "Cyclic best first search: Using contours to guide branch‐and‐bound algorithms," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(1), pages 64-82, February.
    9. Ramli, Azizul Azhar & Watada, Junzo & Pedrycz, Witold, 2011. "Real-time fuzzy regression analysis: A convex hull approach," European Journal of Operational Research, Elsevier, vol. 210(3), pages 606-617, May.
    10. Necula, Sabina-Cristiana & Radu, Laura-Diana, 2011. "Decision Support Systems Usefulness and A Practical Solution Based on Semantic Web Technologies," MPRA Paper 51547, University Library of Munich, Germany.
    11. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    12. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    13. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Romero Morales, Dolores, 2020. "Sparsity in optimal randomized classification trees," European Journal of Operational Research, Elsevier, vol. 284(1), pages 255-272.
    14. Tahir Ekin & Stephen Walker & Paul Damien, 2023. "Augmented simulation methods for discrete stochastic optimization with recourse," Annals of Operations Research, Springer, vol. 320(2), pages 771-793, January.
    15. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan; The OR Society, vol. 59(9), pages 1150-1172, September.
    16. Caballini, Claudia & Gracia, Maria D. & Mar-Ortiz, Julio & Sacone, Simona, 2020. "A combined data mining – optimization approach to manage trucks operations in container terminals with the use of a TAS: Application to an Italian and a Mexican port," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 142(C).
    17. Lingxuan Liu & Leyuan Shi, 2019. "Simulation Optimization on Complex Job Shop Scheduling with Non-Identical Job Sizes," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(05), pages 1-26, October.
    18. Besseris, George J., 2012. "Profiling effects in industrial data mining by non-parametric DOE methods: An application on screening checkweighing systems in packaging operations," European Journal of Operational Research, Elsevier, vol. 220(1), pages 147-161.
    19. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    20. Choi, Hyunhong & Koo, Yoonmo, 2018. "Using Contingent Valuation and Numerical Methods to Determine Optimal Locations for Environmental Facilities: Public Arboretums in South Korea," Ecological Economics, Elsevier, vol. 149(C), pages 184-201.