
Optimizing feature selection to improve medical diagnosis

Author

Listed:
  • Ya-Ju Fan
  • Wanpracha Chaovalitwongse

Abstract

In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework Support Feature Machine (SFM). SFM is used in feature selection to find the optimal group of features that shows strong separability between two classes. Separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified data samples in the training set, that is, samples whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is presented. The proposed SFM framework and its extensions were tested on 5 real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with that of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make the decision than SVM and LAD. This result has significant implications for diagnostic practice. The outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings. Copyright Springer Science+Business Media, LLC 2010
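
The selection criterion described in the abstract can be illustrated with a small sketch. This is not the authors' optimization model: it replaces the optimization formulation with a brute-force search over feature subsets, and it assumes absolute differences summed over the selected features as the distance and class-wise mean distances as the intra-class and inter-class measures. The function names `count_correct` and `best_subset` and the toy data are hypothetical.

```python
from itertools import combinations


def count_correct(X, y, features):
    """Count samples whose mean intra-class distance is smaller than
    their mean inter-class distance, using only the given features.
    Distance here is an assumed choice: summed absolute differences."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        intra, inter = [], []
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue  # a sample is not compared with itself
            d = sum(abs(xi[f] - xj[f]) for f in features)
            (intra if yi == yj else inter).append(d)
        if intra and inter and sum(intra) / len(intra) < sum(inter) / len(inter):
            correct += 1
    return correct


def best_subset(X, y):
    """Exhaustively search feature subsets (assumed stand-in for the
    optimization model). Smaller subsets are tried first and only a
    strictly better score replaces the incumbent, so ties favor fewer
    features."""
    n_features = len(X[0])
    best = ((), -1)
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            score = count_correct(X, y, subset)
            if score > best[1]:
                best = (subset, score)
    return best


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 1.2], [1.1, 8.0], [1.2, 5.5]]
y = [0, 0, 0, 1, 1, 1]
subset, score = best_subset(X, y)
print(subset, score)
```

Exhaustive enumeration grows exponentially with the number of features, so the sketch is only practical for a handful of features; it is meant to show the counting criterion, not a scalable solution method.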

Suggested Citation

  • Ya-Ju Fan & Wanpracha Chaovalitwongse, 2010. "Optimizing feature selection to improve medical diagnosis," Annals of Operations Research, Springer, vol. 174(1), pages 169-183, February.
  • Handle: RePEc:spr:annopr:v:174:y:2010:i:1:p:169-183:10.1007/s10479-008-0506-z
    DOI: 10.1007/s10479-008-0506-z

    References listed on IDEAS

    1. Hao Helen Zhang & Grace Wahba & Yi Lin & Meta Voelker & Michael Ferris & Ronald Klein & Barbara Klein, 2004. "Variable Selection and Model Building via Likelihood Basis Pursuit," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 659-672, January.
    2. Wanpracha Chaovalitwongse & Oleg Prokopyev & Panos Pardalos, 2006. "Electroencephalogram (EEG) time series classification: Applications in epilepsy," Annals of Operations Research, Springer, vol. 148(1), pages 227-250, November.
    3. Peter Hammer & Tibérius Bonates, 2006. "Logical analysis of data—An overview: From combinatorial optimization to medical applications," Annals of Operations Research, Springer, vol. 148(1), pages 203-225, November.
    4. O. L. Mangasarian, 1965. "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research, INFORMS, vol. 13(3), pages 444-452, June.

    Cited by:

    1. Alaleh Razmjoo & Petros Xanthopoulos & Qipeng Phil Zheng, 2019. "Feature importance ranking for classification in mixed online environments," Annals of Operations Research, Springer, vol. 276(1), pages 315-330, May.
    2. Daniel Gartner & Rainer Kolisch & Daniel B. Neill & Rema Padman, 2015. "Machine Learning Approaches for Early DRG Classification and Resource Allocation," INFORMS Journal on Computing, INFORMS, vol. 27(4), pages 718-734, November.
    3. Tomasz Hachaj & Marek R. Ogiela & Katarzyna Koptyra, 2018. "Human actions recognition from motion capture recordings using signal resampling and pattern recognition methods," Annals of Operations Research, Springer, vol. 265(2), pages 223-239, June.
    4. Talayeh Razzaghi & Ilya Safro & Joseph Ewing & Ehsan Sadrfaridpour & John D. Scott, 2019. "Predictive models for bariatric surgery risks with imbalanced medical datasets," Annals of Operations Research, Springer, vol. 280(1), pages 1-18, September.
    5. Kamyab Karimi & Ali Ghodratnama & Reza Tavakkoli-Moghaddam, 2023. "Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: a comprehensive analysis," Annals of Operations Research, Springer, vol. 328(1), pages 665-700, September.
    6. Ning Wang & Zhuo Zhang & Jiao Zhao & Dawei Hu, 2022. "Recognition method of equipment state with the FLDA based Mahalanobis–Taguchi system," Annals of Operations Research, Springer, vol. 311(1), pages 417-435, April.
    7. Erfan Mehmanchi & Andrés Gómez & Oleg A. Prokopyev, 2021. "Solving a class of feature selection problems via fractional 0–1 programming," Annals of Operations Research, Springer, vol. 303(1), pages 265-295, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wanpracha Art Chaovalitwongse, 2008. "Novel quadratic programming approach for time series clustering with biomedical application," Journal of Combinatorial Optimization, Springer, vol. 15(3), pages 225-241, April.
    2. Z. R. Gabidullina, 2013. "A Linear Separability Criterion for Sets of Euclidean Space," Journal of Optimization Theory and Applications, Springer, vol. 158(1), pages 145-171, July.
    3. Pierre Hansen & Christophe Meyer, 2011. "A new column generation algorithm for Logical Analysis of Data," Annals of Operations Research, Springer, vol. 188(1), pages 215-249, August.
    4. Elnaz Gholipour & Béla Vizvári & Zoltán Lakner, 2020. "Reconstruction Rating Model of Sovereign Debt by Logical Analysis of Data," Papers 2011.14112, arXiv.org.
    5. Emilio Carrizosa & Belen Martin-Barragan, 2011. "Maximizing upgrading and downgrading margins for ordinal regression," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 74(3), pages 381-407, December.
    6. Yu, Lean & Wang, Shouyang & Lai, Kin Keung, 2009. "An intelligent-agent-based fuzzy group decision making model for financial multicriteria decision support: The case of credit scoring," European Journal of Operational Research, Elsevier, vol. 195(3), pages 942-959, June.
    7. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    8. Nieddu, Luciano & Patrizi, Giacomo, 2000. "Formal methods in pattern recognition: A review," European Journal of Operational Research, Elsevier, vol. 120(3), pages 459-495, February.
    9. Miguel Lejeune, 2012. "Pattern definition of the p-efficiency concept," Annals of Operations Research, Springer, vol. 200(1), pages 23-36, November.
    10. Brandner, Hubertus & Lessmann, Stefan & Voß, Stefan, 2013. "A memetic approach to construct transductive discrete support vector machines," European Journal of Operational Research, Elsevier, vol. 230(3), pages 581-595.
    11. W. Art Chaovalitwongse & Ya-Ju Fan & Rajesh C. Sachdeo, 2008. "Novel Optimization Models for Abnormal Brain Activity Classification," Operations Research, INFORMS, vol. 56(6), pages 1450-1460, December.
    12. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    13. Emilio Carrizosa & Belen Martin-Barragan & Dolores Romero Morales, 2010. "Binarized Support Vector Machines," INFORMS Journal on Computing, INFORMS, vol. 22(1), pages 154-167, February.
    14. Endre Boros & Yves Crama & Peter Hammer & Toshihide Ibaraki & Alexander Kogan & Kazuhisa Makino, 2011. "Logical analysis of data: classification with justification," Annals of Operations Research, Springer, vol. 188(1), pages 33-61, August.
    15. Baldomero-Naranjo, Marta & Martínez-Merino, Luisa I. & Rodríguez-Chía, Antonio M., 2020. "Tightening big Ms in integer programming formulations for support vector machines with ramp loss," European Journal of Operational Research, Elsevier, vol. 286(1), pages 84-100.
    16. Maurizio Maravalle & Federica Ricca & Bruno Simeone & Vincenzo Spinelli, 2015. "Carpal Tunnel Syndrome automatic classification: electromyography vs. ultrasound imaging," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(1), pages 100-123, April.
    17. Dimitris Bertsimas & Romy Shioda, 2007. "Classification and Regression via Integer Optimization," Operations Research, INFORMS, vol. 55(2), pages 252-271, April.
    18. Orsenigo, Carlotta & Vercellis, Carlo, 2004. "Discrete support vector decision trees via tabu search," Computational Statistics & Data Analysis, Elsevier, vol. 47(2), pages 311-322, September.
    19. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    20. Lean Yu & Zebin Yang & Ling Tang, 2016. "A novel multistage deep belief network based extreme learning machine ensemble learning paradigm for credit risk assessment," Flexible Services and Manufacturing Journal, Springer, vol. 28(4), pages 576-592, December.
