
Optimizing feature selection to improve medical diagnosis

Authors

  • Ya-Ju Fan
  • Wanpracha Chaovalitwongse

Abstract

In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework the Support Feature Machine (SFM). SFM is used to find the optimal group of features that shows strong separability between two classes. Separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified samples in the training set, that is, samples whose intra-class distances are smaller than their inter-class distances. This concept can be combined with the modified nearest neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is also presented. The proposed SFM framework and its extensions were tested on five real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with that of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make the decision than SVM and LAD. This result has significant implications for diagnostic practice. The outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings. Copyright Springer Science+Business Media, LLC 2010
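The separability criterion described in the abstract can be illustrated with a small sketch. It is not the authors' optimization model: it assumes per-feature absolute differences (so the distance over a feature subset is the L1 distance restricted to those features), class-wise mean distances, and a brute-force search over small subsets in place of SFM's mathematical program. The names `mean_feature_distances`, `separability_score`, and `best_subset` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Dists = List[Tuple[List[float], List[float]]]


def mean_feature_distances(X: Matrix, y: Sequence[int], i: int) -> Tuple[List[float], List[float]]:
    """For sample i, return per-feature mean absolute differences to the other
    samples of its own class (intra) and to samples of the other class (inter)."""
    n_features = len(X[0])
    same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
    other = [j for j in range(len(X)) if y[j] != y[i]]

    def mean_abs(group: List[int], k: int) -> float:
        # An empty group (e.g. a singleton class) contributes zero distance.
        return sum(abs(X[i][k] - X[j][k]) for j in group) / len(group) if group else 0.0

    intra = [mean_abs(same, k) for k in range(n_features)]
    inter = [mean_abs(other, k) for k in range(n_features)]
    return intra, inter


def separability_score(dists: Dists, subset: Tuple[int, ...]) -> int:
    """Count samples whose intra-class distance over `subset` is strictly
    smaller than their inter-class distance (an assumed SFM-style criterion)."""
    return sum(
        1
        for intra, inter in dists
        if sum(intra[k] for k in subset) < sum(inter[k] for k in subset)
    )


def best_subset(X: Matrix, y: Sequence[int], max_size: int) -> Tuple[Tuple[int, ...], int]:
    """Exhaustively try subsets of up to max_size features. Smaller subsets are
    tried first and ties are not replaced, so fewer features win on ties."""
    dists = [mean_feature_distances(X, y, i) for i in range(len(X))]
    best: Tuple[int, ...] = ()
    best_score = -1
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(X[0])), size):
            score = separability_score(dists, subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [1.1, 1.1]]
y = [0, 0, 1, 1]
print(best_subset(X, y, max_size=2))  # → ((0,), 4)
```

Exhaustive search is only practical for a handful of features; on real datasets an optimization model such as the one the paper proposes is what makes the selection tractable.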

Suggested Citation

  • Ya-Ju Fan & Wanpracha Chaovalitwongse, 2010. "Optimizing feature selection to improve medical diagnosis," Annals of Operations Research, Springer, vol. 174(1), pages 169-183, February.
  • Handle: RePEc:spr:annopr:v:174:y:2010:i:1:p:169-183:10.1007/s10479-008-0506-z
    DOI: 10.1007/s10479-008-0506-z

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10479-008-0506-z
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1007/s10479-008-0506-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Wanpracha Chaovalitwongse & Oleg Prokopyev & Panos Pardalos, 2006. "Electroencephalogram (EEG) time series classification: Applications in epilepsy," Annals of Operations Research, Springer, vol. 148(1), pages 227-250, November.
    2. Peter Hammer & Tibérius Bonates, 2006. "Logical analysis of data—An overview: From combinatorial optimization to medical applications," Annals of Operations Research, Springer, vol. 148(1), pages 203-225, November.
    3. Hao Helen Zhang & Grace Wahba & Yi Lin & Meta Voelker & Michael Ferris & Ronald Klein & Barbara Klein, 2004. "Variable Selection and Model Building via Likelihood Basis Pursuit," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 659-672, January.
    4. O. L. Mangasarian, 1965. "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research, INFORMS, vol. 13(3), pages 444-452, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Alaleh Razmjoo & Petros Xanthopoulos & Qipeng Phil Zheng, 2019. "Feature importance ranking for classification in mixed online environments," Annals of Operations Research, Springer, vol. 276(1), pages 315-330, May.
    2. Daniel Gartner & Rainer Kolisch & Daniel B. Neill & Rema Padman, 2015. "Machine Learning Approaches for Early DRG Classification and Resource Allocation," INFORMS Journal on Computing, INFORMS, vol. 27(4), pages 718-734, November.
    3. Tomasz Hachaj & Marek R. Ogiela & Katarzyna Koptyra, 2018. "Human actions recognition from motion capture recordings using signal resampling and pattern recognition methods," Annals of Operations Research, Springer, vol. 265(2), pages 223-239, June.
    4. Talayeh Razzaghi & Ilya Safro & Joseph Ewing & Ehsan Sadrfaridpour & John D. Scott, 2019. "Predictive models for bariatric surgery risks with imbalanced medical datasets," Annals of Operations Research, Springer, vol. 280(1), pages 1-18, September.
    5. Kamyab Karimi & Ali Ghodratnama & Reza Tavakkoli-Moghaddam, 2023. "Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: a comprehensive analysis," Annals of Operations Research, Springer, vol. 328(1), pages 665-700, September.
    6. Ning Wang & Zhuo Zhang & Jiao Zhao & Dawei Hu, 2022. "Recognition method of equipment state with the FLDA based Mahalanobis–Taguchi system," Annals of Operations Research, Springer, vol. 311(1), pages 417-435, April.
    7. Erfan Mehmanchi & Andrés Gómez & Oleg A. Prokopyev, 2021. "Solving a class of feature selection problems via fractional 0–1 programming," Annals of Operations Research, Springer, vol. 303(1), pages 265-295, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wanpracha Art Chaovalitwongse, 2008. "Novel quadratic programming approach for time series clustering with biomedical application," Journal of Combinatorial Optimization, Springer, vol. 15(3), pages 225-241, April.
    2. Z. R. Gabidullina, 2013. "A Linear Separability Criterion for Sets of Euclidean Space," Journal of Optimization Theory and Applications, Springer, vol. 158(1), pages 145-171, July.
    3. Emilio Carrizosa & Belen Martin-Barragan, 2011. "Maximizing upgrading and downgrading margins for ordinal regression," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 74(3), pages 381-407, December.
    4. Yu, Lean & Wang, Shouyang & Lai, Kin Keung, 2009. "An intelligent-agent-based fuzzy group decision making model for financial multicriteria decision support: The case of credit scoring," European Journal of Operational Research, Elsevier, vol. 195(3), pages 942-959, June.
    5. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    6. Brandner, Hubertus & Lessmann, Stefan & Voß, Stefan, 2013. "A memetic approach to construct transductive discrete support vector machines," European Journal of Operational Research, Elsevier, vol. 230(3), pages 581-595.
    7. Dimitris Bertsimas & Romy Shioda, 2007. "Classification and Regression via Integer Optimization," Operations Research, INFORMS, vol. 55(2), pages 252-271, April.
    8. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    9. B Baesens & C Mues & D Martens & J Vanthienen, 2009. "50 years of data mining and OR: upcoming trends and challenges," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(1), pages 16-23, May.
    10. de Vos, Wout & Balvert, Marleen, 2023. "RPA: Learning Interpretable Input-Output Relationships by Counting Samples," Other publications TiSEM 70276b7f-9026-46ad-a8e8-1, Tilburg University, School of Economics and Management.
    11. Pedro Duarte Silva, A., 2017. "Optimization approaches to Supervised Classification," European Journal of Operational Research, Elsevier, vol. 261(2), pages 772-788.
    12. J. J. Glen, 2004. "Dichotomous categorical variable formation in mathematical programming discriminant analysis models," Naval Research Logistics (NRL), John Wiley & Sons, vol. 51(4), pages 575-596, June.
    13. Xiao, Jin & Zhong, Yu & Jia, Yanlin & Wang, Yadong & Li, Ruoyi & Jiang, Xiaoyi & Wang, Shouyang, 2024. "A novel deep ensemble model for imbalanced credit scoring in internet finance," International Journal of Forecasting, Elsevier, vol. 40(1), pages 348-372.
    14. Lejeune, Miguel & Lozin, Vadim & Lozina, Irina & Ragab, Ahmed & Yacout, Soumaya, 2019. "Recent advances in the theory and practice of Logical Analysis of Data," European Journal of Operational Research, Elsevier, vol. 275(1), pages 1-15.
    15. Xeniya Vladimirovna Grigor’eva, 2016. "Approximate Functions in a Problem of Sets Separation," Journal of Optimization Theory and Applications, Springer, vol. 171(2), pages 550-572, November.
    16. Marleen Balvert, 2024. "Iterative Rule Extension for Logic Analysis of Data: An MILP-Based Heuristic to Derive Interpretable Binary Classifiers from Large Data Sets," INFORMS Journal on Computing, INFORMS, vol. 36(3), pages 723-741, May.
    17. Wanpracha Chaovalitwongse, 2009. "Comments on: Optimization and data mining in medicine," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 17(2), pages 247-249, December.
    18. Kuangnan Fang & Xinyan Fan & Wei Lan & Bingquan Wang, 2019. "Nonparametric additive beta regression for fractional response with application to body fat data," Annals of Operations Research, Springer, vol. 276(1), pages 331-347, May.
    19. A. Astorino & A. Fuduli & M. Gaudioso, 2010. "DC models for spherical separation," Journal of Global Optimization, Springer, vol. 48(4), pages 657-669, December.
    20. Zhengyu Ma & Hong Seo Ryoo, 2021. "Spherical Classification of Data, a New Rule-Based Learning Method," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 44-71, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.