
Improving Time Complexity and Accuracy of the Machine Learning Algorithms Through Selection of Highly Weighted Top k Features from Complex Datasets

Author

Listed:
  • Abdul Majeed

    (Korea Aerospace University)

Abstract

Machine learning algorithms (MLAs) usually process large and complex datasets containing a substantial number of features to extract meaningful information about the target concept (a.k.a. class). In most cases, MLAs suffer from latency and computational complexity issues while processing such datasets because of low-weight (i.e., irrelevant or redundant) features. The computing time of MLAs increases sharply with the number of features, feature dependence, the number of records, the types of features, and the nested feature categories present in such datasets. Appropriate feature selection before applying an MLA is a practical way to resolve the trade-off between computing speed and accuracy when processing large and complex datasets. However, selecting features that are sufficient, necessary, and highly correlated with the target concept is very challenging. This paper presents an efficient feature selection algorithm based on random forest to improve the performance of MLAs without sacrificing accuracy guarantees when processing large and complex datasets. The proposed algorithm yields unique features that are closely related to the target concept (i.e., class). It significantly reduces the computing time of MLAs without much degradation in accuracy while learning the target concept from such datasets. The simulation results confirm the efficacy and effectiveness of the proposed algorithm.
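
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of keeping the k most highly weighted features before training. It assumes the per-feature weights have already been computed (for example, by a random forest importance measure); the feature names, weight values, and the helper functions `top_k_features` and `project` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def top_k_features(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest weights, highest first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending weight; break ties by name so the result is deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def project(rows: Sequence[Dict[str, object]], selected: Sequence[str]) -> List[Dict[str, object]]:
    """Keep only the selected features in every record."""
    return [{name: row[name] for name in selected} for row in rows]


# Hypothetical importance weights (assumed to come from a random forest).
weights = {"age": 0.41, "income": 0.33, "zip_code": 0.02, "education": 0.19, "record_id": 0.05}

# Hypothetical records; a real dataset would be far larger.
rows = [
    {"age": 34, "income": 52000, "zip_code": "10001", "education": "BSc", "record_id": 1},
    {"age": 51, "income": 87000, "zip_code": "94105", "education": "MSc", "record_id": 2},
]

selected = top_k_features(weights, k=3)
reduced = project(rows, selected)
print(selected)    # ['age', 'income', 'education']
print(reduced[0])  # {'age': 34, 'income': 52000, 'education': 'BSc'}
```

Training on `reduced` instead of `rows` is what would cut the computing time; how much accuracy is retained depends on how well the weights reflect relevance to the class, which this sketch does not address.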

Suggested Citation

  • Abdul Majeed, 2019. "Improving Time Complexity and Accuracy of the Machine Learning Algorithms Through Selection of Highly Weighted Top k Features from Complex Datasets," Annals of Data Science, Springer, vol. 6(4), pages 599-621, December.
  • Handle: RePEc:spr:aodasc:v:6:y:2019:i:4:d:10.1007_s40745-019-00217-4
    DOI: 10.1007/s40745-019-00217-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-019-00217-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Nikhil J. Rathod & Manoj K. Chopra & Prem Kumar Chaurasiya & Umesh S. Vidhate & Abhishek Dasore, 2023. "Optimization on the Turning Process Parameters of SS 304 Using Taguchi and TOPSIS," Annals of Data Science, Springer, vol. 10(5), pages 1405-1419, October.
    2. Prashant Singh & Prashant Verma & Nikhil Singh, 2022. "Offline Signature Verification: An Application of GLCM Features in Machine Learning," Annals of Data Science, Springer, vol. 9(6), pages 1309-1321, December.
    3. Manoj Verma & Harish Kumar Ghritlahre & Surendra Bajpai, 2023. "A Case Study of Optimization of a Solar Power Plant Sizing and Placement in Madhya Pradesh, India Using Multi-Objective Genetic Algorithm," Annals of Data Science, Springer, vol. 10(4), pages 933-966, August.
    4. Firuz Kamalov & Fadi Thabtah & Ho Hon Leung, 2023. "Feature Selection in Imbalanced Data," Annals of Data Science, Springer, vol. 10(6), pages 1527-1541, December.
    5. Mohamed Ibrahim & Khaoula Aidi & M. Masoom Ali & Haitham M. Yousof, 2023. "A Novel Test Statistic for Right Censored Validity under a new Chen extension with Applications in Reliability and Medicine," Annals of Data Science, Springer, vol. 10(5), pages 1285-1299, October.
    6. Vojo Lakovic, 2020. "Modeling of Entrepreneurship Activity Crisis Management by Support Vector Machine," Annals of Data Science, Springer, vol. 7(4), pages 629-638, December.
    7. Anurag Barthwal & Kritika Sharma, 2022. "Analysis and prediction of urban ambient and surface temperatures using internet of things," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(1), pages 516-532, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Schlereth, Christian & Stepanchuk, Tanja & Skiera, Bernd, 2010. "Optimization and analysis of the profitability of tariff structures with two-part tariffs," European Journal of Operational Research, Elsevier, vol. 206(3), pages 691-701, November.
    2. Deac Dan Stelian & Schebesch Klaus Bruno, 2018. "Market Forecasts and Client Behavioral Data: Towards Finding Adequate Model Complexity," Studia Universitatis „Vasile Goldis” Arad – Economics Series, Sciendo, vol. 28(3), pages 50-75, September.
    3. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    4. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    5. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    6. Fabrizio De Caro & Amedeo Andreotti & Rodolfo Araneo & Massimo Panella & Antonello Rosato & Alfredo Vaccaro & Domenico Villacci, 2020. "A Review of the Enabling Methodologies for Knowledge Discovery from Smart Grids Data," Energies, MDPI, vol. 13(24), pages 1-25, December.
    7. Huaijun Wang & Ruomeng Ke & Junhuai Li & Yang An & Kan Wang & Lei Yu, 2018. "A correlation-based binary particle swarm optimization method for feature selection in human activity recognition," International Journal of Distributed Sensor Networks, vol. 14(4), pages 15501477187, April.
    8. Fouskakis, D., 2012. "Bayesian variable selection in generalized linear models using a combination of stochastic optimization methods," European Journal of Operational Research, Elsevier, vol. 220(2), pages 414-422.
    9. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    10. Kamyab Karimi & Ali Ghodratnama & Reza Tavakkoli-Moghaddam, 2023. "Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: a comprehensive analysis," Annals of Operations Research, Springer, vol. 328(1), pages 665-700, September.
    11. Miyashiro, Ryuhei & Takano, Yuichi, 2015. "Mixed integer second-order cone programming formulations for variable selection in linear regression," European Journal of Operational Research, Elsevier, vol. 247(3), pages 721-731.
    12. Dimitar Haralampiev Popov, 2022. "SME Viability Assessment Methodology: Combining Altman's Z-Score with Big Data," Bulgarian Economic Papers bep-2022-04, Faculty of Economics and Business Administration, Sofia University St Kliment Ohridski - Bulgaria // Center for Economic Theories and Policies at Sofia University St Kliment Ohridski, revised Jun 2022.
    13. Sadrani, Mohammad & Tirachini, Alejandro & Antoniou, Constantinos, 2022. "Vehicle dispatching plan for minimizing passenger waiting time in a corridor with buses of different sizes: Model formulation and solution approaches," European Journal of Operational Research, Elsevier, vol. 299(1), pages 263-282.
    14. Meisel, Stephan & Mattfeld, Dirk, 2010. "Synergies of Operations Research and Data Mining," European Journal of Operational Research, Elsevier, vol. 206(1), pages 1-10, October.
    15. Paul Pichler, 2007. "On the accuracy of low-order projection methods," Economics Bulletin, AccessEcon, vol. 3(50), pages 1-8.
    16. Ahmed, Abdulaziz & Topuz, Kazim & Moqbel, Murad & Abdulrashid, Ismail, 2024. "What makes accidents severe! explainable analytics framework with parameter optimization," European Journal of Operational Research, Elsevier, vol. 317(2), pages 425-436.
    17. Paz, Alexander & Arteaga, Cristian & Cobos, Carlos, 2019. "Specification of mixed logit models assisted by an optimization framework," Journal of choice modelling, Elsevier, vol. 30(C), pages 50-60.
    18. Bertolazzi, P. & Felici, G. & Festa, P. & Fiscon, G. & Weitschek, E., 2016. "Integer programming models for feature selection: New extensions and a randomized solution algorithm," European Journal of Operational Research, Elsevier, vol. 250(2), pages 389-399.
    19. Pacheco, Joaquín & Casado, Silvia & Núñez, Laura, 2009. "A variable selection method based on Tabu search for logistic regression models," European Journal of Operational Research, Elsevier, vol. 199(2), pages 506-511, December.
    20. Ivan Miguel Pires & Faisal Hussain & Nuno M. Garcia & Petre Lameski & Eftim Zdravevski, 2020. "Homogeneous Data Normalization and Deep Learning: A Case Study in Human Activity Classification," Future Internet, MDPI, vol. 12(11), pages 1-14, November.
