Sensitivity Assessing to Data Volume for forecasting: introducing similarity methods as a suitable one in Feature selection methods

Author

Listed:
  • Mahdi Goldani
  • Soraya Asadi Tirvan

Abstract

In predictive modeling, overfitting poses a significant risk, particularly when the feature count surpasses the number of observations, a common scenario in high-dimensional data sets. To mitigate this risk, feature selection is employed to enhance model generalizability by reducing the dimensionality of the data. This study focuses on evaluating the stability of feature selection techniques with respect to varying data volumes, particularly employing time series similarity methods. Utilizing a comprehensive dataset that includes the closing, opening, high, and low prices of stocks from 100 high-income companies listed in the Fortune Global 500, this research compares several feature selection methods including variance thresholds, edit distance, and Hausdorff distance metrics. The aim is to identify methods that show minimal sensitivity to the quantity of data, ensuring robustness and reliability in predictions, which is crucial for financial forecasting. Results indicate that among the tested feature selection strategies, the variance method, edit distance, and Hausdorff methods exhibit the least sensitivity to changes in data volume. These methods therefore provide a dependable approach to reducing feature space without significantly compromising the predictive accuracy. This study not only highlights the effectiveness of time series similarity methods in feature selection but also underlines their potential in applications involving fluctuating datasets, such as financial markets or dynamic economic conditions. The findings advocate for their use as principal methods for robust feature selection in predictive analytics frameworks.
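The idea of similarity-based selection and its sensitivity to data volume can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each series is treated as a set of (time index, value) points, that features are ranked by their Hausdorff distance to the target series, and that stability is measured as the overlap between the features selected on the full data and on a truncated window. The function names (`variance`, `hausdorff`, `select_by_similarity`, `selection_overlap`) are hypothetical, and edit distance is omitted for brevity.

```python
from math import hypot


def variance(xs):
    """Population variance of a sequence of numbers (basis of a variance threshold)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two series.

    Assumption: each series is viewed as a set of (index, value) points.
    """
    pa = list(enumerate(a))
    pb = list(enumerate(b))

    def directed(p, q):
        # Largest distance from any point in p to its nearest point in q.
        return max(min(hypot(i - j, x - y) for j, y in q) for i, x in p)

    return max(directed(pa, pb), directed(pb, pa))


def select_by_similarity(features, target, k):
    """Return the names of the k feature series closest to the target."""
    ranked = sorted(features, key=lambda name: hausdorff(features[name], target))
    return ranked[:k]


def selection_overlap(features, target, k, n_rows):
    """Share of the full-data selection that is also chosen when only
    the last n_rows observations are available."""
    full = set(select_by_similarity(features, target, k))
    truncated = {name: series[-n_rows:] for name, series in features.items()}
    subset = set(select_by_similarity(truncated, target[-n_rows:], k))
    return len(full & subset) / k


# Toy data, invented purely for illustration.
features = {
    "a": [1, 2, 3, 4],
    "b": [10, 10, 10, 10],
    "c": [1, 2, 3, 5],
}
target = [1, 2, 3, 4]

print(select_by_similarity(features, target, k=2))
print(selection_overlap(features, target, k=2, n_rows=2))
```

An overlap close to 1 across many window sizes would suggest that a selection method is insensitive to data volume in the sense described above. An overlap that drops quickly as the window shrinks would suggest the opposite.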

Suggested Citation

  • Mahdi Goldani & Soraya Asadi Tirvan, 2024. "Sensitivity Assessing to Data Volume for forecasting: introducing similarity methods as a suitable one in Feature selection methods," Papers 2406.04390, arXiv.org.
  • Handle: RePEc:arx:papers:2406.04390

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2406.04390
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    2. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    3. Jonathan C. M. Wan & Dennis Stephens & Lingqi Luo & James R. White & Caitlin M. Stewart & Benoît Rousseau & Dana W. Y. Tsui & Luis A. Diaz, 2022. "Genome-wide mutational signatures in low-coverage whole genome sequencing of cell-free DNA," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    4. Sinha, Shruti & Sankar Rao, Chinta & Kumar, Abhishankar & Venkata Surya, Dadi & Basak, Tanmay, 2024. "Exploring and understanding the microwave-assisted pyrolysis of waste lignocellulose biomass using gradient boosting regression machine learning model," Renewable Energy, Elsevier, vol. 231(C).
    5. Reza Rezaee & Jamiu Ekundayo, 2022. "Permeability Prediction Using Machine Learning Methods for the CO2 Injectivity of the Precipice Sandstone in Surat Basin, Australia," Energies, MDPI, vol. 15(6), pages 1-15, March.
    6. Nica-Avram, Georgiana & Harvey, John & Smith, Gavin & Smith, Andrew & Goulding, James, 2021. "Identifying food insecurity in food sharing networks via machine learning," Journal of Business Research, Elsevier, vol. 131(C), pages 469-484.
    7. Kristof Lommers & Ouns El Harzli & Jack Kim, 2021. "Confronting Machine Learning With Financial Research," Papers 2103.00366, arXiv.org, revised Mar 2021.
    8. Carlo Dindorf & Eva Bartaguiz & Freya Gassmann & Michael Fröhlich, 2022. "Conceptual Structure and Current Trends in Artificial Intelligence, Machine Learning, and Deep Learning Research in Sports: A Bibliometric Review," IJERPH, MDPI, vol. 20(1), pages 1-23, December.
    9. Zhou, Huanyu & Qiu, Yingning & Feng, Yanhui & Liu, Jing, 2022. "Power prediction of wind turbine in the wake using hybrid physical process and machine learning models," Renewable Energy, Elsevier, vol. 198(C), pages 568-586.
    10. Bhattacharjee, Biplab & Kumar, Rajiv & Senthilkumar, Arunachalam, 2022. "Unidirectional and bidirectional LSTM models for edge weight predictions in dynamic cross-market equity networks," International Review of Financial Analysis, Elsevier, vol. 84(C).
    11. Qianru Qi & Rongjun Cheng & Hongxia Ge, 2022. "Short-Term Travel Demand Prediction of Online Ride-Hailing Based on Multi-Factor GRU Model," Sustainability, MDPI, vol. 14(7), pages 1-15, March.
    12. Shravankumar Shivappa Masalvad & Chidanand Patil & Akkaram Pravalika & Basavaraj Katageri & Purandara Bekal & Prashant Patil & Nagraj Hegde & Uttam Kumar Sahoo & Praveen Kumar Sakare, 2024. "Application of geospatial technology for the land use/land cover change assessment and future change predictions using CA Markov chain model," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 26(10), pages 24817-24842, October.
    13. Qiaoyang Li & Guiming Chen, 2021. "Recognition of industrial machine parts based on transfer learning with convolutional neural network," PLOS ONE, Public Library of Science, vol. 16(1), pages 1-21, January.
    14. Giannakeas, Ilias N. & Mazaheri, Fatemeh & Bacarreza, Omar & Khodaei, Zahra Sharif & Aliabadi, Ferri M.H., 2023. "Probabilistic residual strength assessment of smart composite aircraft panels using guided waves," Reliability Engineering and System Safety, Elsevier, vol. 237(C).
    15. Francisco Gatica-Neira & Mario Ramos-Maldonado, 2022. "Limits to the Productivity in Biobased Territorial SMEs," SAGE Open, vol. 12(2), pages 21582440221, May.
    16. Min Yang & Baiyu Zhang & Yifu Chen & Xiaying Xin & Kenneth Lee & Bing Chen, 2021. "Impact of Microplastics on Oil Dispersion Efficiency in the Marine Environment," Sustainability, MDPI, vol. 13(24), pages 1-13, December.
    17. Michael D. Wang & Jie Lou & Dong Zhang & C. Simon Fan, 2022. "Measuring political and economic uncertainty: a supervised computational linguistic approach," SN Business & Economics, Springer, vol. 2(5), pages 1-17, May.
    18. Twumasi, Clement & Twumasi, Juliet, 2022. "Machine learning algorithms for forecasting and backcasting blood demand data with missing values and outliers: A study of Tema General Hospital of Ghana," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1258-1277.
