Printed from https://ideas.repec.org/a/gam/jmathe/v13y2025i3p347-d1573432.html

Major Issues in High-Frequency Financial Data Analysis: A Survey of Solutions

Authors
  • Lu Zhang

    (Department of Statistics & Actuarial Science, Northern Illinois University, DeKalb, IL 60115, USA)

  • Lei Hua

    (Department of Statistics & Actuarial Science, Northern Illinois University, DeKalb, IL 60115, USA)

Abstract

We review recent articles that focus on the main issues identified in high-frequency financial data analysis. The issues to be addressed include nonstationarity, low signal-to-noise ratios, asynchronous data, imbalanced data, and intraday seasonality. We focus on the research articles and survey papers published since 2020 on recent developments and new ideas that address the issues, while commonly used approaches in the literature are also reviewed. The methods for addressing the issues are mainly classified into two groups: data preprocessing methods and quantitative methods. The latter include various statistical, econometric, and machine learning methods. We also provide easy-to-read charts and tables to summarize all the surveyed methods and articles.

Suggested Citation

  • Lu Zhang & Lei Hua, 2025. "Major Issues in High-Frequency Financial Data Analysis: A Survey of Solutions," Mathematics, MDPI, vol. 13(3), pages 1-40, January.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:3:p:347-:d:1573432

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/3/347/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/3/347/
    Download Restriction: no

    References listed on IDEAS

    1. Jianqing Fan & Yingying Li & Ke Yu, 2012. "Vast Volatility Matrix Estimation Using High-Frequency Data for Portfolio Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 412-428, March.
    2. Ao Kong & Hongliang Zhu & Robert Azencott, 2021. "Predicting intraday jumps in stock prices using liquidity measures and technical indicators," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(3), pages 416-438, April.
    3. Sasha Stoikov, 2018. "The micro-price: a high-frequency estimator of future prices," Quantitative Finance, Taylor & Francis Journals, vol. 18(12), pages 1959-1966, December.
    4. Tao, Minjing & Wang, Yazhen & Yao, Qiwei & Zou, Jian, 2011. "Large Volatility Matrix Inference via Combining Low-Frequency and High-Frequency Approaches," Journal of the American Statistical Association, American Statistical Association, vol. 106(495), pages 1025-1040.
    5. Liu, Yuanyuan & Niu, Zibo & Suleman, Muhammad Tahir & Yin, Libo & Zhang, Hongwei, 2022. "Forecasting the volatility of crude oil futures: The role of oil investor attention and its regime switching characteristics under a high-frequency framework," Energy, Elsevier, vol. 238(PA).
    6. Barndorff-Nielsen, Ole E. & Hansen, Peter Reinhard & Lunde, Asger & Shephard, Neil, 2011. "Multivariate realised kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading," Journal of Econometrics, Elsevier, vol. 162(2), pages 149-169, June.
    7. Li, Xuemei & Liu, Xiaoxing, 2023. "Functional classification and dynamic prediction of cumulative intraday returns in crude oil futures," Energy, Elsevier, vol. 284(C).
    8. Brownlees, C.T. & Gallo, G.M., 2006. "Financial econometric analysis at ultra-high frequency: Data handling concerns," Computational Statistics & Data Analysis, Elsevier, vol. 51(4), pages 2232-2245, December.
    9. Ben Omrane, Walid & de Bodt, Eric, 2007. "Using self-organizing maps to adjust for intra-day seasonality," Journal of Banking & Finance, Elsevier, vol. 31(6), pages 1817-1838, June.
    10. Niu, Zibo & Demirer, Riza & Suleman, Muhammad Tahir & Zhang, Hongwei & Zhu, Xuehong, 2024. "Do industries predict stock market volatility? Evidence from machine learning models," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 90(C).
    11. Sun, Edward W. & Meinl, Thomas, 2012. "A new wavelet-based denoising algorithm for high-frequency financial data mining," European Journal of Operational Research, Elsevier, vol. 217(3), pages 589-599.
    12. Gençay, Ramazan & Selçuk, Faruk & Whitcher, Brandon, 2001. "Differentiating intraday seasonalities through wavelet multi-scaling," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 289(3), pages 543-556.
    13. Takuya Shintate & Lukáš Pichl, 2019. "Trend Prediction Classification for High Frequency Bitcoin Time Series with Deep Learning," JRFM, MDPI, vol. 12(1), pages 1-15, January.
    14. Chen, Dachuan & Mykland, Per A. & Zhang, Lan, 2024. "Realized regression with asynchronous and noisy high frequency and high dimensional data," Journal of Econometrics, Elsevier, vol. 239(2).
    15. Lahmiri, Salim & Bekiros, Stelios, 2020. "Intelligent forecasting with machine learning trading systems in chaotic intraday Bitcoin market," Chaos, Solitons & Fractals, Elsevier, vol. 133(C).
    16. Wei Bao & Jun Yue & Yulei Rao, 2017. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-24, July.
    17. Ben Omrane, Walid & de Bodt, Eric, 2007. "Using self-organizing maps to adjust for intra-day seasonality," LIDAM Reprints CORE 1959, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    18. Chengyu Li & Luyi Shen & Guoqi Qian, 2023. "Online Hybrid Neural Network for Stock Price Prediction: A Case Study of High-Frequency Stock Trading in the Chinese Market," Econometrics, MDPI, vol. 11(2), pages 1-19, May.
    19. deB. Harris, Frederick H. & McInish, Thomas H. & Shoesmith, Gary L. & Wood, Robert A., 1995. "Cointegration, Error Correction, and Price Discovery on Informationally Linked Security Markets," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 30(4), pages 563-579, December.
    20. Zhang, Lan, 2011. "Estimating covariation: Epps effect, microstructure noise," Journal of Econometrics, Elsevier, vol. 160(1), pages 33-47, January.
    21. Christensen, Kim & Kinnebrock, Silja & Podolskij, Mark, 2010. "Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data," Journal of Econometrics, Elsevier, vol. 159(1), pages 116-133, November.
    22. Zefan Dong & Yonghui Zhou, 2024. "A Novel Hybrid Model for Financial Forecasting Based on CEEMDAN-SE and ARIMA-CNN-LSTM," Mathematics, MDPI, vol. 12(16), pages 1-16, August.
    23. Han Lin Shang & Kaiying Ji, 2023. "Forecasting intraday financial time series with sieve bootstrapping and dynamic updating," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(8), pages 1973-1988, December.
    24. Kraaijeveld, Olivier & De Smedt, Johannes, 2020. "The predictive power of public Twitter sentiment for forecasting cryptocurrency prices," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 65(C).
    25. Hussain, Syed Mujahid & Ben Omrane, Walid & Al-Yahyaee, Khamis, 2020. "US macroeconomic news effects around the US and European financial crises: Evidence from Brazilian and Mexican equity indices," Global Finance Journal, Elsevier, vol. 46(C).
    26. Shephard, Neil & Xiu, Dacheng, 2017. "Econometric analysis of multivariate realised QML: Estimation of the covariation of equity prices under asynchronous trading," Journal of Econometrics, Elsevier, vol. 201(1), pages 19-42.
    27. Wang, Jiqian & Huang, Yisu & Ma, Feng & Chevallier, Julien, 2020. "Does high-frequency crude oil futures data contain useful information for predicting volatility in the US stock market? New evidence," Energy Economics, Elsevier, vol. 91(C).
    28. Aït-Sahalia, Yacine & Fan, Jianqing & Xiu, Dacheng, 2010. "High-Frequency Covariance Estimates With Noisy and Asynchronous Financial Data," Journal of the American Statistical Association, American Statistical Association, vol. 105(492), pages 1504-1517.
    29. Etaf Alshawarbeh & Alanazi Talal Abdulrahman & Eslam Hussam, 2023. "Statistical Modeling of High Frequency Datasets Using the ARIMA-ANN Hybrid," Mathematics, MDPI, vol. 11(22), pages 1-17, November.
    30. Dawei Liang & Yue Xu & Yan Hu & Qianqian Du, 2023. "Intraday Return Forecasts and High-Frequency Trading of Stock Index Futures: A Hybrid Wavelet-Deep Learning Approach," Emerging Markets Finance and Trade, Taylor & Francis Journals, vol. 59(7), pages 2118-2128, May.
    31. Dixon, Matthew & Klabjan, Diego & Bang, Jin Hoon, 2017. "Classification-based financial markets prediction using deep neural networks," Algorithmic Finance, IOS Press, vol. 6(3-4), pages 67-77.
    32. Alec N. Kercheval & Yuan Zhang, 2015. "Modelling high-frequency limit order book dynamics with support vector machines," Quantitative Finance, Taylor & Francis Journals, vol. 15(8), pages 1315-1329, August.
    33. Tao, Minjing & Wang, Yazhen & Yao, Qiwei & Zou, Jian, 2011. "Large volatility matrix inference via combining low-frequency and high-frequency approaches," LSE Research Online Documents on Economics 39321, London School of Economics and Political Science, LSE Library.
    34. Arnab Chakrabarti & Rituparna Sen, 2023. "Copula Estimation for Nonsynchronous Financial Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 116-149, May.
    35. Huang, Wenyang & Gao, Tianxiao & Hao, Yun & Wang, Xiuqing, 2023. "Transformer-based forecasting for intraday trading in the Shanghai crude oil market: Analyzing open-high-low-close prices," Energy Economics, Elsevier, vol. 127(PA).
    36. Bordignon, Silvano & Caporin, Massimiliano & Lisi, Francesco, 2007. "Generalised long-memory GARCH models for intra-daily volatility," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5900-5912, August.
    37. Hui Qu & Yu Zhang, 2016. "A New Kernel of Support Vector Regression for Forecasting High-Frequency Stock Returns," Mathematical Problems in Engineering, Hindawi, vol. 2016, pages 1-9, April.
    38. Dai, Chaoxing & Lu, Kun & Xiu, Dacheng, 2019. "Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data," Journal of Econometrics, Elsevier, vol. 208(1), pages 43-79.
    39. O. E. Barndorff-Nielsen & P. Reinhard Hansen & A. Lunde & N. Shephard, 2009. "Realized kernels in practice: trades and quotes," Econometrics Journal, Royal Economic Society, vol. 12(3), pages 1-32, November.
    40. Pham, Manh Cuong & Anderson, Heather Margot & Duong, Huu Nhan & Lajbcygier, Paul, 2020. "The effects of trade size and market depth on immediate price impact in a limit order book market," Journal of Economic Dynamics and Control, Elsevier, vol. 120(C).
    41. repec:hal:journl:peer-00732537 is not listed on IDEAS
    42. Jiqian Wang & Yisu Huang & Feng Ma & Julien Chevallier, 2020. "Does high-frequency crude oil futures data contain useful information for predicting volatility in the US stock market? New evidence," Post-Print halshs-04250251, HAL.
    43. Yacine Aït-Sahalia & Dacheng Xiu, 2019. "Principal Component Analysis of High-Frequency Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(525), pages 287-303, January.
    44. Chen Tang & Yanlin Shi, 2021. "Forecasting High-Dimensional Financial Functional Time Series: An Application to Constituent Stocks in Dow Jones Index," JRFM, MDPI, vol. 14(8), pages 1-13, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lam, Clifford & Feng, Phoenix, 2018. "A nonparametric eigenvalue-regularized integrated covariance matrix estimator for asset return data," Journal of Econometrics, Elsevier, vol. 206(1), pages 226-257.
    2. Bu, R. & Li, D. & Linton, O. & Wang, H., 2022. "Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data," Cambridge Working Papers in Economics 2218, Faculty of Economics, University of Cambridge.
    3. Xinyu Song, 2019. "Large Volatility Matrix Prediction with High-Frequency Data," Papers 1907.01196, arXiv.org, revised Sep 2019.
    4. Boudt, Kris & Laurent, Sébastien & Lunde, Asger & Quaedvlieg, Rogier & Sauri, Orimar, 2017. "Positive semidefinite integrated covariance estimation, factorizations and asynchronicity," Journal of Econometrics, Elsevier, vol. 196(2), pages 347-367.
    5. Kim, Donggyu & Wang, Yazhen & Zou, Jian, 2016. "Asymptotic theory for large volatility matrix estimation based on high-frequency financial data," Stochastic Processes and their Applications, Elsevier, vol. 126(11), pages 3527-3577.
    6. Kim, Donggyu & Song, Xinyu & Wang, Yazhen, 2022. "Unified discrete-time factor stochastic volatility and continuous-time Itô models for combining inference based on low-frequency and high-frequency," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    7. Dai, Chaoxing & Lu, Kun & Xiu, Dacheng, 2019. "Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data," Journal of Econometrics, Elsevier, vol. 208(1), pages 43-79.
    8. Cipollini, Fabrizio & Gallo, Giampiero M. & Palandri, Alessandro, 2021. "A dynamic conditional approach to forecasting portfolio weights," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1111-1126.
    9. Lam, Clifford & Feng, Phoenix, 2018. "A nonparametric eigenvalue-regularized integrated covariance matrix estimator for asset return data," LSE Research Online Documents on Economics 88375, London School of Economics and Political Science, LSE Library.
    10. Grønborg, Niels S. & Lunde, Asger & Olesen, Kasper V. & Vander Elst, Harry, 2022. "Realizing correlations across asset classes," Journal of Financial Markets, Elsevier, vol. 59(PA).
    11. Boudt, Kris & Dragun, Kirill & Sauri, Orimar & Vanduffel, Steven, 2023. "ETF Basket-Adjusted Covariance estimation," Journal of Econometrics, Elsevier, vol. 235(2), pages 1144-1171.
    12. Cai, T. Tony & Hu, Jianchang & Li, Yingying & Zheng, Xinghua, 2020. "High-dimensional minimum variance portfolio estimation based on high-frequency data," Journal of Econometrics, Elsevier, vol. 214(2), pages 482-494.
    13. Liu, Cheng & Tang, Cheng Yong, 2014. "A quasi-maximum likelihood approach for integrated covariance matrix estimation with high frequency data," Journal of Econometrics, Elsevier, vol. 180(2), pages 217-232.
    14. Donggyu Kim & Minseok Shin, 2024. "Nonconvex High-Dimensional Time-Varying Coefficient Estimation for Noisy High-Frequency Observations with a Factor Structure," Working Papers 202418, University of California at Riverside, Department of Economics.
    15. Philip L. H. Yu & W. K. Li & F. C. Ng, 2017. "The Generalized Conditional Autoregressive Wishart Model for Multivariate Realized Volatility," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 35(4), pages 513-527, October.
    16. Fabrizio Cipollini & Giampiero Gallo & Alessandro Palandri, 2020. "A Dynamic Conditional Approach to Portfolio Weights Forecasting," Econometrics Working Papers Archive 2020_06, Universita' degli Studi di Firenze, Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti".
    17. Li, Yifan & Nolte, Ingmar & Vasios, Michalis & Voev, Valeri & Xu, Qi, 2022. "Weighted Least Squares Realized Covariation Estimation," Journal of Banking & Finance, Elsevier, vol. 137(C).
    18. Shen, Keren & Yao, Jianfeng & Li, Wai Keung, 2019. "On a spiked model for large volatility matrix estimation from noisy high-frequency data," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 207-221.
    19. Donggyu Kim & Minseog Oh, 2024. "Dynamic Realized Minimum Variance Portfolio Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(4), pages 1238-1249, October.
    20. Aït-Sahalia, Yacine & Xiu, Dacheng, 2017. "Using principal component analysis to estimate a high dimensional factor model with high-frequency data," Journal of Econometrics, Elsevier, vol. 201(2), pages 384-399.
