
On the use of random forest for two-sample testing

Author

Listed:
  • Hediger, Simon
  • Michel, Loris
  • Näf, Jeffrey

Abstract

Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable to any distribution on R^d. Furthermore, the built-in variable importance measure of the Random Forest gives potential insights into which variables account for the difference in distribution. An asymptotic power analysis for the proposed tests is conducted. Finally, two real-world applications illustrate the usefulness of the introduced methodology. To simplify the use of the method, the R-package “hypoRF” is provided.
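
The general idea behind classification-based two-sample testing can be sketched as follows: label the observations from the two samples 0 and 1, train a classifier on part of the pooled data, and check whether its held-out accuracy is significantly better than the 1/2 expected when both samples come from the same distribution. The sketch below is an illustration under stated assumptions, not the procedure from the paper or the interface of “hypoRF”: to stay dependency-free it uses a nearest-centroid classifier as a stand-in for the Random Forest, assumes equally sized samples, and treats the number of correct held-out predictions as approximately Binomial(n, 1/2) under the null. All function names are hypothetical.

```python
import math
import random


def fit_centroids(rows, labels):
    """Stand-in classifier (NOT a Random Forest): one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        members = [r for r, t in zip(rows, labels) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Assign a row to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p); intended for moderate n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def split_half(sample, rng):
    """Shuffle a sample and split it into a training and a test half."""
    rows = list(sample)
    rng.shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]


def classifier_two_sample_test(sample_p, sample_q, seed=0):
    """Return (held-out accuracy, approximate one-sided p-value)."""
    rng = random.Random(seed)
    train_p, test_p = split_half(sample_p, rng)
    train_q, test_q = split_half(sample_q, rng)
    centroids = fit_centroids(
        train_p + train_q, [0] * len(train_p) + [1] * len(train_q)
    )
    test_rows = test_p + test_q
    test_labels = [0] * len(test_p) + [1] * len(test_q)
    correct = sum(predict(centroids, r) == t for r, t in zip(test_rows, test_labels))
    # Large accuracy is evidence against "same distribution".
    return correct / len(test_rows), binomial_upper_tail(correct, len(test_rows))


def gaussian_sample(n, dim, shift, rng):
    """Hypothetical toy data: n points in R^dim with mean `shift` per coordinate."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    base = gaussian_sample(100, 2, 0.0, rng)
    shifted = gaussian_sample(100, 2, 3.0, rng)
    print(classifier_two_sample_test(base, shifted))
```

A nearest-centroid rule only detects differences in means; a more flexible classifier such as a Random Forest is what allows a test of this kind to pick up more general differences between distributions.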

Suggested Citation

  • Hediger, Simon & Michel, Loris & Näf, Jeffrey, 2022. "On the use of random forest for two-sample testing," Computational Statistics & Data Analysis, Elsevier, vol. 170(C).
  • Handle: RePEc:eee:csdana:v:170:y:2022:i:c:s0167947322000159
    DOI: 10.1016/j.csda.2022.107435

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322000159
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107435?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations



    Cited by:

    1. Zacharia Issa & Blanka Horvath, 2023. "Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures," Papers 2306.15835, arXiv.org.
    2. Yilin Zhao & Feng He & Ying Feng, 2022. "Research on the Current Situation of Employment Mobility and Retention Rate Predictions of “Double First-Class” University Graduates Based on the Random Forest and BP Neural Network Models," Sustainability, MDPI, vol. 14(14), pages 1-22, July.
    3. Soukarieh, Inass & Bouzebda, Salim, 2023. "Renewal type bootstrap for increasing degree U-process of a Markov chain," Journal of Multivariate Analysis, Elsevier, vol. 195(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tran, Vu Le, 2023. "Sentiment and covariance characteristics," International Review of Financial Analysis, Elsevier, vol. 86(C).
    2. De Nard, Gianluca & Zhao, Zhao, 2022. "A large-dimensional test for cross-sectional anomalies: Efficient sorting revisited," International Review of Economics & Finance, Elsevier, vol. 80(C), pages 654-676.
    3. Weichuan Deng & Pawel Polak & Abolfazl Safikhani & Ronakdilip Shah, 2023. "A Unified Framework for Fast Large-Scale Portfolio Optimization," Papers 2303.12751, arXiv.org, revised Nov 2023.
    4. Jiaju Miao & Pawel Polak, 2023. "Online Ensemble of Models for Optimal Predictive Performance with Applications to Sector Rotation Strategy," Papers 2304.09947, arXiv.org.
    5. Bui, Dien Giau & Kong, De-Rong & Lin, Chih-Yung & Lin, Tse-Chun, 2023. "Momentum in machine learning: Evidence from the Taiwan stock market," Pacific-Basin Finance Journal, Elsevier, vol. 82(C).
    6. Gianluca De Nard & Simon Hediger & Markus Leippold, 2022. "Subsampled factor models for asset pricing: The rise of Vasa," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1217-1247, September.
    7. Hoang, Khoa & Cannavan, Damien & Gaunt, Clive & Huang, Ronghong, 2019. "Is that factor just lucky? Australian evidence," Pacific-Basin Finance Journal, Elsevier, vol. 57(C).
    8. Geertsema, Paul & Lu, Helen, 2020. "The correlation structure of anomaly strategies," Journal of Banking & Finance, Elsevier, vol. 119(C).
    9. Andrew Y. Chen & Tom Zimmermann, 2022. "Open Source Cross-Sectional Asset Pricing," Critical Finance Review, now publishers, vol. 11(2), pages 207-264, May.
    10. Hou, Kewei & Xue, Chen & Zhang, Lu, 2017. "Replicating Anomalies," Working Paper Series 2017-10, Ohio State University, Charles A. Dice Center for Research in Financial Economics.
    11. Guanhao Feng & Stefano Giglio & Dacheng Xiu, 2020. "Taming the Factor Zoo: A Test of New Factors," Journal of Finance, American Finance Association, vol. 75(3), pages 1327-1370, June.
    12. Hoang, Khoa & Huang, Ronghong & Truong, Helen, 2023. "Resurrecting the market factor: A case of data mining across international markets," Pacific-Basin Finance Journal, Elsevier, vol. 82(C).
    13. Kristoffer Pons Bertelsen, 2022. "The Prior Adaptive Group Lasso and the Factor Zoo," CREATES Research Papers 2022-05, Department of Economics and Business Economics, Aarhus University.
    14. Tobek, Ondrej & Hronec, Martin, 2021. "Does it pay to follow anomalies research? Machine learning approach with international evidence," Journal of Financial Markets, Elsevier, vol. 56(C).
    15. Yu-Chin Hsu & Hsiou-Wei Lin & Kendro Vincent, 2017. "Do Cross-Sectional Stock Return Predictors Pass the Test without Data-Snooping Bias?," IEAS Working Paper : academic research 17-A003, Institute of Economics, Academia Sinica, Taipei, Taiwan.
    16. Jozef Barunik & Martin Hronec & Ondrej Tobek, 2024. "Predicting the distributions of stock returns around the globe in the era of big data and learning," Papers 2408.07497, arXiv.org.
    17. Wang, Feifei & Yan, Xuemin Sterling, 2021. "Downside risk and the performance of volatility-managed portfolios," Journal of Banking & Finance, Elsevier, vol. 131(C).
    18. Cederburg, Scott & O’Doherty, Michael S. & Wang, Feifei & Yan, Xuemin (Sterling), 2020. "On the performance of volatility-managed portfolios," Journal of Financial Economics, Elsevier, vol. 138(1), pages 95-117.
    19. Vincent, Kendro & Hsu, Yu-Chin & Lin, Hsiou-Wei, 2021. "Investment styles and the multiple testing of cross-sectional stock return predictability," Journal of Financial Markets, Elsevier, vol. 56(C).
    20. Kewei Hou & Haitao Mo & Chen Xue & Lu Zhang, 2019. "Which Factors?," Review of Finance, European Finance Association, vol. 23(1), pages 1-35.
