Printed from https://ideas.repec.org/p/gwc/wpaper/2008-008.html

An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Author

  • Antony Davies

    (Department of Economics, Duquesne University; The Mercatus Center, George Mason University)

Abstract

Regression analysis is intended for testing a given hypothesis against a data set. Unfortunately, in many applications it is either impossible to specify a hypothesis, typically because the research is at a very early stage, or undesirable to form one, typically because the number of potential explanatory variables is very large. In these cases, researchers have resorted either to overt data mining techniques, such as stepwise regression, or to covert ones, such as running variations of a regression model before running the final model (also known as “data peeking”). While data mining side-steps the need to form a hypothesis, it is highly susceptible to generating spurious results. This paper draws on the known properties of OLS estimators in models with omitted and extraneous variables to propose a data mining procedure that attempts to distinguish parameter estimates that are significant because of an underlying structural relationship from those that are significant by random chance.
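The warning that data mining is highly susceptible to spurious results can be illustrated with a small simulation. The sketch below is not the paper's proposed procedure; it is a hypothetical illustration that assumes Gaussian noise data, OLS with an intercept, and a two-sided 5% test approximated by the normal critical value |t| > 1.96. The helper names `ols_t_stats` and `all_subsets_search` are invented for this example. Every candidate regressor is generated independently of `y`, so any coefficient the exhaustive (all-subsets) search flags as significant is a false positive.

```python
"""Hypothetical sketch: exhaustive all-subsets OLS search on pure noise.

Assumptions (not from the paper): Gaussian noise data, OLS with an
intercept, and a 5% two-sided test approximated by |t| > 1.96.
"""
from itertools import combinations

import numpy as np


def ols_t_stats(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return OLS coefficients and t-statistics; the intercept comes first."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares estimates
    resid = y - Z @ beta
    dof = n - Z.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / dof                  # error-variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)         # classical OLS covariance
    return beta, beta / np.sqrt(np.diag(cov))


def all_subsets_search(X: np.ndarray, y: np.ndarray, crit: float = 1.96):
    """Fit every non-empty subset of columns and count 'significant' hits."""
    p = X.shape[1]
    hits = np.zeros(p, dtype=int)  # models in which each regressor looked significant
    models_with_hit = 0            # models with at least one significant regressor
    n_models = 0
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            cols = list(subset)
            _, t = ols_t_stats(X[:, cols], y)
            sig = np.abs(t[1:]) > crit             # ignore the intercept
            hits[cols] += sig
            models_with_hit += bool(sig.any())
            n_models += 1
    return n_models, models_with_hit, hits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 10
    X = rng.standard_normal((n, p))  # candidate regressors: pure noise
    y = rng.standard_normal(n)       # outcome unrelated to every column of X
    n_models, models_with_hit, hits = all_subsets_search(X, y)
    print(f"models fitted: {n_models}")
    print(f"models with at least one |t| > {1.96}: {models_with_hit}")
    print(f"regressors significant in some model: {(hits > 0).sum()} of {p}")
```

With 10 candidate regressors the search fits 2^10 - 1 = 1023 models, and each included coefficient is tested at roughly the 5% level in every one of them, so some noise regressor is likely to clear the threshold in at least one specification. The number of models doubles with each additional candidate regressor, which is why exhaustive search quickly calls for large-scale computation.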

Suggested Citation

  • Antony Davies, 2008. "An Exploration of Regression-Based Data Mining Techniques Using Super Computation," Working Papers 2008-008, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.
  • Handle: RePEc:gwc:wpaper:2008-008

    Download full text from publisher

    File URL: https://www2.gwu.edu/~forcpgm/2008-008.pdf
    File Function: First version, 2008
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Campbell, Randall C. & Nagel, Gregory L., 2016. "Private information and limitations of Heckman's estimator in banking and corporate finance research," Journal of Empirical Finance, Elsevier, vol. 37(C), pages 186-195.
    2. Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," Review of Economics of the Household, Springer, vol. 17(1), pages 349-376, March.
    3. Bedri Kamil Onur Taş, 2016. "Does the Federal Reserve have Private Information about its Future Actions?," Economica, London School of Economics and Political Science, vol. 83(331), pages 498-517, July.
    4. Charlie Tchinda & Marcus Dejardin, 2021. "Are Business Policy Measures in Response to the COVID-19 Pandemic to Be Equally Valued? An Exploration According to SMEs Owners’ Business Expectations," Sustainability, MDPI, vol. 13(21), pages 1-42, October.
    5. Pedro Garcia‐del‐Barrio & Pablo Agnese, 2023. "To comply or not to comply? How a UEFA wage‐to‐revenue requirement might affect the sport and managerial performance of soccer clubs," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 44(2), pages 767-786, March.
    6. Wixe, Sofia & Nilsson, Pia & Naldi, Lucia & Westlund, Hans, 2017. "Disentangling Innovation in Small Food Firms: The role of External Knowledge, Support, and Collaboration," Working Paper Series in Economics and Institutions of Innovation 446, Royal Institute of Technology, CESIS - Centre of Excellence for Science and Innovation Studies.
    7. Arduini, Davide & Belotti, Federico & Denni, Mario & Giungato, Gerolamo & Zanfei, Antonello, 2010. "Technology adoption and innovation in public services the case of e-government in Italy," Information Economics and Policy, Elsevier, vol. 22(3), pages 257-275, July.
    8. Terry N. Flynn & Elisabeth Huynh & Tim J. Peters & Hareth Al‐Janabi & Sam Clemens & Alison Moody & Joanna Coast, 2015. "Scoring the Icecap‐a Capability Instrument. Estimation of a UK General Population Tariff," Health Economics, John Wiley & Sons, Ltd., vol. 24(3), pages 258-269, March.
    9. ÇAGLAYAN, Ebru & UN, Turgut, 2012. "Heteroscedastic Probit Model: An Application Of Home Ownership In Turkey," Regional and Sectoral Economic Studies, Euro-American Association of Economic Development, vol. 12(2).
    10. Bhatta, Bharat P. & Larsen, Odd I., 2011. "Errors in variables in multinomial choice modeling: A simulation study applied to a multinomial logit model of travel mode choice," Transport Policy, Elsevier, vol. 18(2), pages 326-335, March.
    11. Giorgio Brunello & Francesca Gambarotto, 2004. "Agglomeration Effects on Employer-Provided Training: Evidence from the UK," CESifo Working Paper Series 1150, CESifo.
    12. Zizhuo Wang & Chaolin Yang & Hongsong Yuan & Yaowu Zhang, 2021. "Aggregation Bias in Estimating Log‐Log Demand Function," Production and Operations Management, Production and Operations Management Society, vol. 30(11), pages 3906-3922, November.
    13. Agnello, Luca & Schuknecht, Ludger, 2011. "Booms and busts in housing markets: Determinants and implications," Journal of Housing Economics, Elsevier, vol. 20(3), pages 171-190, September.
    14. Kevin A. Clarke, 2005. "The Phantom Menace: Omitted Variable Bias in Econometric Research," Conflict Management and Peace Science, Peace Science Society (International), vol. 22(4), pages 341-352, September.
    15. Oliver J. Rutz & Garrett P. Sonnier, 2011. "The Evolution of Internal Market Structure," Marketing Science, INFORMS, vol. 30(2), pages 274-289, 03-04.
    16. Ceema Zahra Namazie, 2002. "Who Bore the Burden of Wage Arrears in the Kyrgyz Republic?," STICERD - Distributional Analysis Research Programme Papers 64, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    17. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Jan 2025.
    18. Ager, P. & Kappler, M. & Osterloh, S., 2009. "The accuracy and efficiency of the Consensus Forecasts: A further application and extension of the pooled approach," International Journal of Forecasting, Elsevier, vol. 25(1), pages 167-181.
    19. Ginker, Tim & Lieberman, Offer, 2017. "Robustness of binary choice models to conditional heteroscedasticity," Economics Letters, Elsevier, vol. 150(C), pages 130-134.
    20. Vibhanshu Abhishek & Kartik Hosanagar & Peter S. Fader, 2015. "Aggregation Bias in Sponsored Search Data: The Curse and the Cure," Marketing Science, INFORMS, vol. 34(1), pages 59-77, January.

    More about this item

    Keywords

    exhaustive; regression; all subsets; stepwise; data mining;

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
    • C40 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - General
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
