Printed from https://ideas.repec.org/p/zur/econwp/426.html

Neglected heterogeneity, Simpson’s paradox, and the anatomy of least squares

Author

Listed:
  • Rainer Winkelmann

Abstract

When a sample combines data from two or more groups, multivariate regression yields a matrix-weighted average of the group-specific coefficient vectors. However, it is possible that the weighted average of a specific coefficient falls outside the range of the group-specific coefficients, and it may even have a different sign compared to both group-level coefficients, a manifestation of Simpson's paradox. The result of the combined regression is then prone to misinterpretation. The purpose of this paper is to raise awareness of this problem and to state conditions under which such non-convex weighting or sign reversal can arise, for a model with two regressors and two groups. Two illustrative examples, an investment equation estimated with panel data, and a cross-sectional earnings equation for men and women, highlight the relevance of these findings for applied work.
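The matrix-weighting result described in the abstract can be illustrated with a minimal numerical sketch. The data below are hypothetical toy values chosen for easy arithmetic, not taken from the paper, and the helper name `ols` is likewise an assumption. The regressors are centered within each group and the outcomes are noiseless, so the regressions are run without an intercept. The pooled coefficient vector equals W1 b1 + W2 b2 with W_g = (X1'X1 + X2'X2)^{-1} X_g'X_g, and because the weights are matrices rather than scalars, the pooled coefficient on the first regressor turns out negative although it is positive in both groups.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y = X b (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical toy design: two regressors, two groups, four observations each.
# The regressors are positively correlated in group 1 and negatively
# correlated in group 2; every column sums to zero within its group.
X1 = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 0.0], [-1.0, 0.0]])
X2 = np.array([[1.0, -1.0], [-1.0, 1.0], [0.0, 1.0], [0.0, -1.0]])

# Group-specific coefficient vectors (assumed values). The first coefficient
# is positive in both groups; the second differs strongly between groups.
beta1 = np.array([1.0, 0.0])
beta2 = np.array([2.0, 6.0])

# Noiseless outcomes, so group-wise OLS recovers beta1 and beta2 exactly.
y1 = X1 @ beta1
y2 = X2 @ beta2

b1 = ols(X1, y1)
b2 = ols(X2, y2)
b_pooled = ols(np.vstack([X1, X2]), np.concatenate([y1, y2]))

# Matrix weights W_g = (X1'X1 + X2'X2)^{-1} X_g'X_g; they sum to the identity.
A1 = X1.T @ X1
A2 = X2.T @ X2
W1 = np.linalg.solve(A1 + A2, A1)
W2 = np.linalg.solve(A1 + A2, A2)
b_weighted = W1 @ b1 + W2 @ b2

print("group 1:", b1)
print("group 2:", b2)
print("pooled: ", b_pooled)          # approximately (-0.667, 3.667)
print("weighted average:", b_weighted)
```

In this sketch the off-diagonal elements of the weight matrices let the between-group difference in the second coefficient spill over into the first, which is what pushes the pooled value of -2/3 outside the group-specific range [1, 2]. The second pooled coefficient, 11/3, stays inside its range [0, 6].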

Suggested Citation

  • Rainer Winkelmann, 2023. "Neglected heterogeneity, Simpson’s paradox, and the anatomy of least squares," ECON - Working Papers 426, Department of Economics - University of Zurich, revised Jul 2023.
  • Handle: RePEc:zur:econwp:426

    Download full text from publisher

    File URL: https://www.zora.uzh.ch/id/eprint/229123/13/econwp426.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Winkelmann Rainer, 2024. "Neglected Heterogeneity, Simpson’s Paradox, and the Anatomy of Least Squares," Journal of Econometric Methods, De Gruyter, vol. 13(1), pages 131-144, January.
    2. W K Newey & S Stouli, 2022. "Heterogeneous coefficients, control variables and identification of multiple treatment effects," Biometrika, Biometrika Trust, vol. 109(3), pages 865-872.
    3. Sloczynski, Tymon, 2018. "A General Weighted Average Representation of the Ordinary and Two-Stage Least Squares Estimands," IZA Discussion Papers 11866, Institute of Labor Economics (IZA).
    4. Tymon Słoczyński, 2018. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," Papers 1810.01576, arXiv.org, revised May 2020.
    5. Graham, Bryan S. & Pinto, Cristine Campos de Xavier, 2022. "Semiparametrically efficient estimation of the average linear regression function," Journal of Econometrics, Elsevier, vol. 226(1), pages 115-138.
    6. Tymon Słoczyński, 2022. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," The Review of Economics and Statistics, MIT Press, vol. 104(3), pages 501-509, May.
    7. Jiafeng Chen, 2021. "Nonparametric Treatment Effect Identification in School Choice," Papers 2112.03872, arXiv.org, revised Oct 2023.
    8. Słoczyński, Tymon, 2012. "New Evidence on Linear Regression and Treatment Effect Heterogeneity," MPRA Paper 39524, University Library of Munich, Germany.
    9. Gernandt, Johannes & Maier, Michael & Pfeiffer, Friedhelm & Rat-Wirtzler, Julie, 2006. "Distributional effects of the high school degree in Germany," ZEW Discussion Papers 06-088, ZEW - Leibniz Centre for European Economic Research.
    10. Halbert White & Karim Chalak, 2013. "Identification and Identification Failure for Treatment Effects Using Structural Systems," Econometric Reviews, Taylor & Francis Journals, vol. 32(3), pages 273-317, November.
    11. Dirk Czarnitzki & Cindy Lopes-Bento, 2014. "Innovation Subsidies: Does the Funding Source Matter for Innovation Intensity and Performance? Empirical Evidence from Germany," Industry and Innovation, Taylor & Francis Journals, vol. 21(5), pages 380-409, July.
    12. Bhuller, Manudeep & Mogstad, Magne & Salvanes, Kjell G., 2011. "Life-Cycle Bias and the Returns to Schooling in Current and Lifetime Earnings," IZA Discussion Papers 5788, Institute of Labor Economics (IZA).
    13. Lihua Lei, 2024. "Causal Interpretation of Regressions With Ranks," Papers 2406.05548, arXiv.org.
    14. Czarnitzki, Dirk & Lopes-Bento, Cindy, 2013. "Value for money? New microeconometric evidence on public R&D grants in Flanders," Research Policy, Elsevier, vol. 42(1), pages 76-89.
    15. Peter Hull & Michal Kolesár & Christopher Walters, 2022. "Labor by design: contributions of David Card, Joshua Angrist, and Guido Imbens," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 603-645, July.
    16. DiTraglia, Francis J. & García-Jimeno, Camilo & O’Keeffe-O’Donovan, Rossa & Sánchez-Becerra, Alejandro, 2023. "Identifying causal effects in experiments with spillovers and non-compliance," Journal of Econometrics, Elsevier, vol. 235(2), pages 1589-1624.
    17. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    18. Jeffrey Smith & Arthur Sweetman, 2016. "Viewpoint: Estimating the causal effects of policies and programs," Canadian Journal of Economics, Canadian Economics Association, vol. 49(3), pages 871-905, August.
    19. Erich Battistin & Barbara Sianesi, 2006. "Misreported schooling and returns to education: evidence from the UK," CeMMAP working papers CWP07/06, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Tamini, Lota D., 2011. "A nonparametric analysis of the impact of agri-environmental advisory activities on best management practice adoption: A case study of Québec," Ecological Economics, Elsevier, vol. 70(7), pages 1363-1374, May.

    More about this item

    Keywords

    Covariance-weighting; heterogeneity spillover; non-convex average; average treatment effect

    JEL classification:

    • C21 - Mathematical and Quantitative Methods > Single Equation Models; Single Variables > Cross-Sectional Models; Spatial Models; Treatment Effect Models
