Printed from https://ideas.repec.org/a/bla/obuest/v86y2024i2p417-447.html

Why Transform Y? The Pitfalls of Transformed Regressions with a Mass at Zero

Author

Listed:
  • John Mullahy
  • Edward C. Norton

Abstract

Applied economists often transform a dependent variable that is non‐negative and skewed with the natural log transformation, the inverse hyperbolic sine transformation, or power function. We show that these transformations separate the zeros from the positives such that the estimated parameters are related to those from a scaled linear probability model. The retransformed marginal effects and elasticities are sensitive to changes in a shape parameter, ranging in magnitude between those of an untransformed least squares regression and those of a scaled linear probability model. Instead of transforming the dependent variable with non‐negative outcomes that includes zeros, we recommend using a non‐transformed dependent variable, such as a two‐part model, untransformed linear regression, or Poisson.
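The link between a transformed regression and a scaled linear probability model can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process, the scale factor `k` (standing in for a shape parameter), and the helper `ols_slope` are all illustrative assumptions. It regresses asinh(k*y) on a binary regressor for several values of `k` and compares how fast the slope grows in log(k) with the slope of a linear probability model for 1[y > 0].

```python
import numpy as np

# Hypothetical data: a binary regressor x shifts the probability that y > 0,
# while the distribution of positive outcomes is the same in both groups.
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, size=n).astype(float)
p_pos = 0.5 + 0.2 * x                      # assumed P(y > 0 | x)
positive = rng.random(n) < p_pos
y = np.where(positive, rng.lognormal(mean=0.0, sigma=1.0, size=n), 0.0)


def ols_slope(dep, reg):
    """Slope from an OLS regression of dep on a constant and reg."""
    X = np.column_stack([np.ones_like(reg), reg])
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return coef[1]


# Linear probability model for the event y > 0.
lpm_slope = ols_slope((y > 0).astype(float), x)

# Inverse hyperbolic sine regressions under different (assumed) scalings of y.
scales = [1.0, 1e2, 1e4, 1e6]
ihs_slopes = {k: ols_slope(np.arcsinh(k * y), x) for k in scales}

# For large k, asinh(k*y) is close to log(2k) + log(y) when y > 0 and is
# exactly 0 when y = 0, so each extra unit of log(k) adds roughly the LPM
# slope to the transformed-regression slope.
growth = (ihs_slopes[1e6] - ihs_slopes[1e4]) / (np.log(1e6) - np.log(1e4))

print("LPM slope:", lpm_slope)
for k in scales:
    print("asinh slope at k =", k, ":", ihs_slopes[k])
print("slope growth per unit of log(k):", growth)
```

In this setup the transformed-regression slope depends on the arbitrary scaling of y, and as `k` grows it is increasingly driven by the zero versus positive split rather than by variation among the positive outcomes.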

Suggested Citation

  • John Mullahy & Edward C. Norton, 2024. "Why Transform Y? The Pitfalls of Transformed Regressions with a Mass at Zero," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 86(2), pages 417-447, April.
  • Handle: RePEc:bla:obuest:v:86:y:2024:i:2:p:417-447
    DOI: 10.1111/obes.12583

    Download full text from publisher

    File URL: https://doi.org/10.1111/obes.12583
    Download Restriction: no


    References listed on IDEAS

    1. Gilleskie, Donna B. & Mroz, Thomas A., 2004. "A flexible approach for estimating the effects of covariates on health expenditures," Journal of Health Economics, Elsevier, vol. 23(2), pages 391-418, March.
    2. MacKinnon, James G & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 31(2), pages 315-339, May.
    3. Wooldridge, Jeffrey M., 1999. "Distribution-free estimation of some nonlinear panel data models," Journal of Econometrics, Elsevier, vol. 90(1), pages 77-97, May.
    4. Ruud, Paul A., 1986. "Consistent estimation of limited dependent variable models despite misspecification of distribution," Journal of Econometrics, Elsevier, vol. 32(1), pages 157-187, June.
    5. Ghislain B D Aihounton & Arne Henningsen, 2021. "Units of measurement and the inverse hyperbolic sine transformation," The Econometrics Journal, Royal Economic Society, vol. 24(2), pages 334-351.
    6. Gihleb, Rania & Giuntella, Osea & Stella, Luca & Wang, Tianyi, 2022. "Industrial robots, Workers’ safety, and health," Labour Economics, Elsevier, vol. 78(C).
    7. Manning, Willard G, et al, 1987. "Health Insurance and the Demand for Medical Care: Evidence from a Randomized Experiment," American Economic Review, American Economic Association, vol. 77(3), pages 251-277, June.
    8. Heblich, Stephan & Redding, Stephen J. & Voth, Hans-Joachim, 2022. "Slavery and the British Industrial Revolution," LSE Research Online Documents on Economics 118034, London School of Economics and Political Science, LSE Library.
    9. John Mullahy, 1998. "Much Ado About Two: Reconsidering Retransformation and the Two-Part Model in Health Economics," NBER Technical Working Papers 0228, National Bureau of Economic Research, Inc.
    10. Darin Christensen & Oeindrila Dube & Johannes Haushofer & Bilal Siddiqi & Maarten Voors, 2021. "Building Resilient Health Systems: Experimental Evidence from Sierra Leone and The 2014 Ebola Outbreak," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 136(2), pages 1145-1198.
    11. Dalton, Kathleen & Norton, Edward C., 2000. "Revisiting Rogowski and Newhouse on the indirect costs of teaching: a note on functional form and retransformation in Medicare's payment formulas," Journal of Health Economics, Elsevier, vol. 19(6), pages 1027-1046, November.
    12. Santos Silva, J.M.C. & Tenreyro, Silvana, 2011. "Further simulation evidence on the performance of the Poisson pseudo-maximum likelihood estimator," Economics Letters, Elsevier, vol. 112(2), pages 220-222, August.
    13. Steven T. Yen & Andrew M. Jones, 1997. "Household Consumption of Cheese: An Inverse Hyperbolic Sine Double-Hurdle Model with Dependent Errors," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 79(1), pages 246-251.
    14. Edward C. Norton, 2022. "The inverse hyperbolic sine transformation and retransformed marginal effects," Stata Journal, StataCorp LLC, vol. 22(3), pages 702-712, September.
    15. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    16. Rogowski, Jeannette A. & Newhouse, Joseph P., 1992. "Estimating the indirect costs of teaching," Journal of Health Economics, Elsevier, vol. 11(2), pages 153-171, August.
    17. Pence Karen M., 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 5(1), pages 1-26, July.
    18. Cragg, John G, 1971. "Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods," Econometrica, Econometric Society, vol. 39(5), pages 829-844, September.
    19. Steven T. Yen & Andrew M. Jones, 1996. "Individual cigarette consumption and addiction: A flexible limited dependent variable approach," Health Economics, John Wiley & Sons, Ltd., vol. 5(2), pages 105-117, March.
    20. Marc F. Bellemare & Casey J. Wichman, 2020. "Elasticities and the Inverse Hyperbolic Sine Transformation," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 82(1), pages 50-61, February.
    21. Almond, Douglas & Cheng, Yi, 2021. "Perinatal health among 1 million Chinese-Americans," Economics & Human Biology, Elsevier, vol. 40(C).
    22. Ariel Kalil & Hope Corman & Dhaval M. Dave & Ofira Schwartz-Soicher & Nancy Reichman, 2022. "Welfare Reform and the Quality of Young Children's Home Environments," NBER Working Papers 30407, National Bureau of Economic Research, Inc.
    23. Mullahy, John, 1986. "Specification and testing of some modified count data models," Journal of Econometrics, Elsevier, vol. 33(3), pages 341-365, December.
    24. David Autor & Caroline Chin & Anna Salomons & Bryan Seegmiller, 2024. "New Frontiers: The Origins and Content of New Work, 1940–2018," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 139(3), pages 1399-1465.
    25. Cynthia Kinnan & Shing-Yi Wang & Yongxiang Wang, 2018. "Access to Migration for Rural Households," American Economic Journal: Applied Economics, American Economic Association, vol. 10(4), pages 79-119, October.
    26. Marc F. Bellemare & Christopher B. Barrett & David R. Just, 2013. "The Welfare Impacts of Commodity Price Volatility: Evidence from Rural Ethiopia," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 95(4), pages 877-899.
    27. Richard C. Lindrooth & Edward C. Norton & Barbara Dickey, 2002. "Provider Selection, Bargaining, and Utilization Management in Managed Care," Economic Inquiry, Western Economic Association International, vol. 40(3), pages 348-365, July.
    28. Brown, Sarah & Greene, William H. & Harris, Mark N. & Taylor, Karl, 2015. "An inverse hyperbolic sine heteroskedastic latent class panel tobit model: An application to modelling charitable donations," Economic Modelling, Elsevier, vol. 50(C), pages 228-236.
    29. Chesher, Andrew & Peters, Simon, 1994. "Symmetry, Regression Design, and Sampling Distributions," Econometric Theory, Cambridge University Press, vol. 10(1), pages 116-129, March.
    30. Chesher, Andrew, 1995. "A Mirror Image Invariance for M-Estimators," Econometrica, Econometric Society, vol. 63(1), pages 207-211, January.
    31. John Mullahy, 1997. "Instrumental-Variable Estimation Of Count Data Models: Applications To Models Of Cigarette Smoking Behavior," The Review of Economics and Statistics, MIT Press, vol. 79(4), pages 586-593, November.
    32. Nicholas J. Cox, 2011. "Stata tip 96: Cube roots," Stata Journal, StataCorp LLC, vol. 11(1), pages 149-154, March.
    33. James Carroll & Siobhan McCarthy & Carol Newman, 2005. "An Econometric Analysis of Charitable Donations in the Republic of Ireland," The Economic and Social Review, Economic and Social Studies, vol. 36(3), pages 229-249.
    34. Blough, David K. & Madden, Carolyn W. & Hornbrook, Mark C., 1999. "Modeling risk using generalized linear models," Journal of Health Economics, Elsevier, vol. 18(2), pages 153-171, April.
    35. Halvorsen, Robert & Palmquist, Raymond, 1980. "The Interpretation of Dummy Variables in Semilogarithmic Equations," American Economic Review, American Economic Association, vol. 70(3), pages 474-475, June.
    36. Aaron Chalfin & Benjamin Hansen & Emily K. Weisburst & Morgan C. Williams Jr., 2022. "Police Force Size and Civilian Race," American Economic Review: Insights, American Economic Association, vol. 4(2), pages 139-158, June.
    37. Ruud, Paul A, 1983. "Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecifications of Distribution in Multinomial Discrete Choice Models," Econometrica, Econometric Society, vol. 51(1), pages 225-228, January.

    Citations



    Cited by:

    1. Karlsson, Martin & Wang, Yulong & Ziebarth, Nicolas R., 2024. "Getting the right tail right: Modeling tails of health expenditure distributions," Journal of Health Economics, Elsevier, vol. 97(C).
    2. Marc F. Bellemare & Jeffrey R. Bloem & Noah Wexler, 2024. "The Paper of How: Estimating Treatment Effects Using the Front‐Door Criterion," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 86(4), pages 951-993, August.
    3. Antonella Bancalari & Juan Pablo Rud, 2025. "Resource windfalls, Public Expenditures, and Local Economies," Working Papers 348, Red Nacional de Investigadores en Economía (RedNIE).
    4. Rodríguez-Puello, Gabriel & Rickardsson, Jonna, 2024. "Spatial Diffusion of Economic Shocks in the Labor Market: Evidence from a Mining Boom and Bust," OSF Preprints tzmf2_v1, Center for Open Science.
    5. Jan Schymik & Matthias Meier & Alexander Schramm & Alexander Schwemmer, 2025. "Capital (Mis)allocation, Incentives and Productivity," CRC TR 224 Discussion Paper Series crctr224_2025_637, University of Bonn and University of Mannheim, Germany.
    6. Izumi, Yutaro & Shigeoka, Hitoshi & Yagasaki, Masayuki, 2024. "Golfing CEOs," Labour Economics, Elsevier, vol. 91(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Faruk Urak & Nihat Küçük & Abdulbaki Bilgiç & Steven T Yen, 2023. "Modeling censored tourism expenditures in Turkey with non-normal and heteroscedastic errors: An application of the inverse hyperbolic sine double-hurdle model," Tourism Economics, vol. 29(3), pages 718-741, May.
    2. Brown, Sarah & Greene, William H. & Harris, Mark N. & Taylor, Karl, 2015. "An inverse hyperbolic sine heteroskedastic latent class panel tobit model: An application to modelling charitable donations," Economic Modelling, Elsevier, vol. 50(C), pages 228-236.
    3. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    4. Keane, Michael & Stavrunova, Olena, 2016. "Adverse selection, moral hazard and the demand for Medigap insurance," Journal of Econometrics, Elsevier, vol. 190(1), pages 62-78.
    5. Amanda Kowalski, 2016. "Censored Quantile Instrumental Variable Estimates of the Price Elasticity of Expenditure on Medical Care," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(1), pages 107-117, January.
    6. Cantoni, Eva & Ronchetti, Elvezio, 2006. "A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures," Journal of Health Economics, Elsevier, vol. 25(2), pages 198-213, March.
    7. Nguyen Thi Tuong Anh & Hung Quang Doan & Tuan Anh Bui & Nam Hoang Vu & Duong Thuy Thanh Le, 2022. "A Revisit of Motives for Chinese Outward Foreign Direct Investment: The Role of the Institution in Host Countries," SAGE Open, vol. 12(4), pages 21582440221, December.
    8. Usala, Cristian & Primerano, Ilaria & Santelli, Francesco & Ragozini, Giancarlo, 2024. "The more the better? How degree programs’ variety affects university students’ churn risk," Socio-Economic Planning Sciences, Elsevier, vol. 94(C).
    9. Mullahy, John, 1998. "Much ado about two: reconsidering retransformation and the two-part model in health econometrics," Journal of Health Economics, Elsevier, vol. 17(3), pages 247-281, June.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.