
A statistical approach to address the problem of heaping in self-reported income data

Author

Listed:
  • S. Zinn
  • A. Würbach

Abstract

Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random -- which is usually the case -- heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.
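
The modelling idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it draws true incomes from a zero-inflated log-normal distribution and then lets some respondents report the nearest heaping point instead of the true value. The heaping grid (multiples of 500), all parameter values, and the function names are hypothetical choices made for illustration only. The two heaping-probability functions are simplified stand-ins for the two behaviours the abstract mentions: one is piecewise constant, the other rises with proximity to a heaping point.

```python
import math
import random

STEP = 500.0  # hypothetical heaping grid: multiples of 500


def draw_true_income(rng, p_zero=0.05, mu=7.5, sigma=0.6):
    """Zero-inflated log-normal draw: 0 with probability p_zero,
    otherwise exp(N(mu, sigma^2)). Parameter values are illustrative."""
    if rng.random() < p_zero:
        return 0.0
    return rng.lognormvariate(mu, sigma)


def nearest_heaping_point(income, step=STEP):
    """Closest multiple of `step` (ties go upwards)."""
    return math.floor(income / step + 0.5) * step


def p_heap_piecewise(income, cut=2000.0, p_low=0.2, p_high=0.4):
    """Piecewise constant heaping probability: one value below `cut`,
    another at or above it (assumed bands, not taken from the paper)."""
    return p_low if income < cut else p_high


def p_heap_proximity(income, step=STEP, p_max=0.8):
    """Heaping probability that increases linearly as the true income
    gets closer to the nearest heaping point (assumed functional form)."""
    distance = abs(income - nearest_heaping_point(income, step))
    return p_max * (1.0 - distance / (step / 2.0))


def report(rng, income, p_heap):
    """Return the heaped value with probability p_heap(income),
    otherwise the true income rounded to two decimals."""
    if rng.random() < p_heap(income):
        return nearest_heaping_point(income)
    return round(income, 2)


def simulate(n, p_heap, seed=1):
    """Simulate n self-reported incomes under a given heaping model."""
    rng = random.Random(seed)
    return [report(rng, draw_true_income(rng), p_heap) for _ in range(n)]
```

Calling `simulate(5000, p_heap_piecewise)` or `simulate(5000, p_heap_proximity)` yields reported incomes with visible spikes at multiples of 500. Estimating the income distribution from such data without modelling the heaping step is the situation the paper addresses.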

Suggested Citation

  • S. Zinn & A. Würbach, 2016. "A statistical approach to address the problem of heaping in self-reported income data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(4), pages 682-703, March.
  • Handle: RePEc:taf:japsta:v:43:y:2016:i:4:p:682-703
    DOI: 10.1080/02664763.2015.1077372


    Citations

    Cited by:

    1. Qiang Fu & Tian‐Yi Zhou & Xin Guo, 2021. "Modified Poisson regression analysis of grouped and right‐censored counts," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1347-1367, October.
    2. Byung-hill Jun & Hosin Song, 2019. "Tests for Detecting Probability Mass Points," Korean Economic Review, Korean Economic Association, vol. 35, pages 205-248.
    3. Speidel, Matthias & Drechsler, Jörg & Jolani, Shahab, 2018. "R package hmi: a convenient tool for hierarchical multiple imputation and beyond," IAB-Discussion Paper 201816, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David Madden, 2002. "Do Tobacco Taxes Influence Starting and Quitting Smoking? A Discrete Choice Approach Using Evidence from a Sample of Irish Women," Working Papers 200205, School of Economics, University College Dublin.
    2. Erich Battistin & Raffaele Miniaci & Guglielmo Weber, 2003. "What Do We Learn from Recall Consumption Data?," Journal of Human Resources, University of Wisconsin Press, vol. 38(2).
    3. Hie Joo Ahn & James Hamilton, 2022. "Measuring Labor-Force Participation and the Incidence and Duration of Unemployment," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 44, pages 1-32, April.
    4. Zezza, Alberto & Federighi, Giovanni & Kalilou, Amadou Adamou & Hiernaux, Pierre, 2016. "Milking the data: Measuring milk off-take in extensive livestock systems. Experimental evidence from Niger," Food Policy, Elsevier, vol. 59(C), pages 174-186.
    5. Wollburg, Philip & Tiberti, Marco & Zezza, Alberto, 2021. "Recall length and measurement error in agricultural surveys," Food Policy, Elsevier, vol. 100(C).
    6. David (David Patrick) Madden, 2003. "Tobacco taxes and starting and quitting smoking : does the effect differ by education?," Open Access publications 10197/785, School of Economics, University College Dublin.
    7. Boulaga, Amadou Adamou Kalilou & Federighi, Giovanni & Hiernaux, Pierre & Zezza, Alberto, 2014. "Milking the data : measuring income from milk production in extensive livestock systems -- experimental evidence from Niger," Policy Research Working Paper Series 7114, The World Bank.
    8. Yang, Zhenlin & Tsui, Albert K., 2004. "Analytically calibrated Box-Cox percentile limits for duration and event-time models," Insurance: Mathematics and Economics, Elsevier, vol. 35(3), pages 649-677, December.
    9. Kraus, Florian & Steiner, Viktor, 1995. "Modelling heaping effects in unemployment duration models - With an application to retrospective event data in the German socio-economic panel," ZEW Discussion Papers 95-09, ZEW - Leibniz Centre for European Economic Research.
    10. Ryu, Hang K. & Slottje, Daniel J., 2000. "Estimating the density of unemployment duration based on contaminated samples or small samples," Journal of Econometrics, Elsevier, vol. 95(1), pages 131-156, March.
    11. Ugo Trivellato, 1999. "Issues in the Design and Analysis of Panel Studies: A Cursory Review," Quality & Quantity: International Journal of Methodology, Springer, vol. 33(3), pages 339-351, August.
    12. Ragui Assaad & Caroline Krafft & Shaimaa Yassin, 2018. "Comparing retrospective and panel data collection methods to assess labor market dynamics," IZA Journal of Migration and Development, Springer;Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA), vol. 8(1), pages 1-34, December.
    13. Michele Lalla & Francesco Pattarin, 2001. "Unemployment Duration: An Analysis of Incomplete, Completed, and Multiple Spells in Emilia-Romagna," Quality & Quantity: International Journal of Methodology, Springer, vol. 35(2), pages 203-230, May.
    14. Carletto, Calogero & Savastano, Sara & Zezza, Alberto, 2013. "Fact or artifact: The impact of measurement errors on the farm size–productivity relationship," Journal of Development Economics, Elsevier, vol. 103(C), pages 254-261.
    15. Romeo, Charles J, 1999. "Conducting Inference in Semiparametric Duration Models under Inequality Restrictions on the Shape of the Hazard Implied by Job Search Theory," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 14(6), pages 587-605, Nov.-Dec.
    16. Bruno Contini & Roberto Quaranta, 2019. "Is Long-Term Non-employment a Lifetime Disease?," Italian Economic Journal: A Continuation of Rivista Italiana degli Economisti and Giornale degli Economisti, Springer;Società Italiana degli Economisti (Italian Economic Association), vol. 5(1), pages 79-102, March.
    17. Senakpon, Kokoye, 2017. "Farmers’ Willingness To Pay For Soil Testing Service In Northern Haiti," 2017 Annual Meeting, February 4-7, 2017, Mobile, Alabama 252804, Southern Agricultural Economics Association.
    18. Page, Ian B. & Lichtenberg, Erik & Saavoss, Monica, 2015. "Estimating Recreation Demand When Survey Responses are Rounded," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205653, Agricultural and Applied Economics Association.
    19. Petoussis, Kos & Gill, Richard & Zeelenberg, Kees, 1997. "Statistical analysis of heaped duration data," MPRA Paper 89263, University Library of Munich, Germany.
    20. Athanasakou, Vasiliki & Simpson, Ana, 2016. "Investor attention to rounding as a salient forecast feature," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1212-1233.
