Printed from https://ideas.repec.org/p/trr/wpaper/202203.html

Evaluating Data Fusion Methods to Improve Income Modelling

Authors

  • Jana Emmenegger
  • Ralf Münnich
  • Jannik Schaller

Abstract

Income is an important economic indicator for measuring living standards and individual well-being. In Germany, different data sources yield ambiguous evidence on the income distribution. The Tax Statistics (TS), an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014, contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus. For that purpose, we examine two types of data fusion methods that seem suited to the specific data fusion scenario of the Tax Statistics and the Microcensus: missing-data methods on the one hand and high-performing prediction models on the other. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods. Our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
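To make the data fusion setting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a donor file (survey-like, where education is observed) and a recipient file (register-like, where income is observed but education is not), linked only through variables present in both. All records, variable names, and the `fuse` helper are hypothetical. As a simple stand-in for the methods compared in the paper, it uses a random hot deck within cells of the common variables rather than Multinomial Regression or Random Forest.

```python
import random
from collections import defaultdict

# Hypothetical toy data. The donor file observes education,
# the recipient file observes income but not education.
DONOR = [
    {"age_group": "30-49", "sex": "f", "education": "tertiary"},
    {"age_group": "30-49", "sex": "f", "education": "secondary"},
    {"age_group": "30-49", "sex": "m", "education": "secondary"},
    {"age_group": "50-64", "sex": "m", "education": "primary"},
    {"age_group": "50-64", "sex": "m", "education": "secondary"},
]
RECIPIENT = [
    {"age_group": "30-49", "sex": "f", "income": 52000},
    {"age_group": "30-49", "sex": "m", "income": 41000},
    {"age_group": "50-64", "sex": "m", "income": 38000},
]
COMMON = ("age_group", "sex")  # variables observed in both files


def fuse(donor, recipient, common, target, seed=0):
    """Random hot deck: copy `target` from a random donor in the same cell."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    cells = defaultdict(list)
    for rec in donor:
        cells[tuple(rec[v] for v in common)].append(rec[target])
    fused = []
    for rec in recipient:
        pool = cells.get(tuple(rec[v] for v in common))
        # Leave the value missing if no donor shares the cell.
        value = rng.choice(pool) if pool else None
        fused.append({**rec, target: value})
    return fused


FUSED = fuse(DONOR, RECIPIENT, COMMON, "education")
for row in FUSED:
    print(row)
```

Because education and income are never observed together in this setup, any such fusion implicitly relies on the common variables carrying the relationship between them. A prediction-model approach would replace the within-cell draw with a fitted classifier trained on the donor file.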

Suggested Citation

  • Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
  • Handle: RePEc:trr:wpaper:202203

    Download full text from publisher

    File URL: http://www.uni-trier.de/fileadmin/fb4/prof/VWL/EWF/Research_Papers/2022-03.pdf
    File Function: First version, 2022
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emmenegger Jana & Münnich Ralf, 2023. "Localising the Upper Tail: How Top Income Corrections Affect Measures of Regional Inequality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 243(3-4), pages 285-317, June.
    2. Saeideh Kamgar & Florian Meinfelder & Ralf Münnich & Hamidreza Navvabpour, 2020. "Estimation within the new integrated system of household surveys in Germany," Statistical Papers, Springer, vol. 61(5), pages 2091-2117, October.
    3. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data," Sociological Methods & Research, vol. 42(2), pages 192-224, May.
    4. Ralf Münnich & Siegfried Gabler & Christian Bruch & Jan Pablo Burgard & Tobias Enderle & Jan-Philipp Kolb & Thomas Zimmermann, 2015. "Tabellenauswertungen im Zensus unter Berücksichtigung fehlender Werte," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(3), pages 269-304, December.
    5. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    6. Rasner, Anika & Frick, Joachim R. & Grabka, Markus M., 2013. "Statistical Matching of Administrative and Survey Data: An Application to Wealth Inequality Analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 42(2), pages 192-224.
    7. Chia-Ning Wang & Roderick Little & Bin Nan & Siobán D. Harlow, 2011. "A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories," Biometrics, The International Biometric Society, vol. 67(4), pages 1573-1582, December.
    8. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    9. Mingyang Cai & Gerko Vink, 2022. "A note on imputing squares via polynomial combination approach," Computational Statistics, Springer, vol. 37(5), pages 2185-2201, November.
    10. Shu Yang & Jae Kwang Kim, 2020. "Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 839-861, September.
    11. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    12. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    13. Brownstone, David, 1997. "Multiple Imputation Methodology for Missing Data, Non-Random Response, and Panel Attrition," University of California Transportation Center, Working Papers qt2zd6w6hh, University of California Transportation Center.
    14. Westermeier, Christian & Grabka, Markus M., 2016. "Longitudinal Wealth Data and Multiple Imputation: An Evaluation Study," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 10(3), pages 237-252.
    15. Youngjoo Cho & Debashis Ghosh, 2021. "Quantile-Based Subgroup Identification for Randomized Clinical Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(1), pages 90-128, April.
    16. Ahfock, Daniel & Pyne, Saumyadipta & McLachlan, Geoffrey J., 2022. "Statistical file-matching of non-Gaussian data: A game theoretic approach," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    17. Yanqing Sun & Li Qi & Fei Heng & Peter B. Gilbert, 2020. "A hybrid approach for the stratified mark‐specific proportional hazards model with missing covariates and missing marks, with application to vaccine efficacy trials," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 791-814, August.
    18. Arif Mamun & David Wittenburg & Noelle Denny-Brown & Michael Levere & David Mann & Rebecca Coughlin & Sarah Croake & Heather Gordon & Denise Hoffman & Rachel Holzwart & Rosalind Keith & Brittany McGil, "undated". "Promoting Opportunity Demonstration: Interim Evaluation Report," Mathematica Policy Research Reports caa99d38a8b14f968ea3438e5, Mathematica Policy Research.
    19. Miguel Szekely & Nora Lustig & Martin Cumpa & Jose Antonio Mejia, 2004. "Do we know how much poverty there is?," Oxford Development Studies, Taylor & Francis Journals, vol. 32(4), pages 523-558.
    20. Gowri Gopalakrishna & Gerben ter Riet & Gerko Vink & Ineke Stoop & Jelte M Wicherts & Lex M Bouter, 2022. "Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-16, February.
