Printed from https://ideas.repec.org/p/cen/wpaper/19-08.html

Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data

Authors

  • John M. Abowd
  • Joelle Abramowitz
  • Margaret C. Levenstein
  • Kristin McCue
  • Dhiren Patki
  • Trivellore Raghunathan
  • Ann M. Rodgers
  • Matthew D. Shapiro
  • Nada Wasi

Abstract

This paper illustrates an application of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across firms is highly asymmetric. To address these difficulties, this paper uses a supervised machine learning model to probabilistically link survey respondents in the Health and Retirement Study (HRS) with employers and establishments in the Census Business Register (BR) to create a new data source which we call the CenHRS. Multiple imputation is used to propagate uncertainty from the linkage step into subsequent analyses of the linked data. The linked data reveal new evidence that survey respondents’ misreporting and selective nonresponse about employer characteristics are systematically correlated with wages.
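
The multiple-imputation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the match probabilities, employer sizes, and all function names below are hypothetical, and the sketch assumes a linkage model has already produced a match probability for each candidate employer. Each imputation draws one employer per respondent from those probabilities, an estimate is computed on each completed data set, and the results are pooled with Rubin's combining rules so that the linkage uncertainty appears in the final variance.

```python
import random
import statistics


def draw_links(match_probs, rng):
    """Draw one employer per respondent from its candidate match probabilities."""
    links = {}
    for respondent, candidates in match_probs.items():
        employers = list(candidates)
        weights = [candidates[e] for e in employers]
        links[respondent] = rng.choices(employers, weights=weights, k=1)[0]
    return links


def mean_and_variance(values):
    """Return the sample mean and the estimated variance of that mean."""
    n = len(values)
    return statistics.fmean(values), statistics.variance(values) / n


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance
    total = within + (1 + 1 / m) * between      # total variance
    return pooled, total


def multiply_impute(match_probs, employer_size, m=20, seed=0):
    """Estimate mean employer size over m imputed sets of links."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        links = draw_links(match_probs, rng)
        sizes = [employer_size[e] for e in links.values()]
        est, var = mean_and_variance(sizes)
        estimates.append(est)
        variances.append(var)
    return rubin_combine(estimates, variances)


# Hypothetical inputs: candidate employers and match probabilities per respondent.
match_probs = {
    "r1": {"A": 0.9, "B": 0.1},
    "r2": {"B": 0.6, "C": 0.4},
    "r3": {"C": 1.0},
    "r4": {"A": 0.5, "C": 0.5},
}
employer_size = {"A": 1200, "B": 45, "C": 300}

pooled_mean, total_variance = multiply_impute(match_probs, employer_size)
print(pooled_mean, total_variance)
```

The between-imputation term is what carries the linkage uncertainty: if every respondent had a single candidate with probability 1, all imputations would agree and the total variance would reduce to the ordinary within-sample variance.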

Suggested Citation

  • John M. Abowd & Joelle Abramowitz & Margaret C. Levenstein & Kristin McCue & Dhiren Patki & Trivellore Raghunathan & Ann M. Rodgers & Matthew D. Shapiro & Nada Wasi, 2019. "Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data," Working Papers 19-08, Center for Economic Studies, U.S. Census Bureau.
  • Handle: RePEc:cen:wpaper:19-08

    Download full text from publisher

    File URL: https://www2.census.gov/ces/wp/2019/CES-WP-19-08.pdf
    File Function: First version, 2019
    Download Restriction: no

    References listed on IDEAS

    1. John M. Abowd & Martha H. Stinson, 2013. "Estimating Measurement Error in Annual Job Earnings: A Comparison of Survey and Administrative Data," The Review of Economics and Statistics, MIT Press, vol. 95(5), pages 1451-1467, December.
    2. Brown, Charles & Medoff, James, 1989. "The Employer Size-Wage Effect," Journal of Political Economy, University of Chicago Press, vol. 97(5), pages 1027-1059, October.
    3. John M. Abowd & Bryce E. Stephens & Lars Vilhuber & Fredrik Andersson & Kevin L. McKinney & Marc Roemer & Simon Woodcock, 2009. "The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators," NBER Chapters, in: Producer Dynamics: New Evidence from Micro Data, pages 149-230, National Bureau of Economic Research, Inc.
    4. Ron S Jarmin & Javier Miranda, 2002. "The Longitudinal Business Database," Working Papers 02-17, Center for Economic Studies, U.S. Census Bureau.
    5. Andrea Tancredi & Brunero Liseo, 2015. "Regression analysis with linked data: problems and possible solutions," Statistica, Department of Statistics, University of Bologna, vol. 75(1), pages 19-35.
    6. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    7. P. Lahiri & Michael D. Larsen, 2005. "Regression Analysis With Linked Data," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 222-230, March.
    8. Kevin L. McKinney & Andrew S. Green & Lars Vilhuber & John M. Abowd, 2017. "Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map," Working Papers 17-71, Center for Economic Studies, U.S. Census Bureau.
    9. Oi, Walter Y. & Idson, Todd L., 1999. "Firm size and wages," Handbook of Labor Economics, in: O. Ashenfelter & D. Card (ed.), Handbook of Labor Economics, edition 1, volume 3, chapter 33, pages 2165-2214, Elsevier.
    10. Nicholas Bloom & Fatih Guvenen & Benjamin S. Smith & Jae Song & Till von Wachter, 2018. "The Disappearing Large-Firm Wage Premium," AEA Papers and Proceedings, American Economic Association, vol. 108, pages 317-322, May.
    11. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    12. Timothy Dunne & J. Bradford Jensen & Mark J. Roberts, 2009. "Producer Dynamics: New Evidence from Micro Data," NBER Books, National Bureau of Economic Research, Inc, number dunn05-1, July.
    13. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations


    Cited by:

    1. Nada Wasi & Sasiwimon Warunsiri Paweenawat & Chinnawat Devahastin Na Ayudhya & Pucktada Treeratpituk & Chommanart Nittayo, 2019. "Labor Income Inequality in Thailand: the Roles of Education, Occupation and Employment History," PIER Discussion Papers 117, Puey Ungphakorn Institute for Economic Research.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John M. Abowd & Joelle Abramowitz & Margaret C. Levenstein & Kristin McCue & Dhiren Patki & Trivellore Raghunathan & Ann M. Rodgers & Matthew D. Shapiro & Nada Wasi & Dawn Zinsser, 2021. "Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning," Working Papers 21-35, Center for Economic Studies, U.S. Census Bureau.
    2. Nicholas Bloom & Scott W. Ohlmacher & Cristina J. Tello-Trillo & Melanie Wallskog, 2021. "Pay, Productivity and Management," NBER Working Papers 29377, National Bureau of Economic Research, Inc.
    3. Babina, Tania & Ma, Wenting & Moser, Christian & Ouimet, Paige & Zarutskie, Rebecca, 2019. "Pay, Employment, and Dynamics of Young Firms," MPRA Paper 95382, University Library of Munich, Germany.
    4. Ouimet, Paige & Zarutskie, Rebecca, 2014. "Who works for startups? The relation between firm age, employee age, and growth," Journal of Financial Economics, Elsevier, vol. 112(3), pages 386-407.
    5. Henry Hyatt & Erika McEntarfer & John Haltiwanger, 2014. "Cyclical Reallocation of Workers Across Large and Small Employers," 2014 Meeting Papers 735, Society for Economic Dynamics.
    6. C. J. Krizan & Adela Luque & Alice Zawacki, 2014. "The Effect Of Employer Health Insurance Offering On The Growth And Survival Of Small Business Prior To The Affordable Care Act," Working Papers 14-22, Center for Economic Studies, U.S. Census Bureau.
    7. Brianna Cardiff-Hicks & Francine Lafontaine & Kathryn Shaw, 2015. "Do Large Modern Retailers Pay Premium Wages?," ILR Review, Cornell University, ILR School, vol. 68(3), pages 633-665, May.
    8. Emin Dinlersoz & Henry Hyatt & Hubert Janicki, 2019. "Who Works for Whom? Worker Sorting in a Model of Entrepreneurship with Heterogeneous Labor Markets," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 34, pages 244-266, October.
    9. Melanie Jones & Ezgi Kaya, 2023. "The UK gender pay gap: Does firm size matter?," Economica, London School of Economics and Political Science, vol. 90(359), pages 937-952, July.
    10. Egger, Hartmut & Jahn, Elke & Kornitzky, Stefan, 2022. "How does the position in business group hierarchies affect workers’ wages?," Journal of Economic Behavior & Organization, Elsevier, vol. 194(C), pages 244-263.
    11. Jaime Arellano-Bover, 2024. "Career Consequences of Firm Heterogeneity for Young Workers: First Job and Firm Size," Journal of Labor Economics, University of Chicago Press, vol. 42(2), pages 549-589.
    12. Paige Ouimet & Rebecca Zarutskie, 2011. "Who Works for Startups? The Relation between Firm Age, Employee Age, and Growth," Working Papers 11-31, Center for Economic Studies, U.S. Census Bureau.
    13. Hartmut Egger & Elke Jahn & Stefan Kornitzky, 2021. "How Does the Position in Business Group Hierarchies Affect Workers’ Wages?," Working Papers 213, Bavarian Graduate Program in Economics (BGPE).
    14. M. Jahangir Alam & Benoit Dostie & Jörg Drechsler & Lars Vilhuber, 2020. "Applying data synthesis for longitudinal business data across three countries," Statistics in Transition New Series, Polish Statistical Association, vol. 21(4), pages 212-236, August.
    15. Jahn, Elke & Egger, Hartmut & Kornitzky, Stefan, 2021. "Does the Position in Business Group Hierarchies Affect Workers' Wages?," VfS Annual Conference 2021 (Virtual Conference): Climate Economics 242374, Verein für Socialpolitik / German Economic Association.
    16. Fariha Kamal & Asha Sundaram & Cristina J. Tello-Trillo, 2020. "Family-Leave Mandates and Female Labor at U.S. Firms: Evidence from a Trade Shock," Working Papers 20-25, Center for Economic Studies, U.S. Census Bureau.
    17. Paige Ouimet & Rebecca Zarutskie, 2011. "Acquiring Labor," Working Papers 11-32, Center for Economic Studies, U.S. Census Bureau.
    18. John M. Abowd & Ian M. Schmutte & Lars Vilhuber, 2018. "Disclosure Limitation and Confidentiality Protection in Linked Data," Working Papers 18-07, Center for Economic Studies, U.S. Census Bureau.
    19. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    20. Carstensen, Kai & Heinrich, Markus & Reif, Magnus & Wolters, Maik H., 2020. "Predicting ordinary and severe recessions with a three-state Markov-switching dynamic factor model," International Journal of Forecasting, Elsevier, vol. 36(3), pages 829-850.

    More about this item

    Keywords

    Probabilistic record linkage; survey data; administrative data; multiple imputation; measurement error; nonresponse;

