Printed from https://ideas.repec.org/p/ces/ifowps/_409.html

Machine Learning Based Linkage of Company Data for Economic Research: Application to the EBDC Business Panels

Author

  • Valentin Reich

Abstract

This article presents a comprehensive approach to probabilistic linkage of German company data using Machine Learning and Natural Language Processing techniques. Here, the long-running ifo Institute surveys are linked to financial information in the Orbis database by addressing the unique challenges of company data linkage, such as corporate structures and linguistic nuances in company names. Compared to a previous linkage, the approach achieves improved match rates and is able to re-evaluate existing matches. This article contributes best-practice advice for company data linkage and serves as documentation for the resulting research dataset.
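To illustrate the kind of name-level challenge the abstract refers to, below is a minimal sketch of a typical first step in probabilistic company linkage. It normalizes German company names by lowercasing, transliterating umlauts, and dropping legal-form tokens, then scores candidate pairs within a postcode block. It is not the paper's method. The legal-form list, the `(id, name, postcode)` record layout, postcode blocking, and the difflib-based similarity are all illustrative assumptions, and the helper names are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, illustrative set of German legal-form tokens (not taken from the paper).
LEGAL_FORMS = {"gmbh", "mbh", "ag", "kg", "ohg", "ug", "se", "co", "ev", "eg",
               "haftungsbeschraenkt"}

# Transliterate German umlauts and eszett before stripping any remaining accents.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def normalize_name(name: str) -> str:
    """Lowercase, transliterate, drop punctuation and legal-form tokens."""
    s = name.lower().translate(UMLAUTS)
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    s = s.replace(".", "")                 # "e.V." -> "ev", "Co." -> "co"
    s = re.sub(r"[^a-z0-9]+", " ", s)      # "&", "-", "(", ")" -> single space
    return " ".join(t for t in s.split() if t not in LEGAL_FORMS)


def name_similarity(a: str, b: str) -> float:
    """Token-order-insensitive string similarity in [0, 1]."""
    ta = " ".join(sorted(normalize_name(a).split()))
    tb = " ".join(sorted(normalize_name(b).split()))
    if not ta or not tb:
        return 0.0  # a name made only of legal-form tokens carries no signal
    return SequenceMatcher(None, ta, tb).ratio()


def candidate_pairs(survey, orbis):
    """Block on postcode, then score every survey/Orbis pair within a block.

    Both inputs are lists of (id, name, postcode) tuples (hypothetical layout).
    """
    by_postcode = {}
    for rec in orbis:
        by_postcode.setdefault(rec[2], []).append(rec)
    for sid, sname, plz in survey:
        for oid, oname, _ in by_postcode.get(plz, []):
            yield sid, oid, name_similarity(sname, oname)


if __name__ == "__main__":
    survey = [(1, "Müller Bau GmbH & Co. KG", "80331")]
    orbis = [("DE123", "MUELLER BAU GMBH & CO KG", "80331"),
             ("DE456", "Meier AG", "80331")]
    for pair in candidate_pairs(survey, orbis):
        print(pair)
```

Blocking keeps the number of compared pairs manageable. In a machine-learning setup, similarity scores like these would typically be one input among several pair features (address, industry, size) fed to a trained classifier rather than being thresholded directly.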

Suggested Citation

  • Valentin Reich, 2024. "Machine Learning Based Linkage of Company Data for Economic Research: Application to the EBDC Business Panels," ifo Working Paper Series 409, ifo Institute - Leibniz Institute for Economic Research at the University of Munich.
  • Handle: RePEc:ces:ifowps:_409

    Download full text from publisher

    File URL: https://www.ifo.de/DocDL/wp-2024-409_reich_linkage-of-company-data.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Sebastian Link & Andreas Peichl & Christopher Roth & Johannes Wohlfart, 2023. "Information frictions among firms and households," Journal of Monetary Economics, Elsevier, vol. 135(C), pages 99-115.
    2. Sebastian Link, 2018. "Harmonization and Interpretation of the ifo Business Survey's Micro Data," CESifo Working Paper Series 7427, CESifo.
    3. Zeno Enders & Franziska Hünnekes & Gernot Müller, 2022. "Firm Expectations and Economic Activity," Journal of the European Economic Association, European Economic Association, vol. 20(6), pages 2396-2439.
    4. Bruce D. Meyer & Nikolas Mittag, 2019. "Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness, and Holes in the Safety Net," American Economic Journal: Applied Economics, American Economic Association, vol. 11(2), pages 176-204, April.
    5. Anna Gumpert & Henrike Steimer & Manfred Antoni, 2022. "Firm Organization with Multiple Establishments [“Organizing Offshoring: Middle Managers and Communication Costs]," The Quarterly Journal of Economics, Oxford University Press, vol. 137(2), pages 1091-1138.
    6. Kilian Huber, 2018. "Disentangling the Effects of a Banking Crisis: Evidence from German Firms and Counties," American Economic Review, American Economic Association, vol. 108(3), pages 868-898, March.
    7. Jamie C. Moore & Peter W. F. Smith & Gabriele B. Durrant, 2018. "Correlates of record linkage and estimating risks of non‐linkage biases in business data sets," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1211-1230, October.
    8. John M. Abowd & Lars Vilhuber, 2005. "The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers," Journal of Business & Economic Statistics, American Statistical Association, vol. 23, pages 133-152, April.
    9. John Cuffe & Nathan Goldschlag, 2018. "Squeezing More Out of Your Data: Business Record Linkage with Python," Working Papers 18-46, Center for Economic Studies, U.S. Census Bureau.
    10. Michele Peruzzi & Georg Zachmann & Reinhilde Veugelers, 2014. "Remerge- regression-based record linkage with an application to PATSTAT," Working Papers 852, Bruegel.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stefan Sauer & Klaus Wohlrabe, 2024. "What Is Behind the ifo Business Climate? Evidence from a Meta-Survey," CESifo Working Paper Series 11482, CESifo.
    2. Sebastian Link, 2020. "Harmonization of the ifo Business Survey’s Micro Data," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 240(4), pages 543-555, August.
    3. John Abowd & Martha Stinson, 2011. "Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data," Working Papers 11-20, Center for Economic Studies, U.S. Census Bureau.
    4. Pierre Mabille, 2019. "Aggregate Precautionary Savings Motives," 2019 Meeting Papers 344, Society for Economic Dynamics.
    5. Ma, Yongfan & Hu, Xingcun, 2024. "Shadow banking and SME investment: Evidence from China's new asset management regulations," International Review of Economics & Finance, Elsevier, vol. 93(PA), pages 332-349.
    6. Michael Berlemann & Vera Jahn & Robert Lehmann, 2018. "Auswege aus dem Dilemma der empirischen Mittelstandsforschung," ifo Schnelldienst, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 71(23), pages 22-28, December.
    7. Chen, Cheng & Senga, Tatsuro & Sun, Chang & Zhang, Hongyong, 2023. "Uncertainty, imperfect information, and expectation formation over the firm’s life cycle," Journal of Monetary Economics, Elsevier, vol. 140(C), pages 60-77.
    8. Straub, Ludwig & Ulbricht, Robert, 2015. "Endogenous Uncertainty and Credit Crunches," TSE Working Papers 15-604, Toulouse School of Economics (TSE), revised Dec 2017.
    9. Bustos, Emil, 2023. "The Effect of Financial Constraints on Inventory Holdings," Working Paper Series 1463, Research Institute of Industrial Economics.
    10. Bailey, Warren & Muradoglu, Gulnur & Onay, Ceylan & Phylaktis, Kate, 2024. "Foreign investors, firm level productivity, and European economic integration," Journal of Corporate Finance, Elsevier, vol. 85(C).
    11. Fetzer, Thiemo & Yotzov, Ivan, 2023. "(How) Do electoral surprises drive business cycles? Evidence from a new dataset," CAGE Online Working Paper Series 672, Competitive Advantage in the Global Economy (CAGE).
    12. Sebastian Doerr & Stefan Gissler & José‐Luis Peydró & Hans‐Joachim Voth, 2022. "Financial Crises and Political Radicalization: How Failing Banks Paved Hitler's Path to Power," Journal of Finance, American Finance Association, vol. 77(6), pages 3339-3372, December.
    13. Hicks, Jeffrey & Simard-Duplain, Gaëlle & Green, David A. & Warburton, William, 2022. "The effect of reducing welfare access on employment, health, and children's long-run outcomes," CLEF Working Paper Series 51, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    14. Rhys Bidder & John Krainer & Adam Shapiro, 2021. "De-leveraging or de-risking? How banks cope with loss," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 39, pages 100-127, January.
    15. Jarko Fidrmuc & Christa Hainz & Werner Hölzl, 2024. "Individual credit market experience and beliefs about bank lending policy: evidence from a firm survey," Scandinavian Journal of Economics, Wiley Blackwell, vol. 126(2), pages 387-414, April.
    16. Ted Mouw, 2016. "The Impact of Immigration on the Labor Market Outcomes of Native Workers: Evidence using Longitudinal Data from the LEHD," Working Papers 16-56, Center for Economic Studies, U.S. Census Bureau.
    17. Wehrhöfer, Nils, 2023. "Energy prices and inflation expectations: Evidence from households and firms," Discussion Papers 28/2023, Deutsche Bundesbank.
    18. Weber, Michael & Candia, Bernardo & Ropele, Tiziano & Lluberas, Rodrigo & Frache, Serafin & Meyer, Brent & Kumar, Saten & Gorodnichenko, Yuriy & Georgarakos, Dimitris & Coibion, Olivier & Kenny, Geoff, 2023. "Tell Me Something I don't Already Know: Learning in Low and High-inflation Settings," CEPR Discussion Papers 18299, C.E.P.R. Discussion Papers.
    19. Oliver Rehbein & Simon Rother, 2020. "The Role of Social Networks in Bank Lending," ECONtribute Discussion Papers Series 033, University of Bonn and University of Cologne, Germany.
    20. Nöller, Marvin & Balleer, Almut, 2023. "Monetary Policy in the Presence of Supply Constraints: Evidence from German Firm-level Data," VfS Annual Conference 2023 (Regensburg): Growth and the "sociale Frage" 277638, Verein für Socialpolitik / German Economic Association.

    More about this item

    Keywords

    record linkage; company data; orbis; survey data;

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software

    Corrections

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Klaus Wohlrabe. General contact details of provider: https://edirc.repec.org/data/ifooode.html.