Printed from https://ideas.repec.org/p/zbw/safewp/398.html

Entity matching with similarity encoding: A supervised learning recommendation framework for linking (big) data

Author

Listed:
  • Karapanagiotis, Pantelis
  • Liebald, Marius

Abstract

In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework is on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encoding is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying the framework to dirty financial firm-level data extracted from historical archives.
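
To make the idea of a distance-based encoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain Levenshtein edit distance, a normalization by the longer string's length, and one similarity value per compared field; the function names (`levenshtein`, `similarity`, `encode_pair`) and the example records are hypothetical. The ANN that would consume such vectors is not shown.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def encode_pair(left: Dict[str, str], right: Dict[str, str],
                fields: List[str]) -> List[float]:
    """Encode a candidate record pair as one similarity value per field."""
    return [similarity(left.get(f, ""), right.get(f, "")) for f in fields]


# Hypothetical dirty records: the misspelled name still yields a usable
# feature, because no vocabulary lookup is involved.
clean = {"name": "Deutsche Bank AG", "city": "Frankfurt"}
dirty = {"name": "Deutsch Bnk AG", "city": "Frankfurt"}
features = encode_pair(clean, dirty, ["name", "city"])
```

Because the features depend only on character-level edits, a token that never appeared in any training corpus (an OCR error, an archaic spelling) is handled the same way as any other string, which is the property the abstract attributes to distance-based encodings.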

Suggested Citation

  • Karapanagiotis, Pantelis & Liebald, Marius, 2023. "Entity matching with similarity encoding: A supervised learning recommendation framework for linking (big) data," SAFE Working Paper Series 398, Leibniz Institute for Financial Research SAFE.
  • Handle: RePEc:zbw:safewp:398

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/274537/1/185636643X.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    2. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.

    Citations

    Cited by:

    1. Sabet, Navid & Liebald, Marius & Friebel, Guido, 2022. "Terrorism and Voting: The Rise of Right-Wing Populism in Germany," CEPR Discussion Papers 17525, C.E.P.R. Discussion Papers.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matthew Jaremski, 2020. "Today’s economic history and tomorrow’s scholars," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 14(1), pages 169-180, January.
    2. Escamilla-Guerrero, David & Kosack, Edward & Ward, Zachary, 2021. "Life after crossing the border: Assimilation during the first Mexican mass migration," Explorations in Economic History, Elsevier, vol. 82(C).
    3. Hanlon, W.Walker & Heblich, Stephan, 2022. "History and urban economics," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    4. Andreas Vortisch, 2023. "The impact of the Johnson–Reed Act on Filipino labor market outcomes," French Stata Users' Group Meetings 2023 12, Stata Users Group.
    5. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    6. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    7. Berger, Thor & Engzell, Per & Eriksson, Björn & Molinder, Jakob, 2023. "Social Mobility in Sweden before the Welfare State," The Journal of Economic History, Cambridge University Press, vol. 83(2), pages 431-463, June.
    8. Baker, Richard B. & Blanchette, John & Eriksson, Katherine, 2020. "Long-Run Impacts of Agricultural Shocks on Educational Attainment: Evidence from the Boll Weevil," The Journal of Economic History, Cambridge University Press, vol. 80(1), pages 136-174, March.
    9. Obolensky, Marguerite & Tabellini, Marco & Taylor, Charles A., 2024. "Homeward Bound: How Migrants Seek Out Familiar Climates," IZA Discussion Papers 16710, Institute of Labor Economics (IZA).
    10. Abhishek Arora & Xinmei Yang & Shao-Yu Jheng & Melissa Dell, 2023. "Linking Representations with Multimodal Contrastive Learning," Papers 2304.03464, arXiv.org, revised Jun 2024.
    11. Sarah Tahamont & Zubin Jelveh & Aaron Chalfin & Shi Yan & Benjamin Hansen, 2019. "Administrative Data Linking and Statistical Power Problems in Randomized Experiments," NBER Working Papers 25657, National Bureau of Economic Research, Inc.
    12. Zachary Ward, 2023. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," American Economic Review, American Economic Association, vol. 113(12), pages 3213-3248, December.
    13. Philipp Ager & Leah Boustan & Katherine Eriksson, 2021. "The Intergenerational Effects of a Large Wealth Shock: White Southerners after the Civil War," American Economic Review, American Economic Association, vol. 111(11), pages 3767-3794, November.
    14. Krzysztof Karbownik & Anthony Wray, 2019. "Educational, Labor-market and Intergenerational Consequences of Poor Childhood Health," NBER Working Papers 26368, National Bureau of Economic Research, Inc.
    15. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    16. Inwood, Kris & Oxley, Les & Roberts, Evan, 2022. "The mortality risk of being overweight in the twentieth century: Evidence from two cohorts of New Zealand men," Explorations in Economic History, Elsevier, vol. 86(C).
    17. Wolfgang Keller & Carol H. Shiue, 2023. "Intergenerational Mobility of Daughters and Marital Sorting: New Evidence from Imperial China," NBER Working Papers 31695, National Bureau of Economic Research, Inc.
    18. Marguerite Obolensky & Marco Tabellini & Charles Taylor, 2024. "Homeward Bound: How Migrants Seek Out Familiar Climates," RF Berlin - CReAM Discussion Paper Series 2401, Rockwool Foundation Berlin (RF Berlin) - Centre for Research and Analysis of Migration (CReAM).
    19. Anna Aizer & Shari Eli & Adriana Lleras-Muney, 2020. "The Incentive Effects of Cash Transfers to the Poor," NBER Working Papers 27523, National Bureau of Economic Research, Inc.
    20. Daniel Aaronson & Jonathan Davis & Karl Schulze, 2018. "Internal Immigrant Mobility in the Early 20th Century: Experimental Evidence from Galveston Immigrants," Working Paper Series WP-2018-4, Federal Reserve Bank of Chicago.

    More about this item

    Keywords

    Entity matching; Entity resolution; Database linking; Machine learning; Record resolution; Similarity encoding;

    JEL classification:

    • C8 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs
