
Playing with matches: An assessment of accuracy in linked historical data

Author

Listed:
  • Catherine G. Massey

Abstract

This article evaluates linkage quality achieved by various record linkage techniques used in historical demography. The author creates benchmark, or truth, data by linking the 2005 Current Population Survey Annual Social and Economic Supplement to the Social Security Administration's numeric identification system by social security number. By comparing simulated linkages to the benchmark data, she examines the value added (in terms of number and quality of links) from incorporating text-string comparators, adjusting age, and using a probabilistic matching algorithm. She finds that text-string comparators and probabilistic approaches are useful for increasing the linkage rate, but use of text-string comparators may decrease accuracy in some cases. Overall, probabilistic matching offers the best balance between linkage rates and accuracy.
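The comparison against benchmark data described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the function name `evaluate_links`, the dictionary representation of links, and the toy data are all assumptions made for illustration. It shows only how a linkage rate and an accuracy figure might be computed once truth links are available.

```python
def evaluate_links(predicted, truth, n_records):
    """Return (linkage_rate, accuracy) for one linkage method.

    predicted: dict mapping a source record id to the target id the method chose
    truth:     dict mapping a source record id to the correct target id (benchmark)
    n_records: number of source records the method attempted to link

    All names here are hypothetical; they do not come from the article.
    """
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    linked = len(predicted)
    # A link counts as correct only if it agrees with the benchmark.
    correct = sum(1 for src, tgt in predicted.items() if truth.get(src) == tgt)
    linkage_rate = linked / n_records
    accuracy = correct / linked if linked else 0.0
    return linkage_rate, accuracy


# Toy benchmark and two made-up methods (illustrative values only).
truth = {"a": 1, "b": 2, "c": 3, "d": 4}
exact_method = {"a": 1, "b": 2}          # fewer links, all correct
fuzzy_method = {"a": 1, "b": 2, "c": 9}  # more links, one wrong

exact_rate, exact_acc = evaluate_links(exact_method, truth, len(truth))
fuzzy_rate, fuzzy_acc = evaluate_links(fuzzy_method, truth, len(truth))
```

In this toy case the second method links more records but a smaller share of its links is correct, which is the kind of trade-off between linkage rate and accuracy that the abstract refers to.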

Suggested Citation

  • Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
  • Handle: RePEc:taf:vhimxx:v:50:y:2017:i:3:p:129-143
    DOI: 10.1080/01615440.2017.1288598

    Citations

    Cited by:

    1. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    2. Joseph Price & Kasey Buckles & Jacob Van Leeuwen & Isaac Riley, 2019. "Combining Family History and Machine Learning to Link Historical Records," NBER Working Papers 26227, National Bureau of Economic Research, Inc.
    3. Auke Rijpma & Jeanne Cilliers & Johan Fourie, 2020. "Record linkage in the Cape of Good Hope Panel," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 112-129, April.
    4. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    5. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    6. Giacomin Favre, 2019. "Bias in social mobility estimates with historical data: evidence from Swiss microdata," ECON - Working Papers 329, Department of Economics - University of Zurich.
    7. Alexander, Rohan & Ward, Zachary, 2018. "Age at Arrival and Assimilation During the Age of Mass Migration," The Journal of Economic History, Cambridge University Press, vol. 78(3), pages 904-937, September.
    8. Bennett, Robert J. & Montebruno, Piero & Van Lieshout, Carry & Smith, Harry, 2022. "Business entry and exit: career changes of proprietors in England and Wales (1851-81) using record-linkage," LSE Research Online Documents on Economics 113867, London School of Economics and Political Science, LSE Library.
    9. Price, Joseph & Buckles, Kasey & Van Leeuwen, Jacob & Riley, Isaac, 2021. "Combining family history and machine learning to link historical records: The Census Tree data set," Explorations in Economic History, Elsevier, vol. 80(C).
    10. Anbinder, Tyler & Connor, Dylan & O Grada, Cormac & Wegge, Simone, 2021. "The Problem of False Positives in Automated Census Linking: Evidence from Nineteenth-Century New York's Irish Immigrants," CAGE Online Working Paper Series 568, Competitive Advantage in the Global Economy (CAGE).
