Printed from https://ideas.repec.org/p/nbr/nberwo/26227.html

Combining Family History and Machine Learning to Link Historical Records

Author

Listed:
  • Joseph Price
  • Kasey Buckles
  • Jacob Van Leeuwen
  • Isaac Riley

Abstract

A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
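The two headline figures in the abstract, the match rate and the false positive rate, can be illustrated with a small sketch. The definitions used here are assumptions and are not taken from the paper: the match rate is taken to be the share of base-sample records that receive any link, and the false positive rate is the share of checkable predicted links that disagree with a known "true" link. The function name `link_metrics` and all record IDs are hypothetical.

```python
from typing import Dict, Tuple


def link_metrics(
    predicted: Dict[str, str],
    truth: Dict[str, str],
    n_base: int,
) -> Tuple[float, float]:
    """Return (match_rate, false_positive_rate) under the assumed definitions.

    predicted: base-record ID -> linked-record ID chosen by a linking method
    truth:     base-record ID -> linked-record ID from a trusted source
               (for example, links made by family-tree contributors)
    n_base:    number of records in the base sample
    """
    if n_base <= 0:
        raise ValueError("n_base must be positive")

    # Assumed definition: share of base records that received any link.
    match_rate = len(predicted) / n_base

    # Only predicted links whose base record has a known true link can be checked.
    checkable = [(a, b) for a, b in predicted.items() if a in truth]
    if not checkable:
        return match_rate, float("nan")

    # Assumed definition: share of checkable links that disagree with the truth.
    wrong = sum(1 for a, b in checkable if truth[a] != b)
    return match_rate, wrong / len(checkable)


# Toy data with hypothetical IDs; these are not results from the paper.
predicted_links = {"a1": "b1", "a2": "b9", "a3": "b3"}
true_links = {"a1": "b1", "a2": "b2", "a4": "b4"}

match_rate, false_positive_rate = link_metrics(predicted_links, true_links, n_base=4)
print(match_rate, false_positive_rate)
```

In this toy case three of four base records are linked, and one of the two checkable links is wrong. Other work in this literature may define either rate differently, so the denominators above should be checked against the paper before comparing numbers.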

Suggested Citation

  • Joseph Price & Kasey Buckles & Jacob Van Leeuwen & Isaac Riley, 2019. "Combining Family History and Machine Learning to Link Historical Records," NBER Working Papers 26227, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:26227
    Note: CH DAE LS PE

    Download full text from publisher

    File URL: http://www.nber.org/papers/w26227.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
    2. Fouka, Vasiliki, 2019. "How Do Immigrants Respond to Discrimination? The Case of Germans in the US During World War I," American Political Science Review, Cambridge University Press, vol. 113(2), pages 405-422, May.
    3. Raj Chetty & Nathaniel Hendren, 2018. "The Impacts of Neighborhoods on Intergenerational Mobility II: County-Level Estimates," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(3), pages 1163-1228.
    4. Raj Chetty & Nathaniel Hendren, 2018. "The Impacts of Neighborhoods on Intergenerational Mobility I: Childhood Exposure Effects," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(3), pages 1107-1162.
    5. Collins, William J. & Wanamaker, Marianne H., 2015. "The Great Migration in Black and White: New Evidence on the Selection and Sorting of Southern Migrants," The Journal of Economic History, Cambridge University Press, vol. 75(4), pages 947-992, December.
    6. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    7. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    8. Alexander, Rohan & Ward, Zachary, 2018. "Age at Arrival and Assimilation During the Age of Mass Migration," The Journal of Economic History, Cambridge University Press, vol. 78(3), pages 904-937, September.
    9. Beach, Brian & Ferrie, Joseph & Saavedra, Martin & Troesken, Werner, 2016. "Typhoid Fever, Water Quality, and Human Capital Formation," The Journal of Economic History, Cambridge University Press, vol. 76(1), pages 41-75, March.
    10. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2014. "A Nation of Immigrants: Assimilation and Economic Outcomes in the Age of Mass Migration," Journal of Political Economy, University of Chicago Press, vol. 122(3), pages 467-506.
    11. Mary F. Evans & Eric Helland & Jonathan Klick & Ashwin Patel, 2016. "The Developmental Effect Of State Alcohol Prohibitions At The Turn Of The Twentieth Century," Economic Inquiry, Western Economic Association International, vol. 54(2), pages 762-777, April.

    Citations



    Cited by:

    1. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    2. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    3. Joseph Price & Christian vom Lehn & Riley Wilson, 2020. "The Winners and Losers of Immigration: Evidence from Linked Historical Data," NBER Working Papers 27156, National Bureau of Economic Research, Inc.
    4. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    5. Abramitzky, Ran & Boustan, Leah & Catron, Peter & Connor, Dylan & Voigt, Rob, 2021. "Refugees without Assistance: English-Language Attainment and Economic Outcomes in the Early Twentieth Century," SocArXiv 429jp, Center for Open Science.
    6. Sarah Tahamont & Zubin Jelveh & Aaron Chalfin & Shi Yan & Benjamin Hansen, 2019. "Administrative Data Linking and Statistical Power Problems in Randomized Experiments," NBER Working Papers 25657, National Bureau of Economic Research, Inc.
    7. Jeremy K. Nguyen & Adam Karg & Abbas Valadkhani & Heath McDonald, 2022. "Predicting individual event attendance with machine learning: a ‘step-forward’ approach," Applied Economics, Taylor & Francis Journals, vol. 54(27), pages 3138-3153, June.
    8. Price, Joseph & Buckles, Kasey & Van Leeuwen, Jacob & Riley, Isaac, 2021. "Combining family history and machine learning to link historical records: The Census Tree data set," Explorations in Economic History, Elsevier, vol. 80(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Price, Joseph & Buckles, Kasey & Van Leeuwen, Jacob & Riley, Isaac, 2021. "Combining family history and machine learning to link historical records: The Census Tree data set," Explorations in Economic History, Elsevier, vol. 80(C).
    2. Cavit Baran & Eric Chyn & Bryan A. Stuart, 2024. "The Great Migration and Educational Opportunity," American Economic Journal: Applied Economics, American Economic Association, vol. 16(3), pages 354-398, July.
    3. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    4. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    5. Collins, William J., 2021. "The Great Migration of Black Americans from the US South: A guide and interpretation," Explorations in Economic History, Elsevier, vol. 80(C).
    6. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    7. Escamilla-Guerrero, David & Kosack, Edward & Ward, Zachary, 2021. "Life after crossing the border: Assimilation during the first Mexican mass migration," Explorations in Economic History, Elsevier, vol. 82(C).
    8. Ran Abramitzky & Philipp Ager & Leah Platt Boustan & Elior Cohen & Casper W. Hansen, 2019. "The Effects of Immigration on the Economy: Lessons from the 1920s Border Closure," NBER Working Papers 26536, National Bureau of Economic Research, Inc.
    9. Chen, Shuo & Xie, Bin, 2020. "Institutional Discrimination and Assimilation: Evidence from the Chinese Exclusion Act of 1882," IZA Discussion Papers 13647, Institute of Labor Economics (IZA).
    10. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    11. Dylan Shane Connor & Michael Storper, 2020. "The changing geography of social mobility in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(48), pages 30309-30317, December.
    12. Zimran, Ariell, 2022. "US immigrants’ secondary migration and geographic assimilation during the Age of Mass Migration," Explorations in Economic History, Elsevier, vol. 85(C).
    13. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    14. Hausman, Catherine & Stolper, Samuel, 2021. "Inequality, information failures, and air pollution," Journal of Environmental Economics and Management, Elsevier, vol. 110(C).
    15. Michael Geruso & Timothy J. Layton & Jacob Wallace, 2023. "What Difference Does a Health Plan Make? Evidence from Random Plan Assignment in Medicaid," American Economic Journal: Applied Economics, American Economic Association, vol. 15(3), pages 341-379, July.
    16. Alex Bell & Raj Chetty & Xavier Jaravel & Neviana Petkova & John Van Reenen, 2019. "Who Becomes an Inventor in America? The Importance of Exposure to Innovation," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 134(2), pages 647-713.
    17. Chong Lu, 2022. "The effect of migration on rural residents’ intergenerational subjective social status mobility in China," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3279-3308, October.
    18. John Gathergood & Fabian Gunzinger & Benedict Guttman-Kenney & Edika Quispe-Torreblanca & Neil Stewart, 2020. "Levelling Down and the COVID-19 Lockdowns: Uneven Regional Recovery in UK Consumer Spending," Papers 2012.09336, arXiv.org, revised Dec 2020.
    19. Kalra, Aarushi, 2021. "A 'Ghetto' of One's Own: Communal Violence, Residential Segregation and Group Education Outcomes in India," SocArXiv rzjct, Center for Open Science.
    20. Aline Bütikofer & René Karadakic & Kjell G. Salvanes, 2021. "Income Inequality and Mortality: A Norwegian Perspective," Fiscal Studies, John Wiley & Sons, vol. 42(1), pages 193-221, March.

    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • J1 - Labor and Demographic Economics - - Demographic Economics
    • N01 - Economic History - - General - - - Development of the Discipline: Historiographical; Sources and Methods
