Printed from https://ideas.repec.org/a/cup/apsrev/v113y2019i02p353-371_00.html

Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records

Author

Listed:
  • ENAMORADO, TED
  • FIFIELD, BENJAMIN
  • IMAI, KOSUKE

Abstract

Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers’ workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are severe especially when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
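
For readers unfamiliar with probabilistic record linkage, the core calculation can be sketched as follows. The abstract does not name the model; this sketch assumes the Fellegi-Sunter mixture model with conditionally independent, binary field comparisons, and it handles a missing field by simply dropping it from the likelihood (an ignorability assumption). The function name `match_probability` and the parameter values are hypothetical illustrations, not the authors' software or estimates. In practice the parameters would be estimated from the data (for example with an EM algorithm) rather than supplied by hand.

```python
from typing import Optional, Sequence


def match_probability(
    agreement: Sequence[Optional[bool]],
    m: Sequence[float],
    u: Sequence[float],
    prior: float,
) -> float:
    """Posterior probability that a record pair is a true match.

    agreement[k] is True/False if field k agrees/disagrees, None if the
    field is missing in either record.
    m[k] = P(field k agrees | pair is a match)      (assumed known here)
    u[k] = P(field k agrees | pair is a non-match)  (assumed known here)
    prior = P(pair is a match) before looking at any field.
    """
    like_match = 1.0
    like_nonmatch = 1.0
    for agrees, m_k, u_k in zip(agreement, m, u):
        if agrees is None:
            # Missing comparison: contributes nothing to either likelihood.
            continue
        like_match *= m_k if agrees else 1.0 - m_k
        like_nonmatch *= u_k if agrees else 1.0 - u_k
    numerator = prior * like_match
    return numerator / (numerator + (1.0 - prior) * like_nonmatch)


# Hypothetical parameters for three fields (e.g. name, birth year, address).
m = [0.9, 0.9, 0.9]
u = [0.1, 0.1, 0.1]
prior = 0.01

p_all_agree = match_probability([True, True, True], m, u, prior)
p_one_missing = match_probability([True, True, None], m, u, prior)
p_one_disagree = match_probability([True, True, False], m, u, prior)
```

A pair is then declared a match when its posterior probability exceeds a chosen threshold, or the probability itself is carried forward as a weight so that post-merge analyses reflect the uncertainty of the merge.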

Suggested Citation

  • Enamorado, Ted & Fifield, Benjamin & Imai, Kosuke, 2019. "Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records," American Political Science Review, Cambridge University Press, vol. 113(2), pages 353-371, May.
  • Handle: RePEc:cup:apsrev:v:113:y:2019:i:02:p:353-371_00

    Download full text from publisher

    File URL: https://www.cambridge.org/core/product/identifier/S0003055418000783/type/journal_article
    File Function: link to article abstract page
    Download Restriction: no

    Citations



    Cited by:

    1. Raffiee, Joseph & Teodoridis, Florenta & Fehder, Daniel, 2023. "Partisan patent examiners? Exploring the link between the political ideology of patent examiners and patent office outcomes," Research Policy, Elsevier, vol. 52(9).
    2. Eric Chyn & Kareem Haggag, 2023. "Moved to Vote: The Long-Run Effects of Neighborhoods on Political Participation," The Review of Economics and Statistics, MIT Press, vol. 105(6), pages 1596-1605, November.
    3. Vo, Thanh Huan & Chauvet, Guillaume & Happe, André & Oger, Emmanuel & Paquelet, Stéphane & Garès, Valérie, 2023. "Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    4. Cuccaro-Alamin, Stephanie & Eastman, Andrea Lane & Foust, Regan & McCroskey, Jacquelyn & Nghiem, Huy Tran & Putnam-Hornstein, Emily, 2021. "Strategies for constructing household and family units with linked administrative records," Children and Youth Services Review, Elsevier, vol. 120(C).
    5. Kwiek, Marek & Roszka, Wojciech, 2021. "Gender-based homophily in research: A large-scale study of man-woman collaboration," Journal of Informetrics, Elsevier, vol. 15(3).
    6. Stephen B. Billings & Eric Chyn & Kareem Haggag, 2021. "The Long-Run Effects of School Racial Diversity on Political Identity," American Economic Review: Insights, American Economic Association, vol. 3(3), pages 267-284, September.
    7. Esbenshade, Lief, 2022. "Breaking Down: Teacher Attrition from Publicly Available Resources," EdArXiv e6cky, Center for Open Science.
    8. Fehder, Daniel & Teodoridis, Florenta & Raffiee, Joseph & Lu, Jino, 2024. "The partisanship of American inventors," Research Policy, Elsevier, vol. 53(7).
