Score test for assessing the conditional dependence in latent class models and its application to record linkage

Authors

  • Huiping Xu
  • Xiaochun Li
  • Zuoyi Zhang
  • Shaun Grannis

Abstract

The Fellegi–Sunter model has been widely used in probabilistic record linkage even though its conditional independence assumption is often invalid. Prior research has demonstrated that conditional dependence latent class models improve match performance when the conditional dependence structure is correctly specified; with a misspecified structure, these models can perform worse. It is therefore critically important to identify the conditional dependence structure correctly. Existing methods for identifying it include the correlation residual plot, the log‐odds ratio check, and the bivariate residual, all of which have been shown to perform inadequately. A bootstrap bivariate residual approach and a score test have also been proposed and found to perform better, with the score test having greater power and a lower computational burden. In this paper, we extend the score‐test‐based approach to account for different conditional dependence structures. Through a simulation study, we develop practical recommendations on the use of the score test and assess match performance when the conditional dependence is identified by the proposed method. The proposed method is further evaluated using a real‐world record linkage example. Findings show that the proposed method leads to improved matching accuracy relative to the Fellegi–Sunter model.
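
For readers unfamiliar with the baseline, the following is a minimal sketch of how a Fellegi–Sunter match weight is commonly computed under the conditional independence assumption the abstract refers to. It is not taken from the paper: the function names, the binary agree/disagree comparison vector, and the m-, u- and prevalence values are illustrative assumptions. Here `m[k]` denotes the probability that field k agrees for a true match and `u[k]` the same probability for a non-match.

```python
import math

def fs_weight(gamma, m, u):
    """Log-likelihood ratio for a binary comparison vector `gamma`.

    Assumes the fields are conditionally independent given match status,
    so the per-field log ratios simply add up. All names are illustrative.
    """
    total = 0.0
    for g, mk, uk in zip(gamma, m, u):
        if g:   # field agrees
            total += math.log(mk / uk)
        else:   # field disagrees
            total += math.log((1 - mk) / (1 - uk))
    return total

def match_probability(gamma, m, u, prevalence):
    """Posterior probability of a match, given an assumed match prevalence."""
    prior_log_odds = math.log(prevalence / (1 - prevalence))
    log_odds = fs_weight(gamma, m, u) + prior_log_odds
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical parameters for three fields (e.g. name, birth date, postcode).
m = [0.95, 0.90, 0.85]
u = [0.05, 0.01, 0.10]
print(match_probability([1, 1, 0], m, u, prevalence=0.01))
```

When two fields are dependent within a class (for example, postcode and city), adding their log ratios as above counts the same evidence twice, which is the kind of misspecification that conditional dependence latent class models aim to correct.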

Suggested Citation

  • Huiping Xu & Xiaochun Li & Zuoyi Zhang & Shaun Grannis, 2022. "Score test for assessing the conditional dependence in latent class models and its application to record linkage," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1663-1687, November.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:5:p:1663-1687
    DOI: 10.1111/rssc.12590

    References listed on IDEAS

    1. Subtil, Ana & de Oliveira, M. Rosário & Gonçalves, Luzia, 2012. "Conditional dependence diagnostic in the latent class model: A simulation study," Statistics & Probability Letters, Elsevier, vol. 82(7), pages 1407-1412.
    2. Geoffrey Jones & Wesley O. Johnson & Timothy E. Hanson & Ronald Christensen, 2010. "Identifiability of Models for Multiple Diagnostic Testing in the Absence of a Gold Standard," Biometrics, The International Biometric Society, vol. 66(3), pages 855-863, September.
    3. Paul S. Albert & Lori E. Dodd, 2004. "A Cautionary Note on the Robustness of Latent Class Models for Estimating Diagnostic Error without a Gold Standard," Biometrics, The International Biometric Society, vol. 60(2), pages 427-435, June.
    4. Mauricio Sadinle, 2017. "Bayesian Estimation of Bipartite Matchings for Record Linkage," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 600-612, April.
    5. Elizabeth S. Garrett & Scott L. Zeger, 2000. "Latent Class Model Diagnosis," Biometrics, The International Biometric Society, vol. 56(4), pages 1055-1067, December.
    6. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    7. Daniel Oberski & Geert Kollenburg & Jeroen Vermunt, 2013. "A Monte Carlo evaluation of three methods to detect local dependence in binary data latent class models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 267-279, September.
    8. Engle, Robert F., 1984. "Wald, likelihood ratio, and Lagrange multiplier tests in econometrics," Handbook of Econometrics, in: Z. Griliches† & M. D. Intriligator (ed.), Handbook of Econometrics, edition 1, volume 2, chapter 13, pages 775-826, Elsevier.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrew Martinez, 2017. "Testing for Differences in Path Forecast Accuracy: Forecast-Error Dynamics Matter," Working Papers (Old Series) 1717, Federal Reserve Bank of Cleveland.
    2. Wang, Zheyu & Sebestyen, Krisztian & Monsell, Sarah E., 2017. "Model-based clustering for assessing the prognostic value of imaging biomarkers and mixed type tests," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 125-135.
    3. Leandro García Barrado & Els Coart & Tomasz Burzykowski, 2017. "Estimation of diagnostic accuracy of a combination of continuous biomarkers allowing for conditional dependence between the biomarkers and the imperfect reference-test," Biometrics, The International Biometric Society, vol. 73(2), pages 646-655, June.
    4. Subtil, Ana & de Oliveira, M. Rosário & Gonçalves, Luzia, 2012. "Conditional dependence diagnostic in the latent class model: A simulation study," Statistics & Probability Letters, Elsevier, vol. 82(7), pages 1407-1412.
    5. Guastadisegni, Lucia & Cagnone, Silvia & Moustaki, Irini & Vasdekis, Vassilis, 2022. "Use of the Lagrange multiplier test for assessing measurement invariance under model misspecification," LSE Research Online Documents on Economics 110358, London School of Economics and Political Science, LSE Library.
    6. Saeed Hayati & Kenji Fukumizu & Afshin Parvardeh, 2024. "Kernel mean embedding of probability measures and its applications to functional data analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 51(2), pages 447-484, June.
    7. Arthur Novaes de Amorim & Rob Deardon & Vineet Saini, 2021. "A stacked ensemble method for forecasting influenza-like illness visit volumes at emergency departments," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-15, March.
    8. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    9. Azar, Pablo D. & Micali, Silvio, 2018. "Computational principal agent problems," Theoretical Economics, Econometric Society, vol. 13(2), May.
    10. Antonio Páez & Takashi Uchida & Kazuaki Miyamoto, 2002. "A General Framework for Estimation and Inference of Geographically Weighted Regression Models: 1. Location-Specific Kernel Bandwidths and a Test for Locational Heterogeneity," Environment and Planning A, vol. 34(4), pages 733-754, April.
    11. Quan, Hao & Yang, Dazhi, 2020. "Probabilistic solar irradiance transposition models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 125(C).
    12. Tallman, Ellis W. & Zaman, Saeed, 2020. "Combining survey long-run forecasts and nowcasts with BVAR forecasts using relative entropy," International Journal of Forecasting, Elsevier, vol. 36(2), pages 373-398.
    13. Dumas, Jonathan & Wehenkel, Antoine & Lanaspeze, Damien & Cornélusse, Bertrand & Sutera, Antonio, 2022. "A deep generative model for probabilistic energy forecasting in power systems: normalizing flows," Applied Energy, Elsevier, vol. 305(C).
    14. Angelica Gianfreda & Francesco Ravazzolo & Luca Rossini, 2023. "Large Time‐Varying Volatility Models for Hourly Electricity Prices," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 85(3), pages 545-573, June.
    15. Nowotarski, Jakub & Weron, Rafał, 2018. "Recent advances in electricity price forecasting: A review of probabilistic forecasting," Renewable and Sustainable Energy Reviews, Elsevier, vol. 81(P1), pages 1548-1568.
    16. Tobias Fissler & Yannick Hoga, 2024. "How to Compare Copula Forecasts?," Papers 2410.04165, arXiv.org.
    17. Davide Pettenuzzo & Francesco Ravazzolo, 2016. "Optimal Portfolio Choice Under Decision‐Based Model Combinations," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 31(7), pages 1312-1332, November.
    18. Rubio, F.J. & Steel, M.F.J., 2011. "Inference for grouped data with a truncated skew-Laplace distribution," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3218-3231, December.
    19. Andrews, Donald W K, 1989. "Power in Econometric Applications," Econometrica, Econometric Society, vol. 57(5), pages 1059-1090, September.
    20. David Kohns & Tibor Szendrei, 2021. "Decoupling Shrinkage and Selection for the Bayesian Quantile Regression," Papers 2107.08498, arXiv.org.
