
Case contamination in electronic health records based case‐control studies

Authors
  • Lu Wang
  • Jill Schnall
  • Aeron Small
  • Rebecca A. Hubbard
  • Jason H. Moore
  • Scott M. Damrauer
  • Jinbo Chen

Abstract

Clinically relevant information from electronic health records (EHRs) permits derivation of a rich collection of phenotypes. Unlike traditionally designed studies where scientific hypotheses are specified a priori, before data collection, the true phenotype status of any given individual in EHR‐based studies is not directly available. Structured and unstructured data elements need to be queried through preconstructed rules to identify case and control groups. A sufficient number of controls can usually be identified with high accuracy by making the selection criteria stringent. But more relaxed criteria are often necessary for more thorough identification of cases to ensure achievable statistical power. The resulting pool of candidate cases consists of genuine cases contaminated with noncase patients who do not satisfy the control definition. The presence of patients who are neither true cases nor controls among the identified cases is a unique challenge in EHR‐based case‐control studies. Ignoring case contamination would lead to biased estimation of odds ratio association parameters. We propose an estimating equation approach to bias correction, study its large‐sample properties, and evaluate its performance through extensive simulation studies and an application to a pilot study of aortic stenosis in the Penn Medicine EHR. Our method holds the promise of facilitating more efficient EHR studies by accommodating enlarged albeit contaminated case pools.
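The bias from ignoring case contamination can be illustrated with a small calculation. The sketch below is not the authors' estimating equation method; it only shows, for a single binary exposure, how the naive odds ratio moves away from the true one when the candidate case pool is a mixture of genuine cases and noncases. All names and numbers (`odds`, `naive_odds_ratio`, the exposure prevalences, the 30% contamination fraction) are hypothetical, and the contaminating noncases are assumed, purely for illustration, to have the same exposure prevalence as controls.

```python
def odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def naive_odds_ratio(
    p_case: float,
    p_control: float,
    p_contaminant: float,
    contamination: float,
) -> float:
    """Odds ratio obtained by treating the whole candidate case pool as cases.

    p_case        : exposure prevalence among genuine cases (assumed value)
    p_control     : exposure prevalence among controls (assumed value)
    p_contaminant : exposure prevalence among noncases in the case pool
                    (assumed value; unknown in practice)
    contamination : fraction of the candidate case pool that is not a true case
    """
    # Exposure prevalence in the contaminated pool is a mixture of the two groups.
    p_pool = (1.0 - contamination) * p_case + contamination * p_contaminant
    return odds(p_pool) / odds(p_control)


# Hypothetical inputs: exposure prevalence 0.5 in true cases, 0.2 in controls.
true_or = odds(0.5) / odds(0.2)

# Assumption: 30% of the pool are noncases who resemble controls in exposure.
naive_or = naive_odds_ratio(0.5, 0.2, 0.2, 0.3)

print(f"true odds ratio:  {true_or:.3f}")
print(f"naive odds ratio: {naive_or:.3f}")
```

Under these assumptions the naive estimate is pulled toward 1. If the contaminating noncases had a different exposure prevalence from controls, the bias could go in either direction, which is one reason a formal correction is needed rather than a rule of thumb.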

Suggested Citation

  • Lu Wang & Jill Schnall & Aeron Small & Rebecca A. Hubbard & Jason H. Moore & Scott M. Damrauer & Jinbo Chen, 2021. "Case contamination in electronic health records based case‐control studies," Biometrics, The International Biometric Society, vol. 77(1), pages 67-77, March.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:1:p:67-77
    DOI: 10.1111/biom.13264



    Citations

    Cited by:

    1. Guorong Dai & Yanyuan Ma & Jill Hasler & Jinbo Chen & Raymond J. Carroll, 2023. "A robust approach for electronic health record–based case‐control studies with contaminated case pools," Biometrics, The International Biometric Society, vol. 79(3), pages 2023-2035, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Han, 2021. "How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It," SocArXiv 453jk, Center for Open Science.
    2. Grace Y. Yi & Wenqing He, 2017. "Analysis of case-control data with interacting misclassified covariates," Journal of Statistical Distributions and Applications, Springer, vol. 4(1), pages 1-16, December.
    3. Qi Zhou & Yoo-Mi Chin & James D. Stamey & Joon Jin Song, 2020. "Bayesian sensitivity analysis to unmeasured confounding for misclassified data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 577-596, December.
    4. Sajad Shojaee & Nastaran Hajizadeh & Hadis Najafimehr & Luca Busani & Mohamad Amin Pourhoseingholi & Ahmad Reza Baghestani & Maryam Nasserinejad & Sara Ashtari & Mohammad Reza Zali, 2018. "Bayesian adjustment for trend of colorectal cancer incidence in misclassified registering across Iranian provinces," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-10, December.
    5. Hanaan Yaseen, 2018. "Dividend policy and socio-cultural factors: some preliminary findings," The Review of Finance and Banking, Academia de Studii Economice din Bucuresti, Romania / Facultatea de Finante, Asigurari, Banci si Burse de Valori / Catedra de Finante, vol. 10(2), pages 077-094, December.
    6. Sepúlveda, Nuno & Paulino, Carlos Daniel & Penha-Gonçalves, Carlos, 2009. "Bayesian analysis of allelic penetrance models for complex binary traits," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1271-1283, February.
    7. Naranjo, L. & Martín, J. & Pérez, C.J., 2014. "Bayesian binary regression with exponential power link," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 464-476.
    8. Paulino, Carlos Daniel & Silva, Giovani & Alberto Achcar, Jorge, 2005. "Bayesian analysis of correlated misclassified binary data," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1120-1131, June.
    9. Mak Timothy Shin Heng & Best Nicky & Rushton Lesley, 2015. "Robust Bayesian Sensitivity Analysis for Case–Control Studies with Uncertain Exposure Misclassification Probabilities," The International Journal of Biostatistics, De Gruyter, vol. 11(1), pages 135-149, May.
    10. Jonathan N. Katz & Gabriel Katz, 2010. "Correcting for Survey Misreports Using Auxiliary Information with an Application to Estimating Turnout," American Journal of Political Science, John Wiley & Sons, vol. 54(3), pages 815-835, July.
