IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-28546-8.html
   My bibliography  Save this article

Uncovering interpretable potential confounders in electronic medical records

Author

Listed:
  • Jiaming Zeng

    (Stanford University)

  • Michael F. Gensheimer

    (Stanford School of Medicine)

  • Daniel L. Rubin

    (Stanford School of Medicine)

  • Susan Athey

    (Stanford University)

  • Ross D. Shachter

    (Stanford University)

Abstract

Randomized clinical trials (RCT) are the gold standard for informing treatment decisions. Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding. We explore how unstructured clinical text can be used to reduce selection bias and improve medical practice. We develop a framework based on natural language processing to uncover interpretable potential confounders from text. We validate our method by comparing the estimated hazard ratio (HR) with and without the confounders against established RCTs. We apply our method to four cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute and show that our method shifts the HR estimate towards the RCT results. The uncovered terms can also be interpreted by oncologists for clinical insights. We present this proof-of-concept study to enable more credible causal inference using observational data, uncover meaningful insights from clinical text, and inform high-stakes medical decisions.

Suggested Citation

  • Jiaming Zeng & Michael F. Gensheimer & Daniel L. Rubin & Susan Athey & Ross D. Shachter, 2022. "Uncovering interpretable potential confounders in electronic medical records," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-28546-8
    DOI: 10.1038/s41467-022-28546-8
    as

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-28546-8
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-022-28546-8?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Other versions of this item:

    References listed on IDEAS

    as
    1. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Margaret E. Roberts & Brandon M. Stewart & Richard A. Nielsen, 2020. "Adjusting for Confounding with Text Matching," American Journal of Political Science, John Wiley & Sons, vol. 64(4), pages 887-903, October.
    4. Alberto Abadie & Guido W. Imbens, 2011. "Bias-Corrected Matching Estimators for Average Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(1), pages 1-11, January.
    5. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    6. Mozer, Reagan & Miratrix, Luke & Kaufman, Aaron Russell & Jason Anastasopoulos, L., 2020. "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality," Political Analysis, Cambridge University Press, vol. 28(4), pages 445-468, October.
    7. Rajeev H. Dehejia & Sadek Wahba, 2002. "Propensity Score-Matching Methods For Nonexperimental Causal Studies," The Review of Economics and Statistics, MIT Press, vol. 84(1), pages 151-161, February.
    8. Imbens,Guido W. & Rubin,Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, September.
    9. Ho, Daniel & Imai, Kosuke & King, Gary & Stuart, Elizabeth A., 2011. "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 42(i08).
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Takanobu Hirosawa & Yukinori Harada & Masashi Yokose & Tetsu Sakamoto & Ren Kawamura & Taro Shimizu, 2023. "Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study," IJERPH, MDPI, vol. 20(4), pages 1-10, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    2. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    3. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    4. Sallin, Aurelién, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    5. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    6. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    7. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    8. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    9. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    10. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    11. Koch, Nicolas & Basse Mama, Houdou, 2019. "Does the EU Emissions Trading System induce investment leakage? Evidence from German multinational firms," Energy Economics, Elsevier, vol. 81(C), pages 479-492.
    12. Huang, Wei & Li, Jinxian & Zhang, Qiang, 2019. "Information asymmetry, legal environment, and family firm governance: Evidence from IPO underpricing in China," Pacific-Basin Finance Journal, Elsevier, vol. 57(C).
    13. Helmut Wasserbacher & Martin Spindler, 2024. "Credit Ratings: Heterogeneous Effect on Capital Structure," Papers 2406.18936, arXiv.org.
    14. Aur'elien Sallin, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Papers 2110.08807, arXiv.org, revised Feb 2022.
    15. Ferman, Bruno, 2021. "Matching estimators with few treated and many control observations," Journal of Econometrics, Elsevier, vol. 225(2), pages 295-307.
    16. Harsh Parikh & Cynthia Rudin & Alexander Volfovsky, 2018. "MALTS: Matching After Learning to Stretch," Papers 1811.07415, arXiv.org, revised Jun 2023.
    17. Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
    18. Jonathan Fuhr & Philipp Berens & Dominik Papies, 2024. "Estimating Causal Effects with Double Machine Learning -- A Method Evaluation," Papers 2403.14385, arXiv.org, revised Apr 2024.
    19. Goenner, Cullen F, 2016. "The policy impact of new rules for loan participation on credit union returns," Journal of Banking & Finance, Elsevier, vol. 73(C), pages 198-210.
    20. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-28546-8. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.