
Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations

Authors

  • Jenna Wong

    (Harvard Medical School & Harvard Pilgrim Health Care Institute)

  • Daniel Prieto-Alhambra

    (NDORMS, University of Oxford; Erasmus University Medical Center)

  • Peter R. Rijnbeek

    (Erasmus University Medical Center)

  • Rishi J. Desai

    (Harvard Medical School)

  • Jenna M. Reps

    (Janssen Research & Development, LLC)

  • Sengwee Toh

    (Harvard Medical School & Harvard Pilgrim Health Care Institute)

Abstract

Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.

Suggested Citation

  • Jenna Wong & Daniel Prieto-Alhambra & Peter R. Rijnbeek & Rishi J. Desai & Jenna M. Reps & Sengwee Toh, 2022. "Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations," Drug Safety, Springer, vol. 45(5), pages 493-510, May.
  • Handle: RePEc:spr:drugsa:v:45:y:2022:i:5:d:10.1007_s40264-022-01158-3
    DOI: 10.1007/s40264-022-01158-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40264-022-01158-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    2. S Ariane Christie & Amanda S Conroy & Rachael A Callcut & Alan E Hubbard & Mitchell J Cohen, 2019. "Dynamic multi-outcome prediction after injury: Applying adaptive machine learning for precision medicine in trauma," PLOS ONE, Public Library of Science, vol. 14(4), pages 1-13, April.
    3. Waverly Wei & Maya Petersen & Mark J van der Laan & Zeyu Zheng & Chong Wu & Jingshen Wang, 2023. "Efficient targeted learning of heterogeneous treatment effects for multiple subgroups," Biometrics, The International Biometric Society, vol. 79(3), pages 1934-1946, September.
    4. Michael Rosenblum & Nicholas P. Jewell & Mark van der Laan & Stephen Shiboski & Ariane van der Straten & Nancy Padian, 2009. "Analysing direct effects in randomized trials with secondary interventions: an application to human immunodeficiency virus prevention trials," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(2), pages 443-465, April.
    5. Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "Automatic Debiased Machine Learning via Riesz Regression," Papers 2104.14737, arXiv.org, revised Mar 2024.
    6. Paul Frédéric Blanche & Anders Holt & Thomas Scheike, 2023. "On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 441-482, April.
    7. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    8. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    9. Stitelman Ori M & van der Laan Mark J., 2010. "Collaborative Targeted Maximum Likelihood for Time to Event Data," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-46, June.
    10. Martin Huber & Michael Lechner & Giovanni Mellace, 2016. "The Finite Sample Performance of Estimators for Mediation Analysis Under Sequential Conditional Independence," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(1), pages 139-160, January.
    11. Gruber Susan & van der Laan Mark J., 2010. "A Targeted Maximum Likelihood Estimator of a Causal Effect on a Bounded Continuous Outcome," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-18, August.
    12. Kara E. Rudolph & Jonathan Levy & Mark J. van der Laan, 2021. "Transporting stochastic direct and indirect effects to new populations," Biometrics, The International Biometric Society, vol. 77(1), pages 197-211, March.
    13. Gruber Susan & van der Laan Mark J., 2010. "An Application of Collaborative Targeted Maximum Likelihood Estimation in Causal Inference and Genomics," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-31, May.
    14. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    15. Antonelli Joseph & Cefalu Matthew, 2020. "Averaging causal estimators in high dimensions," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 92-107, January.
    16. Tuglus Catherine & van der Laan Mark J., 2011. "Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-31, January.
    17. Yuya Sasaki & Takuya Ura & Yichong Zhang, 2022. "Unconditional quantile regression with high‐dimensional data," Quantitative Economics, Econometric Society, vol. 13(3), pages 955-978, July.
    18. Iván Díaz & Elizabeth Colantuoni & Daniel F. Hanley & Michael Rosenblum, 2019. "Improved precision in the analysis of randomized trials with survival outcomes, without assuming proportional hazards," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 439-468, July.
    19. Frölich, Markus & Huber, Martin & Wiesenfarth, Manuel, 2017. "The finite sample performance of semi- and non-parametric estimators for treatment effects and policy evaluation," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 91-102.
    20. Rose Sherri & van der Laan Mark J., 2008. "Simple Optimal Weighting of Cases and Controls in Case-Control Studies," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-26, September.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.