
Efficient odds ratio estimation under two‐phase sampling using error‐prone data from a multi‐national HIV research cohort

Author

Listed:
  • Sarah C. Lotspeich
  • Bryan E. Shepherd
  • Gustavo G. C. Amorim
  • Pamela A. Shaw
  • Ran Tao

Abstract

Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error‐prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost‐effective solution is the two‐phase design, under which the error‐prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error‐prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error‐prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
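To make the two-phase setup in the abstract concrete, the following is a minimal simulation sketch, not the authors' semiparametric estimator or their EM algorithm. It assumes a single binary covariate that is misclassified at random (the paper also allows an error-prone outcome and correlated errors), a simple random audit sample, and hypothetical names and parameter values (`simulate_two_phase`, `log_odds_ratio`, `flip`, `n_audit`) chosen purely for illustration. It compares a naive log odds ratio computed from the error-prone Phase I covariate with one computed from the audited Phase II records only.

```python
import numpy as np


def log_odds_ratio(y, x):
    """Log odds ratio from the 2x2 table of a binary outcome y and binary covariate x."""
    n11 = float(np.sum((y == 1) & (x == 1)))
    n10 = float(np.sum((y == 1) & (x == 0)))
    n01 = float(np.sum((y == 0) & (x == 1)))
    n00 = float(np.sum((y == 0) & (x == 0)))
    return float(np.log((n11 * n00) / (n10 * n01)))


def simulate_two_phase(n=200_000, n_audit=20_000, beta=1.0, flip=0.2, seed=0):
    """Simulate Phase I (error-prone) data and a random Phase II audit sample.

    All values here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)                # true covariate (seen only if audited)
    p = 1.0 / (1.0 + np.exp(-(-1.0 + beta * x)))    # logistic outcome model
    y = rng.binomial(1, p)                          # outcome (assumed error-free here)
    flipped = rng.random(n) < flip                  # nondifferential misclassification
    x_star = np.where(flipped, 1 - x, x)            # error-prone Phase I covariate
    audited = np.zeros(n, dtype=bool)               # Phase II audit indicator
    audited[rng.choice(n, size=n_audit, replace=False)] = True
    return x, y, x_star, audited


x, y, x_star, audited = simulate_two_phase()

# Naive analysis: treat the error-prone covariate as if it were correct.
naive = log_odds_ratio(y, x_star)

# Audit-only analysis: use validated values, discarding unaudited patients.
audit_only = log_odds_ratio(y[audited], x[audited])

print(f"true log OR = 1.0, naive = {naive:.3f}, audit-only = {audit_only:.3f}")
```

Under these assumptions the naive estimate is pulled toward zero, while the audit-only estimate is roughly unbiased but uses only a tenth of the patients. The approach described in the abstract aims to recover that lost efficiency by using information from both phases.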

Suggested Citation

  • Sarah C. Lotspeich & Bryan E. Shepherd & Gustavo G. C. Amorim & Pamela A. Shaw & Ran Tao, 2022. "Efficient odds ratio estimation under two‐phase sampling using error‐prone data from a multi‐national HIV research cohort," Biometrics, The International Biometric Society, vol. 78(4), pages 1674-1685, December.
  • Handle: RePEc:bla:biomet:v:78:y:2022:i:4:p:1674-1685
    DOI: 10.1111/biom.13512


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Mengyan & Ma, Yanyuan & Li, Runze, 2019. "Semiparametric regression for measurement error model with heteroscedastic error," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 320-338.
    2. Brady Ryan & Ananthika Nirmalkanna & Candemir Cigsar & Yildiz E. Yilmaz, 2023. "Evaluation of Designs and Estimation Methods Under Response-Dependent Two-Phase Sampling for Genetic Association Studies," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(2), pages 510-539, July.
    3. Gustavo Amorim & Ran Tao & Sarah Lotspeich & Pamela A. Shaw & Thomas Lumley & Bryan E. Shepherd, 2021. "Two‐phase sampling designs for data validation in settings with covariate measurement error and continuous outcome," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1368-1389, October.
    4. Delaigle, Aurore & Fan, Jianqing & Carroll, Raymond J., 2009. "A Design-Adaptive Local Polynomial Estimator for the Errors-in-Variables Problem," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 348-359.
    5. Tsai, Tsung-Han, 2016. "A Bayesian Approach to Dynamic Panel Models with Endogenous Rarely Changing Variables," Political Science Research and Methods, Cambridge University Press, vol. 4(3), pages 595-620, September.
    6. Vitoratou, Silia & Ntzoufras, Ioannis & Moustaki, Irini, 2016. "Explaining the behavior of joint and marginal Monte Carlo estimators in latent variable models with independence assumptions," LSE Research Online Documents on Economics 57685, London School of Economics and Political Science, LSE Library.
    7. Johannes Hönekopp & Audrey Helen Linden, 2022. "Heterogeneity estimates in a biased world," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-21, February.
    8. DongHyuk Lee & Soumendra N. Lahiri & Samiran Sinha, 2020. "A test of homogeneity of distributions when observations are subject to measurement errors," Biometrics, The International Biometric Society, vol. 76(3), pages 821-833, September.
    9. Roy, Arkaprava & Sarkar, Abhra, 2023. "Bayesian semiparametric multivariate density deconvolution via stochastic rotation of replicates," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    10. Sun-Joo Cho & Paul Boeck & Susan Embretson & Sophia Rabe-Hesketh, 2014. "Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation," Psychometrika, Springer;The Psychometric Society, vol. 79(1), pages 84-104, January.
    11. David A. Wagstaff & Ofer Harel, 2011. "A closer examination of three small-sample approximations to the multiple-imputation degrees of freedom," Stata Journal, StataCorp LP, vol. 11(3), pages 403-419, September.
    12. Warrington Nicole M. & Tilling Kate & Howe Laura D. & Paternoster Lavinia & Pennell Craig E. & Wu Yan Yan & Briollais Laurent, 2014. "Robustness of the linear mixed effects model to error distribution assumptions and the consequences for genome-wide association studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(5), pages 567-587, October.
    13. repec:jss:jstsof:43:i11 is not listed on IDEAS
    14. Michel Winter & Isabelle Mirbel & Pierre Crescenzo, 2014. "Modeling Uncertainty when Estimating IT Projects Costs," Working Papers hal-00966573, HAL.
    15. Ng'ombe, John, 2019. "Economics of the Greenseeder Hand Planter, Discrete Choice Modeling, and On-Farm Field Experimentation," Thesis Commons jckt7, Center for Open Science.
    16. Soekhai, V. & Donkers, B. & Levitan, B. & de Bekker-Grob, E.W., 2021. "Case 2 best-worst scaling: For good or for bad but not for both," Journal of choice modelling, Elsevier, vol. 41(C).
    17. Chixiang Chen & Ming Wang & Shuo Chen, 2023. "An efficient data integration scheme for synthesizing information from multiple secondary datasets for the parameter inference of the main analysis," Biometrics, The International Biometric Society, vol. 79(4), pages 2947-2960, December.
    18. R. N. Rattihalli, 2023. "A Class of Multivariate Power Skew Symmetric Distributions: Properties and Inference for the Power-Parameter," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(2), pages 1356-1393, August.
    19. Tübbicke Stefan, 2022. "Entropy Balancing for Continuous Treatments," Journal of Econometric Methods, De Gruyter, vol. 11(1), pages 71-89, January.
    20. Jacob M. Maronge & Ran Tao & Jonathan S. Schildcrout & Paul J. Rathouz, 2023. "Generalized case‐control sampling under generalized linear models," Biometrics, The International Biometric Society, vol. 79(1), pages 332-343, March.
    21. Dorinth van Dijk & David Geltner & Alex van de Minne, 2018. "Revisiting supply and demand indexes in real estate," DNB Working Papers 583, Netherlands Central Bank, Research Department.
