
Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment

Authors
  • Charles F. Manski
  • John Mullahy
  • Atheendar S. Venkataramani

Abstract

A common practice in evidence-based decision making uses estimates of conditional probabilities P(y|x) obtained from research studies to predict outcomes y on the basis of observed covariates x. Decisions are then based on the predicted outcomes. Researchers commonly assume that the predictors used in generating the evidence are the same as those used in applying it, i.e., that the meaning of x is the same in the two circumstances. This may not be the case in real-world settings. Across a wide range of settings, from clinical practice to education policy, demographic attributes (e.g., age, race, ethnicity) are often classified differently in research studies than in decision settings. This paper studies identification in such settings. We propose a formal framework for prediction with what we term differential covariate classification (DCC). Using this framework, we analyze partial identification of probabilistic predictions and assess how various assumptions influence the identification regions. We apply the findings to a range of settings, focusing mainly on differential classification of individuals' race and ethnicity in clinical medicine. We find that bounds on P(y|x) can be wide, and that the information needed to narrow them is available only in special cases. These findings highlight an important problem in using evidence in decision making, a problem that has not yet been fully appreciated in debates on classification in public policy and medicine.
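
To make the idea of an identification region concrete, the following is a minimal illustrative sketch and not the paper's framework or results. It assumes a deliberately simple, hypothetical case: a decision setting records one coarse category c that pools exactly two research-study categories a and b, the study and decision populations coincide, and the study reports P(y=1|x=a) and P(y=1|x=b). By the law of total probability, P(y=1|c) = w·P(y=1|x=a) + (1−w)·P(y=1|x=b), where the mixing weight w = P(x=a|c) is unknown. The function name `mixture_bounds` and all numeric inputs are assumptions made for illustration.

```python
def mixture_bounds(p_a, p_b, w_lo=0.0, w_hi=1.0):
    """Bounds on P(y=1 | c) = w * p_a + (1 - w) * p_b.

    Hypothetical setup (an assumption, not taken from the paper):
    category c pools research categories a and b, and the weight
    w = P(x=a | c) is only known to lie in [w_lo, w_hi].
    """
    if not 0.0 <= w_lo <= w_hi <= 1.0:
        raise ValueError("need 0 <= w_lo <= w_hi <= 1")
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # The mixture is linear in w, so its extremes occur at the endpoints.
    ends = [w * p_a + (1.0 - w) * p_b for w in (w_lo, w_hi)]
    return min(ends), max(ends)


if __name__ == "__main__":
    # Made-up study estimates, used only to exercise the function.
    print(mixture_bounds(0.10, 0.30))            # no information about w
    print(mixture_bounds(0.10, 0.30, 0.4, 0.6))  # w restricted to [0.4, 0.6]
```

Under these assumptions, knowing nothing about w leaves the whole interval between the two study estimates as the bound, and only outside information that restricts w narrows it. This mirrors, in a toy case, the abstract's statement that bounds can be wide without further information.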

Suggested Citation

  • Charles F. Manski & John Mullahy & Atheendar S. Venkataramani, 2025. "Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment," Papers 2501.02318, arXiv.org.
  • Handle: RePEc:arx:papers:2501.02318

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2501.02318
    File Function: Latest version
    Download Restriction: no


