
CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference

Authors

  • Anirban Mukherjee
  • Hannah Hanwen Chang

Abstract

Social science research often hinges on the relationship between categorical variables and outcomes. We introduce CAVIAR, a novel method for embedding categorical variables that assume values in a high-dimensional ambient space but are sampled from an underlying manifold. Our theoretical and numerical analyses outline challenges posed by such categorical variables in causal inference. Specifically, dynamically varying and sparse levels can lead to violations of the Donsker conditions and a failure of the estimation functionals to converge to a tight Gaussian process. Traditional approaches, including the exclusion of rare categorical levels and principled variable selection models like LASSO, fall short. CAVIAR embeds the data into a lower-dimensional global coordinate system. The mapping can be derived from both structured and unstructured data, and ensures stable and robust estimates through dimensionality reduction. In a dataset of direct-to-consumer apparel sales, we illustrate how high-dimensional categorical variables, such as zip codes, can be succinctly represented, facilitating inference and analysis.

Suggested Citation

  • Anirban Mukherjee & Hannah Hanwen Chang, 2024. "CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference," Papers 2404.04979, arXiv.org, revised Apr 2024.
  • Handle: RePEc:arx:papers:2404.04979

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2404.04979
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Steven G. Rivkin & Eric A. Hanushek & John F. Kain, 2005. "Teachers, Schools, and Academic Achievement," Econometrica, Econometric Society, vol. 73(2), pages 417-458, March.
    2. Eric A. Hanushek & Steven G. Rivkin & Lori L. Taylor, 1996. "Aggregation and the Estimated Effects of School Resources," The Review of Economics and Statistics, MIT Press, vol. 78(4), pages 611-627, November.
    3. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    4. Jeffrey M. Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    5. Thomas J. Kane & Jonah E. Rockoff & Douglas O. Staiger, 2008. "What does certification tell us about teacher effectiveness? Evidence from New York City," Economics of Education Review, Elsevier, vol. 27(6), pages 615-631, December.
    6. Robert D. Woodberry, 2012. "The Missionary Roots of Liberal Democracy," American Political Science Review, Cambridge University Press, vol. 106(2), pages 244-274, May.
    7. Tony Lancaster, 2000. "The incidental parameter problem since 1948," Journal of Econometrics, Elsevier, vol. 95(2), pages 391-413, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daisy J. Huang & Charles Ka Yui Leung & Chung-Yi Tse, 2018. "What Accounts for the Differences in Rent-Price Ratio and Turnover Rate? A Search-and-Matching Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 57(3), pages 431-475, October.
    2. Zelda Brutti & Fabio Sánchez Torres, 2022. "Turning around teacher quality in Latin America: Renewed confidence and lessons from Colombia," Economic Analysis and Policy, Elsevier, vol. 73(C), pages 62-93.
    3. Seth Gershenson & Alison Jacknowitz & Andrew Brannegan, 2017. "Are Student Absences Worth the Worry in U.S. Primary Schools?," Education Finance and Policy, MIT Press, vol. 12(2), pages 137-165, Spring.
    4. Marine de Talancé, 2015. "Better Teachers, Better Results? Evidence from Rural Pakistan," Working Papers DT/2015/21, DIAL (Développement, Institutions et Mondialisation).
    5. Amelie Catherine Wuppermann, 2011. "Empirical Essays in Health and Education Economics," Munich Dissertations in Economics 13187, University of Munich, Department of Economics.
    6. Zelda Brutti & Fabio Sánchez, 2017. "Does Better Teacher Selection Lead to Better Students? Evidence from a Large Scale Reform in Colombia," Documentos CEDE 15350, Universidad de los Andes, Facultad de Economía, CEDE.
    7. Richard Murphy & Felix Weinhardt & Gill Wyness, 2021. "Who teaches the teachers? A RCT of peer-to-peer observation and feedback in 181 schools," Economics of Education Review, Elsevier, vol. 82(C).
    8. Matthew A. Kraft & John P. Papay & Olivia L. Chi, 2020. "Teacher Skill Development: Evidence from Performance Ratings by Principals," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 39(2), pages 315-347, March.
    9. Justin L. Tobias & Mingliang Li, 2003. "A finite-sample hierarchical analysis of wage variation across public high schools: evidence from the NLSY and high school and beyond," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 18(3), pages 315-336.
    10. Brian Stacy, 2014. "Ranking Teachers when Teacher Value-Added is Heterogeneous Across Students," EconStor Preprints 104743, ZBW - Leibniz Information Centre for Economics.
    11. Raf Van Gestel & Tobias Müller & Johan Bosmans, 2018. "Learning from failure in healthcare: Dynamic panel evidence of a physician shock effect," Health Economics, John Wiley & Sons, Ltd., vol. 27(9), pages 1340-1353, September.
    12. John P. Papay & Matthew A. Kraft, 2015. "Productivity returns to experience in the teacher labor market: Methodological challenges and new evidence on long-term career improvement," Journal of Public Economics, Elsevier, vol. 130(C), pages 105-119.
    13. Anirudh Shingal & Malte Ehrich, 2019. "Trade effects of standards harmonization in the EU: improved access for non-EU partners," Indian Council for Research on International Economic Relations (ICRIER) Working Paper 372, Indian Council for Research on International Economic Relations (ICRIER), New Delhi, India.
    14. Jonah E. Rockoff & Benjamin B. Lockwood, 2010. "Stuck in the middle: Impacts of grade configuration in public schools," Journal of Public Economics, Elsevier, vol. 94(11-12), pages 1051-1061, December.
    15. Claudio Araujo & Catherine Araujo-Bonjean & Stéphanie Brunelin, 2012. "Alert at Maradi: Preventing Food Crises by Using Price Signals," World Development, Elsevier, vol. 40(9), pages 1882-1894.
    16. Oriana Bandiera & Valentino Larcinese & Imran Rasul, 2010. "Heterogeneous Class Size Effects: New Evidence from a Panel of University Students," Economic Journal, Royal Economic Society, vol. 120(549), pages 1365-1398, December.
    17. Andreas Vigren, 2020. "The Distance Factor in Swedish Bus Contracts: How far are operators willing to go?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 132(C), pages 188-204.
    18. Ugo Albertazzi & Fulvia Fringuellotti & Steven Ongena, 2024. "Fixed rate versus adjustable rate mortgages: Evidence from euro area banks," European Economic Review, Elsevier, vol. 161(C).
    19. Allison Atteberry & Susanna Loeb & James Wyckoff, 2013. "Do First Impressions Matter? Improvement in Early Career Teacher Effectiveness," NBER Working Papers 19096, National Bureau of Economic Research, Inc.
    20. Martin Schumann & Thomas A. Severini & Gautam Tripathi, 2021. "Integrated likelihood based inference for nonlinear panel data models with unobserved effects," Journal of Econometrics, Elsevier, vol. 223(1), pages 73-95.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.