
Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge

Author

Listed:
  • Kindel, Alexander

    (Princeton University)

  • Bansal, Vineet
  • Catena, Kristin
  • Hartshorne, Thomas
  • Jaeger, Kate
  • Koffman, Dawn
  • McLanahan, Sara
  • Phillips, Maya
  • Rouhani, Shiva
  • Vinh, Ryan

Abstract

Researchers rely on metadata systems to prepare data for analysis. As the complexity of datasets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study based on the experiences of participants in the Fragile Families Challenge. We demonstrate how treating metadata as data (that is, releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error-prone for all types of data analysis. We hope that our work will facilitate new applications of machine learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. We have open-sourced the tools we created so that others can use and improve them.

Suggested Citation

  • Kindel, Alexander & Bansal, Vineet & Catena, Kristin & Hartshorne, Thomas & Jaeger, Kate & Koffman, Dawn & McLanahan, Sara & Phillips, Maya & Rouhani, Shiva & Vinh, Ryan, 2018. "Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge," SocArXiv u8spj_v1, Center for Open Science.
  • Handle: RePEc:osf:socarx:u8spj_v1
    DOI: 10.31219/osf.io/u8spj_v1

    Download full text from publisher

    File URL: https://osf.io/download/5ba1348b57c3e4001b3a43c7/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kindel, Alexander & Bansal, Vineet & Catena, Kristin & Hartshorne, Thomas & Jaeger, Kate & Koffman, Dawn & McLanahan, Sara & Phillips, Maya & Rouhani, Shiva & Vinh, Ryan, 2018. "Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge," SocArXiv u8spj, Center for Open Science.
    2. Alexander Kindel & Vineet Bansal & Kristin Catena & Thomas Hartshorne & Kate Jaeger, 2018. "Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge," Working Papers wp18-10-ff, Princeton University, School of Public and International Affairs, Center for Research on Child Wellbeing.
    3. Ian Lundberg & Arvind Narayanan & Karen Levy & Matthew Salganik, 2018. "Privacy, ethics, and data access: A case study of the Fragile Families Challenge," Working Papers wp18-09-ff, Princeton University, School of Public and International Affairs, Center for Research on Child Wellbeing.
    4. Allison Dwyer Emory, 2019. "Unintended Consequences: Protective State Policies and the Employment of Fathers with Criminal Records," Working Papers wp19-04-ff, Princeton University, School of Public and International Affairs, Center for Research on Child Wellbeing.
    5. Julia S. Goldberg, 2011. "Identity Salience and Involvement among Resident and Nonresident Fathers," Working Papers 1323, Princeton University, School of Public and International Affairs, Center for Research on Child Wellbeing.
    6. Juergen Deppner & Marcelo Cajias, 2024. "Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 68(2), pages 235-273, February.
    7. Naguib, Costanza, 2019. "Estimating the Heterogeneous Impact of the Free Movement of Persons on Relative Wage Mobility," Economics Working Paper Series 1903, University of St. Gallen, School of Economics and Political Science.
    8. Philippe Goulet Coulombe & Maxime Leroux & Dalibor Stevanovic & Stéphane Surprenant, 2022. "How is machine learning useful for macroeconomic forecasting?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 920-964, August.
    9. McGovern, Mark E. & Rokicki, Slawa & Reichman, Nancy E., 2022. "Maternal depression and economic well-being: A quasi-experimental approach," Social Science & Medicine, Elsevier, vol. 305(C).
    10. Thiemo Fetzer & Stephan Kyburz, 2024. "Cohesive Institutions and Political Violence," The Review of Economics and Statistics, MIT Press, vol. 106(1), pages 133-150, January.
    11. Dang, Hai-Anh & Carletto, Calogero & Gourlay, Sydney & Abanokova, Kseniya, 2024. "Addressing Soil Quality Data Gaps with Imputation: Evidence from Ethiopia and Uganda," GLO Discussion Paper Series 1445, Global Labor Organization (GLO).
    12. Tobias Götze & Marc Gürtler & Eileen Witowski, 2020. "Improving CAT bond pricing models via machine learning," Journal of Asset Management, Palgrave Macmillan, vol. 21(5), pages 428-446, September.
    13. Kelly Noonan & Nancy E. Reichman & Hope Corman & Dhaval Dave, 2007. "Prenatal drug use and the production of infant health," Health Economics, John Wiley & Sons, Ltd., vol. 16(4), pages 361-384, April.
    14. Sascha O. Becker & Thiemo Fetzer, 2018. "Has Eastern European Migration Impacted UK-born Workers?," CAGE Online Working Paper Series 376, Competitive Advantage in the Global Economy (CAGE).
    15. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    16. Crespo, Cristian, 2020. "Two become one: improving the targeting of conditional cash transfers with a predictive model of school dropout," LSE Research Online Documents on Economics 123139, London School of Economics and Political Science, LSE Library.
    17. Ioanna Arkoudi & Carlos Lima Azevedo & Francisco C. Pereira, 2021. "Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance," Papers 2109.12042, arXiv.org, revised Sep 2021.
    18. Gharad Bryan & Dean Karlan & Adam Osman, 2024. "Big Loans to Small Businesses: Predicting Winners and Losers in an Entrepreneurial Lending Experiment," American Economic Review, American Economic Association, vol. 114(9), pages 2825-2860, September.
    19. Yucheng Yang & Zhong Zheng & Weinan E, 2020. "Interpretable Neural Networks for Panel Data Analysis in Economics," Papers 2010.05311, arXiv.org, revised Nov 2020.
    20. Erik Heilmann & Janosch Henze & Heike Wetzel, 2021. "Machine learning in energy forecasts with an application to high frequency electricity consumption data," MAGKS Papers on Economics 202135, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
