Printed from https://ideas.repec.org/p/osf/osfxxx/pt3q9_v1.html

Automated coding using machine-learning and remapping the U.S. nonprofit sector: A guide and benchmark

Author

  • Ma, Ji (The University of Texas at Austin)

Abstract

This research developed a machine-learning classifier that reliably automates the coding process, using the National Taxonomy of Exempt Entities as the schema, and used it to remap the U.S. nonprofit sector. I achieved 90% overall accuracy in classifying nonprofits into nine broad categories and 88% in classifying them into 25 major groups. The intercoder reliabilities between the algorithms and human coders, measured by kappa statistics, fall in the "almost perfect" range of 0.80 to 1.00. The results suggest that a state-of-the-art machine-learning algorithm can approximate human coders and substantially improve researchers' productivity. I also reassigned multiple category codes to over 439 thousand nonprofits and discovered a considerable number of organizational activities that were previously ignored. The classifier is an essential methodological prerequisite for large-N and Big Data analyses, and the remapped U.S. nonprofit sector can serve as an important instrument for asking or reexamining fundamental questions in nonprofit studies. The working directory, with all data sets, source code, and historical versions, is available on GitHub (https://github.com/ma-ji/npo_classifier).
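
The two headline metrics in the abstract, overall accuracy and the kappa statistic, can be illustrated with a minimal sketch. It assumes the kappa reported is Cohen's kappa for two coders (the abstract says only "kappa statistics"); the function names and the toy labels are hypothetical and are not taken from the paper or its GitHub repository.

```python
from collections import Counter


def accuracy(human, machine):
    """Share of items on which the two coders assign the same label."""
    if len(human) != len(machine) or len(human) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohen_kappa(human, machine):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_o = accuracy(human, machine)  # observed agreement
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Agreement expected by chance, from each coder's marginal label shares.
    p_e = sum(human_counts[c] * machine_counts[c] for c in human_counts) / (n * n)
    if p_e == 1.0:
        # Undefined when both coders use one and the same label throughout.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy labels for illustration only; not taken from the paper's data.
human_codes = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
machine_codes = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "B"]

print(f"accuracy = {accuracy(human_codes, machine_codes):.2f}")
print(f"kappa    = {cohen_kappa(human_codes, machine_codes):.2f}")
```

On these toy labels the coders agree on 9 of 10 items, so accuracy is 0.90, while kappa is about 0.85 because it discounts the agreement expected by chance. That value would fall in the 0.80 to 1.00 range the abstract calls "almost perfect".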

Suggested Citation

  • Ma, Ji, 2020. "Automated coding using machine-learning and remapping the U.S. nonprofit sector: A guide and benchmark," OSF Preprints pt3q9_v1, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:pt3q9_v1
    DOI: 10.31219/osf.io/pt3q9_v1

    Download full text from publisher

    File URL: https://osf.io/download/5f812ab01f65a5025eed0a80/
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bernhardt, Lea & Dewenter, Ralf & Thomas, Tobias, 2023. "Measuring partisan media bias in US newscasts from 2001 to 2012," European Journal of Political Economy, Elsevier, vol. 78(C).
    2. Ntentas, Raphael, 2021. "Quantifying political populism and examining the link with economic insecurity: evidence from Greece," LSE Research Online Documents on Economics 112579, London School of Economics and Political Science, LSE Library.
    3. Lin, Annie E. & Young, Jimmy A. & Guarino, Jeannine E., 2022. "Mother-Daughter sexual abuse: An exploratory study of the experiences of survivors of MDSA using Reddit," Children and Youth Services Review, Elsevier, vol. 138(C).
    4. Kate Gooding & James N Newell & Nick Emmel, 2018. "Capacity to conduct health research among NGOs in Malawi: Diverse strengths, needs and opportunities for development," PLOS ONE, Public Library of Science, vol. 13(7), pages 1-19, July.
    5. Rybinski, Krzysztof, 2020. "The forecasting power of the multi-language narrative of sell-side research: A machine learning evaluation," Finance Research Letters, Elsevier, vol. 34(C).
    6. Rauh, Christian, 2015. "Communicating supranational governance? The salience of EU affairs in the German Bundestag, 1991–2013," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 16(1), pages 116-138.
    7. Grajzl, Peter & Murrell, Peter, 2021. "A machine-learning history of English caselaw and legal ideas prior to the Industrial Revolution I: generating and interpreting the estimates," Journal of Institutional Economics, Cambridge University Press, vol. 17(1), pages 1-19, February.
    8. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    9. Julia Seiermann, 2018. "Only Words? How Power in Trade Agreement Texts Affects International Trade Flows," UNCTAD Blue Series Papers 80, United Nations Conference on Trade and Development.
    10. Boomsma, Roel & O'Dwyer, Brendan, 2019. "Constituting the governable NGO: The correlation between conduct and counter-conduct in the evolution of funder-NGO accountability relations," Accounting, Organizations and Society, Elsevier, vol. 72(C), pages 1-20.
    11. Sami Diaf & Jörg Döpke & Ulrich Fritsche & Ida Rockenbach, 2020. "Sharks and minnows in a shoal of words: Measuring latent ideological positions of German economic research institutes based on text mining techniques," Macroeconomics and Finance Series 202001, University of Hamburg, Department of Socioeconomics.
    12. Muhammad Ateeq ur REHMAN & Furman ALI & Shang XIE, 2022. "Impact of Foreign Investment News on the Return, Cost of Equity and Cash Flow Activities," Journal for Economic Forecasting, Institute for Economic Forecasting, vol. 0(4), pages 112-127, December.
    13. Dehler-Holland, Joris & Schumacher, Kira & Fichtner, Wolf, 2021. "Topic Modeling Uncovers Shifts in Media Framing of the German Renewable Energy Act," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 2(1).
    14. Weiss, Max & Zoorob, Michael, 2021. "Political frames of public health crises: Discussing the opioid epidemic in the US Congress," Social Science & Medicine, Elsevier, vol. 281(C).
    15. Maschke, Andreas, 2024. "Talking exports: The representation of Germany's current account in newspaper media," MPIfG Discussion Paper 24/1, Max Planck Institute for the Study of Societies.
    16. Arthur Dyevre & Nicolas Lampach, 2021. "Issue attention on international courts: Evidence from the European Court of Justice," The Review of International Organizations, Springer, vol. 16(4), pages 793-815, October.
    17. Dewenter, Ralf & Dulleck, Uwe & Thomas, Tobias, 2018. "The political coverage index and its application to government capture," Research Papers 6, EcoAustria – Institute for Economic Research.
    18. Pastwa, Anna M. & Shrestha, Prabal & Thewissen, James & Torsin, Wouter, 2021. "Unpacking the black box of ICO white papers: a topic modeling approach," LIDAM Discussion Papers LFIN 2021018, Université catholique de Louvain, Louvain Finance (LFIN).
    19. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    20. Parijat Chakrabarti & Margaret Frye, 2017. "A mixed-methods framework for analyzing text data: Integrating computational techniques with qualitative methods in demography," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 37(42), pages 1351-1382.
