Printed from https://ideas.repec.org/a/wly/amposc/v54y2010i1p229-247.html

A Method of Automated Nonparametric Content Analysis for Social Science

Author

Listed:
  • Daniel J. Hopkins
  • Gary King

Abstract

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
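The abstract's central claim, that a classifier with high per-document accuracy can still give badly biased category proportions, can be illustrated with a little arithmetic. The sketch below is a simplified illustration and not the authors' estimator: it assumes a hypothetical two-category problem, a classifier that is 90% accurate within each category, and a misclassification matrix that is known exactly. The names `classify_and_count_props` and `corrected_props` and all numbers are invented for this example.

```python
import numpy as np

def classify_and_count_props(true_props, confusion):
    """Proportions obtained by classifying each document and counting labels.

    confusion[i, j] = P(predicted category i | true category j);
    each column sums to 1.
    """
    return confusion @ true_props

def corrected_props(predicted_props, confusion):
    """Recover category proportions by solving
    predicted_props = confusion @ true_props for true_props
    (least squares, then clipped and renormalized to a valid distribution).
    """
    est, *_ = np.linalg.lstsq(confusion, predicted_props, rcond=None)
    est = np.clip(est, 0.0, None)
    return est / est.sum()

# Hypothetical setup: 90% of documents are in category 0, 10% in category 1.
true_props = np.array([0.9, 0.1])

# Hypothetical classifier: 90% accurate within each category.
confusion = np.array([[0.9, 0.1],
                      [0.1, 0.9]])

naive = classify_and_count_props(true_props, confusion)
corrected = corrected_props(naive, confusion)

print("true proportions:      ", true_props)
print("classify-and-count:    ", naive)      # category 1 appears as 18%, not 10%
print("corrected proportions: ", corrected)
```

With these assumed numbers, the classifier labels 90% of documents correctly, yet classify-and-count reports the rare category at 18% instead of 10%. Solving for the proportions directly removes that bias in this idealized case, where the misclassification rates are known without error.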

Suggested Citation

  • Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
  • Handle: RePEc:wly:amposc:v:54:y:2010:i:1:p:229-247
    DOI: 10.1111/j.1540-5907.2009.00428.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1540-5907.2009.00428.x
    Download Restriction: no


    References listed on IDEAS

    1. Simon, Adam F. & Xenos, Michael, 2004. "Dimensional Reduction of Word-Frequency Data as a Substitute for Intersubjective Content Analysis," Political Analysis, Cambridge University Press, vol. 12(1), pages 63-75, January.
    2. Laver, Michael & Benoit, Kenneth & Garry, John, 2003. "Extracting Policy Positions from Political Texts Using Words as Data," American Political Science Review, Cambridge University Press, vol. 97(2), pages 311-331, May.
    3. King, Gary & Lowe, Will, 2003. "An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design," International Organization, Cambridge University Press, vol. 57(3), pages 617-642, July.
    4. King, Gary & Zeng, Langche, 2006. "The Dangers of Extreme Counterfactuals," Political Analysis, Cambridge University Press, vol. 14(2), pages 131-159, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Scharkow, 2013. "Thematic content analysis using supervised machine learning: An empirical evaluation using German online news," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(2), pages 761-773, February.
    2. Gianluca Vagnani & Michele Simoni, 2016. "Technological uncertainty, market orientation and firms' economic performance," MERCATI & COMPETITIVITÀ, FrancoAngeli Editore, vol. 2016(2), pages 143-167.
    3. Kevin M. Quinn & Burt L. Monroe & Michael Colaresi & Michael H. Crespin & Dragomir R. Radev, 2010. "How to Analyze Political Attention with Minimal Assumptions and Costs," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 209-228, January.
    4. Alan Brier & Bruno Hopp, 2011. "Computer assisted text analysis in the social sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 45(1), pages 103-128, January.
    5. Simon Hug & Tobias Schulz, 2007. "Referendums in the EU’s constitution building process," The Review of International Organizations, Springer, vol. 2(2), pages 177-218, June.
    6. Sheng, Yu & Xu, Xinpeng, 2019. "The productivity impact of climate change: Evidence from Australia's Millennium drought," Economic Modelling, Elsevier, vol. 76(C), pages 182-191.
    7. Desbordes, Rodolphe & Vicard, Vincent, 2009. "Foreign direct investment and bilateral investment treaties: An international political perspective," Journal of Comparative Economics, Elsevier, vol. 37(3), pages 372-386, September.
    8. Yang, Chao & Huang, Cui, 2022. "Quantitative mapping of the evolution of AI policy distribution, targets and focuses over three decades in China," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    9. William D. Berry & Jacqueline H. R. DeMeritt & Justin Esarey, 2010. "Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential?," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 248-266, January.
    10. Benjamin Lu & Eli Ben-Michael & Avi Feller & Luke Miratrix, 2023. "Is It Who You Are or Where You Are? Accounting for Compositional Differences in Cross-Site Treatment Effect Variation," Journal of Educational and Behavioral Statistics, vol. 48(4), pages 420-453, August.
    11. Canzian, Giulia & Meroni, Elena Claudia & Santangelo, Giulia, 2023. "Evaluation of a Flemish Active Labour Market Policy in the framework of the European Social Fund. Results and challenges," Socio-Economic Planning Sciences, Elsevier, vol. 88(C).
    12. Goryunov, Alexander & Ageshina, Elena & Lavrentev, Igor & Peretyatko, Polina, 2023. "Estimating the effect of Russia’s development policy in the Far Eastern region: The synthetic control approach," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 72, pages 58-72.
    13. Ralf Meinhardt & Sebastian Junge & Martin Weiss, 2018. "The organizational environment with its measures, antecedents, and consequences: a review and research agenda," Management Review Quarterly, Springer, vol. 68(2), pages 195-235, April.
    14. Torsten J. Selck, 2005. "Improving the Explanatory Power of Bargaining Models," Journal of Theoretical Politics, vol. 17(3), pages 371-375, July.
    15. Bono, Pierre-Henri & David, Quentin & Desbordes, Rodolphe & Py, Loriane, 2022. "Metro infrastructure and metropolitan attractiveness," Regional Science and Urban Economics, Elsevier, vol. 93(C).
    16. Cory Koedel & Jiaxi Li & Matthew G. Springer & Li Tan, 2018. "Teacher Performance Ratings and Professional Improvement," Working Papers 1808, Department of Economics, University of Missouri.
    17. Eliasson, Kent, 2006. "The Role of Ability in Estimating the Returns to College Choice: New Swedish Evidence," Umeå Economic Studies 691, Umeå University, Department of Economics.
    18. Huang, Cui & Yang, Chao & Su, Jun, 2021. "Identifying core policy instruments based on structural holes: A case study of China’s nuclear energy policy," Journal of Informetrics, Elsevier, vol. 15(2).
    19. Sarel, Roee & Demirtas, Melanie, 2021. "Delegation in a multi-tier court system: Are remands in the U.S. federal courts driven by moral hazard?," European Journal of Political Economy, Elsevier, vol. 68(C).
    20. Yu, Feifei & Wang, Liting & Li, Xiaotong, 2020. "The effects of government subsidies on new energy vehicle enterprises: The moderating role of intelligent transformation," Energy Policy, Elsevier, vol. 141(C).
