A comparison of approaches for imbalanced classification problems in the context of retrieving relevant documents for an analysis
DOI: 10.1007/s42001-022-00191-7
References listed on IDEAS
- Baerg, Nicole & Lowe, Will, 2020. "A textual Taylor rule: estimating central bank preferences combining topic and scaling methods," Political Science Research and Methods, Cambridge University Press, vol. 8(1), pages 106-122, January.
- Mikhaylov, Slava & Laver, Michael & Benoit, Kenneth R., 2012. "Coder Reliability and Misclassification in the Human Coding of Party Manifestos," Political Analysis, Cambridge University Press, vol. 20(1), pages 78-91, January.
- van Atteveldt, Wouter & Sheafer, Tamir & Shenhav, Shaul R. & Fogel-Dror, Yair, 2017. "Clause Analysis: Using Syntactic Information to Automatically Extract Source, Subject, and Predicate from Texts with an Application to the 2008–2009 Gaza War," Political Analysis, Cambridge University Press, vol. 25(2), pages 207-222, April.
- Ennser-Jedenastik, Laurenz & Meyer, Thomas M., 2018. "The Impact of Party Cues on Manual Coding of Political Texts," Political Science Research and Methods, Cambridge University Press, vol. 6(3), pages 625-633, July.
- Grün, Bettina & Hornik, Kurt, 2011. "topicmodels: An R Package for Fitting Topic Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i13).
- D'Orazio, Vito & Landis, Steven T. & Palmer, Glenn & Schrodt, Philip, 2014. "Separating the Wheat from the Chaff: Applications of Automated Document Classification Using Support Vector Machines," Political Analysis, Cambridge University Press, vol. 22(2), pages 224-242, April.
- Gary King & Patrick Lam & Margaret E. Roberts, 2017. "Computer‐Assisted Keyword and Document Set Discovery from Unstructured Text," American Journal of Political Science, John Wiley & Sons, vol. 61(4), pages 971-988, October.
- Margaret E. Roberts & Brandon M. Stewart & Edoardo M. Airoldi, 2016. "A Model of Text for Experimentation in the Social Sciences," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 988-1003, July.
- Nicholas Beauchamp, 2017. "Predicting and Interpolating State‐Level Polls Using Twitter Textual Data," American Journal of Political Science, John Wiley & Sons, vol. 61(2), pages 490-503, April.
- Bes, Bart Joachim & Schoonvelde, Martijn & Rauh, Christian, 2020. "Undermining, defusing or defending European integration? Assessing public communication of European executives in times of EU politicisation," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 59(2), pages 397-423.
- King, Gary & Pan, Jennifer & Roberts, Margaret E., 2013. "How Censorship in China Allows Government Criticism but Silences Collective Expression," American Political Science Review, Cambridge University Press, vol. 107(2), pages 326-343, May.
- Margaret E. Roberts & Brandon M. Stewart & Dustin Tingley & Christopher Lucas & Jetson Leder‐Luis & Shana Kushner Gadarian & Bethany Albertson & David G. Rand, 2014. "Structural Topic Models for Open‐Ended Survey Responses," American Journal of Political Science, John Wiley & Sons, vol. 58(4), pages 1064-1082, October.
- Joshua Uyheng & Kathleen M. Carley, 2020. "Bots and online hate during the COVID-19 pandemic: case studies in the United States and the Philippines," Journal of Computational Social Science, Springer, vol. 3(2), pages 445-468, November.
- Miller, Blake & Linder, Fridolin & Mebane, Walter R., 2020. "Active Learning Approaches for Labeling Text: Review and Assessment of the Performance of Active Learning Approaches," Political Analysis, Cambridge University Press, vol. 28(4), pages 532-551, October.
- Kevin M. Quinn & Burt L. Monroe & Michael Colaresi & Michael H. Crespin & Dragomir R. Radev, 2010. "How to Analyze Political Attention with Minimal Assumptions and Costs," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 209-228, January.
- Muchlinski, David & Yang, Xiao & Birch, Sarah & Macdonald, Craig & Ounis, Iadh, 2021. "We need to go deeper: measuring electoral violence using convolutional neural networks and social media," Political Science Research and Methods, Cambridge University Press, vol. 9(1), pages 122-139, January.
- Katagiri, Azusa & Min, Eric, 2019. "The Credibility of Public and Private Signals: A Document-Based Approach," American Political Science Review, Cambridge University Press, vol. 113(1), pages 156-172, February.
- Justin Grimmer, 2013. "Appropriators not Position Takers: The Distorting Effects of Electoral Incentives on Congressional Representation," American Journal of Political Science, John Wiley & Sons, vol. 57(3), pages 624-642, July.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Dehler-Holland, Joris & Okoh, Marvin & Keles, Dogan, 2022. "Assessing technology legitimacy with topic models and sentiment analysis – The case of wind power in Germany," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
- Dehler-Holland, Joris & Okoh, Marvin & Keles, Dogan, 2021. "The legitimacy of wind power in Germany," Working Paper Series in Production and Energy 54, Karlsruhe Institute of Technology (KIT), Institute for Industrial Production (IIP).
- Mourtgos, Scott M. & Adams, Ian T., 2019. "The rhetoric of de-policing: Evaluating open-ended survey responses from police officers with machine learning-based structural topic modeling," Journal of Criminal Justice, Elsevier, vol. 64(C), pages 1-1.
- Sanders, James & Lisi, Giulio & Schonhardt-Bailey, Cheryl, 2018. "Themes and topics in parliamentary oversight hearings: a new direction in textual data analysis," LSE Research Online Documents on Economics 87624, London School of Economics and Political Science, LSE Library.
- Dehler-Holland, Joris & Schumacher, Kira & Fichtner, Wolf, 2021. "Topic Modeling Uncovers Shifts in Media Framing of the German Renewable Energy Act," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 2(1).
- Zhang, Han, 2021. "How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It," SocArXiv 453jk, Center for Open Science.
- Sumeet Sahay & Hemant Kumar Kaushik & Shikha Singh, 2023. "Discovering themes and trends in electricity supply chain area research," OPSEARCH, Springer;Operational Research Society of India, vol. 60(3), pages 1525-1560, September.
- McCannon, Bryan & Zhou, Yang & Hall, Joshua, 2021. "Measuring a Contract’s Breadth: A Text Analysis," Working Papers 11013, George Mason University, Mercatus Center.
- Marcel Fratzscher & Tobias Heidland & Lukas Menkhoff & Lucio Sarno & Maik Schmeling, 2023. "Foreign Exchange Intervention: A New Database," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 71(4), pages 852-884, December.
- Fratzscher, Marcel & Heidland, Tobias & Menkhoff, Lukas & Sarno, Lucio & Schmeling, Maik, 2020. "Foreign exchange intervention: A new database," Kiel Working Papers 2171, Kiel Institute for the World Economy (IfW Kiel).
- Fratzscher, Marcel & Heidland, Tobias & Menkhoff, Lukas & Sarno, Lucio & Schmeling, Maik, 2022. "Foreign exchange intervention: A new database," CEPR Discussion Papers 17558, C.E.P.R. Discussion Papers.
- Marcel Fratzscher & Tobias Heidland & Lukas Menkhoff & Lucio Sarno & Maik Schmeling, 2020. "Foreign Exchange Intervention: A New Database," Discussion Papers of DIW Berlin 1915, DIW Berlin, German Institute for Economic Research.
- Li Tang & Jennifer Kuzma & Xi Zhang & Xinyu Song & Yin Li & Hongxu Liu & Guangyuan Hu, 2023. "Synthetic biology and governance research in China: a 40-year evolution," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5293-5310, September.
- Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
- Ferrara, Federico M. & Masciandaro, Donato & Moschella, Manuela & Romelli, Davide, 2022. "Political voice on monetary policy: Evidence from the parliamentary hearings of the European Central Bank," European Journal of Political Economy, Elsevier, vol. 74(C).
- Federico M. Ferrara & Donato Masciandaro & Manuela Moschella & Davide Romelli, 2021. "Political Voice on Monetary Policy: Evidence from the Parliamentary Hearings of the European Central Bank," BAFFI CAREFIN Working Papers 21159, BAFFI CAREFIN, Centre for Applied Research on International Markets Banking Finance and Regulation, Universita' Bocconi, Milano, Italy.
- Ferrara, Federico M. & Masciandaro, Donato & Moschella, Manuela & Romelli, Davide, 2022. "Political voice on monetary policy: evidence from the parliamentary hearings of the European Central Bank," LSE Research Online Documents on Economics 114278, London School of Economics and Political Science, LSE Library.
- Camilla Salvatore & Silvia Biffignandi & Annamaria Bianchi, 2022. "Corporate Social Responsibility Activities Through Twitter: From Topic Model Analysis to Indexes Measuring Communication Characteristics," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 164(3), pages 1217-1248, December.
- Lüdering, Jochen & Winker, Peter, 2016. "Forward or Backward Looking? The Economic Discourse and the Observed Reality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(4), pages 483-515, August.
- Jochen Lüdering & Peter Winker, 2016. "Forward or Backward Looking? The Economic Discourse and the Observed Reality," MAGKS Papers on Economics 201607, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
- Andreas Rehs, 2020. "A structural topic model approach to scientific reorientation of economics and chemistry after German reunification," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1229-1251, November.
- Ulrich Fritsche & Johannes Puckelwald, 2018. "Deciphering Professional Forecasters’ Stories - Analyzing a Corpus of Textual Predictions for the German Economy," Macroeconomics and Finance Series 201804, University of Hamburg, Department of Socioeconomics.
- Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
- Szymon Sacher & Laura Battaglia & Stephen Hansen, 2021. "Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data," Papers 2107.08112, arXiv.org, revised Feb 2024.
- Peter Grajzl & Cindy Irby, 2019. "Reflections on study abroad: a computational linguistics approach," Journal of Computational Social Science, Springer, vol. 2(2), pages 151-181, July.
- Xieling Chen & Juan Chen & Gary Cheng & Tao Gong, 2020. "Topics and trends in artificial intelligence assisted human brain research," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-27, April.
- Lehotský, Lukáš & Černoch, Filip & Osička, Jan & Ocelík, Petr, 2019. "When climate change is missing: Media discourse on coal mining in the Czech Republic," Energy Policy, Elsevier, vol. 129(C), pages 774-786.
More about this item
Keywords
Imbalanced classification; Boolean query; Keyword lists; Query expansion; Topic models; Active learning
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jcsosc:v:6:y:2023:i:1:d:10.1007_s42001-022-00191-7. See general information about how to correct material in RePEc.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .