Printed from https://ideas.repec.org/p/zbw/irtgdp/2018054.html

Topic Modeling for Analyzing Open-Ended Survey Responses

Author

Listed:
  • Pietsch, Andra-Selina
  • Lessmann, Stefan

Abstract

Open-ended responses are widely used in market research studies. Processing such responses requires labor-intensive human coding. This paper focuses on unsupervised topic models and tests their ability to automate the analysis of open-ended responses. Since state-of-the-art topic models struggle with the shortness of open-ended responses, the paper considers three novel short text topic models: Latent Feature Latent Dirichlet Allocation, Biterm Topic Model and Word Network Topic Model. The models are fitted and evaluated on a set of real-world open-ended responses provided by a market research company. Multiple components such as topic coherence and document classification are quantitatively and qualitatively evaluated to appraise whether topic models can replace human coding. The results suggest that topic models are a viable alternative for open-ended response coding. However, their usefulness is limited when a correct one-to-one mapping of responses and topics or the exact topic distribution is needed.
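
As a rough illustration of why the Biterm Topic Model suits short responses: it works on unordered word pairs ("biterms") that co-occur within the same short text and pools them across the whole corpus, rather than relying on sparse per-document word counts. The sketch below shows only that biterm extraction step. It is not the authors' implementation; the function names, the whitespace tokenization, and the sample responses are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short response.

    Each pair is sorted so that ("fast", "delivery") and
    ("delivery", "fast") count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(responses):
    """Pool biterms over all responses and count them corpus-wide."""
    counts = Counter()
    for text in responses:
        # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
        tokens = text.lower().split()
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical open-ended responses, invented for this example.
responses = [
    "good price fast delivery",
    "fast delivery",
    "price too high",
]

counts = corpus_biterm_counts(responses)
for biterm, n in counts.most_common(3):
    print(biterm, n)
```

A full model would then infer topics from these pooled biterm counts (for example via Gibbs sampling); that step is omitted here.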

Suggested Citation

  • Pietsch, Andra-Selina & Lessmann, Stefan, 2018. "Topic Modeling for Analyzing Open-Ended Survey Responses," IRTG 1792 Discussion Papers 2018-054, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
  • Handle: RePEc:zbw:irtgdp:2018054

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/230765/1/irtg1792dp2018-054.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Grün, Bettina & Hornik, Kurt, 2011. "topicmodels: An R Package for Fitting Topic Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i13).
    2. Margaret E. Roberts & Brandon M. Stewart & Dustin Tingley & Christopher Lucas & Jetson Leder‐Luis & Shana Kushner Gadarian & Bethany Albertson & David G. Rand, 2014. "Structural Topic Models for Open‐Ended Survey Responses," American Journal of Political Science, John Wiley & Sons, vol. 58(4), pages 1064-1082, October.

    Citations



    Cited by:

    1. Gaurav, Kumar & Ghosh, Sayantari & Bhattacharya, Saumik & Singh, Yatindra Nath, 2019. "Ensuring the Spread of Referral Marketing Campaigns: A Quantitative Treatment," SocArXiv 6spnr, Center for Open Science.
    2. Evgeny Nikulchev & Dmitry Ilin & Anastasiya Silaeva & Pavel Kolyasnikov & Vladimir Belov & Andrey Runtov & Pavel Pushkin & Nikolay Laptev & Anna Alexeenko & Shamil Magomedov & Alexander Kosenkov & Ily, 2020. "Digital Psychological Platform for Mass Web-Surveys," Data, MDPI, vol. 5(4), pages 1-16, October.
    3. Yen, Ju-Chun & Wang, Tawei, 2021. "Stock price relevance of voluntary disclosures about blockchain technology and cryptocurrencies," International Journal of Accounting Information Systems, Elsevier, vol. 40(C).
    4. Ziwen Liu & Scott Allan Orr & Pakhee Kumar & Josep Grau-Bove, 2023. "Measuring the impact of COVID-19 on heritage sites in the UK using social media data," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-13, December.
    5. Tobias Wekhof & Sébastien Houde, 2023. "Using narratives to infer preferences in understanding the energy efficiency gap," Nature Energy, Nature, vol. 8(9), pages 965-977, September.
    6. Valter Martins Vairinhos & Luís Agonia Pereira & Florinda Matos & Helena Nunes & Carmen Patino & Purificación Galindo-Villardón, 2022. "Framework for Classroom Student Grading with Open-Ended Questions: A Text-Mining Approach," Mathematics, MDPI, vol. 10(21), pages 1-20, November.
    7. Yatracos, Yannis G., 2018. "Residual's Influence Index (RINFIN), Bad Leverage and Unmasking in High Dimensional L2-Regression," IRTG 1792 Discussion Papers 2018-060, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sandra Wankmüller, 2023. "A comparison of approaches for imbalanced classification problems in the context of retrieving relevant documents for an analysis," Journal of Computational Social Science, Springer, vol. 6(1), pages 91-163, April.
    2. Ulrich Fritsche & Johannes Puckelwald, 2018. "Deciphering Professional Forecasters’ Stories - Analyzing a Corpus of Textual Predictions for the German Economy," Macroeconomics and Finance Series 201804, University of Hamburg, Department of Socioeconomics.
    3. Savin, Ivan & Ott, Ingrid & Konop, Chris, 2022. "Tracing the evolution of service robotics: Insights from a topic modeling approach," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    4. Imran Ali & Devika Kannan, 2022. "Mapping research on healthcare operations and supply chain management: a topic modelling-based literature review," Annals of Operations Research, Springer, vol. 315(1), pages 29-55, August.
    5. Sumeet Sahay & Hemant Kumar Kaushik & Shikha Singh, 2023. "Discovering themes and trends in electricity supply chain area research," OPSEARCH, Springer;Operational Research Society of India, vol. 60(3), pages 1525-1560, September.
    6. Sanders, James & Lisi, Giulio & Schonhardt-Bailey, Cheryl, 2018. "Themes and topics in parliamentary oversight hearings: a new direction in textual data analysis," LSE Research Online Documents on Economics 87624, London School of Economics and Political Science, LSE Library.
    7. Minchul Lee & Min Song, 2020. "Incorporating citation impact into analysis of research trends," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1191-1224, August.
    8. Grajzl, Peter & Murrell, Peter, 2021. "A machine-learning history of English caselaw and legal ideas prior to the Industrial Revolution I: generating and interpreting the estimates," Journal of Institutional Economics, Cambridge University Press, vol. 17(1), pages 1-19, February.
    9. Arsenyan, Jbid & Mirowska, Agata & Piepenbrink, Anke, 2023. "Close encounters with the virtual kind: Defining a human-virtual agent coexistence framework," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    10. Hong Joo Lee & Hoyeon Oh, 2020. "A Study on the Deduction and Diffusion of Promising Artificial Intelligence Technology for Sustainable Industrial Development," Sustainability, MDPI, vol. 12(14), pages 1-15, July.
    11. Dehler-Holland, Joris & Schumacher, Kira & Fichtner, Wolf, 2021. "Topic Modeling Uncovers Shifts in Media Framing of the German Renewable Energy Act," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 2(1).
    12. Marcel Fratzscher & Tobias Heidland & Lukas Menkhoff & Lucio Sarno & Maik Schmeling, 2023. "Foreign Exchange Intervention: A New Database," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 71(4), pages 852-884, December.
    13. Bokyong Shin & Chaitawat Boonjubun, 2021. "Media and the Meanings of Land: A South Korean Case Study," American Journal of Economics and Sociology, Wiley Blackwell, vol. 80(2), pages 381-425, March.
    14. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    15. Stefano Sbalchiero & Maciej Eder, 2020. "Topic modeling, long texts and the best number of topics. Some Problems and solutions," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(4), pages 1095-1108, August.
    16. Parijat Chakrabarti & Margaret Frye, 2017. "A mixed-methods framework for analyzing text data: Integrating computational techniques with qualitative methods in demography," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 37(42), pages 1351-1382.
    17. Li Tang & Jennifer Kuzma & Xi Zhang & Xinyu Song & Yin Li & Hongxu Liu & Guangyuan Hu, 2023. "Synthetic biology and governance research in China: a 40-year evolution," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5293-5310, September.
    18. Martin Baumgaertner & Johannes Zahner, 2021. "Whatever it takes to understand a central banker - Embedding their words using neural networks," MAGKS Papers on Economics 202130, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    19. Benjamin E. Bagozzi & Daniel Berliner & Zack W. Almquist, 2021. "When does open government shut? Predicting government responses to citizen information requests," Regulation & Governance, John Wiley & Sons, vol. 15(2), pages 280-297, April.
    20. Han, Chunjia & Yang, Mu & Piterou, Athena, 2021. "Do news media and citizens have the same agenda on COVID-19? an empirical comparison of twitter posts," Technological Forecasting and Social Change, Elsevier, vol. 169(C).

    More about this item

    Keywords

    Market research; open-ended responses; text analytics; short text topic models

    JEL classification:

    • C00 - Mathematical and Quantitative Methods - General - General
