
Comparing automated text classification methods

Authors

  • Hartmann, Jochen
  • Huppertz, Juliana
  • Schamp, Christina
  • Heitmann, Mark

Abstract

Online social media drive the growth of unstructured text data. Many marketing applications require structuring this data at scales inaccessible to human coding, e.g., to detect communication shifts in sentiment or other researcher-defined content categories. Several methods have been proposed to automatically classify unstructured text. This paper compares the performance of ten such approaches (five lexicon-based, five machine learning algorithms) across 41 social media datasets covering major social media platforms, various sample sizes, and languages. So far, marketing research relies predominantly on support vector machines (SVM) and Linguistic Inquiry and Word Count (LIWC). Across all tasks we study, either random forest (RF) or naive Bayes (NB) performs best in terms of correctly uncovering human intuition. In particular, RF exhibits consistently high performance for three-class sentiment, NB for small sample sizes. SVMs never outperform the remaining methods. All lexicon-based approaches, LIWC in particular, perform poorly compared with machine learning. In some applications, accuracies only slightly exceed chance. Since additional considerations of text classification choice are also in favor of NB and RF, our results suggest that marketing research can benefit from considering these alternatives.
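
To make the distinction between the two families of methods concrete, the following is a minimal sketch, not the authors' implementation. It contrasts a lexicon-based classifier (fixed word lists, no training) with a multinomial naive Bayes classifier (word probabilities learned from labeled examples). The training and test texts, the word lists, and all function names are invented for illustration and are not taken from the paper or from LIWC.

```python
import math
from collections import Counter, defaultdict

# Toy, invented examples for illustration only (not data from the paper).
TRAIN = [
    ("love this phone great battery", "pos"),
    ("great service very happy", "pos"),
    ("happy with the fast delivery", "pos"),
    ("terrible battery awful phone", "neg"),
    ("awful service very slow", "neg"),
    ("slow delivery not happy", "neg"),
]
TEST = [
    ("great phone love it", "pos"),
    ("terrible slow service", "neg"),
    ("fast delivery", "pos"),
    ("awful battery", "neg"),
    ("slow delivery", "neg"),
]

# Hypothetical mini-lexicon; real lexicons are far larger.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"terrible", "awful"}


def lexicon_classify(text):
    """Count positive minus negative lexicon hits; no training involved."""
    tokens = text.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"  # ties default to "pos"


def train_naive_bayes(examples):
    """Collect per-class document counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in text.split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def nb_classify(text, model):
    """Pick the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_counts):
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.split():
            if tok in vocab:  # words unseen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def accuracy(classify, examples):
    """Share of examples whose predicted label matches the human label."""
    return sum(classify(t) == y for t, y in examples) / len(examples)


model = train_naive_bayes(TRAIN)
lexicon_acc = accuracy(lexicon_classify, TEST)
nb_acc = accuracy(lambda t: nb_classify(t, model), TEST)
print(f"lexicon accuracy: {lexicon_acc:.2f}")
print(f"naive Bayes accuracy: {nb_acc:.2f}")
```

In this toy setup the lexicon has no entry for "slow", so "slow delivery" falls back to the tie default, whereas the trained classifier has seen "slow" in negative examples. The numbers printed here reflect only the invented data and say nothing about the accuracies reported in the paper.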

Suggested Citation

  • Hartmann, Jochen & Huppertz, Juliana & Schamp, Christina & Heitmann, Mark, 2019. "Comparing automated text classification methods," International Journal of Research in Marketing, Elsevier, vol. 36(1), pages 20-38.
  • Handle: RePEc:eee:ijrema:v:36:y:2019:i:1:p:20-38
    DOI: 10.1016/j.ijresmar.2018.09.009

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167811618300545
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mengxia Zhang & Lan Luo, 2023. "Can Consumer-Posted Photos Serve as a Leading Indicator of Restaurant Survival? Evidence from Yelp," Management Science, INFORMS, vol. 69(1), pages 25-50, January.
    2. Xiao Liu & Param Vir Singh & Kannan Srinivasan, 2016. "A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing," Marketing Science, INFORMS, vol. 35(3), pages 363-388, May.
    3. Bitty Balducci & Detelina Marinova, 2018. "Unstructured data in marketing," Journal of the Academy of Marketing Science, Springer, vol. 46(4), pages 557-590, July.
    4. Xin (Shane) Wang & Feng Mai & Roger H. L. Chiang, 2014. "Database Submission ---Market Dynamics and User-Generated Content About Tablet Computers," Marketing Science, INFORMS, vol. 33(3), pages 449-458, May.
    5. Soumya Mukhopadhyay, 2018. "Opinion mining in management research: the state of the art and the way forward," OPSEARCH, Springer;Operational Research Society of India, vol. 55(2), pages 221-250, June.
    6. Khim-Yong Goh & Cheng-Suang Heng & Zhijie Lin, 2013. "Social Media Brand Community and Consumer Behavior: Quantifying the Relative Impact of User- and Marketer-Generated Content," Information Systems Research, INFORMS, vol. 24(1), pages 88-107, March.
    7. Carlson, Keith & Kopalle, Praveen K. & Riddell, Allen & Rockmore, Daniel & Vana, Prasad, 2023. "Complementing human effort in online reviews: A deep learning approach to automatic content generation and review synthesis," International Journal of Research in Marketing, Elsevier, vol. 40(1), pages 54-74.
    8. Li, Xi & Shi, Mengze & Wang, Xin (Shane), 2019. "Video mining: Measuring visual information using automatic methods," International Journal of Research in Marketing, Elsevier, vol. 36(2), pages 216-231.
    9. Laura Toschi & Elisa Ughetto & Andrea Fronzetti Colladon, 2023. "The identity of social impact venture capitalists: exploring social linguistic positioning and linguistic distinctiveness through text mining," Small Business Economics, Springer, vol. 60(3), pages 1249-1280, March.
    10. Ana Babić Rosario & Kristine Valck & Francesca Sotgiu, 2020. "Conceptualizing the electronic word-of-mouth process: What we know and need to know about eWOM creation, exposure, and evaluation," Journal of the Academy of Marketing Science, Springer, vol. 48(3), pages 422-448, May.
    11. Oded Netzer & Ronen Feldman & Jacob Goldenberg & Moshe Fresko, 2012. "Mine Your Own Business: Market-Structure Surveillance Through Text Mining," Marketing Science, INFORMS, vol. 31(3), pages 521-543, May.
    12. Shivaji Alaparthi & Manit Mishra, 2021. "BERT: a sentiment analysis odyssey," Journal of Marketing Analytics, Palgrave Macmillan, vol. 9(2), pages 118-126, June.
    13. Jia Liu & Olivier Toubia, 2018. "A Semantic Approach for Estimating Consumer Content Preferences from Online Search Queries," Marketing Science, INFORMS, vol. 37(6), pages 930-952, November.
    14. Boegershausen, Johannes & Datta, Hannes & Borah, Abhishek & Stephen, Andrew, 2022. "Fields of Gold: Web Scraping and APIs for Impactful Marketing Insights," Other publications TiSEM 5f1ed70a-48c3-422c-bc10-0, Tilburg University, School of Economics and Management.
    15. Ma, Liye & Sun, Baohong, 2020. "Machine learning and AI in marketing – Connecting computing power to human insights," International Journal of Research in Marketing, Elsevier, vol. 37(3), pages 481-504.
    16. Sotaro Katsumata & Seungjin Kim, 2020. "The Text-Score Allocation Model: Finding Latent Topics of Online Review Documents and Multi-Item Ratings," Discussion Papers in Economics and Business 20-01, Osaka University, Graduate School of Economics.
    17. Ratchford, Brian & Soysal, Gonca & Zentner, Alejandro & Gauri, Dinesh K., 2022. "Online and offline retailing: What we know and directions for future research," Journal of Retailing, Elsevier, vol. 98(1), pages 152-177.
    18. Imran Bashir Dar & Muhammad Bashir Khan & Abdul Zahid Khan & Bahaudin G. Mujtaba, 2021. "A qualitative analysis of the marketing analytics literature: where would ethical issues and legality rank?," Journal of Marketing Analytics, Palgrave Macmillan, vol. 9(3), pages 242-261, September.
    19. Alantari, Huwail J. & Currim, Imran S. & Deng, Yiting & Singh, Sameer, 2022. "An empirical comparison of machine learning methods for text-based sentiment analysis of online consumer reviews," International Journal of Research in Marketing, Elsevier, vol. 39(1), pages 1-19.
    20. Scott Motyka & Dhruv Grewal & Elizabeth Aguirre & Dominik Mahr & Ko Ruyter & Martin Wetzels, 2018. "The emotional review–reward effect: how do reviews increase impulsivity?," Journal of the Academy of Marketing Science, Springer, vol. 46(6), pages 1032-1051, November.
