
sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics

Authors
  • Yi Yang

    (Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology, Hong Kong)

  • Kunpeng Zhang

    (Department of Decision, Operations and Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park, College Park, Maryland 20742)

  • Yangyang Fan

    (School of Accounting and Finance, Faculty of Business, Hong Kong Polytechnic University, Hong Kong)

Abstract

Topic modeling methods such as latent Dirichlet allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been used extensively in information systems (IS) and business discipline research to identify latent topics for data exploration and as a feature engineering mechanism to derive new variables for analyses. However, existing topic modeling approaches are mostly unsupervised and only leverage textual data, while ignoring additional useful metadata often associated with text, such as star ratings in customer reviews or categories of posts in online forums. As a result, the identified topics and variables derived based on the learned topic model may not be accurate, which could lead to incorrect estimations that affect subsequent empirical analysis and to inferior performance on predictive tasks. In this study, we propose a novel supervised deep topic modeling approach called sDTM, which combines a neural variational autoencoder model and a recurrent neural network. sDTM leverages the auxiliary data associated with text to enhance the topic modeling capability. We conduct empirical case studies and predictive analytics on an online consumer review data set and an online knowledge community data set. Experimental results show that in comparison with benchmark methods, sDTM can enhance both the empirical estimation and predictive performance. sDTM makes methodological contributions to the IS literature and has direct relevance for research using text analytics.

Suggested Citation

  • Yi Yang & Kunpeng Zhang & Yangyang Fan, 2023. "sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics," Information Systems Research, INFORMS, vol. 34(1), pages 137-156, March.
  • Handle: RePEc:inm:orisre:v:34:y:2023:i:1:p:137-156
    DOI: 10.1287/isre.2022.1124

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.2022.1124
    Download Restriction: no
