Printed from https://ideas.repec.org/a/eee/ejores/v315y2024i2p691-702.html

Industry-sensitive language modeling for business

Authors

  • Borchert, Philipp
  • Coussement, Kristof
  • De Weerdt, Jochen
  • De Caigny, Arno

Abstract

We introduce BusinessBERT, a new industry-sensitive language model for business applications. The key novelty of our model lies in incorporating industry information to enhance decision-making in business-related natural language processing (NLP) tasks. BusinessBERT extends the Bidirectional Encoder Representations from Transformers (BERT) architecture by embedding industry information during pretraining through two innovative approaches that enable BusinessBERT to capture industry-specific terminology: (1) BusinessBERT is trained on business communication corpora totaling 2.23 billion tokens consisting of company website content, MD&A statements and scientific papers in the business domain; (2) we employ industry classification as an additional pretraining objective. Our results suggest that BusinessBERT improves data-driven decision-making by providing superior performance on business-related NLP tasks. Our experiments cover 7 benchmark datasets that include text classification, named entity recognition, sentiment analysis, and question-answering tasks. Additionally, this paper reduces the complexity of using BusinessBERT for other NLP applications by making it freely available as a pretrained language model to the business community. The model, its pretraining corpora and corresponding code snippets are accessible via https://github.com/pnborchert/BusinessBERT.
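The abstract states that industry classification is used as an additional pretraining objective on top of BERT. As an illustration only, the sketch below shows what such a joint objective can look like, assuming (1) the usual BERT masked language modeling (MLM) loss is kept and (2) the industry-classification cross-entropy is simply added to it with a hypothetical weight `industry_weight`. The abstract does not say how the paper weights the two losses or whether other BERT objectives are retained, and all function names here are hypothetical; this is not code from the BusinessBERT repository.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -log_softmax(logits)[target]


def mlm_loss(
    token_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    ignore_index: int = -100,
) -> float:
    """Mean cross-entropy over masked positions only.

    Positions whose label equals ignore_index (unmasked tokens) are skipped.
    """
    losses = [
        cross_entropy(logits, y)
        for logits, y in zip(token_logits, labels)
        if y != ignore_index
    ]
    return sum(losses) / len(losses) if losses else 0.0


def joint_pretraining_loss(
    token_logits: Sequence[Sequence[float]],
    token_labels: Sequence[int],
    industry_logits: Sequence[float],
    industry_label: int,
    industry_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Hypothetical joint objective: MLM loss + weighted industry-classification loss."""
    industry_loss = cross_entropy(industry_logits, industry_label)
    return mlm_loss(token_logits, token_labels) + industry_weight * industry_loss


if __name__ == "__main__":
    # Toy input: 2 token positions over a 3-word vocabulary (second one unmasked)
    # and one document classified into 1 of 4 hypothetical industry classes.
    loss = joint_pretraining_loss(
        token_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        token_labels=[1, -100],
        industry_logits=[0.0, 0.0, 0.0, 0.0],
        industry_label=3,
    )
    print(loss)  # uniform logits give log(3) + log(4) = log(12)
```

In a real model the token logits would come from the encoder's MLM head and the industry logits from a classification head over a document-level representation; plain lists are used here so the arithmetic can be checked by hand.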

Suggested Citation

  • Borchert, Philipp & Coussement, Kristof & De Weerdt, Jochen & De Caigny, Arno, 2024. "Industry-sensitive language modeling for business," European Journal of Operational Research, Elsevier, vol. 315(2), pages 691-702.
  • Handle: RePEc:eee:ejores:v:315:y:2024:i:2:p:691-702
    DOI: 10.1016/j.ejor.2024.01.023

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221724000444
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John L. Campbell & Hye Seung “Grace” Lee & Hsin‐Min Lu & Logan B. Steele, 2020. "Express Yourself: Why Managers' Disclosure Tone Varies Across Time and What Investors Learn From It," Contemporary Accounting Research, John Wiley & Sons, vol. 37(2), pages 1140-1171, June.
    2. Berkin, Anil & Aerts, Walter & Van Caneghem, Tom, 2023. "Feasibility analysis of machine learning for performance-related attributional statements," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
    3. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    4. Jiao Ji & Oleksandr Talavera & Shuxing Yin, 2018. "The Hidden Information Content: Evidence from the Tone of Independent Director Reports," Working Papers 2018-28, Swansea University, School of Management.
    5. Laura Toschi & Elisa Ughetto & Andrea Fronzetti Colladon, 2023. "The identity of social impact venture capitalists: exploring social linguistic positioning and linguistic distinctiveness through text mining," Small Business Economics, Springer, vol. 60(3), pages 1249-1280, March.
    6. Moumen, Néjia & Ben Othman, Hakim & Hussainey, Khaled, 2015. "The value relevance of risk disclosure in annual reports: Evidence from MENA emerging markets," Research in International Business and Finance, Elsevier, vol. 34(C), pages 177-204.
    7. Christina Bannier & Thomas Pauls & Andreas Walter, 2019. "Content analysis of business communication: introducing a German dictionary," Journal of Business Economics, Springer, vol. 89(1), pages 79-123, February.
    8. Frankel, Richard & Jennings, Jared & Lee, Joshua, 2016. "Using unstructured and qualitative disclosures to explain accruals," Journal of Accounting and Economics, Elsevier, vol. 62(2), pages 209-227.
    9. Dasgupta, Sudipto & Banerjee, Shantanu & Shi, Rui & Yan, Jiali, 2021. "Information Complementarities and the Dynamics of Transparency Shock Spillovers," CEPR Discussion Papers 15658, C.E.P.R. Discussion Papers.
    10. Richard Frankel & Jared Jennings & Joshua Lee, 2022. "Disclosure Sentiment: Machine Learning vs. Dictionary Methods," Management Science, INFORMS, vol. 68(7), pages 5514-5532, July.
    11. Yasheng Chen & Xian Huang & Zhuojun Wu, 2023. "From natural language to accounting entries using a natural language processing method," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(4), pages 3781-3795, December.
    12. Fengler, Matthias & Phan, Minh Tri, 2023. "A Topic Model for 10-K Management Disclosures," Economics Working Paper Series 2307, University of St. Gallen, School of Economics and Political Science.
    13. Senave, Elseline & Jans, Mieke J. & Srivastava, Rajendra P., 2023. "The application of text mining in accounting," International Journal of Accounting Information Systems, Elsevier, vol. 50(C).
    14. Craja, Patricia & Kim, Alisa & Lessmann, Stefan, 2020. "Deep Learning application for fraud detection in financial statements," IRTG 1792 Discussion Papers 2020-007, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    15. Jason V. Chen & Itay Kama & Reuven Lehavy, 2019. "A contextual analysis of the impact of managerial expectations on asymmetric cost behavior," Review of Accounting Studies, Springer, vol. 24(2), pages 665-693, June.
    16. Özgür Arslan‐Ayaydin & James Thewissen & Wouter Torsin, 2021. "Disclosure tone management and labor unions," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 48(1-2), pages 102-147, January.
    17. James P. Ryans, 2021. "Textual classification of SEC comment letters," Review of Accounting Studies, Springer, vol. 26(1), pages 37-80, March.
    18. Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
    19. Venkatesh Shankar & Sohil Parsana, 2022. "An overview and empirical comparison of natural language processing (NLP) models and an introduction to and empirical application of autoencoder models in marketing," Journal of the Academy of Marketing Science, Springer, vol. 50(6), pages 1324-1350, November.
    20. Volkan Muslu & Sunay Mutlu & Suresh Radhakrishnan & Albert Tsang, 2019. "Corporate Social Responsibility Report Narratives and Analyst Forecast Accuracy," Journal of Business Ethics, Springer, vol. 154(4), pages 1119-1142, February.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.