IDEAS home Printed from https://ideas.repec.org/a/taf/tjbaxx/v7y2024i4p273-291.html

The analysis of firm web data for predicting company innovativeness: a comparison across different types of innovation

Author

Listed:
  • Sander Sõna
  • Jaan Masso
  • Shakshi Sharma
  • Priit Vahter
  • Rajesh Sharma

Abstract

This paper investigates which of the core types of innovation can be best predicted based on firms' website data. In particular, we focus on four distinct key standard types of innovations in firms: product, process, organisational, and marketing innovation. Web mining of textual data on the websites of firms from Estonia, combined with the application of artificial intelligence (AI) methods, turned out to be a suitable approach to predict firm-level innovation indicators. The key novel addition to the existing literature is the finding that web mining is more applicable to predicting marketing innovation than the other three core types of innovation. As AI-based models are often black-box in nature, for transparency, we use an explainable AI approach (SHAP - SHapley Additive exPlanations), where we look at the most important words predicting a particular type of innovation. Our models confirm that the marketing innovation indicator from survey data was clearly related to marketing-related terms on the firms' websites. In contrast, the results on the relevant words on websites for other innovation indicators were much less clear. Our analysis concludes that the effectiveness of web-scraping and web-text-based AI approaches in predicting cost-effective, granular and timely firm-level innovation indicators varies according to the type of innovation considered.

Suggested Citation

  • Sander Sõna & Jaan Masso & Shakshi Sharma & Priit Vahter & Rajesh Sharma, 2024. "The analysis of firm web data for predicting company innovativeness: a comparison across different types of innovation," Journal of Business Analytics, Taylor & Francis Journals, vol. 7(4), pages 273-291, October.
  • Handle: RePEc:taf:tjbaxx:v:7:y:2024:i:4:p:273-291
    DOI: 10.1080/2573234X.2024.2364886

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/2573234X.2024.2364886
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/2573234X.2024.2364886?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
