IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v125y2020i3d10.1007_s11192-020-03726-9.html

Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study

Author

Listed:
  • Jan Kinne

    (ZEW – Leibniz Centre for European Economic Research
    University of Salzburg
    Harvard University
    istari.ai)

  • Janna Axenbeck

    (ZEW - Leibniz Centre for European Economic Research
    Justus-Liebig-University)

Abstract

Existing approaches to modeling innovation ecosystems have mostly been restricted to qualitative, small-scale analyses or, when relying on traditional innovation indicators such as patents and questionnaire-based surveys, have suffered from a lack of timeliness, granularity, and coverage. Firm websites are a particularly interesting data source for innovation research, as firms use them to publish information about potentially innovative products, services, and cooperation with other firms. Analyzing the textual and relational content of these websites and extracting innovation-related information from it has the potential to provide researchers and policy-makers with a cost-effective way to survey millions of businesses and gain insights into their innovation activity, their cooperation, and the technologies they apply. For this purpose, we propose a web mining framework for consistent and reproducible mapping of innovation ecosystems. In a large-scale pilot study, we use a database of 2.4 million German firms to test our framework and explore firm websites as a data source. We put particular emphasis on investigating the bias that could arise when surveying innovation systems through firm websites if only certain firm types can be surveyed using our proposed approach. We find that the availability of a website and the characteristics of the website (number of subpages and hyperlinks, text volume, language used) differ according to firm size, age, location, and sector. We also find that patenting firms will be overrepresented in web mining studies. Web mining as a survey method also has to cope with extremely large and hyper-connected outlier websites, and with the fact that low broadband availability appears to prevent some firms from operating their own website and thus excludes them from web mining analysis. We then apply the proposed framework to map an exemplary innovation ecosystem of Berlin-based firms that are engaged in artificial intelligence. Finally, we outline several approaches for transforming firm website content into valuable innovation indicators.
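
The website characteristics named in the abstract (number of subpages, number of hyperlinks, text volume) can be illustrated with a small sketch. This is not the authors' pipeline: the class `PageStats`, the function `website_characteristics`, and the assumption that a firm's pages are already downloaded into a dict of URL to HTML are all hypothetical, and language detection is left out. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageStats(HTMLParser):
    """Collect visible text and hyperlink targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []        # absolute URLs found in <a href="...">
        self.text_parts = []   # stripped text fragments outside script/style
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def website_characteristics(pages):
    """Summarize one firm's website.

    pages: dict mapping page URL -> HTML string (assumed to be pre-downloaded;
    the first key is taken to define the firm's own domain).
    """
    if not pages:
        return {"n_subpages": 0, "text_chars": 0,
                "n_internal_links": 0, "n_external_hosts": 0}

    domain = urlparse(next(iter(pages))).netloc
    internal_links, external_hosts = set(), set()
    text_chars = 0

    for url, html in pages.items():
        parser = PageStats(url)
        parser.feed(html)
        parser.close()
        text_chars += sum(len(part) for part in parser.text_parts)
        for link in parser.links:
            host = urlparse(link).netloc
            if host == domain:
                internal_links.add(link)
            elif host:  # skips mailto:, tel:, and similar host-less links
                external_hosts.add(host)

    return {
        "n_subpages": len(pages),
        "text_chars": text_chars,
        "n_internal_links": len(internal_links),
        "n_external_hosts": len(external_hosts),
    }
```

Counting distinct external hosts rather than raw link occurrences is one possible design choice; it keeps a single heavily linked partner site from dominating the count, which matters given the hyper-connected outlier websites the abstract mentions.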

Suggested Citation

  • Jan Kinne & Janna Axenbeck, 2020. "Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2011-2041, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03726-9
    DOI: 10.1007/s11192-020-03726-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03726-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations

    Cited by:

    1. Rammer, Christian & Es-Sadki, Nordine, 2023. "Using big data for generating firm-level innovation indicators - a literature review," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
    2. Christoph Stich & Emmanouil Tranos & Max Nathan, 2023. "Modeling clusters from the ground up: A web data approach," Environment and Planning B, vol. 50(1), pages 244-267, January.
    3. Occhini, Giulia & Tranos, Emmanouil & Wolf, Levi John, 2023. "Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue," SocArXiv h572n, Center for Open Science.
    4. Axenbeck, Janna & Bertschek, Irene & Breithaupt, Patrick & Erdsiek, Daniel, 2023. "Firm digitalisation and mobility - Do Covid-19-related changes persist?," ZEW Discussion Papers 23-011, ZEW - Leibniz Centre for European Economic Research.
    5. Chenxi Liu & Zhenghong Peng & Lingbo Liu & Shixuan Li, 2023. "Innovation Networks of Science and Technology Firms: Evidence from China," Land, MDPI, vol. 12(7), pages 1-21, June.
    6. Mazzoni Leonardo & Pinelli Fabio & Riccaboni Massimo, 2023. "Measuring Corporate Digital Divide with web scraping: Evidence from Italy," Papers 2301.04925, arXiv.org.
    7. Kinne, Jan & Dehghan, Robert & Schmidt, Sebastian & Lenz, David & Hottenrott, Hanna, 2024. "Location factors and ecosystem embedding of sustainability-engaged blockchain companies in the US: A web-based analysis," ZEW Discussion Papers 24-023, ZEW - Leibniz Centre for European Economic Research.
    8. MOTOHASHI Kazuyuki & ZHU Chen, 2024. "Quantifying the Differences in Innovation Processes in China, Japan and the United States by Document Level Concordance between Patents and Web Contents," Discussion papers 24075, Research Institute of Economy, Trade and Industry (RIETI).
    9. Motohashi, Kazuyuki & Zhu, Chen, 2023. "Identifying technology opportunity using dual-attention model and technology-market concordance matrix," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
    10. Dahlke, Johannes & Beck, Mathias & Kinne, Jan & Lenz, David & Dehghan, Robert & Wörter, Martin & Ebersberger, Bernd, 2024. "Epidemic effects in the diffusion of emerging digital technologies: evidence from artificial intelligence adoption," Research Policy, Elsevier, vol. 53(2).
    11. Schmidt, Sebastian & Kinne, Jan & Lautenbach, Sven & Blaschke, Thomas & Lenz, David & Resch, Bernd, 2022. "Greenwashing in the US metal industry? A novel approach combining SO2 concentrations from satellite data, a plant-level firm database and web text mining," ZEW Discussion Papers 22-006, ZEW - Leibniz Centre for European Economic Research.
    12. Schubert, Torben & Ashouri, Sajad & Deschryvere, Matthias & Jäger, Angela & Visentin, Fabiana & Cunningham, Scott & Hajikhani, Arash & Pukelis, Lukas & Suominen, Arho, 2023. "The role of product digitization for productivity," MERIT Working Papers 2023-004, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    13. Julian Schwierzy & Robert Dehghan & Sebastian Schmidt & Elisa Rodepeter & Andreas Stoemmer & Kaan Uctum & Jan Kinne & David Lenz & Hanna Hottenrott, 2022. "Technology Mapping Using WebAI: The Case of 3D Printing," Papers 2201.01125, arXiv.org.
    14. Moritz Böhmecke-Schwafert & Colin Dörries, 2024. "Measuring Innovation in Mauritius’ ICT Sector Using Unsupervised Machine Learning: A Web Mining and Topic Modeling Approach," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 15(3), pages 1-34, September.
    15. Bahoo, Salman & Cucculelli, Marco & Qamar, Dawood, 2023. "Artificial intelligence and corporate innovation: A review and research agenda," Technological Forecasting and Social Change, Elsevier, vol. 188(C).
    16. Axenbeck, Janna & Breithaupt, Patrick, 2022. "Measuring the digitalisation of firms: A novel text mining approach," ZEW Discussion Papers 22-065, ZEW - Leibniz Centre for European Economic Research.
    17. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    18. Abbasiharofteh, Milad & Kriesch, Lukas, 2024. "Not all twins are identical: the digital layer of “twin” transition market applications," Papers in Innovation Studies 2024/16, Lund University, CIRCLE - Centre for Innovation Research.
    19. Ashouri, Sajad & Hajikhani, Arash & Suominen, Arho & Pukelis, Lukas & Cunningham, Scott W., 2024. "Measuring digitalization at scale using web scraped data," Technological Forecasting and Social Change, Elsevier, vol. 207(C).
    20. Cruciata, Pietro & Pulizzotto, Davide & Beaudry, Catherine, 2024. "First impressions on sustainable innovation matter: Using NLP to replicate B-lab environmental index by analyzing companies' homepages," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    21. Bottai, Carlo & Crosato, Lisa & Domenech, Josep & Guerzoni, Marco & Liberati, Caterina, 2024. "Scraping innovativeness from corporate websites: Empirical evidence on Italian manufacturing SMEs," Technological Forecasting and Social Change, Elsevier, vol. 207(C).
    22. Anita Thonipara & Rolf Sternberg & Till Proeger & Lukas Haefner, 2023. "Digital divide, craft firms’ websites and urban-rural disparities—empirical evidence from a web-scraping approach [Digital Divide, Websites von Handwerksunternehmen und städtisch-ländliche Disparit," Review of Regional Research: Jahrbuch für Regionalwissenschaft, Springer;Gesellschaft für Regionalforschung (GfR), vol. 43(1), pages 69-99, April.
    23. Dörr, Julian Oliver & Kinne, Jan & Lenz, David & Licht, Georg & Winker, Peter, 2021. "An integrated data framework for policy guidance in times of dynamic economic shocks," ZEW Discussion Papers 21-062, ZEW - Leibniz Centre for European Economic Research.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kinne, Jan & Axenbeck, Janna, 2018. "Web mining of firm websites: A framework for web scraping and a pilot study for Germany," ZEW Discussion Papers 18-033, ZEW - Leibniz Centre for European Economic Research.
    2. Rammer, Christian & Es-Sadki, Nordine, 2023. "Using big data for generating firm-level innovation indicators - a literature review," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
    3. Dziallas, Marisa & Blind, Knut, 2019. "Innovation indicators throughout the innovation process: An extensive literature analysis," Technovation, Elsevier, vol. 80, pages 3-29.
    4. Riccardo Crescenzi & Alexander Jaax, 2017. "Innovation in Russia: The Territorial Dimension," Economic Geography, Taylor & Francis Journals, vol. 93(1), pages 66-88, January.
    5. Yongfeng Zhu & Zilong Wang & Shilei Qiu & Lingling Zhu, 2019. "Effects of Environmental Regulations on Technological Innovation Efficiency in China’s Industrial Enterprises: A Spatial Analysis," Sustainability, MDPI, vol. 11(7), pages 1-19, April.
    6. Motoyama, Yasuyuki & Cao, Cong & Appelbaum, Richard, 2014. "Observing regional divergence of Chinese nanotechnology centers," Technological Forecasting and Social Change, Elsevier, vol. 81(C), pages 11-21.
    7. Carlino, Gerald & Kerr, William R., 2015. "Agglomeration and Innovation," Handbook of Regional and Urban Economics, in: Gilles Duranton & J. V. Henderson & William C. Strange (ed.), Handbook of Regional and Urban Economics, edition 1, volume 5, chapter 0, pages 349-404, Elsevier.
    8. Diemer, Andreas & Regan, Tanner, 2022. "No inventor is an island: Social connectedness and the geography of knowledge flows in the US," Research Policy, Elsevier, vol. 51(2).
    9. Tobias Schlegel & Curdin Pfister & Dietmar Harhoff & Uschi Backes-Gellner, 2022. "Innovation effects of universities of applied sciences: an assessment of regional heterogeneity," The Journal of Technology Transfer, Springer, vol. 47(1), pages 63-118, February.
    10. Behrens, Kristian & Kichko, Sergei & Thisse, Jacques-Francois, 2024. "Working from home: Too much of a good thing?," Regional Science and Urban Economics, Elsevier, vol. 105(C).
    11. Hamidi, Shima & Zandiatashbar, Ahoura & Bonakdar, Ahmad, 2019. "The relationship between regional compactness and regional innovation capacity (RIC): Empirical evidence from a national study," Technological Forecasting and Social Change, Elsevier, vol. 142(C), pages 394-402.
    12. Blazquez, Desamparados & Domenech, Josep, 2018. "Big Data sources and methods for social and economic analyses," Technological Forecasting and Social Change, Elsevier, vol. 130(C), pages 99-113.
    13. Matthias Siller & Christoph Hauser & Janette Walde & Gottfried Tappeiner, 2015. "Measuring regional innovation in one dimension: More lost than gained?," Working Papers 2015-14, Faculty of Economics and Statistics, Universität Innsbruck.
    14. Fritsch, Michael & Wyrwich, Michael, 2021. "Is innovation (increasingly) concentrated in large cities? An international comparison," Research Policy, Elsevier, vol. 50(6).
    15. Combes, Pierre-Philippe & Gobillon, Laurent, 2015. "The Empirics of Agglomeration Economies," Handbook of Regional and Urban Economics, in: Gilles Duranton & J. V. Henderson & William C. Strange (ed.), Handbook of Regional and Urban Economics, edition 1, volume 5, chapter 0, pages 247-348, Elsevier.
    16. William R. Kerr & Frederic Robert-Nicoud, 2020. "Tech Clusters," Journal of Economic Perspectives, American Economic Association, vol. 34(3), pages 50-76, Summer.
    17. Abbasiharofteh, Milad & Kinne, Jan & Krüger, Miriam, 2021. "The strength of weak and strong ties in bridging geographic and cognitive distances," ZEW Discussion Papers 21-049, ZEW - Leibniz Centre for European Economic Research.
    18. Olof Ejermo, 2005. "Technological Diversity and Jacobs’ Externality Hypothesis Revisited," Growth and Change, Wiley Blackwell, vol. 36(2), pages 167-195, June.
    19. Bosquet, Clément & Combes, Pierre-Philippe, 2017. "Sorting and agglomeration economies in French economics departments," Journal of Urban Economics, Elsevier, vol. 101(C), pages 27-44.
    20. Bottai, Carlo & Crosato, Lisa & Domenech, Josep & Guerzoni, Marco & Liberati, Caterina, 2024. "Scraping innovativeness from corporate websites: Empirical evidence on Italian manufacturing SMEs," Technological Forecasting and Social Change, Elsevier, vol. 207(C).

    More about this item

    Keywords

    Web mining; Web scraping; Innovation;

    JEL classification:

    • O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.