IDEAS home Printed from https://ideas.repec.org/p/osf/socarx/h572n.html

Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue

Author

Listed:
  • Occhini, Giulia
  • Tranos, Emmanouil
  • Wolf, Levi John

    (University of Bristol)

Abstract

In this paper we propose the use of commercial websites and a contextualized weak supervision framework as an alternative to industrial taxonomies to identify and classify digital industrial activity. Despite the crucial importance of industrial taxonomies for government and research, their static nature leaves them unable to accurately capture a country’s industrial structure. This is particularly problematic for firms producing novel, digital outputs, which are currently classified into the wrong industrial sectors and thus rendered almost invisible to official statistics. To address this issue, we show how commercial websites can complement, or even substitute for, industrial classification surveys and ultimately yield a more complete, up-to-date understanding of the evolution of a country’s industrial structure. In the process, we compare classification results obtained from only a commercial website’s landing page with those obtained from the full website, finding that a company’s landing page is a better predictor of industrial classes than its full website. We also suggest that our framework could support longitudinal analyses by proposing a pipeline using archival websites. Policymakers can use this method to identify classes of industries from a bottom-up perspective; at the same time, the paper advocates for the use of state-of-the-art NLP techniques in economics and business research.
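As a rough illustration of weakly supervised website classification, the sketch below assigns sector labels to website text by majority vote over keyword-based labeling functions. It is not the authors' pipeline: the abstract describes a contextualized framework, whereas this substitutes plain keyword matching for brevity. The sector names, keyword sets, example texts, and the function `weak_label` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical labeling functions: each pairs a sector label with seed keywords.
# Sectors and terms are illustrative only and do not come from the paper.
LABELING_FUNCTIONS = [
    ("software", {"software", "saas"}),
    ("software", {"api", "cloud"}),
    ("ecommerce", {"shop", "checkout"}),
    ("ecommerce", {"basket", "delivery"}),
]


def weak_label(text):
    """Return the majority-vote label for `text`, or None when undecided."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # Each labeling function votes for its label if any seed keyword appears.
    votes = [label for label, terms in LABELING_FUNCTIONS if tokens & terms]
    if not votes:
        return None  # every labeling function abstained
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tie between the two leading labels: abstain
    return top[0][0]


# Made-up example texts for a single hypothetical company.
landing_page = "Cloud software for teams. Try our API."
full_website = landing_page + " Visit our shop. Free delivery and easy checkout."

print(weak_label(landing_page))
print(weak_label(full_website))
```

In this toy case the landing page yields a single clear label, while the extra pages add keywords from a second sector and the vote ends in a tie. That is only one conceivable way additional pages could blur the signal; the abstract reports that landing pages predict better but does not say why.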

Suggested Citation

  • Occhini, Giulia & Tranos, Emmanouil & Wolf, Levi John, 2023. "Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue," SocArXiv h572n, Center for Open Science.
  • Handle: RePEc:osf:socarx:h572n
    DOI: 10.31219/osf.io/h572n

    Download full text from publisher

    File URL: https://osf.io/download/6405effec74723023d10b56b/
    Download Restriction: no


    References listed on IDEAS

    1. Frank Neffke & Martin Henning & Ron Boschma, 2011. "How Do Regions Diversify over Time? Industry Relatedness and the Development of New Growth Paths in Regions," Economic Geography, Taylor & Francis Journals, vol. 87(3), pages 237-265, July.
    2. Rizov, Marian & Vecchi, Michela & Domenech, Josep, 2022. "Going online: Forecasting the impact of websites on productivity and market structure," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 184, pages 1-46.
    3. Shaobo Li & Jie Hu & Yuxin Cui & Jianjun Hu, 2018. "DeepPatent: patent classification with convolutional neural networks and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 721-744, November.
    4. Jan Kinne & Janna Axenbeck, 2020. "Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2011-2041, December.
    5. Ron A. Boschma & Koen Frenken, 2006. "Why is economic geography not an evolutionary science? Towards an evolutionary economic geography," Journal of Economic Geography, Oxford University Press, vol. 6(3), pages 273-302, June.
    6. Daniel Arribas-Bel & Jessie Bakens, 2019. "Use and validation of location-based services in urban research: An example with Dutch restaurants," Urban Studies, Urban Studies Journal Limited, vol. 56(5), pages 868-884, April.
    7. Koen Frenken & Frank Van Oort & Thijs Verburg, 2007. "Related Variety, Unrelated Variety and Regional Economic Growth," Regional Studies, Taylor & Francis Journals, vol. 41(5), pages 685-697.
    8. Dalziel, Margaret, 2007. "A systems-based approach to industry classification," Research Policy, Elsevier, vol. 36(10), pages 1559-1574, December.
    9. Sanjeev Bhojraj & Charles M. C. Lee & Derek K. Oler, 2003. "What's My Line? A Comparison of Industry Classification Schemes for Capital Market Research," Journal of Accounting Research, Wiley Blackwell, vol. 41(5), pages 745-774, December.
    10. Nathan, Max & Rosso, Anna, 2015. "Mapping digital businesses with big data: Some early findings from the UK," Research Policy, Elsevier, vol. 44(9), pages 1714-1733.
    11. Alex Bishop & Juan Mateos-Garcia & George Richardson, 2022. "Using Text Data to Improve Industrial Statistics in the UK," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2022-01, Economic Statistics Centre of Excellence (ESCoE).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ron Boschma, 2021. "Designing Smart Specialization Policy: relatedness, unrelatedness, or what?," Papers in Evolutionary Economic Geography (PEEG) 2128, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Sep 2021.
    2. Frank Neffke & Martin Henning, 2011. "Inter-industry linkages in local economies," ERSA conference papers ersa11p1075, European Regional Science Association.
    3. Jürgen Essletzbichler, 2013. "Relatedness, industrial branching and technological cohesion in U.S. metropolitan areas," Papers in Evolutionary Economic Geography (PEEG) 1307, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised May 2013.
    4. Martin, Hanna & Martin, Roman & Zukauskaite, Elena, 2018. "The Multiple Roles of Demand in Regional Development A Conceptual Analysis," Papers in Innovation Studies 2018/10, Lund University, CIRCLE - Centre for Innovation Research.
    5. Lars Mewes & Tom Broekel, 2020. "Subsidized to change? The impact of R&D policy on regional technological diversification," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 65(1), pages 221-252, August.
    6. Martin, Hanna & Martin, Roman, 2016. "Policy capacities for new regional industrial path development – The case of new media and biogas in southern Sweden," Papers in Innovation Studies 2016/25, Lund University, CIRCLE - Centre for Innovation Research.
    7. Ron Boschma & Koen Frenken, 2011. "The emerging empirics of evolutionary economic geography," Journal of Economic Geography, Oxford University Press, vol. 11(2), pages 295-307, March.
    8. Luciana Lazzeretti & Niccolò Innocenti & Francesco Capone, 2015. "Does Related variety matter for Creative Industries?," Papers in Evolutionary Economic Geography (PEEG) 1510, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised May 2015.
    9. José M. Gaspar, 2018. "A prospective review on New Economic Geography," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 61(2), pages 237-272, September.
    10. Carlo Corradini, 2019. "Location determinants of green technological entry: evidence from European regions," Small Business Economics, Springer, vol. 52(4), pages 845-858, April.
    11. Stefano Breschi & Camilla Lenzi, 2015. "The Role of External Linkages and Gatekeepers for the Renewal and Expansion of US Cities' Knowledge Base, 1990-2004," Regional Studies, Taylor & Francis Journals, vol. 49(5), pages 782-797, May.
    12. Hanna Martin & Roman Martin, 2017. "Policy capacities for new regional industrial path development – The case of new media and biogas in southern Sweden," Environment and Planning C, vol. 35(3), pages 518-536, May.
    13. Alessia Lo Turco & Daniela Maggioni, 2017. "Local Discoveries and Technological Relatedness: the Role of Foreign Firms," Papers in Evolutionary Economic Geography (PEEG) 1710, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jun 2017.
    14. Tom Broekel & Rune Dahl Fitjar & Silje Haus-Reve, 2021. "The roles of diversity, complexity, and relatedness in regional development – What does the occupational perspective add?," Papers in Evolutionary Economic Geography (PEEG) 2135, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Nov 2021.
    15. Martin Henning & Erik Stam & Rik Wenting, 2013. "Path Dependence Research in Regional Economic Development: Cacophony or Knowledge Accumulation?," Regional Studies, Taylor & Francis Journals, vol. 47(8), pages 1348-1362, September.
    16. Silvia Rita Sedita & Ivan De Noni & Luciano Pilotti, 2014. "How do related variety and differentiated knowledge bases influence the resilience of local production systems?," "Marco Fanno" Working Papers 0180, Dipartimento di Scienze Economiche "Marco Fanno".
    17. Lars Coenen & Bjørn Asheim & Markus M Bugge & Sverre J Herstad, 2017. "Advancing regional innovation systems: What does evolutionary economic geography bring to the policy table?," Environment and Planning C, vol. 35(4), pages 600-620, June.
    18. Kolja Hesse, 2020. "Related to whom? The impact of organisational and regional capabilities on radical breakthroughs," Bremen Papers on Economics & Innovation 2005, University of Bremen, Faculty of Business Studies and Economics.
    19. Shengjun Zhu & Chong Wang & Canfei He, 2019. "High-speed Rail Network and Changing Industrial Dynamics in Chinese Regions," International Regional Science Review, vol. 42(5-6), pages 495-518, September.
    20. Hanna Martin & Roman Martin & Elena Zukauskaite, 2019. "The multiple roles of demand in new regional industrial path development: A conceptual analysis," Environment and Planning A, vol. 51(8), pages 1741-1757, November.
